Job Description
Site Reliability Engineer (Hybrid, Orlando, FL)
Optomi, in partnership with a leading enterprise organization, is seeking a Site Reliability Engineer (SRE) to join a cloud-focused engineering team supporting large-scale, customer-facing systems. This role requires onsite presence two days per week in Orlando, FL. The ideal candidate is a strong cloud engineer with AWS expertise, hands-on Terraform experience, solid scripting skills, and the confidence to communicate clearly with stakeholders and executive leadership during high-pressure situations.
What the Right Candidate Will Enjoy!
- Working in a modern cloud environment with a primary focus on AWS and exposure to GCP and Azure!
- Supporting enterprise-scale systems with real business impact!
- Participating in incident bridge calls and collaborating directly with leadership!
- Maintaining and improving existing Infrastructure as Code environments!
- Joining a small, highly collaborative SRE/DevOps-focused team!
- Having autonomy, trust, and visibility while contributing to critical initiatives!
Experience of the Right Candidate:
- Strong hands-on experience supporting AWS cloud environments.
- Experience working with GCP and/or Azure in an enterprise setting.
- Hands-on experience maintaining and modifying existing Terraform infrastructure.
- Comfortable scripting and troubleshooting code-related issues (Python, Bash, Node.js, or similar).
- Experience using monitoring and observability tools such as Splunk, CloudWatch, Grafana, or AppDynamics.
- Ability to clearly communicate technical issues to both technical and non-technical audiences.
- Confidence speaking on calls with large groups, including stakeholders and leadership.
- Experience working in on-call or incident-response environments.
Responsibilities of the Right Candidate:
- Maintain, support, and optimize cloud infrastructure across AWS, GCP, and Azure environments.
- Work with existing Terraform and Atlantis configurations to support infrastructure needs.
- Troubleshoot infrastructure, application, and CI/CD-related issues.
- Participate in incident bridge calls and provide clear status updates to leadership.
- Support load balancers, containerized workloads, and cloud-native services.
- Collaborate with application teams to identify whether issues are infrastructure- or code-related.
- Utilize monitoring and alerting tools to ensure system performance and reliability.
- Communicate effectively with engineers, stakeholders, and executives during incidents and projects.
Monitoring, Tooling & Cloud Exposure:
- AWS services including EC2, ECS, EKS, Fargate, Lambda, API Gateway, S3, ALB/ELB, VPC, IAM, and KMS.
- Google Cloud Platform services including App Engine, Kubernetes, Cloud Functions, and IAM.
- Infrastructure as Code using Terraform (existing configurations).
- Monitoring and observability tools including Splunk, CloudWatch, Grafana, and AppDynamics.
- Configuration and automation tools such as Chef, Ansible, Rundeck, and Vault.
- Message queuing technologies including RabbitMQ and Pub/Sub.
Preferred Qualifications:
- Experience supporting load balancers and high-traffic systems.
- Background in SRE or DevOps-oriented teams.
- Experience working in hybrid cloud and on-prem environments.
- Strong Linux or Windows systems administration background.
- Enterprise experience supporting customer-facing applications.
