Site Reliability Engineer

Optomi
Location: Orlando, FL 32885, USA
Published: 6/14/2022
Technology
Full Time

Job Description

Site Reliability Engineer - (Hybrid, Orlando FL)

Optomi, in partnership with a leading enterprise organization, is seeking a Site Reliability Engineer (SRE) to join a cloud-focused engineering team supporting large-scale, customer-facing systems. This role requires onsite presence two days per week in Orlando, FL. The ideal candidate is a strong cloud engineer with AWS expertise, hands-on Terraform experience, solid scripting skills, and the confidence to communicate clearly with stakeholders and executive leadership during high-pressure situations.

What the Right Candidate Will Enjoy!

  • Working in a modern cloud environment with primary focus on AWS and exposure to GCP and Azure!
  • Supporting enterprise-scale systems with real business impact!
  • Participating in incident bridge calls and collaborating directly with leadership!
  • Maintaining and improving existing Infrastructure as Code environments!
  • Joining a small, highly collaborative SRE/DevOps-focused team!
  • Having autonomy, trust, and visibility while contributing to critical initiatives!

Experience of the Right Candidate:

  • Strong hands-on experience supporting AWS cloud environments.
  • Experience working with GCP and/or Azure in an enterprise setting.
  • Hands-on experience maintaining and modifying existing Terraform infrastructure.
  • Comfort with scripting and troubleshooting code-related issues (Python, Bash, Node.js, or similar).
  • Experience using monitoring and observability tools such as Splunk, CloudWatch, Grafana, or AppDynamics.
  • Ability to clearly communicate technical issues to both technical and non-technical audiences.
  • Confidence speaking on calls with large groups, including stakeholders and leadership.
  • Experience working in on-call or incident-response environments.

Responsibilities of the Right Candidate:

  • Maintain, support, and optimize cloud infrastructure across AWS, GCP, and Azure environments.
  • Work with existing Terraform and Atlantis configurations to support infrastructure needs.
  • Troubleshoot infrastructure, application, and CI/CD-related issues.
  • Participate in incident bridge calls and provide clear status updates to leadership.
  • Support load balancers, containerized workloads, and cloud-native services.
  • Collaborate with application teams to identify whether issues are infrastructure- or code-related.
  • Use monitoring and alerting tools to maintain system performance and reliability.
  • Communicate effectively with engineers, stakeholders, and executives during incidents and projects.

Monitoring, Tooling & Cloud Exposure:

  • AWS services including EC2, ECS, EKS, Fargate, Lambda, API Gateway, S3, ALB/ELB, VPC, IAM, and KMS.
  • Google Cloud Platform services including App Engine, Kubernetes, Cloud Functions, and IAM.
  • Infrastructure as Code using Terraform (existing configurations).
  • Monitoring and observability tools including Splunk, CloudWatch, Grafana, and AppDynamics.
  • Configuration and automation tools such as Chef, Ansible, Rundeck, and Vault.
  • Message queuing technologies including RabbitMQ and Pub/Sub.

Preferred Qualifications:

  • Experience supporting load balancers and high-traffic systems.
  • Background in SRE or DevOps-oriented teams.
  • Experience working in hybrid cloud and on-prem environments.
  • Strong Linux or Windows systems administration background.
  • Enterprise experience supporting customer-facing applications.