Site Reliability Engineer (DevOps) / Team Lead (Remote)

Job Category:

Information Tech

Job Level:


Open Date:




Close date:


Client Industry:

TECHNOLOGY (IT & IT peripherals)

Job Description and Qualification:


  • Plan and create operational excellence for our cloud service hosting environments and Kubernetes deployments. You will create the standards for Site Reliability Engineering, which include:
    • Automation best practices using the tools the objectives require (e.g. Ansible, Terraform, Packer). Use event-driven automation such as StackStorm to reduce toil and reactive operations
    • Observability Toolstack (Monitoring, Logging and Alerting)
      • Set up best practices, configurations and deployment blueprints for observability tools such as the ELK Stack and Prometheus
    • Provide best practices for GitOps methodology, including improvements to our Aytra GitOps process
    • Provide a DevSecOps implementation plan to embed security checks in the CI/CD processes
  • Optimize on-call rotations and processes
  • Improve reliability, quality, and time-to-market of our suite of software solutions
  • Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
  • Provide primary operational support and engineering for multiple large distributed software applications
  • Provide L3 customer support for Kubernetes and the company's Cloud Hosting Services
  • Create an SRE turn-key toolstack as part of our deployments that can be offered to our customers
  • Mentor colleagues and the SRE team across locations via remote conference meetings

Minimum Requirements:

  • 5+ years of experience in automation and DevOps/SRE practice covering infrastructure, configuration, deployment and operational efficiency (observability, monitoring and logging)
  • 3+ years of experience in CI/CD and GitOps methodology
  • 3+ years of experience with Docker and Kubernetes, covering building, provisioning and configuration as well as the operational aspects of deployment
  • Knowledge of and experience with day 1 (build-configure-deploy) and day 2 (Ops) activities from development to production
  • Must have hands-on experience with the ability to lead and provide mentorship
  • 8+ years of experience with the Linux operating system
  • 3+ years of experience with public and private cloud environments