Job description
Senior Site Reliability Engineer

Start: ASAP
Duration: 6 months+
Location: 3 days per week in central London
Rate: negotiable DoE, inside IR35

We’re seeking a skilled Site Reliability Engineer to join a high-performing technology team within a leading professional services environment. You’ll design and maintain reliable, secure, and scalable cloud infrastructure while driving automation and DevOps best practices.

Key Responsibilities
- Build and manage CI/CD pipelines, release automation, and infrastructure as code (IaC).
- Develop resilient cloud environments and optimize monitoring, alerting, and performance.
- Maintain observability tools (Prometheus, Grafana, Datadog, Splunk, etc.).
- Manage incident response, troubleshooting, and root cause analysis.
- Collaborate with teams to improve reliability, scalability, and deployment efficiency.
- Document and standardize technical processes and runbooks.

Skills & Experience
- 4+ years in SRE, DevOps, or related roles.
- Advanced Kubernetes experience (EKS, GKE, AKS, or RKE); proficient with kubectl and Helm.
- Strong containerization experience (Docker; microservices with Java/Spring Boot).
- CI/CD tools: Jenkins, GitHub Actions, Azure DevOps, ArgoCD.
- IaC: Terraform or Pulumi (module development experience preferred).
- Observability and monitoring: Prometheus/Grafana, Datadog, OpsGenie, or similar.
- Familiarity with Git workflows, Python/Go scripting, and security tools (Vault, Qualys, etc.).