Senior Site Reliability Engineer - MoJ - G7
Government Digital & Data
This is a great opportunity for an experienced Site Reliability Engineer (SRE), or an experienced DevOps Engineer looking to move into the SRE space, to work as part of the HMPPS Digital Support Team. The team is responsible for growing the operational maturity, quality and performance of our portfolio of Digital Services alongside our growing Prison and Probation digital product teams. The role is pivotal in continuing to provide a high-quality service to our colleagues across HMPPS, and forms a key part of Prison and Probation digital delivery practice.
Prison and Probation digital services exist to create the tools that support HMPPS so that they can provide decent, safe and productive places to live and work, and support prisons and probation to protect the public and reduce reoffending by rehabilitating the people in our care through education and employment. Our Live Services mission is to ensure that HMPPS Digital’s underpinning technology and services operate for those accessing our digital services throughout the HMPPS estate.
This role aligns with the Senior DevOps Engineer role in the Government Digital and Data Framework.
To help picture your life at MoJ Justice Digital, please take a look at our blog and our Digital and Technology strategy 2025.
Key Responsibilities:
The Site Reliability Engineering team is responsible for the overall development of reliability engineering in Digital Prisons Services.
In this role, you will be expected to:
- help to identify and promote best practice in reliability engineering
- design, build and test systems and processes to support software development and deployment
- work across the programme in a multidisciplinary way, partnering with developers, technical architects, product managers and others to provide robust, resilient and scalable platforms
- help to ensure the programme has the right processes in place, including identifying and measuring important metrics to drive continual improvement
- work with colleagues to identify technical risks to the infrastructure, and to plan how to resolve or mitigate them
- communicate concerns, risks and issues to the broader team and senior management
- prioritise and deliver recommendations and improvements in response to incident reviews
- set an example for and encourage open, positive, and constructive communication both within the team and when communicating with other digital teams
- cultivate and maintain relationships with other teams within Justice Digital, the MoJ's department responsible for digital services
- collaborate effectively with other site reliability engineers and developers
- work with digital teams, Cyber Security and Information Assurance to ensure the ongoing integrity and security of our services and infrastructure
- provide coaching and mentoring to more junior colleagues, and line manage a small group of less experienced SREs
- help with hiring by taking part in the recruitment of other SREs and technical staff
If this feels like an exciting challenge that you are enthusiastic about, and you want to join our team, please read on and apply!
Person Specification
We’re interested in people who:
- have experience of working with technologies that underpin digital services such as databases, web servers, DNS, CDNs, reverse proxies, message queues and load balancers
- have an understanding of version control (ideally with Git)
- are familiar with container orchestration technologies such as Kubernetes, ECS or Cloud Foundry; or serverless application design such as AWS Lambda
- have worked with public cloud providers such as Azure, AWS or Google Cloud in a production system
- have an understanding of SRE principles such as capacity planning, SLOs and SLIs, and of how to design and support resilient, large-scale, high-performance services in a production environment
- can deploy monitoring tools so that systems are appropriately monitored and instrumented, enabling teams to identify and respond to operational issues quickly and effectively
- are familiar with at least one programming language (we mainly use Node.js, Java and Kotlin)
- have a strong preference for automation and experience of using Infrastructure as Code tools such as Terraform
- enjoy learning and helping others
You must be willing to be assessed against the requirements for BPSS clearance.
The Civil Service is committed to attracting, retaining and investing in talent wherever it is found. To learn more, please see the Civil Service People Plan and the Civil Service D&I Strategy.