Job Details
Primary Function of the Position
Reporting to the Site Reliability Engineer Team Lead, the Site Reliability Engineer will be responsible for ensuring the reliability, scalability and performance of our systems.
The responsibilities include:
- Develop the Site Reliability Engineering culture across the team by applying best practices, approaches and code.
- Automate tasks and propose or implement software for any parts of the system where doing so would deliver benefit.
- Monitor application performance, identifying and implementing improvements to performance and stability.
- Collaborate on the design and implementation of the desired pipelines and processes for deployment to the production environment.
- Work closely with the Platform and Software domains to ensure continuous improvement of performance and stability whilst adhering to standards.
- Undertake ad-hoc projects and other activities as required.
Key Accountabilities and Activities
1. Contribute to the SRE function including:
2. Integration with Domains including:
3. Liaise and support other teams on work items including:
4. Build and guide successful SRE efforts including:
5. Undertake ad-hoc projects and other activities as required.
Experience and Skills
Essential
- Experience and demonstrable knowledge of SRE best practices
- Expert in Git and GitOps
- Expert in logging and monitoring solutions (Prometheus, Grafana, etc.)
- Demonstrable knowledge of cloud platforms
- Expert knowledge of Kubernetes
- Proficient in written and verbal communication in English
- Understanding of non-functional testing
- Significant DevOps experience
Desirable
- Proven ability to work independently and collaboratively in a fast-paced technical environment.
- Demonstrable knowledge of the telecommunications industry and technologies.
- Proven experience and ability to provide support to direct reports.
- Golang skills and experience