Senior Site Reliability Engineer - Reading, UK


Posted 2019-06-11

Forcepoint is looking for a Site Reliability Engineer to join the Cloud Operations SRE team. This is a unique opportunity to join a newly formed team that will focus on world-class monitoring and alerting, platform performance, availability, reliability, and capacity planning. The right candidate will have a software development mindset and will automate as much as possible to avoid repetitive tasks. The individual will work closely with Engineering teams to optimize the deployment and monitoring of mission-critical, customer-facing systems across private and public cloud environments.

The successful candidate is customer-focused, a self-starter, and able and willing to work with geographically dispersed teams. This role will also be responsible for mentoring less-experienced staff.


Responsibilities

Monitor and debug issues across the platforms (applications, networks, databases)
Administer, maintain, and automate systems to ensure reliability, resiliency, scalability, and security
Deploy, maintain, and enhance monitoring solutions, and provide technical resolutions and root cause analysis for high-severity incidents
Work closely with Engineering and Software Development teams to design, deploy, and operate components/services that are automated, resilient, and scalable
Create, update, and maintain documentation for all configurations for the production environment
Develop and deliver timely reports on service metrics including but not limited to availability, capacity, performance, and latency across all production systems

Skills & Qualifications

Bachelor’s Degree in Computer Science or equivalent experience related to Information Technology
3+ years’ experience as an Engineer managing IaaS / PaaS environments in Public Cloud
Demonstrated experience managing Linux (RHEL/CentOS) platforms at scale
Experience with configuration and automation toolsets such as Ansible, Terraform, and Puppet
Hands-on experience with monitoring platforms such as Graphite, Prometheus, Datadog, and CloudWatch
Knowledge of local and wide-area networks
Demonstrated experience with scripting languages such as Bash and Python
Solid understanding of incident management, change management, and problem management
A strong problem-solving mindset with a focus on creating automated solutions

Nice to Have

Certifications for Public Cloud (AWS/Azure)
