S&P Global

Site Reliability Engineer - Charlottesville, VA or Remote

Posted 2019-11-11

Site Reliability Engineering (SRE) is an engineering discipline that draws from software and systems engineering to define, measure and achieve reliability objectives. SRE embraces DevOps philosophies and leverages custom code, automation, tooling, support processes and service management frameworks to achieve reliability objectives. The SRE mindset considers reliability a first-class feature of any service and prioritizes engineering and automation over manual intervention.

S&P Global's Site Reliability Engineering teams are responsible for keeping our products and services available to customers and employees located around the world. We achieve this through software, system and process engineering to maintain service level objectives, limit human intervention and minimize the level of effort associated with support (a.k.a. "toil"). SRE teams at S&P Global are generally responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response and capacity planning of our products and services.

Our SRE teams value:
End-user focus: As engineers and consumers of services, we deeply value the quality of our users' experience. We recognize that a solution is only as good as the quality of service it provides.
Passion for coding and automation: We leverage technology to improve reliability and make our lives easier. We are experienced problem-solvers and are proficient in scripting and programming languages. We look for people who enjoy problem-solving, writing code and exploring automation.
Curiosity: We compulsively search for the underlying cause of issues and ways to improve reliability.
Honesty: We value honesty and transparency over placing blame. We promote a blameless culture throughout the organization.

About You
You have 5+ years of experience in software or systems engineering.
You have experience monitoring, supporting and tuning a production application stack.
You value your time and have experience with scripting and automation frameworks.
You want to support full-stack solutions, including applications, servers, networks, data pipelines and data platforms.
You have excellent troubleshooting skills.
You demonstrate an objective, data-driven approach to problem-solving.
You demonstrate excellent collaboration and communication skills.
You take a practical and iterative approach to improvement, making small changes and testing for effect.
You have experience working across silos in a change-controlled environment.
You have experience working with a globally distributed workforce.
You have experience with cloud hosting technologies (e.g., AWS, Azure, Google).
You may have some experience with containerization platforms (Docker, Kubernetes).

Responsibilities
Develop, maintain and report on Service Level Objectives (SLOs).
Develop and support monitoring and automation to defend SLOs.
Resolve incidents (outages and service disruptions), including participation in on-call rotations.
Perform root cause analysis and formal postmortem write-ups for service disruptions.
Perform capacity planning to ensure future reliability and efficiency as utilization grows.
Develop and test disaster recovery plans.
Implement changes and support releases in a controlled environment.
Develop and maintain runbooks, share knowledge and cross-train members of SRE and Development teams.
Consult with Development teams during service design and in advance of releases.
Conduct production readiness reviews to ensure services meet SRE onboarding requirements.

Education and Certifications
Bachelor's degree or higher in computer science, math, engineering or related disciplines.
AWS technical certifications helpful.

Technologies Leveraged
AWS, VMware, F5 BIG-IP, HAProxy, Windows Server, Linux, IIS, Apache HTTP Server, SQL Server, Oracle, MySQL, Apache NiFi, .NET, JavaScript, Python, PowerShell, Perl, Redis, Memcached, Kafka, Active Directory, Elasticsearch, Logstash, Kibana, Google Analytics, AppDynamics, SolarWinds, Datadog, Prometheus, Grafana, Azure DevOps, Visual Studio, ServiceNow, Kubernetes, Docker, Git, Selenium, Jenkins, Ansible

Company summary

At S&P Global, we don’t give you intelligence—we give you essential intelligence. The essential intelligence you need to make decisions with conviction. Together, the divisions of S&P Global -- including S&P Global Ratings, S&P Global Market Intelligence, S&P Dow Jones Indices, and S&P Global Platts -- are the foremost providers of essential intelligence for the capital and commodities markets.

Whether it's analyzing data, innovating technology, or developing market insights, we're always looking for intellectually curious, progressive and passionate thinkers to join our team. With over 20,000 employees working across 95 offices around the globe, the diverse and vibrant community at S&P Global is unlike any you'll find elsewhere.


Benefits

401(k) matching; 4 weeks PTO; paternal leave; tuition refund program; scholarship program for your children; commuter benefits; various discounts on products and services; gym membership discount.

Tech Stack

AWS, DevOps, Azure, Kubernetes, Ansible, Git, Jenkins, Python, Linux, C/C++, Java, C#, ASP.NET, Scala, RESTful APIs
