Troubleshoot high-severity performance and availability issues in e-commerce, infrastructure, and legacy business applications and websites, and manage the incident lifecycle through to resolution.
Lead root cause analyses and investigations by identifying, analyzing, and remediating service performance and availability issues to maximize service uptime. You are expected to conduct blameless post-incident reviews.
Engage in and improve the whole lifecycle of services—from inception and design, through deployment, operation and refinement.
Maintain services once they are live by measuring and monitoring availability, latency, and overall system health. You’re expected to be on call, have strong written communication skills, and be able to develop working relationships with coworkers.
Experience in balancing service reliability, metrics, sustainability, technical debt, and operational toil for live services running at scale.
Work across multiple project teams simultaneously to support rapid development efforts.
Solve complex, business-critical issues that affect bottom-line financial numbers and customer loyalty and experience.
Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
Contribute positively to open source projects developed by DSG and join existing communities. Navigate this broader ecosystem and structure projects with upstream/downstream opportunities in mind.
Identify and integrate with third-party solutions where it makes the most sense.
Use data to understand the availability, reliability, and sustainability of our software.
Bring experience, pragmatism, empathy, and composure to interactions with teams outside of the RE organization.
Work frequently with Product teams on shared goals and cross-team projects.
Balance planned and reactive work using basic project planning techniques and technical roadmaps.
Work and collaborate across teams such as Application Services, Capacity Planning, Hardware, Network, and Datacenter Operations.
Participate in building advanced tooling for testing, monitoring, administration, and operations of multiple clusters across multiple environments.
Experience negotiating SLIs, SLOs, and SLAs with product owners.
3-5+ years of applying reliability engineering principles to distributed services.
Understanding of and comfort with the GNU/Linux operating system.
Proficiency in high-level languages such as Ruby, Python, and Bash.
Exposure to system-level languages such as Go, C/C++.
Familiarity with configuration management software such as Puppet, Chef, Ansible, or Salt.
Source control, branching, & merging: git/svn/etc (Repository Management)
Networking basics: TCP vs UDP, basic troubleshooting, HTTP – load balancing, firewall, private networks, multi-tier design, scale-out, persistent data
Databases: at a minimum, an understanding of the basics (SELECT/INSERT).
Familiarity with standard infrastructure concepts like load balancers, firewalls, object storage and where/when they might be used.
Service Management – Incident Response, Change, and Problem Management.
Experience with Kubernetes and Docker.
Cloud computing concepts (not necessarily provider specific) – VMs vs Docker Containers, block storage vs object storage, infra automation vs install automation.
Experience operating a platform, software as a service, or shipping software.
Experience as an open-source contributor.
Intellectual curiosity, problem solving, and openness are key to success in this role. Has a mindset for solving production system issues, understanding root causes through “detective work”, and automating away toil; doesn’t like boring, repetitive tasks. Enjoys digging into new problems.
Knows when to ask for help and when to dig more on their own
Can work on different tasks in different systems week to week
Capable of driving toward and focusing on results, in some cases starting from an ill-defined problem such as “this is slow”, by developing metrics and making measurable improvements.
GENERAL TECHNICAL SUMMARY
Valuable Technologies Like: WebSphere Commerce, WebSphere eXtreme Scale, WebSphere Application Server, WebSphere Message Broker, WebSphere MQ, Order Management, Web Services, Tomcat, Apache, TCP, UDP, Load Balancers, Repository Management (git/svn), Puppet, Chef, Ansible, Salt, VMs, Docker Containers
Valuable Methodologies Like: ITIL, Agile, Scrum, Reliability Engineering
Valuable Databases/Operating Systems Like: Oracle, DB2, SQL Server, Windows, UNIX, Linux, System i
Valuable Monitoring Tools Like: IBM Monitoring, SCOM, CA Spectrum, AppDynamics, SOASTA, Foglight
Service Management Tools Like: Remedy, ServiceNow, Jira, Pivotal Tracker, xMatters
Everything we do at DICK’S Sporting Goods is about being the leader in sports commerce. The way we will achieve this is by respecting the athlete behind every transaction and providing our knowledge, guidance and expertise.
That means making it easy for athletes to find the products that they want and need, and providing the tools and insights to help them improve their game. The technologies that we build are critical in accomplishing these goals through non-stop innovation and improvement.
3 O'Clock Summer Fridays
Pop-up events with athletes
Link to benefits: https://benefityourliferesources.com/
Onsite Daycare Opening Fall 2020
Any or all of: Java, .NET, Android, iOS, React, Angular, Cloud
We do our best to make our interview process as easy as possible. As part of our process, you can expect to meet many of the leaders of the technology team, take a coding challenge, and have the chance to visit our headquarters in Pittsburgh, PA.