Lead Site Reliability Engineer - Incident Commander

Hybrid
Sorry, this job was removed at 11:33 a.m. (EST) on Saturday, March 28, 2020

SUMMARY

Our industry is starting to go through a transformational shift and we intend to lead it. As talent becomes the main differentiator between failure and success, organizations must attract, engage and develop their people more than ever. To do so, they need powerful and sophisticated tools, which take the pain out of HR management and empower employees & people leaders. That's where we come in.

Lifion by ADP is expanding our startup-style operation in NYC to accelerate technical innovation across UI, Search, Platform Technology, IaaS, Big Data, Social, and more. The vision behind the strategy is "Innovate like a Startup", with the goal of delivering highly automated, intelligent and predictive solutions to the market. We aim to have specialized teams of superstars focused on these areas to keep pace with market trends and to quickly incubate and deliver capabilities that dramatically increase the value of our solutions for clients.

The incident commander (IC) is responsible for managing an incident to its resolution as quickly as possible, coordinating with teams, communicating outward and planning next steps. During a major outage or incident, the IC must make decisions, delegate to appropriate teams, and create multiple backup plans in order to minimize the time to resolution.

The IC will have superb listening and delegation skills, delegating tasks to appropriate teams and using their expertise as input for next steps. This person must be able to weigh alternatives and keep multiple paths open to avoid delays in moving the restoration effort forward.

The incident commander is also responsible for keeping a clear line of communication to senior stakeholders and to those not directly involved in the triage effort. Additionally, the IC will work with the teams to document and analyze the issue in a post-mortem to prevent future incidents.

REQUIREMENTS

  • Excellent communication skills, both verbal and written. 
  • A high-level knowledge of incident management best practices and systems
  • Problem-solving skills
  • The ability to make quick, confident decisions
  • Listening and synthesis skills
  • Previous experience with major incidents (either as a participant or an observer)
  • Leadership skills—the ability to take command in a high-stress situation
  • Solve problems relating to mission-critical services and create solutions to prevent problem recurrence, with the goal of automating the response to all non-exceptional service conditions.
  • Understand the operational complexity of a microservice architecture
  • Increase efficiency by identifying and addressing performance bottlenecks
  • Define, track, review and report on Service Level Objectives (SLOs), Service Level Indicators (SLIs), System Availability, and the progress and outcomes related to reliability initiatives.
  • Capable of decision-making and leadership without oversight, as well as influencing others without hierarchy (both upwards and laterally)
  • Ability to manage incidents and keep everyone calm and focused on solving issues, including removing people who distract from the immediate service restoration, regardless of their level.
  • Planning backups, rollbacks, and next steps before and during an incident. 

PREFERRED QUALIFICATIONS

  • At least 5 years of combined experience in software engineering and automated test engineering
  • Fluency in one or more languages, such as Go, JavaScript, or Python
  • Strong production experience with cloud native services (AWS, Azure, GCP)
  • Familiarity with Git SCM and one or more repository managers such as GitHub, GitLab, Stash, Bitbucket, or Gerrit.
  • Experience with lightweight development methodologies such as Agile (Scrum and/or Kanban)

Location

We have multiple NYC locations, each within easy commuting distance of subways and metro transportation. Each location is just steps away from shopping, galleries, coffee, and great food!
