Senior Incident Manager

Job Type: Full Time

The vision of the Azure Production Infrastructure Engineering group is to make it easy for everyone to create, consume, and manage planetary-scale, reliable cloud production services and infrastructure to achieve more. As a team, we bring together significant and complementary capabilities with tooling, infrastructure, monitoring and insights in new ways to increase our perspective. Our diversity of knowledge and experience comes together for the benefit of our users, our colleagues, our business, and ourselves.

As part of the Azure SRE charter, we are responsible for driving multiservice outages to resolution in a timely and effective manner through coordination of internal Azure service teams and key stakeholders, including Subject Matter Experts (SMEs) and service leaders. This requires excellent technical, analytical and problem-solving skills, the ability to collaborate with varied stakeholders, and great written and spoken communication.

Apart from driving incidents to resolution, you are responsible for building and evolving the practice of Incident Management across Azure, using Post Incident Reviews and developing processes and systems that leverage the related metrics to identify and drive process and procedural improvements globally. Your work in this role will use cutting-edge technologies and industry concepts to directly prevent millions of minutes of downtime for customers worldwide. You will also be expected to drive complex, multi-team projects that may result from the incidents and crises that you manage.


As an Incident Manager SRE, you will directly deliver impact to the Azure platform direction as part of a holistic, engineering-driven response to emergent issues. In this role you will:

  • Engineer solutions to complex problems in a collaborative environment
  • Engineer solutions, proactively and in response to issues, that prevent customer impact, improve incident mitigation time, reduce toil and increase scalability
  • Participate in a global on-call rotation responsible for remediating the most critical outages impacting Azure customers (expect approximately 50% of your time on call)
  • Mentor and assist in developing junior team members


Basic Qualifications:

  • Bachelor’s degree or equivalent work experience
  • Strong design, scripting, problem solving and debugging skills
  • Strong collaboration skills; working across teams and organizations is necessary to be successful
  • Experience managing complex projects spanning multiple teams and organizations
  • Executive presence and communication skills
  • Must be able to participate in a multi-location on-call rotation
  • Candidates must pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter

Preferred Qualifications:

  • Software Engineering/Development experience
  • Knowledge of Microsoft Azure, AWS, GCP or similar cloud computing platforms
  • Expertise building, delivering and supporting extensible, high scale service platforms
  • Expertise in debugging and remediating issues in large-scale distributed systems
  • Experience as an incident/crisis manager leading real time and post incident response to service outages
  • PMP, ITIL, Six Sigma with demonstrated application towards service improvement


The ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.

Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.