Escalation Engineer

Job Type: Full Time

  • Be a part of an organisation that puts the customer at the heart of every decision we make
  • Clear and defined career progression, training & certifications + much more
  • Subsidised Healthcare, Sign-on bonus, Stock and Shares, Paid commuting to work & mobile phone


    Amazon has built a reputation for excellence with a mission to be Earth’s most customer-centric company, a company that customers from all over the globe will recognize, value, and trust for both our products and our service. Amazon Web Services (AWS) is carrying on that tradition while leading the world in cloud technologies.

    The Escalation and Event Management (E2M) team is part of the broader AWS Support organisation and is dedicated to managing critical escalations, customer-facing communications, and large-scale, customer-impacting events. E2M’s purpose is to drive operational excellence and improvements to the overall customer experience.


    E2M is looking for people who are detail-oriented, analytical thinkers as well as creative problem solvers, with a strong bias for action. You are not constrained by the notion of “how things are usually done”, and you are equally comfortable working in the minute detail and coordinating efforts from the forty-thousand-foot view. You confidently act as an advocate for your customer. You are comfortable working on highly technical initiatives to consistently improve the AWS customer experience. You excel at working in a dynamic environment while collaborating with some of the smartest people in the industry, and you get excited about owning critical infrastructure services that serve global customers, every second of the day!

    Finally, you are passionate about technology with a desire to learn more and do more with AWS.


    As members of the AWS Support Escalation & Event Management (E2M) team, we work to identify widespread and systemic customer-facing problems for Amazon Web Services. We are responsible for monitoring internal tools to identify customer-impacting issues. When a problem is identified, we ensure the appropriate parties are engaged to drive its resolution, and we act as an advocate for the customer, both reporting on and managing the customer experience. Because of our unique role as Escalation Engineers, we have front-and-center exposure to all things AWS, including numerous leading-edge technologies.

    Every day will bring new and exciting challenges, including:
  • Monitor telemetry and incoming alarms in real time
  • Detect and respond to internal services experiencing customer-impacting events
  • Provide critical incident response/management focused on customer communications for AWS Service Teams
  • Drive down mean time to engagement and communication for all incident types
  • Monitor and manage communications during high-impact events via relevant channels
  • Facilitate Post-Mortem/Root Cause Analysis after each event to mitigate problem recurrence
  • Prioritize, manage and own issues impacting AWS customers from detection to resolution
  • Provide crisp and timely communication on developing issues to relevant stakeholders
  • Work with key stakeholders across AWS to improve the customer experience and develop mechanisms that support operational excellence
  • Analyze data trends on internal tickets, customer contacts, social media, and network monitors to identify potential issues
  • Build a broad understanding of AWS architecture and service inter-dependencies
  • Maintain composure in dynamic and high-pressure situations
  • Other duties as required by the organization


  • 5+ years of experience with incident management for mission-critical services
  • 5+ years of experience in systems (Windows/Linux) operations and/or networking, with an emphasis on monitoring and alarming
  • 5+ years of experience building or supporting customer solutions in the cloud
  • 5+ years of experience leading and managing critical incident internal communications
  • Bachelor’s degree in Information Science / Information Technology, Computer Science, Engineering, Mathematics, Physics, or a related field (or 6+ years of relevant work experience)


Candidates who have been most successful after joining our team have demonstrated capabilities in one or more of these areas:
  • Industry-specific accredited certification(s)
  • Experience with Python, Ruby, Perl, Node.js or shell scripting
  • Knowledge of ITIL/Lean Processes
  • Excellent written and oral English communication skills
  • Ability to review complex details regarding ongoing issues/events and convey the key details to senior stakeholders to facilitate real-time decision making
  • Effective prioritization and time management skills
  • Ability to work in ambiguous environments
  • Demonstrated critical thinking and logical problem-solving skills
  • Familiarity with AWS application architecture with a focus on high-availability and fault-tolerant design

Amazon Web Services is an equal opportunity employer and values diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.