External Resources

This is a collection of external resources that may be useful for learning more about elements of incident response. Please feel free to submit PRs to add new resources if you find something particularly interesting.

Incident Response Procedures

Articles

  • PagerDuty Incident Response Guide

    This is the full (slightly sanitized) version of PagerDuty's internal
    incident response documentation, and it is very comprehensive. It is
    an excellent resource for seeing how to apply our general principles
    to a specific service.

  • Remote Incident Response

    This article by Ryan Frantz, with help from Dr. Laura Maguire, discusses
    the unique challenges of running incident response with a
    distributed team.

Talks and Videos

Books

Incident Retrospectives (aka Postmortems)

Articles

  • Each Necessary, But Only Jointly Sufficient

    This 2012 blog post from John Allspaw briefly explains why the idea of
    a "root cause" is fundamentally flawed, and why learning, not fixing,
    must be the driving force behind incident analysis.

  • Etsy Debriefing Facilitation Guide

    This is the incident retrospective guide used by Etsy and open-sourced
    in 2016; it's an excellent resource for conducting your own debriefings
    and the basis for a lot of similar guides throughout the industry.

  • The Infinite Hows

    This article by John Allspaw examines the problems with the
    commonly used "Five Whys" system of incident analysis, and does an
    excellent job of providing an overview of an alternative approach.

Talks and Videos

  • Incidents As We Imagine Them Versus How They Actually Are

    This talk by John Allspaw at PagerDuty Summit 2018 is an excellent
    summary of the thorny issues around incident response, and of how
    what actually happened often gets oversimplified in a desire to make
    incidents fit into standardized boxes. If you watch nothing else
    about incident analysis, watch this.

  • Who Destroyed Three Mile Island?

    This talk by Nickolas Means at LeadDev Austin 2018 covers the 1979
    Three Mile Island disaster and is an excellent walkthrough of the
    difference between first stories and second stories, and the dangers of
    hindsight and outcome bias.

Books
