Methodology · v1
How we analyze incidents
Every analysis on this site follows a documented method. This page is that documentation. If you read an incident and want to know why we grouped the evidence the way we did, this is the answer.
Frameworks we use
No single framework is sufficient for serious failure analysis. We combine four, each doing a different job.
Five Whys
Origin: Sakichi Toyoda, Toyota Production System. Useful for: descending past the first plausible cause into the structural conditions beneath it. Limit: it tends to produce a single linear chain, while real failures are rarely linear.
We use Five Whys as the opening move on every incident. It is a forcing function that prevents us from stopping at the most visible technical fact. We do not treat the result as the complete explanation.
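The descent from visible symptom to structural condition can be made explicit by treating the chain as ordered data. A minimal sketch, with an entirely invented incident for illustration:

```python
# Hypothetical Five Whys chain: each entry pairs a question with the
# answer that prompts the next question. The incident is invented.
five_whys = [
    ("Why did the service return errors?",
     "A deploy exhausted the database connection pool."),
    ("Why did the deploy exhaust the pool?",
     "The pool size was hard-coded for last year's traffic."),
    ("Why was it hard-coded?",
     "Capacity settings lived in code, outside reviewed configuration."),
    ("Why were they outside reviewed configuration?",
     "No policy required capacity limits to be configurable."),
    ("Why was there no such policy?",
     "Capacity planning lost its owner after a reorganization."),
]

def structural_condition(chain):
    """The final answer is the candidate structural condition --
    the opening move, not the complete explanation."""
    return chain[-1][1]
```

The last answer is where the fishbone and Swiss Cheese passes pick up: it is one thread, not the whole fabric.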
Ishikawa (fishbone) diagrams
Origin: Kaoru Ishikawa, 1960s. Useful for: enumerating categories of contributing factors in parallel, not in sequence. The classical “6M” categorization is too narrow for our scope; we use a modified version with nine branches matching our taxonomy.
Swiss Cheese Model
Origin: James Reason, 1990. Useful for: distinguishing active failures (sharp end) from latent conditions (blunt end). In every incident with a human at the sharp end, latent conditions well upstream lined up with the active error. The model forces us to ask: which defenses existed, and which had holes?
CAST and STAMP
Origin: Nancy Leveson, MIT. STAMP (Systems-Theoretic Accident Model and Processes) is the underlying accident model; CAST (Causal Analysis based on Systems Theory) is the analysis technique built on it. Useful for: systems where the failure was emergent and no single component is the root cause. CAST treats safety as a control problem: which control loops existed, and where did they fail to maintain the required constraints?
We use CAST-style analysis on our most complex incidents — Flash Crash, SVB, Fukushima, 737 MAX. On simpler incidents, it is overkill.
Sources policy
An analysis is only as credible as its sources. We hold ourselves to three rules.
Primary before secondary
We prefer official post-incident reports, regulatory filings, court records, and direct operator statements to journalism about those sources. Where we cite secondary sources, we link to the primary if it exists.
Cite everything factual
Every non-trivial factual claim carries a footnote. The reader should be able to trace any statement to its source in one click.
Disclose uncertainty
When sources conflict, we say so. When a fact is plausible but unconfirmed, we mark it. “We do not know” is a respectable sentence.
Taxonomy
All tags on the site come from the rootlogic taxonomy v1, a controlled vocabulary covering primary causes (9 categories), failure modes, blast radius, severity, domain, and contributing factors. The taxonomy is machine-readable and versioned. Every incident record pins the taxonomy version it was classified against.
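A record that pins its taxonomy version can be checked mechanically. A minimal sketch, assuming a hypothetical record shape; the field names are illustrative and not the actual rootlogic schema:

```python
# Hypothetical incident record pinning the taxonomy version it was
# classified against. Field names and values are assumptions.
incident = {
    "slug": "example-incident",
    "taxonomy_version": "v1",
    "primary_cause": "process",  # one of the 9 primary-cause categories
    "contributing_factors": ["latent-design", "alerting-gap"],
    "severity": "high",
}

# Toy stand-in for the machine-readable, versioned taxonomy.
taxonomy = {
    "v1": {"primary_causes": {"process", "design", "operation"}},
}

def validate(record, taxonomy):
    """Reject records classified against an unknown taxonomy version,
    or tagged with a primary cause that version does not define."""
    version = record["taxonomy_version"]
    if version not in taxonomy:
        raise ValueError(f"unknown taxonomy version: {version}")
    if record["primary_cause"] not in taxonomy[version]["primary_causes"]:
        raise ValueError(f"primary cause not in taxonomy {version}")
    return True
```

Pinning the version means a record classified under v1 stays interpretable even after the taxonomy itself is revised.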
Revisions
Incident analyses are revised when new information surfaces. Every incident page shows its last-updated date and links to a changelog. We do not silently edit incidents — substantive changes are disclosed.
If you believe an analysis is incorrect, please submit a correction. We take corrections seriously and publish the revision history.
The definitive description of our method is available as a downloadable PDF (methodology.pdf) and a plaintext file (methodology.txt) from this page’s header. Both are CC BY-SA 4.0.