# Read-interpretability-claims — Reading checklist

Source: Black Box Notes, Methodology §02. https://blackboxnotes.com/methodology/read-interpretability-claims/
Licence: CC-BY 4.0.
Version: 2026-05-22.

For each interpretability claim the publication encounters, answer the eight questions
below. A claim that fails questions 1, 2, 3, 4, or 6 is generally not strong enough to
support audit-relevant conclusions.

## 1. What method, specifically?
- [ ] Method named with specificity (not "interpretability" as a category)
- [ ] Method paper / artefact referenced

## 2. At what scale?
- [ ] Scale of demonstration recorded (parameter count, training corpus, deployment scale)
- [ ] Scale of application recorded
- [ ] Extrapolation between the two justified, or flagged

## 3. On which inputs?
- [ ] Input distribution characterised
- [ ] Distribution shift to production noted
- [ ] Curated-vs-production gap explicit

## 4. With what failure rate?
- [ ] Failure regime named
- [ ] Calibration on the deployment distribution reported

## 5. Reproducible by an independent observer?
- [ ] Artefact published? Y / N
- [ ] If proprietary, alternate verification path noted
- [ ] If unverifiable, claim treated as internal-practice claim only

## 6. Demonstrated by intervention, or merely by correlation?
- [ ] Intervention experiment present? Y / N
- [ ] If N: claim flagged as correlational

## 7. What is the explanatory ceiling?
- [ ] Mechanism-identifiable vs behaviour-explained boundary explicit
- [ ] Training-data origins of mechanism acknowledged

## 8. Who benefits if this claim is believed?
- [ ] Source of the claim disclosed
- [ ] Vendor / institutional incentive noted in piece

## Composite reading
- [ ] Claim survives questions 1-4 minimum
- [ ] Claim survives questions 5-7 for audit-relevant use
- [ ] Question 8 addressed in piece's framing
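
The composite reading can be recorded mechanically once the eight questions have been answered. The following is a minimal sketch in Python, not part of the published methodology: the function name `composite_reading`, the tier labels (`insufficient`, `baseline-only`, `audit-relevant`), and the shape of the returned dictionary are all hypothetical choices made for illustration. It assumes each question has been reduced to a single pass/fail answer, which flattens the individual checkboxes under each heading.

```python
# Hypothetical encoding of the "Composite reading" section.
# Names and tier labels are illustrative assumptions, not from the source.

BASELINE = (1, 2, 3, 4)  # "Claim survives questions 1-4 minimum"
AUDIT = (5, 6, 7)        # "Claim survives questions 5-7 for audit-relevant use"
FRAMING = 8              # "Question 8 addressed in piece's framing"


def composite_reading(passed: dict[int, bool]) -> dict:
    """Summarise pass/fail answers to questions 1-8 into a single reading."""
    missing = [q for q in range(1, 9) if q not in passed]
    if missing:
        raise ValueError(f"unanswered questions: {missing}")

    failed_baseline = [q for q in BASELINE if not passed[q]]
    failed_audit = [q for q in AUDIT if not passed[q]]

    if failed_baseline:
        tier = "insufficient"      # does not meet the 1-4 minimum
    elif failed_audit:
        tier = "baseline-only"     # meets the minimum, not audit-relevant
    else:
        tier = "audit-relevant"    # survives questions 1-7

    return {
        "tier": tier,
        "failed": failed_baseline + failed_audit,
        # Question 8 is about the piece's framing, so it is reported
        # separately rather than folded into the tier.
        "framing_addressed": passed[FRAMING],
    }


if __name__ == "__main__":
    # Example: a claim that passes everything except question 6
    # (no intervention experiment, so it is correlational).
    answers = {q: True for q in range(1, 9)}
    answers[6] = False
    print(composite_reading(answers))
```

In this sketch, a claim that fails only question 6 lands in the `baseline-only` tier with `failed` equal to `[6]`, which matches the checklist's position that a correlational claim does not support audit-relevant conclusions. Question 8 is kept out of the tier on purpose, since the checklist treats it as a disclosure requirement on the piece, not a test of the claim itself.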
