Black Box Notes

On opacity, auditability, and the limits of trust in modern AI systems.

Methodology · 01

How we audit an agentic stack

The publication's standing walk-through for an end-to-end audit of an agentic system in production. Six phases, the documents we ask for at each, the failure modes that recur, and the downloadable checklist a working auditor can copy into their own engagement.

What we mean by "audit"

For the purposes of this page, an audit of an agentic stack is the process by which a party other than the system's operator establishes a defensible account of what the system has been doing, how it does it, what it does poorly, and under what conditions it has produced consequential outputs that the operator cannot explain. The audit produces a written report. The report's claims rest on artefacts the operator has produced, on traces the system itself has emitted, and on direct testing the auditor has conducted under controlled conditions.

Most of what is called an "AI audit" in 2026 is not this. Most published audits are questionnaire-style attestations against a published framework — useful for procurement and inadequate for the harder questions. The walk-through below is for the harder engagements. Where a regulator's expectation matches our walk-through we say so. Where it does not we say that too.

Phase 1 — Scope and access

The first phase ends with a written engagement letter and an evidence request that the operator has formally accepted. We do not begin substantive work until both exist. The engagement letter answers four questions:

  1. Boundary. Where the system ends. An agentic stack typically composes models, tool wrappers, orchestration logic, memory stores, and a user-facing interface. The boundary defines which of those layers the audit covers.
  2. Scope. What the audit is asking. Production behaviour over a defined time window? Model-level safety properties? Regulatory alignment? Procurement attestation? The mix determines everything that follows.
  3. Access. What the operator will give us. Trace logs at what fidelity. Configuration files at what version. Production access (read-only, read-write, none). Engineering-team interviews under what privilege.
  4. Output. What the audit will publish. A confidential report to a named recipient. A summary letter for procurement. A public attestation. A regulator filing. The output determines the writing voice and the redaction discipline.

We decline engagements where the operator will not commit on any one of the four. We have done so. We will continue to.
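The four-question gate can be sketched as a small record with a completeness check. This is a minimal illustration, not the publication's tooling: the `EngagementLetter` class, its field names, and the example values are assumptions introduced here.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EngagementLetter:
    """Hypothetical record of the four Phase 1 commitments; None means not yet agreed."""
    boundary: Optional[str] = None  # which layers of the stack the audit covers
    scope: Optional[str] = None     # what the audit is asking
    access: Optional[str] = None    # what the operator will give the auditor
    output: Optional[str] = None    # what the audit will publish


def missing_commitments(letter: EngagementLetter) -> list[str]:
    """Return the names of the questions the operator has not answered."""
    return [f.name for f in fields(letter) if not getattr(letter, f.name)]


def should_decline(letter: EngagementLetter) -> bool:
    """Decline when any one of the four commitments is missing."""
    return bool(missing_commitments(letter))


# Illustrative values only: the operator has not committed on output.
letter = EngagementLetter(
    boundary="orchestration logic, tool wrappers, memory stores",
    scope="production behaviour over a defined time window",
    access="read-only production access, verbatim trace logs",
)
print(missing_commitments(letter))  # ['output']
print(should_decline(letter))       # True
```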

Phase 2 — Architecture mapping

The second phase produces a diagram: not a marketing diagram but a working architecture diagram that separates the model, orchestration, tool, and storage layers, names the components in each, and identifies the boundaries at which the system makes externally consequential decisions. The diagram is built from operator documentation, supplemented by what the auditor's own instrumentation reveals.

Three patterns recur:

  • The "obviously a model" pattern. The operator's diagram shows a single model surrounded by glue code. The auditor's diagram shows the model wrapped in a router, a fallback chain, an evaluator, a retrieval layer, and a moderation step. Each layer is itself a candidate auditable surface.
  • The "agent of agents" pattern. Production routes a request through a planner, a worker, a critic, and an escalator, each of which is itself an LLM call with its own prompt and parameters. The audit surface multiplies; the failure modes compound.
  • The "memory is the model" pattern. The system's behaviour depends materially on a retrieval-augmented memory store that has been populated by upstream processes the auditor cannot see. The most consequential decisions are made by the retrieval step, which is rarely included in the operator's diagram.
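The gap between the operator's diagram and the auditor's can be made explicit as a set difference. The sketch below uses the component names from the "obviously a model" pattern. The function name and the idea of representing each diagram as a flat set of component names are assumptions for illustration; a real diagram would also carry layers and edges.

```python
def diagram_gap(operator: set[str], auditor: set[str]) -> dict[str, list[str]]:
    """Compare two component inventories.

    'undocumented' lists components the auditor observed but the operator
    did not draw; 'unobserved' lists components the operator drew but the
    auditor's instrumentation did not reveal.
    """
    return {
        "undocumented": sorted(auditor - operator),
        "unobserved": sorted(operator - auditor),
    }


# Components taken from the "obviously a model" pattern above.
operator_diagram = {"model", "glue code"}
auditor_diagram = {
    "model", "glue code", "router", "fallback chain",
    "evaluator", "retrieval layer", "moderation step",
}

gap = diagram_gap(operator_diagram, auditor_diagram)
print(gap["undocumented"])
# ['evaluator', 'fallback chain', 'moderation step', 'retrieval layer', 'router']
```

Each entry under "undocumented" is a candidate auditable surface in the sense used above.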

Phase 3 — Trace and instrumentation review

The third phase establishes whether the system's emitted traces support an audit at all. The auditor reviews a representative sample of production traces against the following questions:

  • Does each trace include the input prompt and the model response, verbatim?
  • Does each trace include the retrieval context the model received?
  • Does each trace include the tool calls the model made, with their arguments?
  • Does each trace include the tool responses the model received?
  • Does each trace include the final action the system took as a result?
  • Are the traces correlated to user identifiers, session identifiers, and consequential outcomes?

"Yes" to all six is rare. The publication's view is that "yes to five" is the minimum threshold for an audit of consequential behaviour; "yes to four" produces a report whose claims are necessarily hedged. Below "yes to four," the auditor's job becomes the documentation of what cannot be audited, which is itself a legitimate output.

Recurring failure

The most common trace-level defect we encounter is the absence of retrieval context. The operator captures the input and the output but not the retrieval that the model conditioned on. Without retrieval context the audit cannot answer the question "would the model have produced this output, absent this retrieval?" — which is the question that drives most consequential failure analysis.
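The six trace questions and the thresholds that follow them can be sketched as a check over a sample of traces. This is a minimal sketch under stated assumptions: the field names in `SIX_QUESTIONS` and the dictionary-shaped traces are hypothetical, and a question counts as "yes" only if every sampled trace carries the field. An empty list is treated as present, so a trace that made no tool calls still passes; a missing key or `None` does not.

```python
# Hypothetical field names, one per question in the Phase 3 list.
SIX_QUESTIONS = (
    "prompt_and_response",  # input prompt and model response, verbatim
    "retrieval_context",    # retrieval context the model received
    "tool_calls",           # tool calls made, with arguments
    "tool_responses",       # tool responses received
    "final_action",         # final action the system took
    "correlation_ids",      # user, session, and outcome identifiers
)


def answer_questions(sample: list[dict]) -> dict[str, bool]:
    """Answer each question 'yes' only if every trace in the sample has the field."""
    if not sample:
        raise ValueError("cannot assess an empty trace sample")
    return {q: all(t.get(q) is not None for t in sample) for q in SIX_QUESTIONS}


def audit_posture(yes_count: int) -> str:
    """Map the number of 'yes' answers to the thresholds described above."""
    if yes_count >= 5:
        return "meets the threshold for auditing consequential behaviour"
    if yes_count == 4:
        return "auditable, with necessarily hedged claims"
    return "document what cannot be audited"


# Illustrative sample showing the recurring defect: one trace has no retrieval context.
complete = {
    "prompt_and_response": ("prompt text", "response text"),
    "retrieval_context": ["doc-7"],
    "tool_calls": [],
    "tool_responses": [],
    "final_action": "reply sent",
    "correlation_ids": {"session": "s-1"},
}
defective = dict(complete, retrieval_context=None)

result = answer_questions([complete, defective])
yes_count = sum(result.values())
print(result["retrieval_context"], yes_count)  # False 5
print(audit_posture(yes_count))
```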

Phase 4 — Targeted re-execution

The fourth phase moves from passive review to active testing. The auditor selects a sample of recent production traces, typically 40: 10 high-stakes outputs the operator has flagged, 20 outputs drawn at random from the past quarter, and 10 outputs the auditor has selected on the basis of an outlier signal. Each is re-run against three targets: the current production stack, the production stack as it existed at the time of the original output, and a controlled testbed the auditor maintains.
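The sample composition can be sketched as follows. The function name, the use of trace identifiers as strings, and the fixed seed are assumptions introduced here; the seed is one way to make the random draw itself reproducible, not a documented part of the method. The sketch keeps the three groups disjoint so that a flagged or outlier trace is not also counted as a random one.

```python
import random


def draw_sample(flagged: list[str], outliers: list[str],
                past_quarter: list[str], seed: int) -> dict[str, list[str]]:
    """Pick 10 operator-flagged, 10 auditor-selected outlier, and 20 random trace IDs."""
    high_stakes = list(flagged)[:10]
    outlier_picks = [t for t in outliers if t not in high_stakes][:10]
    reserved = set(high_stakes) | set(outlier_picks)
    # Random draw comes from the rest of the quarter, excluding deliberate picks.
    pool = [t for t in past_quarter if t not in reserved]
    rng = random.Random(seed)
    random_picks = rng.sample(pool, min(20, len(pool)))
    return {"flagged": high_stakes, "random": random_picks, "outlier": outlier_picks}


# Hypothetical trace identifiers for illustration.
flagged = [f"t{i}" for i in range(12)]
outliers = ["t3"] + [f"o{i}" for i in range(15)]
past_quarter = flagged + outliers[1:] + [f"q{i}" for i in range(200)]

sample = draw_sample(flagged, outliers, past_quarter, seed=2026)
print({group: len(ids) for group, ids in sample.items()})
# {'flagged': 10, 'random': 20, 'outlier': 10}
```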

Re-execution surfaces four classes of finding:

  • Reproducible: the system produces the same output. The auditor moves on.
  • Drift: the system produces a different output, and the auditor can trace the difference to a configuration change the operator has documented.
  • Undocumented drift: the system produces a different output and the operator cannot explain why. This becomes the report's principal finding.
  • Re-execution refused: the system cannot be re-run because the original retrieval context, model version, or tool dependency is no longer reproducible. This is a finding in its own right.
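The four classes can be sketched as a small classifier. This is illustrative only: the `Finding` enum and `classify` function are names introduced here, and exact string equality stands in for "the same output", which is a simplifying assumption. A stack with sampling nondeterminism would need a tolerance or a semantic comparison instead.

```python
from enum import Enum
from typing import Optional


class Finding(Enum):
    REPRODUCIBLE = "reproducible"
    DRIFT = "drift"
    UNDOCUMENTED_DRIFT = "undocumented drift"
    REFUSED = "re-execution refused"


def classify(original: str, rerun: Optional[str], documented_change: bool) -> Finding:
    """Classify one re-executed trace.

    rerun is None when the trace could not be re-run at all (retrieval
    context, model version, or tool dependency no longer reproducible).
    documented_change is True when the operator has documented a
    configuration change that explains a differing output.
    """
    if rerun is None:
        return Finding.REFUSED
    if rerun == original:  # assumption: exact match means "same output"
        return Finding.REPRODUCIBLE
    return Finding.DRIFT if documented_change else Finding.UNDOCUMENTED_DRIFT


print(classify("approve", "approve", False).value)  # reproducible
print(classify("approve", "deny", True).value)      # drift
print(classify("approve", "deny", False).value)     # undocumented drift
print(classify("approve", None, False).value)       # re-execution refused
```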

Phase 5 — Failure-mode interviews

The fifth phase is conversational. The auditor interviews the operator's engineering team about the system's known failure modes, the system's recent incidents, the operator's incident-response playbook, and the operator's escalation rules for consequential output. The interviews are structured but not scripted; the auditor's purpose is to surface the gap between what is documented and what the team knows.

The publication's experience is that this phase produces the most useful material in the report. The team always knows things the documentation does not. The audit's job is to capture that knowledge in a form the operator can act on.

Phase 6 — Report

The sixth phase produces the document. The report's structure follows the audit's phases (scope, architecture, traces, re-execution, failure modes) and closes with recommendations and appendices. Each finding cites the artefact it rests on. The recommendations are scored by remediation cost on the operator's side and material risk reduction on the downstream user's side. The publication does not score "criticality" without a defended numeric anchor.
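One way to hold the two scores side by side is sketched below. Everything specific here is an assumption: the `Recommendation` class, the 1 to 5 ordinal scales, the example entries, and the ordering rule (highest risk reduction first, cheaper remediation breaking ties). The text above does not prescribe a formula. The sketch enforces only the rule the text does state, that each finding cites the artefact it rests on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    title: str
    artefact: str          # the artefact the underlying finding rests on
    remediation_cost: int  # hypothetical 1-5 ordinal, operator's side
    risk_reduction: int    # hypothetical 1-5 ordinal, downstream user's side


def rank(recs: list[Recommendation]) -> list[Recommendation]:
    """Order recommendations, refusing any whose finding cites no artefact."""
    uncited = [r.title for r in recs if not r.artefact.strip()]
    if uncited:
        raise ValueError(f"findings without a cited artefact: {uncited}")
    # Assumed ordering: most risk reduction first, then lowest cost.
    return sorted(recs, key=lambda r: (-r.risk_reduction, r.remediation_cost))


# Illustrative entries only.
recs = [
    Recommendation("Pin model versions", "re-execution log", 3, 4),
    Recommendation("Capture retrieval context", "trace sample", 2, 5),
    Recommendation("Document escalation rules", "interview notes", 1, 4),
]
for r in rank(recs):
    print(r.title, r.risk_reduction, r.remediation_cost)
```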

The publication's standing position is that audit reports should be readable by non-engineering stakeholders without losing the engineering detail that allows the operator to act. The two-audience problem is real; we resolve it by including a short executive summary that points to specific paragraphs in the technical body, rather than substituting for them.

What this walk-through does not cover

It does not cover red-teaming for safety properties of a foundation model in isolation. That is a different engagement with a different methodology, and the publication's standing position is that it should be conducted by an organisation whose principal expertise is in red-teaming rather than by a working systems auditor. It does not cover compliance attestation against a single regulatory framework — the EU AI Act conformity assessment, for example — because compliance attestation has its own statutory machinery that an audit walk-through cannot substitute for.

Changelog

  • 2026-05-22. Initial publication.
