The system cannot prove what it did.
Most production AI agents run without a tamper-evident log of prompts, tool calls, retrieved context, and generated outputs. Where logs exist, they are kept for incident triage, not for external audit. An insurer or a supervisor who arrives after the fact cannot reconstruct the decision with certainty, and so cannot attribute harm to a specific behaviour of the system rather than to the user or to an adjacent service.
Closing the verification gap requires three things: a durable record of the inputs the agent saw, the actions it took, and the model versions that were active at the time; a cryptographic or otherwise tamper-evident mechanism for preserving that record; and a retention schedule that outlives the commercial life of the system, so that late-arriving claims can still be investigated.
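One common tamper-evident mechanism is a hash chain: each log entry commits to the hash of its predecessor, so altering any past entry breaks every hash that follows it. The sketch below is a minimal illustration of that idea, not a production design; the class name `AuditLog`, the field layout, and the use of SHA-256 over canonical JSON are assumptions introduced here for clarity.

```python
import hashlib
import json
import time
from typing import Any

GENESIS = "0" * 64  # placeholder hash preceding the first entry


class AuditLog:
    """Append-only log in which each entry commits to its predecessor.

    Editing any stored entry changes its hash and breaks the chain,
    so verification detects the alteration.
    """

    def __init__(self) -> None:
        self.entries: list[dict[str, Any]] = []

    def append(self, *, prompt: str, tool_calls: list[dict],
               retrieved_context: list[str], output: str,
               model_version: str) -> dict[str, Any]:
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else GENESIS
        entry = {
            "timestamp": time.time(),
            "model_version": model_version,
            "prompt": prompt,
            "tool_calls": tool_calls,
            "retrieved_context": retrieved_context,
            "output": output,
            "prev_hash": prev_hash,
        }
        # Canonical serialisation so the hash is reproducible on replay.
        payload = json.dumps(entry, sort_keys=True).encode("utf-8")
        entry["entry_hash"] = hashlib.sha256(payload).hexdigest()
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; any edit to a past entry breaks the chain."""
        prev_hash = GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash:
                return False
            payload = {k: v for k, v in entry.items() if k != "entry_hash"}
            digest = hashlib.sha256(
                json.dumps(payload, sort_keys=True).encode("utf-8")
            ).hexdigest()
            if digest != entry["entry_hash"]:
                return False
            prev_hash = entry["entry_hash"]
        return True


# Hypothetical usage: record one agent step, then simulate tampering.
log = AuditLog()
log.append(prompt="Summarise the claim file",
           tool_calls=[{"name": "fetch_document", "args": {"id": "..."}}],
           retrieved_context=["policy excerpt ..."],
           output="Summary text ...",
           model_version="agent-model-vX")
assert log.verify()

log.entries[0]["output"] = "altered text"  # simulated tampering
assert not log.verify()
```

A self-contained chain like this only proves internal consistency; an operator who controls the store could rewrite the whole chain. In practice the head hash would also be anchored externally, for example by periodically writing it to write-once storage or a third-party transparency log, so the record survives scrutiny by an auditor the operator does not control.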