Explainable AI Is Sensemaking Theory: Plausibility, Not Accuracy

This spring I ended up reading XAI and sensemaking papers in the same semester. It was not planned. I was working through the AI governance stack for my comps preparation while reviewing my Weick (1995) notes for the theory chain. Two separate study tracks, one accidental collision.

Every XAI paper had the same pattern. SHAP values were more accurate than LIME. Attention mechanisms gave more faithful explanations than SHAP. Counterfactual explanations aligned better with actual model decisions than attention. The field was racing toward a single objective: explaining what the model actually did, down to the feature weight, the gradient, the decision boundary. Accuracy was the metric. Faithfulness to model internals was the goal. Everything else was noise.

And then Weick (1995) kept sitting in the back of my head saying something completely different. Weick defines sensemaking as the process through which people construct meaning from ambiguous experience. Among the seven properties he specifies, the critical one for this argument is the seventh: sensemaking is driven by plausibility rather than accuracy. People do not need the truest explanation. They need an explanation that enables them to act. An accurate but incomprehensible explanation will be ignored. A simplified but plausible one will be adopted, used, and refined through further action.

I started seeing the connection everywhere. Explainable AI is a sensemaking problem dressed up as a technical accuracy problem. And the field has been optimizing for the wrong thing, even though sensemaking theory named the right one in 1995.

Think about who actually uses XAI tools. A doctor receives a SHAP force plot showing that the model flagged the patient as high risk because feature X contributed 0.3 and feature Y contributed negative 0.15. The numbers are accurate. The explanation faithfully represents the model's internal calculation. But the doctor is not running a regression in their head. They are diagnosing a patient based on years of clinical experience and pattern recognition that no model captures. The SHAP values are accurate but they are not plausible within the doctor's clinical frame. The explanation that gets adopted is the one that matches how the doctor already thinks about similar cases. That explanation might be simplified. It might miss interaction effects. It might even be technically wrong about what the model did. But it is plausible, and plausibility is what drives adoption.
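To make the force-plot arithmetic concrete, here is a minimal Python sketch of the additive property that SHAP explanations satisfy: the model output equals a base value plus the sum of the per-feature contributions. The contributions 0.3 and -0.15 come from the example above. The base value of 0.20 and the names `feature_X`, `feature_Y`, `reconstruct_output`, and `ranked_attributions` are assumptions for illustration, and no real SHAP library is called.

```python
# Hypothetical base value: the model's average output over a reference
# population. The example in the text does not give one; 0.20 is assumed.
BASE_VALUE = 0.20

# Per-feature contributions taken from the example in the text.
contributions = {"feature_X": 0.30, "feature_Y": -0.15}


def reconstruct_output(base_value, contribs):
    """Return the base value plus the sum of all feature contributions."""
    return base_value + sum(contribs.values())


def ranked_attributions(contribs):
    """Sort features by absolute contribution, largest first."""
    return sorted(contribs.items(), key=lambda kv: abs(kv[1]), reverse=True)


risk_score = reconstruct_output(BASE_VALUE, contributions)
for name, value in ranked_attributions(contributions):
    print(f"{name}: {value:+.2f}")
print(f"model output: {risk_score:.2f}")
```

The numbers add up, which is the sense in which the explanation is accurate. Nothing in the output tells a clinician how the score relates to the cases they have seen before.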

The XAI literature has documented aspects of this pattern without using the sensemaking vocabulary. Jussupow et al. (2024) showed that algorithm aversion responds to object properties, interface design, task context, and identity implications, not just algorithmic performance. A doctor does not reject a diagnostic AI because it is inaccurate. They reject it because the explanation does not fit their diagnostic schema, because the interface makes it feel like they are delegating judgment, because using it threatens their professional identity. These are not accuracy problems. They are plausibility problems. The explanation is faithful to the model but unfaithful to the doctor's sensemaking process.

The same dynamic plays out in regulation. The so-called right to explanation under the GDPR is often treated as a technical challenge: how do you make an opaque model explainable in a legally meaningful way. But the deeper question is what makes an explanation meaningful. Regulators do not need a feature importance ranking. They need an explanation that is plausible enough to support a legal argument, to assign responsibility, to justify a decision in court. Seidel, Frick, and vom Brocke (2025) describe how regulators resolve the Collingridge dilemma through prospective sensemaking, using abstraction to keep rules technology-neutral and elaboration to create legal certainty. The same logic applies to explanation itself. A regulatory explanation must be abstract enough to apply across cases and specific enough to be actionable. Technical accuracy alone, like a ranked list of SHAP values, satisfies neither condition. It is too tied to one model's internals to generalize, and too far from the legal question to be actionable.

I think XAI has been solving the wrong problem. The field defined explanation quality in terms of fidelity to model internals, which is precisely the accuracy criterion that Weick identified as secondary in human sensemaking. When a person looks at an explanation, they are not evaluating its mathematical correctness. They are asking whether it fits what they already know, whether it tells a coherent story, and whether it helps them move forward. Those are plausibility criteria. And sensemaking theory has been articulating this since 1995.
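For readers who have not seen a fidelity score computed, here is a minimal sketch of one simple proxy: the share of inputs on which a simplified explanation agrees with the model it explains. Everything in it is hypothetical. `black_box`, `surrogate`, the features `a` and `b`, and the input grid are toy stand-ins, and published benchmarks use more elaborate definitions.

```python
def black_box(x):
    """Toy stand-in for an opaque model: flags high risk on an interaction."""
    return int(x["a"] * x["b"] > 0.25)


def surrogate(x):
    """Toy simplified explanation: 'high risk whenever a is large'."""
    return int(x["a"] > 0.5)


def fidelity(model, explanation, inputs):
    """Fraction of inputs on which the explanation agrees with the model."""
    matches = sum(model(x) == explanation(x) for x in inputs)
    return matches / len(inputs)


# Small hypothetical grid of inputs: a and b each take 0, 0.25, ..., 1.
grid = [{"a": a / 4, "b": b / 4} for a in range(5) for b in range(5)]
print(f"fidelity: {fidelity(black_box, surrogate, grid):.2f}")
```

The score measures how closely the simple rule tracks the model. It is silent on whether a doctor or a regulator would find the rule usable, which is the gap between accuracy and plausibility.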

This has real design implications. If you build an XAI system for a hospital, optimize for narrative coherence, not feature attribution precision. Give the doctor a story about why this patient is different from similar patients, not a ranked list of coefficients. If you design for a regulatory audit, build explanations that can support a legal argument, not ones that require a data scientist to interpret. The right metric is not how close the explanation is to the model's internal representation. It is how well the explanation enables the user to act.
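As a rough illustration of the "different from similar patients" idea, the sketch below compares one patient to the median of a reference cohort and phrases the largest departure as a sentence. It is one possible design under stated assumptions, not a validated clinical method: the cohort, the feature names, the values, and the function `contrastive_story` are all invented for this example.

```python
from statistics import median

# Hypothetical cohort of "similar patients"; names and values are invented.
similar_patients = [
    {"lactate": 1.1, "heart_rate": 82, "age": 64},
    {"lactate": 1.3, "heart_rate": 88, "age": 70},
    {"lactate": 1.0, "heart_rate": 79, "age": 67},
]
patient = {"lactate": 3.2, "heart_rate": 90, "age": 66}


def contrastive_story(patient, cohort, top_k=1):
    """Describe the top_k features where the patient departs most,
    in relative terms, from the cohort median."""
    gaps = []
    for feature, value in patient.items():
        typical = median(p[feature] for p in cohort)
        relative_gap = (value - typical) / typical  # assumes typical != 0
        if relative_gap == 0:
            continue  # nothing to contrast on this feature
        gaps.append((abs(relative_gap), feature, value, typical))
    gaps.sort(reverse=True)  # largest relative departure first
    sentences = []
    for _, feature, value, typical in gaps[:top_k]:
        direction = "higher" if value > typical else "lower"
        sentences.append(
            f"{feature} is {value}, {direction} than the typical {typical} "
            f"for similar patients"
        )
    return "; ".join(sentences)


print(contrastive_story(patient, similar_patients))
```

The design choice is to lead with a comparison the reader already makes, this patient against patients like them. Whether that framing actually helps clinicians act is an empirical question the sketch does not answer.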

Maitlis (2005) identified four forms of organizational sensemaking that vary on two dimensions: how much leaders engage in sensegiving and how much stakeholders do. Guided sensemaking has high leader and high stakeholder sensegiving. Restricted sensemaking has high leader but low stakeholder sensegiving. Fragmented sensemaking has low leader and high stakeholder sensegiving. Minimal sensemaking is low on both. These four forms map directly onto XAI deployment contexts. A hospital rolling out a diagnostic AI with strong clinical leadership and clinicians who actively contribute their own readings of the tool is doing guided sensemaking. The same AI deployed across multiple departments with different patient populations and weak oversight produces fragmented sensemaking, where each unit constructs its own interpretation of the tool and whether it can be trusted. An explanation designed for one context will fail in another. A detailed technical report that works under guided sensemaking with strong leadership and shared frames will be useless in a fragmented context where each unit needs a different plausible story.

I wrote about the broader theory of organizational sensemaking in my post on why sensemaking is not decision-making. The central point there was that people act first and interpret later, and that adoption is never just an evaluation problem. The same applies to explanation. People do not evaluate explanations the way they weigh options in a decision matrix. They encounter an explanation, decide almost immediately whether it fits, and either act on it or ignore it. The fit they are judging is plausibility, not accuracy.

Seidel, Frick, and vom Brocke (2025) add an important dimension. Most sensemaking theory is retrospective: people interpret what has already happened. But regulators writing rules for AI that does not yet exist must make sense of a future they have not experienced. The same applies to anyone designing explanations for AI systems that are still being built. An explanation designed at deployment must remain plausible as the model changes, as users gain experience, as new edge cases emerge. That is prospective sensemaking in action. The explanation must be abstract enough to stay stable across model updates and specific enough to be useful now. The abstraction-elaboration distinction from Seidel and colleagues applies to explanation design as much as it applies to regulation.

I am not sure the XAI community will adopt this framing easily. The field is deeply invested in accuracy metrics. Every conference introduces a new method that outperforms the last on fidelity benchmarks. But those benchmarks measure something different from what users actually need. If Weick is right, and I believe he is, the explanation that wins in practice will not be the most accurate one. It will be the most plausible one. The one that allows the user to act without getting stuck on the gap between what the model does and what the user understands. That is a different engineering problem. And it is the one we should have been solving all along.