Trust Calibration for AI Is Not Trust Calibration for Humans

I was reading Lee and See's 2004 framework again last week, and for the first time the word that kept bothering me was "observe." Their model of trust calibration depends on the operator's ability to observe the system. You watch performance, you update your trust, and over time your trust converges toward the system's actual capability. Overtrust means you trusted beyond what you observed. Undertrust means you trusted less than what you observed. Calibrated trust means the two matched. The entire mechanism is built on a single assumption: that you can observe the system well enough to calibrate. For the industrial automation Lee and See were studying, this assumption holds. A nuclear power plant operator can see whether the alarm system triggered correctly. A pilot can feel whether the autopilot held altitude. The system's behavior is observable, deterministic, and bounded. You can calibrate trust in a system whose performance you can monitor because the system's behavior is stable enough that past performance predicts future performance.

I cannot observe a large language model the same way. I cannot see its reasoning process. I cannot predict its performance on a prompt I have not tried. Its behavior is non-deterministic: ask the same question twice and you may get two different answers, neither necessarily right. And its capabilities are evolving faster than my ability to evaluate them. GPT-4 today is not the same system it was six months ago, and I have no transparent record of what changed. When I calibrate trust in an LLM based on past performance, I am generalizing from a sample that may no longer represent the system's actual capability distribution. The distribution can shift while I am still forming my estimate.
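
The contrast between the two cases can be made concrete with a minimal sketch. It assumes trust is tracked as a running success rate (a Beta(1, 1) prior updated on each observed outcome), which is a simplification for illustration and not Lee and See's own formalism. The function names and the reliability figures (0.9 dropping to 0.6) are hypothetical.

```python
def outcomes(successes_per_cycle, cycle_length, n):
    """Deterministic success/failure pattern with a fixed success rate.

    Hypothetical helper: no randomness, so the result is reproducible.
    """
    return [(i % cycle_length) < successes_per_cycle for i in range(n)]


def estimate_trust(observed):
    """Running reliability estimate: Beta(1, 1) prior updated per outcome."""
    successes, failures = 1, 1
    for ok in observed:
        if ok:
            successes += 1
        else:
            failures += 1
    return successes / (successes + failures)


# Stable system: reliability 0.9 for the whole observation window.
stable = outcomes(9, 10, 2000)

# Shifted system: reliability 0.9 for the first half, 0.6 afterwards.
# The operator is never told that anything changed.
shifted = outcomes(9, 10, 1000) + outcomes(3, 5, 1000)
current_reliability = 0.6

stable_estimate = estimate_trust(stable)
shifted_estimate = estimate_trust(shifted)

print(f"stable:  estimate {stable_estimate:.2f}, actual 0.90")
print(f"shifted: estimate {shifted_estimate:.2f}, actual {current_reliability:.2f}")
```

For the stable system the estimate lands near 0.90. For the shifted system it lands near 0.75 while current reliability is 0.6, so the operator ends up overtrusting without having reasoned badly at any step.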

This is not a minor boundary condition. It is a fundamental break in the calibration mechanism that Lee and See defined. Their framework assumes three things that are observable: the system's performance, the system's process, and the system's purpose. For a mechanical system, all three are available. For an AI system, none of them is reliably available. Performance on one task does not reliably predict performance on another. The process is hidden inside a neural network. Purpose is defined by the deploying organization, not the system itself, and the system's behavior does not always align with that purpose in ways the user can detect. The calibration input that Lee and See require is precisely the input that AI systems do not provide.

McKnight et al. (2002) add another layer. They distinguish four types of trust: disposition to trust, institution-based trust, trusting beliefs, and trusting intentions. Institution-based trust is the belief that structural guarantees create safety for dependence on a specific party. I wrote about how zero-trust architecture is institution-based trust made operational, and I still think that argument holds. But AI systems test institution-based trust in a way that traditional web services did not. When I trust a website because it has SSL, the structural guarantee is about the encryption of the connection. The website's behavior under that connection is not at issue. When I trust an AI system because a regulatory framework says it was audited, the structural guarantee covers the audit, but the system's behavior after the audit is exactly what I need to calibrate on, and that behavior is opaque, non-deterministic, and changing. The institution-based trust mechanism still works for the infrastructure. It does not work for the output.

Vanneste and Puranam (2025) make this sharper. They argue that humans may not be able to achieve identification-based trust with AI agents because AI lacks intentionality in the human sense. Identification-based trust, from Lewicki and Bunker, is the deepest layer: you trust because you share values and understand each other's goals. Vanneste and Puranam point out that this requires the trustee to have something resembling intention, and current AI systems do not. The user might feel identification with a system that writes in a familiar tone or mirrors their preferences, but that feeling is not grounded in the system's actual motivational structure because the system does not have one. This means that the highest layer of the trust hierarchy, the layer that makes the fastest and most resilient trust possible between humans, is structurally unavailable when the trustee is an AI. You can build dispositional trust, institution-based trust, and maybe even trusting beliefs about an AI system. But the deepest calibration mechanism, the one where trust becomes self-reinforcing because you understand what the other party is trying to do, is locked out.

Glikson and Woolley (2020) reviewed trust in AI and found that most studies still treat trust as unidirectional: from the human to the system. The system is a passive target of trust, not an active participant. But if the AI system can act on its own, make decisions, escalate, or hold back information, then trust needs to flow both ways, and the calibration mechanism needs to account for the system's own model of the human. This is not a metaphor. Glikson and Woolley observed that trust in AI is different from trust in other technologies precisely because AI can behave in ways that look intentional. The user responds to the behavior as if it came from an agent, which means the user applies human trust heuristics to a system that does not operate by the same rules. The calibration breaks because the user is calibrating against a model of agency that the system does not actually have.

Zalmanson et al. (2022) demonstrated something related in the context of privacy and disclosure. Social trust cues on a platform can increase private information disclosure, which is beneficial for personalization and also raises risk. The platform creates an appearance of safety and responsiveness, and users disclose more. I think this mechanism is more dangerous with AI because the trust cues are generated by the system itself. When a conversational AI produces a response that sounds empathetic, confident, and aware of context, the user has no way to distinguish genuine reliability from performative reliability. The system does not have empathy. It has a model that predicts empathetic-sounding token sequences. The user is responding to a cue that looks like the social cues Zalmanson's research studied, but the cue is manufactured. Calibration requires that the trust signal correlate with the trustworthiness of the target. When the signal is produced by the system's output layer rather than by an honest indicator of the system's internal state, the correlation can be weak or even negative.

Mammadov et al. (2026), writing in IEEE Transactions on Engineering Management, address a problem I had not seen named before: trust evaluation under data imbalance. When the training data for a system is imbalanced, the system's trustworthiness varies across subgroups in ways that are invisible from aggregate accuracy metrics. This is a calibration problem at the data layer. The operator calibrates trust based on average performance, but the system's performance for specific subpopulations or specific task types may be dramatically different. The user cannot observe this imbalance from the outside. The trust calibration fails not because the user is irrational but because the signal the user receives is an average that conceals the distribution. I cannot verify the specific findings of this paper from my local files, so I am hedging here, but the argument is consistent with what Lee and See would predict: if the observable signal does not reflect the system's actual capability distribution, calibration cannot converge.
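
A small arithmetic sketch shows how an average can conceal the distribution. The subgroup sizes and accuracy figures below are invented for illustration and are not taken from Mammadov et al.; the variable names are likewise hypothetical.

```python
# Hypothetical evaluation counts. These numbers are made up for illustration.
groups = {
    "majority": {"n": 900, "correct": 855},  # 95% accurate
    "minority": {"n": 100, "correct": 50},   # 50% accurate
}

total_n = sum(g["n"] for g in groups.values())
total_correct = sum(g["correct"] for g in groups.values())

# The single number an outside observer typically sees.
aggregate = total_correct / total_n

# The per-subgroup picture that the aggregate hides.
per_group = {name: g["correct"] / g["n"] for name, g in groups.items()}

print(f"aggregate accuracy: {aggregate:.3f}")
for name, acc in per_group.items():
    print(f"{name:>8}: {acc:.3f}")
```

The aggregate comes out at 0.905, which looks like a system worth relying on, while the minority subgroup gets coin-flip performance. A user calibrating on the aggregate is calibrating on the wrong quantity for one in ten cases.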

So here is where the framework actually breaks. Lee and See's three calibration states, overtrust, undertrust, and calibrated trust, assume that the trustor can observe the trustee's behavior well enough to adjust. McKnight's typology assumes that institution-based trust provides structural guarantees that substitute for direct observation when the trustee is unknown. Mayer et al. assume that the trustee's ability, benevolence, and integrity are inferable from behavior. All of these mechanisms were designed for a world where the trustee is either a human or a mechanical system whose behavior is observable, deterministic, and bounded. AI systems violate all three. The behavior is observable only at the input-output level. The process is opaque. The capabilities change without notice. The system produces trust cues that do not correlate with its internal reliability. And the deepest form of trust that humans use for fast, resilient relationships is structurally unavailable because the system does not have the intentionality that identification-based trust requires.

I keep coming back to the distinction between trust and delegation that I wrote about earlier. Trust is an attitude. Delegation is a behavioral decision. You can trust a system and still not delegate to it. You can delegate to a system you do not trust because organizational policy requires it. The calibration problem for AI sits right at the gap between trust and delegation. Organizations are asking whether they can trust AI systems. But the question that matters for organizational outcomes is whether they should delegate specific decision rights to AI systems, and that question requires calibrating trust at a granularity that current frameworks do not support. You need to calibrate trust in the system's ability to handle this specific task, in this specific context, at this specific point in the system's capability lifecycle. Not overall trust. Task-specific, context-specific, time-specific trust. And you need to recalibrate every time the system changes, which in the current deployment environment means continuously. I wrote about why hallucination is a calibration problem rather than a technical one and why trust repair requires more vulnerability than the AI industry is willing to show. This post is about why the calibration framework itself needs rebuilding.

What would new calibration mechanisms look like? I think they need to operate at the level of the interaction, not the level of the system. Lee and See's framework calibrates trust in the system. But for AI, what matters is trust in the system's output for this task in this context at this time. That requires structural information that most current AI systems do not provide: confidence signals tied to specific output domains, transparency about which training distributions the system is drawing from, version histories that let users track what changed and when, and structural separations between the system's reliable and unreliable output ranges. Not a general accuracy number. A domain-specific, versioned, auditable reliability profile that the user can calibrate against.
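
To make the idea of a reliability profile less abstract, here is one way it might be represented. This is a sketch under my own assumptions, not an existing standard or any vendor's API: the class names, the domains, the version labels, the counts, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ReliabilityRecord:
    """Audited results for one task domain under one model version."""
    domain: str
    model_version: str
    evaluated: int
    correct: int

    @property
    def accuracy(self) -> float:
        return self.correct / self.evaluated


class ReliabilityProfile:
    """Lookup keyed by (domain, version) instead of one global accuracy."""

    def __init__(self, records: Iterable[ReliabilityRecord]):
        self._by_key = {(r.domain, r.model_version): r for r in records}

    def lookup(self, domain: str, model_version: str) -> Optional[ReliabilityRecord]:
        return self._by_key.get((domain, model_version))


def should_delegate(profile: ReliabilityProfile, domain: str, model_version: str,
                    threshold: float, min_evaluated: int = 100) -> bool:
    """Delegate only when evidence exists for this task and this version."""
    record = profile.lookup(domain, model_version)
    if record is None or record.evaluated < min_evaluated:
        # No evidence for this exact combination: recalibrate before delegating.
        return False
    return record.accuracy >= threshold


profile = ReliabilityProfile([
    ReliabilityRecord("contract-summary", "v1", evaluated=500, correct=470),
    ReliabilityRecord("dosage-calculation", "v1", evaluated=500, correct=390),
])

print(should_delegate(profile, "contract-summary", "v1", threshold=0.9))
print(should_delegate(profile, "dosage-calculation", "v1", threshold=0.9))
print(should_delegate(profile, "contract-summary", "v2", threshold=0.9))
```

The design choice that matters is the third call: a version change invalidates the record, so the default answer for an unevaluated version is "do not delegate yet" rather than "assume it behaves like the last one."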

McKnight et al. gave us institution-based trust as the mechanism for situations where you cannot evaluate the trustee directly. For AI, the institution is the deployment context: the organization, the regulatory framework, the audit mechanism, the version control system. None of these exist in a mature form right now. The EU AI Act is a start, but it regulates risk categories, not calibration mechanisms. What information systems (IS) research needs is a theory of AI trust calibration that replaces the assumption of observability with the assumption of opacity, replaces the assumption of determinism with the assumption of non-determinism, and replaces the assumption of stability with the assumption of continuous capability evolution. Lee and See gave us the right framework for the wrong world. The world we actually live in has non-deterministic, opaque, evolving trustees that produce trust cues they do not mean. Calibrating trust in that world requires a different mechanism entirely.