After AI Hallucinates, How Does It Earn Trust Back?

Air Canada's chatbot told a passenger that they could book a regular ticket and request a bereavement discount retroactively. That was not the airline's actual policy. When the passenger relied on the chatbot's word and followed up, Air Canada refused the discount. The company's defense was that the chatbot was a separate legal entity acting on its own. In February 2024, British Columbia's Civil Resolution Tribunal rejected that argument and ordered the airline to compensate the passenger. I read about this case three separate times this semester, and each time I came back to the same question. What would it take for that passenger to trust any Air Canada system again: not just the chatbot, but any system at the airline's boundary? And more broadly, when an AI system makes a mistake (a hallucination, a bad recommendation, a confident wrong answer), what mechanism actually repairs the trust that was broken?

I started with Mayer et al. (1995), because that is where the trust literature starts. Mayer et al. define trust as the willingness of a party to be vulnerable to the actions of another party, based on the expectation that the other will perform a particular action important to the trustor. They separate trust from trustworthiness, which has three dimensions: competence (which Mayer et al. call ability), the capacity to perform; benevolence, the willingness to act in the trustor's interest; and integrity, adherence to principles the trustor finds acceptable. When an AI system hallucinates, it damages perceived competence most directly. The system was wrong, and the user now knows it can be wrong. But here is the problem for AI trust repair. Benevolence and integrity do not transfer to machines the way competence does. A machine does not have a moral orientation toward the user's welfare. It does not have principles it chooses to follow. It has a training distribution and a loss function. When trust is damaged in a human relationship, you can rebuild it by demonstrating goodwill, by making amends, by showing that you value the relationship. When an AI hallucinates, there is no goodwill to demonstrate and no relationship to value. The only dimension available for repair is competence. That changes the entire mechanism.

Lee and See (2004) offer a more useful framework for thinking about this. Their model of trust in automation distinguishes three states. Overtrust, which leads to misuse, happens when humans rely on automation beyond its actual capability. Undertrust, which leads to disuse, happens when humans reject automation that could improve performance. Calibrated trust is the goal: reliance matched to the system's actual capability. The critical insight for trust repair is that the goal is not maximum trust. The goal is accurate trust. After a system hallucinates, the user's trust may be too high (they were relying on a system that was wrong) or too low (they now refuse to rely on a system that has been fixed). Repair is not about restoring the previous level of trust. It is about recalibrating to the system's actual capability after the error.
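To make the three states concrete, one can compare how often a user accepts the system's output with how often that output is actually correct. This is a minimal sketch of my own, not Lee and See's formalism: the function name `classify_trust`, the use of an acceptance rate as a proxy for reliance, and the 0.1 tolerance band are all assumptions for illustration, and the numbers are invented.

```python
def classify_trust(reliance_rate: float, reliability: float, tolerance: float = 0.1) -> str:
    """Label a user's trust state by comparing reliance with actual reliability.

    Hypothetical operationalization, not Lee and See's own formalism.
    reliance_rate: share of outputs the user accepts without checking (assumed proxy for reliance).
    reliability:   share of outputs that are actually correct.
    tolerance:     assumed band within which reliance counts as calibrated.
    """
    for name, value in (("reliance_rate", reliance_rate), ("reliability", reliability)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    gap = reliance_rate - reliability
    if gap > tolerance:
        return "overtrust"   # relying beyond actual capability (misuse)
    if gap < -tolerance:
        return "undertrust"  # rejecting help the system could give (disuse)
    return "calibrated"      # reliance roughly matches capability


# Invented numbers: the same user before and after a hallucination.
print(classify_trust(reliance_rate=0.95, reliability=0.70))  # -> overtrust (relied on a flawed system)
print(classify_trust(reliance_rate=0.30, reliability=0.90))  # -> undertrust (refuses a fixed system)
print(classify_trust(reliance_rate=0.85, reliability=0.88))  # -> calibrated (the repair target)
```

The third call is the repair target. Pushing reliance toward 1.0 is not: under this sketch, accepting every output from a system that is right 75% of the time still counts as overtrust.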

This is why I think the AI industry keeps misunderstanding its own trust problem. I keep seeing product announcements that say something like "we reduced hallucination rates by 40%, so you can trust our model now." That framing assumes that trust repair is about raising the average competence signal. But calibration does not work that way. Lee and See (2004) frame the remedy for overtrust as transparency about limitations, not just improved average performance. A user who was burned by a hallucination needs to know not just that the model got better overall, but specifically when it is likely to be wrong and when it is likely to be right. Average accuracy improvements do not help a user who cannot distinguish the cases where the model is reliable from the cases where it is not.

That distinction matters because calibration baselines differ from user to user. One user might have been burned by a hallucinated answer to a factual question and recalibrated to distrust factual answers. Another might have been burned by a hallucinated citation and recalibrated to distrust the model's references specifically. The same accuracy improvement means different things to these two users because their baselines shifted in different directions. What the user needs is not a better model on average. They need a model they can predict, one whose failure modes they can anticipate and verify.
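The arithmetic behind that point is easy to show. The counts below are invented for illustration and do not come from any real model evaluation: a hypothetical model answers 800 factual questions and 200 citation requests, and a release cuts the aggregate error rate by 40% while leaving citation behavior untouched. `Tally` and `overall_error_rate` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    correct: int
    total: int

    @property
    def error_rate(self) -> float:
        return 1 - self.correct / self.total


def overall_error_rate(by_category: dict[str, Tally]) -> float:
    """Pooled error rate across all categories: the number an aggregate announcement reports."""
    correct = sum(t.correct for t in by_category.values())
    total = sum(t.total for t in by_category.values())
    return 1 - correct / total


# Invented evaluation counts, for illustration only.
before = {"factual": Tally(correct=640, total=800), "citation": Tally(correct=100, total=200)}
after = {"factual": Tally(correct=744, total=800), "citation": Tally(correct=100, total=200)}

# Relative reduction in the pooled error rate (26% -> 15.6%).
reduction = 1 - overall_error_rate(after) / overall_error_rate(before)
print(f"aggregate error reduction: {reduction:.0%}")  # -> aggregate error reduction: 40%

# Per-category view: the improvement is entirely in factual answers.
for category in before:
    print(f"{category}: {before[category].error_rate:.0%} -> {after[category].error_rate:.0%}")
# -> factual: 20% -> 7%
# -> citation: 50% -> 50%
```

The announcement would be truthful, and for the user who was burned by a hallucinated citation it would carry no information at all. Only the per-category breakdown tells that user whether to update.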

In a previous post, <a href="/blog/specification-error-trust-variable">trust is not delegation</a>, I argued that treating trust, trustworthiness, reliance, and delegation as interchangeable constructs is a specification error rather than a semantic quibble. The same problem shows up in the trust repair conversation. When a company says "we fixed the model, trust us again," it is treating trust repair as a competence demonstration problem: show better accuracy, and trust returns. But trust in automation is not a simple function of accuracy. It is a function of whether the user can calibrate their reliance to the system's actual capability. The Air Canada case is instructive here because the chatbot's error was not about accuracy in the narrow sense. The model generated a plausible-sounding policy that did not exist. The user had no reliable way to verify the claim at the point of interaction; checking it meant leaving the chat to consult another source, which defeats the purpose of having a chatbot. There was no structural cue that said "this answer is within my authority" versus "this answer is an improvisation." Calibration was structurally impossible. No amount of model fine-tuning would fix that, because the problem was not model accuracy. The problem was that the system sat at the organizational boundary with no transparency about its own limits.
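The missing structural cue can be sketched as a design constraint: an answer may present itself as policy only if it points to an entry in a store of published policies, and everything else is labeled as unverified. This is a minimal sketch under my own assumptions, not a description of how Air Canada's system worked; `PUBLISHED_POLICIES`, `Answer`, `render_with_cue`, and the policy ID are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical store of published policies. Only answers that cite an entry here
# may present themselves as being within the system's authority.
PUBLISHED_POLICIES = {
    "bereavement-fares": "Placeholder text of a published policy, for illustration only.",
}


@dataclass(frozen=True)
class Answer:
    text: str
    policy_id: Optional[str] = None  # which published policy, if any, backs this answer


def render_with_cue(answer: Answer) -> str:
    """Append a cue telling the user whether the answer is backed by published policy."""
    if answer.policy_id in PUBLISHED_POLICIES:
        source = PUBLISHED_POLICIES[answer.policy_id]
        return (f"{answer.text}\n"
                f"[Within authority: based on published policy '{answer.policy_id}': {source}]")
    # No citation, or a citation to something that is not in the store.
    return f"{answer.text}\n[Not backed by a published policy. Verify before relying on this answer.]"


print(render_with_cue(Answer("You can request the discount after you travel.")))
print(render_with_cue(Answer("Here is the relevant rule.", policy_id="bereavement-fares")))
```

The design choice that matters is that the cue depends on whether the answer points to published material, not on how confident the generated text sounds. A real system would also need to check that the cited policy actually supports the answer, which this sketch does not do.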

I think the three mechanisms that actually repair trust after an AI failure are these. First, demonstrated improvement in the specific failure mode, not just overall accuracy. If the model hallucinated citations, the user needs to see that citation behavior improved, not that the model got better at math. Second, transparency about the system's limitations, which Lee and See (2004) explicitly identify as the intervention for overtrust. The user needs to know where the system is likely to fail so they can deploy their attention strategically. Third, user control over verification. The user needs a mechanism to check the system's output, to override it, to see the evidence behind a recommendation. Without verification mechanisms, the user can never achieve calibrated trust because they cannot distinguish the system's reliable outputs from its unreliable ones.

I think this is also why ChatGPT's accuracy improvements restored trust for some users and not others. Users who already had a calibrated mental model of ChatGPT's failure modes (they knew it was good at summarization but bad at math, they knew it could fabricate citations, and they had their own verification habits) could incorporate an accuracy improvement into their existing calibration framework. Users who did not have that mental model, who treated ChatGPT as a single undifferentiated oracle and got burned, had no calibration framework to update. A general accuracy improvement did not give them the information they needed to rebuild a differentiated trust model. They needed transparency about specific capabilities and limits, and that is not what the accuracy announcement delivered.

The trust-in-automation literature has been saying this for twenty years. Lee and See (2004) frame trust as a calibration challenge, not a maximization problem. The mechanism for appropriate trust is not more capability. It is better information about capability boundaries, delivered at the point of interaction so the user can decide when to rely and when to override. The principle is the same one behind the separation between trust and delegation I wrote about in my previous post: trust is not the goal. Calibrated reliance is the goal, and calibration requires information that most AI systems today do not provide. The industry talks about building trust. It should be talking about building calibratability: designing systems that give users the structural information they need to decide when to trust and when to verify.