One thing most of us value highly when relying on someone or something is trust.
Without sufficient trust, I would never send my kids to kindergarten, I would not let a stranger cut me with a scalpel when I need surgery, and I would definitely not board a plane.
While we mostly associate trust with how we perceive other people, trust also extends to the tools we use in our lives.
I trust my phone to work when I need it. I trust my car not to break down when we go on a road trip. And I trust my calculator to give me correct answers.
Trust is a lubricant that makes so much in the world run faster, smarter and better. Without it, things quickly start to crumble.
Therefore, it's no surprise that one of the most discussed issues with AI, and especially with LLMs, is whether we can trust the answers they give us.
The current problem is that we can't.
LLMs are designed to give you answers, whether they know them or not. And not only that: they present wrong answers in the same eloquent, convincing way as correct ones.
The tendency of LLMs to hallucinate directly affects our trust in these tools, and therefore how we interact with them. The AI labs know this, and do their best to eliminate or at least reduce hallucinations.
These efforts seem to work: today's large language models hallucinate far less than they did a year or two ago. Tomorrow they will be even better.
But what if this ongoing quest to eliminate hallucinations and wrongful answers might actually be bad for us?
The paradox of trust in AI
To understand why, we must look to the concept of trust again.
Trust is the bridge between what we expect and what we get. When I trust something or someone, I'm essentially making a prediction about a future outcome or behaviour. And this prediction is based on either my past experiences, or the past experiences of others I already trust.
Trust is not something you can buy in the store, but something that has to be earned through consistent behaviour over time.
While trust can take a long time to build, it can also be torn down quickly. One grave mistake, one dishonesty or perceived lie that challenges our expectations, and the trust in someone or something can evaporate in the blink of an eye and take forever to rebuild.
The AI labs know this, and strive to make model behaviour consistently good and to limit behaviour that undermines users' trust. When we trust their models, we use them more, use them for more tasks, and integrate them more deeply into our lives and work. Which is good for business.
The issue, however, is that there might be a downside to models becoming more trustworthy. That downside is that many of us could become less advanced users of LLMs if the trust issue is fully resolved.
To illustrate why, we can look at a 2022 study by Dell'Acqua with the telling title "Falling asleep at the wheel". In the study, 181 recruiters were tasked with evaluating a set of job applications. The recruiters were randomly assigned to three groups. One group evaluated the applications the old-fashioned way. Another got a state-of-the-art AI recruitment assistant. And the third group got a lower-quality version of the AI to help them. Which group do you think performed best?
The intuitive answer is the group with the best AI. After all, a human plus a better AI should beat a human plus a worse AI. It turns out that this wasn't so. The best-performing group was the recruiters working with the worse AI. Why? Because those with the best AI seemingly trusted the system too much, and delegated too much responsibility and judgment to it. They fell asleep at the wheel. The group with the worse AI learned that the AI couldn't be fully trusted, stayed sharp and, most importantly, stayed in control.
Or in different words: The more we trust an AI, the more we inadvertently delegate judgment and thinking to it, and the less we are in control.
But won't AI just get better?
One rebuttal is that this is only a temporary problem. After all, AIs are getting better at more and more tasks by the day, and once they are perfect, the problem goes away. Right?
AI undoubtedly improves at an astonishing speed. Things that were science fiction a few years back are reality today.
But the frontier in AI is inherently jagged, and will likely continue to be so for years to come. A jagged frontier simply means that while LLMs are very good at something, they can simultaneously be very bad at something else.
If you ask your AI of choice to explain Hawking radiation in a way a ten-year-old would understand, it will do it really well. Maybe even better than Stephen Hawking himself could have done.
But play rock, paper, scissors with the same model, where you win every time because it has to reveal its hand before you make your move, and it will struggle to explain why you always win. It's a fun experiment, try it. Incredibly smart at some things, incredibly dumb at others.
The biggest issue with the jagged frontier is that it's not always easy to know where the models are good and where they aren't. In areas where you are an expert yourself, it's easier to tell. In other areas, it's very hard.
And this is where trust comes back into play. The less we observe a model making mistakes or delivering inaccuracies on one task, the more likely we are to trust it, and the more likely we are to extend that trust to other tasks (where it may be less good). And the more likely we are to fall asleep at the wheel and outsource our thinking and judgment to the models when we shouldn't.
So what can we do to make sure we stay awake when interacting with LLMs?
Not the answer, but an answer
Rather than waiting and hoping for AI to be good at everything, I think it's better to calibrate our views on what LLMs are and what they are not.
LLMs were not designed to give you the correct answer. But an answer.
With an LLM, you will always get an answer. Often it's correct. Other times it's one of many potential meaningful answers. And other times it's flat out wrong.
Instead of seeking “the right answer” when we interact with AI, maybe we should treat each answer as “one way to look at the issue at hand”.
This simple reframe can help us preserve our own agency in the interaction. If we treat an AI response as "the answer", we become passive. After all, our problem was solved. If we instead treat a response as "an answer" - one out of many possible - we stay engaged. And we remain in control.
I have spent a lot of time interacting with LLMs for various tasks, and I try my best to maintain this perspective. One thing I have noticed, though, is that it's much easier to do when I experience regular betrayals from the models.
Each time an LLM tells me something wrong about a topic I know well, readily changes its opinion when I challenge a response, or dramatically changes its evaluation of my writing based on how I prompt the reviewer role, I am reminded to use my own judgment. And over time, each betrayal makes me a more confident user of these systems, because I have to place more trust in myself.
The question, then, is what happens as models become better and better, and these learning-from-betrayal moments become fewer. Will we then become less critical, more sedated users of AI? Are users who never experienced the unreliable early LLMs less likely to develop a healthy skepticism on their own?
I fear that the answer to each of these questions is yes. And if it is, then less hallucination might indeed be bad news.
Maintaining healthy distrust
From all this, I think a degree of built-in untrustworthy behaviour in AIs is healthy. Not so much that it makes the models useless, but enough to wake us up, sharpen our critical thinking and nudge us to use our own judgment when interacting with AI.
Since the AI labs steadily improve the accuracy of their models, and are unlikely to optimize for occasional untrustworthy behaviour, it increasingly falls on us users to seek out such moments. For example by actively pushing the models to change their opinions, or to give us harsh feedback on something they praised in another chat. Just to remind ourselves that they give us one possible answer, not necessarily the definitive one.
As AIs become better and more trustworthy, I fear it will become increasingly difficult to maintain a healthy distrust, and to preserve human agency and judgment when interacting with AI. If so, the question is not so much when we can trust the AI, but when we can no longer trust ourselves.