> And crucially, we made sure to tell the model not to guess if it wasn’t sure. (AI models are known to hallucinate, and we wanted to guard against that.)
Prompting an LLM not to confabulate won't actually prevent it from doing so. It's disappointing to see an organization like this, whose mission is to inform the public, use AI without understanding its limitations and then make a claim like this.
From that same article:
> Of course, members of our staff reviewed and confirmed every detail before we published our story, and we called all the named people and agencies seeking comment, which remains a must-do even in the world of AI.
That sounds to me like they absolutely do understand the limitations of the technology they are using.
The criticism feels harsh. Of course models don't know what they don't know; reporters can have the same biases. They could have worded it better, e.g. "lowers the probability of hallucinating", but it's correct that it helps guard against it. It's just that it's not a binary thing.
> we made sure to tell the model not to guess if it wasn’t sure
Fair enough, but it's kind of ridiculous that in 2025 this "hack" still helps produce more reliable results.
It definitely does mitigate the risk (pretty substantially in my experience!)
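For what it's worth, the "hack" is usually just a line in the system prompt plus a low temperature. A rough sketch of what that looks like, using the OpenAI Python SDK as one example (the model name, prompt wording, and extraction task are placeholders, not what the newsroom actually ran):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The "don't guess" instruction: ask for an explicit UNKNOWN sentinel
# instead of letting the model fill gaps with plausible-sounding values.
SYSTEM_PROMPT = (
    "Extract the requested fields from the document below. "
    "If a value is not explicitly stated in the document, answer "
    "'UNKNOWN' instead of guessing. Do not infer or fabricate anything."
)

def extract_fields(document: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": document},
        ],
        temperature=0,  # less sampling variance, though not a hallucination cure
    )
    return resp.choices[0].message.content
```

As the thread says, this lowers the rate of confabulation rather than eliminating it, which is why the human review step still matters.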