When a model should say 'I don't know' and how we force it to.
Models hallucinate most often at the seams — when the question is just outside the training data, when the document the answer should come from isn't in scope, when the citation would be too inconvenient to fetch. The interface usually hides this. Ours doesn't.
In Menzies.ai and now in Lazlow, we built a refusal layer in front of the generation step. Before any answer is composed, the system checks whether the retrieved passages score above a confidence threshold against the query. If they don't, the model is instructed not to attempt an answer at all: it returns a refusal with a one-line account of what it was looking for and why it didn't find it.
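A minimal sketch of that gate, assuming retrieval already returns passages scored against the query; the names (RetrievedPassage, answer_or_refuse) and the exact shape of the refusal are illustrative, not the production code:

```python
from dataclasses import dataclass

@dataclass
class RetrievedPassage:
    text: str
    score: float  # retrieval confidence against the query, 0..1

def answer_or_refuse(query: str, passages: list[RetrievedPassage], threshold: float) -> dict:
    """Gate in front of generation: refuse unless retrieval clears the threshold."""
    confident = [p for p in passages if p.score >= threshold]
    if confident:
        # Only now is the generation step allowed to compose an answer.
        return {"type": "answer", "passages": confident}
    best = max((p.score for p in passages), default=None)
    looked = f"Looked for passages matching '{query}'"
    reason = (
        f"best match scored {best:.2f}, below the {threshold:.2f} threshold"
        if best is not None else "nothing was retrieved at all"
    )
    # One-line account of what was searched and why no answer was attempted.
    return {"type": "refusal", "account": f"{looked}; {reason}."}
```

The threshold passed in here is the per-domain value described next.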
The threshold is tuned per domain. For Hansard it's higher than for legislation, because Hansard contains formal language a model can pattern-match its way through, so the bar for attempting an answer has to sit higher; legislation needs an exact citation, and an answer without one is worse than no answer. The tuning is empirical. We start with a guess and adjust until the false-refusal rate (the model saying it doesn't know when the answer is plainly there) and the false-confidence rate (the model answering when the answer isn't there) are both within tolerance.
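One way to picture that tuning loop, assuming a small labelled eval set where each query carries its best retrieval score and a flag for whether the answer is actually in scope; the helper name and tolerance values are illustrative:

```python
def tune_threshold(eval_set, candidate_thresholds,
                   max_false_refusal=0.05, max_false_confidence=0.02):
    """Sweep candidate thresholds and keep those where both error rates are in tolerance.

    eval_set: list of (best_retrieval_score, answer_is_present) pairs.
    """
    acceptable = []
    present = sum(1 for _, p in eval_set if p)
    absent = len(eval_set) - present
    for t in candidate_thresholds:
        refusals_when_present = 0   # false refusals: answer was plainly there
        answers_when_absent = 0     # false confidence: answer wasn't there
        for score, answer_present in eval_set:
            would_refuse = score < t
            if would_refuse and answer_present:
                refusals_when_present += 1
            if not would_refuse and not answer_present:
                answers_when_absent += 1
        false_refusal = refusals_when_present / present if present else 0.0
        false_confidence = answers_when_absent / absent if absent else 0.0
        if false_refusal <= max_false_refusal and false_confidence <= max_false_confidence:
            acceptable.append((t, false_refusal, false_confidence))
    return acceptable

# e.g. tune_threshold(labelled_pairs, [x / 100 for x in range(50, 95, 5)])
```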
Two side-effects worth noting. First: refusals improved user trust more than answers did. Users showed every refusal to a colleague and asked "what should this say?" — the refusal became a teaching moment in a way an answer never is. Second: refusals are slower to generate than answers, because they require a second pass over the retrieval results. We accepted the latency. The trade was correct.