When your AI assistant lies more than the cheaper one does

A Hacker News piece published this week surfaced a benchmark comparison that most families will never see but probably should: GPT-5.5, one of the most widely used AI assistants on the planet, produces false or fabricated information at roughly three times the rate of GLM-5.2, an open-weight model released under an MIT license. The result was not close.

This is not a story about which chatbot is smarter. It is a story about what happens when a household uses AI to make decisions that have real consequences — and the tool they trust most happens to be the one most likely to make things up.

What's actually changing

For the last two years, the default consumer assumption has been simple: bigger brand, better output. Names like OpenAI, Google, and Anthropic carry implicit quality signals the way "FDA approved" does on a drug label. That assumption is cracking.

Open-weight models, meaning models whose parameters are publicly available and can be run locally or hosted cheaply, have been catching up to closed commercial models on accuracy benchmarks for several months. The GLM family, developed by Zhipu AI, a company that grew out of research at Tsinghua University, has been part of that wave. What the Hacker News analysis highlights is that benchmark accuracy and hallucination rate are not the same thing, and hallucination rate is the metric that matters most when a real person asks a real question and acts on the answer.

Hallucination, in plain terms, is when an AI states something false with the same confident tone it uses to state something true. It does not stutter. It does not add a footnote. It just says the wrong thing as though it were obvious.
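To see why accuracy and hallucination rate can point in different directions, here is a minimal sketch in Python. The counts are made up for illustration and are not taken from the benchmark discussed above. The definition of hallucination rate used here (confident wrong answers divided by all questions asked) is one simple assumption; real benchmarks may define it differently. The key idea is that a model can decline to answer, and a model that never declines can score higher on accuracy while also saying more false things.

```python
from dataclasses import dataclass


@dataclass
class Tally:
    """Hypothetical outcome counts for one model on a set of questions."""
    correct: int    # answered, and the answer was right
    wrong: int      # answered confidently, and the answer was false
    abstained: int  # declined to answer or said "I don't know"

    @property
    def total(self) -> int:
        return self.correct + self.wrong + self.abstained

    def accuracy(self) -> float:
        # Share of all questions answered correctly.
        return self.correct / self.total

    def hallucination_rate(self) -> float:
        # Share of all questions answered with a confident falsehood
        # (an assumed definition, chosen for simplicity).
        return self.wrong / self.total


# Made-up numbers for illustration only; not from any real benchmark.
model_a = Tally(correct=70, wrong=30, abstained=0)   # always answers
model_b = Tally(correct=60, wrong=10, abstained=30)  # often declines

for name, tally in (("Model A", model_a), ("Model B", model_b)):
    print(f"{name}: accuracy {tally.accuracy():.0%}, "
          f"hallucination rate {tally.hallucination_rate():.0%}")
```

In this invented example, Model A looks better on accuracy (70% versus 60%) yet states something false three times as often (30% versus 10%). For a household acting on the answer, the second number is the one that matters.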

For most casual uses — drafting an email, summarizing a long article, brainstorming a grocery list — a hallucination is annoying at worst. But families increasingly use AI assistants to look up drug interactions, research contractor licensing requirements, interpret insurance policy language, and check food safety guidelines. In those contexts, a confident wrong answer is not a minor inconvenience. It is a liability.

What we'd actually do

Stop treating model brand as a quality signal. The assumption that a premium subscription equals higher accuracy is no longer reliable. Before you trust any AI answer on a health, legal, financial, or safety question, ask yourself: would I act on this if I found it on a random forum post? If not, verify it against a primary source — the CDC, your state's contractor licensing board, your actual insurance policy document. The AI is a starting point, not a source.

Build a short list of verification sources for your household's most common high-stakes questions. This is a one-time task that takes about twenty minutes. Write down where you actually go to confirm medication interactions (likely DailyMed or a licensed pharmacist), food safety guidelines (FoodSafety.gov), and local emergency procedures (your county's emergency management site). Post it somewhere visible, or keep it in a shared notes app. When the AI gives you an answer that matters, you already know where to check it.

If you use AI for preparedness research specifically, cross-reference with dated government publications. Shelf life data, water purification ratios, medication storage temperatures — these are exactly the kinds of specific numerical facts AI models hallucinate most readily, because they are precise, they vary by context, and there is a lot of conflicting information in the training data. FEMA, the Red Cross, and the CDC all publish downloadable preparedness guides. Download them. They do not hallucinate.

Consider whether a local or open-weight model makes sense for your most sensitive queries. Running a smaller open-weight model locally is not a project for most households right now — it still requires comfort with command-line tools. But awareness is useful. Browser extensions and privacy-focused AI interfaces that route to open models are becoming more accessible. The point is not to chase the newest tool. The point is to know that "free and open" does not automatically mean "worse."

The bigger picture

The pattern here is not that AI is getting more dangerous. The pattern is that AI is getting more normal — woven into daily household decisions quietly enough that the failure mode becomes invisible. When a tool is new, people are skeptical. When a tool is familiar, people stop checking.

The goal for a prepared household is not to avoid AI tools. They are genuinely useful. The goal is to treat them the way you treat any fast, confident, occasionally unreliable source of information: with a clear-eyed understanding of where they tend to fail, and a habit of verification on the questions that actually matter.

Durability is not about predicting which model will lie to you next. It is about building the reflex to check before you act.
