AI Hallucination Explained: Why AI Makes Things Up

You ask ChatGPT for ten idioms about the weather. It hands you a clean list. Six are real. Two are obscure but defensible. And two are completely invented: phrases that no native speaker has ever used, presented with the same calm confidence as “raining cats and dogs.” If you print that worksheet, your students will learn fake English, and you will have to un-teach it next week.

This is called AI hallucination, and it is the single biggest risk for any ESL teacher who uses AI to build materials. The good news: once you understand why it happens, spotting it becomes almost mechanical. This guide explains the cause, shows you the categories of hallucination that hit ESL materials hardest, and gives you a five-step verification routine you can run in under two minutes per worksheet.

What Is an AI Hallucination?

An AI hallucination is a confident, fluent, plausible-sounding answer that is factually wrong. The model is not lying, because lying requires intent. It is doing exactly what it was built to do: predict the most likely next word. When the most likely word happens to also be the correct word, you get a useful answer. When the most likely word is just a good guess that fits the pattern, you get a hallucination dressed in the same calm, authoritative tone.

This matters for teachers because we are trained to trust well-formatted information. A neat bullet list of phrasal verbs in a Times New Roman PDF looks like a textbook. Our brains tag it as authoritative. AI exploits that instinct without meaning to.

Why AI Models Hallucinate

Large language models like ChatGPT, Claude, and Gemini are not databases. They do not look anything up by default. Understanding the three structural reasons they invent things will help you predict which parts of your lesson materials are most likely to be wrong.

1. They Predict, They Don’t Verify

At the core, every large language model works the same way: given everything written so far, what word is most likely to come next? It repeats that prediction once for each word (strictly, each token) it writes, so a single paragraph takes a few hundred rounds. There is no internal step where it asks, “Is this actually true?” Truth is not a variable in the equation. Plausibility is.

This is why AI is excellent at sounding like a textbook and terrible at being one. A textbook tone is a pattern. Patterns are exactly what these models reproduce.
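
The prediction loop can be illustrated with a toy word-pair counter. This is only a sketch under loud assumptions: real models use neural networks over tokens, not word counts, and the corpus and function names below are invented for illustration. What it shows is that the only signal is frequency, with no step that checks truth.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus (an assumption for illustration only).
CORPUS = (
    "it is raining cats and dogs . "
    "it is raining cats and dogs . "
    "it is raining hard today . "
    "it is raining frogs and fish ."
)

def build_bigram_counts(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    words = text.split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen.

    There is no truth check anywhere: frequency is the only signal.
    """
    followers = counts.get(word)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

counts = build_bigram_counts(CORPUS)
print(predict_next(counts, "raining"))  # cats
print(predict_next(counts, "and"))      # dogs
```

If the corpus had repeated a mistake more often than the correct phrase, the same function would return the mistake with equal confidence.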

2. They Were Trained on Imperfect Data

The training corpus for most major models includes Wikipedia, Reddit, scraped blogs, forums, books, and a large share of the open web. Some of that content is excellent. Some of it is wrong. Some of it is a student in 2014 misremembering a grammar rule in a forum thread. The model has no reliable way to weight a peer-reviewed linguistics journal more heavily than a confidently incorrect Quora answer when both use the same vocabulary.

This is especially dangerous for niche topics. CEFR level descriptors, TOEIC scoring nuances, IELTS band descriptors for specific subskills — the open web has very few reliable sources on these, so the model fills in gaps with adjacent-sounding material.

3. They Fill Gaps with Confident Guesses

If you ask a model for ten examples of something and only eight strong examples exist in its training data, it does not say, “I only know eight.” It produces ten. The last two will be smooth, grammatical, and invented. This is the most common hallucination pattern in ESL materials, and the one teachers should watch for first.

Common AI Hallucinations in ESL Materials

When thousands of AI-generated worksheets go through native-speaker review, the same five categories of error appear over and over. Memorize these. They are the failure modes you need to actively hunt for.

Invented Idioms and Phrasal Verbs

Ask for fifteen idioms about money and you will reliably get three or four that no native speaker has ever heard. The structure is correct. The metaphor sounds plausible. They are still fake. Examples like “to purse the coin,” “to burn a sterling hole,” and “to sweep the till” appear constantly in raw AI output. Always cross-check unfamiliar idioms against a real corpus.

Invented Grammar Rules

AI is generally strong on standard grammar but invents rules at the edges. It will sometimes claim that the past perfect must be used in conditional sentences when only the past simple is required, or invent restrictions on stative verbs that real grammars do not impose. These are the most dangerous hallucinations because they sound exactly like the explanations students remember from school.

Fake Citations and Sources

This is the best-documented hallucination type. Ask a major model for a peer-reviewed study supporting a teaching method and it will often hand you a beautifully formatted citation, with a real-looking journal name, a real-sounding author, and a plausible year, for a paper that does not exist. Lawyers have been sanctioned for filing briefs full of these. Teachers should never include AI-generated citations in materials without verifying every single one.

Wrong CEFR or Proficiency Levels

Ask for a B1-level reading passage and you will frequently receive a B2 text with a B1 label. The model has a fuzzy sense of “easier” and “harder” but no real grasp of the official descriptors. The same goes for TOEIC and IELTS band targeting — the labels often do not match the actual difficulty. Always check sentence length, lexical density, and the specific structures the level requires.
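
Sentence length and lexical density can be measured rather than eyeballed. The sketch below is one rough way to do it, not an official CEFR tool: the sentence splitter is crude, and `FUNCTION_WORDS` is a short hypothetical list, so the density figure is only an approximation.

```python
import re

# Hypothetical, deliberately short list of function words; a real check
# would use a fuller list or a part-of-speech tagger.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "at",
    "is", "are", "was", "were", "it", "he", "she", "they", "we", "you", "i",
    "this", "that", "with", "for", "as", "by", "not",
}

def split_sentences(text):
    """Split after ., ! or ? followed by whitespace (crude, no dependencies)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def words_in(text):
    """Lowercased word tokens made of letters and apostrophes."""
    return re.findall(r"[A-Za-z']+", text.lower())

def passage_stats(text):
    """Return sentence count, average words per sentence, and lexical density."""
    sentences = split_sentences(text)
    words = words_in(text)
    if not sentences or not words:
        return {"sentences": 0, "avg_sentence_length": 0.0, "lexical_density": 0.0}
    content_words = [w for w in words if w not in FUNCTION_WORDS]
    return {
        "sentences": len(sentences),
        "avg_sentence_length": len(words) / len(sentences),
        "lexical_density": len(content_words) / len(words),
    }

sample = "The cat sat on the mat. It was warm and sunny."
stats = passage_stats(sample)
print(stats["sentences"], stats["avg_sentence_length"])  # 2 5.5
```

Paste an AI-generated passage in place of `sample` and compare the numbers with a passage from a coursebook you already trust at the same level.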

Hallucinated Cultural and Historical Facts

Reading passages about specific places, events, or holidays will often contain confident, specific, fabricated details. A passage on the Dragon Boat Festival might invent a date the holiday was “first internationally recognized” in some specific year. Real-sounding numbers are a red flag — if a model gives you a precise statistic or date for anything cultural, verify it before printing.

How to Spot Hallucinations in Your AI-Generated Lessons

The fastest mental shortcut: confidence is not evidence. The smoother and more authoritative a passage sounds, the more carefully you should check it. Hallucinations almost never look uncertain — that is the point. They are wearing the same outfit as the truth.

Watch for these warning patterns in any output you plan to put in front of students:

Specific numbers without sources — exact percentages, dates, or statistics
Round-numbered lists — exactly 10 idioms, exactly 15 phrasal verbs (the last few are often padded)
Named studies or named experts — verify each one exists before citing
Obscure cultural details — names of festivals, historical figures, or regional foods
Quotations attributed to anyone — AI invents quotes constantly, even from famous people
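
Some of these warning patterns are regular enough to flag automatically before a human read-through. The following is a minimal sketch, assuming simple regular expressions are good enough for a first pass. The patterns, labels, and sample draft are all invented for illustration, and a flag only means "check this", not "this is wrong".

```python
import re

# Hypothetical patterns, loosely matching the warning signs listed above.
PATTERNS = {
    # Percentages, or four-digit years from 1000 to 2099.
    "specific number": re.compile(r"\b\d+(?:\.\d+)?%|\b(?:1[0-9]|20)\d{2}\b"),
    # Any quoted span of ten or more characters.
    "quotation": re.compile(r'["“][^"”]{10,}["”]'),
    # Phrases that usually introduce a named source.
    "named study or expert": re.compile(
        r"\b(?:according to|study by|researchers at|et al\.)", re.IGNORECASE
    ),
}

def flag_line(line):
    """Return the labels of every warning pattern found in one line."""
    return [label for label, pattern in PATTERNS.items() if pattern.search(line)]

def review_worksheet(text):
    """Map 1-based line numbers to the warning labels found on that line."""
    report = {}
    for number, line in enumerate(text.splitlines(), start=1):
        labels = flag_line(line)
        if labels:
            report[number] = labels
    return report

def is_round_numbered(items):
    """True when a list has exactly 10, 15, 20, ... items (possible padding)."""
    return len(items) >= 10 and len(items) % 5 == 0

# Made-up draft text, not real facts.
draft = (
    "The festival was first recognized in 1987.\n"
    "Students enjoy rice dumplings.\n"
    "According to a study by Smith, 73% of learners improved."
)
print(review_worksheet(draft))
# {1: ['specific number'], 3: ['specific number', 'named study or expert']}
```

Obscure cultural details are deliberately left out: no pattern can tell a real festival name from an invented one, so that check stays manual.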

Five Quick Checks Before You Print That Worksheet

This is a routine you can run in under two minutes per worksheet. Once it becomes a habit, you will catch most hallucinations before they reach a single student.

1. Search any unfamiliar idiom or phrase in quotation marks on Google. If you get fewer than a few thousand results, treat it as invented until proven otherwise. Common idioms typically return far more, though hit counts are only rough estimates.
2. Verify every citation by name. Paste the journal name and the title into a search engine. If nothing comes back, treat the citation as fake.
3. Cross-check grammar claims against Cambridge Grammar or the Cambridge Dictionary. If a rule does not appear there, do not teach it.
4. Re-check the CEFR level by counting. The CEFR itself does not specify sentence lengths, but as a rule of thumb A1 sentences run under 10 words, B1 sentences average 12-15, and C1 sentences routinely exceed 20. If a “B1” passage averages 25-word sentences, it is probably mislabeled.
5. Search any cultural or historical fact against a stable source. Wikipedia, BBC, the Library of Congress, or government tourism boards are reasonable cross-checks. Never trust a specific date or statistic from AI alone.
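
For the first two checks above, typing quotation marks around every phrase gets tedious. A small helper can build the exact-phrase search links for you. This is a sketch, assuming Google's familiar `search?q=` URL form. The function name is made up, and any search engine that supports quoted phrases would work the same way.

```python
from urllib.parse import urlencode

# Assumed search endpoint; swap in another engine if you prefer.
SEARCH_URL = "https://www.google.com/search"

def exact_phrase_url(phrase):
    """Build a search URL that wraps the phrase in quotation marks."""
    quoted = '"' + phrase.strip() + '"'
    return SEARCH_URL + "?" + urlencode({"q": quoted})

# One known idiom and one of the invented examples from this article.
for idiom in ["raining cats and dogs", "to sweep the till"]:
    print(exact_phrase_url(idiom))
# https://www.google.com/search?q=%22raining+cats+and+dogs%22
# https://www.google.com/search?q=%22to+sweep+the+till%22
```

The helper only opens the door to the check. You still have to judge whether the results come from dictionaries and edited text or from nowhere at all.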

Every AI tool teachers reach for runs the same prediction loop: useful, but never a substitute for verification.

When AI Is Reliable (and When It Isn’t)

Not all AI tasks carry the same hallucination risk. Some are nearly bulletproof. Others are minefields. The pattern is simple: AI is most reliable when it transforms text you already have, and least reliable when it produces specific facts from scratch.

Low risk — go ahead and use it:

Simplifying a reading passage you provided to a lower level
Generating comprehension questions from a text the AI can see
Rewriting your existing rubric in clearer language
Brainstorming warm-up activity formats
Translating instructions into students’ L1

High risk — verify before using:

Lists of idioms, phrasal verbs, or vocabulary by frequency
Historical or cultural facts in reading passages
Citations or research support for a teaching method
Exact CEFR, TOEIC, or IELTS level claims
Quotations attributed to specific people
Statistics, percentages, or specific dates

Teaching Students About AI Hallucination

If your students use AI for homework — and they do, whether they tell you or not — then AI hallucination is a literacy issue you need to teach directly. Most students assume AI is essentially a smarter search engine. They submit AI-generated essays full of invented facts and are genuinely shocked when called out.

A 15-minute classroom demonstration is enough. Ask the class to pick a topic. Have them request ten facts about it from any AI. Then verify each fact together on the board. Watching two or three confident, well-formatted sentences fall apart under a Google search teaches more about AI than any lecture. It also makes students far better critical readers of every AI-generated text they encounter for the rest of their lives.

The Future Is AI-Assisted, Not AI-Replaced

AI is genuinely useful for ESL teachers. It cuts hours off lesson prep, drafts decent rubrics, simplifies passages reliably, and explains concepts to students in their L1 better than most translation tools. Used well, it can give you back ten hours a week.

But it is a confident, fluent intern with no quality control. Treat every AI output as a first draft from someone who has never been fact-checked in their life. Run the five-step verification routine. Trust the routine more than your impression of the text. Hallucinations look exactly like the truth until you check — that is the entire problem, and it is also the entire solution.

The teachers who will get the most out of the next decade of AI tools are not the ones who use AI the most. They are the ones who learn fastest to tell the difference between an AI being useful and an AI being plausible.
