Write ESL Tests and Curriculum With AI: Prompt Guide

Most ESL teachers I know spend more time building tests and curriculum documents than they spend teaching. A unit plan eats a Sunday. A mid-term assessment eats two evenings. A scope and sequence for a new course can eat an entire holiday break. AI does not erase that work, but it can compress it from days into hours — if you treat it like a junior co-writer who needs a brief, not a magic button you press.

This guide walks through the workflow I actually use to write ESL tests and curriculum with AI. It is not about which chatbot is best. It is about the prompts, the order of operations, and the quality checks that turn AI output from generic mush into materials you can hand to a real class on Monday morning.

Why AI Changes the Process, Not the Output

The first mistake teachers make is asking AI for a finished product. “Write me a B1 reading test” produces something that looks plausible and falls apart on inspection: passages that drift outside the level, distractors that are obviously wrong, vocabulary that does not match the unit. The output is fast but unusable.

The teachers who get value out of AI flip the workflow. They use AI for the parts of curriculum and test design that are slow but mechanical — generating distractors, rephrasing prompts, drafting rubric language, expanding outlines, checking level consistency — and they keep the high-judgment work for themselves: choosing the construct, sequencing topics, deciding what counts as evidence of learning. AI writes; you decide.

Start With a Scope and Sequence, Not a Lesson

The single biggest leverage point is the scope and sequence — the document that lists what gets taught in what order across a term or year. Most ESL teachers skip this step because it is tedious and abstract. AI is unusually good at it, precisely because it is pattern work.

The Scope-and-Sequence Prompt

Here is the prompt skeleton I use. Paste it into your chatbot of choice, fill in the brackets, and iterate from the output:

You are an experienced ESL curriculum designer. Build a [16-week] scope and sequence for a [CEFR B1] [general English] course meeting [twice a week, 90 minutes]. For each week, give me: (1) topic theme, (2) target grammar point, (3) target lexical set of 12-15 items, (4) one productive skill focus (speaking or writing), (5) one receptive skill focus (reading or listening), (6) the can-do statement learners should be able to demonstrate by the end. Sequence grammar so that later points scaffold on earlier ones. Recycle vocabulary themes across weeks 4, 8, and 12 for revision.

What makes this prompt work is the constraints. The duration, level, frequency, and structure of the output are all specified. Without them, AI defaults to a vague generic syllabus that could fit any class anywhere. With them, you get a usable first draft you can edit in an hour instead of building from scratch in a week.
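If you reuse the skeleton across several courses, the brackets can be filled by a small script instead of by hand. The sketch below is one possible way to do that in Python. The names SCOPE_PROMPT and build_scope_prompt are invented for illustration, and the revision weeks are passed in as a parameter on the assumption that they will differ for courses shorter or longer than 16 weeks.

```python
from string import Template

# Hypothetical template: the bracketed slots from the prompt become $placeholders.
SCOPE_PROMPT = Template(
    "You are an experienced ESL curriculum designer. "
    "Build a $weeks-week scope and sequence for a $level $course_type course "
    "meeting $schedule. For each week, give me: (1) topic theme, "
    "(2) target grammar point, (3) target lexical set of 12-15 items, "
    "(4) one productive skill focus (speaking or writing), "
    "(5) one receptive skill focus (reading or listening), "
    "(6) the can-do statement learners should be able to demonstrate by the end. "
    "Sequence grammar so that later points scaffold on earlier ones. "
    "Recycle vocabulary themes across weeks $revision_weeks for revision."
)


def build_scope_prompt(weeks, level, course_type, schedule, revision_weeks):
    """Return the prompt with every slot filled, after basic sanity checks."""
    if weeks < 1:
        raise ValueError("weeks must be at least 1")
    if any(w < 1 or w > weeks for w in revision_weeks):
        raise ValueError("revision weeks must fall inside the course")
    return SCOPE_PROMPT.substitute(
        weeks=weeks,
        level=level,
        course_type=course_type,
        schedule=schedule,
        revision_weeks=", ".join(str(w) for w in revision_weeks),
    )


prompt = build_scope_prompt(
    16, "CEFR B1", "general English", "twice a week, 90 minutes", [4, 8, 12]
)
print(prompt)
```

The checks are deliberately minimal. Their only job is to stop a prompt going out with a constraint that contradicts the course, such as a revision week that does not exist.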

Aligning to CEFR, TOEIC, or Local Standards

If you teach to a specific framework — CEFR, TOEIC bands, IELTS targets, or a national curriculum like the 108 課綱 in Taiwan — name it explicitly in every prompt. AI handles CEFR well because the descriptors are widely documented. It handles local frameworks less well, so for those you should paste in the actual can-do statements or band descriptors as reference. The pattern is: give AI the rubric, then ask it to align content to that rubric, rather than trusting it to remember.

Building Unit Plans From the Scope

Once the scope exists, each unit plan becomes a follow-up prompt rather than a fresh problem. The chatbot already has the context — the level, the recycling pattern, the can-do statements — so you can ask for unit details without re-explaining the course.

Expand Week 5 into a full unit plan. Give me: lesson 1 (presentation of new grammar in context), lesson 2 (controlled and freer practice, leading to a productive task). For each lesson, include warm-up, main stages with timing in minutes, materials needed, and one differentiation tip for stronger learners and one for weaker learners. Keep total time at 90 minutes per lesson.

The differentiation request is the part most teachers forget to ask for and most coursebooks do not provide. AI generates plausible differentiation suggestions instantly — extension questions for fast finishers, scaffolds for struggling readers, sentence frames for low-confidence speakers. Some are obvious, some are useful, all are easier to edit than to invent from a blank page.

Writing Test Items With AI

Test writing is where AI saves the most time and where teachers most often misuse it. The rule of thumb: AI should generate items, you should select and edit them. Never ship an AI-written test that has not been read line by line.

Multiple-Choice Items

Distractor writing is the slowest part of multiple-choice authoring. AI is genuinely strong here if you constrain it:

Write 10 multiple-choice items testing the present perfect vs. past simple distinction at CEFR B1. Each item should have one correct answer and three distractors. Distractors must be plausible — they should reflect typical learner errors, not random wrong words. For each item, briefly note which learner error each distractor targets (L1 interference, overgeneralization, time-marker confusion, aspect confusion).

That last sentence is the trick. By asking AI to justify each distractor, you force it to produce diagnostic items rather than filler. You also get a built-in error analysis you can use to reteach those points once the test has been marked.

Reading Comprehension

For reading passages, ask AI to write to a specified word count, lexical band, and topic. Then check the result against a readability tool — most chatbots cannot reliably hit a target Lexile or Flesch-Kincaid score on the first try. A two-pass workflow works well: generate a passage, paste it into a readability checker, then prompt AI to revise specific sentences that fall outside the band.
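If you do not have a readability checker to hand, the second pass can be approximated with a few lines of Python. This is a rough sketch, not a replacement for a proper tool: it uses the standard Flesch-Kincaid grade formula, but the syllable counter is a crude vowel-group heuristic and the function names are made up for this example. Treat the numbers as a guide for spotting outliers rather than as an exact score.

```python
import re


def count_syllables(word):
    """Rough heuristic: count vowel groups and drop a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def sentences_over(text, max_words):
    """Return the sentences longer than max_words, as candidates for revision."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(s.split()) > max_words]


passage = "Short one. This sentence has quite a few more words in it."
print(round(flesch_kincaid_grade(passage), 2))
print(sentences_over(passage, 5))
```

The sentences returned by sentences_over are the ones to paste back into the chatbot with a request to shorten or simplify them.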

For comprehension questions, mix item types deliberately. A good 10-item set might include three literal-detail questions, three inference questions, two vocabulary-in-context items, one main-idea question, and one author’s-purpose item. State that breakdown in the prompt and AI will usually follow it, but count the items by type before you use the set.
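One way to keep the breakdown consistent from test to test is to store it as data and generate the instruction from it. The following is a minimal sketch under that assumption. The dictionary holds the example breakdown described above, and the wording of the generated instruction is only a suggestion.

```python
# The 10-item breakdown from the paragraph above, stored as data.
BREAKDOWN = {
    "literal-detail": 3,
    "inference": 3,
    "vocabulary-in-context": 2,
    "main-idea": 1,
    "author's-purpose": 1,
}


def breakdown_prompt(breakdown, total):
    """Build the instruction, refusing a breakdown that does not add up."""
    if sum(breakdown.values()) != total:
        raise ValueError("item counts do not add up to the requested total")
    parts = ", ".join(f"{n} {kind}" for kind, n in breakdown.items())
    return (
        f"Write {total} comprehension questions for the passage below: {parts}. "
        "Label each question with its type."
    )


print(breakdown_prompt(BREAKDOWN, 10))
```

Asking for a type label on each question makes it quick to count what came back against what you asked for.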

Listening Tasks

AI cannot record audio for you, but it is excellent at writing scripts. Ask for a dialogue or monologue of a specific length, with target structures embedded naturally, and with the kind of redundancy and false starts that real speech contains. Then either record it yourself or use a text-to-speech tool. The advantage over commercial coursebook audio is that the topic and language match exactly what your class has been studying.

Writing Prompts and Rubrics

For productive skills, AI shines at generating rubric language. The hardest part of a rubric is writing band descriptors that are specific enough to be useful but flexible enough to apply across many student responses. Try this prompt:

Write a 4-band analytic rubric for a CEFR B1 opinion essay (180-220 words). Include four criteria: task fulfillment, organization, language range, and language accuracy. For each criterion, write band descriptors at 4 (above expected), 3 (meets expected), 2 (below expected), and 1 (well below expected). Each descriptor should be a single sentence with concrete observable features.

Then go further: ask AI to write three sample student responses at different band scores, justify each score, and explain the gap to the next band. This gives you anchor scripts for inter-rater reliability when you train colleagues, the kind of resource schools often pay external consultants to produce.

Quality Control Checks to Run Every Single Time

AI confidently produces tests with broken items. Before any AI-generated assessment touches a student, run it through these checks:

Key check: work every item yourself. AI sometimes marks the wrong option as correct, especially in grammar items with two defensible answers.
Level check: scan vocabulary for words above the target band. Replace anything that breaks the level.
Cultural check: reading and listening texts often default to a generic American suburban context. Edit for your learners’ reality, not the chatbot’s.
Distractor plausibility: remove any distractor that is obviously absurd. “None of the above” and silly options waste item space.
Construct alignment: ask whether each item actually tests what the unit taught. AI sometimes drifts into adjacent grammar or vocabulary not covered in class.
Length and timing: count words, estimate reading time at 100-150 wpm for the target level, and adjust so the test fits the period.
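Two of these checks, the level check and the length-and-timing check, are mechanical enough to script. The sketch below assumes you already have a word list for your target band saved somewhere. The tiny allowed set here is a stand-in for illustration only, and both function names are invented for this example. The 100-150 wpm range comes from the checklist above.

```python
import re


def words_outside_list(text, allowed):
    """Return the distinct words in text that are not in the allowed word list."""
    found = {w.lower() for w in re.findall(r"[A-Za-z']+", text)}
    return sorted(found - {a.lower() for a in allowed})


def reading_time_minutes(text, wpm_slow=100, wpm_fast=150):
    """Return (fastest, slowest) estimated reading time in minutes."""
    count = len(text.split())
    return count / wpm_fast, count / wpm_slow


# Stand-in word list; a real one would hold the full vocabulary for the band.
allowed = {"the", "cat", "sat", "on"}
print(words_outside_list("The cat sat on the veranda", allowed))

passage = " ".join(["word"] * 300)  # a dummy 300-word passage
print(reading_time_minutes(passage))
```

A simple list lookup will flag inflected forms and proper nouns that are perfectly fine for the level, so read the flagged words rather than deleting them automatically.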

Building a Term-Long Question Bank

The compound benefit of AI for assessment shows up across a term. Each unit, ask AI to generate twice as many items as you need. Use half on the unit test, keep half in a question bank tagged by skill, level, and topic. By the end of a term you have hundreds of pre-vetted items you can shuffle into review tests, makeup tests, or alternative versions for academic-integrity reasons.

Store the bank in a simple spreadsheet — one row per item, columns for stem, options, key, skill, level, topic, source unit, and a flag for whether it has been used yet. This is the kind of data infrastructure that used to take a department a year to build. With AI in the loop, an individual teacher can build it across two terms of normal lesson prep.
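The same layout works as a CSV file, which any spreadsheet program can open. As a rough illustration, here is how the bank might be written and filtered with Python's standard csv module. The column names follow the list above, while the sample item, the "yes"/"no" convention for the used flag, and the function names are assumptions made for this sketch.

```python
import csv
import io

# One column per field named in the paragraph above.
FIELDS = ["stem", "options", "key", "skill", "level", "topic", "source_unit", "used"]


def write_bank(items, handle):
    """Write items (a list of dicts) to an open file-like object as CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(items)


def read_bank(handle):
    """Read the bank back as a list of dicts, one per item."""
    return list(csv.DictReader(handle))


def unused(rows, **tags):
    """Return items not yet used that match every given tag, e.g. skill='grammar'."""
    return [
        r for r in rows
        if r["used"] == "no" and all(r[k] == v for k, v in tags.items())
    ]


# Made-up sample item; options are joined with " | " to fit one cell.
items = [
    {"stem": "She ___ in Taipei since 2019.",
     "options": "has lived | lived | is living | lives",
     "key": "has lived", "skill": "grammar", "level": "B1",
     "topic": "present perfect", "source_unit": "5", "used": "no"},
]

buffer = io.StringIO()  # stands in for a real file opened with newline=""
write_bank(items, buffer)
buffer.seek(0)
rows = read_bank(buffer)
print(len(unused(rows, skill="grammar", level="B1")))
```

When an item goes onto a test, change its used flag to "yes" so the next review or makeup test draws from fresh items first.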

A Realistic End-to-End Workflow

Here is what an actual week looks like for a teacher using this system:

Sunday, 30 minutes: open the scope and sequence. Confirm the week’s grammar, vocabulary, and skill focus. Prompt AI to draft two lesson plans following the unit template.
Sunday, 30 minutes: edit the plans for your class — swap topics, adjust timing, add the names of activities your students already know.
Monday morning, 15 minutes: generate one warm-up variation and one exit-ticket question for each lesson, drawn from the week’s content.
Wednesday, 20 minutes: prompt AI to draft 20 multiple-choice items for the upcoming unit test, with the diagnostic-distractor instruction above.
Thursday, 30 minutes: review items, kill the weak ones, edit the survivors. Add a reading task and a short writing prompt with the rubric AI drafted.
Friday: print, photocopy, teach, test, repeat.

The total AI-assisted prep time is roughly 2 to 3 hours a week for a course that previously took 6 to 8. That difference is what AI actually buys you — not better materials than a senior teacher would produce, but professional-quality materials in the time slot a working teacher actually has.

What AI Still Cannot Do

It helps to be specific about the limits. AI cannot tell you whether your sequencing makes sense for your particular learners, because it does not know them. It cannot judge whether a writing prompt will be culturally appropriate for a class of teenagers in Taipei versus adult professionals in Hanoi. It cannot reliably field-test items for difficulty — only real student responses can do that. It cannot defend its assessment decisions to a parent or an inspector. And it cannot replace the judgment call about whether a learner is actually ready to move to the next level.

What it does, when used carefully, is collapse the writing-out-of-your-head time that has always been the bottleneck in good language teaching. The pedagogical thinking stays with you. The typing, the patterning, the distractor-generating, the rubric-phrasing — that work can finally be delegated.

Where to Start This Week

If this is new to you, do not try to AI-ify your whole course at once. Pick one assessment you are dreading — a midterm, a placement test, a unit quiz — and write only that with AI this week. Use the prompts above as starting points and rewrite them in your own voice as you learn what your chatbot responds to. Within three or four cycles you will have a personal prompt library and a workflow that fits your subject, your level, and your school’s constraints.

The teachers who get the most out of AI are not the most technical ones. They are the ones who already know what good ESL assessment looks like and use AI to produce it faster. If you already have that knowledge, you have the harder half of the skill set. The prompting is the easy part — and now it is in your hands.