Agentic Design Pattern: Reflection

Most LLM answers fail in a boring way: they're almost right. A missing constraint, a hand-wavy claim, a tone that's just off. Reflection is the fix. You make the model produce a draft, grade its own work against a rubric, and rewrite. That's the whole pattern.

The loop

01
Draft
Answer the question. Don't overthink it.
02
Critique
Grade the draft against a rubric. Be strict.
03
Revise
Fix the issues. Don't invent new ones.

01
Draft
Answer the question. Don't overthink it.
02
Critique
Grade the draft against a rubric. Be strict.
03
Revise
Fix the issues. Don't invent new ones.

Repeat until it passes — or you hit the cap

You're not asking the model to be perfect on the first try. You're asking it to edit itself like a human would.

When to reach for it

Use it

Output has hard constraints (format, policy, tone)
"Mostly right" is still a failure — customer emails, requirements docs, code
You can write a rubric for what "good" looks like

Skip it

Low-stakes work — brainstorming, casual writing
Latency or cost matters more than polish
You don't have a rubric (reflection without one is just vibes)

The three roles

You only need three. Same model can play all three — what changes is the system prompt.

Generator

Writes the first draft

System Prompt

You are a writer. Produce a single, complete draft of the user's request.

Rules:
- No clarifying questions, no hedging, no apologies.
- Write as if this is your final answer.
- Stay within the format the user asked for.

Critic

Scores the draft against a rubric, calls out problems

System Prompt

You are a strict editor. Score the draft against this rubric:
clarity, accuracy, completeness, tone.

For each criterion, return PASS or FAIL with one sentence of evidence.
"Mostly right" is FAIL. Be specific — name the line or claim that fails.

Return JSON only:
{
"verdict": "PASS" | "FAIL",
"issues": [{ "criterion": "...", "evidence": "..." }]
}

Reviser

Rewrites using only the critique

System Prompt

You are a rewriter. Apply the critic's issues to the draft.

Rules:
- Fix only what the critique names. Do not add new claims.
- Do not change sections the critique did not flag.
- Return the revised draft only — no commentary.

Want the revised draft critiqued too? Loop it: feed the Reviser's output back to the Critic, stop when the verdict is PASS, and cap the loop at 2–3 iterations so a stubborn Critic can't spin forever.

Failure modes

No rubric.
The Critic ships vibes instead of issues. Define "good" before you start the loop.
No max iterations.
A stubborn Critic spins forever. Cap it at 2–3 — that's almost always enough.
Critic too soft.
Everything passes on iteration 1? Your rubric is too easy. Tighten it.
Critic too harsh.
Nothing ever passes — you burn tokens and ship the last draft anyway. Add a "good enough" threshold.
Reviser hallucinates.
Tell it explicitly: fix the critique, don't add new claims.
Cost blowup.
Each pass is 2–3 LLM calls. Use this for write-once content, not real-time chat.

When to reach for it

The three roles

Failure modes

Dan’s AI Assistant

Hi! I’m Dan’s AI Assistant