Agentic Design Pattern: Prompt Routing

A single prompt asked to "do everything" ends up mediocre at everything. Prompt Routing fixes that. A small, fast classifier reads the incoming request, decides what kind of work it is, and hands it to the prompt, model, or tool built for that job. The router stays dumb on purpose — its only job is to pick a lane.

The hop

01
Classify
A small model labels the request: which lane does it belong in?
02
Dispatch
Send to the specialist prompt, model, or tool for that label.
03
Respond
The specialist answers. The router never sees the output.

01
Classify
A small model labels the request: which lane does it belong in?
02
Dispatch
Send to the specialist prompt, model, or tool for that label.
03
Respond
The specialist answers. The router never sees the output.

You're not building one bigger brain. You're building a triage nurse in front of a hallway of specialists.

When to reach for it

Use it

Your agent handles clearly different request types (lookup vs. analysis vs. generation)
One prompt is getting bloated with "if the user asks X do Y, if Z do W" branches
Specialists outperform generalists — coding model for code, search for facts, cheap model for chat
Cost or latency matters enough that sending everything to the biggest model is wasteful

Skip it

All requests look the same — a router with one lane is just overhead
Your specialist set isn't actually specialized — same model, same prompt, different label
Latency budget can't afford the extra hop and a single capable model is good enough
You can't write crisp labels — a fuzzy taxonomy makes the router worse than no router

The three roles

The router and specialists are usually different models. The router is cheap and fast; the specialists are whatever they need to be.

Router

Classifies the request into one lane

System Prompt

You are a request classifier. Read the user's message and return exactly one label from this list:

- code: questions about writing, debugging, or explaining code
- research: factual questions that need up-to-date sources
- writing: drafting prose, emails, or long-form content
- chat: small talk, greetings, anything off-topic

Rules:
- Return JSON only: { "label": "...", "confidence": 0-1 }
- If no label fits, return { "label": "chat", "confidence": 0 }.
- Do not answer the question. Do not explain your choice.

Specialists

One prompt per lane, tuned for that lane only

System Prompt

You are a [code | research | writing | chat] specialist.

Rules:
- Assume the router already decided this request belongs to you.
- Do not second-guess the classification. Do not ask "did you mean...".
- Use the tools, sources, and tone appropriate for your lane only.
- If the request truly doesn't fit, return { "escalate": true, "reason": "..." } so the orchestrator can re-route.

Fallback

Catches low-confidence picks and escalations

System Prompt

You are a generalist. You only run when:
- The router's confidence is below 0.5, OR
- A specialist escalated with { "escalate": true }.

Rules:
- Answer directly using a capable general-purpose model.
- Log the request so the taxonomy can be improved later — every fallback is a sign the lanes are wrong, not a sign the system is broken.

Want to skip the router on obvious cases? Add a deterministic pre-filter — regex for URLs goes straight to research, code fences go straight to code — and only call the router when the pre-filter is unsure.

Failure modes

Fuzzy labels.
If "research" and "writing" overlap, the router flips between them and the user sees inconsistent behavior. Write labels that a stranger could apply.
Router too smart.
If you let the router reason about the answer, it starts solving the problem itself — and then doing it again in the specialist. Keep the router's prompt boring.
No confidence threshold.
A low-confidence pick is worse than no pick. Route < 0.5 confidence to the fallback, not the most-likely lane.
Specialists drift.
Over time, each specialist accretes "just in case" instructions until it's a generalist again. Audit the prompts quarterly and delete anything the lane doesn't need.
Missing escape hatch.
If the specialist can't escalate, a misrouted request gets answered confidently in the wrong lane. Always give specialists a way to say "not mine".
Latency tax.
Every request now pays for two model calls. Use a cheap, fast model for the router — and pre-filter the obvious cases deterministically.

When to reach for it

The three roles

Failure modes

Dan’s AI Assistant

Hi! I’m Dan’s AI Assistant