Classify 10,000 rows with one AI formula
Classification is the most common AI-in-spreadsheets workload: sentiment as positive/negative/neutral, support tickets as billing/technical/feature, leads as hot/warm/cold. It's also where most people overthink it. The hard part isn't the model; it's writing a prompt that holds up across 10,000 unpredictable rows.
This post is the playbook: where classification beats hand-rolled rules, the prompt patterns that don't drift, and the cost math.
The one-line version
=AI_CLASSIFY(A2, "positive, negative, neutral")
That's it. A2 is the text. The second argument is the category list. The formula returns one category, exactly as written, no commentary. Drag down to 10,000 rows; come back when it's done.
When AI classification beats rules
- Free-text in, structured label out. "I love it!" → positive. Regex won't handle paraphrasing; AI does.
- Domain knowledge embedded. Classify SaaS plans as starter/growth/enterprise just by reading the customer's company description.
- Noisy / multilingual data. Tickets that arrive in 7 languages with typos. Rules can't span that; LLMs can.
- Iteration speed. Adding a new category means editing one string. With rules it means writing a new regex bundle and re-running the whole pipeline.
When rules do beat AI: exact string matching, high-volume hot paths where even 50ms of latency matters, and regulated domains where the auditor wants deterministic logic. For most other business workloads, =AI_CLASSIFY wins.
Five prompt patterns that hold up
1. Single-category, clean list
=AI_CLASSIFY(A2, "billing, technical, feature_request, account, other")
The shortest pattern. Always include an "other" bucket. Without it, the model has to force borderline rows into one of your real categories, and those misfiled rows look exactly like genuine matches in the output column.
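If you ever post-process raw model replies yourself (say, from an API call outside the sheet), the same rule applies on the output side: anything that isn't exactly one of your labels should land in "other" rather than be kept as-is. Here is a minimal Python sketch of that check, assuming the reply is plain text. `normalize_label` is a hypothetical helper for illustration, not part of gptsheet or how AI_CLASSIFY works internally.

```python
def normalize_label(raw: str, categories: list[str], fallback: str = "other") -> str:
    """Map a raw model reply onto exactly one allowed category.

    Matching ignores case, surrounding whitespace, quotes and a trailing period.
    Anything else goes to the fallback bucket instead of being silently kept.
    """
    cleaned = raw.strip().strip(".\"'").lower()
    lookup = {c.strip().lower(): c.strip() for c in categories}
    return lookup.get(cleaned, fallback)


CATEGORIES = ["billing", "technical", "feature_request", "account", "other"]

print(normalize_label(" Billing.", CATEGORIES))            # billing
print(normalize_label("probably technical?", CATEGORIES))  # other
```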
2. Single-category with instructions
=AI_CLASSIFY(A2,
"hot, warm, cold",
"Hot = explicitly asking for a demo or pricing. Warm = engaged but no buying signal. Cold = generic question or info-only.")
The third argument is a free-form instruction string. Use it to nail down boundary cases before the model has to guess.
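Why does the instruction string help? The boundary definitions end up next to the category list, so the model reads them before it sees the row. Below is a minimal sketch of how the three arguments might be folded into one prompt. The wording and the `build_classify_prompt` name are assumptions for illustration. This post doesn't document the add-on's actual prompt.

```python
def build_classify_prompt(text: str, categories: str, instructions: str = "") -> str:
    """Assemble a single classification prompt from AI_CLASSIFY-style arguments.

    Hypothetical illustration: the real add-on's prompt format may differ.
    """
    labels = [c.strip() for c in categories.split(",") if c.strip()]
    parts = [
        "Classify the text into exactly one of these categories: " + ", ".join(labels) + ".",
        "Reply with the category name only.",
    ]
    if instructions:
        # Boundary rules go before the row text so they frame the decision.
        parts.append("Guidance: " + instructions)
    parts.append("Text: " + text)
    return "\n".join(parts)


prompt = build_classify_prompt(
    "Can you send me pricing for 50 seats?",
    "hot, warm, cold",
    "Hot = explicitly asking for a demo or pricing. Warm = engaged but no buying signal. "
    "Cold = generic question or info-only.",
)
print(prompt)
```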
3. Multi-tag (tagging instead of classification)
=AI_TAG(A2,
"bug, ui-issue, performance, mobile-only, dataloss, security",
"Apply up to 3 tags. Be conservative — don't tag if not clearly present.",
3)
Use AI_TAG when items can carry multiple labels. It returns the tags as a single comma-separated string.
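Because the result is one comma-separated string, downstream steps usually need to split it. This minimal Python sketch assumes you've exported the tag column. `parse_tags` is a hypothetical helper. It drops unknown tags and duplicates, and its cap of 3 mirrors the fourth argument in the formula above.

```python
def parse_tags(raw: str, allowed: list[str], max_tags: int = 3) -> list[str]:
    """Split a comma-separated tag string, keep only allowed tags, dedupe, and cap the count."""
    allowed_set = {t.lower() for t in allowed}
    kept: list[str] = []
    for tag in raw.split(","):
        t = tag.strip().lower()
        if t in allowed_set and t not in kept:
            kept.append(t)
    return kept[:max_tags]


TAGS = ["bug", "ui-issue", "performance", "mobile-only", "dataloss", "security"]
print(parse_tags("bug, Performance, bug, crash, mobile-only, security", TAGS))
# ['bug', 'performance', 'mobile-only']
```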
4. Open-ended classification (no fixed list)
=AI(
"In 1-3 words, what category does this customer complaint fall into? Examples: shipping delay, broken product, billing dispute.",
A2)
Use this for exploration. The output isn't constrained, so expect drift across rows ("shipping delay" in one row, "late delivery" in the next). It's useful as a first pass before settling on a fixed taxonomy.
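To turn that exploratory pass into a fixed taxonomy, it helps to count what came back. Near-duplicates then show up as separate entries you can merge. Here is a minimal Python sketch, assuming the column has been exported as a list of strings. The sample labels are made up.

```python
from collections import Counter


def label_frequencies(labels: list[str]) -> Counter:
    """Count free-form labels after light normalization (case, extra spaces, trailing period)."""
    return Counter(
        " ".join(label.lower().split()).rstrip(".")
        for label in labels
        if label.strip()
    )


raw = ["Shipping delay", "shipping delay.", "Late delivery", "Broken product", "broken  product"]
for label, count in label_frequencies(raw).most_common():
    print(f"{count:>3}  {label}")
```

In this toy output, "shipping delay" and "late delivery" stay separate. Those are exactly the synonyms to collapse into one category before switching to =AI_CLASSIFY.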
5. Hierarchical classification (two passes)
In B2: =AI_CLASSIFY(A2, "support, sales, billing, spam")
In C2: =IF(B2="support",
AI_CLASSIFY(A2, "bug, how-to, feature_request, account"),
"")
Two-stage classification keeps each category list short and unambiguous. The IF also means the narrower second-tier prompt only runs for rows in the relevant first-tier bucket, so every other row costs a single call.
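The cost of the second pass is easy to bound: every row pays for the first pass, and only rows in the drill-down bucket pay for a second. Here is a minimal Python sketch of that arithmetic. The label mix is a made-up example, not a measurement.

```python
def api_calls(first_tier: list[str], drill_into: str = "support") -> int:
    """Model calls for the two-pass setup: one per row, plus one per row in the drill-down bucket."""
    return len(first_tier) + sum(1 for label in first_tier if label == drill_into)


# Hypothetical mix of 1,000 first-tier labels.
labels = ["support"] * 300 + ["sales"] * 400 + ["billing"] * 200 + ["spam"] * 100
print(api_calls(labels))  # 1300
```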
The cost math
Average classification prompt: ~150 input tokens (category list + instructions + the row text), ~5 output tokens (one category label).
| Model | 1,000 rows | 10,000 rows | 100,000 rows |
|---|---|---|---|
| Gemini 2.5 Flash | $0.02 | $0.20 | $2.00 |
| GPT-4o mini | $0.04 | $0.40 | $4.00 |
| Claude Haiku 4.5 | $0.07 | $0.70 | $7.00 |
Classification is one of the cheapest LLM workloads there is. Don't optimize prematurely: use Gemini Flash by default and switch to a larger model if accuracy lags.
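Each cell in the table is just per-row tokens times per-token price, scaled by row count. To redo the numbers with current prices or your own prompt length, here is a minimal Python sketch. The two prices below are placeholders, not quotes for any particular model. Check your provider's current price sheet.

```python
def classification_cost(rows: int, input_tokens: int, output_tokens: int,
                        input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimated USD cost: per-row token counts times per-million-token prices, times rows."""
    per_row = (input_tokens * input_price_per_m + output_tokens * output_price_per_m) / 1_000_000
    return rows * per_row


# Placeholder prices in USD per million tokens (assumptions, not real quotes).
PRICE_IN, PRICE_OUT = 0.10, 0.40
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} rows: ${classification_cost(n, 150, 5, PRICE_IN, PRICE_OUT):.2f}")
```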
Three real-world examples
Customer support triage
Pipe Zendesk ticket subjects + first message into column A. Classify into {billing, technical, feature_request, account, spam} in column B. Route to the right team via Zapier on column B's value. Setup time: 20 minutes. Replaces a 2-hour-per-day human triage step.
Lead scoring from form responses
"Tell us about your company" free-text field → classify as {enterprise, mid_market, smb, individual, junk}. Sort the leads by that column in priority order (enterprise first, junk last) rather than alphabetically. Sales talks to the top of the list first.
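An alphabetical sort would put individual and junk ahead of mid_market, so a priority lookup does the real work. Here is a minimal Python sketch, assuming the rows are exported as (name, label) pairs. The rank values are an illustrative choice you would set yourself.

```python
# Illustrative priority order; unknown labels sort after everything listed here.
LEAD_RANK = {"enterprise": 1, "mid_market": 2, "smb": 3, "individual": 4, "junk": 5}


def sort_leads(leads: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Sort (lead_name, label) pairs by business priority instead of label text."""
    return sorted(leads, key=lambda lead: LEAD_RANK.get(lead[1], len(LEAD_RANK) + 1))


leads = [("Acme", "smb"), ("Globex", "enterprise"), ("Solo Dev", "individual"), ("Initech", "mid_market")]
for name, label in sort_leads(leads):
    print(name, label)
```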
Survey theme extraction
Open-ended NPS responses → tag with {pricing, performance, feature_gap, support_quality, ux, integrations} via AI_TAG. Split the comma-separated tags into one per row, and a pivot table tells you what people actually complain about, without 4 hours of manual reading.
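The key is counting each tag on its own, not each combination of tags. Here is a minimal Python sketch of that count, assuming the tag column is exported as strings. The sample responses are made up.

```python
from collections import Counter


def count_tags(cells: list[str]) -> Counter:
    """Count each tag individually across comma-separated tag cells (once per cell)."""
    counts: Counter = Counter()
    for cell in cells:
        counts.update({t.strip() for t in cell.split(",") if t.strip()})
    return counts


responses = ["pricing, ux", "performance", "pricing", "ux, integrations, pricing", ""]
print(count_tags(responses).most_common(3))
```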
The thing nobody tells you about LLM classification
The model isn't picking from your list; it's predicting the next tokens. If your categories overlap semantically, it will split near-identical rows between them inconsistently.
If you classify with "happy, satisfied, content", you'll get noise. Those aren't different categories; they're synonyms. Either collapse them or differentiate them clearly: "happy_about_product, happy_about_price, happy_about_support, generic_positive".
Same trap: temporal labels ("this_week, this_month, this_quarter") where rows span multiple time windows. The model has no clock. Pre-compute the time bucket in another column and classify on the bucket.
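The bucket should come from a real clock, not the model. In the sheet that's a date formula in a helper column. The same logic as a minimal Python sketch, assuming ISO weeks and calendar quarters. The bucket names follow the example above, and "older" is an added catch-all.

```python
from datetime import date


def time_bucket(d: date, today: date) -> str:
    """Deterministic time bucket computed from an explicit 'today'."""
    if d.isocalendar()[:2] == today.isocalendar()[:2]:  # same ISO year and week
        return "this_week"
    if (d.year, d.month) == (today.year, today.month):
        return "this_month"
    if d.year == today.year and (d.month - 1) // 3 == (today.month - 1) // 3:
        return "this_quarter"
    return "older"


today = date(2025, 5, 14)
for d in (date(2025, 5, 12), date(2025, 5, 2), date(2025, 4, 20), date(2025, 1, 3)):
    print(d, time_bucket(d, today))
```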
Running 10,000 rows: the Bulk runner
Dragging =AI_CLASSIFY down 10,000 rows works, but those are live formulas. Sheets can recalculate the whole column, and every recalculation is another round of paid API calls.
Open the sidebar → Bulk tab. Set input range A2:A10001, prompt template "Classify {value} as positive, negative, or neutral", output starting cell B2. Hit Run. Results are written as static values — no recalc storms.
You can stop and resume; the partial output stays. Ten thousand rows finish in ~25 minutes against Gemini Flash at default parallelism.
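Stop-and-resume falls out of one rule: only process rows whose output is still empty. Here is a minimal Python sketch of that pattern. It is not the add-on's code. `fake_classify` stands in for a model call, and `workers` is an assumed parallelism knob.

```python
from concurrent.futures import ThreadPoolExecutor


def run_bulk(inputs, outputs, classify, workers=4):
    """Fill outputs[i] = classify(inputs[i]) for every row whose output is still empty.

    Finished rows are skipped, so re-running after a stop resumes where it left off
    instead of paying for completed rows again.
    """
    todo = [i for i, out in enumerate(outputs) if not out]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so they line up with `todo`.
        for i, label in zip(todo, pool.map(classify, (inputs[i] for i in todo))):
            outputs[i] = label  # stored as a plain value, nothing to recompute later
    return outputs


def fake_classify(text):  # stand-in for a real model call
    return "positive" if "love" in text.lower() else "neutral"


rows = ["I love it!", "It arrived.", "Love the price"]
results = ["positive", "", ""]  # row 0 finished before an earlier stop
print(run_bulk(rows, results, fake_classify))
```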
Combining classification with the chat agent
Once you have a classified column, the agent gets useful:
- "Of the rows tagged technical, what are the most common keywords?"
- "Plot the daily count of each category over the last 90 days."
- "Show me 10 example rows from each cluster so I can sanity-check the labels."
The agent reads your sheet, picks the right chart, and writes it inline. No formula gymnastics.
Try AI_CLASSIFY on your data
14 AI formulas, Bulk runner for high-volume jobs, chat agent for analysis. Lifetime license — from $49.
Get gptsheet (from $49)