Classify 10,000 rows with one AI formula

Use case · May 14, 2026 · 7 min read

Classification is the most common AI-in-spreadsheets workload, and the one people most often overthink. The tasks are familiar: sentiment as positive/negative/neutral, support tickets as billing/technical/feature, leads as hot/warm/cold. The hard part isn't the model; it's writing a prompt that holds up across 10,000 unpredictable rows.

This post is the playbook: where classification beats hand-rolled rules, the prompt patterns that don't drift, and the cost math.

The one-line version

=AI_CLASSIFY(A2, "positive, negative, neutral")

That's it. A2 is the text. The second argument is the category list. The formula returns one category, exactly as written, no commentary. Drag down to 10,000 rows; come back when it's done.

When AI classification beats rules

Free-text in, structured label out. "I love it!" → positive. Regex won't handle paraphrasing; AI does.
Domain knowledge embedded. Classify SaaS plans as starter/growth/enterprise just by reading the customer's company description.
Noisy / multilingual data. Tickets that arrive in 7 languages with typos. Rules can't span that; LLMs can.
Iteration speed. Adding a new category means editing one string. With rules it means writing a new regex bundle and re-running the whole pipeline.

Rules still win in three cases: exact string matching, high-volume hot paths where 50 ms of latency matters, and regulated domains where the auditor wants deterministic logic. For most other business workloads, =AI_CLASSIFY wins.
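The paraphrasing gap mentioned in the list above is easy to see in a small Python sketch. The keyword list and sample sentences are invented for illustration; the point is only that a hand-written rule matches the words it was given and nothing else.

```python
import re

# Hypothetical keyword rule for "positive" sentiment; the word list is illustrative.
POSITIVE_RULE = re.compile(r"\b(love|great|excellent)\b", re.IGNORECASE)

def rule_says_positive(text: str) -> bool:
    """Return True when the text contains one of the listed keywords."""
    return POSITIVE_RULE.search(text) is not None

samples = [
    "I love it!",                      # keyword present: the rule matches
    "Couldn't be happier with this.",  # paraphrase: the rule misses it
]
results = {text: rule_says_positive(text) for text in samples}
```

Both sentences are positive to a human reader, but only the first one triggers the rule. Covering the second means adding more patterns by hand, which is the maintenance cost the list above describes.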

Five prompt patterns that hold up

1. Single-category, clean list

=AI_CLASSIFY(A2, "billing, technical, feature_request, account, other")

The shortest pattern. Always include an "other" bucket. Without it, the model has no escape hatch and will force borderline rows into the most frequent category.
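If you export the label column for downstream use, a quick check that every value is on the list can catch stray output. The following is a minimal Python sketch, not part of GPTSheet; it assumes the labels arrive as plain strings, and the sample values are made up.

```python
ALLOWED = ["billing", "technical", "feature_request", "account", "other"]

def normalize_label(raw: str, allowed=ALLOWED, fallback="other") -> str:
    """Map a raw cell value onto the allowed list, or onto the fallback."""
    cleaned = raw.strip().lower().replace(" ", "_")
    return cleaned if cleaned in allowed else fallback

# Made-up export of column B, including messy and empty cells.
column_b = ["billing", " Technical ", "Feature request", "refund??", ""]
checked = [normalize_label(value) for value in column_b]
```

Anything that cannot be matched lands in "other", so the fallback bucket does the same job here that it does in the prompt.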

2. Single-category with instructions

=AI_CLASSIFY(A2,
  "hot, warm, cold",
  "Hot = explicitly asking for a demo or pricing. Warm = engaged but no buying signal. Cold = generic question or info-only.")

The third argument is a free-form instruction string. Use it to nail down boundary cases before the model has to guess.

3. Multi-tag (tagging instead of classification)

=AI_TAG(A2,
  "bug, ui-issue, performance, mobile-only, dataloss, security",
  "Apply up to 3 tags. Be conservative — don't tag if not clearly present.",
  3)

Use AI_TAG when an item can carry more than one label. It returns the tags as a comma-separated list.
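Comma-separated cells need splitting before you can count anything. A rough Python sketch of that step, assuming the tag column has been exported as a list of strings (the sample cells are invented):

```python
from collections import Counter

def split_tags(cell: str) -> list[str]:
    """Split a comma-separated tag cell into trimmed, non-empty tags."""
    return [tag.strip() for tag in cell.split(",") if tag.strip()]

# Invented sample cells, including one row where no tag was applied.
cells = ["bug, mobile-only", "performance", "", "bug, ui-issue, performance"]
tag_counts = Counter(tag for cell in cells for tag in split_tags(cell))
```

The resulting counter is per tag rather than per row, which is usually the number you want when a row can carry several labels.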

4. Open-ended classification (no fixed list)

=AI(
  "In 1-3 words, what category does this customer complaint fall into? Examples: shipping delay, broken product, billing dispute.",
  A2)

For exploration. The output isn't constrained — you'll get drift across rows. Useful as a first pass before settling on a fixed taxonomy.
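One way to turn a drifting first pass into a fixed taxonomy is to count the free-form labels and see which ones dominate. This Python sketch assumes the output column was exported as strings; the sample labels are invented, and the normalization only merges differences in case and spacing.

```python
from collections import Counter

def canonical(label: str) -> str:
    """Lowercase, trim, and collapse inner whitespace so near-duplicates merge."""
    return " ".join(label.lower().split())

# Invented first-pass output showing typical drift.
free_form = ["Shipping delay", "shipping  delay", "Broken product",
             "billing dispute", "Shipping Delay ", "late delivery"]
label_counts = Counter(canonical(value) for value in free_form)
top_labels = label_counts.most_common(3)
```

Note that "late delivery" stays separate from "shipping delay". Merging labels that mean the same thing is still a human decision, and it is the decision that produces the fixed list for AI_CLASSIFY.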

5. Hierarchical classification (two passes)

In B2:
=AI_CLASSIFY(A2, "support, sales, billing, spam")

In the next column:
=IF(B2="support",
   AI_CLASSIFY(A2, "bug, how-to, feature_request, account"),
   "")

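Two-stage classification saves money because the narrower second-tier prompt only runs for rows in the relevant first-tier bucket. The first pass still runs on every row, so the size of the saving depends on how many rows land in that bucket.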
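A back-of-envelope count makes the effect of the IF guard concrete. The Python sketch below compares the guarded layout with one where the second-tier formula runs on every row; the 40% support share is a made-up figure, so substitute your own.

```python
def formula_calls(rows: int, support_share: float) -> dict[str, int]:
    """Count model calls for the guarded and unguarded two-column layouts."""
    first_pass = rows                             # tier 1 runs on every row
    guarded_second = round(rows * support_share)  # tier 2 only where tier 1 says "support"
    return {
        "guarded": first_pass + guarded_second,
        "unguarded": first_pass + rows,           # tier 2 on every row
    }

calls = formula_calls(10_000, 0.40)  # 40% support share is an assumption
```

Under that assumption the guard drops the total from 20,000 calls to 14,000. The smaller the first-tier bucket, the closer the guarded layout gets to the cost of a single pass.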

The cost math

Average classification prompt: ~150 input tokens (category list + instructions + the row text), ~5 output tokens (one category label).

Model	1,000 rows	10,000 rows	100,000 rows
Gemini 2.5 Flash	$0.02	$0.20	$2.00
GPT-4o mini	$0.04	$0.40	$4.00
Claude Haiku 4.5	$0.07	$0.70	$7.00

Classification is the cheapest LLM workload there is. Don't optimize prematurely — use Gemini Flash by default, swap up if accuracy lags.
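To rerun the arithmetic for your own prompt length or for current prices, the calculation is tokens per row, times rows, times the price per million tokens. A minimal Python sketch follows; the two prices in the example call are placeholders, not any provider's real rates.

```python
def estimate_cost(rows: int, input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimate total cost in dollars from per-row tokens and per-million-token prices."""
    input_cost = rows * input_tokens / 1_000_000 * input_price_per_m
    output_cost = rows * output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost

# Placeholder prices, NOT real rates; substitute the provider's current list prices.
cost_10k = estimate_cost(10_000, 150, 5,
                         input_price_per_m=0.10, output_price_per_m=0.40)
```

Because output is only a few tokens per row, the input side dominates. Trimming a long instruction string does more for the bill than anything you can do to the label.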

Three real-world examples

Customer support triage

Pipe Zendesk ticket subjects + first message into column A. Classify into {billing, technical, feature_request, account, spam} in column B. Route to the right team via Zapier on column B's value. Setup time: 20 minutes. Replaces a 2-hour-per-day human triage step.

Lead scoring from form responses

"Tell us about your company" free-text field → classify as {enterprise, mid_market, smb, individual, junk}. Sort the leads list by that column. Sales talks to the top of the list first.

Survey theme extraction

Open-ended NPS responses → tag with {pricing, performance, feature_gap, support_quality, ux, integrations} via AI_TAG. Pivot table tells you what people actually complain about — without 4 hours of manual reading.

The thing nobody tells you about LLM classification

The model isn't looking up an answer in your list; it's predicting the next tokens. If the labels in your list overlap semantically, it has no reliable way to choose between them.

If you classify with "happy, satisfied, content", you'll get noise. Those aren't different categories — they're synonyms. Either collapse them or differentiate clearly: "happy_about_product, happy_about_price, happy_about_support, generic_positive".

Same trap: temporal labels ("this_week, this_month, this_quarter") where rows span multiple time windows. The model has no clock. Pre-compute the time bucket in another column and classify on the bucket.
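Pre-computing the bucket is plain date arithmetic. Here is one possible Python sketch, assuming each row has a date and that a row should get the narrowest window containing it; the "older" label and the sample dates are additions for illustration.

```python
from datetime import date

def time_bucket(row_date: date, today: date) -> str:
    """Return the narrowest window (relative to `today`) that contains row_date."""
    # Same ISO year and ISO week number means the same Monday-to-Sunday week.
    if row_date.isocalendar()[:2] == today.isocalendar()[:2]:
        return "this_week"
    if (row_date.year, row_date.month) == (today.year, today.month):
        return "this_month"
    same_quarter = (row_date.month - 1) // 3 == (today.month - 1) // 3
    if row_date.year == today.year and same_quarter:
        return "this_quarter"
    return "older"  # hypothetical catch-all, not one of the labels in the text

today = date(2026, 5, 14)
sample_dates = (date(2026, 5, 12), date(2026, 5, 2), date(2026, 4, 20), date(2026, 1, 5))
buckets = [time_bucket(d, today) for d in sample_dates]
```

The result is deterministic and costs nothing, which is the reason to keep this step out of the prompt.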

Running 10,000 rows: the Bulk runner

Dragging =AI_CLASSIFY down 10,000 rows works, but Sheets will recalculate the entire column whenever anything in the workbook changes. That's expensive.

Open the sidebar → Bulk tab. Set input range A2:A10001, prompt template "Classify {value} as positive, negative, or neutral", output starting cell B2. Hit Run. Results are written as static values — no recalc storms.

You can stop and resume; the partial output stays. Ten thousand rows finish in ~25 minutes against Gemini Flash at default parallelism.
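The stop-and-resume behaviour can be pictured as a loop that skips any row that already has an output. The Python sketch below illustrates that idea only; it is not GPTSheet's implementation, and run_bulk, fill_template, and fake_classify are hypothetical names, with fake_classify standing in for the model call.

```python
from typing import Callable, Optional

def fill_template(template: str, value: str) -> str:
    """Substitute one row's text into the {value} placeholder."""
    return template.replace("{value}", value)

def run_bulk(inputs: list[str], outputs: list[Optional[str]],
             template: str, classify: Callable[[str], str],
             max_rows: Optional[int] = None) -> int:
    """Fill empty output slots in order; return how many rows were processed."""
    done = 0
    for i, text in enumerate(inputs):
        if outputs[i] is not None:   # written on an earlier run: skip it
            continue
        if max_rows is not None and done >= max_rows:
            break                    # simulates the user pressing Stop
        outputs[i] = classify(fill_template(template, text))
        done += 1
    return done

def fake_classify(prompt: str) -> str:
    """Offline stand-in for the model call so the sketch runs without a network."""
    return "positive" if "love" in prompt.lower() else "neutral"

rows = ["I love it!", "It arrived.", "Love the new UI"]
results: list[Optional[str]] = [None] * len(rows)
template = "Classify {value} as positive, negative, or neutral"
first = run_bulk(rows, results, template, fake_classify, max_rows=2)  # stopped early
second = run_bulk(rows, results, template, fake_classify)             # resumed
```

The second call only touches the row the first call left empty. Writing static values is what makes this possible: a finished cell is never recomputed.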

Combining classification with the chat agent

Once you have a classified column, the agent gets useful:

"Of the rows tagged technical, what are the most common keywords?"
"Plot the daily count of each category over the last 90 days."
"Show me 10 example rows from each cluster so I can sanity-check the labels."

The agent reads your sheet, picks the right chart, and writes it inline. No formula gymnastics.

Try AI_CLASSIFY on your data

14 AI formulas, Bulk runner for high-volume jobs, chat agent for analysis. Lifetime license — from $49.
