Claude vs ChatGPT vs Gemini for Google Sheets
Most blog posts compare LLMs in the abstract — context window size, MMLU score, system prompt steerability. None of that matters when the model is being called from cell B2 to clean up a product description.
We tested OpenAI gpt-4o-mini, Anthropic claude-haiku-4-5, and Google gemini-2.5-flash on the four tasks people actually run in spreadsheets: classification, extraction, translation, and editing. Here's what we learned.
TL;DR
| Task | Winner | Why |
|---|---|---|
| Classification (sentiment, intent, tagging) | Gemini 2.5 Flash | Tied on accuracy, ~⅓ the cost. |
| Extraction (entities, structured data) | GPT-4o mini | Best at strict JSON without hallucinating fields. |
| Translation | Claude Haiku 4.5 | Cleanest output — no romanization, no alternate suggestions glued on. |
| Edit / rewrite | Claude Haiku 4.5 | Most faithful to original tone and length constraints. |
The price grid (per 1M tokens, list prices)
| Model | Input | Output |
|---|---|---|
| gemini-2.5-flash | $0.075 | $0.30 |
| gpt-4o-mini | $0.15 | $0.60 |
| claude-haiku-4-5 | $0.25 | $1.25 |
For a typical spreadsheet call (200 input tokens, 30 output tokens), the all-in cost per call is roughly:
- Gemini 2.5 Flash: $0.000024
- GPT-4o mini: $0.000048
- Claude Haiku 4.5: $0.000088
At 100,000 calls per month, that's $2.40 vs $4.80 vs about $8.75. Choose accordingly.
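The per-call arithmetic can be reproduced with a short sketch. The prices are copied from the grid in this article, not fetched from any provider, so check them against current pricing pages before relying on the output. The function names are illustrative only.

```python
# Prices in USD per 1M tokens, copied from this article's price grid
# (an assumption: verify against each provider's current pricing).
PRICES = {
    "gemini-2.5-flash": (0.075, 0.30),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-haiku-4-5": (0.25, 1.25),
}


def cost_per_call(model: str, input_tokens: int = 200, output_tokens: int = 30) -> float:
    """Return the USD cost of one call at the per-1M-token prices above."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def monthly_cost(model: str, calls: int = 100_000) -> float:
    """Return the USD cost of `calls` typical calls."""
    return cost_per_call(model) * calls


if __name__ == "__main__":
    for model in PRICES:
        print(f"{model}: ${cost_per_call(model):.7f} per call, "
              f"${monthly_cost(model):.2f} per 100k calls")
```

Computed without intermediate rounding, the Claude figure is $0.0000875 per call, which is $8.75 per 100,000 calls. Change `input_tokens` and `output_tokens` to match your own prompts, since longer outputs shift the ratio toward the output price.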
Test 1: Classification (1,000 customer support tickets)
Formula: =AI_CLASSIFY(A2, "billing, technical, feature_request, other")
Hand-labeled ground truth on 1,000 tickets. Accuracy:
- Gemini 2.5 Flash — 94.1%
- GPT-4o mini — 93.8%
- Claude Haiku 4.5 — 93.5%
Statistical noise: all three are tied. Gemini wins on price, by about 2× vs GPT-4o mini and about 3.6× vs Claude at the per-call costs above. For binary or 4-way classification at scale, default to Gemini Flash.
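To see why a 0.6-point spread counts as a tie, here is a rough check using the normal-approximation 95% confidence interval for a proportion. This is a simplification: all three models were scored on the same 1,000 tickets, so a paired test would be stricter. The helper name is made up for illustration.

```python
import math


def accuracy_margin(accuracy: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation 95% interval for an accuracy."""
    return z * math.sqrt(accuracy * (1 - accuracy) / n)


# Accuracies reported above, each measured on 1,000 tickets.
RESULTS = {
    "gemini-2.5-flash": 0.941,
    "gpt-4o-mini": 0.938,
    "claude-haiku-4-5": 0.935,
}

if __name__ == "__main__":
    for model, acc in RESULTS.items():
        margin = accuracy_margin(acc, 1000)
        print(f"{model}: {acc:.1%} +/- {margin:.1%}")
```

Each margin comes out at roughly 1.5 percentage points, which is larger than the gap between the best and worst model.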
Test 2: Extraction (5 fields from messy text)
Formula: =AI_EXTRACT(A2, "person name, company, email, phone, role") over 500 LinkedIn-style bios.
Scored on whether all 5 fields parsed cleanly into a comma-separated row:
- GPT-4o mini — 91.4%
- Claude Haiku 4.5 — 89.2%
- Gemini 2.5 Flash — 82.6%
Gemini hallucinates field labels more often (returning "name: ..., company: ..." instead of bare values). It's correctable with a tighter system prompt — gptsheet's built-in extraction prompt does this — but OpenAI is still the easier win here.
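If you would rather fix the labelled output after the fact than tighten the prompt, a small post-processing step can strip the labels. The sketch below is not gptsheet's extraction prompt or code. `normalize_row` is a hypothetical helper that assumes values contain no commas or colons of their own.

```python
import re
from typing import List, Optional

FIELDS = ["person name", "company", "email", "phone", "role"]

# Matches a leading label such as "name:" or "person name:" (assumed label shape).
_LABEL = re.compile(r"^\s*[A-Za-z ]{1,20}:\s*")


def normalize_row(raw: str, n_fields: int = len(FIELDS)) -> Optional[List[str]]:
    """Split a comma-separated model reply and drop any "label:" prefixes.

    Returns None when the reply does not contain exactly n_fields values,
    so the caller can count it as a failed parse.
    """
    parts = [part.strip() for part in raw.split(",")]
    if len(parts) != n_fields:
        return None
    return [_LABEL.sub("", part) for part in parts]


if __name__ == "__main__":
    print(normalize_row("name: Ana Ruiz, company: Acme, email: ana@example.com, "
                        "phone: 555-0100, role: CTO"))
```

Replies with the wrong number of fields still count as failures here, so this only recovers the label-prefix case described above.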
Test 3: Translation (English → Spanish, 300 product descriptions)
Formula: =AI_TRANSLATE(A2, "Spanish")
Hand-graded on fluency + faithfulness, blind to model. Claude won by avoiding a frequent failure mode: the other two like to append "(also: alternative wording)" or romanize Asian-language outputs. Claude returns just the translation.
If you need translation at high volume and want zero post-processing, use Claude. If you can spare a wrapper to strip "Translation: " prefixes, Gemini gets you there for roughly ⅓ the cost.
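The wrapper mentioned above could look something like this minimal sketch. The two patterns cover only the artifacts named in this section, a "Translation: " prefix and a trailing "(also: ...)" note. They are assumptions about the output shape, not an exhaustive cleaner.

```python
import re

# Leading "Translation:" label, case-insensitive.
_PREFIX = re.compile(r"^\s*translation\s*:\s*", re.IGNORECASE)
# Trailing parenthetical that starts with "also:", e.g. "(also: otra opción)".
_ALT_SUFFIX = re.compile(r"\s*\(also:[^)]*\)\s*$", re.IGNORECASE)


def clean_translation(raw: str) -> str:
    """Strip a "Translation:" prefix and a trailing "(also: ...)" suggestion."""
    text = _PREFIX.sub("", raw)
    text = _ALT_SUFFIX.sub("", text)
    return text.strip()


if __name__ == "__main__":
    print(clean_translation(
        "Translation: Botella de acero inoxidable (also: termo de acero)"
    ))
```

Output that is already clean passes through unchanged, so the same wrapper can sit in front of all three providers.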
Test 4: Edit (shorten product descriptions to 140 chars)
Formula: =AI_EDIT(A2, "shorten to under 140 characters, preserve key benefits")
Claude was the most disciplined. GPT-4o mini frequently went 145–160 chars. Gemini cut too aggressively and lost product features. For length-bounded rewrites, Claude is worth its premium.
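A length constraint like this is easy to audit after the fact. The sketch below flags rewrites that are not strictly under the limit. `over_limit` is an illustrative helper, and `len` counts Unicode code points, which may differ from how your destination platform counts characters.

```python
from typing import List, Tuple


def over_limit(rows: List[str], limit: int = 140) -> List[Tuple[int, int]]:
    """Return (row_index, length) for each rewrite that is not under `limit`."""
    return [(i, len(text)) for i, text in enumerate(rows) if len(text) >= limit]


if __name__ == "__main__":
    rewrites = ["a" * 139, "b" * 140, "c" * 152]
    for index, length in over_limit(rewrites):
        print(f"row {index}: {length} chars")
```

Flagged rows can be re-run or hand-edited. Truncating them automatically would risk cutting off the key benefits the prompt asks to preserve.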
Latency
Wall-clock time for a single cell call (median, US East to provider):
- Gemini 2.5 Flash — ~600 ms
- GPT-4o mini — ~800 ms
- Claude Haiku 4.5 — ~900 ms
None of these will time out a Sheets cell (30 s cap). At bulk-runner speeds (~10 req/sec parallelism), Gemini gets you 1M rows in ~28 hours vs ~37 hours for Claude. Material if you're running scheduled jobs; irrelevant if you're typing in a cell.
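The bulk-run estimates reduce to simple arithmetic on sustained throughput. In the sketch below, the sustained rate is the key assumption. It is whatever your runner actually achieves after rate limits and retries, not a number derived from the median latencies above.

```python
def hours_for_rows(rows: int, requests_per_second: float) -> float:
    """Wall-clock hours to process `rows` at a sustained request rate."""
    return rows / requests_per_second / 3600


def implied_rate(rows: int, hours: float) -> float:
    """Sustained requests per second implied by finishing `rows` in `hours`."""
    return rows / (hours * 3600)


if __name__ == "__main__":
    print(f"{hours_for_rows(1_000_000, 10):.1f} hours at 10 req/s")
    print(f"{implied_rate(1_000_000, 37):.1f} req/s to finish in 37 hours")
```

A sustained 10 requests per second gives about 27.8 hours for 1M rows, in line with the Gemini figure. The ~37 hour Claude figure corresponds to roughly 7.5 requests per second.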
The cross-provider play (and why most add-ons can't do it)
The right answer for most teams isn't "pick one". It's:
- Default to Gemini for classification + tagging.
- Switch to GPT-4o mini when you need strict JSON or vision.
- Switch to Claude for translation, editing, and long-form rewriting.
Credit-pack add-ons usually lock you to one provider. gptsheet is BYOK — you paste keys for all three and the formula picks the right one per use case. The Chat Agent even has a cross-provider model picker so you can switch mid-conversation.
The cheapest model that gets the job done changes every 6 months. Lock-in to one provider is a cost — sometimes a hidden one.
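The per-task routing described in this section amounts to a lookup table. The sketch below is not how gptsheet picks a provider internally. The task names, the `pick_model` helper, and the fallback to Gemini for unlisted tasks are all assumptions made for illustration.

```python
# Task -> model, following the recommendations in this section.
ROUTES = {
    "classification": "gemini-2.5-flash",
    "tagging": "gemini-2.5-flash",
    "extraction": "gpt-4o-mini",
    "translation": "claude-haiku-4-5",
    "edit": "claude-haiku-4-5",
}

# Assumed fallback for tasks not listed above (cheapest model in the grid).
DEFAULT_MODEL = "gemini-2.5-flash"


def pick_model(task: str) -> str:
    """Return the model name for a task, ignoring case and surrounding spaces."""
    return ROUTES.get(task.strip().lower(), DEFAULT_MODEL)


if __name__ == "__main__":
    for task in ("classification", "Translation", "summarize"):
        print(f"{task} -> {pick_model(task)}")
```

Keeping the table in one place means a pricing change only requires editing `ROUTES`, which is the practical upside of not being locked to one provider.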
What we'd pick today (May 2026)
- Heavy classification / tagging workload? Gemini 2.5 Flash. Hard to beat on $/call.
- Extraction into structured tables? GPT-4o mini. Strict-JSON discipline matters here.
- Translation or rewriting at scale? Claude Haiku 4.5. The output is clean enough to ship.
- Vision-required tasks? GPT-4o (no comparable Sheets-friendly vision model from Anthropic or Google as of writing).
- Web search inside a formula? OpenAI Responses API + web_search_preview. The other two require external tooling.
Read more: BYOK vs credit packs: the hidden cost of AI in spreadsheets.
Use all three from one sidebar
gptsheet supports OpenAI, Anthropic, and Gemini natively. Paste your keys once, switch providers per formula.
Get gptsheet — from $49