Claude vs ChatGPT vs Gemini for Google Sheets

Comparison · May 14, 2026 · 9 min read

Most blog posts compare LLMs in the abstract — context window size, MMLU score, system prompt steerability. None of that matters when the model is being called from cell B2 to clean up a product description.

We tested OpenAI gpt-4o-mini, Anthropic claude-haiku-4-5, and Google gemini-2.5-flash on the four tasks people actually run in spreadsheets: classification, extraction, translation, and editing. Here's what we learned.

TL;DR

Task	Winner	Why
Classification (sentiment, intent, tagging)	Gemini 2.5 Flash	Tied on accuracy at half the cost of GPT-4o mini and under a third of Claude's.
Extraction (entities, structured data)	GPT-4o mini	Best at strict JSON without hallucinating fields.
Translation	Claude Haiku 4.5	Cleanest output — no romanization, no alternate suggestions glued on.
Edit / rewrite	Claude Haiku 4.5	Most faithful to original tone and length constraints.

The price grid (per 1M tokens, list prices)

Model	Input	Output
`gemini-2.5-flash`	$0.075	$0.30
`gpt-4o-mini`	$0.15	$0.60
`claude-haiku-4-5`	$0.25	$1.25

For a typical spreadsheet call (200 input tokens, 30 output tokens), the all-in cost per call is roughly:

Gemini 2.5 Flash: $0.000024
GPT-4o mini: $0.000048
Claude Haiku 4.5: $0.000088

At 100,000 calls per month, that's $2.40 vs $4.80 vs $8.80. Choose accordingly.
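To rerun this arithmetic with your own token counts, here is a minimal Python sketch. The prices are copied from the grid above and will drift; the 200-in / 30-out split is this article's "typical call", not a measured average for your sheet.

```python
# List prices from the grid above, in USD per 1M tokens: (input, output).
# These change often; update them before relying on the output.
PRICES = {
    "gemini-2.5-flash": (0.075, 0.30),
    "gpt-4o-mini": (0.15, 0.60),
    "claude-haiku-4-5": (0.25, 1.25),
}


def cost_per_call(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single call at list prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def monthly_cost(model: str, calls: int, input_tokens: int = 200, output_tokens: int = 30) -> float:
    """Dollar cost of `calls` identical calls (defaults: the 200-in / 30-out typical call)."""
    return calls * cost_per_call(model, input_tokens, output_tokens)


if __name__ == "__main__":
    for model in PRICES:
        print(f"{model}: ${cost_per_call(model, 200, 30):.7f} per call, "
              f"${monthly_cost(model, 100_000):.2f} per 100k calls")
```

At these prices Claude works out to exactly $0.0000875 per call, or $8.75 per 100,000 calls, which the figures above round to $0.000088 and $8.80.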

Test 1: Classification (1,000 customer support tickets)

Formula: =AI_CLASSIFY(A2, "billing, technical, feature_request, other")

Hand-labeled ground truth on 1,000 tickets. Accuracy:

Gemini 2.5 Flash — 94.1%
GPT-4o mini — 93.8%
Claude Haiku 4.5 — 93.5%

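That spread is statistical noise: with 1,000 tickets, the 95% margin of error on each score is about ±1.5 points, so all three are effectively tied. Gemini wins on price: 2× cheaper than GPT-4o mini and roughly 3.6× cheaper than Claude. For binary or 4-way classification at scale, default to Gemini Flash.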
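To check the "statistical noise" call yourself, a 95% confidence interval around each accuracy is enough. This sketch uses the Wilson score interval (our choice of method, not necessarily what the original test used) on the three reported scores, treated as correct answers out of 1,000.

```python
import math

# Reported accuracy on the same 1,000 hand-labeled tickets, as correct counts.
N = 1000
SCORES = {"Gemini 2.5 Flash": 941, "GPT-4o mini": 938, "Claude Haiku 4.5": 935}


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion; z = 1.96 gives ~95% coverage."""
    p = correct / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


def overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    intervals = {name: wilson_interval(c, N) for name, c in SCORES.items()}
    for name, (lo, hi) in intervals.items():
        print(f"{name}: {lo:.1%} to {hi:.1%}")
    spans = list(intervals.values())
    print("all overlap:", all(overlap(a, b) for a in spans for b in spans))
```

Overlapping intervals are a quick sanity check, not a formal test. Because all three models saw the same 1,000 tickets, a paired test such as McNemar's on per-ticket results would be the stricter comparison; those per-ticket results aren't published here.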

Test 2: Extraction (5 fields from messy text)

Formula: =AI_EXTRACT(A2, "person name, company, email, phone, role") over 500 LinkedIn-style bios.

Scored on whether all 5 fields parsed cleanly into a comma-separated row:

GPT-4o mini — 91.4%
Claude Haiku 4.5 — 89.2%
Gemini 2.5 Flash — 82.6%

Gemini hallucinates field labels more often (returning "name: ..., company: ..." instead of bare values). It's correctable with a tighter system prompt — GPTSheet's built-in extraction prompt does this — but OpenAI is still the easier win here.
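The pass/fail rule for this test is easy to reproduce offline. A minimal Python sketch, assuming the model's output is a single comma-separated line with CSV-style quoting; the label regex is a hypothetical heuristic, not GPTSheet's scoring code or its built-in prompt.

```python
import csv
import re

FIELDS = ["person name", "company", "email", "phone", "role"]

# Hypothetical heuristic: a leading "label:" such as "name: " or "Company:".
LABEL_PREFIX = re.compile(r"^\s*[A-Za-z][A-Za-z _]{0,20}:\s*")


def parse_row(output: str) -> list[str]:
    """Split one model output into fields, honoring CSV quotes like "Acme, Inc."."""
    row = next(csv.reader([output], skipinitialspace=True), [])
    return [field.strip() for field in row]


def is_clean(output: str) -> bool:
    """Pass only if there are exactly len(FIELDS) fields and none carries a label."""
    fields = parse_row(output)
    return len(fields) == len(FIELDS) and not any(LABEL_PREFIX.match(f) for f in fields)


def strip_labels(output: str) -> list[str]:
    """Salvage a labeled row by removing one leading "label:" from each field."""
    return [LABEL_PREFIX.sub("", f, count=1) for f in parse_row(output)]
```

`strip_labels` is the post-processing alternative to a tighter prompt. It will also strip legitimate values that look like labels (a URL's `https:`, for example), so scope it to columns where that can't happen.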

Test 3: Translation (English → Spanish, 300 product descriptions)

Formula: =AI_TRANSLATE(A2, "Spanish")

Hand-graded on fluency + faithfulness, blind to model. Claude won by avoiding a frequent failure mode: the other two like to append "(also: alternative wording)" notes and, with Asian target languages, tack on romanizations. Claude returns just the translation.

If you need high-volume translation with zero post-processing, use Claude. If you can spare a wrapper to strip "Translation: " prefixes, Gemini gets you there for under a third of the cost.
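The wrapper can be a few lines. A minimal Python sketch, assuming the stray text shows up as a leading "Translation:" label or a trailing "(also: ...)" note; the regexes are hypothetical and only cover those two patterns.

```python
import re

# Hypothetical patterns for the two kinds of stray text described above.
LEADING_LABEL = re.compile(r"^\s*translation\s*:\s*", re.IGNORECASE)
TRAILING_ALT = re.compile(r"\s*\((?:also|alternatively)\s*:[^)]*\)\s*$", re.IGNORECASE)


def clean_translation(text: str) -> str:
    """Strip a leading "Translation:" label and a trailing "(also: ...)" note."""
    text = LEADING_LABEL.sub("", text, count=1)
    text = TRAILING_ALT.sub("", text, count=1)
    return text.strip()
```

Ordinary trailing parentheses such as "(500 ml)" are left alone, because the pattern only fires on notes that start with "also:" or "alternatively:".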

Test 4: Edit (shorten product descriptions to 140 chars)

Formula: =AI_EDIT(A2, "shorten to under 140 characters, preserve key benefits")

Claude was the most disciplined. GPT-4o mini frequently went 145–160 chars. Gemini cut too aggressively and lost product features. For length-bounded rewrites, Claude is worth its premium.
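If you do run length-bounded rewrites on a cheaper model, a post-check catches overshoots like GPT-4o mini's 145 to 160 characters. This hypothetical Python sketch treats 140 as an inclusive cap, counts characters with Python's `len()`, measures the overshoot rate, and trims at a word boundary as a last resort.

```python
def overshoot_rate(outputs: list[str], max_chars: int = 140) -> float:
    """Fraction of rewrites longer than max_chars (140 treated as an inclusive cap)."""
    if not outputs:
        return 0.0
    return sum(len(o.strip()) > max_chars for o in outputs) / len(outputs)


def trim_to_limit(text: str, max_chars: int = 140) -> str:
    """Last-resort fallback: cut at the last space that keeps text within max_chars."""
    text = text.strip()
    if len(text) <= max_chars:
        return text
    # Look one character past the cap so a space exactly at the cap counts as a clean break.
    cut = text[: max_chars + 1].rfind(" ")
    if cut <= 0:
        # No usable word boundary: hard cut.
        return text[:max_chars]
    return text[:cut].rstrip(" ,;:-")
```

Trimming is a blunt fallback: it can drop exactly the key benefits the prompt asked to keep, so re-prompting the overshoots is usually the better fix.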

Latency

Wall-clock time for a single cell call (median, US East to provider):

Gemini 2.5 Flash — ~600 ms
GPT-4o mini — ~800 ms
Claude Haiku 4.5 — ~900 ms

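None of these will time out a Sheets cell (30 s cap). At bulk-runner speeds (a fixed pool of concurrent requests that pushes Gemini to ~10 req/sec), Gemini gets you 1M rows in ~28 hours. Throughput scales inversely with latency, so the same pool takes ~37 hours on GPT-4o mini and ~42 hours on Claude. Material if you're running scheduled jobs; irrelevant if you're typing in a cell.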
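The bulk-run estimate follows from Little's law: with a fixed number of requests in flight, throughput equals concurrency divided by latency. The sketch below assumes 6 concurrent requests (chosen so Gemini's ~600 ms median lands at ~10 req/sec) and ignores provider rate limits and retries.

```python
# Median single-call latencies from the table above, in seconds.
LATENCY_S = {"gemini-2.5-flash": 0.6, "gpt-4o-mini": 0.8, "claude-haiku-4-5": 0.9}

# Assumption: 6 requests in flight, which gives Gemini ~10 req/sec at 600 ms.
CONCURRENCY = 6


def rows_per_second(concurrency: int, latency_s: float) -> float:
    """Little's law: throughput = requests in flight / time per request."""
    return concurrency / latency_s


def hours_for(rows: int, concurrency: int, latency_s: float) -> float:
    """Wall-clock hours to push `rows` calls through, ignoring rate limits and retries."""
    return rows / rows_per_second(concurrency, latency_s) / 3600


if __name__ == "__main__":
    for model, latency in LATENCY_S.items():
        print(f"{model}: {hours_for(1_000_000, CONCURRENCY, latency):.1f} h per 1M rows")
```

Under those assumptions, 1M rows take about 27.8 hours on Gemini, 37.0 on GPT-4o mini, and 41.7 on Claude. Raising concurrency shortens all three proportionally until a rate limit kicks in.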

The cross-provider play (and why most add-ons can't do it)

The right answer for most teams isn't "pick one". It's:

Default to Gemini for classification + tagging.
Switch to GPT-4o mini when you need strict JSON or vision.
Switch to Claude for translation, editing, and long-form rewriting.
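That routing rule fits in a lookup table. A minimal Python sketch; the task labels, provider strings, and the `Route` type are hypothetical, and this is not how GPTSheet implements provider selection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    provider: str
    model: str


GEMINI = Route("google", "gemini-2.5-flash")
OPENAI = Route("openai", "gpt-4o-mini")
CLAUDE = Route("anthropic", "claude-haiku-4-5")

# Hypothetical task labels mapped per the rules above; anything unlisted falls back to Gemini.
ROUTES = {
    "classification": GEMINI,
    "tagging": GEMINI,
    "extraction": OPENAI,
    "translation": CLAUDE,
    "edit": CLAUDE,
    "rewrite": CLAUDE,
}


def pick_model(task: str, *, strict_json: bool = False, vision: bool = False) -> Route:
    """Strict JSON or image input overrides the task; otherwise look the task up."""
    if strict_json or vision:
        return OPENAI
    return ROUTES.get(task, GEMINI)
```

Keeping the defaults in one table means that when prices or quality shift, you change one line instead of hunting through formulas.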

Credit-pack add-ons usually lock you to one provider. GPTSheet is BYOK (bring your own key): you paste keys for all three, and the formula picks the right one per use case. The Chat Agent even has a cross-provider model picker, so you can switch mid-conversation.

The cheapest model that gets the job done changes every 6 months. Lock-in to one provider is a cost — sometimes a hidden one.

What we'd pick today (May 2026)

Heavy classification / tagging workload? Gemini 2.5 Flash. Hard to beat on $/call.
Extraction into structured tables? GPT-4o mini. Strict-JSON discipline matters here.
Translation or rewriting at scale? Claude Haiku 4.5. The output is clean enough to ship.
Vision-required tasks? GPT-4o (no comparable Sheets-friendly vision model from Anthropic or Google as of writing).
Web search inside a formula? OpenAI Responses API + web_search_preview. The other two require external tooling.

Use all three from one sidebar

GPTSheet supports OpenAI, Anthropic, and Gemini natively. Paste your keys once, switch providers per formula.

Get GPTSheet — from $49