Extract entities from text in Google Sheets (with AI)

Use caseMay 15, 2026 · 7 min read

Half the data you receive is structured wrong. LinkedIn bios mixed with company names. Customer emails buried inside ticket bodies. Phone numbers wrapped in marketing copy. Excel's =REGEXEXTRACT handles the easy 30%. The other 70% is where AI extraction earns its keep.

The one-line version

=AI_EXTRACT(A2, "person name, company, email, phone")

A2 is your messy source cell. The second argument is what you want extracted, in plain English. Returns comma-separated values that you can split with Sheets' built-in =SPLIT if you want them in separate columns.

Patterns that work

Single entity, single column

=AI_EXTRACT(A2, "company name")

Lead-routing classic: stripped-down company name from a free-text "About us" field.

Multiple entities, one row

=AI_EXTRACT(A2, "name, role, company, location")

Returns "Jane Smith, CTO, Acme Inc., San Francisco". Wrap in =SPLIT(AI_EXTRACT(...), ", ") to fan into columns.

Multi-value extraction

=AI_EXTRACT(A2, "all email addresses")

Returns a comma-separated list — useful when one row might contain 0, 1, or many of the entity. A row with three emails becomes "alice@x.com, bob@x.com, carol@x.com".

Domain-specific entities

=AI_EXTRACT(A2, "SaaS product names mentioned")
=AI_EXTRACT(A2, "medical conditions explicitly named")
=AI_EXTRACT(A2, "legal case citations in Bluebook format")

This is where AI dramatically outclasses regex. "All SaaS product names" is impossible to write as a pattern. The model knows what a SaaS product looks like.

Real-world examples

Lead enrichment from contact form text

Free-text "Tell us about your need" field → extract "role, team size, current tool". Now sortable, filterable, segmentable.

Support ticket triage signals

Long support ticket body → extract "customer name, account ID, error message, impacted feature". Pre-populates your help-desk form.

Resume parsing

Pasted resume text → extract "current title, current company, years of experience, top 3 skills". Build a hiring shortlist sheet in 20 minutes.

Invoice line items

OCR'd invoice text → extract "vendor, invoice number, due date, total amount". Feeds into your AP workflow.

When extraction fails (and how to fix it)

Hallucinated fields

Ask for a phone number from text that has none, and some models will invent one. Mitigation: add the suffix ". If a field is missing, return an empty value for that field."

Field-order drift

You ask for "name, email, company" and one row returns them as "company, email, name". Fix: be explicit about output format. Use AI_TABLE instead when shape matters:

=AI_TABLE("Extract structured info", "name, email, company", A2)

Inconsistent formatting

Email returned as "alice@x.com (preferred)" or with surrounding punctuation. Mitigation: add "Return raw values with no extra punctuation or labels."

Cost

Average extraction call: ~300 input tokens + ~50 output tokens. At GPT-4o-mini list prices:

For extraction specifically, we recommend OpenAI's gpt-4o-mini over Gemini Flash. Why: it's the most disciplined about strict-JSON output.

Stacking with other formulas

Extraction's real power is as a preprocessing step. Pipe its output into the next AI formula:

# A2 = long support ticket
# B2 = extracted entities
=AI_EXTRACT(A2, "product name, issue summary")

# C2 = classify the extracted issue
=AI_CLASSIFY(B2, "bug, how-to, feature_request, billing")

# D2 = translate the summary for the EU team
=AI_TRANSLATE(B2, "German")

Each formula is cheap. The composed pipeline is what's powerful.

Try AI_EXTRACT on your data

Lifetime license, BYOK, from $49.

Get gptsheet — from $49\n