Fuzzy match two lists in Google Sheets with embeddings (AI_MATCH)

FeatureMay 15, 2026 · 6 min read

Two lists. One has "Acme Inc.", "Globex Corporation", "Soylent Corp". The other has "acme", "Globex Co.", "Soylent Industries". You need to match them up. =VLOOKUP can't — it does exact strings only. =FUZZY.MATCH doesn't exist in Sheets. So you'd reach for Python + rapidfuzz, or pay for a Clearbit-style enrichment service.

Or you use embeddings, which beat both. =AI_MATCH turns each cell into a high-dimensional vector, computes cosine similarity, and returns the best match. It handles synonyms, misspellings, abbreviations, and reordering — none of which traditional fuzzy matching catches.

The one-line version

=AI_MATCH(A2:A100, B2:B500)

A2:A100 is the column you want to match. B2:B500 is the lookup range. For each row in A, the formula returns the best match from B. Spills down vertically.

Why embeddings beat traditional fuzzy match

Mismatch type	String fuzzy (Levenshtein)	Embeddings
Typo: "Globex" vs "Globx"	✓ matches	✓ matches
Suffix: "Acme" vs "Acme Inc."	~ partial	✓ matches
Reorder: "Cole, Tim" vs "Tim Cole"	✗ misses	✓ matches
Synonym: "car" vs "automobile"	✗ misses	✓ matches
Abbreviation: "NYC" vs "New York"	✗ misses	✓ matches

Tuning the match

Confidence threshold

=AI_MATCH(A2:A100, B2:B500, 0.8)

Third argument is the minimum cosine similarity (0–1). Below the threshold, the match is treated as "no match" and the cell returns blank. Higher = stricter. 0.8 is a reasonable starting point for company names. Bump to 0.85+ for stricter use cases like dedup.

Return the similarity score

=AI_MATCH(A2:A100, B2:B500, 0.7, TRUE)

Fourth argument = TRUE returns two columns per row: the matched value + the similarity score. Useful for QA — sort by score to find borderline matches that need human review.

Multiple matches per row (top-K)

=AI_MATCH(A2:A100, B2:B500, 0.7, FALSE, 3)

Fifth argument = number of matches to return. Returns comma-separated. Useful when one row legitimately matches multiple lookup entries.

Real-world uses

Dedup a customer list after a merger

Two CRM exports. Same companies, different names ("Acme Inc." vs "Acme Corp."). =AI_MATCH with threshold 0.85 surfaces duplicates that =VLOOKUP would miss.

Map free-text categories to a canonical taxonomy

Survey responses with industries typed in by hand. Map each to your canonical NAICS list. Threshold 0.6 + return-score = TRUE so you can spot-check anything below 0.75.

Match job titles across resumes and JDs

"Senior Software Engineer" matches "Sr. SWE", "Sr. Software Developer", "Software Engineer III". Useful for ATS workflows that fuzzy-rank applicants.

Customer ↔ support-ticket linking

Tickets without customer IDs but with company names in the body. Match against your customer table.

Cost

=AI_MATCH uses OpenAI's text-embedding-3-small. One embedding per cell in both ranges; cosine similarity is computed locally.

Per row embedded: ~$0.000002 (yes, 6 zeros)
1,000 search rows × 1,000 lookup rows = 2,000 embeddings = $0.004
10,000 × 10,000 = $0.04

This is one of the cheapest LLM operations in spreadsheets. Use liberally.

Limits

2,048 row combined cap. Search + lookup combined. For larger jobs, split into chunks or use the Bulk runner with batching.
OpenAI key required. Embeddings on this scale only OpenAI offers cheaply right now. Anthropic / Gemini don't have a direct equivalent at the same price.
Semantic vs literal. "Apple Inc." and "Apple the fruit company" score high together. Always sanity-check with your threshold before trusting bulk output.

Try AI_MATCH on your data

All 14 core formulas + AI_MATCH, from $49 lifetime.

Get GPTSheet — from $49\n