Fuzzy match two lists in Google Sheets with embeddings (AI_MATCH)
Two lists. One has "Acme Inc.", "Globex Corporation", "Soylent Corp". The other has "acme", "Globex Co.", "Soylent Industries". You need to match them up. =VLOOKUP can't — it does exact strings only. =FUZZY.MATCH doesn't exist in Sheets. So you'd reach for Python + rapidfuzz, or pay for a Clearbit-style enrichment service.
Or you use embeddings, which beat both. =AI_MATCH turns each cell into a high-dimensional vector, computes cosine similarity, and returns the best match. It handles synonyms, misspellings, abbreviations, and reordering — none of which traditional fuzzy matching catches.
The one-line version
=AI_MATCH(A2:A100, B2:B500)
A2:A100 is the column you want to match. B2:B500 is the lookup range. For each row in A, the formula returns the best match from B. Spills down vertically.
Why embeddings beat traditional fuzzy match
| Mismatch type | String fuzzy (Levenshtein) | Embeddings |
|---|---|---|
| Typo: "Globex" vs "Globx" | ✓ matches | ✓ matches |
| Suffix: "Acme" vs "Acme Inc." | ~ partial | ✓ matches |
| Reorder: "Cole, Tim" vs "Tim Cole" | ✗ misses | ✓ matches |
| Synonym: "car" vs "automobile" | ✗ misses | ✓ matches |
| Abbreviation: "NYC" vs "New York" | ✗ misses | ✓ matches |
Tuning the match
Confidence threshold
=AI_MATCH(A2:A100, B2:B500, 0.8)
Third argument is the minimum cosine similarity (0–1). Below the threshold, the match is treated as "no match" and the cell returns blank. Higher = stricter. 0.8 is a reasonable starting point for company names. Bump to 0.85+ for stricter use cases like dedup.
Return the similarity score
=AI_MATCH(A2:A100, B2:B500, 0.7, TRUE)
Fourth argument = TRUE returns two columns per row: the matched value + the similarity score. Useful for QA — sort by score to find borderline matches that need human review.
Multiple matches per row (top-K)
=AI_MATCH(A2:A100, B2:B500, 0.7, FALSE, 3)
Fifth argument = number of matches to return. Returns comma-separated. Useful when one row legitimately matches multiple lookup entries.
Real-world uses
Dedup a customer list after a merger
Two CRM exports. Same companies, different names ("Acme Inc." vs "Acme Corp."). =AI_MATCH with threshold 0.85 surfaces duplicates that =VLOOKUP would miss.
Map free-text categories to a canonical taxonomy
Survey responses with industries typed in by hand. Map each to your canonical NAICS list. Threshold 0.6 + return-score = TRUE so you can spot-check anything below 0.75.
Match job titles across resumes and JDs
"Senior Software Engineer" matches "Sr. SWE", "Sr. Software Developer", "Software Engineer III". Useful for ATS workflows that fuzzy-rank applicants.
Customer ↔ support-ticket linking
Tickets without customer IDs but with company names in the body. Match against your customer table.
Cost
=AI_MATCH uses OpenAI's text-embedding-3-small. One embedding per cell in both ranges; cosine similarity is computed locally.
- Per row embedded: ~$0.000002 (yes, 6 zeros)
- 1,000 search rows × 1,000 lookup rows = 2,000 embeddings = $0.004
- 10,000 × 10,000 = $0.04
This is one of the cheapest LLM operations in spreadsheets. Use liberally.
Limits
- 2,048 row combined cap. Search + lookup combined. For larger jobs, split into chunks or use the Bulk runner with batching.
- OpenAI key required. Embeddings on this scale only OpenAI offers cheaply right now. Anthropic / Gemini don't have a direct equivalent at the same price.
- Semantic vs literal. "Apple Inc." and "Apple the fruit company" score high together. Always sanity-check with your threshold before trusting bulk output.
Try AI_MATCH on your data
All 14 core formulas + AI_MATCH, from $49 lifetime.
Get gptsheet — from $49\n