Real-world agents: person-to-account matching
Can AI agents solve one of the messiest CRM problems?
Here’s a fun pop quiz—see if you can spot the likely problem area in this outbound workflow I’ve been building.
Using the Cognism API, search at scheduled intervals for contacts matching a particular segment definition.
Check if the contact and account already exist in Salesforce, and if not, create them.
Add the contact to a Salesforce campaign for further follow-up.
Keen observers will have homed in on the account check as the stickiest part of this workflow.
How do we check whether the account already exists? Based on the name? The website?
Keep in mind that even if your CRM data is perfect (😂), the real-world topography is extremely complicated. Companies often have:
multiple office locations
multiple valid website domains
variations in their names
complex corporate relationships and hierarchies
related entities in different countries
etc.
Seemingly simple questions—like, “does this account exist in our CRM?” or “which of these possible account matches is the best match for a given contact?”—are surprisingly hard for rules-based systems to answer well.
However, most human workers could answer these questions easily with a bit of training and common sense.
This makes it a perfect agentic use case.
Why an agent makes sense here
You could simply search by domain to find an account, but that risks false negatives if your search parameters are too narrow, and it risks picking the wrong account when several potential matches come back.
An agent, on the other hand, offers several benefits.
Intelligent queries
An agent can intelligently decompose the company name and website into components and search for possible variants.
For example, let’s say Cognism provides the following data about a company:
{
"companyName" : "Acme Insurance",
"website" : "acmeinsurance.com"
}
Now, accounts in Salesforce CRM might use any of the following variants:
Company names:
Acme Insurance Co.
The Acme Insurance Co.
Acme Mutual
The Acme Insurance Company
etc.
Domains:
acmeinsurance.ca
acmeinsurance.co.uk
acme.com/insurance
acme.fr
etc.
An exact match search would miss all of them, and even legacy data hygiene tools with “fuzzy matching” capabilities are quite limited in what they can do.
But an LLM has the semantic intelligence to separate the brand keyword from the generic category term—as a human would—and construct a better SOQL query.
(Name LIKE '%Acme Insurance%' OR Name LIKE '%Acme%') OR (Website LIKE '%acmeinsurance.com%' OR Website LIKE '%acme.com%' OR Website LIKE '%acme.%')
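Once the search terms are chosen, assembling the clause itself is mechanical. Here is a minimal Python sketch of that assembly, assuming the terms have already been picked (by the agent or otherwise). The helper names `escape_like` and `build_where_clause` are hypothetical and not part of any Salesforce or Zapier library.

```python
def escape_like(term: str) -> str:
    """Escape characters that are special in a SOQL string literal or LIKE pattern."""
    out = term.replace("\\", "\\\\").replace("'", "\\'")
    return out.replace("%", "\\%").replace("_", "\\_")


def build_where_clause(name_terms: list[str], website_terms: list[str]) -> str:
    """Combine Name and Website LIKE conditions with OR, mirroring the query above."""

    def group(field: str, terms: list[str]) -> str:
        parts = [f"{field} LIKE '%{escape_like(t)}%'" for t in terms if t.strip()]
        return "(" + " OR ".join(parts) + ")" if parts else ""

    groups = [g for g in (group("Name", name_terms), group("Website", website_terms)) if g]
    if not groups:
        raise ValueError("at least one search term is required")
    return " OR ".join(groups)


clause = build_where_clause(
    ["Acme Insurance", "Acme"],
    ["acmeinsurance.com", "acme.com", "acme."],
)
print(clause)
```

Escaping matters because company names routinely contain apostrophes, and an unescaped `_` or `%` inside a term would act as a wildcard.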
Evaluating matches
The query above casts a wider net, which is good, but it’s also likely to catch false positives and risks matching a contact to the wrong account.
Unlike rigid rules-based systems, the agent can evaluate each option set holistically, ruling out false positives and—in the case of multiple viable options—determining which match is best.
The agent considers things like:
Similarity of company name and website to the Cognism-provided data
Proximity of the company billing address to the contact’s location
Account hierarchy relationships
etc.
Supplementary web research
CRM data can be ambiguous, but often a quick visit to the company website is enough to offer clarity. For example, it can quickly reveal when the legal corporate entity has a different name from the brand.
An agent can use a web browse tool and gain this additional context, just like a human would.
AI thrives on “adaptive” work
This account matching problem is a perfect example of what I call “adaptive work”.
These are tasks that require just enough discretion and judgement that you can’t fully automate them with rules-based workflows.
At the same time, it’s not work that requires much creativity, original thinking, or unique insight.
It’s just the unglamorous hygiene work that keeps things moving.
This is a sweet spot for agentic AI.
Revenue operations (my day-to-day field) is flush with examples of adaptive work, and CRM account hygiene has long been the bane of my existence.
So I was excited to take a crack at solving this with an agent.
Workflow structure
For context, here’s the layout of the workflow in Zapier, with some intermediate data prep steps cropped out.
This is a “worker” Zap that fetches a search result record from a queue (Zapier table), redeems the full data from Cognism, then performs the Salesforce check/create steps.
Step 15 is where we call the account-matching agent.
You’ll notice that this is not a full “agentic” solution but an agent step scoped to a very specific task (find the right account) inside a mostly deterministic workflow.
This maximizes predictability and reliability by using rules-based automation where we can and invoking the creativity / flexibility of an LLM only where we need it.
Agent setup
I’m using a native Zapier agent here. The Agents module of the Zapier platform is relatively new, but I chose it for this use case because:
It’s tightly integrated with Zapier and handles synchronous agent processing within a single Zap (rather than invoking a third-party agent and needing a separate Zap to receive the results)
It already has access to our CRM via the Zapier Salesforce connection
It supports structured outputs that can be easily re-used later in the Zap
I wanted to try it out!
So despite some limitations, it makes sense for this use case.
It’s also quite easy to set up and offers a handy inline pill UX for specifying tool use.
Here’s a visual of the configuration:
And here’s the full prompt:
You are an expert CRM operations professional who excels at analyzing CRM data and assessing questions of data hygiene.
You will be provided with a JSON file containing details about a person and the company they belong to. Company details are in the "account" section.
Given this input, follow the process below.
## STEP 1: PREPARE SEARCH TERMS
Prepare to search for the account by generating a precise set of search terms based on the input company's name and domain.
Follow this exact process:
1. Identify company name keywords: For example, from the company name (e.g., "Acme Insurance"), identify the primary brand name ("Acme") and the full name ("Acme Insurance").
2. Identify domain strings: From the domain (e.g., acmeinsurance.com), identify the full domain and the "root brand keyword plus dot" variant. The root brand keyword is the primary brand name. For example, from "acme.fr", the root keyword is "acme" and the variant is "acme.".
3. Build the Query: Construct a Salesforce SOQL WHERE clause using ONLY the terms generated above. Combine name and website searches with OR.
* Good Example: For "Acme Insurance" with domain acmeinsurance.com, the correct WHERE clause would be
(Name LIKE '%Acme Insurance%' OR Name LIKE '%Acme%') OR (Website LIKE '%acmeinsurance.com%' OR Website LIKE '%acme.%')
* Do not invent acronyms or other strings.
* Do not search for single, generic words (like '%insurance%'), as this is too broad.
## STEP 2: EXECUTE SALESFORCE SEARCHES
Search Salesforce accounts using the SOQL where clause constructed in the past step along with the tool [Salesforce: Find Records by Query] to find potential matches.
## STEP 3: ANALYZE MATCHES
Analyze all potential account matches found in Salesforce, considering factors such as billing address, company name variations, parent/child account relationships, and other relevant data.
## STEP 4: FORMULATE A HYPOTHESIS
Formulate a hypothesis regarding the likelihood of a match or the ambiguity of the data.
If there are multiple potential matches, consider factors such as alignment between the contact's location and the account billing address, or other similar details.
Don't apply rigid rules, but use human-like intuition and common sense to determine which account they should be matched with.
If the evidence provides high confidence (95%+) for a MATCH or NO_MATCH, proceed directly to ## STEP 7: FINAL OUTPUT.
If the evidence is ambiguous or insufficient, your goal is to gather more data. Proceed to Step 5.
## STEP 5: SEARCH FOR ADDITIONAL DATA (IF NEEDED)
If the data is ambiguous or insufficient to make a high-confidence decision (at least 95% confidence for a match), perform additional searches to gain more clarity.
- If there are multiple matches that seem equally valid: check in Salesforce which accounts have parent/child relationships in order to determine the most likely match.
- If there are no matches at all: visit the company's website using [Visit website] to gather additional information. Focus on the text in the 'About Us', 'Contact', or 'Legal' sections of the website. Look for phrases like 'is a subsidiary of', 'part of the... family of companies', or email or copyright notices that mention a different corporate name.
## STEP 6: RE-EVALUATE
After gathering new data, return to Step 4 to re-evaluate your hypothesis with the new evidence.
## STEP 7: FINAL OUTPUT
Your single and only final output for this entire task are the fields below. Do not write any other text, narration, or explanation. You will conclude all work by providing the field output and nothing else.
### REQUIRED FIELDS
- decision: MATCH | NO_MATCH | NEEDS_REVIEW
- match_account_id: string | null
- confidence_score: number
- reasoning: The best match is [company name] because [a concise justification for your decision].
### EXAMPLE: SUCCESSFUL OUTPUT
- decision: MATCH
- match_account_id: 00123000012A3bYAAM
- confidence_score: 99
- reasoning: The best match is Acme Insurance because the account's website field contains a domain that matches the email address of the contact. The contact is based in the same state as the account's billing address.
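Downstream steps in the Zap should not have to trust these fields blindly. The sketch below shows one way to validate them before acting, assuming the output arrives as plain "- key: value" text as in the example (with structured outputs the fields may arrive already separated). `MatchResult` and `parse_agent_output` are hypothetical names, and the rule of routing anything inconsistent to NEEDS_REVIEW is my assumption, reusing the 95% bar from the prompt.

```python
from dataclasses import dataclass
from typing import Optional

VALID_DECISIONS = {"MATCH", "NO_MATCH", "NEEDS_REVIEW"}


@dataclass
class MatchResult:
    decision: str
    match_account_id: Optional[str]
    confidence_score: float
    reasoning: str


def parse_agent_output(text: str) -> MatchResult:
    """Parse '- key: value' lines and downgrade inconsistent results to NEEDS_REVIEW."""
    fields = {}
    for line in text.splitlines():
        line = line.strip().lstrip("-").strip()
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()

    decision = fields.get("decision", "").upper()
    account_id = fields.get("match_account_id") or None
    if account_id is not None and account_id.lower() == "null":
        account_id = None
    try:
        confidence = float(fields.get("confidence_score", ""))
    except ValueError:
        confidence = 0.0  # a missing or non-numeric score is treated as no confidence

    # Assumed guard: malformed, contradictory, or sub-95 results go to a human.
    consistent = (
        decision in VALID_DECISIONS
        and (decision != "MATCH" or account_id is not None)
        and (decision == "NEEDS_REVIEW" or confidence >= 95)
    )
    if not consistent:
        decision = "NEEDS_REVIEW"
    return MatchResult(decision, account_id, confidence, fields.get("reasoning", ""))
```

A MATCH with no account ID, or any decision below the confidence bar, then lands in a review queue instead of silently creating or linking records.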
Results
The output overall has been strong so far, which is very encouraging.
All matches have been justifiable
It has prioritized multiple potential matches in a satisfying way
I’m not aware of any false negatives (where it failed to find a suitable match that a human would have found with reasonable effort)
Here are some notable examples.
Strong prioritization ✅
In this test search for Fidelity International, the agent identified a broad group of matches based on the brand name, eliminated improbable ones, and chose the best of the valid options based on the contact’s proximity to the account’s billing address.
You can see here a nice thing about Zapier’s agent interface: the ability to chat with the agent responsible for a specific run to ask follow-up questions and get it to explain its thinking or prior steps. I’ve found this sort of dialogue critical to improving agent reliability.
Using the web to resolve ambiguity ✅
In this example, the agent failed to find any matches in our CRM for “PetSafe Brands” and used web search to find an alternate corporate name, “Radio Systems Corporation,” which DID match a CRM account.
My heart sank when I first saw this result, as it seemed like a case of hallucination.
But sure enough, when I visited the website, I saw “Radio Systems Corporation” in the footer.
The agent nailed it.
Issues and Potential Improvements
I want to be sure I also highlight issues and potential improvements when discussing things I’ve built.
If I didn’t, it would perpetuate the AI hype mirage and could be discouraging for anyone else who tries a similar approach and runs into problems.
Technical unreliability
Issue:
About 10-20% of the time, the agent simply got “stuck” during its SOQL searches and timed out. I believe this is an underlying issue in the Zapier control flow, not an intrinsic limit of agents in general.
Unfortunately, in these cases the logs do not preserve the original query, making it difficult to diagnose what happened.
When I asked the agent directly, it referred only to vague “technical errors with Salesforce” and could not provide any further insight.
My suspicion is that the SOQL query was too broad.
Possible solution:
We could decompose the task even further:
An AI step prepares the SOQL query based on the company name and domain
A standard Zap action performs the SOQL query
The results are passed in to the agent along with the initial context.
The agent evaluates and does additional SOQL queries only if needed
This would avoid the black-box timeout error (at least any SOQL errors would be transparent in the Zap step) and create more control.
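The decomposition above can be sketched in Python as a plain function that receives each step as a callable. This is an illustration of the control flow only, not Zapier code: `match_account` and its three parameters are hypothetical stand-ins for the AI query-prep step, the standard SOQL action, and the agent evaluation.

```python
from typing import Callable


def match_account(
    company: dict,
    prepare_query: Callable[[dict], str],          # AI step: returns a SOQL WHERE clause
    run_soql: Callable[[str], list],               # standard action: runs the query
    evaluate: Callable[[dict, list], dict],        # agent: picks the best candidate
) -> dict:
    where_clause = prepare_query(company)
    try:
        candidates = run_soql(where_clause)
    except Exception as exc:
        # The failing query is kept next to the error so it can be diagnosed later.
        return {"decision": "NEEDS_REVIEW", "query": where_clause, "error": str(exc)}
    result = evaluate(company, candidates)
    return {**result, "query": where_clause}


# Usage with stub steps; a real run would plug in the actual integrations.
outcome = match_account(
    {"companyName": "Acme Insurance", "website": "acmeinsurance.com"},
    prepare_query=lambda c: "Name LIKE '%Acme%'",
    run_soql=lambda q: [{"Id": "001-example", "Name": "Acme Insurance Co."}],
    evaluate=lambda c, rows: {"decision": "MATCH", "match_account_id": rows[0]["Id"]},
)
print(outcome)
```

Because the query is produced before the search runs and travels with the result, a timeout no longer erases the one piece of evidence needed to debug it.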
We could also add a retry mechanism that scans the queue for records stuck in “processing” and either retries the account match or tries a simpler matching mechanism based on domain only.
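A rough sketch of that retry scan, assuming each queue row carries a status, a last-updated timestamp, and an attempt counter. The field names, the 30-minute threshold, and the two-attempt limit are all hypothetical choices for illustration, not values from the actual Zapier table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class QueueRecord:
    record_id: str
    status: str            # e.g. "queued", "processing", "done"
    updated_at: datetime   # time of the last status change
    attempts: int = 0      # agent attempts made so far


def plan_retries(records, now, stuck_after=timedelta(minutes=30), max_attempts=2):
    """Return (record_id, action) pairs for records stuck in 'processing'."""
    plan = []
    for rec in records:
        if rec.status != "processing" or now - rec.updated_at < stuck_after:
            continue  # finished, not started, or still plausibly running
        # Retry the agent a limited number of times, then fall back to domain-only matching.
        action = "retry_agent" if rec.attempts < max_attempts else "domain_only_match"
        plan.append((rec.record_id, action))
    return plan


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
queue = [
    QueueRecord("a", "processing", now - timedelta(hours=1)),
    QueueRecord("b", "processing", now - timedelta(minutes=5)),
    QueueRecord("c", "processing", now - timedelta(hours=2), attempts=2),
    QueueRecord("d", "done", now - timedelta(hours=3)),
]
print(plan_retries(queue, now))
```

Capping the attempts keeps a record that always times out from being retried forever.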
Hallucinated rationale
Issue:
Although the agent has so far selected the most reasonable account, it’s shown a tendency to stretch the truth to make matches seem stronger than they are.
In one example, it stated that “the billing address is in the same country as the contact’s location” when this was incorrect.
When I questioned the agent about the discrepancy, it acknowledged the error and seemed to suggest it had been influenced by the example in the prompt.
Possible solution:
I’m not sure how to fully mitigate this one, but strengthening the guardrails in the prompt, and stressing that the example is merely illustrative rather than a template to follow, could help.
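Another option, which I have not tested in this workflow, is to compute the checkable facts deterministically and store them next to the agent's reasoning so that a false claim stands out. A minimal sketch, assuming the contact record has `country` and `state` keys (hypothetical names) and the account uses the standard Salesforce `BillingCountry` and `BillingState` fields:

```python
def location_facts(contact: dict, account: dict) -> dict:
    """Compare contact and account locations without involving the LLM."""

    def same(contact_key: str, account_key: str) -> bool:
        a = (contact.get(contact_key) or "").strip().casefold()
        b = (account.get(account_key) or "").strip().casefold()
        return bool(a) and a == b  # a missing value never counts as a match

    return {
        "same_country": same("country", "BillingCountry"),
        "same_state": same("state", "BillingState"),
    }


facts = location_facts(
    {"country": "Canada", "state": "ON"},
    {"BillingCountry": "United States", "BillingState": "NY"},
)
print(facts)
```

This only does exact string comparison, so "USA" and "United States" would count as different. It serves as a flag for review rather than a verdict.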
Future Improvements
I think this shows great promise, and I’d love to keep developing the concept into a full-fledged account hygiene solution that supports de-duplication, parent-child account matches, M&A / bankruptcy hygiene, and more.
This could be run on-demand or via scheduled batch.
One feature I tried but removed for simplicity in the V1 is ZoomInfo tool integration, which would provide the agent with a structured source of truth for corporate hierarchies, M&A events, and more.