This is great! A problem that is such a great fit for AI.
What do you think about CRM cleanup? Could you do that without blowing up API calls?
Thanks, Eric, and great question!
I definitely think an approach like this has application for a broader mass cleanup. But you’re right — it wouldn’t be scalable to query one record at a time.
Ideas I’ve been playing with:
- extract the full DB to a staging table with just the relevant metadata, have the LLM iterate over it to build a list of approved matches, then run an upsert or merge batch depending on the operation (see the sketch after this list)
- use a tool like RingLead with broad match parameters to create the match sets as a first cut, export them, and then have the LLM evaluate and approve each one
- same idea as above but with native Salesforce duplicate record sets, which have some fuzzy matching filters. Salesforce seems to do a decent job of surfacing duplicates / matches in the UI, the duplicate set records are exportable, and the LLM can review and approve them
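Here's a rough sketch of that first idea, assuming the candidate pairs have already been exported to a CSV. The file name, column layout, prompt, and the call_llm() helper are all placeholders for whatever model and export you end up wiring in, not a specific vendor API:

```python
import csv
import json

def call_llm(prompt: str) -> str:
    """Placeholder for whatever model endpoint you end up using (Vertex AI, etc.)."""
    raise NotImplementedError("wire this up to your LLM of choice")

def review_pair(record_a: dict, record_b: dict) -> bool:
    """Ask the model whether two CRM records refer to the same entity."""
    prompt = (
        "Do these two CRM records describe the same company or person? "
        'Answer with JSON: {"same": true|false, "reason": "..."}\n'
        f"Record A: {json.dumps(record_a)}\n"
        f"Record B: {json.dumps(record_b)}"
    )
    answer = json.loads(call_llm(prompt))
    return bool(answer.get("same"))

# candidate_pairs.csv is assumed to have columns like a_Id, a_Name, ..., b_Id, b_Name, ...
approved_merges = []
with open("candidate_pairs.csv", newline="") as f:
    for row in csv.DictReader(f):
        record_a = {k[2:]: v for k, v in row.items() if k.startswith("a_")}
        record_b = {k[2:]: v for k, v in row.items() if k.startswith("b_")}
        if review_pair(record_a, record_b):
            approved_merges.append((row["a_Id"], row["b_Id"]))

# approved_merges then feeds the upsert / merge batch back into the CRM
```

The same review loop works whether the candidate pairs come from the staging table, RingLead, or exported duplicate record sets.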
This - "extract the full DB to a staging table with just the relevant metadata, have the LLM iterate over it to build a list of approved matches, then run an upsert or merge batch depending on the operation" - is exactly what I'm thinking too, but I'm still thinking through scope. Like, how much data do I actually need from Salesforce?
The good news is storage is much cheaper in Google Cloud than in Salesforce.
The other thing I'm thinking about is scale and cost. Your Zapier flow looks task-heavy; do you worry about burning through tasks?
Love it. Let’s think this through.
For a very well-defined batch operation like this I wouldn't use Zapier. You're right, it's too task-heavy and would be expensive. Assuming one is not a developer, what about using Cursor or Claude Code to build your own control flow that calls the LLM and orchestrates that work? If anything, the Zap could just do the extract to cheap cloud storage, invoke the custom data hygiene agent, then batch load the output back to the CRM. What do you think?
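To make that concrete, here's the shape of the control flow I mean, sketched in Python. The bucket name, file paths, and the run_hygiene_agent() helper are hypothetical; only the Cloud Storage calls are the real client library:

```python
# Sketch of the split: the Zap (or a transfer job) only lands the extract in
# Cloud Storage; this script does the heavy lifting and publishes the results
# for a separate batch load back into the CRM.
from google.cloud import storage  # pip install google-cloud-storage

BUCKET = "my-crm-staging"                    # assumed bucket name
EXTRACT_BLOB = "extracts/contacts.csv"       # assumed extract path
RESULTS_BLOB = "results/approved_merges.csv" # assumed results path

def download_extract(local_path: str) -> None:
    """Pull the CRM extract that was dropped in Cloud Storage."""
    storage.Client().bucket(BUCKET).blob(EXTRACT_BLOB).download_to_filename(local_path)

def upload_results(local_path: str) -> None:
    """Publish the approved-merge list for the batch load step."""
    storage.Client().bucket(BUCKET).blob(RESULTS_BLOB).upload_from_filename(local_path)

def run_hygiene_agent(extract_path: str, output_path: str) -> None:
    """Placeholder: iterate over the extract, call the LLM per candidate match,
    and write the approved merges to output_path (see the review loop above)."""
    raise NotImplementedError

if __name__ == "__main__":
    download_extract("contacts.csv")
    run_hygiene_agent("contacts.csv", "approved_merges.csv")
    upload_results("approved_merges.csv")
    # a separate batch job (Bulk API, Data Loader, etc.) reads the results file
    # and applies the merges back in the CRM
```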
The flow in my post is definitely more task-heavy - I could probably consolidate some code steps. But it's a lower-volume signal, and to some extent we're paying for fast time to launch and the ability to prove value. I'd definitely want to find a cheaper way to build that pipeline at a bigger scale.
Here's where my head is at. I want to do this all in the same ecosystem to control for security, especially with PII, so I've given myself the constraint of only working with Google Cloud. It's not perfect, but I really don't want to have to think about data security.
I think this concept can still work in isolation. I'm still learning Vertex AI, but it's promising, and BigQuery has built-in data transfers from Salesforce.
For other data sources like HubSpot this could work too, but I'm not sure about the data transfer options.
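Here's roughly what I'm picturing once the Salesforce transfer has landed the data in a BigQuery dataset, with a Vertex AI model doing the review step. The project, dataset/table names, the match_key column, and the model ID are all assumptions on my part; the pattern is the point:

```python
# Keep everything inside Google Cloud: read candidate pairs from BigQuery,
# score each pair with a Vertex AI model, collect the approved merges.
from google.cloud import bigquery                        # pip install google-cloud-bigquery
import vertexai
from vertexai.generative_models import GenerativeModel  # pip install google-cloud-aiplatform

PROJECT = "my-gcp-project"                   # assumed project id
vertexai.init(project=PROJECT, location="us-central1")
model = GenerativeModel("gemini-1.5-pro")    # any available Gemini model would do

bq = bigquery.Client(project=PROJECT)
rows = bq.query("""
    SELECT a.Id AS a_id, a.Name AS a_name, a.Website AS a_website,
           b.Id AS b_id, b.Name AS b_name, b.Website AS b_website
    FROM `my-gcp-project.crm_staging.account_candidates` AS a
    JOIN `my-gcp-project.crm_staging.account_candidates` AS b
      ON a.match_key = b.match_key AND a.Id < b.Id   -- assumed pre-computed blocking key
""").result()

approved = []
for r in rows:
    prompt = (
        "Are these the same company? Reply YES or NO.\n"
        f"A: {r.a_name} ({r.a_website})\n"
        f"B: {r.b_name} ({r.b_website})"
    )
    if model.generate_content(prompt).text.strip().upper().startswith("YES"):
        approved.append((r.a_id, r.b_id))

print(f"{len(approved)} merges approved for the batch load step")
```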
Sounds like a totally viable solution. I'd love to know how it goes, and if you're up for it, would be a great guest post.