Skip to main content

I have a table holding some addresses across several fields (street,city, district etc). I would like an easy way to find possible duplicates in this table, given that the addresses entered are never exactly the same. 

So, “123 East Wilton Blvd, Westsea, Long Beach, Calif’ and ‘Beach Towers, 123 East Wilton Boulevard, Long Beach, Calif’ would suggest a match.

I was wondering whether I could use an AI function to do this? Ideally when a new record is entered, the user would be prompted with the question - ‘This looks similar’

Anyone done anything like this? I don’t have the luxury of a postcode lookup as the addresses are worldwide.

Hey ​@bitstreams_red,

I’m afraid you would not be able to achieve that natively with AI.
Assuming you are working with a form, AI will not fetch other records to compare their values -specially not before the form submission.

However, if addresses are public data then you could store addresses in a linked record field to a different table, and have those shown to everybody on a dropdown when inputting address. 
Important: Airtable native forms will not allow for the creation of new linked fields, only selecting from existing ones. However, you can easily solve that using Fillout forms (free plan, direct integration with Airtable). It will allow you to select OR create new addresses.

For last (and probably far from ideal), do remember that you can use Airtable’s Dedupe extension to dedupe records with exact, similar, or fuzzy values. Of course this would be manual and ex-post.

Also, if you split up your address into: Street Address Line 1, City, County, State, Zip Code, you will have better chances of identifying duplicates.

For last, robust AI automations could be build -but I’m not sure it is really worth it/needed.

Hope this helps! Feel free to grab a slot if you need any assistance setting this up :D

Mike, Consultant @ Automatic Nation


@bitstreams_red 

As ​@Mike_AutomaticN mentioned above, your best bet for this would be to standardize the addresses as they’re being entered into your system, so you can catch the duplicates on their way into Airtable.

To do this, you would need to use Fillout’s advanced forms for Airtable, which offers hundreds of features that Airtable’s forms don’t offer.

In your case, you would need to enable these 3 Fillout features:

Hope this helps!

If you have a budget and you’d like to hire the best Airtable consultant to help you with this or anything else that is Airtable-related, please feel free to contact me through my website: Airtable consultant — ScottWorld


I can share a script which might help you, but I think Dedupe Extension with Fuzzy match is better for your case. In your example, it counts ‘Blvd’ and ‘Boulevard’ as different words
Script ignores case, order of words and small words. Require text field to mark duplicates

 

const config=input.config({ title:'Advanced Deduper',items:einput.config.table('table',{label:'Select table'}),
input.config.view('myView',{label:'Select view', parentTable:'table'}),
input.config.field('CHECK',{label:'field to CHECK', parentTable:'table'}),
input.config.field('MARK',{label:'field to MARK', parentTable:'table'})]})
const {table,myView,CHECK,MARK}=config
const query=await myView.selectRecordsAsync({fields:cCHECK]})
const norm=r=>r.getCellValueAsString(CHECK).toLowerCase()
.split(' ').sort().filter(n=>n.length>2).map(m=>m.replace(/W/,'')).join('');
//remove line above for exact match. otherwise it ignores order, punctuation and 1-2 letter words
const valueMap=new Map(query.records.map(rec=>pnorm(rec),rec.id]))
const valueSet= ...valueMap.values()]
const others=query.recordIds.filter(id=>(!valueSet.includes(id)))
const othervals=new Set(others.map(id=>norm(query.getRecord(id))))
const dupes=n...valueMap.keys()].filter(val=>othervals.has(val))
const upd=query.records.filter(r=>dupes.includes(norm(r))).map(u=>({'id':u.id,'fields':{'MARK.name]:dupes.indexOf(norm(u)).toString()}}))
output.inspect(upd.map(u=>query.getRecord(u.id).getCellValue(CHECK))); output.text(`Found: ${upd.length} records`)
while (upd.length) await table.updateRecordsAsync(upd.splice(0,50))

 


Hi ​@bitstreams_red, I think you are potentially on the right track to think about using AI for this use case. With a bit more context into how your system works I can give you a more specific recommendation.

I’m going to assume that this is an internal tool and users are entering these addresses in the base layer or an interface (not thru a form). In this scenario you can create an automation where every time a new address is entered in the table (we’ll have to think about a good trigger here depending on your workflow),  you would find all addresses from the table. Then you would use an Airtable AI block to feed that list of addresses and their recordIDs and prompt the AI to check if any of that list of addresses match the address from the new record. (You can tweak your prompt here and give it examples like the one you shared). In this prompt, you should also specify the AI to return only a list of recordIDs of any of the matches (You can also have it return a JSON for more reliability; you’ll need a script for that). Then you can use an update block to paste that list of recordIDs in a linked record field linking itself(the same address table), which will give you the record of any matches. From here you can use lookups to see the address of the matches and make a decision on what to do next. 

 

A few points to consider:

  1. Scalability: If you have 10s of thousands of addresses, then this might not work as well. The find function in Airtable is limited in how many records it can find. Also the amount of data you will feed the AI prompt might be too much if you share 10s of thousands of records. You can fix this by filtering your list of all addresses to something that matches only the same city. 
  2. AI token usage: Depending on how many new addresses you will be entering each month, you could blow through all your AI credits. Just something to consider. You can always use a fuzzy dedupe script but since you are dealing with global addresses it will be tricky to get it to work as you expect. 
  3. Automation Trigger: Just be careful what you choose for your trigger here, make sure it only triggers the all the address fields have been fully entered. 

Let me know if this recommendation sounds like something you want to test out and I can give you more actionable details or even give you a sample base with the automation ready.


Reply