Help

Remove duplicates without having to look at them 1 by 1

Topic Labels: Automations Data Extensions
881 3
cancel
Showing results for 
Search instead for 
Did you mean: 
Samir_Patel
5 - Automation Enthusiast
5 - Automation Enthusiast

I have an airtable base that contains data from a dozen sources and lots of duplicates. The duplicates are easy to define based on phone number. With dedupe currently you have to look at each individual cluster of duplicates and select what you want to keep. That's fine for some scenarios but that should not be the only option. It would take hours for me to look at each cluster.

There should be a easy way to:
1- Select the fields where duplicates exist.
2- Select which record to keep (most completed fields, newest, oldest, etc).
3- Run it and be done.

I really don't want to have to export to excel and re-import because airtable can't handle such a simple function.

3 Replies 3

It's quite easy, and after 1-2 attempts and understanding how links/lookups working, it is can be made faster than writing this comment. Duplicate phone number field and turn it into a linked to a new table. In new table you have a list of phone numbers. add count field and filter >1 to look at duplicates. Then you can add lookups/rollups etc. For example, add rollup of date with MAX(values), then in first table add lookup of it, and add formula that shows if this lookup=date. Thus, all records where date is not maximum for that phone number, are duplicates that can be removed.

Thanks for the reply. You explained it however as someone who already knows how to do it. Could you break the steps down some more or point me to the relevant documentation that addresses this. 🙂

Alexey_Gusev
12 - Earth
12 - Earth

Okay, i'll try
Most are airtable basics so if you don't know basics, you should read them, and if you don't know some harder things, you can find them searching in help system.

I took a part of zipcodes table for example. it has primary_city with many duplicates and unique zip.
I added date field as

 

 

 

DATEADD('2020-01-01',zip,'days')

 

 

 

formula. So it is unique as well.

Alexey_Gusev_0-1705275965271.png

Let's take a 'newest' record for each 'primary_city' 
(NOTE: I missed a part with 'count' field, as it not needed here. you can add it yourself and filter to see a list of which 'primary_city' has duplicates and which is not)

So, duplicating primary city:

Alexey_Gusev_1-1705276112117.png

 

turning the duplicate into linked field

Alexey_Gusev_2-1705276149620.pngAlexey_Gusev_3-1705276169150.pngAlexey_Gusev_4-1705276193507.png

In new table(i called NEWTABLE) creating a rollup field (rollup is 'advanced lookup') of date field

Alexey_Gusev_5-1705276311055.png

now in main table create a lookup of this rollup field

Alexey_Gusev_6-1705276435414.png

i renamed it to maxdate

Alexey_Gusev_7-1705276488812.png

 

 

now create a formula where maxdate=date

Alexey_Gusev_8-1705276542039.png

 

 

Alexey_Gusev_9-1705276583134.png


And then group by this field

Alexey_Gusev_10-1705276633295.png


with collapse all

Alexey_Gusev_12-1705276909695.png

 



Here, the fragment of the table. Records under 1 are to be saved. Record under 0 are duplicates to be removed

Alexey_Gusev_11-1705276850942.png