Re: Remove duplicates without having to look at them 1 by 1

Samir_Patel · ‎Jan 13, 2024

I have an airtable base that contains data from a dozen sources and lots of duplicates. The duplicates are easy to define based on phone number. With dedupe currently you have to look at each individual cluster of duplicates and select what you want to keep. That's fine for some scenarios but that should not be the only option. It would take hours for me to look at each cluster.

There should be a easy way to:
1- Select the fields where duplicates exist.
2- Select which record to keep (most completed fields, newest, oldest, etc).
3- Run it and be done.

I really don't want to have to export to excel and re-import because airtable can't handle such a simple function.

Alexey_Gusev · ‎Jan 13, 2024

It's quite easy, and after 1-2 attempts and understanding how links/lookups working, it is can be made faster than writing this comment. Duplicate phone number field and turn it into a linked to a new table. In new table you have a list of phone numbers. add count field and filter >1 to look at duplicates. Then you can add lookups/rollups etc. For example, add rollup of date with MAX(values), then in first table add lookup of it, and add formula that shows if this lookup=date. Thus, all records where date is not maximum for that phone number, are duplicates that can be removed.

Samir_Patel · ‎Jan 13, 2024

Thanks for the reply. You explained it however as someone who already knows how to do it. Could you break the steps down some more or point me to the relevant documentation that addresses this. 🙂

Alexey_Gusev · ‎Jan 14, 2024

Okay, i'll try
Most are airtable basics so if you don't know basics, you should read them, and if you don't know some harder things, you can find them searching in help system.

I took a part of zipcodes table for example. it has primary_city with many duplicates and unique zip.
I added date field as

DATEADD('2020-01-01',zip,'days')

formula. So it is unique as well.

Let's take a 'newest' record for each 'primary_city'
(NOTE: I missed a part with 'count' field, as it not needed here. you can add it yourself and filter to see a list of which 'primary_city' has duplicates and which is not)

So, duplicating primary city:

turning the duplicate into linked field

In new table(i called NEWTABLE) creating a rollup field (rollup is 'advanced lookup') of date field

now in main table create a lookup of this rollup field

i renamed it to maxdate

now create a formula where maxdate=date

And then group by this field

with collapse all

Here, the fragment of the table. Records under 1 are to be saved. Record under 0 are duplicates to be removed