Remove duplicates without having to look at them 1 by 1

1 year ago
January 13, 2024
3 replies
517 views

+4

Samir_Patel
Participating Frequently
6 replies

I have an airtable base that contains data from a dozen sources and lots of duplicates. The duplicates are easy to define based on phone number. With dedupe currently you have to look at each individual cluster of duplicates and select what you want to keep. That's fine for some scenarios but that should not be the only option. It would take hours for me to look at each cluster.

There should be a easy way to:
1- Select the fields where duplicates exist.
2- Select which record to keep (most completed fields, newest, oldest, etc).
3- Run it and be done.

I really don't want to have to export to excel and re-import because airtable can't handle such a simple function.

+23

Alexey_Gusev
Brainy
1152 replies
1 year ago
January 13, 2024

It's quite easy, and after 1-2 attempts and understanding how links/lookups working, it is can be made faster than writing this comment. Duplicate phone number field and turn it into a linked to a new table. In new table you have a list of phone numbers. add count field and filter >1 to look at duplicates. Then you can add lookups/rollups etc. For example, add rollup of date with MAX(values), then in first table add lookup of it, and add formula that shows if this lookup=date. Thus, all records where date is not maximum for that phone number, are duplicates that can be removed.

A

Z

S

+4

Samir_Patel
Author
Participating Frequently
6 replies
1 year ago
January 13, 2024

Alexey_Gusev wrote:

It's quite easy, and after 1-2 attempts and understanding how links/lookups working, it is can be made faster than writing this comment. Duplicate phone number field and turn it into a linked to a new table. In new table you have a list of phone numbers. add count field and filter >1 to look at duplicates. Then you can add lookups/rollups etc. For example, add rollup of date with MAX(values), then in first table add lookup of it, and add formula that shows if this lookup=date. Thus, all records where date is not maximum for that phone number, are duplicates that can be removed.

Thanks for the reply. You explained it however as someone who already knows how to do it. Could you break the steps down some more or point me to the relevant documentation that addresses this. 🙂

M

+23

Alexey_Gusev
Brainy
1152 replies
1 year ago
January 15, 2024

Okay, i'll try
Most are airtable basics so if you don't know basics, you should read them, and if you don't know some harder things, you can find them searching in help system.

I took a part of zipcodes table for example. it has primary_city with many duplicates and unique zip.
I added date field as

DATEADD('2020-01-01',zip,'days')

formula. So it is unique as well.

Let's take a 'newest' record for each 'primary_city'
(NOTE: I missed a part with 'count' field, as it not needed here. you can add it yourself and filter to see a list of which 'primary_city' has duplicates and which is not)

So, duplicating primary city: