Jan 13, 2024 10:18 AM
I have an airtable base that contains data from a dozen sources and lots of duplicates. The duplicates are easy to define based on phone number. With dedupe currently you have to look at each individual cluster of duplicates and select what you want to keep. That's fine for some scenarios but that should not be the only option. It would take hours for me to look at each cluster.
There should be a easy way to:
1- Select the fields where duplicates exist.
2- Select which record to keep (most completed fields, newest, oldest, etc).
3- Run it and be done.
I really don't want to have to export to excel and re-import because airtable can't handle such a simple function.
Jan 13, 2024 10:38 AM
It's quite easy, and after 1-2 attempts and understanding how links/lookups working, it is can be made faster than writing this comment. Duplicate phone number field and turn it into a linked to a new table. In new table you have a list of phone numbers. add count field and filter >1 to look at duplicates. Then you can add lookups/rollups etc. For example, add rollup of date with MAX(values), then in first table add lookup of it, and add formula that shows if this lookup=date. Thus, all records where date is not maximum for that phone number, are duplicates that can be removed.
Jan 13, 2024 01:31 PM
Thanks for the reply. You explained it however as someone who already knows how to do it. Could you break the steps down some more or point me to the relevant documentation that addresses this. 🙂
Jan 14, 2024 04:10 PM - edited Jan 14, 2024 04:21 PM
Okay, i'll try
Most are airtable basics so if you don't know basics, you should read them, and if you don't know some harder things, you can find them searching in help system.
I took a part of zipcodes table for example. it has primary_city with many duplicates and unique zip.
I added date field as
DATEADD('2020-01-01',zip,'days')
formula. So it is unique as well.
Let's take a 'newest' record for each 'primary_city'
(NOTE: I missed a part with 'count' field, as it not needed here. you can add it yourself and filter to see a list of which 'primary_city' has duplicates and which is not)
So, duplicating primary city:
turning the duplicate into linked field
In new table(i called NEWTABLE) creating a rollup field (rollup is 'advanced lookup') of date field
now in main table create a lookup of this rollup field
i renamed it to maxdate
now create a formula where maxdate=date
And then group by this field
with collapse all
Here, the fragment of the table. Records under 1 are to be saved. Record under 0 are duplicates to be removed