I would love to have a feature that would search a given table by a field or set of fields to find and surface duplicate rows. It would then be great to be able to take a pre-defined action on duplicated rows such as delete, or merge. Dealing with dupes is always a pain with large data sets or importing new data into a set where some records may already exist. A way to deal with dupes in a somewhat automated way would be really helpful.
OMG, please. In an agile workplace leveraging ad-hoc tools, Airtable evangelists are often asked to ingest and “Airtable” (upgrade) spreadsheets of dubious quality. The time it takes to cleanse such data far exceeds the the effort to make the imported work truly relational (and useful). For now, we rely on Excel/Numbers to rework files before uploading them to Airtable. This isn’t fast.
I have 5767 records that need the duplicates deleted, this feature would save me so much time, please add it ASAP
Wait, we can’t find and delete duplicate rows… Well, Sticking with google sheets for now then. I have thousands of data i would like to import and some maybe duplicates. I can’t imagine a excel like tool that doesn’t have this feature.
Chiming in here as a request to add duplicate detection rules, ability to expose UIDs to make importing related records easier, and also a simple merge feature would be a tremendous help.
Thanks for the awesome work. I just discovered Airtable and couldn’t be more excited about it!
-Kenny
Would love duplicated row detection too! Let’s do it Airtable team
Just thought I would add a suggestion…
You could find duplicate rows by using a formula field to concatenate the fields you want to match by and then Grouping a view based upon the formula field. You could then simply look through the data and find any group with more than one record in and act accordingly. Data reduplication is very often NOT an exact science - you may have to manually edit data from one record to another for example - so this is quite a reasonable approach.
If only you could filter by groups with a record count of > 1!
Julian
Not sure how many people will need to say it before it gets prioritized, but I’m adding my voice to this chorus.
Another voice. I need this to detect duplicate contacts, similar to the way Google Contacts works.
As a rule, we don’t commit to specific timelines. But the more details we get from you all about how you’d want to use a feature, or how you’d want it to be implemented, the easier it is for our team to work on those features and make them the best they can be. Your feedback is valuable and appreciated!
I’m using a calculated field for my primary key, concatenating a few other fields. I would have thought that Airtable would throw an error, or at least not store duplicate records when the primary key is the same.
This is totally opposite the way other RDBMSs work. The primary has to be unique.
[copy and pasted from the other thread - Row Duplication Detection]
Hi all! I found a workaround to help find and delete duplicate records. As is, this only works for deleting perfect duplicates, it does NOT merge records together into a single record.
Imagine a table with 3 records: John Smith, John Smith, and Abraham Lincoln.
First, create an auto number field.
Next, ensure you have a field that contains the same text for each duplicate record (“John Smith”) but is unique for records that do are not duplicates (“Abraham Lincoln”).
Next, turn this field into a link to a new table (I called it uniques). Now, you have a record for “Abraham Lincoln” that has 1 linked record (no duplicates found), and you have a record for “John Smith” with 2 linked records (duplicates found).
In the “uniques” table, create a rollup of the autonumber field, using the “MIN” formula. So, for John Smith, this field will now contain 1, and for Abraham Lincoln it will show 3.
Back in the main table, create a lookup field on the link to “uniques” and lookup for the rollup field you created.
Now, create a formula, which will look something like this:
IF(LOOKUP_FIELD<AUTONUMBER_FIELD,“Delete”,“Do Not Delete”)
Now, the first John Smith record will show “Do not Delete”, but the second John Smith will show “Delete”.
Wow! Thanks for the workaround but it seems very convoluted. Should be baked into the implementation!
I’m using a calculated field for my primary key, concatenating a few other fields. I would have thought that Airtable would throw an error, or at least not store duplicate records when the primary key is the same.
This is totally opposite the way other RDBMSs work. The primary has to be unique.
In Airtable, there’s no such thing as using a field for “your” primary key. Airtable uses an internal field that is not visible through the user interface but is accessible through the API.
The first field at the left in Grid View is not the primary key. It’s what shows up in a linked record and it can be changed without affecting the link. There’s no need for it to be unique.
In Airtable, there’s no such thing as using a field for “your” primary key. Airtable uses an internal field that is not visible through the user interface but is accessible through the API.
The first field at the left in Grid View is not the primary key. It’s what shows up in a linked record and it can be changed without affecting the link. There’s no need for it to be unique.
Thanks for the explanation. I guess I mentally glossed over the difference between the primary field and primary key.
I’m looking for a way to delete and/or merge duplicates as well!
I see people are still suffering because of this, so I want to bring up this suggestion which didn’t seem to get much attention and it will absolutely help those who really need to find dat least some kind of] duplicates with some ease.
As with lots of other issues this ia longly requested one.
I’ll add my +1 also by making a suggestion: How about allowing to filter grouped results.
Example: If I group by an email field in a table of 5000 entries with 10 duplicates, i can find duplicate email entries but I have to scroll through 4990 grouped rows. If I would be able to filter out grouped results with a result count <= 1 this could be much easier.
This feature would also help in other situations.
Are any of you using duplicate preventing processes or procedures you’d recommend beyond - “Hey, check the DB before you add a new record?” I am setting up a system and have yet to release it to the team. When I do, I would like to specify procedural guidelines for quality data curation. Thank you.
I’ve just made available on Google Drive a brief write-up of two approaches to duplication prevention within Airtable, one for de-duplication of an existing base and a second for ongoing duplicate detection. It’s an excerpt (in PDF format) from a larger work-in-progress: a collection of Airtable tips and tricks gleaned from postings in the Support forum, premised on the belief that important and useful information often goes overlooked because it lies buried in a confusingly or uninformatively titled thread. (Case in point: These two solutions were originally responses to Find duplicates with Zapier – even though they are built entirely with garden-variety Airtable.) I’ve created a 1,000-recorda1] demonstration base for each approach, and the guide provides step-by-step instructions on how to retrofit either functionality into an existing base.
The de-duplication routine is intended to be run one or fewer times against an existing base only; it is almost painfully resource-hungry, and I offer it only reluctantly. Fortunately, once the base has been purged of duplicates, the worst-offending of the data structures can be eliminated and the rest modified or enhanced to support duplicate detection as part of ongoing base update.
Briefly, for both solutions the user defines a matchStr
assembled from fields in the main table – that is, the table where duplicate records should be identified. matchStr
should be designed to place more importance on the more-distinguishing characteristics of the record while ignoring those liable to be common among a number of records. For example, in the demo bases, matchStr
contains the formula
UPPER(GivenName&SurName)
hoping to prevent false negatives caused by, say, the failure to include MiddleInitial
on one record.
The systems alert when more than one record contains identical matchStr
s; with the de-duplication code, any record containing matching matchStr
s is tagged as a duplicate, as well. (While the duplicate detection routine toggles a single, system-wide flag, the expectation is the most recently entered record is one of a pair.) In both instances, the records are flagged only as possible duplicates, with final determination left to the user. Should he or she decide the records represent a false positive – for instance, a misperceived match between John Q. Public
and John S. Public
– the user selects the DupeOK
field of the current record. This causes RECORD_ID(
) to become appended to matchStr
, making it unique and resetting the flags.
Do these two sample implementations address all of the requirements and concerns voiced elsewhere in this thread? Of course not – but, then, they never intended to. Rather, these were attempts to demonstrate just how much of needed functionality could be provided using only the current release of Airtable.
. ----------------------
- Actually, a 995-record base, having backed off from 1,000 to avoid the limit on free accounts.
Hi there, apologies if this is written about elsewhere, I searched and didn’t find many relevant posts.
We have two duplicate records created from Acuity via Zapier. One Randall H. and one Randy H. Same person. I changed the name of the Randy in airtable to Randall, but they’re still different records and I’d like to merge all this student’s lessons to be under the same single student. Can this be done? Thanks!
You can “update” an Airtable record using zapier (Acuity & Airtable), which would replace the old record information. This is what you’d want to do. But, there is no way currently to merge information, without building a SUPER complex zap or some custom code.
I would love to do contact merge info as well, just not there yet.
You can “update” an Airtable record using zapier (Acuity & Airtable), which would replace the old record information. This is what you’d want to do. But, there is no way currently to merge information, without building a SUPER complex zap or some custom code.
I would love to do contact merge info as well, just not there yet.
Have a public/example base showing how you’d like to merge them?
In our case we want to prevent duplicates from getting created in the primary field. The UI should detect a duplicate as it is typed in and prevent the row from being saved. If trying to create a row via the API and the primary field value is a duplicate, then an error response should be returned. Hovering over the lock in the first column header states that the value is intended as “short unique representation” of the record. So it seems strange that this uniqueness would not be enforced automatically, or at least be able to toggle a setting.
I’ve just made available on Google Drive a brief write-up of two approaches to duplication prevention within Airtable, one for de-duplication of an existing base and a second for ongoing duplicate detection. It’s an excerpt (in PDF format) from a larger work-in-progress: a collection of Airtable tips and tricks gleaned from postings in the Support forum, premised on the belief that important and useful information often goes overlooked because it lies buried in a confusingly or uninformatively titled thread. (Case in point: These two solutions were originally responses to Find duplicates with Zapier – even though they are built entirely with garden-variety Airtable.) I’ve created a 1,000-record01] demonstration base for each approach, and the guide provides step-by-step instructions on how to retrofit either functionality into an existing base.
The de-duplication routine is intended to be run one or fewer times against an existing base only; it is almost painfully resource-hungry, and I offer it only reluctantly. Fortunately, once the base has been purged of duplicates, the worst-offending of the data structures can be eliminated and the rest modified or enhanced to support duplicate detection as part of ongoing base update.
Briefly, for both solutions the user defines a matchStr
assembled from fields in the main table – that is, the table where duplicate records should be identified. matchStr
should be designed to place more importance on the more-distinguishing characteristics of the record while ignoring those liable to be common among a number of records. For example, in the demo bases, matchStr
contains the formula
UPPER(GivenName&SurName)
hoping to prevent false negatives caused by, say, the failure to include MiddleInitial
on one record.
The systems alert when more than one record contains identical matchStr
s; with the de-duplication code, any record containing matching matchStr
s is tagged as a duplicate, as well. (While the duplicate detection routine toggles a single, system-wide flag, the expectation is the most recently entered record is one of a pair.) In both instances, the records are flagged only as possible duplicates, with final determination left to the user. Should he or she decide the records represent a false positive – for instance, a misperceived match between John Q. Public
and John S. Public
– the user selects the DupeOK
field of the current record. This causes RECORD_ID(
) to become appended to matchStr
, making it unique and resetting the flags.
Do these two sample implementations address all of the requirements and concerns voiced elsewhere in this thread? Of course not – but, then, they never intended to. Rather, these were attempts to demonstrate just how much of needed functionality could be provided using only the current release of Airtable.
. ----------------------
- Actually, a 995-record base, having backed off from 1,000 to avoid the limit on free accounts.
Note: The document referenced here is only an extract from a larger piece, released here prior to editing and fact-checking. The piece should be considered a lightly edited draft, at best.]
I’ve been struggling with this problem for a long time also, but I walked through @W_Vann_Hall’s demo and it was extremely helpful. I was looking (like I think lots of ppl are) for a way to constantly check for duplicates, so the ARRAY issues concerned me.
Here’s my attempt to simplify the demo:
- My own example table is posted here: https://airtable.com/shrEZNMgiKrQdv8lp/tblcywQ2RsDL3H9iH/viwd63P4gD3yMRUqD
- Generate a “test duplicates” base with ARRAYJOIN roll-up to your list (this creates a string variable to pass back to your individual list items
- In your main base, pass the list back using an ARRAYJOIN roll-up. I know this is counter-intuitive since you are joining only 1 other, but its necessary to read the column as a string
- Create a formula to detect duplicates:
IF(LEN(List) - LEN(SUBSTITUTE(List,Name,"")) > LEN(Name),“yes”,“no”)
Hope this is helpful for others!
@Christopher_Rex - like it - and wouldn’t it work so much better it you could default (and then hide) the linking field so that the process is transparent to the user.
In our case we want to prevent duplicates from getting created in the primary field. The UI should detect a duplicate as it is typed in and prevent the row from being saved. If trying to create a row via the API and the primary field value is a duplicate, then an error response should be returned. Hovering over the lock in the first column header states that the value is intended as “short unique representation” of the record. So it seems strange that this uniqueness would not be enforced automatically, or at least be able to toggle a setting.
minor challenge with “as you type” detection and prevention is what if you have multiple people with same company name? and want each of them… but don’t want two bob smiths, or there are two bob smiths but they are actually different people with different contact info…
that needs to be part of the protocol… allowing for specific fields to be duplicated when necessary/logical…
totally agree with the disconnect of the primary field not forcing uniqueness at least!
i’ve taken to creating the formula field with auto-numbering to achieve that, and so far it’s working.
Being able to SET your primary field when you import would sure be a help…
… back to my manual deduplicating…
I agree with the prevention of possible records, which could be done when entering data manually. This could easily be a feature on a column type, where the “dupe detector” is turned on, and it searches to see if the record already exists, giving the user an option to add, ignore, or auto-rename. I have a small Contacts base that already has Bob Smith several times.
minor challenge with “as you type” detection and prevention is what if you have multiple people with same company name? and want each of them… but don’t want two bob smiths, or there are two bob smiths but they are actually different people with different contact info…
that needs to be part of the protocol… allowing for specific fields to be duplicated when necessary/logical…
totally agree with the disconnect of the primary field not forcing uniqueness at least!
i’ve taken to creating the formula field with auto-numbering to achieve that, and so far it’s working.
Being able to SET your primary field when you import would sure be a help…
… back to my manual deduplicating…
The Duplicate Detection demo base allows the user to override a detected match. For instance, I enter the name of the second Bob Smith, create the link to the DeDupe table, and am immediately notified of a detected match. I then click the checkbox field beside Bob Smith II’s name, and the alert disappears.
Again, I see the techniques demonstrated in the two example bases as aspiring to some day being called a kludge; next to them, your typical stopgap measure would qualify as ‘elegant.’ That said, they still beat deduping manually from a printout hands down… :winking_face:
Reply
Enter your E-mail address. We'll send you an e-mail with instructions to reset your password.