Skip to main content

I would love to have a feature that would search a given table by a field or set of fields to find and surface duplicate rows. It would then be great to be able to take a pre-defined action on duplicated rows such as delete, or merge. Dealing with dupes is always a pain with large data sets or importing new data into a set where some records may already exist. A way to deal with dupes in a somewhat automated way would be really helpful.

Here is another solution to this issue. It requires one relationship that you might already have.


Let’s say you have two tables: Contacts and Companies and you want to detect duplicate Emails in the Contacts table. The initial structure should look something like this:


Contacts

Email | Company (Link to)


Company

Company Name | Emails (link to)


In the Contacts table we’re going to add two columns, that we can later hide.


Col1: Rollup::Company::Emails with the function CONCATENATE(values)

Col2: Formula with the formula LEN({Col1}) != LEN(SUBSTITUTE({Col1},Email,"",2))


Hide these columns and create a conditional color change with the formula Where Col2 = 1 to create a highlight effect on duplicate emails.


Hope this can help someone else.


In Airtable, there’s no such thing as using a field for “your” primary key. Airtable uses an internal field that is not visible through the user interface but is accessible through the API.


The first field at the left in Grid View is not the primary key. It’s what shows up in a linked record and it can be changed without affecting the link. There’s no need for it to be unique.


Christoff,

The fact that the “Key” field can contain duplicates causes big problems for us when we convert columns of text to links - we can’t be sure which record gets the link.

-Donald


Christoff,

The fact that the “Key” field can contain duplicates causes big problems for us when we convert columns of text to links - we can’t be sure which record gets the link.

-Donald


Donald, I’m not sure why you directed your comment to me. I am not an Airtable employee and I can’t solve the big problems that you’re experiencing with duplicate records.


Just adding my +1 for a dupe detection feature, I’m using Zapier to bring in form data and don’t want to duplicate contacts if they are already in the table.


Just adding my +1 for a dupe detection feature, I’m using Zapier to bring in form data and don’t want to duplicate contacts if they are already in the table.


Hi Everyone,


Just of the top of my head…

(This would be directed to the users more experienced with programming/coding)


A solution to this problem would be to use AIrtable’s API…

It allows you to create,delete etc records in Airtable…


eg. Use case


When you want to import data in from Google sheets…

Use the Google SheetsAPI to store each record in a data structure(In case you are using JAVA to access the API you could use an ArrayList)

ONLY add the record in the data structure if a similar record does not exists.


After that you could loop over the data structure (which now consists of ONLY unique records) and use the Airtable API to insert it into a particular table.


David Smedberg, Great workaround. Thanks!

I agree with Roger (ripwit) … this should be a full-fledged feature that includes duplicate detection by column and an ability to designate at least one more column (other than the first “Name” column) as primary, unique and not-null in every table. (Note: A number of other similar DBs do have this feature.)

It sure has taken Airtable to implement this feature a long time!


+1 here for a duplicates detection and possibly pre-defined actions when they are detected. This is casually usefully in SO MANY scenarios. Specifically, I’m using Airtable for asset tracking using barcodes, and would like to know if someone has attempted to use the same asset ID number twice.


+1 here for a duplicates detection and possibly pre-defined actions when they are detected. This is casually usefully in SO MANY scenarios. Specifically, I’m using Airtable for asset tracking using barcodes, and would like to know if someone has attempted to use the same asset ID number twice.


This is ultimately an ask for data validation per field, against another field or linked field.


This feature looks like it’s something the user base want and was requested back in early 2016… can we get an official answer from Airtable on whether or not its been approved?


2 years…


this is like copy-paste in the old iphones. a feature that is shameful not to have !!!


Occasionally in our video production database, we find the same project is represented by two different records – often both with partial information filled out in separate fields. It would be immensely helpful to have an option to merge two similar records into one, without having to manually copy and paste all relevant info.


Important feature. Please implement!


Seems like a logical addition would be able to have some kind of merge duplicates functionality especially when dealing with complex databases, it’s important to preserve linking between tables which makes manual deduplication a pain in the bum.


I recently created an updated version of my De-duplication Demo base that does away with the need to create a massive match field for each record. It seems to operate more crisply than the original.


I’m currently using a Google Sheets Add-on which works well. I’d love to use Airtable, but without this feature, I can’t move from Google Sheets.


I’m currently using a Google Sheets Add-on which works well. I’d love to use Airtable, but without this feature, I can’t move from Google Sheets.


Please take a look at my previous message. As long as you can create a link from each row of the table you want checked for duplicates to a single row in another table — a process that can be as simple as marking a range of cells and pasting a period into the entire range — you can have duplicate detection with Airtable today.


+1 For having a native feature built into Airtable for detecting duplicate records.


I use Airtable for youth sports registration databases. When a parent registers their child, the information is sent to Airtable via Zapier. Often parents will register twice, sometimes three times at varying points in the registration period. For any interested, here is my workflow for finding duplicates:




  1. I setup my column on the far left to display “Percent Unique”




  2. If that number is less than 100, I group the records by the far left column, collapse all, then skim the columns looking for the groups that have more than one record.




I get that this is a bit clunky, but I am only working with <300 records.


Hi, just started using this and am running into issues with duplicates. It would be great to be able to look at a side by side comparison of a new record that gets added that has some of the same information as an old record, and easily choose what gets merged, and what information gets left behind.

For example, a potential client fills out a form that populates a row with phone number, email address, and name, but there is another row already in the sheet with similar name, same phone number, different email, I’d like to make the call on which information gets merged.


Hi, just started using this and am running into issues with duplicates. It would be great to be able to look at a side by side comparison of a new record that gets added that has some of the same information as an old record, and easily choose what gets merged, and what information gets left behind.

For example, a potential client fills out a form that populates a row with phone number, email address, and name, but there is another row already in the sheet with similar name, same phone number, different email, I’d like to make the call on which information gets merged.


I think you’ve managed to encapsulate both the benefits of and the difficulties in having Airtable incorporate de-duplication as a core function. Aside from such basic, building-block functionality as, for instance, ensuring uniqueness of values for a table’s primary field, most users’ de-duplication needs are highly specific — even circumstance-specific, at times. (For instance, when merging various databases, each new database added may bring with it a different set of criteria for identifying duplication.)


The approaches I outlined earlier support user-tailored detection routines and the application of ‘overrides’ to indicate when detected duplicates were permissible — both of which I would consider a minimum capability of any de-duplication function. Personally, I would rather see Airtable implement features that could be incorporated into de-duplication algorithms, such as Soundex 1] matching or field-level forced uniqueness, than attempt to build an all-encompassing, omnibus de-duplication engine.





  1. Soundex assumes English words and pronunciations; obviously, Airtable would likely need to support comparable approaches for other language sets.


Seriously - this is a constant topic of conversation, and the lack of attention it’s being given is a real shame. So many situations require merging of data like this, and removing duplicates seems like an obvious feature for AirTable - and one I’m up against now myself!


Same here. We track jobs by job name and sometimes people will refer to a job by slightly different names. When we realize the jobs are the same I would like to combine the records.


+1 here for a duplicates detection and possibly pre-defined actions when they are detected. This is casually usefully in SO MANY scenarios. Specifically, I’m using Airtable for asset tracking using barcodes, and would like to know if someone has attempted to use the same asset ID number twice.


i auto number a column for the barcodes, also have put in the formula for check digit.


All unique as i do not enter numbers ad-hock by hand. Let the formula do it and then use them


cheers


daniel


At 3000 records, I also need duplicate detection!


+1 duplicate detection


I need to be able to make sure a value in a table is unique. Its used to handle ID tags, and we need an easy way to detect if a tag already has been assigned to someone, without having to do a search every time.


+1 for automatic dublicates detection


Reply