
Assistance with Detecting and Deleting Duplicates Automatically for Large CSV Imports in Airtable

Topic Labels: Automations, Formulas
SNM
5 - Automation Enthusiast

Hi Airtable Community,

I’m facing a challenge with importing large CSV files (10,000+ records) into Airtable and would love some guidance. Specifically, I need to automatically detect and delete duplicate records whenever records are added or updated via CSV import.

Currently, I’m using a script to handle duplicates, but due to Airtable's 30-second execution limit, it times out when processing high-volume data. I’d appreciate any suggestions on:

  1. Optimized duplicate detection and deletion methods that can handle large data volumes quickly and efficiently, potentially in batch processing.
  2. Alternative automations or built-in Airtable solutions that would automatically identify duplicates at the point of import, or shortly thereafter, to help with overall processing speed and accuracy.

If anyone has successfully managed similar high-volume imports or has insights on achieving faster processing, I’d be grateful for your advice!
Thanks in advance.

ScottWorld
18 - Pluto

Yes, this is super easy to do and only takes a few minutes to set up.

You can do this with Make's CSV automations alongside Make's Airtable integrations.

I give a full step-by-step demonstration on how to do this in this Airtable podcast episode.

If you’ve never used Make before, I’ve assembled a bunch of Make training resources in this thread.

Hope this helps! If you’d like to hire an expert Airtable consultant to help you with anything Airtable-related, please feel free to contact me through my website: Airtable consultant — ScottWorld

Hm, you probably already know this, but Airtable's CSV Import extension (https://support.airtable.com/docs/csv-import-extension) can merge incoming rows into existing records (updating them) based on a unique field value, which seems like it'd do what you need. I take it you're importing these via an automation or something, and so you can't use the extension?
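To make the "merge based on a unique value" idea concrete, here is a rough TypeScript sketch of merge-on-key logic. It is not the extension's actual code. It assumes each existing record has been reduced to its id plus its unique value, and `planMerge`, `CsvRow`, `ExistingRecord`, and the `Email` key field are hypothetical names. Skipping repeated keys inside the CSV is a choice made for this sketch, not documented extension behavior.

```typescript
// Hypothetical shapes for this sketch: a CSV row maps column names to text,
// and each existing record is reduced to its Airtable id and unique value.
type CsvRow = Record<string, string>;

interface ExistingRecord {
  id: string;
  key: string;
}

interface MergePlan {
  toUpdate: { id: string; fields: CsvRow }[];
  toCreate: CsvRow[];
}

// Normalise keys the same way on both sides (trim only, in this sketch).
function normalizeKey(value: string | undefined): string {
  return (value ?? "").trim();
}

// Split CSV rows into updates (key already exists in the table) and
// creates (new key). Every step is a Map/Set lookup, so no nested loops.
function planMerge(
  existing: ExistingRecord[],
  rows: CsvRow[],
  keyField: string
): MergePlan {
  const idByKey = new Map<string, string>();
  for (const rec of existing) {
    const key = normalizeKey(rec.key);
    if (!idByKey.has(key)) idByKey.set(key, rec.id); // first match wins
  }

  const plan: MergePlan = { toUpdate: [], toCreate: [] };
  const createdKeys = new Set<string>();
  for (const row of rows) {
    const key = normalizeKey(row[keyField]);
    const id = idByKey.get(key);
    if (id !== undefined) {
      plan.toUpdate.push({ id, fields: row });
    } else if (!createdKeys.has(key)) {
      // Later CSV rows repeating a newly created key are skipped (sketch choice).
      createdKeys.add(key);
      plan.toCreate.push(row);
    }
  }
  return plan;
}
```

Because both sides are indexed once, the cost grows linearly with the number of existing records plus CSV rows.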


Oh, right. I had assumed that, for some reason, he couldn't use Airtable's CSV Import extension. That is always the first place to start, before turning to third-party tools.


There is also the Dedupe extension, which can be run manually from a data view after the import is done. If the duplicates are within the CSV itself (rather than between the CSV and existing records), this might be the better option.

However, if you are using the new Import CSV feature in an interface and the user does not have access to the data view, you will need to use scripting or a third-party tool to detect duplicates. If you go the scripting route, I recommend adding a button that runs the script manually after the CSV import is done, rather than an automation that triggers on the creation of each new record.
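A button-run script still has to delete records through the scripting API, and Airtable's scripting docs (at the time of writing) cap `table.deleteRecordsAsync` at 50 records per call, so duplicates have to be removed in batches. The TypeScript sketch below shows that batching; `chunk`, `deleteInBatches`, and `MAX_RECORDS_PER_CALL` are hypothetical names, and the delete call is passed in as a callback so the sketch runs outside Airtable.

```typescript
// Per-call record limit taken from Airtable's scripting docs; verify it
// against the current documentation before relying on it.
const MAX_RECORDS_PER_CALL = 50;

// Split an array into consecutive chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Delete the given record ids one batch at a time and return how many were
// sent. Inside an Airtable script, `deleteBatch` would be something like
// `(batch) => table.deleteRecordsAsync(batch)` (an assumed wiring).
async function deleteInBatches(
  ids: string[],
  deleteBatch: (batch: string[]) => Promise<void>
): Promise<number> {
  let deleted = 0;
  for (const batch of chunk(ids, MAX_RECORDS_PER_CALL)) {
    await deleteBatch(batch); // wait for each batch before sending the next
    deleted += batch.length;
  }
  return deleted;
}
```

With 120 duplicate ids, for example, this makes three calls of 50, 50, and 20 records.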

If your script is timing out at 30 seconds when trying to detect duplicates in only 10,000 records, I suspect the script could be written more efficiently. For example, does it use nested loops? The duplicate-detection scripts I have written for clients do not use any nested loops.
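One common way to avoid nested loops is a single pass with a hash set: build a key from the fields that define "the same record" and flag any record whose key has already been seen. This is a minimal TypeScript sketch, not the reply author's client code; `RecordLike`, `makeKey`, and `findDuplicateIds` are hypothetical names. In an Airtable script you might build the `RecordLike` objects from `table.selectRecordsAsync()` results using `record.getCellValueAsString()`, but treat that wiring as an assumption.

```typescript
// Minimal record shape assumed for this sketch: an Airtable record id plus
// the field values (as text) that decide whether two records are "the same".
interface RecordLike {
  id: string;
  fields: Record<string, string>;
}

// Build one comparison key from one or more fields. Trimming and lowercasing
// is a choice made here; drop it if case or spacing matters in your data.
function makeKey(rec: RecordLike, keyFields: string[]): string {
  return keyFields
    .map((f) => (rec.fields[f] ?? "").trim().toLowerCase())
    .join("\u0000"); // separator unlikely to appear inside a field value
}

// Return the ids of every record whose key already appeared earlier in the
// list, keeping the first occurrence. One Set lookup per record makes this a
// single O(n) pass instead of comparing every pair with nested loops.
function findDuplicateIds(records: RecordLike[], keyFields: string[]): string[] {
  const seen = new Set<string>();
  const duplicates: string[] = [];
  for (const rec of records) {
    const key = makeKey(rec, keyFields);
    if (seen.has(key)) {
      duplicates.push(rec.id);
    } else {
      seen.add(key);
    }
  }
  return duplicates;
}
```

For 10,000 records that is 10,000 set lookups, compared with about 50 million comparisons (10,000 × 9,999 / 2) for a nested loop that checks every pair. The returned ids can then be deleted in batches.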

Another limitation of the new Import CSV feature is that it doesn’t support importing data into linked record fields.