Nov 06, 2024 01:39 PM
Hi Airtable Community,
I’m facing a challenge with importing large CSV files (10,000+ records) into Airtable and would love some guidance. Specifically, I need to automatically detect and delete duplicate records when they are added or updated via CSV import.
Currently, I’m using a script to handle duplicates, but due to Airtable's 30-second execution limit, it times out when processing high-volume data, and I’d appreciate any suggestions on working around that limit.
If anyone has successfully managed similar high-volume imports or has insights on achieving faster processing, I’d be grateful for your advice!
Thanks in advance.
Nov 06, 2024 03:32 PM - edited Nov 06, 2024 03:34 PM
Yes, this is super easy to do and only takes a few minutes to set up.
You can do this with Make's CSV automations alongside Make's Airtable integrations.
I give a full step-by-step demonstration of how to do this in this Airtable podcast episode.
If you’ve never used Make before, I’ve assembled a bunch of Make training resources in this thread.
Hope this helps! If you’d like to hire an expert Airtable consultant to help you with anything Airtable-related, please feel free to contact me through my website: Airtable consultant — ScottWorld
Nov 06, 2024 05:53 PM
Hm, you probably already know this, but Airtable's CSV Import extension (https://support.airtable.com/docs/csv-import-extension) comes with functionality to merge/update based on a unique value, which seems like it'd do what you need. I take it you're importing these via an automation or something, and so you can't use the extension?
Nov 06, 2024 05:57 PM
Oh, right. I thought that for some reason he couldn't use Airtable's CSV Import extension. That is always the first place to start before turning to 3rd-party tools.
Hope this helps!
Nov 06, 2024 06:48 PM
There is also the Dedupe Extension that can be run manually from a data view after the import is done. If there are duplicates within the CSV (versus between the CSV and existing records), this might be a better option.
However, if you are using the new "import CSV from an interface" feature and the user does not have access to the data view, you will need to use scripting or a third-party tool to detect duplicates. If you go the scripting route, I recommend having a button to manually run the script after the CSV import is done, rather than an automation that triggers off the creation of a new record.
If your script is timing out at 30 seconds when trying to detect duplicates in only 10,000 records, I suspect that your script could be better written. For example, does your script have nested loops? The dedupe detection scripts that I have written for clients do not have any nested loops.
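To illustrate the single-pass approach, here is a minimal sketch for the scripting extension (e.g., run from a button, as suggested above). The table name "Imports" and the unique "Email" field are placeholders I've made up for this example, not from the original post, so adjust them to match your base:

// Single-pass dedupe: one scan of the table plus a Set lookup per record,
// so there are no nested loops.
// "Imports" and "Email" are placeholder names; change them to match your base.
let table = base.getTable("Imports");
let query = await table.selectRecordsAsync({fields: ["Email"]});

let seen = new Set();
let duplicateIds = [];
for (let record of query.records) {
    let key = record.getCellValueAsString("Email");
    if (!key) continue; // skip blanks so empty rows aren't treated as duplicates of each other
    if (seen.has(key)) {
        duplicateIds.push(record.id); // keep the first occurrence, delete the rest
    } else {
        seen.add(key);
    }
}

// deleteRecordsAsync accepts at most 50 record IDs per call, so delete in batches
let total = duplicateIds.length;
while (duplicateIds.length > 0) {
    await table.deleteRecordsAsync(duplicateIds.slice(0, 50));
    duplicateIds = duplicateIds.slice(50);
}
output.text(`Deleted ${total} duplicate record(s).`);

Because the detection itself is a single in-memory pass, 10,000 records should finish comfortably within the 30-second limit; the batched deletes are usually the slowest part.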
Nov 06, 2024 07:01 PM
And another limitation of that new Import CSV function is that it doesn’t support importing data into linked record fields.