I know I fell off the board lately, but it was for [I hope] a worthy cause.
Data Deduplication v3.0
The third iteration of my deduplication routines for Airtable, totally rewritten, with additional features and functionality.
- Standard Airtable functionality
  - Works on Free and Plus subscriptions
  - Optional use of Pro features (record color, Blocks)
  - Does not require 3rd-party SaaS integration
- Includes 3 “modules”
  - Duplicate identification
  - Suppression of false positives
  - Merging of true duplicates
- Supports persistent tagging of false positives
  - No need to vet the same flagged dupes repeatedly
  - Suppressed dupes are still evaluated against new records
  - Color coding distinguishes past matches from new ones
- Supports real-time deduplication
  - Each new record is checked at time of data entry
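For readers who think in code, here is a rough Python sketch of the idea behind duplicate identification combined with persistent false-positive tagging. It is not how the Airtable routines are built (those rely on standard Airtable functionality); `match_key`, `find_duplicates`, the `name` and `phone` fields, and the `suppressed` set are all hypothetical names chosen purely for illustration.

```python
from collections import defaultdict


def match_key(record):
    # Hypothetical match field: normalized name plus the digits of the phone number.
    name = "".join(record["name"].lower().split())
    phone = "".join(ch for ch in record["phone"] if ch.isdigit())
    return f"{name}|{phone}"


def find_duplicates(records, suppressed):
    """Return pairs of record ids that share a match key.

    `suppressed` is a set of frozenset pairs already vetted as false
    positives; those pairs are skipped, but each record in them is still
    compared against every other record.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[match_key(rec)].append(rec["id"])

    flagged = []
    for ids in groups.values():
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                pair = frozenset((ids[i], ids[j]))
                if pair not in suppressed:
                    flagged.append(tuple(sorted(pair)))
    return sorted(flagged)


records = [
    {"id": "rec1", "name": "Ann Lee", "phone": "555-0100"},
    {"id": "rec2", "name": "ann lee", "phone": "(555) 0100"},
    {"id": "rec3", "name": "Ann  Lee", "phone": "5550100"},  # newly entered
]
# rec1/rec2 were previously vetted and tagged as a false positive.
suppressed = {frozenset(("rec1", "rec2"))}
new_matches = find_duplicates(records, suppressed)
```

In this toy run the vetted `rec1`/`rec2` pair is not flagged again, while the newly entered `rec3` is still flagged against both of them.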
This version of the routines has been tested against bases as large as 25,000 records and maintained an acceptable response time for duplicate detection and false-positive suppression.¹ Three demonstration bases are provided, identical except for the number of records, to allow prospective users to judge how well the solution might work in their environments: 14 records, 1,000 records, and 10,000 records.
I’ve prepared a surprisingly long (to me, at least, but probably not to anyone reading this) document describing how the routines work, the steps one needs to take to add such functionality to an existing base, possible problems and ‘gotchas,’ the philosophy of crafting the optimal match field, and many other areas of, at best, tangential interest.
However, please note it also includes a Quickstart section.
The document can be found at paladesigns.com/airtable/dedupe.pdf.
There should also be an introductory video available later today or tomorrow.
¹ A framework within which to perform duplicate merger is provided for completeness’ sake. As each individual merger results in the recalculation of a dozen or more fields for each record in the base, processing delay becomes noticeable after only a few thousand records. However, as it takes essentially the same amount of time to merge one record as it would to merge all records, the system allows one to indicate desired mergers, which are then executed en masse.
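The "mark now, merge en masse" approach can be pictured with a small Python sketch. This is only an illustration under assumptions, not the actual Airtable implementation: `merge_marked` is a hypothetical helper, and the rule of filling the surviving record's blank fields from the dropped record is an assumed merge policy.

```python
def merge_marked(records, marked):
    """Apply every marked merger in a single pass.

    records: dict mapping record id -> dict of field values
    marked:  list of (keep_id, drop_id) pairs chosen by the user
    """
    merged = {rid: dict(fields) for rid, fields in records.items()}
    for keep_id, drop_id in marked:
        dropped = merged.pop(drop_id)
        for field, value in dropped.items():
            # Assumed policy: only fill fields the surviving record leaves blank.
            if not merged[keep_id].get(field):
                merged[keep_id][field] = value
    return merged


base = {
    "rec1": {"name": "Ann Lee", "email": ""},
    "rec2": {"name": "Ann Lee", "email": "ann@example.org"},
    "rec3": {"name": "Bo Park", "email": "bo@example.org"},
}
result = merge_marked(base, [("rec1", "rec2")])
```

Because all marked pairs are applied in one call, any expensive recalculation would be triggered once per batch rather than once per merger.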