I’ve just made available on Google Drive a brief write-up of two approaches to duplication prevention within Airtable, one for de-duplication of an existing base and a second for ongoing duplicate detection. It’s an excerpt (in PDF format) from a larger work-in-progress: a collection of Airtable tips and tricks gleaned from postings in the Support forum, premised on the belief that important and useful information often goes overlooked because it lies buried in a confusingly or uninformatively titled thread. (Case in point: These two solutions were originally responses to Find duplicates with Zapier – even though they are built entirely with garden-variety Airtable.) I’ve created a 1,000-record demonstration base for each approach, and the guide provides step-by-step instructions on how to retrofit either functionality into an existing base.
The de-duplication routine is intended to be run one or fewer times against an existing base only; it is almost painfully resource-hungry, and I offer it only reluctantly. Fortunately, once the base has been purged of duplicates, the worst-offending of the data structures can be eliminated and the rest modified or enhanced to support duplicate detection as part of ongoing base update.
Briefly, for both solutions the user defines a
matchStr assembled from fields in the main table – that is, the table where duplicate records should be identified.
matchStr should be designed to place more importance on the more-distinguishing characteristics of the record while ignoring those liable to be common among a number of records. For example, in the demo bases,
matchStr contains the formula
hoping to prevent false negatives caused by, say, the failure to include
MiddleInitial on one record.
The systems alert when more than one record contains identical
matchStrs; with the de-duplication code, any record containing matching
matchStrs is tagged as a duplicate, as well. (While the duplicate detection routine toggles a single, system-wide flag, the expectation is the most recently entered record is one of a pair.) In both instances, the records are flagged only as possible duplicates, with final determination left to the user. Should he or she decide the records represent a false positive – for instance, a misperceived match between
John Q. Public and
John S. Public – the user selects the
DupeOK field of the current record. This causes
RECORD_ID() to become appended to
matchStr, making it unique and resetting the flags.
Do these two sample implementations address all of the requirements and concerns voiced elsewhere in this thread? Of course not – but, then, they never intended to. Rather, these were attempts to demonstrate just how much of needed functionality could be provided using only the current release of Airtable.
- Actually, a 995-record base, having backed off from 1,000 to avoid the limit on free accounts.
[Note: The document referenced here is only an extract from a larger piece, released here prior to editing and fact-checking. The piece should be considered a lightly edited draft, at best.]