Re: [SOLVED] Find duplicates with Zapier

Giovanni_Cangel · ‎Aug 03, 2017

Hi,

I know there isn’t a way to find duplicates automatically.
But is there some kind of Zap that i can use for this purpose?
I have all my prospects in Airtable and it would look really stupid to contact them twice for the same deal.
So now i do this manually but my list is expanding to over 1000 prospects so this isn’t going to work for me anymore.

Who knows a solution?

Thanks in advance,
Giovanni

Julian_Kirkness · ‎Aug 03, 2017

Hi Giovanni

I don’t think there’s any way you could use Zapier to de-duplicate an existing base - but you could check for duplicates when you enter a new record. The Zap steps would be something like:

Trigger based on new record
Use Find a record to locate ANOTHER record with the same matching field (say Email) - but use criteria which excluded the current record by doing a not equal based on the record ID (which you would have to add as a formula field to the table).
If another record IS found then do something - maybe send you an email and / or update the new record with a warning.

Of course this will only actually happen when the Zap runs which, depending on your Zapier account, could be up to 15 minutes later. You will need a paid Zapier account so you can create a multi step Zap.

Julian

W_Vann_Hall · ‎Aug 07, 2017

I’ve been playing around some with embedding data-validation code inside tables; although this approach does not truly enforce good data, it can at least flag invalid data at the time of entry, allowing the user to correct the mistake at the time it was made, rather than be forced to locate it later.[1] While it may not be an optimal solution, something similar could easily be added to your contacts base to ensure each new name doesn’t already exist.

You can find a demo base with example code here.

The only change to your workflow required is that when entering a new contact record, you’ll need to link it to the ‘DeDupe’ table. As this table contains but a single row, all you need do is select the plus sign (’+’) in the linked record field and, when the pop-up window providing drill-down access to ‘DeDupe’ records opens, select the first (and only) record. After a brief moment of cogitation (a second or two on the demo base, which contains 995 entries), one of two things will happen: One, nothing, meaning the contact you just entered is not a duplicate, or two, a flag will appear in the ‘Dupe?’ column, meaning the entry you just entered might be a duplicate.

I say ‘might’ because it’s up to you to vet the two similar entries – and because in this implementation I match (GivenName&Surname), ignoring middle initials.[2] If you decide ‘John Q. Public’ and ‘John R. Public’ should both remain on the list, tick the ‘DupeOK’ checkbox ; this will append ~~‘1’~~ RECORD_ID() to ‘MatchName’ and clear the ‘Dupe’ flag.[3]

The routine works by creating a Rollup of ‘MatchName’ using the ARRAYUNIQUE() aggregation formula. It then compares COUNTA(Uniques) with a Count field for the linked records. If the numbers differ, it means ARRAYUNIQUE() removed at least one duplicate.

Since you know the most recently entered name is the one duplicated, it’s relatively easy to find the earlier matching record. If you’re converting an existing base that contains duplicates, though, it’s a slightly more complicated process. It turns out the 1,000 random names with which I seeded the base included both John A. [line 574, now with ‘DupeOK’ checked] and John M. [line 806] Jenkins, and it took a while to locate the duplication. (Actually, since I wasn’t positive the algorithm was correct, I opened the CSV in LibreCalc to double-check my work and ended up finding it there.)

To convert an existing base
First, to create the ‘DeDupe’ table and add the ‘MatchName’, ‘DupeOK’, and ‘Check’ fields to your ‘Contacts’ table. The latter should be defined as a text field. Return to the ‘DeDupe’ table and enter any arbitrary name in the primary field of the first row. (I used the ‘White Heavy Check Mark’ emoticon: :white_check_mark: .) Mark and copy this name, and return to the ‘Contacts’ table. Select the ‘Check’ field in the first row, navigate to the final row of the table, and, while holding down the shift key, select the last row’s ‘Check’ field and then enter ^V (Ctrl-V). This should paste the ‘DeDupe’ record name into the ‘Check’ field of every ‘Contact’ record. Finally, reconfigure ‘Check’ to be a Linked Record field linking to ‘DeDupe.’ This will result in every ‘Contacts’ record being linked to the single entry in ‘DeDupe’. Add the ‘Dupe’ field to ‘Contacts’, and everything should be ready to go.

There: A workable, if not exactly elegant, solution to your problem, one built entirely in Airtable, without need for third-party middleware.

-Notes ----------------

For instance, in the demonstration base that accompanies this reply, whenever a Person’s record is updated to show what Size (e.g., S, M, MT) he or she wears of a certain clothing Type (e.g., Men’s Sweatshirt, Women’s Shorts), a quick sanity check is run to make sure the specified Size is valid for that Type.
I was blessed with very clean data, since it consists of 1000 uniform records created by fakenamegenerator.com (less five, that is, to stay below the 1,000-record limit on free accounts); I also didn’t have to contend with titles or suffixes. If yours isn’t so clean, doesn’t come prepackaged into orderly FirstName/M.I./LastName chunks, or if your existing base contains occasional, unpredictable inclusion of titles, prefixes, or suffixes, you’ll need to scrub it a bit. You can find a set of routines designed for name processing here.
This implementation allows for each given name/surname pair to be duplicated once; should your client list contain, say, a suspicious number of 'John Doe’s, you’ll have to change ‘DupeOK’ to a text or number field and enter a unique deduplication value for each. I recently modified the demo base to support an unlimited number of permitted duplicates per given name/surname pair, so this limitation no longer applies.

W_Vann_Hall · ‎Aug 13, 2017

While I’m reluctant to suggest this, I have put together a method of identifying potential duplicates within an existing base. With a 1,000-record test base, response sometimes lags to the point of nearly being unusable – but given the task is one that will typically be performed only once, for some the tradeoff may be acceptable. (After the base has been cleansed of duplicates, the duplicate detection routines shown earlier can be used to keep the base duplicate-free.)

Vincent_Tang · ‎Feb 24, 2018

hi I just wanted to say I found this post helpful

I use to it to detect duplication upon data entry, but when I actually need to clean up data duplicates, I use another solution

An easy way to combat duplicates in bulk is to take your column A values (primaryKeys) and sort “A-Z”

Copy all column A values into excel

Highlight all values pasted
Home → Conditional Formatting → HighlightCellRules → Duplicatevalues

Any duplicate values found are going to be next to each other, since its A-Z sorted

Jot down on a piece of paper what row numbers have duplicates

Go back to airtable, reference the row numbers, manually fix the duplicates (e.g. decide which data needs to be merged, duplicates need to be removed, etc)

Ideally if you were employing good database practices, there shouldn’t be that many duplicates anyhow.

I had a bit of a hard time understanding all the things you wrote, but it made a lot more sense digging through the table and looking at the formulas and how everything was referenced. Then manually writing it myself into my table.

One way to disable dupeCheck if input lag is a problem is to just delete the “check” table field on your main table with :white_check_mark: on it, so none of the tables are linked.

W_Vann_Hall · ‎Feb 27, 2018

I recently created an updated version of my De-duplication Demo base that does away with the need to create a massive match field for each record. It seems to operate more crisply than the original; my guess is that it could be added to a working base without incurring a noticeable performance penalty.

Katherine_Duh · ‎Sep 07, 2018

This might be relevant to some of the folks in this thread: our deduplication block is currently in beta right now! You can use it to delete duplicate records or merge individual fields from duplicate records together.

If you’re a Pro plan user and you’re interested in testing it out (and giving us your feedback!), we’d love for you to sign up using this form.

Cameron_Deleone · ‎Oct 01, 2018

In case it helps, the dedupe block is now live and open to all Pro users. For more info, check out the support article: https://support.airtable.com/hc/en-us/articles/360008332253.

This topic will close automatically in 24 hours.
Enjoy!