Help

Re: How To Delete Duplicate Records Without

Solved
Jump to Solution
6375 1
cancel
Showing results for 
Search instead for 
Did you mean: 
M_k
11 - Venus
11 - Venus

Hi Airtable!

I have a base that I created in which I will be adding Notes from Apple.

There are duplicate records, but there is no key, except for a formula to create different record ID’s. This ID is unique for all records, including duplicates.

My question is how do I create a unique key, when each of the fields are not unique in my base.

Fields:

DATE *these dates are not all different

SUBJECT *some are the same

NOTES *the titles are either too varied or the same

TIME CREATED *some of the same records will have different times

FORMULA *creates a different record ID, even if it’s a duplicate

Perhaps the formula can be adjusted to create the same ID for all duplicates, based on another field?

If so, I don’t quite know how to do that.

I have created a share link to my base, I thought it might be easier.

I would appreciate any help.

Thank you,

Mary Kay

P.S. For some reason my share link looks like this. If you go to the last record, there’s a link (View larger version) to view the whole base.

I set up this link correctly and it views fine when I select preview in my base, so I am not sure what’s going on.

1 Solution

Accepted Solutions
M_k
11 - Venus
11 - Venus

Hi @kuovonne

I figured it out!

I just typed “Link All Records” in the first field, in the “MasterLookup” table and it worked.

Now I just need to do a test run with my Integromat scenario, so I will see how it everything works out.

Thank you!

Mary K.

See Solution in Thread

16 Replies 16

Is this a one-time dedupe, or an onging dedupe?

Create a new formula field expressly for the purposes of detecting duplicates. In the formula field, concatenate enough other fields to try to uniquely identify records. Then run the dedupe block on the new formula field.

For example:

DATE & SUBJECT

or

DATE & SUBJECT & NOTES

Technically, if they’re running the Dedupe block, they don’t even need to create a formula, because the Dedupe block lets them choose multiple fields. Dedupe will only consider it a duplicate if ALL fields match across records.

@M_k You can learn more about the Dedupe block here.

ScottWorld, you’re right. You can tell the Dedupe block which fields to consider when looking for duplicates.

If you don’t have the Dedupe block, you can create the formula field and then group on the formula field. Collapse the groupings and look for groups that have more than one item in the group. But the Dedupe block is much nicer.

If this is an ongoing dedupe issue, you can also use a Scripting block to do this. I just wrote one for a client.

Hi @kuovonne and @ScottWorld

I am using the Free Plan and I was going to remove the duplicates through Integromat, so I was hoping that I could change the existing formula which creates unique ID’s or another formula to duplicate ID #’s for every duplicate record and using one of my fields in the formula. So that I know which records are duplicates.

@kuovonne can you explain a little more about creating the groups, I am not too familiar with this.

Would I use the same formula that you provided?

This will be an ongoing de-dupe process.

Thank you,
Mary

“This time”? That’s kind of rude, isn’t it? I’ve been helping people nonstop here — 100% voluntarily — for the last few months, and I would say that 99% of my answers have not only been the correct answers, but have actually been extremely thorough answers as well.

@ScottWorld

My apologies. My reply was badly phrased. I have edited it. By “this time” I was referring to the fact that in this thread, you are right while I was wrong, whereas in other posts we simply had differing opinions.

You’ve definitely spent a lot of time helping people here.

I was suggesting using the grouping feature in Airtable views to group on the new formula field. Then collapse the views and look to see which groups have 2 or more in them. It would be a pain, but with only 76 records for a one-time deduping for free, it would be doable. However, that won’t work on an ongoing basis.

I have another way of identifying duplicates that might work for you, but it will take a bit of time to write up.

@kuovonne Aw, thank you so much for your kind apology & your explanation! :slightly_smiling_face: I absolutely accept your apology. Thanks for editing your reply too! :slightly_smiling_face:

Haha, that is true that we have sometimes given people different ways of solving the exact same problem! :stuck_out_tongue_winking_eye:

In my opinion, you have consistently given absolutely fantastic & top-notch advice to everyone in this community! :slightly_smiling_face: And you are a top-notch expert at scripting, too! (I know nothing about scripting, since I’m just beginning to learn JavaScript.) You have even reminded me of little things which I have forgotten about — such as the SWITCH function, which is so much easier than nested IF statements!

So, a big thank you for your MASSIVE CONTRIBUTIONS to this community! :slightly_smiling_face:

It’s late, and I don’t have time to do a full write-up now, but I thought I’d share this screen shot for now. Will this work for your Integromat integration?

image

Hi @kuovonne

Yes! This would work.

I can use the field name “Duplicate?” as my key, to identify which records can be deleted in Integromat.

Thank you so much for your help.

Mary K

Here’s the writeup. Note: I changed some field/table names from my original screen capture.

@M_k I created this solution using a copy of the base that you shared. Since it is your data, I didn’t want to share my copy of the base without your permission. Please let me know if I have your permission to share my copy of your base publicaly.


Purpose

A base has duplicate records, and the user needs a way of identifying and deleting records without using the Dedupe block or writing code.

Overview

This method relies on a combination of techniques:

  • using a formula field to concatenate all the relevant fields together so that a single field can be used to determine uniqueness
  • linking all the records in a table to the same record in another table for purposes of cross-record calculations.

Once each record has a copy if all the records in the table, a formula field can tell if the record is a duplicate or not.

New [MasterLookup] table

You need to create a new table with a single record. All of the records in the original table will need to be linked to this single record.
This table will have three fields:

  • primary field (single line text with a value of “Link All Records”, or any other type/value of your choice)
  • linked record field (allow linking to multiple records, and link to all records in the original table)
  • lookup field (looking up the newly created {Composite Key + Record Id} field in the original table)

New fields in the original table

You need to create several fields in the original table.

Non-formula fields:

  • Linked record field to the single record in the new [MasterLookup] table. You will need to create this link for every record in the table.

  • Lookup of the {Composite Key + Record Id} field in the new [MasterLookup] table

Formula Fields:

  • {Composite Key} formula field, a concatenation of all the fields that must match for a record to be a duplicate (used to identify if a record is a duplicate) example: DATE & SUBJECT & NOTES

  • {Composite Key + Record Id} formula field, {Composite Key} & RECORD_ID(), a concatenation of all the fields must match, plus RECORD_ID() at the end (used to identify if a record is the “first” in a set of duplicates)

  • {Duplicate?} field that determines if the record is a duplicate that can be deleted

Creating the links

In order for this method to work, you must link all the records in the original table to the single record in the [MasterLookup] table. Here is the fastest way to link existing records:

  1. Create a formula field equal to the primary key of the single record in the [MasterLookup] table: “Link All Records”.
  2. Convert the formula field to a linked record field.

The {Duplicate?} formula field

This field ties everything together.

First the formula checks if the record is linked to the master table. If it isn’t, then the formula gives a warning.

Then the formula searches the combined data for all the records for both its {Composite Key}, and its {Composite Key + Record ID}, and returns the position of each string. It will find both of them, because the {Composite Key} is a subset of the {Composite Key + Record ID}.

If the two positions are the same, it is the “first” record, and should not be deleted. If the two positions are different, some other record must have the exact same composite key, but appear “earlier” in the array. Thus, this record is a duplicate and can be deleted.

Note that the {ALL_COMPOSITE_KEY+RECORD_ID} lookup field needs to be concatenated with an empty string to convert it from an array to a string.

IF(MasterLookup, 
  IF( 
    FIND({COMPOSITE_KEY+RECORD_ID}, 
      {ALL_COMPOSITE_KEY+RECORD_ID} & ""
    ) 
    = 
    FIND(COMPOSITE_KEY, 
      {ALL_COMPOSITE_KEY+RECORD_ID} & ""
    ), 
    BLANK(), 
    "duplicate" 
  ), 
  "Record needs to be linked" 
)

Performance issues

This method is very resource intensive and can take a long time, especially as the number of records increases. The amount of data to be processed is the square of the number of records. If there are 1,000 records, an array of all 1,000 records needs to be stored in each of them, resulting in 1,000,000 records worth of info in the lookups alone! If there are many records, be prepared to wait a long time for Airtable’s calculations to catch up.

Deleting duplicates records

To delete duplicate records, create a view with filters to show only records that have “duplicate” in the {Duplicate?} field. Then delete all the records in that view.

You can automate the duplication with a 3rd party integration, such as Integromat or Zapier by having it delete new records in the view.

If you will be adding new records to the table that might be duplicates, those new records will also need to be linked to the [MasterLookup] table. You can also have an Integromat or Zapier to create those links for you. Create a new view with a filter to only include records where the {MasterLookup} linked field is blank, and create the link.

Hi @kuovonne

Thank you so for the writeup. I really appreciate it.

You can share the table publicly and this way I can visually see how everything is setup.

I was also wondering if you could provide an example of the first base option with the “Duplicate?” formula?

I have a little over 400 records, since I have the free plan which allows me to have 1,200 records and 5 megabytes of data, will I stay within the limitations?

Thank you,
Mary K

@M_k Here’s a link to the shared base.

The formula for the {Duplicate?} field in my screen capture is the same as the formula that I included in my post. I just have the other fields hidden in that screen capture. (I had a bunch of other temporary fields that I was experimenting with when coming up with this solution, and I didn’t want to clutter up the screen shot.)

You don’t need to worry about the record limit, as this method only creates one new record.

I’m not sure about your “5 megabytes of data” comment. The Plus plan has a limit of 5 GB (not MB) of attachment space. But you said that you are on the free plan with a 1,200 record limit and 2 GB of attachment space. If you are referring to the attachment space limit, you don’t need to worry about that either–this base doesn’t have any attachments.

My comments about this method being resource intensive apply to the amount of data that needs to be processed by Airtable in all the lookups and formulas. When Airtable has to perform lots of calculations, it can take a while for the calculated fields to catch-up. When I tested this system with your table of only 76 records, it was reasonably fast, a few seconds at most. However, that’s only 76x76=5776 records worth of data to push through. With 400 records, there would be 400x400 copies of the records, which is 160,000 records worth of data. That’s actually not a lot of storage space, because each record is only text strings, which take an infinitesimal about of storage compared to things like video files. It is the processing of that data that takes time.

I’d appreciate it if you report back on how this method works for you, and how long it takes. (Of course, if you’re using Integromat, you might have spend more time waiting for Integromat than waiting for Airtable.)

By the way, for anyone else listening to this conversation, I want to warn them that this method completely ignores any fields that are not included in the {Composite Key} field. In this base, that doesn’t matter, because the {Composite Key} field includes all the user editable fields in the table.

Hi @kuovonne

Thank you so much for your help.

I am just starting to get into the tables setup. I am noticing that when I link a record, I get the field name called “unknown record” instead of “Link All Records” for all the records.

When I manually enter a test record, in Table 1, I select “Link to a record in the MasterLookup table”, then I link it.

I am not sure why I am getting “unknown record”. Unless, I am not linking the records correctly.

I should mention that the table that this is occurring in, is a duplicate base, without records. I am not sure if that makes a difference. The reason I duplicated the base is the first base was just a test.

Thank you,
Mary

P.S. I mentioned the wrong attachment size capacity. You are correct, it should be 2 GB of space.

Mary K.

Mary K., Can you share screen captures of both tables?

M_k
11 - Venus
11 - Venus

Hi @kuovonne

I figured it out!

I just typed “Link All Records” in the first field, in the “MasterLookup” table and it worked.

Now I just need to do a test run with my Integromat scenario, so I will see how it everything works out.

Thank you!

Mary K.