Automatically Delete Duplicate Records in Table

I’m trying to delete duplicate records from a table.

I have table “Table1” that contains linked fields “Field1” and “Field2”. Where multiple records have the same values in “Field1” and the same values in “Field2”, I want to delete all duplicate records that are newest, leaving the original record that is oldest.

In the below example (just for illustration purposes), “Record1” and “Record2” have the same values in “Field1” and “Field2”, and “Record3” and “Record4” have the same values in “Field1” and “Field2”. After running the script, “Record2” and “Record4” should be deleted (they’re the newest) and “Record1” and “Record3” should remain (they’re the oldest).

[
    {"Record": "Record1", {"Fields": [{"Field1": "Value1"}, {"Field2": "Value2"}], "Created": "2020-01-01"}
    {"Record": "Record2", {"Fields": [{"Field1": "Value1"}, {"Field2": "Value2"}], "Created": "2020-02-02"}
    {"Record": "Record3", {"Fields": [{"Field1": "Value3"}, {"Field2": "Value4"}], "Created": "2020-03-03"}
    {"Record": "Record4", {"Fields": [{"Field1": "Value3"}, {"Field2": "Value4"}], "Created": "2020-04-04"}
]

The script is below. What I’m stuck on is writing the criteria for the filter. If someone could point me in the right direction, it would be a huge help!

After any duplicate records are deleted, I know they’re deleted from “Table1”, but I don’t think they’re deleted from the variable “Table1_Records”, which means that when the next “Table1_Record” of “Table1_Records” is reached it may attempt to delete the same record again and throw an error. How can I re-get records in “Table1” after each deletion of duplicate records (to help avoid errors)?

let Table1 = base.getTable("Table1");
let Table1_Query = await Table1.selectRecordsAsync({
    sorts: [
        {field: "Field1", direction: "asc"},
        {field: "Field2", direction: "asc"}
        ],
    fields: [
        "Field1",
        "Field2"
        ]
});
let Table1_Records = Table1_Query.records;

for (let Table1_Record of Table1_Records) {
    let Table1_Field1_Value = Table1_Record.getCellValue("Field1")[0].id;
    let Table1_Field2_Value = Table1_Record.getCellValue("Field2")[0].id;

    let Table1_FilteredRecords = Table1_Records.filter(
        // Insert Filter Criteria
    )
        await Table1_Records.deleteRecordsAsync(Table1_FilteredRecords);
}

P.S. - I know that there is a block for merging duplicate records, but I’m trying to automate the process of deleting duplicate records as part the script. I will never want to merge records, and the criteria for which records to delete and keep is always the same, so it makes sense to automate this instead of doing it manually.

Hi @StevenB!

Here’s an attempt at the script you described:

Click to expand
/**
 * Copyright 2020 Bocoup
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to
 * deal in the Software without restriction, including without limitation the
 * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
 * sell copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
 * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
 * IN THE SOFTWARE.
 */

/**
 * Duplicate deletion script
 *
 * Remove duplicate records in a given table according to the value of two
 * input fields. For any two records that are considered duplicates, use a
 * third field to determine which of the two records should be deleted.
 *
 * **Notes on adapting this script.**
 *
 * The script prompts for input every time it is run. For some users, one or
 * more of these values may be the same with every execution. To streamline
 * their workflow, these users may modify this script by defining the constant
 * values in the first few lines. The values should be expressed as JavaScript
 * strings in the object named `hardCoded`.
 */
'use strict';

/**
 * Users may provide values for any of the properties in the following object
 * to streamline the script's startup.
 */
const hardCoded = {
    tableName: '',
    firstIdFieldName: '',
    secondIdFieldName: '',
    comparisonFieldName: ''
};

/**
 * Do not edit any code following this message.
 */

// Airtable limits batch operations to 50 records or fewer.
const maxRecordsPerCall = 50;

const table = hardCoded.tableName
    ? base.getTable(hardCoded.tableName)
    : await input.tableAsync('Table name');
const firstIdField = hardCoded.firstIdFieldName
    ? table.getField(hardCoded.firstIdFieldName)
    : await input.fieldAsync('First identifying field name', table);
const secondIdField = hardCoded.secondIdFieldName
    ? table.getField(hardCoded.secondIdFieldName)
    : await input.fieldAsync('Second identifying field name', table);
const comparisonField = hardCoded.comparisonFieldName
    ? table.getField(hardCoded.comparisonFieldName)
    : await input.fieldAsync('Comparison field name', table);

function choose(recordA, recordB) {
    let valueA = recordA.getCellValue(comparisonField);
    let valueB = recordB.getCellValue(comparisonField);
    return valueA > valueB ? {keep: recordA, discard: recordB} : {keep: recordB, discard: recordA};
}

const existing = Object.create(null);
let toDelete = [];

// Part 1: Identify duplicate records in need of deletion
//
// We don't modify the table contents in this Part in the interest of
// efficiency. This script may trigger a large number of deletions, and it's
// much faster to request that they be done in batches. When we identify a
// record that should be deleted, we add it to an array so we can batch the
// operations in Part 3 of the script.
const query = await table.selectRecordsAsync({
    fields: [firstIdField, secondIdField, comparisonField]
});

for (let record of query.records) {
    let key = JSON.stringify([
        record.getCellValue(firstIdField),
        record.getCellValue(secondIdField)
    ]);

    // If we've already encountered a record with identical field values,
    // either that record or the current record need to be removed.
    if (key in existing) {
        let {keep, discard} = choose(record, existing[key]);
        toDelete.push(discard);
        existing[key] = keep;

        // If this is the first time we've observed a record with these
        // particular field values, make a note of it so we can recognize
        // duplicates as we iterate through the rest.
    } else {
        existing[key] = record;
    }
}

// Part 2: Verify
//
// Inform the script's user of the changes to be made and await their
// confirmation.
output.markdown(`Identified **${toDelete.length}** records in need of deletion.`);

const decision = await input.buttonsAsync('Proceed?', ['Yes', 'No']);

// Part 3: Execute the necessary operations

if (decision === 'No') {
    output.text('Operation cancelled.');
} else {
    output.text('Applying changes...');

    while (toDelete.length > 0) {
        await table.deleteRecordsAsync(toDelete.slice(0, maxRecordsPerCall));
        toDelete = toDelete.slice(maxRecordsPerCall);
    }

    output.text('Done');
}

It deviates from your sample code a bit because the Scripting Block can’t determine a record’s creation time. Unfortunately, you’ll have to explicitly track that information in the table itself if you want to use it to choose between duplicates.

I’ve included some comments to explain each part of the algorithm in context, and I’m happy to answer any questions you have here :slight_smile:

1 Like

It deviates from your sample code a bit because the Scripting Block can’t determine a record’s creation time. Unfortunately, you’ll have to explicitly track that information in the table itself if you want to use it to choose between duplicates.

This isn’t as bad as it sounds. Airtable offers a Field Type named “Created time”. If you extend your “Table1” to include a field with that type, then this script will work just as you requested without requiring any additional data maintenance. You can also hide the new field from your default view if it’s only relevant for the script.