Delete duplicates (old records and not new ones)

Hello, the default Delete duplicates extension is awesome to remove with one click all duplicated records at once but have a little problem, its removing new records and keep first one, how i can invert the process and keep the last record instead of forst ? remove then all old records and not new.

here is the default code :

let settings = input.config({
    title: 'Delete duplicates',
    description: `This script will delete duplicate records in a given table according to the value of
two input fields. Duplicate records are detected when they contain the same cell value for each identifying
field. For any two records that are considered duplicates, it will use a third comparison field to determine
which of the two records should be deleted.`,
    items: [
        input.config.table('table', { label: 'Table' }),
        input.config.field('firstIdField', {
            parentTable: 'table',
            label: 'First identifying field',
        }),
        input.config.field('secondIdField', {
            parentTable: 'table',
            label: 'Second identifying field',
        }),
        input.config.field('comparisonField', { parentTable: 'table', label: 'Comparison field' }),
    ],
});

let { table, firstIdField, secondIdField, comparisonField } = settings;

// Airtable limits batch operations to 50 records or fewer.
let maxRecordsPerCall = 50;

function choose(recordA, recordB) {
    let valueA = recordA.getCellValueAsString(comparisonField);
    let valueB = recordB.getCellValueAsString(comparisonField);
    return valueA > valueB ? { keep: recordA, discard: recordB } : { keep: recordB, discard: recordA };
}

let existing = Object.create(null);
let toDelete = [];

// Part 1: Identify duplicate records in need of deletion
//
// We don't modify the table contents in this Part in the interest of
// efficiency. This script may trigger a large number of deletions, and it's
// much faster to request that they be done in batches. When we identify a
// record that should be deleted, we add it to an array so we can batch the
// operations in Part 3 of the script.
let query = await table.selectRecordsAsync({
    fields: [firstIdField, secondIdField, comparisonField],
});

for (let record of query.records) {
    let key = JSON.stringify([
        record.getCellValue(firstIdField),
        record.getCellValue(secondIdField),
    ]);

    // If we've already encountered a record with identical field values,
    // either that record or the current record need to be removed.
    if (key in existing) {
        let { keep, discard } = choose(record, existing[key]);
        toDelete.push(discard);
        existing[key] = keep;

        // If this is the first time we've observed a record with these
        // particular field values, make a note of it so we can recognize
        // duplicates as we iterate through the rest.
    } else {
        existing[key] = record;
    }
}

// Part 2: Verify
//
// Inform the script's user of the changes to be made and await their
// confirmation.
output.markdown(`Identified **${toDelete.length}** records in need of deletion.`);

let decision = await input.buttonsAsync('Proceed?', ['Yes', 'No']);

// Part 3: Execute the necessary operations

if (decision === 'No') {
    output.text('Operation cancelled.');
} else {
    output.text('Applying changes...');

    while (toDelete.length > 0) {
        await table.deleteRecordsAsync(toDelete.slice(0, maxRecordsPerCall));
        toDelete = toDelete.slice(maxRecordsPerCall);
    }

    output.text('Done');
}

Hi,

return valueA > valueB ? { keep: recordA, discard: recordB } : { keep: recordB, discard: recordA };

here, try to change the sign (or swap the order of brackets)

Thank you !

any tips to :

  1. Skip the validation question and execute the action directly
  2. execute the script from time to other to clear the base ?

first,
‘lite way’
change line let decision = await inpu.... to
let decision = 'Yes'
OR, ‘more brutal’, change it to
while (toDelete.length) await table.deleteRecordsAsync(toDelete.splice(0, maxRecordsPerCall))
and remove all code under this line

second,
set up automation with desired schedule
‘hardcode’ these 4 variables on the left side, in input parameters, like this

Omit let settings= part and start with
let { table, firstIdField, secondIdField, comparisonField } = input config()

change output.markdown to console.log (you can also just remove this line, same as any other output, automation supposed to work without user interaction)
note that astericks will be seen as usual characters, without effect of bold number of records to delete. but I would left it all ‘as is’ for logging purposes.

p.s. I would think twice before implement such solution - I mean delete records without confirmation and review. it feels like increasing risk of losing a piece of ‘good data’.

1 Like

Thanks a lot @Alexey_Gusev

This topic was solved and automatically closed 3 days after the last reply. New replies are no longer allowed.