Help

Scripting to identify duplicate name records

Topic Labels: Community
1307 1
cancel
Showing results for 
Search instead for 
Did you mean: 
Kongtho_Imarith
5 - Automation Enthusiast
5 - Automation Enthusiast

Hello, 

In my Airtable's "PEOPLE" Table, we stored many people's name records—more than 15,500 records. I aimed to identify those name records whose spelling is the same as any particular person's name. I developed the following scripting codes using the Scripting Extension in Airtable. When I tested it with another table with a small number of name records, less than 10, it worked well. However, when testing the code with a table with many people's names, the script's running was not complete. This issue could be related to the large number of records in the Table. Please note the following regarding my "PEOPLE" Table: 1) Table name: PEOPLE, 2. Primary field name: English Name, and 3. "Duplicate Name" is a checkbox field, which means if a tick is created for any name, that name has more than one record. Please see my scripting codes below:

//Read our table
var table = base.getTable('PEOPLE');
var query = await table.selectRecordsAsync();

//Identify duplicates
//loop through all of the records

let duplicates=query.records.filter((record)=>{
return query.records.find((potentialDuplicate)=>{
//compare record to potentialduplicate
return record.getCellValue("English Name")=== potentialDuplicate.getCellValue("English Name") && record.id !==potentialDuplicate.id;
})
});
let updates = duplicates.map(update=>{
return {
"id":update.id,
fields:{
"Duplicate Name": true
}
}
});
await table.updateRecordsAsync(updates);

I look forward to hearing from all of you who can help with this.

1 Reply 1

What is your level of experience writing scripts and what is your level of interest in learning to write better scripts?

What error message are you seeing when you run the script on a larger data set?

When using table.updateRecordsAsync(), you can have a maximum of 50 records. This might be your issue.

This script selects the checkbox for the records that are duplicates, but you still have to find the duplicate of duplicates to do something with them. 

If the checkbox is already selected for a record, the script does not clear the checkbox if there is no duplicate.

This script has nested looping, which means that as the record count increases, the run time dramatically increases. A more efficient version would not have nested looping.

Is there a reason you want to use this script instead of the dedupe extension?