CSV Block Feature Request: merge by formula field

I just used CSV Block to import for the first time. My table was initially created with the base CSV import.

The CSV schema I’m importing does not have any unique column, but a set of three columns is unique together. So I made my primary field, id, a formula over those columns: {a} & " " & {b} & " " & {c}. (None of a, b, or c has internal spaces.)
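As an aside for anyone copying this pattern: the space separator is only collision-free because none of the fields contain internal spaces. A quick JavaScript sketch of the same key logic (field values are placeholders) shows why:

```javascript
// Composite key built the same way as the formula {a} & " " & {b} & " " & {c}.
const key = (a, b, c) => [a, b, c].join(" ");

// Safe while no field contains a space:
key("x1", "y2", "z3");  // "x1 y2 z3"

// If fields could contain spaces, two different rows would collide,
// so a separator that can't appear in the data would be needed instead:
key("x1 y2", "z3", "w4") === key("x1", "y2 z3", "w4");  // true
```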

When using the CSV Block import, I am importing a more recent download of the same CSV report, so it contains duplicates of rows already in my table and is a superset of the original import, with new rows added.

I see the “merge rows” option, but that asks me to merge by a single CSV field. What I need is to merge by the formula field I defined, so that deduplication happens against the combination of the a, b, and c fields.

Has this been discussed / considered? How feasible is it?

Since a formula field is required for uniqueness with these kinds of datasets, it seems like a necessary capability that deduplication can work off of a formula.


You could handle this with the Scripting Block – they recently announced a new API for importing files like CSVs. Then you can write custom scripting for how to handle the file, how to handle duplicates, etc.

Jeremy’s exactly correct! The scripting block now has a file input, meaning scripts can now access a file you upload, then do whatever you want with the data. You can compare the data in the file against the data you already have in your base, then perform whatever custom logic you want, including what you’re describing!

Here’s a script that should be more or less what you’re looking for. I added some comments to explain the logic!

// First, fetch the records from the base
const table = await input.tableAsync('Pick table');
const selectRecordsResult = await table.selectRecordsAsync();

// Make a map that lets us quickly lookup a record by its unique name (the
// primary field of your table)
const recordsByUniqueName = {};
selectRecordsResult.records.forEach(record => {
    recordsByUniqueName[record.name] = record
});

// Ask for a CSV file that contains a header row, then print out the CSV
let csvFileResult = await input.fileAsync('Upload CSV with header row',
    {
        allowedFileTypes: ['.csv'], 
        hasHeaderRow: true
    }
);
output.table(csvFileResult.parsedContents);

// Let's build 2 lists: the records we want to update, and the records we want to create.
const recordsToUpdate = [];
const recordsToCreate = [];

// Loop through rows using a `forEach` loop. For each row...
csvFileResult.parsedContents.forEach(csvRow => {
    
    // 1) Extract data from the row
    const {unique1, unique2, unique3} = csvRow;
    
    // 2) Generate the ID with the same formula you use in the primary field
    const recordName = unique1 + " " + unique2 + " " + unique3;

    if (recordsByUniqueName[recordName]) {
        // 3a) if it already exists, update the record with whatever data from
        // the csv row you want
        recordsToUpdate.push({
            id: recordsByUniqueName[recordName].id,
            fields: {
                "some other field": csvRow.someOtherField
            }
        })

    } else {
        // 3b) otherwise, create a new one.
        recordsToCreate.push({
            fields: {
                "unique 1": unique1,
                "unique 2": unique2,
                "unique 3": unique3,
                "some other field": csvRow.someOtherField,
            }
        })
    }
});

// updateRecordsAsync and createRecordsAsync accept at most 50 records per
// call, so if you're creating or updating more than 50 records at a time,
// you'll need to add some batching logic here
output.text('Updating records...')
await table.updateRecordsAsync(recordsToUpdate);
output.text('Creating records...')
await table.createRecordsAsync(recordsToCreate);

output.text('-------------------------');
output.text(`${recordsToUpdate.length} record(s) updated.`)
output.text(`${recordsToCreate.length} record(s) created.`)
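For larger imports, a minimal batching sketch (assuming the 50-records-per-call limit mentioned in the comment above) could replace the two single calls:

```javascript
// Split an array into chunks of at most `size` elements, so each chunk
// fits within the per-call record limit.
function chunk(items, size) {
    const batches = [];
    for (let i = 0; i < items.length; i += size) {
        batches.push(items.slice(i, i + size));
    }
    return batches;
}

// In the script above, the two calls would become:
// for (const batch of chunk(recordsToUpdate, 50)) {
//     await table.updateRecordsAsync(batch);
// }
// for (const batch of chunk(recordsToCreate, 50)) {
//     await table.createRecordsAsync(batch);
// }
```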