Script Request: Deduplicate based on timestamp

Is there a script to deduplicate content, then select the rows to keep based on a timestamp or another count field?

E.g., in a table of social media data the content is duplicated, but the dedupe block doesn’t let me group the content and then keep only the most recent record.

If you’re looking to keep the most recent record by creation time and you don’t need to do any complex merging, you can actually do this with the dedupe block by using the sort feature.

Sorting by creation time in the dedupe block

If you sort by “Newest first”, you can use keyboard shortcuts to quickly process your records:

  1. Press 1 to select the first record in each set
  2. Press A to use that record as the primary record
  3. Press Cmd + Enter to delete the other records

The sort criteria will be preserved for each set of records, so you’ll always keep the latest record. This isn’t quite as efficient as a script that automatically deduplicates all records as a bulk operation, but it may be sufficient for your needs in the meantime.

Unfortunately, the timestamp is unrelated to creation time; it’s based on tweet time. Is there a way to sort by fields other than those in the current Sort By dropdown?

Unfortunately this isn’t possible, so a script would be the way to go. What constitutes a “duplicate record” in your particular use case?

A duplicate is a record whose text body matches exactly. I’d like to keep only the most recent record based on a date field (this date doesn’t correspond to “newest”), or the one chosen by another count field. The script should be similar in either case. Thanks for your help!
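
For reference, here is a minimal sketch of the selection logic such a script would need. It is not a finished script for the dedupe block. It assumes each record has been reduced to a plain object; the names `Row`, `id`, `text`, `sortValue`, and `findDuplicatesToDelete` are hypothetical and do not come from the table discussed above. Records are grouped by exact text match, and in each group only the record with the highest `sortValue` (a date converted to a number, or a count field) is kept.

```typescript
// Hypothetical record shape; these names are illustrative only.
interface Row {
  id: string;
  text: string;      // body of the post; an exact match defines a duplicate
  sortValue: number; // e.g. tweet time in epoch milliseconds, or a count field
}

// Returns the ids of rows to delete, keeping the row with the highest
// sortValue in each group of identical text. On a tie, the first row seen is kept.
function findDuplicatesToDelete(rows: Row[]): string[] {
  const best = new Map<string, Row>(); // text -> row currently chosen to keep
  const toDelete: string[] = [];
  for (const row of rows) {
    const current = best.get(row.text);
    if (current === undefined) {
      best.set(row.text, row); // first row with this text
    } else if (row.sortValue > current.sortValue) {
      toDelete.push(current.id); // the newer row replaces the previous keeper
      best.set(row.text, row);
    } else {
      toDelete.push(row.id); // older or tied row is discarded
    }
  }
  return toDelete;
}

// Made-up sample data for illustration.
const sample: Row[] = [
  { id: "rec1", text: "hello world", sortValue: Date.parse("2020-01-01T00:00:00Z") },
  { id: "rec2", text: "hello world", sortValue: Date.parse("2020-03-01T00:00:00Z") },
  { id: "rec3", text: "unique post", sortValue: Date.parse("2020-02-01T00:00:00Z") },
  { id: "rec4", text: "hello world", sortValue: Date.parse("2020-02-01T00:00:00Z") },
];

const idsToDelete = findDuplicatesToDelete(sample);
console.log(idsToDelete); // ["rec1", "rec4"]
```

To keep by a count field instead, put that count in `sortValue`; the rest stays the same. Reading the records and deleting the returned ids depends on the scripting API available, so that wiring is left out here.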
