Get rows with unique values


#1

I have a table with several entries. One Column is an ID which could be used several times (not unique).

Now I would like to retrieve a list of all records with just one unique entry in column ID. Somehow like an inverted duplication-checker.

I wanted to solve this using a filerByFormula but have no idea how to do it (I miss the group by operator of SQL at this point).

How could I manage this using the API?


#2

While not exactly a “Group By” operator, here’s two approaches I’ve used:

A) Use your programming language to create a list of distinct IDs from a list of all the record IDs. Use the “fields” parameter to specify that you only want the API to return the ID field/column to speed things up and cache the results since, depending on your table size, this request could take 10s of seconds.

B) This requires creating a “reference” table with a “basket” field and a “distinct” field.

  1. Create table “reference”
  2. Create field “basket” that is a link back to your first table
  3. Create field “distinct” that is a rollup that uses the ID field and ARRAY_UNIQUE(values)
  4. To get the “distinct” ID’s from the field in the reference table, you’ll first need to fetch all records from your IDs table where the reciprocating linked field (called “reference”) is FALSE() and update them so they’re linked.
  5. then you can simply query the value of the “distinct” field of the “reference” table!

Ug… And this is why Airtable will never compete with any other data source as a backend unless the API (and probably underlying data storage engine) is overhauled OR unless you develop your own API interface and caching layer that can make some of these basic scenarios, well… basic. The Airtable interface ROCKS! And the API is great, but it’s just not built or optimized to replace a relational database backend.


#3

I don’t think there are other options either…


#4

Many thanks for your replies! I think I will fetch all records and filter then in the java backend. I had the little hope that I could reduce the amount of data to fetch.


#5

OK, I’m confused.

When you say

do you mean

  1. All records with one or more entries in {ID} where all entries are identical

or

  1. All records where the contents of {ID} are not found in any other record?

If the latter, I would think in Chester’s example, rows 1 and 4 should be distinct but not rows 2 and 3.

Of course, there’s always option

  1. Something else entirely.

Which is the right interpretation?
. _____________

edit

Not an API solution, but it’s relatively easy to identify uniques by adding a couple of columns to the base, if that’s a possibility…


#6

Sorry for the confusion,

It is the option 2. I am looking for.
Your example is great. This could reduce my amount of rows to load. I will investigate in it to adopt it for my project.


#7

Thanks!

Something I failed to mention is that your IDs need to be entirely unique; for instance adding a fifth record, ‘bluegreen,’ will fail because record #4 invalidly matches ‘bluegreen’. In such a case, concatenating a comma to the beginning and end of both ‘ID’ and ‘{All IDs}’ should work – as long as there are no embedded commas in ‘ID’…