Hey @DanBliss Can you clarify a bit more what you mean by normalizing your data?
The first thought that comes to mind is a using a 'Single Select' Field, which limits the options to pre-defined values (example in screenshot). But I need more context to really understand what you're trying to do. Thanks!

Hey @DanBliss Can you clarify a bit more what you mean by normalizing your data?
The first thought that comes to mind is a using a 'Single Select' Field, which limits the options to pre-defined values (example in screenshot). But I need more context to really understand what you're trying to do. Thanks!

Hi Arthur,
Thanks for your response. In a small table with not much data, manually designating characteristics of the data is possible, as you suggest, but as the data set gets quite large (imagine expanding columns to include the Buyer; Name_of_Retailer; Store_Location; Recipient (Grandma, Auntie, etc). Now let's add several years worth of shopping - now you might have thousands of rows, the dataset becomes too large to update manually.
Now let's imagine that we wish to clean up the Recipient column data, so that entries like "Aunt Sally" and "Susan Smith", who are the same person, are shown consistently as "Aunt Sally". Typically that might involve a series of steps: (1) Create a working copy of Grocery_Shopping, (2) in the working copy remove all columns except Buyer, (3) remove duplicate Buyer names, (4) remove any empty values. This creates a table of unique buyer names, which we can call Buyer_Unique. Next create a crosswalk table of 2 columns, the first being a list of all unique Buyer values from Grocery_Shopping, the second column matching Buyer_Unique. The task of building the crosswalk is mostly human intervention (although with the increasing prevalence of AI, I expect this problem will go away). Once the crosswalk table is updated and complete, all matches made, a "view" of Grocery_Shopping could be constructed replacing Buyer with Buyer_Unique values: no more "Sally Smith" - instead they would all appear as "Aunt Sally". Such standardization of data involves a lot of normalizing of the data. I hope you get the idea. The question is... how best to automate this process so that as more data is imported, unique data sets are updated and links built to other table values, in a way that minimizes repetitive data editing?
Updated Grocery_Shopping:
Buyer Day Qty Item Type
Aunt Sally | Mon | 4 | apples | fruit |
Sally Smith | Mon | 10 | snap beans | vegetable |
George Costanza | Tues | 13 | strawberries | |
G. Costanza | Tue | 6 | potatoes | vegetable |
Martin Luther | Tues | 5 | kiwi | fruit |
Hareit Marshall | Thur | 23 | bananas | fruit |
H Marshal | Thurs | 6 | banana | fruit |
Hariett Marshall | Fri | 14 | chili pepers | spice |
List of preferred buyer names,
Table: Buyer_Unique
Preferred Name |
Aunt Sally |
George Costanza |
Martin Luther |
Hariett Marshall |
And here's a table, Buyer_Crosswalk matching Buyer to Buyer_Unique values. In the crosswalk table, the Buyer column is linked to Grocery_Shopping.Buyer and the Preferred Name column values are linked to the table Buyer_Unique.Preferred Name.
Buyer Preferred_Name
Aunt Sally | Aunt Sally |
Sally Smith | Aunt Sally |
George Costanza | George Costanza |
G. Costanza | George Costanza |
Martin Luther | Martin Luther |
Hareit Marshall | Hariett Marshall |
H Marshal | Hariett Marshall |
Hariett Marshall | Hariett Marshall |
We could avoid the crosswalk table by adding a column in Grocery_Shopping linking to Buyer_Unique. Whether this is advisable depends on the number of rows - with very large number of rows, it's more efficient to create the crosswalk table instead - and also better normalization.
Hey @DanBliss wow thanks for the detailed follow up, that really helped to better understand your situation.
Unfortunately this one falls outside my experience level and I don't believe there's much here I can help you with.
Sounds like you have a large and complex dataset you're working with, might be worth hiring an airtable consultant or agency to help you set it up and manage it.
But I'll leave it open here for someone else to chime in with some suggestions...
@DanBliss For your dataset do you have something you could use as a unique identifier such as an email?
This would take some time to set up either way, if you want to book a call to discuss further click this link here.
https://zcal.co/systemswithsteve/freeconsult
Steven Da Silva
Systems With Steve