Help

Discover what data silos are costing your org in our commissioned Forrester study. Learn more

Creating linked tables with user-inputed "fuzzy" data

1155 11
cancel
Showing results for 
Search instead for 
Did you mean: 

I have a donation database:
Table 1: Donation Data: donations as they come in online (user inputed information)
Table 2: Donors: each donors’ record which is linked to table 1 so you can see all their unique donations

Problem: user inputted information sometimes changes (e.g. Meghan Moore, Meghan E Moore, Dr. Meghan Moore), but I want the donation data to link ONLY to one record in the Donor Table (Meghan Moore)

I want to find a way to link both so that no matter what the user enters, the donation data gets linked to the proper donor record without creating a new record in the linked table because it looks like a “new” name.

e.g.
Table 1:
Meghan Moore ; $20
Meghan E Moore ; $50
Dr. Meghan Moore ; $100

Table 2: Meghan Moore ; $20, $50, $100

Is there a way to do this automatically (assuming I may already know some of the possible aliases) or do I have to always manually check for near duplicates and then assign to the proper donor record?

Thanks!

11 Replies 11

Hi Meghan, welcome to the community!

Fuzzy linking is the stuff of machine-learning and unfortunately, Airtable doesn’t provide much assistance in this regard. But they do provide an API and that’s how it can be done with pretty efficient results.

I use FuzzySort, an open-source javascript library which I bake into an API app in Google Apps Script. I have not built any linking solutions but I have used it to find dupes and the accuracy of this library is the best available. It’s near perfect.

It’s possible that when Airtable releases its embedded javascript capability (which I hear is in beta now) you may be able to use libraries like FuzzySort in a block which could help your data link to closest donors.

@Bill.French in a similar vein, see my “Show and Tell” here:

JB

There are so many ways to do smart things with Python, and in many case with a third the effort and a sixth the code necessary with Javascript. I’m always envious of the Python guys when I am forced to create solutions with javascript. There’s something about Python that is just more fun. :slightly_smiling_face: Your show-and-tell example makes my point - you had a ball writing it up - admit it.

And despite my affection for Python, I guided @Meghan_Moore toward Javascript because there’s zero friction to (a) hosting the solution without cost, and (b) establishing a repeating chron process to maintain a relatively low latency update to the Airtable data. Google Apps Script (with Javascript) makes this possible.

Is there an equally frictionless way to deploy Python apps as web services, preferrably in a serverless environment and at low or no cost?

OK, I admit, it was fun :winking_face:

Yes, completely agree. From the Airtable scripting beta that I have seen (just a form to sign-up), it looks like the language will be JS or JS-like, so Python won’t be my way forward, but that’s OK, JS works too.

I haven’t tried this, but Google Cloud Functions in Python would be a possibility. Likely free unless there is particularly heavy usage. But not “simple” to set up, certainly not as simple as using the script environment of a Googlesheet.

I do not have access yet but I have been briefed by those who do and I’m fearful that this integrated scripting feature is going to be limited in the ways it can be deployed into an application. I’m not completely aware of the limitations (because they have not granted me access yet), so I don’t want to rain on a parade that hasn’t begun. However, what I heard did not sound like a pervasive way to solve intricate process challenges.

I am groping for a serverless Python hosting and development environment but only stumbled on this one recently. In my view, taking Airtable data to the the next level is best achieved with Python. Javascript is wonderful for process automation but Python provides so much more analytical opportunities that it seems to be the best fit in a data-centric environment. Javascript helps you move data, but Python helps you make sense of data.

Hey Bill and Jonathan - thank you so much! I’m in awe of what you’re both able to do with scripting and will await the public integration of this. Right now I think the scripting is a bit beyond me, but I’m super impressed and appreciate the responses.

Meghan: do you all collect email addresses?

If scripting is not your preferred choice, linking donors based on emails will at least reduce your work.

The assumption here is donors will, more often than not, use the same email address. The number of iterations they make with their name will not be the same with emails.

Just a thought. :slightly_smiling_face:

Long wait; 5 years+ on many things that were foregone conclusions. Fuzzy linking? Perhaps late 2024?