Creating a public web view of an entire base -- caching and filtering ideas?

I’m about to develop a public view of our entire product database, which is one Airtable base of around 2000 records. Right now we’re having something like that in place, but basically just as an embedded view (the standard Airtable functionality) – but we’d like to enhance this public view, hence the custom developing need of it.
The functionality of our public view will have to enable filtering all of the products on multiple levels – e.g. product category, price range, colour … you name it.

Since this will be a public view, and hitting the api limits would happen easily with around 2000 records (I guess), I believe some sort of caching of the entire base data is in order. My main question is: how, and in what way?

My first thought is to poll for the Airtable base at like an hourly interval, and saving all that data in one json file, then reading that json file on the client side with vue.js and outputting it in a gallery sort of way, with all the filters in place.
That still leaves me with dealing with all that filtering in the json data … And handling json data of 2000 records on the client side – not too heavy?

Does anyone have suggestions on this process? Or already can notice a caveat? Something I really need to be aware of? Maybe a handy js (front-end) library?

Check out the Cloudflare Proxy Script running on Cloudflare workers:
https://builtonair.com/resource/airtable-proxy-cloudflare-worker/

I believe it handles caching automatically within Cloudflare workers, but you’ll want to verify.

1 Like

Thanks for the suggestion @openside, but that seems to require a paid plan on Cloudfare, which we are trying to avoid at the moment – sticking to our existing php hosting.

Hence some trickery with exported json files will be the direction, probably. Anyone knows any good techniques to query this sort of data? Be it on the clied side or server side …

@Chester_McLaughlin’s excellent Airpress Wordpress plugin provides an easy-to-use web interface to Airtable — especially for query-and-display applications. It’s been a while since I used it, but as I recall it performs some degree of caching and flow control transparently. It requires Wordpress, obviously, but if you’re already on php hosting, turning up a basic Wordpress implementation is an hour’s work.

There are only two hard things in Computer Science: cache invalidation and naming things.

That said, the way I handle caching in Airpress is by creating a hash of the exact request sent to the API. Then I use a “refresh” and “expire” limit to deal with invalidation.

When the “refresh” limit (in seconds) is reached, the existing cached data is served while an asynchronous request is made to refresh the cached data. When the “expire” limit is reached the page-load must wait while new data is fetched.

To me, this was a reasonable approach for Wordpress sites that may have high traffic (and will utilize the refresh limit mostly) as well as low traffic sites that need caching but don’t want to serve old data.

Paging is the most difficult thing to deal with as Airtable will only return a maximum of 100 records per request. This is a problem because if you cache “page 4” of the results separately from “page 5” you’ll get duplicate (or missing) records when records are added / removed.

To get around this in Airtable I actually don’t do paging at all!! Gasp. I know. Instead, I’ll loop a request getting 100 at a time until all records are fetched and cache that result with the original request.

On the plus side, adding/deleting/updating records doesn’t mess with the cached records, but on the downside “showing all records” could be problematic from a memory, processing, or even storage standpoint.

However, most queries I’ve ever written against Airtable are filtered/limited to some sane number of records any way (by the way, I think 2000 records is perfectly sane, it’s only 20 requests!). And if that number starts getting high, then I do progressive requests where the first request is for the Record ID and Created At fields only, I then do subsequent requests for specific records (or batches of records) based on what the interface needs.

Beyond that, I would recommend creating an import process (as opposed to a caching process) that cycles through all your Airtable data at a given rate and imports (creates/updates/deletes) a “local” database and never have your application touch Airtable outside that sync/import. That way you’ll have your own “local” that can handle paging, caching, etc (and should be as performant as you want) while still being able to frequently update that data from Airtable. The limit here would be something like 30,000 records a minute ( 5 requests per second * 100 records per request * 60 seconds a minute).

What I do on other projects is create my import script and schedule it to run every 15 minutes. My longest (unoptimized) script takes about 5 minutes to run through several thousand records, fetching related records, updating the local DB, etc.

Hit me up if you want to talk further.

1 Like

Rather than Vue, I use Elm (which compiles to Javascript) for the frontend. I export my Airtable bases to JSON files, using curl and the command-line JSON processor jq to merge the JSON exports, and to strip out (or rename) fields that I don’t want made public.

(I’ve also written this same code in Elixir to allow greater control, and to be more maintainable as I add more pre-processing steps. I also use Elixir to repeatedly load data into Airtable since it’s resilient, with supervisors that reboot it and keep it running 24/7.)

If you’re on the free Airtable plan, you should be able to handle its data limits on the front end with live filtering, etc. It really makes a smooth experience, since all the data is in memory. For larger sites, I “shard” my data into separate bases and load as necessary as the user browses that part of the website.

The other benefit of having your Airtable data cached is that your Airtable doesn’t need to be in a consistent state. In fact, it would be wonderful if the API permitted you to query a specific snapshot, but you can also achieve that by making a stable backup of your bases.