Airtable Backup & Recovery Strategies: Simply, a Nightmare

Bill_French · ‎Nov 03, 2019

I hope someone here can lend a few minutes and some thoughtful analysis to explain how to avoid all the risk that seems apparent in Airtable when formulating an off-Airtable backup strategy.

Some observations - feel free to tell me if and why my assertions are invalid…

Assertion #1

Exporting any table to CSV and then attempting to reimport into another fresh table is never going to work as most might assume.
Export from (a) using Airtable, import to (b) using Airtable - all is good, right?
The reality is that Airtable fields (such as formulas, attachments, links) are incapable of being represented in a flat CSV document. True?
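
To make Assertion #1 concrete, here is a minimal Python sketch of what a CSV round trip does to structured values. The record is hypothetical, shaped like the fields the API returns, and `flatten_for_csv` is a simplified stand-in, not Airtable's actual export logic.

```python
import csv
import io

# Hypothetical record: a link field (list of record IDs) and an
# attachment field (list of objects), as the API represents them.
record = {
    "Name": "Launch video",
    "Agency": ["recBTWS8rPlKdxW5Y"],
    "Files": [{"url": "https://dl.airtable.com/example.png", "filename": "example.png"}],
}

def flatten_for_csv(fields):
    """Collapse every value to a string, as a flat CSV cell requires."""
    flat = {}
    for name, value in fields.items():
        if isinstance(value, list):
            parts = [v.get("filename", "") if isinstance(v, dict) else str(v) for v in value]
            flat[name] = ", ".join(parts)
        else:
            flat[name] = str(value)
    return flat

def to_csv(rows):
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

def from_csv(text):
    return list(csv.DictReader(io.StringIO(text)))

restored = from_csv(to_csv([flatten_for_csv(record)]))[0]
print(restored)  # every value is now a plain string
```

After the round trip, the link is just text and the attachment has lost its URL, so nothing in the CSV tells an importer which field type to rebuild.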

Assertion #2

Exporting any table to JSON (using the API) and then attempting to reimport is not possible without writing another custom API process for each table because JSON loading is not supported in Airtable.
The reality is that Airtable provides no data loader for JSON. One must ask - given the vast support of complex data types, why isn’t JSON an import feature?
The pervasive availability of complex data structures and field types without any ability to recover such data through a reliable and efficient process exacerbates the risks. True?
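
As a rough illustration of the custom process Assertion #2 refers to, the sketch below turns backed-up records into request bodies for the record-creation endpoint. It only builds the requests and does not send them. The batch size of 10, the `skip_fields` parameter, and the base ID used in the usage note are assumptions for illustration; check the API documentation for your base before relying on them.

```python
import json
from urllib.parse import quote

API_ROOT = "https://api.airtable.com/v0"
BATCH_SIZE = 10  # assumed per-request limit for creating records

def build_create_requests(base_id, table_name, backup_records, skip_fields=()):
    """Yield (url, json_body) pairs; sending them is left to the caller.

    `skip_fields` names fields that cannot be written back, such as
    formulas (a hypothetical parameter, supplied by whoever runs the restore).
    """
    url = f"{API_ROOT}/{base_id}/{quote(table_name, safe='')}"
    for start in range(0, len(backup_records), BATCH_SIZE):
        chunk = backup_records[start:start + BATCH_SIZE]
        # Old record IDs are dropped: new ones are assigned on creation.
        body = {"records": [
            {"fields": {k: v for k, v in rec["fields"].items() if k not in skip_fields}}
            for rec in chunk
        ]}
        yield url, json.dumps(body)
```

For example, `list(build_create_requests("appXXXX", "Video Projects", records, skip_fields={"Total"}))` would produce three requests for 23 records. Because new record IDs are assigned, link fields that hold old IDs would still need remapping, which is exactly why one script per table is rarely enough.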

Assertion #3

When you import a CSV file that contains previously exported lists, those lists are created as Text fields. Any attempt to change the data type will erase the content of the field.
The reality is that in a crisis recovery mode, any CSV snapshots of the table(s) in crisis will contain the data you want, but you’ll be forced to create parsers through formula fields to reinstate the data. Or, users will be forced to recover through a custom-built API process. True?
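
For the custom-built route in Assertion #3, a parser for list cells could look something like this sketch. It assumes the exported cell is a comma-separated string in which options containing commas are wrapped in double quotes; verify that against your own exports before trusting it.

```python
import csv

def parse_list_cell(cell):
    """Split a cell such as 'Red, "Blue, Navy", Green' back into a list."""
    if not cell.strip():
        return []
    # skipinitialspace lets the reader recognise a quote that follows ", ".
    reader = csv.reader([cell], skipinitialspace=True)
    return [item.strip() for item in next(reader)]
```

The resulting list can then be written back through the API into a properly typed field, rather than converting the imported Text field in place.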

Assertion #4

No human can possibly remember the design and meta-information about the solutions they create in Airtable.
In any Airtable base, the tables, the names, the data model, field formatting, relationships, linkages, formulas, Blocks configurations and designs, and all of the connective tissue that holds a solution together represent the vast majority of the working solution - perhaps 90+% in some apps. Yet, there is no approach or best practice to protect this huge intellectual layer and investment that every user and company has made in their Airtable solutions. True?

If I have this all wrong, please help me understand what I’m missing.

Ambroise_Dhenai · ‎Jan 11, 2020

Some answers @Bill.French

#1: Very likely true. They can technically be represented in a CSV format, assuming whoever does that does it well, but some data is going to be lost, such as attachments. I can hardly see a 100% working solution here.
#2: True, there is no JSON importer. You will need to use a script that sends the JSON to their API in the background.
#3: I pass, no idea.
#4: True. Even using the API, there is no way to know what the tables of a base are. That’s very disappointing and sucks. It’s also proof of either very bad technical design decisions, or the will to disallow such usage. And I’m not even talking about the more complex stuff you mention, such as blocks, formulas, field formatting, etc. Even using the API, not all of that is discoverable.

Maybe this will help with your backup issues though.

Bill_French · ‎Jan 11, 2020

Thanks for the validation!

In #4 I’m not referring to the inability to know what and where all the tables exist; that can be pretty well overcome by maintaining a table of bases/tables, although, manually documenting the topology of all solutions would be tedious and likely error-prone. What I’m referring to in #4 is the inability to codify the relationships between tables and the dependencies required to produce viable data.

Blocks, formulas, and field formatting represent very important attributes of the solution. However, entity relationships and dependencies would also be lost in a recovery crisis and may require weeks or months to restore if at all.

BobBannanas · ‎Jan 11, 2020

I must say… your assertions are thoughtful and 100% concerning.

Ambroise_Dhenai · ‎Jan 12, 2020

Well, to give you a full idea of what’s stored in a backup made by the tool we built (using the full API capabilities), here it is:

A backup, as JSON: 2020_01_11_18-43-39.json · GitHub
The template for you to be able to reproduce the backup to better understand data and relationships: Digital Video Production Template - Free to Use | Airtable

Here is a preview:

As you can see from the backup, the relationships are properly represented in the backup. Also, assets such as images can be restored because there are links pointing to https://dl.airtable.com/.

But what you won’t see are the formatting rules (only data is represented), the field types, and much more, like blocks, formulas, etc.

And, regarding tables, it’s a big deal. If you add another table (like a relationship), those new tables won’t be backed up even if you use our tool to produce automated backups, because there is no way to know about them programmatically. So you’ll have to update the backup configuration every time a table is renamed, deleted, or created. That’s definitely a big limitation for something meant to be “automated”.
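
To show why the table list has to live in configuration, here is a minimal sketch of an export loop, not our tool's actual code. The table names are supplied by hand, and each table is paged through using the `offset` value returned with every page of records. `fetch_page` is a hypothetical callable that would perform one list-records request; a fake version is used here so the sketch runs offline.

```python
def backup_tables(table_names, fetch_page):
    """Export every configured table, following the `offset` cursor.

    `fetch_page(table, offset)` is a hypothetical callable that returns
    the decoded JSON of one list-records response.
    """
    backup = {}
    for table in table_names:
        records, offset = [], None
        while True:
            page = fetch_page(table, offset)
            records.extend(page["records"])
            offset = page.get("offset")
            if not offset:  # no cursor means this was the last page
                break
        backup[table] = records
    return backup

# Stand-in for real HTTP calls, with made-up pages.
FAKE_PAGES = {
    ("Agencies", None): {"records": [{"id": "recA1", "fields": {}}], "offset": "page2"},
    ("Agencies", "page2"): {"records": [{"id": "recA2", "fields": {}}]},
    ("Projects", None): {"records": [{"id": "recP1", "fields": {}}]},
}

def fake_fetch_page(table, offset):
    return FAKE_PAGES[(table, offset)]

backup = backup_tables(["Agencies", "Projects"], fake_fetch_page)
```

Any table missing from the list passed to `backup_tables` is silently skipped, which is the limitation described above.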

Bill_French · ‎Jan 12, 2020

Ambroise,

Your tool is a gallant effort to overcome the nightmare I initially described in this thread.

I have also built similar solutions for enterprises by leveraging the API in similar ways. Much of the ability to perform a deep understanding of an Airtable solution and its content came from my work on Airborne. To fully index content for findability, you have to fully understand the content regardless of how it was designed.

This is great, but it’s one thing to say there are relationships, it’s yet another to extract the actual table names and fields used in every dependency, right? I don’t see anything in the preview screenshot that shows field “x” is a link field into table “y”, field “z”. Perhaps you have developed some magic to do exactly that and if so, I would certainly make that the centerpiece of your marketing strategy.

Backup Implies Restore

Any backup tool makes the subliminal assertion to its customers that they can fully restore the system in the event of an unlikely crisis. The true test of a backup tool is to demonstrate the steps and tasks required for complete restoration. Unfortunately, few backup services ever publish the true nature of the recovery process because it’s almost always very ugly.

As I’m sure you have imagined, every manner of complex workaround and convoluted formulas and data models that can be used to create an effective solution in Airtable is nearly impossible to predict. In fact, this forum documents hundreds of poorly-designed data models that use formulas to overcome many of Airtable’s subtle inadequacies. This undeniable state of the Airtable architecture adds to the difficulty of capturing seamless and restorable backup resources that can bring a solution back from the dead.

Indeed, and assuming these links are sustained post-crisis, you have a chance at restoring these artefacts. But there are two key issues I see with this assumption -

When a data crisis hits, there’s a chance that either Airtable is unable to provide access to your data, or the data has been lost altogether - perhaps unintentionally entirely deleted.
Some data access issues are related to nothing more than invalid index pointers; indices can be scrambled as we all know and have experienced.

In both of these cases, there’s a significant risk that attachment content will be lost or inaccessible. And even in the best of circumstances, you may get all the attachments but have no clue which records they belong to. These are both devastating losses to any solution that depends on record-embedded attachments and one of the many reasons I admonish solution designs that store content dependencies by value, and not by reference.

Storing by Value vs Reference

Airtable is pretty cool - you hand it important documents for a related record and it magically creates a new copy of the document, hands back a URL (inside the Airtable field as a collection), and discards the original source URL. By all accounts, this is a dream solution for almost everyone. It’s a fire-and-forget feature that we all love.

There’s a fundamental tenet of data preservation that says -

Never rely on ANY aspect of the SaaS platform to recover and restore ANY of the data.

Suggesting that dl.airtable.com links will be reliable and dependable in a data recovery crisis violates the fundamental idea that any given snapshot is fully independent of the storage system at-risk and can be reliably restored. As such, any backup plan that has this content dependency gets a C-Minus grade because it leaves an enterprise with a degree of risk that could be significant depending on the solution and the nature of the content you are entrusting ONLY to Airtable to sustain.

Separate and apart from the recovery risk is the nature of the storage architecture users choose for key document dependencies. In my view, while it’s very convenient to rely on Airtable’s “document attachment magic”, it accentuates risks that will only become fully known in a restoration crisis.

I Love Airtable

And I truly admire anyone who is trying to take on the backup-restore challenge. Please do not mistake my comments as an indictment of Airtable or your backup solution - these are simply practical things we should all think about when making Airtable solutions and planning for disaster when the not-likely-inevitable shit-storm happens. :winking_face:

Ambroise_Dhenai · ‎Jan 12, 2020

Actually, it is possible. If you look at the data at https://gist.github.com/Vadorequest/045e4fa6c0963cc9d25cb847b00c7e53 you will notice that record IDs start with the rec prefix. (I believe that’s always the case, but I haven’t verified this assumption)

So, the following can be understood as a relationship:

"Agency":[
  "recBTWS8rPlKdxW5Y"
],

The problem is, you’ll have to guess which table it’s related to, as it isn’t specified. Of course, you can guess it’s from the Agencies table, but good luck doing that automatically when the target table can’t be deduced from the field name.
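
One way to reduce the guessing, when the backup contains every table, is to look each ID up in an index of all backed-up records. The sketch below assumes record IDs are `rec` followed by 14 alphanumeric characters (inferred from the sample above, not verified) and that the backup is a `{table: [records]}` mapping. The table names and the `recPROJECT0000001` ID are made up.

```python
import re

RECORD_ID = re.compile(r"^rec[A-Za-z0-9]{14}$")  # assumed ID shape

def looks_like_link(value):
    """Heuristic: a non-empty list whose items all look like record IDs."""
    return (isinstance(value, list) and bool(value)
            and all(isinstance(v, str) and RECORD_ID.match(v) for v in value))

def infer_links(backup):
    """Map (table, field) -> set of target table names."""
    owner = {rec["id"]: table for table, recs in backup.items() for rec in recs}
    links = {}
    for table, recs in backup.items():
        for rec in recs:
            for field, value in rec["fields"].items():
                if looks_like_link(value):
                    targets = {owner.get(v, "<unknown>") for v in value}
                    links.setdefault((table, field), set()).update(targets)
    return links

backup = {
    "Projects": [{"id": "recPROJECT0000001", "fields": {"Agency": ["recBTWS8rPlKdxW5Y"]}}],
    "Agencies": [{"id": "recBTWS8rPlKdxW5Y", "fields": {"Name": "Acme"}}],
}
print(infer_links(backup))
```

This only recovers which table a link points to. It says nothing about lookups, rollups or formulas built on top of that link, and any ID whose table is missing from the backup comes back as `<unknown>`.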

“Perhaps you have developed some magic to do exactly that and if so, I would certainly make that the centerpiece of your marketing strategy.”

To be fully transparent, there is no “marketing strategy” at hand here. We aren’t planning on building any kind of paid tooling for Airtable, we’re “just” OSS lovers and I have personally seen what a struggle Airtable backups are, hence the fact we have open sourced our work, for others to enjoy it. (and we wouldn’t have any “magic” to sell, anyway)

Restore

Our tool does backups, not restoration. And I totally agree that’s a must-have… We’ve been working on it and the least I can say is that it’s very, very hard because of how little the Airtable API exposes.

But, my first problem to solve isn’t automated backup restoration. It’s performing managed backups that are actually exploitable. That’s what our tool does. It gives anyone the ability to be in control of the data, and perform safe backups that can be used for many things, including data restoration.

And, if restoring had been easy, we’d have open sourced it as well. But we don’t have that tool in-house yet.

“Indeed, and assuming these links are sustained post-crisis, you have a chance at restoring these artefacts. But there are two key issues I see with this assumption -
When a data crisis hits, there’s a chance that either Airtable is unable to provide access to your data, or the data has been lost altogether - perhaps unintentionally entirely deleted.
Some data access issues are related to nothing more than invalid index pointers; indices can be scrambled as we all know and have experienced.”

Couldn’t agree more. It’s unreliable backups at best. A true backup strategy would download all related assets (png, pdf, etc.) and store them within the backup, so that a restore tool could restore the whole thing for real, even if links are not working anymore.
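
A rough sketch of what that could look like is below. It is not part of our current tool. It assumes attachment values are lists of objects carrying `url`, `filename` and `id` keys, and plans a local path for each file that records which table, record and field it belongs to. The directory layout is just one possible choice.

```python
from pathlib import PurePosixPath

def plan_attachment_downloads(table, records):
    """Return (url, relative_path) pairs, one per attachment found."""
    plan = []
    for rec in records:
        for field, value in rec["fields"].items():
            if not isinstance(value, list):
                continue
            for item in value:
                if isinstance(item, dict) and "url" in item and "filename" in item:
                    # Keep only the final path component of the filename.
                    safe_name = PurePosixPath(item["filename"]).name
                    # Prefix with the attachment id so equal filenames do not collide.
                    name = f'{item.get("id", "att")}_{safe_name}'
                    path = PurePosixPath("attachments", table, rec["id"], field, name)
                    plan.append((item["url"], str(path)))
    return plan
```

Each pair can then be fetched at backup time, for instance with `urllib.request.urlretrieve(url, path)` after creating the parent directory, so the snapshot no longer depends on the dl.airtable.com links still working and each file stays tied to its record.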

Our current implementation doesn’t solve this, as we went for the easiest way. A big part of that choice was that we couldn’t produce 100% recoverable backups anyway (due to Airtable’s limitations), even if we had implemented asset restoration. After all, it’s better to have a “better-than-nothing” tool than no tool at all.

We love Airtable as well, despite all those very serious limitations. I personally wonder very much whether those limitations are caused by poor engineering decisions, or by the actual will of someone who doesn’t want to give such abilities. I seriously ask myself that question, especially when such important community concerns have been ignored for years.

Bill_French · ‎Jan 12, 2020

@Ambroise_Dhenain,

All your points are good - and we all appreciate the support for the community. And a “better-than-nothing” approach is always better than nothing. :winking_face:

The gaps seem to suggest some intentionality, but I’ll bet they’re not intentional. I think it’s a team (at Airtable) who created a super-usable data management app, and the rapid success and adoption rate sort’a got away from them. The success, and the attraction for people who want simple, elegant solutions that they can create without waiting, have created serious demands on this company. Getting in front of the demand is not easy, especially when there’s a sizeable code debt to repay.

I would urge the folks at Airtable to create some best-practice whitepapers concerning backup and more importantly - recovery. Their snapshotting feature (which is quite good) should also be enhanced to allow the data to be managed independently (off-domain). These are simple steps that could help enterprises achieve more comfort.

John_inNJ · ‎Jan 28, 2020

It’s not perfect, but I have a second Airtable Pro account that I share my databases with. Then I copy them into another workspace in that second account. I do still risk losing it all if somehow all of Airtable crashes.