Analysis of causes and scenarios leading to data loss, and what "Backup & Restoration strategy" might be used for each

Topic Labels: ImportingExporting
TFP
6 - Interface Innovator

Introduction

There are many causes that might lead to loss of data, or, more generally, to the need to restore data from a previous “point in time”.

I’ve also listed the “key findings” from this analysis that I find most relevant.

I’ve summarized my findings in an Executive Summary below, and then analysed every scenario I could think of that might cause data loss (from the largest loss to the smallest).

Also, I created a poll below regarding what you are worried about that might cause data loss for your business. :bar_chart:

Context

I carried out this analysis for my company (Unly) first, but thought it could be useful to the Airtable community.

Our goal was to figure out which causes/scenarios we want to protect our data against, so that we can apply the proper Backup & Restoration strategies.

Key findings:

  • For most situations, the Airtable “Trash” will be the most useful and efficient way to restore almost all deleted data (Base, Table(s), Field(s), record(s))
    • :warning: One must be careful when restoring tables/fields and make sure to do so in a proper order, to keep relationships intact.
  • The Airtable Snapshot might be “good enough” in many “data loss” scenarios, or, at least, act as a good support to retrieve former data quickly and efficiently.
  • Nothing beats a simple copy/paste for data restoration. When it can be done, it’s the most intuitive and fastest way to restore corrupted/missing data, as long as it doesn’t affect multiple tables (otherwise relationships might break).
  • A bit more configuration for Airtable Snapshots (especially their frequency) might help many people feel more secure about backups.
    • If one could make a daily snapshot, while keeping only 30 active snapshots, they would have a whole month of day-by-day backups, and that’s usually more useful than having a 6-month-old snapshot.
    • Very old snapshots are mostly useful for history and keeping a trace of things, but can’t be used as a potential “recovery backup”. Depending on the business, data from 7 days ago might already be completely stale and unusable for recovery.
    • Allowing more “recent snapshots”, like “one snapshot every 6h for the current day”, then 1 snapshot per day for 30 days, then 1 snapshot per month for 6 months, might be a much more reliable snapshot strategy for disaster recovery.

  • The API doesn’t allow any destructive action for Bases, tables and fields (renaming, deletion).
  • Bases that are deleted go into a Trash that cannot be wiped, even manually. (that’s a very good thing)
  • The tables/fields/records that are deleted go into a Trash that can be manually wiped (I wish it’d ask for an email confirmation, to protect against password leak and malicious intents)
  • Records that are deleted through the API also go into the Trash, and can be restored from there in a similar fashion as those deleted through the UI.
  • Custom Apps don’t allow destructive actions either (Base, Table, Field renaming/deletion), and a field type cannot be modified. But they do allow creating tables and fields (which the API doesn’t).
  • Overall, the only way to delete a Base, table or field is through the UI, as an admin user. This is very important, because it brings a lot of confidence in the inability of 3rd parties to mess big time with our bases, even if they have our API token, or are installed within our base (for Apps).

Recommendations

If I had 3 recommendations to give to the Airtable team, they’d be:

  1. Allow us to tailor the Snapshot frequency for our business needs (paid plans)
  2. Slightly change the snapshot versioning to keep more “recent snapshots” and keep less “outdated snapshots”, and purge them more often, only keeping the relevant ones when they get too old.
  3. Alternatively, improve the API to read fields metadata, so we might build better Backup & Restoration tools for the community. (basically allow reading tables/fields and field configuration through the API, similar to what’s possible with Apps)
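To make the tiered snapshot schedule suggested in the key findings more concrete, here is a minimal sketch of how such a retention rule could be evaluated. It is not how Airtable works today. The function name `select_snapshots_to_keep`, the reading of "current day" as "the last 24 hours", and the 183-day cut-off for "6 months" are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def select_snapshots_to_keep(snapshots, now):
    """Return the snapshot timestamps to keep, newest first.

    Hypothetical tiers (assumptions, not an Airtable feature):
      - last 24 hours: one snapshot per 6-hour slot
      - up to 30 days: one snapshot per calendar day
      - up to ~6 months (183 days): one snapshot per calendar month
      - older snapshots are purged
    """
    newest_per_bucket = {}
    for taken_at in snapshots:
        age = now - taken_at
        if age < timedelta(0):
            continue  # ignore timestamps from the future
        if age <= timedelta(days=1):
            bucket = ("6h", taken_at.date(), taken_at.hour // 6)
        elif age <= timedelta(days=30):
            bucket = ("day", taken_at.date())
        elif age <= timedelta(days=183):
            bucket = ("month", taken_at.year, taken_at.month)
        else:
            continue  # too old to be useful for recovery
        # Within a bucket, only the most recent snapshot survives.
        current = newest_per_bucket.get(bucket)
        if current is None or taken_at > current:
            newest_per_bucket[bucket] = taken_at
    return sorted(newest_per_bucket.values(), reverse=True)
```

Keeping the newest snapshot of each bucket is a design choice. Keeping the oldest would work just as well, as long as the rule is applied consistently.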

Executive summary:

Before asking which tool to use for your Backup and Recovery strategy, you must ask yourself which causes of data loss you want to protect against.

We identified below about 10 different causes of data loss, and each has its own way of being dealt with. The native Airtable tools (snapshot, undo/redo, trash) help quite a lot more than I had believed before running this analysis. Unfortunately, they aren’t 100% reliable and any serious business still needs, at the very least, an automated data backup strategy, even if it only backs up the data themselves (no schema backup).

My personal opinion is that having our own “full backup recovery system (schema + data)” is not an absolute necessity. After analysing all potential risks, we figured most of the risks we identified can be resolved easily enough by using the tools provided by Airtable (undo, trash and snapshot).

Even so, we will keep making our own backups, through our open-source tool https://github.com/UnlyEd/airtable-backups-boilerplate, because we feel it is necessary to have a backup of the data, at the very least. But this backup will not allow us to make a full recovery, and that’s not its purpose.

One thing is for sure, the minimal acceptable backup must contain the data themselves, even if the whole data structure isn’t included. Also, things like relationships and data types should be made obvious in any backup, to be exploited by any in-house tools that might restore the data in a way that suits your needs.

Also, while there might be various ways to go about “how to restore” data (which are strongly related to the cause of the data loss), we could rely on a single way to perform the backup itself (which would allow different data recovery strategies).

Meaning, the backup structure should be as exploitable as possible, and contain as much information as possible about both the base structure in general (tables, fields and their configuration) but also the data themselves. Then, how that backup is actually used will depend on the situation, but the original backup should contain enough metadata for different data recovery strategies to be possible.
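As an illustration of what such a backup could look like, here is a minimal sketch of a structure that keeps both the schema (tables, fields and their configuration) and the data. Every name in it (`BaseBackup`, `TableBackup`, `FieldSchema`, the `"link"` type label and the `linked_table` option) is made up for this example. None of them come from Airtable’s API or from our backup tool.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class FieldSchema:
    name: str
    type: str  # hypothetical labels such as "text" or "link"
    options: dict = field(default_factory=dict)  # e.g. which table a link points to


@dataclass
class TableBackup:
    name: str
    fields: list   # list of FieldSchema
    records: list  # list of {"id": ..., "fields": {...}} dicts


@dataclass
class BaseBackup:
    base_name: str
    taken_at: str  # ISO 8601 timestamp of the backup
    tables: list   # list of TableBackup


def to_json(backup):
    """Serialize the whole backup (schema + data) to a JSON string."""
    return json.dumps(asdict(backup), indent=2, sort_keys=True)
```

Storing the relationship target next to the field (here in `options`) is what makes relationships "obvious" to any in-house tool that later reads the backup.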

Most of the Base metadata available to developers is very limited when using the API. It’s a bit better when building an App.

One should wonder whether a custom App (previously “Block”) isn’t better suited to this kind of schema-intensive job. The main reason is that the App API has access to a lot more of the Base metadata than the API does, even if it has other strong limitations.

Maybe the best tool that could be created at this time, would be a smart use of both the API and a custom App.

I could imagine a combination of both tools. The App would be the one that reacts to changes in the data structure, and then calls an external tool that performs the backup. Similarly, an App could be made to take a backup as input, generate all tables, fields and their configuration, and then import the data themselves. While it still wouldn’t cover 100% of the configuration (formulas being lost, for instance), it would make it possible to automate the restoration of a whole base, which isn’t doable by any tool at this time, besides the native Airtable Snapshot.

What scenarios do you believe you need to be able to recover from?
  • Base deletion
  • Table deletion
  • Table data malicious alteration (massive/bulk)
  • Field deletion
  • Field configuration alteration (data type change)
  • Record deletion
  • Records deletion (massive/bulk)
  • Record alteration
  • Records alteration (massive/bulk)
  • Record single cell deletion/alteration

0 voters

What options would you need from a recovery tool?
  • Full restoration (all tables, fields, their configuration + data, into a new base)
  • Partial restoration (select tables, fields + data to be restored, into a new base or existing base)
  • Compare data differences between two backups (highlight what has changed) and help to apply selected changes
  • Compare schema differences between two backups (highlight what has changed) and help to apply selected changes
  • Filter tables/fields and records based on the data, to reduce the amount of displayed information when trying to compare two data sets
  • Automatically repair broken relationships when restoring data

0 voters

Causes and scenarios:

Let’s try to break down the main causes that might lead to a data loss and how that might happen (scenarios), from huge to minimal data losses:

1) A Base was deleted

  • Potential reasons:
    • An admin mistakenly deleted the whole Base, through the UI
    • An unknown 3rd party deleted the whole base on purpose, through the UI (password leaked)
  • Potential recovery strategies:
    • Recover from “Trash”
      • Airtable allows restoring a Base from “Trash” for 7 days after deletion. You can cancel the deletion at any time during those 7 days, and you get an email saying the base will be deleted in 7 days.
      • This is the most efficient recovery strategy. Assuming you receive and read the email, you should know that you can restore the Base for 7 days. This option is preferred in most cases and makes this critical and destructive action easily recoverable without any data loss.
    • Ask Airtable support:
      • I’m not even sure if they can do anything for you. But, you can always ask (the bigger you are, the more likely it is they might spend time finding a solution for you, I suppose).
    • Manual “full restoration” (schema + data)
      • In the event where you didn’t restore the Base from Trash on time, you won’t have any other choice than recreating the Base, all tables and all fields and their respective configuration. And then, import your data back.
    • Use an Airtable Snapshot:
      • The native “Snapshot” feature would be completely unusable because the snapshots are destroyed alongside the base itself.

Analysis:

This is the most destructive action. It’ll delete absolutely everything, leaving no trace.

It is straightforward to undo it from the Trash up to 7 days after deleting the Base. But, once the allotted time is up, the only way to recover from it is to have a “full backup” that allows you to recreate the base from scratch.

Also, a malicious attacker wouldn’t be able to “Empty the trash” manually, as Airtable doesn’t allow it.

Otherwise, a good recovery tool for this scenario would recreate a base with all its tables and all their fields, but also configure all fields properly (Formulas/Rollup, Long Text as Rich Text, default values, but also advanced things like custom views, forms, comments, etc.).

Even if not everything can be automatically restored, a backup that allows for automated restoration and also contains the configuration that needs to be done by hand (what cannot be automated) would be quite helpful, to configure the Base as it was.

Summary:

Overall, this is a highly critical action, with very low probability to happen. (assuming only trusted people have admin access)
It is easily recoverable from, thanks to the trash.

But, if you don’t react within the 7-day period, you’ll end up losing absolutely everything without any way to restore the data if you don’t have a “Full backup recovery system” at your disposal.

:warning: This is the main scenario where you might need a full backup recovery system (schema + data), in the very unexpected event where you’d lose the whole base.

2) A table was deleted

  • Potential reasons:
    • An admin mistakenly deleted the table, through the UI
    • An unknown 3rd party deleted the table on purpose, through the UI (password leaked)
  • Potential recovery strategies:
    • Recover from “Trash”:
      • Your table will be moved to the Base “Trash” and can be restored from there for 7 days.
      • This is the most efficient recovery strategy. Assuming you receive and read the email, you should know that you can abort the operation for 7 days. This option is preferred in most cases and makes this critical and destructive action easily recoverable without any data loss.
      • :warning: If you deleted more than one table and need to restore several of them, make SURE to restore them in the right order (from top to bottom in the Trash, which is the same as from most recently deleted to least recently deleted). Otherwise, you’ll lose the existing relationships and you’ll need to link them back by hand.
        • If you’re unsure about the order and don’t want to take the risk of messing it up, I suggest you create a new base from one of the templates and test this behavior before doing any restoration.
        • A duplicate of the base made after the tables were moved to the Trash will not contain the trashed tables (so you can’t use a duplicate as a safety net in case you get the order wrong).
    • Manual partial “full restoration” (schema + data for one/several table)
      • In the event where you didn’t restore the table from Trash on time, you won’t have any other choice than recreating the table, and all fields and their respective configuration. And then, import your data back.
    • Use an Airtable Snapshot:
      • The usage of the native “Snapshot” might come in handy, assuming it’s not too outdated. Even so, it would still be useful because it’d allow you to generate the deleted table and all its data. Thus, you could manually recreate all fields and their configuration in the altered base, and then copy all data from the restored table (from the snapshot base) and paste it back into the altered base.
        • :warning: This is a good trick. Even if you’re unsure of what to do, restoring the most recent snapshot into a new Base is probably a good idea, so that it won’t automatically expire (especially on Free plan) and you still might access those data later on, if needed.
        • It might be as effective as a custom restoration tool, and might even be faster (copy/paste), depending on the size of the data.
        • You wouldn’t restore Views/Automations through data copy/paste though, but you could manually replicate them, using the Snapshot Base as model.
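The restore order from the warning above (most recently deleted table first) can be written down as a tiny helper. This is only a sketch: the `deleted_at` values are assumed to be noted by hand, since I’m not aware of any API that exposes the content of the Trash.

```python
from datetime import datetime


def restore_order(trashed_tables):
    """Return table names in the order they should be restored.

    `trashed_tables` is a list of {"name": ..., "deleted_at": datetime} dicts
    (a hypothetical structure). The most recently deleted table comes first,
    which matches the top-to-bottom order described for the Trash.
    """
    ordered = sorted(trashed_tables, key=lambda t: t["deleted_at"], reverse=True)
    return [t["name"] for t in ordered]
```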

Analysis:

This is another very destructive action, that can easily be recovered from, thanks to the temporary Trash.

Then, using a Snapshot can be a good workaround, and might be “just good enough” for many use cases. It’s recommended to use it anyway to make sure you still have access to the data from the snapshot (which might expire) later on, even if you don’t think you’ll need it.

Otherwise, a good recovery tool for this scenario would allow you to either:

  • Recreate the lost table within the existing Base, automatically create all its fields and their configuration, and restore all the data from the table. Then, optionally restore all broken relationships by looking for all relationship fields and updating their values. (might need to create a temporary field for that, but let’s not get too technical)
    • :warning: If several tables were deleted and need to be restored, then it might become very difficult (technically) to recreate all tables and all relationships correctly.
  • Create a whole new Base, similarly to the previous scenario. But you might suffer more data loss, on unrelated tables, although the data integrity itself would be better.
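One way to picture the relationship repair mentioned in the first option: recreated records get new IDs, so every link stored in the backup has to be translated from the old ID to the new one. The sketch below assumes the restoration tool has built an `id_map` (old record ID to new record ID) while recreating the records. The function name and the record structure are hypothetical.

```python
def remap_links(records, link_fields, id_map):
    """Return copies of `records` whose IDs and link fields use the new record IDs.

    records:     list of {"id": old_id, "fields": {...}} dicts taken from a backup
    link_fields: names of the relationship fields to rewrite
    id_map:      {old_record_id: new_record_id}, built while recreating records
    """
    remapped = []
    for record in records:
        fields = dict(record["fields"])  # shallow copy, the backup stays untouched
        for name in link_fields:
            if name in fields:
                # Links whose target was not recreated are dropped, not guessed.
                fields[name] = [id_map[old] for old in fields[name] if old in id_map]
        remapped.append({"id": id_map.get(record["id"], record["id"]), "fields": fields})
    return remapped
```

With several deleted tables, the `id_map` has to cover all of them before any link is rewritten, which is part of what makes that case technically harder.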

Which option you choose might depend on how old your last “full backup” is. If you create a new Base from an existing backup, and that backup doesn’t contain all the data of the current base (the one where the table was wrongfully deleted), then you might lose data nonetheless.

Ideally, a tool would support both of those options and let the user decide what’s best for their business.

Summary:

Overall, this is a highly critical action, with very low probability to happen. (assuming only trusted people have admin access)
It is easily recoverable from, thanks to the trash.

But, if you don’t react within the 7-day period, you’ll end up losing the whole table. Data loss might be mitigated through the restoration of a previous snapshot, but data integrity might be affected.

3) A table was altered (edge case)

  • Potential reasons:
    • A malicious 3rd party massively changed existing content to links to a virus site or p0rn site or adware, or anything alike (most likely through the use of a leaked API token)
  • Potential recovery strategies:
    • Manual partial “data restoration”
      • Assuming the schema wasn’t altered (no deleted field nor field configuration changes), you only need to restore the data from a previous moment “back in time”.
      • Any backup that contains the data should be enough to handle this crisis. Relationships might not even be affected.
      • But, it might be difficult to spot what has changed exactly, especially if several tables are affected. And when in doubt, some people might prefer to perform a full data recovery.
    • Use an Airtable Snapshot:
      • Might be a good solution to copy/paste the altered data from the Snapshot Base, probably the fastest and most effective, assuming the Snapshot isn’t too outdated.
      • When in doubt, and in order to ensure no traces of the alteration are left, restoring a Snapshot from before the alteration might be the quickest and most efficient option, but it’ll likely lead to data loss.

Analysis:

There is no built-in way to deal with such a situation. (besides the Snapshots)

Using a Snapshot can be a good workaround, and might be “just good enough” for many use cases. It’s recommended to use it anyway to make sure you still have access to the data from the snapshot (which might expire) later on, even if you don’t think you’ll need it.

A good recovery tool for this scenario would allow:

  • To see the difference between each altered record and their counterpart from a previous backup, and allow the user to decide which one they want to keep.
    • This might be very tedious if there are tons of records, but might also be very helpful if there are only a few dozens/hundreds.
    • It might also help understand what part of the data were altered. (and figure what part of the data could be overwritten from an earlier version, and which part should be left untouched)
  • To apply a previous backup on existing records, and allow overwriting existing records.
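The comparison described in the first point can be sketched as a field-by-field diff between the current records and a backup. The record structure (`{"id": ..., "fields": {...}}`) mirrors what the Airtable API returns for a record, but the function itself is only an illustration and its name is made up.

```python
def diff_records(current, backup):
    """Map record ID -> {field name: (backup value, current value)}.

    Only records present in both lists are compared. Records created after
    the backup are ignored, and unchanged records are left out of the result.
    """
    backup_by_id = {r["id"]: r["fields"] for r in backup}
    changes = {}
    for record in current:
        old_fields = backup_by_id.get(record["id"])
        if old_fields is None:
            continue  # record did not exist when the backup was taken
        new_fields = record["fields"]
        changed = {}
        for name in sorted(set(old_fields) | set(new_fields)):
            if old_fields.get(name) != new_fields.get(name):
                changed[name] = (old_fields.get(name), new_fields.get(name))
        if changed:
            changes[record["id"]] = changed
    return changes
```

The result can then be shown to a user, who decides for each record which version to keep.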

This scenario is hard to anticipate because what the recovery strategy needs to do is very business-specific. One might not know what they need until the need arises.

Therefore, a flexible tool is probably what would be the most efficient and cover the most use-cases (instead of a limited set of options).

Also, it’s not the most common use case, but rather an edge case.

Summary:

Overall, this is a highly critical action, with medium probability to happen. Because the same token is used by all 3rd parties connecting to any of your Airtable bases, any 3rd party might be the cause, and you might not be able to know which one performed such a villainous action. Identifying the culprit might be a challenge if you have many 3rd party integrations.

The snapshot might be a good enough solution to recover from this, and might be enough in many cases.
Alternatively, a proper restoration tool that helps you locate the affected data will be most helpful to decide what to do.

4) A field was deleted

  • Potential reasons:
    • An admin mistakenly deleted the field, through the UI
    • An unknown 3rd party deleted the field on purpose, through the UI (password leaked)
  • Potential recovery strategies:
    • Undo the action:
      • You can “Undo” by clicking on the “Undo” popup, or pressing the ctrl/cmd+z shortcut.
      • This is the most effective way to cancel the destructive action, as it’ll restore all data in a breeze.
    • Recover from “Trash”:
      • Your field will be moved to the Base “Trash” and can be restored from there for 7 days.
    • Manual partial “data restoration”
      • You’d only need to restore the data from a previous moment “back in time”.
      • Any backup that contains the data should be enough to handle this scenario.
    • Use an Airtable Snapshot:
      • Might be a good solution to copy the deleted column from the Snapshot Base and paste it into the field (after creating it back manually), very effective, assuming the Snapshot isn’t too outdated. (and even if it’s outdated, only one column would be affected, which might be acceptable, or not, depending on the data sensitivity)
        • :warning: This is a good trick. Even if you’re unsure of what to do, restoring the most recent snapshot into a new Base is probably a good idea, so that it won’t automatically expire (especially on Free plan) and you still might access those data later on, if needed.
        • It might be as effective as a custom restoration tool, and might even be faster (copy/paste), depending on the size of the data.

Analysis:

The “undo” action is the easiest way to resolve the issue, and the field can still be restored from the “Trash” for 7 days afterwards! Most mistakes should be easily corrected.

Then, using a Snapshot can be a good workaround, and might be “just good enough” for many use cases. It’s recommended to use it anyway to make sure you still have access to the data from the snapshot (which might expire) later on, even if you don’t think you’ll need it.

Otherwise, a good recovery tool for this scenario would allow:

  • To load a previous backup and restore a single field from it, by overwriting all existing records.
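A sketch of the "restore a single field" idea: take the records from a backup and build update payloads that only touch that one field, split into small batches. The batch size of 10 reflects the per-request record limit of the Airtable REST API as far as I know (treat it as an assumption and check the current API documentation). The function name is hypothetical, and actually sending the batches is left out on purpose.

```python
def build_field_restore_batches(backup_records, field_name, batch_size=10):
    """Build batches of update payloads that restore one field from a backup.

    Each payload only contains `field_name`, so every other field of the
    existing record is left untouched. A record that had no value for the
    field in the backup gets None, which clears the cell.
    """
    updates = [
        {"id": r["id"], "fields": {field_name: r["fields"].get(field_name)}}
        for r in backup_records
    ]
    return [updates[i:i + batch_size] for i in range(0, len(updates), batch_size)]
```

This only works if the record IDs in the backup still exist in the base, which is the case when a field (and not the records) was deleted.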

Summary:

Overall, this ranges from a low to a highly critical action (depending on the importance of the column for your business), with very low probability to happen. (assuming only trusted people have admin access)
It is easily recoverable from, thanks to the trash.

But, if you don’t react within the 7-day period, you’ll end up losing the whole field and associated data. Data loss might be mitigated through the restoration of a previous snapshot, but data integrity might be affected.

5) A field configuration was wrongfully changed

  • Potential reasons:
    • An admin mistakenly changed the field configuration, through the UI
    • An unknown 3rd party changed the field configuration on purpose, through the UI (password leaked)
  • Potential recovery strategies:
    • Undo the action:
      • You can “Undo” by clicking on the “Undo” popup, or pressing the ctrl/cmd+z shortcut.
      • Note that you lose that ability if you refresh or close the page where you performed the action.
      • This is the most effective way to cancel the action, as it’ll restore all data in a breeze.
    • Manual partial “schema or backup restoration”
      • You could open a previous backup and see how the field was configured back then, and change it back.
        • Depending on the field and how it was altered, that could be all there is to do.
        • It might also have altered the configuration in a non-backward compatible way, and changing it back might not revert the data to their original state.
      • You could apply an existing backup to overwrite all affected records, once you’ve manually changed the field configuration back to its original state.
    • Use an Airtable Snapshot:
      • Might be a good solution to copy the altered column from the Snapshot Base and paste it into the field (after changing back its configuration manually), very effective, assuming the Snapshot isn’t too outdated. (and even if it’s outdated, only one column would be affected, which might be acceptable, or not, depending on the data sensitivity)
        • :warning: This is a good trick. Even if you’re unsure of what to do, restoring the most recent snapshot into a new Base is probably a good idea, so that it won’t automatically expire (especially on Free plan) and you still might access those data later on, if needed.
        • It might be as effective as a custom restoration tool, and might even be faster (copy/paste), depending on the size of the data.

Analysis:

The “undo” action is the easiest way to resolve the issue, but it can only be used in the current session. Even so, most mistakes should be easily corrected. (It is not possible to “Undo from Trash” for this kind of action)

Then, using a Snapshot can be a good workaround, and might be “just good enough” for many use cases. It’s recommended to use it anyway to make sure you still have access to the data from the snapshot (which might expire) later on, even if you don’t think you’ll need it.

Otherwise, a good recovery tool for this scenario would allow:

  • To load a previous backup and see for yourself how the field was configured at that time.
  • To load a previous backup and restore a single field from it, by overwriting all existing records.

Summary:

Overall, this ranges from a low to a highly critical action (depending on the importance of the column for your business), with medium probability to happen. (even assuming only trusted people have admin access, they might do it without noticing and not be able to undo by the time they figure it out)

Restoring a snapshot might be the most straightforward way to go, to restore the data.
Alternatively, a proper restoration tool that helps you restore the affected data might help restore a more up-to-date version of the data.

6) Record(s) was/were deleted

  • Potential reasons:
    • An admin mistakenly deleted the record(s)
    • An unknown 3rd party deleted the record(s) on purpose
    • A trusted 3rd party deleted the record(s) by mistake or on purpose, through the use of the API token
    • A trusted 3rd party deleted the record(s) by mistake or on purpose, through an installed App/Block
    • A user deleted the wrong record(s)
  • Potential recovery strategies:
    • Undo the action:
      • You can “Undo” by clicking on the “Undo” popup, or pressing the ctrl/cmd+z shortcut.
      • This is the most effective way to cancel the destructive action, as it’ll restore all data in a breeze.
    • Recover from “Trash”:
      • The records will be moved to the Base “Trash” and can be restored from there for 7 days.
    • Manual partial “backup restoration”:
      • You’d only need to restore the data from a previous moment “back in time”.
      • Any backup that contains the data should be enough to handle this scenario.
    • Use an Airtable Snapshot:
      • Might be a good solution to copy the deleted records from the Snapshot Base and paste them into the altered base, very effective, assuming the Snapshot isn’t too outdated. (but even though it’s outdated, only a few records would be affected, which might be acceptable, or not, depending on the data sensitivity)
        • :warning: This is a good trick. Even if you’re unsure of what to do, restoring the most recent snapshot into a new Base is probably a good idea, so that it won’t automatically expire (especially on Free plan) and you still might access those data later on, if needed.
        • It might be as effective as a custom restoration tool, and might even be faster (copy/paste), depending on how many records are concerned.

Analysis:

The “undo” action is the easiest way to resolve the issue, and the records can still be restored from the “Trash” for 7 days afterwards! Most mistakes should be easily corrected.

In the unfortunate case where you missed the deadline to restore the records (or the trash was emptied), using a Snapshot can be a good workaround, and might be “just good enough” for many use cases. It’s recommended to use it anyway to make sure you still have access to the data from the snapshot (which might expire) later on, even if you don’t think you’ll need it.

Otherwise, a good recovery tool for this scenario would allow:

  • To load a previous backup and find the records that were deleted, and then restore them.
  • To restore all related records accordingly (relationships).
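Both points can be sketched with plain comparisons between a backup and the current state of a table. This is a minimal illustration with hypothetical function names: it only finds what needs to be restored, and assumes link fields are stored as lists of record IDs in the backup.

```python
def find_deleted_records(current, backup):
    """Return the backup records whose ID no longer exists in `current`."""
    current_ids = {r["id"] for r in current}
    return [r for r in backup if r["id"] not in current_ids]


def links_to_restore(backup, deleted, link_fields):
    """Map surviving record ID -> {link field: [IDs of deleted records it pointed to]}.

    This lists the relationships that have to be re-created once the deleted
    records are restored (with new IDs, if they are re-created through the API).
    """
    deleted_ids = {r["id"] for r in deleted}
    result = {}
    for record in backup:
        if record["id"] in deleted_ids:
            continue
        for name in link_fields:
            lost = [rid for rid in record["fields"].get(name, []) if rid in deleted_ids]
            if lost:
                result.setdefault(record["id"], {})[name] = lost
    return result
```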

Summary:

Overall, this is a moderately critical action, with medium probability to happen. It’s probably the most common case of data loss, often caused by manual mistakes.

Restoring from the trash is very easy, even if you missed the “undo” action.
Alternatively, restoring a snapshot might be the most straightforward way to go, to restore the data manually.
Alternatively, a proper restoration tool that helps you restore the affected data might help restore a more up-to-date version of the data.

7) Record(s) was/were partially altered

Summary:

A complex recovery tool doesn’t seem necessary to restore a single record, unless the data is really critical for the business.

The native Airtable “revision history” feature might be good enough to see the history of the record, and help figure out what has changed.

Alternatively, restoring a snapshot might be good enough to copy/paste the record’s affected cells.
Alternatively, one might still use a tool that displays a diff between the current record and a previous value, to visually help figure out what has changed.

8) A record value (a single cell) was removed or altered (edge case)

Summary:

A complex recovery tool doesn’t seem necessary to restore a single record cell, unless the data is really critical for the business.

The native Airtable “revision history” feature might be good enough to see the history of the record, and help figure out what has changed.

Alternatively, restoring a snapshot might be good enough to copy/paste the record’s affected cells.
Alternatively, one might still use a tool that displays a diff between the current record and a previous value, to visually help figure out what has changed.

1 Reply 1
TFP
6 - Interface Innovator

I reworked the above analysis: I noticed that several of my assumptions were wrong, and I corrected them.