One table or many tables design challenge

Hi all –

I am building a production pipeline database for the nonprofit where I work. We currently track web publications, email, events, and social content in separate bases, which has made it difficult to visualize everything together and do cross-org and program-specific reporting. Our new base will unify all those content types into one pipeline for better reporting, more consistent data, and tracking relationships between content types (e.g., social post X is promoting web content Y). The different content types (Web, Email, Events, Social) share some fields, but each also has many fields unique to that type and our processes – tracking social content involves different data than web content, for example.

With that context here’s the question: One table with all items regardless of type OR a table per item type and some kind of ‘unity’ table that brings things together for reporting and planning?

Option 1: One table to track all the content regardless of type.

  • Clearly the canonical choice.
  • An item with content type X will have many empty fields that are intended only for content type Y.
  • Requires at least 75 fields, which I worry will be difficult to manage/maintain.
  • Makes automation errors easier to commit – one errant filter and you mess up tons of records.

Option 2: A pipeline table per content type (probably 4) and a “parent stubs” table. In other words, one All Content table with the fields that all content records share, with each record linking to another table holding the content-type-specific fields. I.e., for each row in ALL CONTENT where [Type]=Web, there’s a linked record in table WEB CONTENT with all the web-specific fields. Same for [Type]=Email, [Type]=Social, etc.

  • Has the potential to create orphan items disconnected from the central table.
  • More complicated for staff to use – “Why does everything have info in two places?”
  • Easier to develop each new pipeline version one at a time.
  • Functions that require support tables – journaling changes to a particular field, for example – end up being more complicated to implement because they either link to more tables or are implemented with multiple tables.
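To make the Option 2 shape concrete, here’s a minimal sketch in plain Python dicts – table and field names are illustrative, not the actual schema – of the parent/child split and a check for the orphan problem mentioned above:

```python
# Hypothetical Option 2 model: an "All Content" parent table plus a per-type
# child table, with a helper that surfaces both kinds of disconnect.

def find_orphans(all_content, web_content):
    """Return (childless_parents, parentless_children) for one content type."""
    parent_ids = {r["id"] for r in all_content if r["Type"] == "Web"}
    linked_parent_ids = {r["Parent"] for r in web_content if r.get("Parent")}
    # Parents of this type with no child record, and children with no valid parent.
    childless = sorted(parent_ids - linked_parent_ids)
    orphans = sorted(r["id"] for r in web_content if r.get("Parent") not in parent_ids)
    return childless, orphans

all_content = [
    {"id": "p1", "Type": "Web", "Title": "Report A"},
    {"id": "p2", "Type": "Web", "Title": "Report B"},
    {"id": "p3", "Type": "Email", "Title": "Newsletter"},
]
web_content = [
    {"id": "w1", "Parent": "p1", "CMS URL": "https://example.org/report-a"},
    {"id": "w2", "Parent": None},  # orphan: no link back to All Content
]

find_orphans(all_content, web_content)
```

In Airtable terms, this check would have to live in an automation or scheduled script, since nothing in the schema itself enforces the one-to-one link.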

Option 3: Separate Tables per Content Type and a unified view created through automation. In other words, each current pipeline moves into the one DB and the common fields are synced into an All Content table.

  • Fixes the orphan and data-in-two-places problems of Option 2
  • But requires lots of automation work to keep the All Content table in sync with the individual pipelines.
  • Could sync each pipeline out to another base and then sync that back in, but then field data becomes just text rather than clickable links to related items, etc.
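The sync step in Option 3 is essentially a projection of each pipeline’s shared fields into All Content rows. A minimal sketch, assuming placeholder field names (“Title”, “Phase”, “Owner”) and an invented COMMON_FIELDS list:

```python
# Sketch of the Option 3 sync: copy only the shared fields from each per-type
# pipeline table into rows destined for an All Content table.

COMMON_FIELDS = ["Title", "Phase", "Owner"]

def project_common(records, content_type, common_fields=COMMON_FIELDS):
    """Strip a pipeline table down to its shared fields, tagging each row."""
    rows = []
    for r in records:
        row = {f: r.get(f) for f in common_fields}
        row["Type"] = content_type
        row["Source ID"] = r["id"]  # pointer back to the source record
        rows.append(row)
    return rows

web = [{"id": "w1", "Title": "Report A", "Phase": "Draft",
        "Owner": "Ana", "CMS URL": "https://example.org/a"}]
social = [{"id": "s1", "Title": "Promo post", "Phase": "Scheduled",
           "Owner": "Ben", "Platform": "Twitter"}]

unified = project_common(web, "Web") + project_common(social, "Social")
```

Note the type-specific fields (“CMS URL”, “Platform”) simply never reach the unified rows, which is the whole point – and also why the automation has to run on every change to keep All Content current.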

In conclusion… I’m definitely leaning towards Option 1, but wanted to make sure there wasn’t a good method I’m missing! Your suggestions are very welcome.

Thanks much,
Dave

Hi,
I would recommend reading this:
The primary field

and then
Combining multiple tables into one table with multiple views

I did a cleanup of an old base with ~50 tables of various sizes, and joining some tables into one helped a lot. Some had nearly 100 fields, but a large part of that was redundant info: links (sometimes joining the same two tables to each other), lookups, ‘orphaned links’, etc.
You don’t need joins when data is stored in a single table.
Also, pay attention to the record limit per plan:

  • Free - 1,200 records, 2GB attachment space per base
  • Plus - 5,000 records, 5GB attachment space per base
  • Pro - 50,000 records, 20GB attachment space per base

I would recommend thinking in terms of “Separate Tables per Content Type”: imagine those tables, but then join any that have the same primary field type and whose other fields match in type and number by at least 50%.
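That “join tables whose fields mostly match” heuristic can be sketched as a quick check – here each table is summarized as a mapping of field name to field type, a representation invented purely for illustration:

```python
# Rough sketch of the merge heuristic: two tables are merge candidates when
# at least half of the smaller table's fields match by name and type.

def should_merge(fields_a, fields_b, threshold=0.5):
    """True if the fraction of matching (name, type) pairs meets the threshold."""
    shared = sum(1 for name, ftype in fields_a.items()
                 if fields_b.get(name) == ftype)
    smaller = min(len(fields_a), len(fields_b))
    return smaller > 0 and shared / smaller >= threshold

web = {"Title": "text", "Phase": "select", "Owner": "link", "CMS URL": "url"}
email = {"Title": "text", "Phase": "select", "Owner": "link", "Audience": "link"}

should_merge(web, email)  # 3 of 4 fields match, so these are merge candidates
```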

Your narrative is excellent but it lacks enough data to make a realistic recommendation. What’s missing is scale and budget constraints - how much data for each entity are you anticipating? Sometimes the best approach may be out of reach solely because of constraints related to the size of your data.

Are you aware that the sync feature could allow you to roll up the existing data model architecture into a wholly new, high-level base containing the data necessary to extract key metrics and reports? It’s very possible you needn’t change a thing about the current system to achieve your objectives.


Hi Bill –

Thanks for the feedback. Always appreciate someone asking the step-back-big-picture questions :wink:

A few answers:

  • Scale. We are a ~75 person think tank. Our current processes will track ~400 publications, ~400 mass emails, ~100 events, and ~3000 social media items in 2021. Each record has ~40 fields that are common between the various types (though many are lookups), and another 20-40 that are type-specific. (This does not account for columns that exist purely to make automations work or to track performance data. Perf data is going to be in a separate table for reasons explained below.) Each item in Pipeline has to work through between 5 and 12 different phases from concept to completion, reporting, and archiving, depending on its type. A variety of changes and conditions trigger notifications – either to my team or to the owner of the pipeline item.
  • Budget. We’re a nonprofit. The time to build this is mine. I am not a professional developer, but I have some relevant background, having gotten my start building websites back in the mid-90s and through college, and having led re-dos of multiple organizations’ data stacks, websites, etc. over the years. We have 3 staff and a few interns as users on our AT Pro Plan, basic Zapier for some connections, Mailchimp for mass email, Hootsuite for social, and Salesforce for the finance and development team.

FYSA, the table structure I’m increasingly settling on:

Pipeline. Table for all tracked items.

  • All items have an Item Type (link), Project (link), and Staff Owner (link), and multiple Change Records (link).
  • Some items have Content Submissions (link), Email Distro Lists (link), Date Submissions (link), and Email Performance Record (link)

Item Types. Table of the different types of things we produce and process requirements associated with each. (For example, a longer Report requires CEO review but a shorter commentary piece would not, etc.) Many of these columns are used as lookups to determine if a pipeline item is subject to particular automations.

Projects. The canonical source for the organization’s Program/Project hierarchy. We have ~15 policy programs, each of which has multiple projects. Includes links to the Staff table to pull in the responsible approver for pipeline items.

Staff. Name, email, slack, status for team members. Lets us automate things like status change email notifications, internal publication notifications, etc.

Change Records. Records produced via automations that record changes to particular fields, so that we have an easily email-able record of updates made as an item moves through the pipeline. (I don’t believe I can get record change history via the API.) For example, if someone changes Pipeline.Phase, an automation creates a new record linked to the modified Pipeline record that notes the date, the field that changed, the new value, and the user.
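The Change Records logic boils down to diffing the old and new versions of a record over a list of watched fields. A pure-Python sketch, with illustrative field names (the WATCHED list is an assumption, not the real config):

```python
# Sketch: given old and new versions of a Pipeline record, emit one
# change-record dict per watched field that actually changed.
from datetime import date

WATCHED = ["Phase", "Staff Owner", "Publish Date"]

def diff_record(old, new, user, watched=WATCHED, today=None):
    today = today or date.today().isoformat()
    changes = []
    for field in watched:
        if old.get(field) != new.get(field):
            changes.append({
                "Pipeline": old["id"],   # link back to the modified record
                "Field": field,
                "New Value": new.get(field),
                "User": user,
                "Date": today,
            })
    return changes

old = {"id": "rec1", "Phase": "Draft", "Staff Owner": "Ana"}
new = {"id": "rec1", "Phase": "In Review", "Staff Owner": "Ana"}
diff_record(old, new, "Ana", today="2021-06-01")
```

In practice an Airtable automation fires per field change rather than diffing whole records, but the shape of the resulting Change Record row is the same.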

Content Submissions. When programs finish writing a product, or need to send it for review, they submit it via a form. That form confirms they have completed the relevant publication requirements (inherited by the Pipeline item via the Item Types table), captures their final title and summary, and takes an attachment (Word). The plan right now is that my team gets notified of new entries here, and then we can accept a submission (probably a button that triggers an automation to copy the submitted content into the appropriate fields in the Pipeline record).

Email Distro lists. Data pulled from mailchimp of our various lists (groups and tags).

Date Submissions. Similar to content submissions but much simpler – a way for staff to update their expected completion dates so that my team can plan releases/launches.

Email Performance Record. Each item in Pipeline with ItemType.category=Email gets a record here. On a schedule (X days after the email is sent), we will pull performance numbers into Airtable from the Mailchimp API. The record name is matched against the incoming data and populated accordingly. (This process is imperfect, which is why this is a separate table – it’s too easy to inadvertently mess up lots of records.) Eventually we will also have a Social Performance Records table that pulls social item performance from the relevant APIs, but that will come later.
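The name-matching step can be kept safe by only producing updates for exact matches and flagging everything else for manual review, so a typo can’t touch unrelated records. A sketch with assumed field names (“Name”, “campaign_title”, etc. are placeholders, not the real Mailchimp payload):

```python
# Sketch: match incoming performance rows to existing Email Performance
# Records by name; unmatched rows are reported instead of written.

def match_performance(perf_records, mailchimp_rows):
    """Return (updates, unmatched) without touching any unrelated records."""
    by_name = {r["Name"]: r["id"] for r in perf_records}
    updates, unmatched = [], []
    for row in mailchimp_rows:
        rec_id = by_name.get(row["campaign_title"])
        if rec_id:
            updates.append({"id": rec_id,
                            "Opens": row["opens"],
                            "Clicks": row["clicks"]})
        else:
            unmatched.append(row["campaign_title"])  # flag for manual review
    return updates, unmatched

perf = [{"id": "e1", "Name": "June Newsletter"}]
rows = [{"campaign_title": "June Newsletter", "opens": 812, "clicks": 96},
        {"campaign_title": "Typo'd Name", "opens": 10, "clicks": 1}]
match_performance(perf, rows)
```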

Yes, I’m aware of the sync feature – but there are a few reasons for doing it fresh.

  • The existing bases were created as we grew and are inconsistent with each other in a number of ways that makes syncing to a central reporting base challenging.
  • In the intervening time, Airtable has introduced a number of features which would be a lot of work to take advantage of in separate bases.
  • Most persuasive to me: having all the working records and reporting in one place is easier for my team than switching between one central synced reporting base and multiple separate bases of working records. (It could be different if sync were a two-way affair.)

Okay - and by “one place”, you mean “one base”, I think.

Given the scale and limited budget, I don’t think this is an irrational approach; it makes sense for the reasons you make clear. And it seems you have a good grasp of the implications of tilting in favour of a flatter data model.

One of the big benefits of NoSQL platforms is the ability to manage variable record lengths to optimize performance, findability, and storage costs. In this case, however, the advantages of a unified base/table approach pay big dividends, and the cost of many empty fields seems a small price to pay for increased simplicity, fewer “parts”, and simpler reporting.

People often forget that reducing parts in anything - from physical goods to data management, to process automation - is fundamentally a wise move. I recently touched on this topic here.

One aspect of Airtable to be cautious of is the proliferation of formula-type fields needed to make stuff work. Unlike a spreadsheet, which can store a formula and its value in the same cell, Airtable pushes its users to create what I refer to as “kludge fields” to overcome the lack of a unified ability to both store and compute dynamic values. This creates an incentive to lean on formulas a bit too much, and despite a relatively small number of data fields, there sometimes exists an equal number of supporting computation fields, which drag down performance. Your annual data flow doesn’t seem likely to create any serious performance issues, though.

Recommendation -

  1. Keep the plane aloft;
  2. Experiment with a unified data model in parallel;
  3. Determine how best to transition from old to new once it shows the advantages you suspect;
  4. Cutover to the new approach if the tests and experiments show you picked the right approach.

Yup!

Thanks :slight_smile:

Yes! I find that my “programmer brain” experience, limited as it is, has served me very well when thinking about organization and process design. I often find that people worry about creating single points of failure while forgetting that adding parts is also adding new, more complicated, failure modes.

Those hacks bother me, too – they always seem like covers for a missing feature. One of my goals with this rebuild is to minimize that kind of thing. I’m hoping that well designed views and very granular automations will cover most cases. (I expect I’ll be using lots of buttons to flip checkboxes.)

On the same page with you. I’ll add: “3.5. Change management comms and process documentation.”

Many thanks again, Bill.


Interesting read - I have the same problem, so maybe restating the problem in a generic way…

What structure (tables/links) should someone create to enable one table to contain fields shared with entries in other tables, where the other tables “extend” the information in the common table?

Basically, it’s a MANDATORY one-to-one relationship that is needed?

But what you end up with is the “shared” table having a field for each “extend” table, when it should really have one field called “child” that can link to any one of the sub-tables…

But I don’t think that’s possible in Airtable, right?
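To illustrate: since Airtable has no polymorphic link field, the workaround of one link field per “extend” table means you have to validate separately that exactly one is filled and that it matches the record’s Type. Sketched with plain dicts and illustrative names:

```python
# Sketch of validating the one-link-field-per-subtype workaround: a shared
# record is valid only if exactly one child link is filled and it matches Type.

SUBTYPE_FIELDS = {"Web": "Web Link", "Email": "Email Link", "Social": "Social Link"}

def child_link_errors(record, subtype_fields=SUBTYPE_FIELDS):
    errors = []
    filled = [f for f in subtype_fields.values() if record.get(f)]
    expected = subtype_fields.get(record.get("Type"))
    if len(filled) != 1:
        errors.append(f"expected exactly one child link, found {len(filled)}")
    elif filled[0] != expected:
        errors.append(f"child link {filled[0]!r} does not match "
                      f"Type {record.get('Type')!r}")
    return errors

ok = {"Type": "Web", "Web Link": "w1"}
bad = {"Type": "Web", "Email Link": "e1"}  # linked to the wrong subtype table
child_link_errors(ok), child_link_errors(bad)
```

Inside Airtable, the closest equivalent would be a formula field that flags records failing this check, since link cardinality can’t be enforced by the schema itself.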

This topic was solved and automatically closed 15 days after the last reply. New replies are no longer allowed.