One table or many tables design challenge

David_Solimini · 2021-09-02T17:23:28+00:00

Hi all,

I am building a production pipeline database for the nonprofit where I work. We currently track web publications, email, events, and social content in separate bases, which has made it difficult to visualize everything together and to do cross-org and program-specific reporting. Our new base will unify all those content types into one pipeline for better reporting, more consistent data, and tracking of relationships between content types (i.e., social post X is promoting web content Y).

The different content types (Web, Email, Events, Social) have some shared fields, but each also has many fields that are unique to that type and our processes; tracking social content involves different data than web content, for example.

With that context, here’s the question: one table with all items regardless of type, OR a table per item type and some kind of ‘unity’ table that brings things together for reporting and planning?

Option 1: One table to track all the content regardless of type.
Clearly the canonical choice.
An item with content type X will have many empty fields that are intended only for content type Y.
Requires at least 75 fields, which I worry will be difficult to manage/maintain.
Makes it easier to make automation errors: one errant filter and you mess up tons of stuff.

Option 2: A pipeline table per content type (probably 4) and a “parent stubs” table. In other words, one All Content table with the fields that all content records have, with each record linking to another table with the content-type-specific fields. For example, for each row in ALL CONTENT where [Type]=Web, there’s a linked record in table WEB CONTENT with all the web-specific fields. Same for [Type]=Email, [Type]=Social, etc.
Has the potential to create orphan items disconnected from the central table.
More complicated for staff to use: “Why does everything have info in two places?”
Easier to develop each new pipeline version one at a time.
Functions that require support tables (journaling changes to a particular field, for example) end up being more complicated to implement because they either link to more tables or are implemented with multiple tables.

Option 3: Separate tables per content type and a unified view created through automation. In other words, each current pipeline moves into the one DB and the common fields are synced into an All Content table.
Fixes the orphan and data-in-two-places problems of Option 2.
But requires lots of automation work to make sure All Content stays synced with the individual pipelines.
Could sync each pipeline out to another base and then sync that back in, but then field data becomes just text rather than clickable links to related items, etc.

In conclusion, I’m definitely leaning towards Option 1, but wanted to make sure there wasn’t a good method I’m missing! Your suggestions are very welcome.

Thanks much,
Dave
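
To make the orphan risk in Option 2 concrete, here is a minimal sketch of an integrity check between a parent stubs table and the per-type tables. It uses plain Python lists as stand-ins for the tables; the record ids, the `detail_id` field, and the `find_orphans` helper are all hypothetical, and nothing here calls the Airtable API.

```python
# Hypothetical in-memory stand-ins for the Option 2 tables (not Airtable API calls).
all_content = [
    {"id": "rec1", "type": "Web", "detail_id": "web1"},
    {"id": "rec2", "type": "Email", "detail_id": "em1"},
    {"id": "rec3", "type": "Web", "detail_id": None},  # stub with no detail row
]
detail_tables = {
    "Web": [{"id": "web1"}, {"id": "web2"}],  # web2 has no parent stub
    "Email": [{"id": "em1"}],
}


def find_orphans(stubs, details_by_type):
    """Return (stub ids missing a detail record, detail ids missing a stub)."""
    linked = {(s["type"], s["detail_id"]) for s in stubs if s["detail_id"]}
    known = {(t, d["id"]) for t, rows in details_by_type.items() for d in rows}
    stubs_without_detail = [
        s["id"] for s in stubs if (s["type"], s["detail_id"]) not in known
    ]
    details_without_stub = sorted(d for (_, d) in known - linked)
    return stubs_without_detail, details_without_stub


print(find_orphans(all_content, detail_tables))
```

A check like this would have to run on a schedule under Option 2; with a single table (Option 1) there is nothing to reconcile.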


Alexey_Gusev
September 2, 2021

Hi,
I would recommend reading this:
The primary field

and then
Combining multiple tables into one table with multiple views

I cleaned up an old base with ~50 tables of different sizes, and joining some tables into one helped a lot. Some had nearly 100 fields, but a large part of that was redundant info such as links (sometimes joining the same two bases to each other), lookups, ‘orphaned links’, etc.
You don’t need joins when the data is stored in a single table.
Also, pay attention to the record limit per plan:

Free - 1,200 records, 2GB attachment space per base
Plus - 5,000 records, 5GB attachment space per base
Pro - 50,000 records, 20GB attachment space per base

I would recommend thinking in terms of “Separate Tables per Content Type”: imagine those tables, then merge the ones whose primary fields have the same type, if at least 50% of their other fields match in type and number.
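
One way to read the 50% rule of thumb is as a field-overlap score between two candidate tables. The sketch below is only one interpretation: it counts (name, type) pairs shared by both tables and divides by the field count of the larger table. The field names and the `field_overlap` helper are hypothetical, and the primary-field check is assumed to have passed already.

```python
def field_overlap(a, b):
    """Share of (name, type) pairs in common, relative to the larger table."""
    shared = set(a.items()) & set(b.items())
    return len(shared) / max(len(a), len(b))


# Hypothetical field lists: field name -> field type.
web = {"Title": "text", "Owner": "link", "Publish Date": "date", "URL": "url"}
email = {
    "Title": "text",
    "Owner": "link",
    "Publish Date": "date",
    "Subject Line": "text",
    "List": "link",
}

overlap = field_overlap(web, email)
print(f"{overlap:.0%}", "merge candidate" if overlap >= 0.5 else "keep separate")
```

Dividing by the larger table is the stricter choice; dividing by the smaller one would flag more pairs as merge candidates.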


Bill_French
September 2, 2021

Your narrative is excellent but it lacks enough data to make a realistic recommendation. What’s missing is scale and budget constraints - how much data for each entity are you anticipating? Sometimes the best approach may be out of reach solely because of constraints related to the size of your data.

Are you aware that the sync feature could allow you to roll up the existing data model architecture into a wholly new, high-level base containing the data necessary to extract key metrics and reports? It’s very possible you needn’t change a thing about the current system to achieve your objectives.


David_Solimini
September 10, 2021

Hi Bill –

Thanks for the feedback. Always appreciate someone asking the step-back-big-picture questions :winking_face:

A few answers:

Scale. We are a ~75 person think tank. Our current processes will track ~400 publications, ~400 mass emails, ~100 events, and ~3000 social media items in 2021. Each record has ~40 fields that are common between the various types (though many are lookups), and another 20-40 that are type-specific. (This does not account for columns that are purely for making automations work or for tracking performance data. Perf data is going to be in a separate table for reasons explained below.) Each item in Pipeline has to work through between 5 and 12 different phases from concept to completion, reporting, and archiving, depending on its type. A variety of changes and conditions trigger notifications, either to my team or to the owner of the pipeline item.
Budget. We’re a nonprofit. The time to build this is mine. I am not a professional developer, but I have some relevant background, having gotten my start building websites back in the mid-90s and through college, and having led re-dos of multiple organizations’ data stacks, websites, etc. over the years. We have 3 staff and a few interns as users on our Airtable Pro plan, basic Zapier for some connections, Mailchimp for mass email, Hootsuite for social, and Salesforce for the finance and development team.
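
As a rough sanity check on scale, the yearly volumes above can be compared with the 50,000-record Pro limit quoted earlier in the thread. The sketch below is back-of-envelope arithmetic only. `CHANGES_PER_ITEM` is a purely hypothetical guess at how many Change Records each pipeline item generates, included because the limit applies to the whole base and support tables count against it.

```python
yearly_items = {"publications": 400, "mass emails": 400, "events": 100, "social": 3000}
PRO_RECORD_LIMIT = 50_000  # Pro plan figure quoted earlier in the thread
CHANGES_PER_ITEM = 10      # hypothetical assumption, not a number from this thread

items_per_year = sum(yearly_items.values())
years_items_only = PRO_RECORD_LIMIT / items_per_year

# Each item plus its assumed change records all live in the same base.
records_per_year = items_per_year * (1 + CHANGES_PER_ITEM)
years_with_changes = PRO_RECORD_LIMIT / records_per_year

print(items_per_year, round(years_items_only, 1), round(years_with_changes, 1))
```

Pipeline items alone leave many years of headroom, but a journaling table can shrink that quickly, so an archiving plan for old change records may be worth thinking about early.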

FYSA, the table structure I’m increasingly settling on:

Pipeline. Table for all tracked items.

All items have an Item Type (link), Project (link), and Staff Owner (link), and multiple Change Records (link).
Some items have Content Submissions (link), Email Distro Lists (link), Date Submissions (link), and Email Performance Record (link)

Item Types. Table of the different types of things we produce and process requirements associated with each. (For example, a longer Report requires CEO review but a shorter commentary piece would not, etc.) Many of these columns are used as lookups to determine if a pipeline item is subject to particular automations.

Projects. Canon for the organization’s Program/Project hierarchy. We have ~15 policy programs, each of which has multiple projects. Includes links to Staff table to pull in the responsible approver for pipeline items.

Staff. Name, email, slack, status for team members. Lets us automate things like status change email notifications, internal publication notifications, etc.

Change Records. Records produced via automations that record changes to particular fields, so that we have an easily email-able record of updates made as an item moves through the pipeline. (I don’t believe I can get record change history via the API.) For example, if someone changes Pipeline.Phase, an automation creates a new record linked to the modified Pipeline record that notes the date, the field that changed, the new value, and the user.
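
The journaling logic described here can be sketched independently of Airtable as a function that compares two snapshots of a record and emits one change entry per watched field. This is an illustration under assumptions, not the actual automation: `WATCHED_FIELDS`, the output field names, and `build_change_records` are hypothetical.

```python
from datetime import datetime, timezone

WATCHED_FIELDS = {"Phase", "Expected Completion"}  # hypothetical list of journaled fields


def build_change_records(record_id, before, after, user, now=None):
    """Return one change-record dict per watched field whose value differs."""
    now = now or datetime.now(timezone.utc)
    return [
        {
            "Pipeline Item": record_id,  # link back to the modified record
            "Field": field,
            "New Value": after.get(field),
            "Changed By": user,
            "Changed At": now.isoformat(),
        }
        for field in sorted(WATCHED_FIELDS)
        if before.get(field) != after.get(field)
    ]


changes = build_change_records(
    "rec123",
    {"Phase": "Draft", "Title": "A"},
    {"Phase": "Review", "Title": "B"},
    user="dave@example.org",
)
print(changes)
```

Only watched fields are journaled, so the Title change in the example is ignored while the Phase change produces one entry.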

Content Submissions. When programs finish writing a product, or need to send it for review, they submit it via a form. That form confirms they have completed the relevant publication requirements (inherited by the Pipeline item via the Item Types table), captures their final title and summary, and takes an attachment (a Word document). The plan right now is that my team is notified of new entries here and can then accept a submission (probably via a button that triggers an automation to copy the submitted content into the appropriate fields in the Pipeline record).

Email Distro Lists. Data pulled from Mailchimp on our various lists (groups and tags).

Date Submissions. Similar to content submissions but much simpler – a way for staff to update their expected completion dates so that my team can plan releases/launches.

Email Performance Record. Each item in Pipeline with ItemType.category=Email gets a record here. On a schedule (X days after the email is sent), we will pull performance numbers into Airtable from the Mailchimp API. The record name is matched against the incoming data and populated accordingly. (This process is imperfect, which is why this is a separate table: it is too easy to inadvertently mess up lots of records.) Note that eventually we will also have a Social Performance Records table that pulls social item performance from the relevant APIs, but that will come later.
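
Because name matching is the fragile step, a cautious version would only write to a record when exactly one name matches and would set everything else aside for manual review. The sketch below assumes that approach; the `campaign`, `opens`, and `clicks` keys are hypothetical placeholders and do not reflect the real Mailchimp API response shape.

```python
def match_performance(records, incoming):
    """Pair incoming stats with records by normalized name; report the leftovers."""

    def norm(name):
        # Lowercase and collapse whitespace so trivial differences still match.
        return " ".join(name.lower().split())

    by_name = {}
    for rec in records:
        by_name.setdefault(norm(rec["name"]), []).append(rec["id"])

    matched, unmatched = {}, []
    for row in incoming:
        ids = by_name.get(norm(row["campaign"]), [])
        if len(ids) == 1:
            matched[ids[0]] = {"opens": row["opens"], "clicks": row["clicks"]}
        else:
            unmatched.append(row["campaign"])  # no match, or ambiguous duplicate names
    return matched, unmatched


records = [
    {"id": "recA", "name": "September Newsletter"},
    {"id": "recB", "name": "Event Invite"},
]
incoming = [
    {"campaign": "september  newsletter", "opens": 120, "clicks": 30},
    {"campaign": "Unknown Blast", "opens": 5, "clicks": 1},
]
matched, unmatched = match_performance(records, incoming)
print(matched, unmatched)
```

Keeping the unmatched list visible, rather than silently skipping rows, is what limits the damage when a name drifts between the two systems.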

Yes, I’m aware of the sync option. There are a few reasons for starting fresh instead.

The existing bases were created as we grew and are inconsistent with each other in a number of ways that makes syncing to a central reporting base challenging.
In the intervening time, Airtable has introduced a number of features which would be a lot of work to take advantage of in separate bases.
Most persuasive to me: having all the working records and reporting in one place is easier for my team than switching between one central synced reporting base and multiple separate bases of working records. (It could be different if sync were a 2-way affair.)


Bill_French
September 10, 2021
