I'm trying to parse a Json file from Airtable API, but it's not parsing empty fields or delimiters

Maxwell_Gren · ‎Feb 20, 2020

I’m trying to parse a Json file from Airtable API, but it’s not parsing empty fields or delimiters.
I’m currently trying to parse some data from Airtable API, and to some extent, it has worked.

My problem is, when I try to put it into a Dataframe, the empty fields are not parsed, and cause the columns to get displaced, hence the dataframe is never prop
erly built.

For example:

{‘offset’: --,
‘records’: [{‘createdTime’: time,
‘fields’: {‘A’: ‘1’,
‘B’: ‘2’,
‘C’: 3,
‘D’: ‘text’,
‘E’: object,
}]
Since I’m taking the first row of fields as dictionary keys, any time one of them is missing, for some reason it’s not separated in the code and it would be like this:

[{‘createdTime’: time,
‘fields’: {‘A’: ‘1’,
‘B’: ‘2’,
‘D’: ‘text’,
‘E’: object,
}]
And the script does not align it with its respective column. I need help because I’ve tried parsing the records in several ways and this is as close as I’ve gotten it to work. If anyone has experience with the Airtable API, it would be much appreciated. PowerBI queries can actually get the data properly so there must be a hidden separator somewhere in there.

I’m not super experienced in API requests, and I’ve mostly worked with non-complicated dataframes, so I humbly come to all of you for help.
Thanks in advance, Maxwell

Bill_French · ‎Feb 20, 2020

As you have seen, Airtable does not include JSON elements where data is not populated. This is usual and customary in almost every NOSQL systems’ API. I think this is the problem you are experiencing.

Basing all assumptions on the data in the first row from any table is a flawed data model architecture in your Python app (I assume Python only because of your reference to a “dataframe”). It must not use a row - indeed, any given row - as the basis for any dataframe assumptions.

Instead, you could use a number of approaches that will make your external app more successful.

Template Row Approach

Consider establishing a row in the source table as a “template” for your API activities. The purpose of the template row is to store representative values in every field - e.g., read this row first - establish the data model from it - and then proceed to populate the data frame. If you don’t like the idea of a “template” row in your actual data, make a copy of the table with one template row and use it as a way to convey to your API app the nature of the data to be expected.

Data Dictionary

Establish a complete data dictionary of your Airtable data model locally in your Python app. This is a little work, but it is also a good way to create documentation that will prove useful in many ways.

Data Sampling

In Python, it is very easy to retrieve an Airtable table and then sample every row to know the true field map required to use the information. By sampling the data first, you can know what fields you need to process - ostensibly building a dynamic data dictionary and thus, avoiding possible API process failures.

Sorry, there is no “separator” - there is simply nothing in the JSON payload and your code assumes every field is present. PowerBI overcomes this by using the data sampling approach - it dynamically senses all the fields that it must deal with. Any values in a visualization widget that are missing are also handled gracefully by PowerBI - something your Python app must also do.