CSV Import: Duplicate data errors

Topic Labels: Data Importing-Exporting
9318 4
Showing results for 
Search instead for 
Did you mean: 
5 - Automation Enthusiast
5 - Automation Enthusiast

I have a problem with CSV import that is messing up all the data. When I try to import CSV with more than one record with the same Primary field name, it updates another field to be the same. 

For example: 
Data 1:  Frederik (Primary Field) | Kiyomizu 1 (Field) 
Data 2:  Frederik (Primary Field) | Kiyomizu 2 (Field) 
Data 3:  Frederik (Primary Field) | Kiyomizu 3 (Field) 

It updates as: 
Data 1:  Frederik (Primary Field) | Kiyomizu 1 (Field) 
Data 2:  Frederik (Primary Field) | Kiyomizu 1 (Field)  <--- incorrect data in field
Data 3:  Frederik (Primary Field) | Kiyomizu 1 (Field)  <--- incorrect data in field 

Is there any way to work around this glitch? Please help! Thank you

4 Replies 4

Hey @changirene
This isn't actually a glitch! You're running into a structural problem with how you've built your data model.

There are two ways to deal with this.
The first method would be to identify another field that can uniquely identify your records and avoid the data overwrite. Without knowing what your data model looks like, I can only blindly recommend something like a phone number, an email address, or some sort of other id that you might have associated with your given guest records.

The second way would be to address an issue with your data design.
One of the most important best practices to observe when working with a database is that you should never have records with the same primary field name.

The primary field should always allow you to uniquely identify each and every record.
It's not a coincidence that even the first method falls under this methodology in a lot of ways.

If I were in your shoes and didn't want to correct the schema, I would simply have the CSV import create entirely new records then using something like the Dedupe extension to handle the merge and deduplication.


Hi Ben, Thank you for the quick response! 
The second solution wouldn't work as we import batch files daily, and sometimes the same guest (Primary) will book multiple room types (Field).

I am hoping to fix it, but no matter what field type, it keeps replacing the Field with incorrect data. I am confused as it only happens in this particular field. 
It is frustrating because every time I fix it manually, the next import will replace the data with incorrect data again. 

I'm really hoping there is a way to fix this! 

Yeah... it will never matter what the field types are. The key to a successful upsert (update/insert) operation in a database is your ability to uniquely identify your records, which is something that your current implementation will not allow you to do. The way to properly "fix" it would be to fix the structure that you're trying to organize your data in.
There's possibly some strange, inefficient workaround that someone else on the forums might have in mind for you, but without knowing what alternative fields you could match unique records to, you're in a rough spot.
Now... this is usually the part where Airtable (and databases as a whole) take a popularity hit for seeming like it can't do what you need it to do.

Thinking entirely in the abstract and speaking plainly, this seems like a layup to me.
It just so happens that I'm a bit bored this fine Saturday evening and want to stretch my legs a bit.
If you're curious what it might theoretically look like to implement a clean and scalable database for your needs, then here's a small demo for you to explore.

If you'd like to take a peek at the base I created, you can use this public viewing link. Currently, list views are not supported inside of publicly shared bases, so those will be absent from the visible views.
If you're actually curious to poke around and see how all the fields are laid out n' such, feel free to copy the base into your own workspace and play with it.

I don't really know anything about your data or your requirements, so I just filled in a bunch of fake data and filled in the blanks with basic contextual information to account for the things I don't know. Nonetheless, the schema generally remains the same.

Just looking at your screenshots and based on what you've posted, it seems that you're dealing with a few different objects.
You have customer objects. On the other end, you have property/location objects.
A given location can have multiple rooms associated to it.
Next, you have booking objects. A single booking is related to both a customer and the given room that's being booked.

For the sake of adding depth, we'll throw in a service object. Maybe these are things like bookable add-ons that a customer can add to a reservation.
If we draw out the schema, it ends up looking something like this:


These will be our tables.
Now that we've planned out what we're doing. We can pop open Airtable and get to work.
First is our customer table:



Here we have a single, unique record for each of our customers. In order to guarantee that we can easily identify our unique customers, I've leaned on the record autonumber to help us generate a unique primary field value. There's also some functionality to account for a contact's preferred and given name which can come in handy if you're managing communications or marketing tasks.

Next is our locations table:


Super simple. We've just captured some basic relevant information like the address and a name (think nickname) that each location goes by during everyday business operations.

Next up is the rooms table.
Here, we have the ability to define a room name, building/semantic identifier, or both.


We have all our room specific information here.
We could get pretty granular here. Some ideas that come to mind are things like tracking occupancy and getting some really robust rollups configured that can support reporting on overall capacity.

Next is the little spike I threw in for things like services and add-ons.
This is the services table:


I built a relationship between the locations table and the services table to allow us to theoretically manage the services that might be available at certain locations but not others.
This table only has four records, but a list view allows us to create a slick look at what's available for a given location.
Here's what it looks like in a grid view:


This brings us to our bookings.
I intentionally did this last because our bookings are a structural junction between customers and the rooms.


When it comes to your workflow, you would theoretically map your data import to the Bookings table.
If you're importing data from another platform, then the exact answer on how to do it just depends on what that data export looks like.

Regardless... this structure is robust, it's organized, and your relationships are clearly defined.
Additionally, you open yourself up to some really solid reporting on some of your operational metrics.
I even got a bit curious about what I could drum up as an example, so I added a quick exploratory interface dashboard for you to explore that will show you some of the visibility that you could gain when you trust in the process of clean data.



If you're importing data that has been exported from another platform, then you will need to pull apart your data and make sure you know where it should all live.
Flat importing a CSV of complex, in-depth data into a structured relational database is akin to trying to force the cube through the cylinder slot. Depending on the set you're playing with, you might get lucky and be able to kinda get it through, but you end up damaging the cube or the cylinder slot. Other times, it's just not possible.
In our case, when you force the shape through, the damage you're doing is found in the changes and workarounds that you might try to implement in order to just make it work.

For anyone reading through the forums that has been frustrated and struggling with getting things to click with Airtable, I'd be willing to bet that you didn't think through your data and the relationships between that data.

Okay. I have to stop myself. I could go for hours.
If you have any questions, curiosities, or want to explore something about what I shared, don't hesitate to reply and I'd be happy to provide any insight that I can.

Thank you for creating this Ben, this is a very robust setup. 

I have found a unique # to "merge" the records by, so by using "Booking ID" instead of "Guest Name", it works for now. 

I really hope to implement your more proper setup but running into serious learning curves everywhere! 
I am not sure how to implement my .csv in separate tables like that, but that is a whole other issue.