Googlebot blocked by robots.txt [HELP]

James_Murray · ‎Dec 28, 2019

Hi Everyone! :wave:

I am trying to use Airtable as the database for my VueJS app. I have it set up to retrieve data using Axios. Everything renders fine for the client, but I want my site to be indexed on Google. I tested it using Google’s Mobile-Friendly Test (search(dot)google(dot)com/test/mobile-friendly) to see how the content and meta tags would render, but I receive an error message. This is for Googlebot.

Expected Result

The test shows my site’s rendered HTML, with the same content a normal user sees when visiting.

Actual Result

Error: X page resources couldn’t be loaded
Resource: https://api.airtable.com/v0/mybaseandviewandstuff
Type: XHR
Status: Googlebot blocked by robots.txt

It links me here (api(dot)airtable(dot)com/robots.txt) where I can see:

User-agent: *
Disallow: /

What I’ve Tried

I’ve tried:

Updating my app’s robots.txt to add the below:

# Group 1
User-agent: Googlebot
Disallow: /nogooglebot/

# Group 2
User-agent: Airtable
Allow: /

# Group 3
User-agent: *
Allow: /

Adding a Crawl-delay for Googlebot in my robots.txt
Screaming into the void (surprised this did not work)

Any help would be appreciated! :grinning_face_with_smiling_eyes:

Edit: The site is hosted on Firebase Hosting on a custom domain

-James

Bill_French · ‎Dec 28, 2019

Hey James - welcome to the Airtable community!

Describe the climate where you are hosting this site.

Also, tell me how your site would behave if 100 users all made requests at about the same time? Would your Axios API calls open new connections for each of the 100 users? Or, would it cache data from Airtable and perform GETs based only on changed data in Airtable?
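
To make that question concrete, here is a minimal TypeScript sketch of the caching option: one cached copy with a time-to-live, where callers that arrive at the same moment share a single in-flight load. `CoalescingCache` and its `load` callback are hypothetical names, not part of Airtable or Axios. It also assumes the requests share one process (say, a small server-side function in front of Airtable); a cache inside each visitor's browser is not shared between users.

```typescript
// Hypothetical helper: caches one value for ttlMs and lets concurrent
// callers share a single in-flight load instead of each starting their own.
class CoalescingCache<T> {
  private readonly load: () => Promise<T>;
  private readonly ttlMs: number;
  private readonly now: () => number;
  private value: T | undefined = undefined;
  private hasValue = false;
  private expiresAt = 0;
  private inFlight: Promise<T> | null = null;

  constructor(load: () => Promise<T>, ttlMs: number, now: () => number = Date.now) {
    this.load = load;   // e.g. a function that performs the Airtable GET
    this.ttlMs = ttlMs; // how long a loaded value is considered fresh
    this.now = now;     // injectable clock, useful for testing
  }

  get(): Promise<T> {
    // Fresh cached value: no upstream request at all.
    if (this.hasValue && this.now() < this.expiresAt) {
      return Promise.resolve(this.value as T);
    }
    // No load running yet: start one and remember it.
    if (this.inFlight === null) {
      const pending = this.refresh();
      this.inFlight = pending;
      const clear = () => {
        if (this.inFlight === pending) this.inFlight = null;
      };
      // Forget the in-flight load once it settles, whether it succeeded or failed.
      pending.then(clear, clear);
    }
    // Every caller that arrives while the load is running gets the same promise.
    return this.inFlight;
  }

  private async refresh(): Promise<T> {
    const fresh = await this.load();
    this.value = fresh;
    this.hasValue = true;
    this.expiresAt = this.now() + this.ttlMs;
    return fresh;
  }
}
```

With something like this in place, 100 simultaneous `get()` calls trigger one call to `load`, and later calls within `ttlMs` trigger none.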

James_Murray · ‎Dec 28, 2019

Hey Bill! Thanks!

I am hosting on Firebase Hosting.

To be honest, I have not considered your question regarding API requests. Currently, every time a user goes to the page it is a new request. I will add moving to caching to my to-do list.

-James

Bill_French · ‎Dec 29, 2019

To be clear, a page request by a new user results in just one data request to Airtable? Literally one (and only one) call to the Airtable API?

Think about what a crawler is designed to do.

Imagine you have a catalog of items and a high-level query that exposes all the possible catalog items. For discussion purposes, imagine that each item is a separate page, and there are 500 items.

Google will quickly find the high-level list of products and then attempt to index all 500 items. This results in a flurry of requests to your backend - a backend designed for the opposite of “flurries”. This is when Googlebot comes face-to-face with Airtable’s max API quota of five requests per second.
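
To put numbers on that quota: spacing calls 200 ms apart keeps a job at five requests per second, so walking 500 items takes about 100 seconds. Here is a minimal TypeScript sketch of that pacing. `RequestPacer` and `paced` are hypothetical helpers, and the sketch assumes it runs inside one server-side job, because it cannot coordinate separate page renders by a crawler.

```typescript
// Hypothetical helper: hands out evenly spaced start slots so that
// consecutive requests begin at least 1000 / maxPerSecond ms apart.
class RequestPacer {
  private readonly intervalMs: number;
  private nextFreeAt = 0;

  constructor(maxPerSecond: number) {
    this.intervalMs = 1000 / maxPerSecond; // 5 per second -> 200 ms
  }

  // Reserve the next slot; returns how many ms the caller should wait
  // (relative to nowMs) before sending its request.
  reserve(nowMs: number): number {
    const startAt = Math.max(nowMs, this.nextFreeAt);
    this.nextFreeAt = startAt + this.intervalMs;
    return startAt - nowMs;
  }
}

// Run a task once its reserved slot arrives.
async function paced<T>(pacer: RequestPacer, task: () => Promise<T>): Promise<T> {
  const waitMs = pacer.reserve(Date.now());
  if (waitMs > 0) {
    await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
  }
  return task();
}
```

If 500 item requests are all queued through `paced` at once with `new RequestPacer(5)`, the last one starts 499 × 200 ms = 99.8 seconds after the first.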

Your issue is not likely related to Google’s inability to crawl your content. Rather, it’s the opposite - Googlebot is not paced slowly enough to index all of the pages that depend on a backend that is simply not designed to support indexing.

Your design should be modified to cache-forward Airtable data into a platform more capable of serving content (like Firestore).
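
As a rough TypeScript sketch of that cache-forward idea: a scheduled job copies records out of Airtable into a serving store, and pages read only from the store. `ContentStore`, `InMemoryStore`, and `syncForward` are hypothetical names. The in-memory store stands in for Firestore, `fetchAll` stands in for whatever Airtable read you already have, and the record shape is an assumption.

```typescript
// Assumed shape of a record coming from the source of truth.
interface SourceRecord {
  id: string;
  fields: Record<string, unknown>;
}

// Hypothetical abstraction over the serving layer (Firestore or similar).
interface ContentStore {
  put(id: string, fields: Record<string, unknown>): Promise<void>;
  get(id: string): Promise<Record<string, unknown> | undefined>;
}

// Stand-in store for illustration; a real one would call the database SDK.
class InMemoryStore implements ContentStore {
  private readonly items = new Map<string, Record<string, unknown>>();

  async put(id: string, fields: Record<string, unknown>): Promise<void> {
    this.items.set(id, fields);
  }

  async get(id: string): Promise<Record<string, unknown> | undefined> {
    return this.items.get(id);
  }
}

// Copy every source record into the store and report how many were written.
// Intended to run on a schedule, not on each page view.
async function syncForward(
  fetchAll: () => Promise<SourceRecord[]>,
  store: ContentStore,
): Promise<number> {
  const records = await fetchAll();
  for (const record of records) {
    await store.put(record.id, record.fields);
  }
  return records.length;
}
```

The point of the split is that visitors and crawlers only ever call `store.get`, so the number of Airtable requests depends on how often the sync runs, not on how much traffic the site gets.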