Re: Can I run NodeJS to scrape Google search

Ken_Savage · Jul 30, 2022

I want to run some code within Airtable that runs a NodeJS script to scrape Google search results and then saves the results to my base.

This is the code: Google API | ScrapingBee

Possible?

Daniel_Leeman · Jul 30, 2022

Hi Ken,

Yes, you should be able to do any kind of scraping or analysis and then use the Airtable API to write the records into Airtable as the final step in your app or script. The nice part is that Airtable generates API docs for each of your existing bases; select your base here to see them: REST API - Airtable
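A minimal sketch of that final step, assuming Airtable's REST API "create records" endpoint (`POST https://api.airtable.com/v0/{baseId}/{tableIdOrName}`, at most 10 records per request, authenticated with a bearer token). The base ID, the table name `Search Results`, the field names `Title` and `Link`, and the `{ title, link }` shape of the scraped results are hypothetical placeholders; the per-base API docs linked above show the real names for your base.

```javascript
// Sketch: turn scraped results into Airtable REST API "create records" requests.
// Hypothetical placeholders: base ID, table name, and the "Title"/"Link" fields.
const AIRTABLE_MAX_RECORDS_PER_REQUEST = 10; // the REST API accepts at most 10 records per create call

function buildCreateRequests(baseId, tableName, token, results) {
  const url = `https://api.airtable.com/v0/${baseId}/${encodeURIComponent(tableName)}`;
  const requests = [];
  // Split the results into batches the endpoint will accept.
  for (let i = 0; i < results.length; i += AIRTABLE_MAX_RECORDS_PER_REQUEST) {
    const batch = results.slice(i, i + AIRTABLE_MAX_RECORDS_PER_REQUEST);
    requests.push({
      url,
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      // Each record is wrapped in { fields: {...} } as the API expects.
      body: JSON.stringify({
        records: batch.map((r) => ({ fields: { Title: r.title, Link: r.link } })),
      }),
    });
  }
  return requests;
}
```

Each returned object can be sent with `fetch(req.url, req)` in Node 18 or later. Building the requests separately from sending them keeps the batching logic easy to check before anything touches your base.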

Bill_French · Aug 01, 2022

@Ken_Savage …

It’s far easier to use the Airtable SDK and scripting features. You don’t need Node (Airtable scripts already run JavaScript), and you don’t need an external process (Airtable supports this kind of work internally). This works in automation scripts, scripting extensions, and custom apps, and the SDK makes it far easier to capture your scraped results and store them in Airtable records and fields.
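A rough sketch of the SDK route, assuming the scraped results are `{ title, link }` objects and the target table has fields named `Title` and `Link` (both hypothetical). It relies on `table.createRecordsAsync`, which accepts at most 50 records per call, so the results are written in batches.

```javascript
// Sketch: store scraped results with the Airtable scripting SDK.
// Hypothetical: results are { title, link } objects; the table has "Title" and "Link" fields.
const SCRIPTING_MAX_RECORDS_PER_CALL = 50; // createRecordsAsync accepts up to 50 records per call

async function saveResults(table, results) {
  const createdIds = [];
  for (let i = 0; i < results.length; i += SCRIPTING_MAX_RECORDS_PER_CALL) {
    // Convert one batch of results into the { fields: {...} } shape the SDK expects.
    const batch = results.slice(i, i + SCRIPTING_MAX_RECORDS_PER_CALL).map((r) => ({
      fields: { Title: r.title, Link: r.link },
    }));
    // Await each batch so the writes happen one after another.
    const ids = await table.createRecordsAsync(batch);
    createdIds.push(...ids);
  }
  return createdIds;
}
```

Inside an Airtable script it would be called as `await saveResults(base.getTable("Search Results"), results);` (the table name is again a placeholder). Passing the table in as a parameter, rather than looking it up inside the function, also lets the function be exercised with a stand-in object outside Airtable.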

CAUTION: Scraping Google search results is a ToS violation, so I advise anyone doing this to do it right and above-board, using a Google search API or a third party that can abstract your work away from a potential ToS violation.

Example…


output.markdown('# Google Search Example');

// Only the q parameter matters; the rlz/oq/aqs/sourceid/ie parameters in a URL
// copied from the Chrome address bar can be dropped.
let query = "Airtable is Cool";
let url   = "https://www.google.com/search?q=" + encodeURIComponent(query);
output.text(url);

// A plain GET with no request body, so no Content-Type header is needed.
let options = {
    method: "GET",
};

try {
    let resp = await remoteFetchAsync(url, options);
    if (!resp.ok) {
        throw new Error(`HTTP ${resp.status}`);
    }
    // Raw HTML of the results page (see the ToS caution above).
    let data = await resp.text();
    output.inspect(data);
} catch (e) {
    output.markdown("ERROR!");
    output.markdown(e.message);
}

Ken_Savage · Aug 01, 2022

Great answer, Bill. I appreciate this insight too.

The only issues I see here are security and the Google API not allowing so many requests from the same IP. That’s where ScrapingBee comes in handy with its proxies.

Bill_French · Aug 01, 2022

That’s the whole point of using the API. It is sanctioned: you pay for it, and they want and expect lots of requests coming from the same IP address.

What is not sanctioned is scraping HTML results directly. If ScrapingBee does all of that for you, please remind me why we are having this discussion. 🙂
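For the ScrapingBee route Ken mentions, the request goes to ScrapingBee instead of Google, so Google sees ScrapingBee's proxies rather than your IP. The sketch below only builds the request URL; the endpoint path `/api/v1/store/google` and the `api_key`/`search` parameter names are assumptions to check against ScrapingBee's current documentation, and the function name is hypothetical.

```javascript
// Sketch: route a Google query through ScrapingBee rather than fetching Google directly.
// Assumption: ScrapingBee's Google Search endpoint is /api/v1/store/google and takes
// api_key and search parameters; verify both against ScrapingBee's documentation.
function buildScrapingBeeGoogleUrl(apiKey, query) {
  // URLSearchParams handles encoding of spaces and special characters in the query.
  const params = new URLSearchParams({ api_key: apiKey, search: query });
  return `https://app.scrapingbee.com/api/v1/store/google?${params}`;
}
```

The resulting URL would be fetched with `remoteFetchAsync` in the same way as the example above; keeping the API key out of the script itself (for example, in an automation input variable) is worth considering given Ken's security concern.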