
Re: Can I run NodeJS to scrape Google search

Ken_Savage
4 - Data Explorer

I want to run some code within Airtable that runs a NodeJS script to scrape Google search results and then saves the results to my base.

This is the code: Google API | ScrapingBee

Possible?

4 Replies
Daniel_Leeman
4 - Data Explorer

Hi Ken,

Yes, you should be able to do any kind of scraping or analysis and then use the Airtable API to write the records into Airtable (as the final step in your app or script). The cool part is that Airtable generates API docs for each of your existing bases, which you can open by selecting your base here: REST API - Airtable
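To make the REST route concrete, here is a minimal sketch in plain JavaScript (Node 18+ for the global `fetch`). The base ID, table name, personal access token, and the `Title`/`URL` field names are all hypothetical placeholders. Records are sent in batches of 10 because Airtable's create-records endpoint accepts at most 10 records per request.

```javascript
// Minimal sketch: write scraped results to Airtable through its REST API.
// Base ID, table name, token, and the "Title"/"URL" field names are
// hypothetical placeholders; replace them with your own.
const AIRTABLE_API = "https://api.airtable.com/v0";
const MAX_RECORDS_PER_REQUEST = 10; // Airtable's create-records limit per request

// Split results into batches and build one POST request per batch.
function buildCreateRequests(baseId, tableName, token, results) {
  const requests = [];
  for (let i = 0; i < results.length; i += MAX_RECORDS_PER_REQUEST) {
    const batch = results.slice(i, i + MAX_RECORDS_PER_REQUEST);
    requests.push({
      url: `${AIRTABLE_API}/${baseId}/${encodeURIComponent(tableName)}`,
      options: {
        method: "POST",
        headers: {
          Authorization: `Bearer ${token}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({
          records: batch.map((r) => ({ fields: { Title: r.title, URL: r.url } })),
        }),
      },
    });
  }
  return requests;
}

// Send the batches one after another, stopping at the first failed request.
async function saveResults(baseId, tableName, token, results) {
  for (const { url, options } of buildCreateRequests(baseId, tableName, token, results)) {
    const resp = await fetch(url, options);
    if (!resp.ok) {
      throw new Error(`Airtable API error ${resp.status}: ${await resp.text()}`);
    }
  }
}
```

A call would look like `await saveResults("appXXXXXXXX", "Search Results", process.env.AIRTABLE_TOKEN, scraped)`, where `scraped` is an array of `{ title, url }` objects produced by your scraper (again, hypothetical names).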

@Ken_Savage

It's far easier to use the Airtable SDK and scripting features. You don't need Node (Airtable's scripting environment already runs JavaScript), and you don't need an external process (Airtable supports this type of thing internally). This works in automation scripts, scripting extensions, and custom apps, and the SDK makes it far easier to capture your scraped results and store them in Airtable records and fields.
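Inside Airtable itself, the write step might look like the sketch below. It assumes a table with `Title` and `URL` fields (hypothetical names) and relies on `table.createRecordsAsync`, which accepts up to 50 records per call, so larger result sets are written in batches. The function takes the table as a parameter rather than reaching for the `base` global, which keeps the logic easy to reuse.

```javascript
// Minimal sketch for an Airtable scripting extension or automation script.
// `table` is an Airtable Table object; the "Title"/"URL" field names are
// hypothetical and must match fields in your own table.
const MAX_RECORDS_PER_CALL = 50; // createRecordsAsync limit per call

async function saveToTable(table, results) {
  // Map each scraped result to Airtable's { fields: {...} } record shape.
  const records = results.map((r) => ({ fields: { Title: r.title, URL: r.url } }));
  const createdIds = [];
  for (let i = 0; i < records.length; i += MAX_RECORDS_PER_CALL) {
    // Each call returns the IDs of the records it created.
    const ids = await table.createRecordsAsync(records.slice(i, i + MAX_RECORDS_PER_CALL));
    createdIds.push(...ids);
  }
  return createdIds;
}

// Usage inside Airtable, where `base` is a global:
// const table = base.getTable("Search Results");
// await saveToTable(table, [{ title: "Airtable", url: "https://airtable.com" }]);
```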

CAUTION: Scraping Google search results violates Google's Terms of Service, so I advise anyone doing this to do it right and above-board, using a Google search API or a third party that can abstract your work away from a potential ToS violation.
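For the above-board route, one option is Google's Custom Search JSON API. This sketch assumes you already have an API key and a Programmable Search Engine ID (the `cx` value); the function names are hypothetical, and quotas and availability should be checked against Google's current documentation. It uses the standard `fetch` and `URLSearchParams`; in a scripting extension you could swap in `remoteFetchAsync`.

```javascript
// Minimal sketch of querying Google's Custom Search JSON API instead of
// scraping HTML. apiKey and engineId (the "cx" value) are your own credentials.
const SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1";

// Build the request URL; URLSearchParams handles the query-string encoding.
function buildSearchUrl(apiKey, engineId, query) {
  const params = new URLSearchParams({ key: apiKey, cx: engineId, q: query });
  return `${SEARCH_ENDPOINT}?${params}`;
}

// Keep only the title and link of each result; "items" may be absent
// when the search returns nothing.
function parseResults(json) {
  return (json.items || []).map((item) => ({ title: item.title, url: item.link }));
}

async function searchGoogle(apiKey, engineId, query) {
  const resp = await fetch(buildSearchUrl(apiKey, engineId, query));
  if (!resp.ok) {
    throw new Error(`Search API error ${resp.status}`);
  }
  return parseResults(await resp.json());
}
```

The result is a plain array of `{ title, url }` objects, which is easy to map onto Airtable fields.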

Example…


output.markdown('# Google Search Example');

// Only the q parameter is needed; the browser-tracking parameters
// (rlz, oq, aqs, sourceid, ie) have been dropped.
let url = "https://www.google.com/search?q=" + encodeURIComponent("Airtable is Cool");
output.text(url);

// A plain GET with no request body, so no Content-Type header is needed.
let options = {
    method: "GET",
};

try {
    // remoteFetchAsync sends the request from Airtable's servers instead of the browser.
    let resp = await remoteFetchAsync(url, options);
    // Raw HTML: this is exactly the direct scraping the caution above warns about.
    let data = await resp.text();
    output.inspect(data);
} catch (e) {
    output.markdown("ERROR!");
    output.markdown(e.message);
}

Great answer, Bill. I appreciate this insight too.

The only issues I see here are security and the Google API not allowing so many requests from the same IP. That's where ScrapingBee comes in handy with its proxies.

That’s the whole point of using the API. It is sanctioned, you pay for it, and they want and expect lots of requests coming from the same IP address.
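Even a sanctioned API enforces quotas, and many return HTTP 429 (Too Many Requests) once you exceed them; Airtable's own REST API, for instance, limits each base to 5 requests per second. Below is a minimal retry sketch with exponential backoff for a Node script like the one described in the first reply. `fetchWithBackoff` and its options are hypothetical names, and real services may also send a `Retry-After` header that is worth honoring.

```javascript
// Retries a request while the server answers 429 (Too Many Requests),
// doubling the wait after each attempt. `fetchFn` defaults to the global
// fetch (Node 18+) and can be swapped for another fetch-compatible function.
async function fetchWithBackoff(url, options = {}, { retries = 4, baseDelayMs = 500, fetchFn = fetch } = {}) {
  for (let attempt = 0; ; attempt++) {
    const resp = await fetchFn(url, options);
    // Return on success, on any non-429 error, or once retries are used up.
    if (resp.status !== 429 || attempt >= retries) {
      return resp;
    }
    const delayMs = baseDelayMs * 2 ** attempt; // 500, 1000, 2000, ...
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```

Capping the number of retries keeps a script from looping forever when a daily quota, rather than a short burst limit, is the cause of the 429.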

What is not sanctioned is scraping HTML results directly. If ScrapingBee does all of that for you, please remind me why we are having this discussion. 🙂