Can I run NodeJS to scrape Google search

Forum|Forum|3 years ago
July 31, 2022
4 replies
33 views

Ken_Savage
New Participant

I want to run a NodeJS script within Airtable that scrapes Google search results and then saves the results to my base.

This is the code: Google API | ScrapingBee

Possible?

Daniel_Leeman
New Participant
July 31, 2022

Hi Ken,

Yes, you should be able to do any kind of scraping or analysis and then use the Airtable API to write the records into Airtable as the final step in your app or script. The cool part is that Airtable generates API docs for each of your existing bases; select your base here: REST API - Airtable
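
For anyone taking the external route, here is a minimal sketch of that final write step in plain Node. It assumes scraped rows shaped like `{title, link}`; the base ID, table name, token, and the "Title" and "Link" field names are placeholders, not anything from this thread. The REST API accepts at most 10 records per create request, so the rows are batched.

```javascript
// Sketch: build Airtable REST API "create records" requests for scraped rows.
// The base ID, table name, token, and "Title"/"Link" fields are placeholders.

const AIRTABLE_API_ROOT = "https://api.airtable.com/v0";
const MAX_RECORDS_PER_REQUEST = 10; // REST API limit for one create call

// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
    const chunks = [];
    for (let i = 0; i < items.length; i += size) {
        chunks.push(items.slice(i, i + size));
    }
    return chunks;
}

// Turn scraped rows ({title, link}) into fetch-ready request descriptions.
function buildCreateRequests(baseId, tableName, token, rows) {
    const url = `${AIRTABLE_API_ROOT}/${baseId}/${encodeURIComponent(tableName)}`;
    return chunk(rows, MAX_RECORDS_PER_REQUEST).map((batch) => ({
        url,
        options: {
            method: "POST",
            headers: {
                "Authorization": `Bearer ${token}`,
                "Content-Type": "application/json",
            },
            body: JSON.stringify({
                records: batch.map((row) => ({
                    fields: { Title: row.title, Link: row.link },
                })),
            }),
        },
    }));
}

// Send the requests one after another with fetch (built into Node 18+).
async function writeRows(baseId, tableName, token, rows) {
    for (const { url, options } of buildCreateRequests(baseId, tableName, token, rows)) {
        const resp = await fetch(url, options);
        if (!resp.ok) {
            throw new Error(`Airtable responded with HTTP ${resp.status}`);
        }
    }
}
```

Keeping the request building separate from the sending makes it easy to inspect the payloads before anything touches the base.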

Bill_French
Inspiring
August 1, 2022

@Ken_Savage …

It’s far easier to use the Airtable SDK and scripting features. You don’t need Node (because Airtable has Node), and you don’t need an external process (because Airtable supports this type of stuff internally). This works with automation scripts, scripting extensions, and custom apps, and the SDK makes it far easier to capture and store your scraped results in Airtable records/fields.

CAUTION: Scraping Google search results is a ToS violation, so I advise anyone doing this to do it right and above-board, using a Google search API or a third party that can abstract your work away from a potential ToS violation.

Example…


output.markdown('# Google Search Example');

// Search URL to fetch; the query string was copied from a browser session.
let url = "https://www.google.com/search?q=Airtable+is+Cool&rlz=1C5CHFA_enUS908US908&oq=Airtable+is+Cool&aqs=chrome..69i57j69i64.3994j0j7&sourceid=chrome&ie=UTF-8";
output.text(url);

let options = {
    "method": "GET",
    "headers": {
        "Content-Type": "application/json",
    }
};

try {
    // remoteFetchAsync sends the request from Airtable's servers, not the browser.
    let resp = await remoteFetchAsync(url, options);
    // The response is an HTML page, so read it as text.
    let data = await resp.text();
    output.inspect(data);
} catch (e) {
    output.markdown("ERROR!");
    output.markdown(e.message);
}
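
The example above only inspects the response. As a rough sketch of the storage step, assuming the results have already been parsed into `{title, link}` objects, something like the following could write them to a table. The "Title" and "Link" field names are hypothetical; `createRecordsAsync` takes at most 50 records per call, hence the batching.

```javascript
// Sketch: store parsed results with the scripting SDK instead of only inspecting them.
// "Title" and "Link" are hypothetical field names; adjust to your own table.

const BATCH_LIMIT = 50; // createRecordsAsync accepts at most 50 records per call

// Convert plain result objects into the {fields: {...}} shape the SDK expects.
function toRecordDefs(results) {
    return results.map((r) => ({ fields: { Title: r.title, Link: r.link } }));
}

// Write records in batches; returns the IDs of all created records.
async function saveResults(table, results) {
    const defs = toRecordDefs(results);
    const createdIds = [];
    for (let i = 0; i < defs.length; i += BATCH_LIMIT) {
        const ids = await table.createRecordsAsync(defs.slice(i, i + BATCH_LIMIT));
        createdIds.push(...ids);
    }
    return createdIds;
}
```

Inside a script this would be called as `await saveResults(base.getTable("Search Results"), results);`, where "Search Results" is again a made-up table name.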

Ken_Savage
Author
New Participant
August 1, 2022

Great answer, Bill. I appreciate this insight too.

The only issue I see here is security, and the Google API not allowing so many requests from the same IP. That’s where ScrapingBee comes in handy with its proxies.

Bill_French
Inspiring
August 1, 2022

That’s the whole point of using the API. It is sanctioned, you pay for it, and they want and expect lots of requests coming from the same IP address.

What is not sanctioned is scraping HTML results directly. If ScrapingBee does all of that for you, please remind me why we are having this discussion. 🙂
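
For anyone following along, here is a minimal sketch of the sanctioned route, assuming Google's Custom Search JSON API is the "Google search API" in question. The API key and search engine ID (`cx`) are placeholders you would create yourself, and the helper names are made up for illustration.

```javascript
// Sketch: query Google's Custom Search JSON API instead of scraping HTML.
// The API key and search engine ID (cx) are placeholders.

const SEARCH_ENDPOINT = "https://www.googleapis.com/customsearch/v1";

// Build the request URL for one query.
function buildSearchUrl(apiKey, engineId, query) {
    const params = [
        ["key", apiKey],
        ["cx", engineId],
        ["q", query],
    ];
    const qs = params
        .map(([name, value]) => `${name}=${encodeURIComponent(value)}`)
        .join("&");
    return `${SEARCH_ENDPOINT}?${qs}`;
}

// Reduce the API's JSON response to the fields worth storing.
function extractResults(responseJson) {
    const items = responseJson.items || []; // tolerate a response without "items"
    return items.map((item) => ({
        title: item.title,
        link: item.link,
        snippet: item.snippet,
    }));
}

// In an Airtable script the two helpers would be combined roughly like this:
//   const resp = await remoteFetchAsync(buildSearchUrl(key, cx, "Airtable is Cool"));
//   const results = extractResults(await resp.json());
```

Because the response is JSON rather than HTML, there is no page parsing to maintain.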

Like

Sign up

Login with SSO

Login to the community

Login with SSO

Scanning file for viruses.

This file cannot be downloaded