Looking for a contractor to write a script that updates records with data pulled from an API

Christopher_Vea · 2020-11-03T16:22:37+00:00

I’m looking for someone who can write a script that updates Airtable records (via the Octoparse API) as the data is scraped from the web by Octoparse, so that the Airtable database is updated within seconds of each scrape.


Bill_French
Inspiring
November 3, 2020

Hi Chris and welcome to the community.

Yep - I’ve built one other Octoparse integration, not with Airtable but with Firebase. The process is nearly identical, though, and pushing the data to Airtable or Firebase is pretty trivial. One concern is the amount of data we’re talking about. In the Firebase project, we were generating tens of thousands of records every week and using that data for business intelligence reporting and analytics. Have you considered Airtable’s relatively low record ceiling in this automated process?

This is where it gets a bit complicated. Since Octoparse doesn’t support webhooks, you need to use their lame email alerting process as a webhook proxy, which requires a role account in something like G-Suite to act as the proxy webhook server. The alternative is to poll their Task API for completed jobs, which at best would lag by five minutes. Polling every minute would risk exceeding the script trigger quota. Even if the polling were performed in an Airtable action script, this would be a concern.
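To illustrate the polling alternative, here is a minimal sketch of a poller that handles each completed task only once. It makes no claim about the real Octoparse Task API: `fetch_completed_ids` and `handle_task` are hypothetical callables you would supply yourself, and the 300-second default simply mirrors the five-minute lag mentioned above.

```python
import time
from typing import Callable, Iterable, Optional, Set


def poll_once(
    fetch_completed_ids: Callable[[], Iterable[str]],
    handle_task: Callable[[str], None],
    seen: Set[str],
) -> int:
    """Run one polling pass; return how many newly completed tasks were handled."""
    handled = 0
    for task_id in fetch_completed_ids():
        if task_id in seen:
            continue  # already processed on an earlier pass
        handle_task(task_id)
        seen.add(task_id)
        handled += 1
    return handled


def run_poller(
    fetch_completed_ids: Callable[[], Iterable[str]],
    handle_task: Callable[[str], None],
    interval_seconds: float = 300,  # assumption: five minutes between passes
    max_passes: Optional[int] = None,  # None means poll forever
    sleep: Callable[[float], None] = time.sleep,
) -> Set[str]:
    """Poll repeatedly, sleeping between passes; return the set of handled task ids."""
    seen: Set[str] = set()
    passes = 0
    while max_passes is None or passes < max_passes:
        poll_once(fetch_completed_ids, handle_task, seen)
        passes += 1
        if max_passes is None or passes < max_passes:
            sleep(interval_seconds)
    return seen
```

Keeping the `seen` set in memory is the simplest choice for a sketch; a real script would probably need to persist it somewhere so a restart does not reprocess old tasks.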

I’m pretty swamped lately but happy to have a chat or share more details with whomever you decide to hire.


Christopher_Vea
Author
New Participant
November 3, 2020

Thank you for your input and offer.

Yes, I’m pushing the limits of Airtable a bit - keeping a database of about 40,000 records. I wouldn’t mind a higher capacity, but the 50k capacity maybe helps me keep my database lean. I estimate that I will want to update 50-5,000 records per day, depending on the day.

For a bit more information about the tasks: on average, my Octoparse tasks scrape 1-4 single-page URLs per task per minute, pulling 3-5 strings/numbers per URL. Those 3-5 strings/numbers would update a single existing record.
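As a rough sketch of what the update side might look like: the Airtable REST API accepts up to 10 records per update request, so a day's worth of changes has to be split into batches. The function below only builds the request bodies and sends nothing; the record ids and the `Price` field in the usage example are made up for illustration.

```python
from typing import Dict, Iterator

# Airtable's REST API accepts at most 10 records per create/update request.
MAX_RECORDS_PER_REQUEST = 10


def build_update_payloads(updates: Dict[str, dict]) -> Iterator[dict]:
    """Yield request bodies for batched record updates.

    `updates` maps an existing Airtable record id to the fields to change.
    """
    items = [{"id": record_id, "fields": fields} for record_id, fields in updates.items()]
    for start in range(0, len(items), MAX_RECORDS_PER_REQUEST):
        yield {"records": items[start:start + MAX_RECORDS_PER_REQUEST]}


# Usage with hypothetical record ids and a hypothetical "Price" field.
example = {f"rec{i:03d}": {"Price": i} for i in range(23)}
payloads = list(build_update_payloads(example))
```

Each payload would then be sent as the JSON body of a PATCH request to the table's endpoint with whatever HTTP client you prefer. At the upper estimate of 5,000 updates in a day, that works out to 500 requests, which would need pacing to stay under Airtable's per-base rate limit.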


frank_johnson
Participating Frequently
November 3, 2020

Hi @Christopher_Veazey,
Hope you’re doing well.

I can assist with the requirement.

Kindly reach out to me by email at frank@cisinlabs.com or on Skype at cis.am3

Hope to hear from you soon.

Thanks,
Frank


Bill_French
Inspiring
November 3, 2020

Okay - by “update”, do you mean simply changing values in an existing record, or creating a new record with each “update”?

If the latter is the case, in about ten days you could potentially max out the capacity of a base. Have you considered this?
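The capacity concern can be checked with simple arithmetic. This is only a back-of-the-envelope sketch using the figures quoted in the thread (a 50,000-record cap, about 40,000 existing records, and 50-5,000 records per day), and it assumes every "update" creates a new record; the function name is made up for illustration.

```python
import math


def days_until_full(current_records: int, record_limit: int, new_records_per_day: int) -> int:
    """Return how many whole days of inserts fit before the record limit is reached."""
    if new_records_per_day <= 0:
        raise ValueError("new_records_per_day must be positive")
    headroom = max(record_limit - current_records, 0)
    return math.ceil(headroom / new_records_per_day)


# Figures from the thread, assuming each update adds one new record.
worst_case = days_until_full(40_000, 50_000, 5_000)   # busiest days
best_case = days_until_full(40_000, 50_000, 50)       # quietest days
from_empty = days_until_full(0, 50_000, 5_000)        # empty base, busiest days
```

Under these assumptions the remaining headroom lasts anywhere from 2 to 200 days, and an empty base filling at 5,000 records a day would take 10 days, so the answer depends heavily on whether records are updated in place or appended.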

Business and technical analytics tend to be composed of small data objects. As such, this may be an ideal use case for capturing details about scraped data and other task outputs in Airtable as aggregations of data objects, stored as JSON in a single long-text field. I have two clients who use this approach, and in one of the tables we have successfully stored the equivalent of 250,000 detail records inside just 20,000 actual records. This approach is not only condoned by Airtable itself, it is fundamentally encouraged, as evidenced by the JSON editor beta.
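A minimal sketch of the aggregation idea, assuming the detail rows arrive as plain dictionaries and share a grouping key (here a hypothetical `url` field): rows with the same key are packed into one compact JSON string, which would then be written to a single long-text field on one record.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def aggregate_details(details: Iterable[dict], key: str) -> Dict[str, str]:
    """Group detail rows by `key` and serialize each group as a compact JSON array."""
    groups = defaultdict(list)
    for row in details:
        # Drop the grouping key from each row; it is implied by the parent record.
        groups[row[key]].append({k: v for k, v in row.items() if k != key})
    return {
        group_key: json.dumps(rows, separators=(",", ":"))
        for group_key, rows in groups.items()
    }


# Usage with made-up scrape results.
scraped = [
    {"url": "a", "price": 1},
    {"url": "a", "price": 2},
    {"url": "b", "price": 3},
]
packed = aggregate_details(scraped, key="url")
```

Three detail rows end up in two parent records here. Long-text fields do have a size limit, so a real implementation would need to watch the length of each JSON string and start a new record when a group grows too large.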


Christopher_Vea
Author
New Participant
November 3, 2020
