Nov 03, 2020 08:22 AM
I’m looking for someone who can write a script that updates Airtable records (via the Octoparse API) as Octoparse scrapes the data from the web, so that the Airtable database is updated within seconds of each scrape.
Nov 03, 2020 08:36 AM
Hi Chris and welcome to the community.
Yep, I’ve built one other Octoparse integration, not with Airtable but with Firebase. The process is nearly identical, though, and pushing the data to Airtable or Firebase is pretty trivial. One concern is the volume of data. In the Firebase project, we were generating tens of thousands of records every week and using that data for business intelligence reporting and analytics. Have you considered Airtable’s relatively low record ceiling for this automated process?
This is where it gets a bit complicated. Since Octoparse doesn’t support webhooks, you need to use their lame email alerting process as a webhook proxy. This requires a role account in something like G-Suite to act as a proxy webhook server. The alternative is to poll their Task API for completed jobs, which would, at best, lag by about five minutes. Polling every minute risks exceeding the script trigger quota, and that would be a concern even if the polling ran in an Airtable script action.
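A minimal sketch of the polling alternative, assuming a hypothetical `fetch_status(task_id)` callable that wraps whichever Octoparse Task API call reports a task’s state and returns the string `"completed"` when the task is done. Both that callable and that status value are assumptions, not Octoparse’s documented API. The `runs_per_day` helper shows why the interval matters for the trigger quota.

```python
import time
from typing import Callable, Iterable, Set

# Hypothetical status value; check what the Octoparse Task API actually returns.
COMPLETED = "completed"


def poll_for_completed(
    fetch_status: Callable[[str], str],
    task_ids: Iterable[str],
    interval_s: float = 300.0,
    max_polls: int = 12,
    sleep: Callable[[float], None] = time.sleep,
) -> Set[str]:
    """Poll each pending task until all complete or max_polls rounds have run."""
    pending: Set[str] = set(task_ids)
    done: Set[str] = set()
    for attempt in range(max_polls):
        for task_id in sorted(pending):
            if fetch_status(task_id) == COMPLETED:
                done.add(task_id)
        pending -= done
        if not pending:
            break
        # Only wait if another polling round will follow.
        if attempt < max_polls - 1:
            sleep(interval_s)
    return done


def runs_per_day(interval_s: int) -> int:
    """How many times a trigger fires per day at a fixed polling interval."""
    return 86_400 // interval_s
```

With a 5-minute interval, `runs_per_day(300)` is 288 runs; at 1 minute, `runs_per_day(60)` is 1,440, which is where quota pressure comes from. A scheduled job would more likely call `poll_for_completed(..., max_polls=1)` once per run instead of sleeping inside the script.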
I’m pretty swamped lately but happy to chat or share more details with whoever you decide to hire.
Nov 03, 2020 09:09 AM
Thank you for your input and offer.
Yes, I’m pushing the limits of Airtable a bit, keeping a database of about 40,000 records. I wouldn’t mind a higher capacity, but the 50k limit may help me keep my database lean. I estimate that I will want to update 50-5000 records per day, depending on the day.
For a bit more information about the tasks: on average, each of my Octoparse tasks scrapes 1-4 single-page URLs per minute, pulling 3-5 strings/numbers per URL. Those 3-5 strings/numbers would update a single existing record.
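Since each scrape changes values in an existing record, the writes map onto Airtable’s update-records call (`PATCH https://api.airtable.com/v0/{baseId}/{tableName}`), which accepts at most 10 records per request. A minimal sketch that only builds the request bodies, assuming scraped values have already been matched to record IDs; the IDs and the `Price` field are hypothetical, and sending the batches over an authenticated HTTP call is left out.

```python
from typing import Any, Dict, Iterator, List

# Airtable's Web API accepts at most 10 records per create/update request.
MAX_RECORDS_PER_REQUEST = 10


def build_update_batches(
    updates: Dict[str, Dict[str, Any]],
) -> Iterator[Dict[str, List[Dict[str, Any]]]]:
    """Yield PATCH bodies of the form {"records": [{"id": ..., "fields": {...}}]}."""
    items = [{"id": rec_id, "fields": fields} for rec_id, fields in updates.items()]
    for start in range(0, len(items), MAX_RECORDS_PER_REQUEST):
        yield {"records": items[start:start + MAX_RECORDS_PER_REQUEST]}


def requests_needed(n_updates: int) -> int:
    """Ceiling division: number of batched requests for n_updates records."""
    return (n_updates + MAX_RECORDS_PER_REQUEST - 1) // MAX_RECORDS_PER_REQUEST
```

At the upper estimate of 5,000 updates per day, `requests_needed(5000)` is 500 requests. Assuming Airtable’s published rate limit of about 5 requests per second per base (worth verifying against the current docs), that is roughly 100 seconds of API time spread over the day.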
Nov 03, 2020 09:21 AM
Hope you’re doing well.
I can assist with the requirement.
Kindly reach out to me by email at firstname.lastname@example.org or on Skype at cis.am3.
Hope to hear from you soon.
Nov 03, 2020 09:46 AM
Okay, by “update”, do you mean simply changing values in an existing record, or creating a new record with each “update”?
If the latter, you could potentially max out the capacity of a base in about ten days. Have you considered this?
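One way to arrive at roughly ten days, assuming the 50,000-record base limit, the ~40,000 existing records, and an assumed rate of about 1,000 new records per day (within the stated 50-5000 range):

```latex
t_{\text{full}} = \frac{C_{\max} - N_0}{r}
               = \frac{50{,}000 - 40{,}000}{1{,}000\ \text{records/day}}
               = 10\ \text{days}
```

At the top of the range (5,000 new records per day) the same headroom would last only 2 days.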
Business and technical analytics tend to consist of small data objects. As such, this may be an ideal use case for capturing details about scraped data and other task outputs in Airtable as aggregations of data objects, stored as JSON in a single long-text field. I have two clients who use this approach, and in one of the tables we have successfully stored the equivalent of 250,000 detail records inside just 20,000 actual records. This approach is not only condoned by Airtable itself, it is actively encouraged, as evidenced by the JSON editor beta.
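A minimal sketch of the aggregation idea, assuming each Airtable record holds its detail objects as a JSON array in one long-text field. The 100,000-character cap below is an assumption about Airtable’s long-text limit, and the `url`/`price` keys are hypothetical.

```python
import json
from typing import Any, Dict, List

# Assumed character limit for an Airtable long-text field; verify before relying on it.
LONG_TEXT_MAX_CHARS = 100_000


def append_detail(long_text: str, detail: Dict[str, Any]) -> str:
    """Append one detail object to the JSON array stored in a long-text cell."""
    items: List[Dict[str, Any]] = json.loads(long_text) if long_text.strip() else []
    if not isinstance(items, list):
        raise ValueError("long-text field does not hold a JSON array")
    items.append(detail)
    # Compact separators keep the stored string short.
    encoded = json.dumps(items, separators=(",", ":"))
    if len(encoded) > LONG_TEXT_MAX_CHARS:
        raise ValueError("aggregated JSON would exceed the long-text limit")
    return encoded
```

Each returned string would be written back to the same field with a normal record update, so many detail rows cost only one Airtable record.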
Nov 03, 2020 12:52 PM
Changing values in existing records
Nov 03, 2020 01:44 PM
Then scale should not be an issue. Carry on!
Nov 05, 2020 02:40 AM
My new app, Data Fetcher, might be able to help with this, Christopher. It provides a way to run, save and schedule API requests within Airtable.
I’m hoping to get it on the marketplace in the next couple of weeks, or I can install it manually in your base using the CLI if you like.