Airtable should buy or implement SimpleScraper's functionality

It would be great to see a general-purpose data extraction tool, and many companies are trying to build one.

I think what may inhibit progress in this area is the possibility of ‘only’ achieving 90% reliability, which, while an amazing feat, is effectively no different from the brittleness inherent in today’s scraping tools.

Relevant xkcd ; )

It’s certainly worth trying though as it would be game changing.

Perhaps true if you set aside benefits like these:

  1. Never configuring a scraper to get the data in the first place;
  2. Never having a scraper break;
  3. Never needing to monitor if a scraper is performing;
  4. Never modifying a scraper after it has fallen over;
  5. Never getting egg on your face when a scraper fails;
  6. Never spending any time analyzing the CSS patterns.

I believe that if you eliminate all of these activities and achieve 80% accuracy, the cost-benefit ratio tips well in your favor. You then have a chance to use additional pattern detection to clean up the remaining 20% of inaccurate data.

As such, even if you only eliminate 50% of the effort spent on these six activities, it is very different from the brittleness inherent in today’s scraping tools.

If you have metrics concerning these six classes of profit-robbing activity, I have a hunch the math would demonstrate good reason to invest in a different technical approach.
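To make that hunch concrete, here is a rough sketch of the math in Python. Every number in it is a made-up placeholder, not a measurement, and the names (`HOURS_PER_MONTH`, `hours_saved`, `net_hours`) are just ones I picked for illustration; plug in your own metrics for the six activities.

```python
# Hypothetical hours per month spent on each of the six activities above.
# These figures are placeholders, NOT real measurements; substitute your own.
HOURS_PER_MONTH = {
    "configuring scrapers": 6.0,
    "fixing broken scrapers": 4.0,
    "monitoring scrapers": 3.0,
    "modifying after a failure": 4.0,
    "explaining failures": 1.0,
    "analyzing CSS patterns": 2.0,
}


def hours_saved(hours_by_activity, fraction_eliminated):
    """Hours per month recovered if this fraction of the effort goes away."""
    return sum(hours_by_activity.values()) * fraction_eliminated


def net_hours(hours_by_activity, fraction_eliminated, cleanup_hours):
    """Recovered hours minus the time spent cleaning up inaccurate rows."""
    return hours_saved(hours_by_activity, fraction_eliminated) - cleanup_hours


if __name__ == "__main__":
    # Assume only half the effort is eliminated and that cleaning up the
    # inaccurate rows costs 3 hours a month (also an assumed figure).
    print(hours_saved(HOURS_PER_MONTH, 0.5))
    print(net_hours(HOURS_PER_MONTH, 0.5, 3.0))
```

With these placeholder inputs the approach comes out ahead as long as cleanup takes fewer hours than the effort it replaces; whether that holds in practice depends entirely on your own numbers.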

Bear in mind, I’m certainly not a scraping expert; I just spend a lot of time revamping process automation, and quite often large swaths of activity can be eliminated by changing the underlying infrastructure.

@chrismessina - Hey Chris, a quick update on this:

We’ve been working on a custom block that allows you to easily import data into Airtable using Simplescraper. Here’s a preview of it in action:

In the demo we import data from Stack Overflow, but the source could be any website that you choose - simply use the dropdown to change recipes.

Let me know if this is similar to what you had in mind?

Airtable’s custom blocks are still in preview so no ETA but wanted to keep you posted and listen to any suggestions that you may have.

Peace, Mike


Ooo… looks promising! So — how will this apply to arbitrary URLs?

Here’s a use case: I post a lot of stuff to Product Hunt. A lot of the apps I post are in Apple’s App Store. I’d like to be able to collect the app’s:

  • icon
  • the gallery of images, not just at the 300px size, but the 960px size
  • description
  • website

How could I use your block to do this?

This looks amazing! Any improvement with this block/script project?

I would be interested in using it to import products from a website where I cannot use an API.

Mike from Simplescraper, any update on your block? This seems like a great idea.

Hello! Is the project dead?