Re: Airtable should buy or implement SimpleScraper's functionality

chrismessina
6 - Interface Innovator

The SimpleScraper (go to simplescraper dot io) Chrome extension looks super powerful and useful for sucking up data into Airtable!

37 Replies

Funny you should mention this - about a decade ago I created a server-side JavaScript library that dynamically alters the CSS and HTML structure in non-repeatable ways without changing the rendering, making it impossible for scrapers to capture any data. I just licensed it to a prominent data science team to protect their data from automated scraping and to encourage data consumers to go through the proper mechanisms for real-time updates.
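For anyone wondering what that could look like in practice, here is a minimal sketch of one piece of the idea: renaming CSS classes to fresh random tokens on every response so that class-based selectors stop matching between requests. This is not the library described above, whose internals aren't shown in this thread. The function name `obfuscateClasses`, the regex-based rewriting, and the use of `Math.random` are all assumptions made for illustration; a real implementation would need a proper HTML/CSS parser and would also have to vary the element structure.

```javascript
// Hypothetical helper (not from the library discussed above): gives every
// class name a fresh random token each time it is called.
function obfuscateClasses(html, css) {
  const mapping = new Map();

  // Return the token for a class name, creating one on first use.
  const rename = (name) => {
    if (!mapping.has(name)) {
      // Math.random is enough for a sketch; it is not cryptographically strong.
      mapping.set(name, "c" + Math.random().toString(36).slice(2, 10));
    }
    return mapping.get(name);
  };

  // Rewrite class="a b" attributes in the markup (double-quoted only).
  const newHtml = html.replace(/class="([^"]*)"/g, (_, names) =>
    'class="' + names.split(/\s+/).filter(Boolean).map(rename).join(" ") + '"'
  );

  // Rewrite .a selectors in the stylesheet, but only for classes that
  // actually appeared in the markup; everything else is left untouched.
  const newCss = css.replace(/\.([A-Za-z_][\w-]*)/g, (match, name) =>
    mapping.has(name) ? "." + mapping.get(name) : match
  );

  return { html: newHtml, css: newCss, mapping };
}
```

Calling it twice on the same input should give two different sets of class names while the page renders identically, so a scraper that saved a selector like `.price` from one response finds nothing in the next.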

Classic game of cat and mouse, but that 10-year-old cat is getting close to retirement now :winking_face:

LOL! I wish - I’ve updated it 11 times. I think I will retire before it does.

But that’s the point - there’s no way to predict the next rendering change, the mouse is in the snare; end of game.

The mouse is crossed now with Airtable and replicates through automation. Catch me if you can.

Plot twist: Airtable was a feline all along.

Not outright saying so is simply the more profitable option right now.

But if the scraping crowd ever truly starts migrating to Airtable en masse, I think they’d crack down on this use case way before our remoteFetchAsync queues started yielding error after error haha. I’m sure there are domains that get significant traffic from Airtable, but most would probably just go for the outright CORS block. That would threaten the platform’s modularity, which is still a core part of its sales pitch, from what I can tell.

So, what’s that, 10 or 12 lines of this month’s flavor of Node middleware? In order for that Abagnale line to not even reach the target, that is? Not a rhetorical question btw, I just don’t know servers that well.
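As a rough answer to the line-count question, a Connect/Express-style middleware that refuses requests by header does fit in about that many lines. The sketch below filters on the `User-Agent` header, since a fetch made from a server is not subject to the browser's CORS enforcement. The pattern list is entirely hypothetical: what any given tool actually sends in that header is an assumption here, and a determined client can change it.

```javascript
// Hypothetical blocklist; the User-Agent a given tool really sends
// is an assumption and would have to be checked against server logs.
const BLOCKED_AGENT_PATTERNS = [/example-scraper/i];

// Connect/Express-style middleware signature: (req, res, next).
function blockAgents(req, res, next) {
  const agent = String(req.headers["user-agent"] || "");
  if (BLOCKED_AGENT_PATTERNS.some((pattern) => pattern.test(agent))) {
    // Refuse before the request reaches any route handler.
    res.statusCode = 403;
    res.end("Forbidden");
    return;
  }
  next();
}
```

It only relies on `req.headers`, `res.statusCode`, and `res.end`, so it can sit in front of a plain Node `http` handler as well as an Express app.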

Who would dare to issue a blanket block when most of the uses are harmless? If it’s legal to scrape LinkedIn data to analyze and predict employee job loyalty, and Facebook photos for facial recognition used by law enforcement agencies worldwide, all without any punishment, on what grounds would a website completely block Airtable, a provider of multipurpose tools, from accessing it? The chances that Airtable would sue them are limited, but other individuals might. The information on Amazon and other sites is publicly available. As long as it is only used for analysis and research, legal trouble is probably unlikely. If there is ever a law prohibiting outright scraping, I would say that law will first of all regulate the personal information of individuals. That is likely going to be illegal sooner or later. While the Airtable scraper may get blocked, SimpleScraper would probably get around it by using different IP addresses. It’s not illegal.

This isn’t a question of legality or boldness. And if people are willing to pay for CSS class obfuscation solutions as mentioned above, they’re surely willing to block a productivity tool from accessing their servers.