Dec 04, 2019 12:19 PM
The SimpleScraper (go to simplescraper dot io) Chrome extension looks super powerful and useful for sucking up data into Airtable!
Jul 10, 2021 02:08 PM
Funny you should mention this - about a decade ago I created a [server-side] JavaScript library that dynamically alters the CSS and HTML structure in non-repeatable ways without changing the rendering, making it impossible for scrapers to capture any data. I just licensed it to a prominent data science team to protect their data from automated scraping and to encourage data consumers to go through the proper mechanisms for real-time updates.
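For anyone curious what this kind of obfuscation looks like in principle, here is a minimal sketch. It is not the licensed library described above. It only shows the simplest piece of the idea: renaming CSS classes to fresh random names on every response so that selectors a scraper recorded yesterday no longer match today. The names `obfuscate` and `randomName` are made up for illustration, and the regex-based rewriting is an assumption that would need a real HTML/CSS parser in practice.

```javascript
const crypto = require("node:crypto");

// Hypothetical helper: returns a fresh random class name, e.g. "c9f2a1b3c4d".
function randomName() {
  return "c" + crypto.randomBytes(5).toString("hex");
}

// Rewrite every class in `html` and every matching `.class` selector in `css`
// with a per-call random name. The styles still apply, so rendering is
// unchanged, but the names differ on every call.
function obfuscate(html, css) {
  const mapping = new Map();
  const rename = (name) => {
    if (!mapping.has(name)) mapping.set(name, randomName());
    return mapping.get(name);
  };

  // Replace each name inside class="..." attributes (double-quoted only).
  const newHtml = html.replace(/class="([^"]*)"/g, (_, names) =>
    `class="${names.trim().split(/\s+/).filter(Boolean).map(rename).join(" ")}"`
  );

  // Replace only selectors whose class name was seen in the HTML.
  const newCss = css.replace(/\.([A-Za-z_][\w-]*)/g, (match, name) =>
    mapping.has(name) ? "." + mapping.get(name) : match
  );

  return { html: newHtml, css: newCss, mapping };
}
```

Calling `obfuscate('<div class="price">1</div>', '.price{color:red}')` twice gives two differently named but identically rendered results. A scraper keyed on `.price` finds nothing in either.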
Jul 11, 2021 02:51 PM
Classic game of cat and mouse, but that 10-year-old cat is getting close to retirement now :winking_face:
Jul 11, 2021 02:53 PM
LOL! I wish - I’ve updated it 11 times. I think I will retire before it does.
Jul 11, 2021 02:55 PM
But that’s the point - there’s no way to predict the next rendering change. The mouse is in the snare; end of game.
Jul 11, 2021 03:24 PM
The mouse is crossed now with Airtable and replicates through automation. Catch me if you can.
Jul 13, 2021 06:22 PM
Plot twist: Airtable was a feline all along.
Not outright saying so is simply the more profitable option right now.
But if the scraping crowd ever truly starts migrating to Airtable en masse, I think Airtable would crack down on this use case way before our remoteFetchAsync queues started yielding error after error, haha. I’m sure there are domains that get significant traffic from Airtable, but most sites would probably just go for the outright CORS block. That would threaten the platform’s modularity, which is still a core part of its sales pitch, from what I can tell.
So, what’s that, ten, twelve lines of this month’s flavor of Node middleware? In order for that Abagnale line to not even reach the target, that is? Not a rhetorical question btw, I just don’t know servers that well.
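For a rough sense of scale, a block like that could be sketched as below. Everything specific here is an assumption: the User-Agent pattern and the IP address are placeholders, not the real identifiers of Airtable or any scraper. The sketch also filters on the server by request metadata rather than by CORS headers, because CORS is enforced by browsers and does not stop a request sent from another server.

```javascript
const http = require("node:http");

// ASSUMPTION: placeholder values. The real User-Agent strings and IP ranges
// of any given tool are not known here.
const BLOCKED_AGENTS = [/example-scraper/i];
const BLOCKED_IPS = new Set(["203.0.113.7"]); // documentation-only address

// Returns true when the request should be refused.
function isBlocked(userAgent, ip) {
  return BLOCKED_IPS.has(ip) || BLOCKED_AGENTS.some((re) => re.test(userAgent || ""));
}

// Express-style middleware signature (req, res, next).
// Note: behind a proxy, or with IPv6-mapped addresses, the IP needs normalizing.
function blockScrapers(req, res, next) {
  if (isBlocked(req.headers["user-agent"], req.socket.remoteAddress)) {
    res.statusCode = 403;
    res.end("Forbidden");
    return;
  }
  next();
}

// Wire it into a bare http server; nothing listens until start() is called.
function start(port) {
  return http
    .createServer((req, res) => blockScrapers(req, res, () => res.end("ok")))
    .listen(port);
}
```

So the estimate of ten to twelve lines is about right for the blocking logic itself. The hard part is knowing which agents and addresses to put in the lists, and keeping them current once the other side starts rotating them.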
Jul 16, 2021 08:31 AM
Who would dare to issue a wide block when most of the uses are harmless? If it’s legal to scrape LinkedIn data to analyze and predict employee job loyalty, and Facebook photos for face recognition by law enforcement agencies worldwide, without any punishment, on what grounds would a site completely block Airtable, which provides multipurpose tools, from accessing it? The chance that Airtable would sue them is small, but other individuals may. The information on Amazon and other sites is publicly available. As long as it is only used for analysis and research, legal action is probably unlikely. If there is ever a law prohibiting scraping outright, I would say that law will in the first place regulate the personal information of individuals. That is likely going to be illegal sooner or later. While the Airtable scraper may get blocked, SimpleScraper would probably get around it by using different IP addresses. It’s not illegal.
Jul 27, 2021 09:42 AM
This isn’t a question of legality or boldness. And if people are willing to pay for CSS class obfuscation solutions as mentioned above, they’re surely willing to block a productivity tool from accessing their servers.