I am creating an Airtable base for HR recruiting purposes. I use an Airtable form to collect information from potential candidates.
I would like to collect only their LinkedIn Profile URL and contact info (email or phone).
Using the collected LinkedIn Profile URL, I would like to pull info such as Name, Location, Experience, and Education from LinkedIn into Airtable fields. Can this be done with the Scripting block, or is there a better alternative?
Your help is greatly appreciated. Thanks in advance.
Banning (LinkedIn knows precisely who or what is scraping, and your actions could affect the ability of people inside the Airtable domain to access their LinkedIn accounts).
Ideally, use the LinkedIn API (I think it requires some payment).
Scale: Airtable Script blocks are very fast when working with fields and tables but very slow for everything else, and crawlers need lots of processing cycles.
I doubt there are any limits on HTTP requests to external websites, but a really thorough crawler built in a Script block may demonstrate why Airtable should discourage (or prohibit) doing this.
Finally (in my opinion), no one should ever build a crawler in a Script block: it would be too brittle, your investment would be exposed to far too many ways it could (or should) be shut down, and it's not a sustainable approach. Better to use something more suited to the task.
Certainly, a small number of HTTP(S) fetches should not be a big deal, and it might work really well until it doesn't.
Hello @Aswanth_Selva_Pragat, @Bill.French,
I had started building a Python web crawler app that would then populate an Airtable base via the Airtable public API.
This web crawler had to address several concerns of mine, including the ones Bill explains here.
I have always preferred a site's API to any other data collection technique, for the reasons Bill gave and many others I discovered while running small-scale, experimental web crawling projects in Python.
I offer this in good faith, in case it ever needs to be put into practice.
That said, I could share my curated list of URLs on controlling and automating Chrome/Chromium with JavaScript, if anyone is interested.
But today my priorities are the Airtable Scripting and Custom blocks, so my web crawling projects have gone dormant.
All the points you have raised make total sense.
We do have access to LinkedIn APIs and are planning to leverage them for this process. But I don't think the Scripting block is the right place to build it. A separate instance that reads from Airtable through the Airtable API, calls the LinkedIn APIs to pull the info, and writes it back into Airtable through the Airtable API would probably be the better option.
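A minimal Python sketch of that separate-instance approach, assuming the Airtable REST API (list records with `offset` pagination, update records with PATCH in batches of at most 10) and an API token sent as a Bearer header. The field names and helper names are hypothetical, and `fetch_profile` is a hypothetical callable standing in for whatever LinkedIn API access you have: it should return a dict keyed by Airtable field name, or `None`. No LinkedIn endpoint is shown because the right one depends on your API agreement.

```python
from __future__ import annotations

import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Iterator, Optional

AIRTABLE_API = "https://api.airtable.com/v0"
BATCH_SIZE = 10        # Airtable's REST API updates at most 10 records per request
REQUEST_PAUSE_S = 0.2  # stay under the 5 requests/second per-base rate limit

# Hypothetical field names; rename to match your base.
URL_FIELD = "LinkedIn Profile URL"
PROFILE_FIELDS = ("Name", "Location", "Experience", "Education")


def list_records(base_id: str, table: str, token: str) -> Iterator[dict]:
    """Yield every record in the table, following Airtable's `offset` pagination."""
    url = f"{AIRTABLE_API}/{base_id}/{urllib.parse.quote(table)}"
    offset: Optional[str] = None
    while True:
        query = f"?{urllib.parse.urlencode({'offset': offset})}" if offset else ""
        req = urllib.request.Request(url + query, headers={"Authorization": f"Bearer {token}"})
        with urllib.request.urlopen(req) as resp:
            page = json.load(resp)
        yield from page.get("records", [])
        offset = page.get("offset")
        if not offset:
            return
        time.sleep(REQUEST_PAUSE_S)


def build_update_batches(
    records: list[dict], fetch_profile: Callable[[str], Optional[dict]]
) -> list[dict]:
    """Build PATCH bodies for records that have a profile URL but no Name yet."""
    updates = []
    for record in records:
        fields = record.get("fields", {})
        profile_url = fields.get(URL_FIELD)
        if not profile_url or fields.get("Name"):
            continue  # nothing to look up, or already enriched
        profile = fetch_profile(profile_url)
        if not profile:
            continue  # lookup failed; leave the record for a later run
        new_fields = {name: profile[name] for name in PROFILE_FIELDS if name in profile}
        if new_fields:
            updates.append({"id": record["id"], "fields": new_fields})
    # Split into request bodies of at most BATCH_SIZE records each.
    return [{"records": updates[i:i + BATCH_SIZE]} for i in range(0, len(updates), BATCH_SIZE)]


def patch_batches(base_id: str, table: str, token: str, batches: list[dict]) -> None:
    """Send each batch as a PATCH, which updates only the fields it names."""
    url = f"{AIRTABLE_API}/{base_id}/{urllib.parse.quote(table)}"
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    for batch in batches:
        body = json.dumps(batch).encode("utf-8")
        req = urllib.request.Request(url, data=body, headers=headers, method="PATCH")
        with urllib.request.urlopen(req) as resp:
            resp.read()
        time.sleep(REQUEST_PAUSE_S)
```

Wired together, one run would look something like `patch_batches(base, table, token, build_update_batches(list(list_records(base, table, token)), fetch_profile))`, scheduled from wherever the separate instance lives. Keeping the LinkedIn lookup behind a plain callable means the Airtable side can be tested without calling LinkedIn at all.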
I built a way to collect data using the CSS selector feature of the Web Clipper rather than the Scripting block. This is most useful for an individual recruiter or sales rep who wants to collect information from accounts as they browse.
It may require running the Dedupe block to clean up every once in a while, but I didn't find it too much of a hassle when I was using Airtable every day in my sales job.
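As a rough illustration of the matching idea only (not how the Dedupe block itself works), here is a Python sketch that groups records by a normalized profile URL, so the same profile clipped or submitted twice shows up as one group. It assumes the URL lives in a field named `LinkedIn Profile URL` (hypothetical), that query strings such as tracking parameters can be ignored, and that profile paths can be compared case-insensitively; all three are assumptions to check against your own data.

```python
from __future__ import annotations

from collections import defaultdict
from urllib.parse import urlsplit

URL_FIELD = "LinkedIn Profile URL"  # hypothetical field name


def normalize_profile_url(url: str) -> str:
    """Reduce a profile URL to host + path: lowercased, no 'www.', no query, no trailing slash."""
    raw = url.strip()
    if "://" not in raw:
        raw = "https://" + raw  # form entries often omit the scheme
    parts = urlsplit(raw)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Assumption: profile paths are case-insensitive, so lowercase them too.
    path = parts.path.rstrip("/").lower()
    return host + path


def find_duplicates(records: list[dict]) -> dict[str, list[str]]:
    """Map each normalized URL that appears more than once to its record IDs."""
    groups: dict[str, list[str]] = defaultdict(list)
    for record in records:
        url = record.get("fields", {}).get(URL_FIELD)
        if url:
            groups[normalize_profile_url(url)].append(record["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

The returned record IDs can then be reviewed by hand or in the Dedupe block. Merging is deliberately left out, because which record to keep is a judgment call.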
Hey Olpy, I am looking to hire someone to help scrape data from TikTok posts and feed it into Airtable. Is this something you can help with, or could you point me in the right direction? Thanks a lot!