It looks like import.io has been put on ice for new users (no new signups for free-tier accounts, just the sales pitch), so I am hoping this project makes some progress too. Any news @Mike_Simplescraper?
The one issue with Simplescraper is that it doesn't look for text on the page based on some logic; rather, it looks at the structure of the page. For example, there can be a page 1 with text as follows:

AAAA: XXXX
BBBB: YYYY
CCCC: ZZZZ

where AAAA, BBBB, CCCC are my titles of fields.
Page 1 is crawled correctly.
The issue arises when some of the elements on the page are missing.
If there is another page 2 (same site, essentially) with the same structure, but the element AAAA: XXXX is missing (not listed on the page), then because of the way Simplescraper works, it will crawl the page and record:

AAAA: YYYY
BBBB: ZZZZ
CCCC: (empty, because the data got shifted up into AAAA and BBBB)
Simplescraper should look at the page structure but also look for similar blocks of text. If I create a recipe where AAAA, BBBB, and CCCC are present, and AAAA is missing on a page, it should still be able to fill in BBBB and CCCC correctly and not shift things up.
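To illustrate what I mean, here is a minimal sketch (not Simplescraper's actual implementation; the field names and page text are just the placeholders from my example) of matching each field by its label text instead of its position, so a missing field stays empty rather than shifting the remaining values up:

```python
FIELDS = ["AAAA", "BBBB", "CCCC"]  # field labels defined in the recipe

def extract(page_text: str) -> dict:
    """Match fields by label text, not by position on the page."""
    found = {}
    for line in page_text.splitlines():
        label, sep, value = line.partition(":")
        label = label.strip()
        if sep and label in FIELDS:
            found[label] = value.strip()
    # Every recipe field appears in the result; missing ones stay empty.
    return {f: found.get(f, "") for f in FIELDS}

page1 = "AAAA: XXXX\nBBBB: YYYY\nCCCC: ZZZZ"
page2 = "BBBB: YYYY\nCCCC: ZZZZ"  # AAAA is missing on this page

print(extract(page1))  # all three fields filled
print(extract(page2))  # AAAA empty, BBBB and CCCC still correct
```

With this approach page 2 would come back with AAAA empty and BBBB/CCCC intact, which is the behavior I was expecting.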
The specific page is an Amazon product page.
The specific issue is caused by the product details element for the "Discontinued by manufacturer" property. I am not interested in that one; I just want to start the scrape at Product Dimensions.
- Is Discontinued By Manufacturer : No
I can solve this by having two different recipes.
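Another workaround that avoids a second recipe would be to post-process the scraped rows and drop everything before the Product Dimensions label. A sketch, assuming the product details come back as "label : value" strings (the helper name and sample rows are my own, not part of Simplescraper):

```python
def trim_to(rows, start_label="Product Dimensions"):
    """Drop leading rows until the row whose label matches start_label."""
    for i, row in enumerate(rows):
        if row.split(":", 1)[0].strip() == start_label:
            return rows[i:]
    return rows  # label not found: keep everything

rows = [
    "Is Discontinued By Manufacturer : No",
    "Product Dimensions : 10 x 5 x 2 inches",
    "Item Weight : 1.2 pounds",
]
print(trim_to(rows))  # starts at the Product Dimensions row
```

This kind of step could run in Zapier or Integromat before the data lands in Airtable.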
I know this is not a Simplescraper support forum, but since I am pulling the results into Airtable I wanted to mention it in case @Mike_Simplescraper happens to be around reading this.
Other than that it works well. Is a native Airtable integration even necessary, and what would be the benefit of Airtable buying it? I can already use Zapier or Integromat to process the results.
Anyhow, earlier there was a discussion about local vs. cloud scraping. From the Simplescraper FAQ:
What’s the difference between local scraping and cloud scraping?
Using the extension to select and download data is local scraping. It’s simple and free. If you scrape the same pages often, need to scrape multiple pages, or want to turn a website into an API, you can create a scrape recipe that runs in the cloud. Cloud scraping has advantages like speed, page navigation, a history of scrape results, scheduling and the ability to run multiple recipes simultaneously.