Wondering if anybody can suggest efficient way to extract text and put into fields.
I was going to play with LEN and SUBSTITUTE formulas but I know it will take me a day or two to figure it out because I don’t really understand well how the two formulas work. I have used them successfully to extract Airtable attachment links to separate fields as URLs and it only worked because some genius here came with the suggestion.
All I try to do is to go to Amazon, copy the description, I would past that into a field and then the formulas would grab it and populate the individual fields.
Specifically I am grabbing this information (PRODUCT INFORMATION - it’s at the end of description in Amazon product page, just before the Customer Questions & Answers section).
This is what I need to copy:
3.54 x 2.24 x 0.47 inches
Item model number
The formatting doesn’t show up when past in Airtable. The text contains some vertical lines | when I copy them from Amazon which I believe can be helpful to build the formula.
However, when I past the text to Airtable, the vertical lines don’t show up, so I guess they are just “gaps” and not really vertical lines.
So I guess it’s just requires a lot of testing to get it work but before I even try to go with the two formulas, I am wondering if there is any other better way doing it.
There likely isn’t an easier way to do it, unfortunately. But since you have some predictable structure in your data, you can use a combination of LEFT(), RIGHT(), LEN(), FIND(), and MID() to get what you need.
The LEFT() or RIGHT() functions can also be used in combination with LEN() and FIND() to let you extract text between the beginning or end of the copied text and the first instance of a specific word. For example, this formula will let you get the Item model number, using the input example you pasted:
The regular expressions for the width and height are a bit more complex. If you cannot figure them out on your own with a regular expression tester, and have budget to hire someone to figure out the regular expression for you, feel free to reach out.
@ScottWorld Thanks for great tip. I kinda ignored the web-clipper, always thought the clipper is to clip image of the page or capture the URL. I didn’t know how powerful the Webclipper is. That’s amazing.
I figured out how to get the CSS values easily using the configuration page on Simplesraper and have now tested works flawlessly.
I found post about Simplescraper and it seems like a great tool for bigger jobs. It has a flaw which I described towards the end of this post:
Next I am going to work around the formula method suggested by @Katerie and @kuovonne, I’ll see how far I can get and I’ll share the results if I get to working solution. I find using the formula is more handy for individual ad-hoc tasks. The formula way will do just fine. Thanks to all for help.
Worked fine but it’s fragile as once can expect. I used:
SUBSTITUTE to add more free space around the text
LEN, FIND, LEFT, RIGHT, MID
TRIM to remove the free space
Then tested and it all worked nice and clean. Then disappointment. I texted on more Amazon products and I found that block of text is inconsistent. They have different versions of this product details. I guess it depends on product category or they didn’t update things, who knows…
Sometimes weight is separate line, sometimes together with sizes
The lines are shuffled up
And the product details block looks different across products, so I would have to build minimum 2 versions versions of this and the nested formulas get complex.
The MID function is the best but it’s not possible to know the “count” value in MID function. The string of the property changes and the position changes in the product details. The only consistent thing is the ASIN which is Amazon standard identification number. I didn’t notice before how inconsistent Amazon pages are before this exercise. Any inconsistency is negative on customer behaviours for shoppers who look for consistency, so Amazon is that sloppy in that regard at least. Gotcha.
So having that kind of fragile formula, if things change from Amazon side, I can start from scratch.
Tried but it didn’t give me any result the Regex way. The result was just white cell. So I assume those tabs are not being copied as “gaps” but some other code that is either converted to space or just not displayed in Airtable.
It looks like I will settle with the Clipboard solution after all. It’s probably more stable because it links to the actual CSS code in the page… of course until that breaks if Amazon makes changes to it.