‎Apr 06, 2023 07:15 AM - edited ‎Apr 06, 2023 07:16 AM
Hi,
I need to build a script to parse a webpage. On the first run it sends a POST request to search for an item, and the page returns a table-like list of several matching items with their statuses (this is the stage I'm at now). I then need to pick the item(s) with a certain status, follow their links, and extract the info I need.
So far I have parsed such pages with my own functions, but it feels like I'm reinventing the wheel. Is there a 'best practice' or a free working solution for this? I just need an example that uses only the scripting extension (I saw examples using DOMParser(), but I would like to stay inside Airtable without setting up additional servers, Node.js, etc.).
Note that I receive text, not JSON. Also, the page doesn't require credentials.
Example of the functions I'm using now (POST options omitted):
const myurl = 'https://sampleurl.com';
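// Markers for a single item: start, end, and a substring to strip out
const [tagA, tagZ, tagX] = [`id="items">`, `</h3>`, `</span>`];
// Marker that separates items, and marker that flags chunks to discard
const [DIVIDER, WASTE] = ['mw-headline', `<span `];
// Remove every occurrence of pattern (same as txt.replaceAll(pattern, ''))
const cutter = (txt, pattern) => txt.split(pattern).join('');
// Take the text between a and z, then strip x from it
const cut = (txt, a, z, x) => cutter(txt.split(a, 2).pop().split(z, 2).shift(), x);
// Split the page into chunks, extract each item, and drop chunks that still contain WASTE
const parse = txt => txt.split(DIVIDER).map(text => cut(text, tagA, tagZ, tagX)).filter(n => !n.includes(WASTE));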
const query = await remoteFetchAsync(myurl);
const items = await query.text();
console.log(parse(items))
‎Apr 06, 2023 02:51 PM
If you do this in a custom extension rather than in a script, you can let the browser's DOM do the parsing:
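let a = document.createElement("div") // detached container, never added to the page
a.innerHTML = html // html is the page text returned by the fetch
let items = a.querySelectorAll("#items h3") // every match, not just the first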
In the scripting extension there is no document object, so this approach is not available there.
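For the scripting case, one DOM-free option is a single regular expression with `matchAll`, which may be a bit easier to maintain than chained `split` calls. The following is only a minimal sketch: the sample markup, the `status` class, the row layout, and the helper names `parseRows` and `linksWithStatus` are all assumptions for illustration, since the real page structure isn't shown in the question. It also assumes the links are root-relative.

```javascript
// Hypothetical markup: each result is a table row with a link and a status cell.
const sampleHtml = `
<table id="items">
  <tr><td><a href="/item/1">Widget A</a></td><td class="status">Active</td></tr>
  <tr><td><a href="/item/2">Widget B</a></td><td class="status">Archived</td></tr>
  <tr><td><a href="/item/3">Widget C</a></td><td class="status">Active</td></tr>
</table>`;

// Matches one row at a time and captures: href, link text, status text.
// This pattern is tied to the assumed markup above and must be adapted to the real page.
const ROW_PATTERN =
  /<tr>\s*<td>\s*<a href="([^"]*)">([^<]*)<\/a>\s*<\/td>\s*<td class="status">([^<]*)<\/td>\s*<\/tr>/g;

// Turn the raw HTML text into an array of { href, name, status } objects.
function parseRows(html) {
  return Array.from(html.matchAll(ROW_PATTERN), ([, href, name, status]) => ({
    href,
    name: name.trim(),
    status: status.trim(),
  }));
}

// Keep only rows with the wanted status and build absolute links
// (assumes each href starts with "/" and baseUrl has no trailing slash).
function linksWithStatus(html, wantedStatus, baseUrl) {
  return parseRows(html)
    .filter((row) => row.status === wantedStatus)
    .map((row) => baseUrl + row.href);
}

const links = linksWithStatus(sampleHtml, 'Active', 'https://sampleurl.com');
console.log(links);
```

In a real script, `sampleHtml` would be replaced by the text returned from `remoteFetchAsync(...)`, and each resulting link could then be fetched the same way to read the item details. Regex-based parsing is brittle if the page markup changes, so it is best kept to pages with a stable, simple structure.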