
Parsing HTML text returned by a webpage with the Scripting extension

Alexey_Gusev
13 - Mars

Hi,
I need to build a script that parses a webpage. First, it sends a POST request to search for an item, and the page returns a table-like list of matching items with their statuses (this is where I am now). Next, I need to pick the item(s) with a certain status, follow their links, and extract the info I need.
So far I've parsed such pages with my own functions, but it feels like I'm reinventing the wheel. Is there a 'best practice' or a free, working solution for this? I just need an example that uses only the Scripting extension. (I've seen examples using DOMParser(), but I'd like to stay inside Airtable without setting up additional servers, Node.js, etc.)
Note that the response is HTML text, not JSON, and the page doesn't require credentials.
Example of functions I'm using now (POST options omitted): 

 

const myurl = 'https://sampleurl.com';
// Markers for a single item: start / end / text to strip out
const [tagA, tagZ, tagX] = [`id="items">`, `</h3>`, `</span>`];
// Marker that divides items / marker of leftover pieces to discard
const [DIVIDER, WASTE] = ['mw-headline', `<span `];
// Remove every occurrence of pattern (same as txt.replaceAll(pattern, ''))
const cutter = (txt, pattern) => txt.split(pattern).join('');
// Text after the first a, up to the first following z, with every x removed
const cut = (txt, a, z, x) => cutter(txt.split(a, 2).pop().split(z, 2).shift(), x);
const parse = txt => txt.split(DIVIDER)
  .map(text => cut(text, tagA, tagZ, tagX))
  .filter(n => !n.includes(WASTE));

const query = await remoteFetchAsync(myurl);
const items = await query.text();
console.log(parse(items));
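The POST options omitted above might look like the following sketch. It assumes the site accepts an ordinary HTML form submission (application/x-www-form-urlencoded) and that URLSearchParams is available in the Scripting extension's runtime. The field name search and any extra fields are hypothetical; copy the real names from the page's <form> or from the browser's network tab. The helper only builds the options object passed as the second argument to remoteFetchAsync, so it can be checked on its own.

```javascript
// Build the options for a form-encoded POST search request.
// ASSUMPTION: the site takes a standard HTML form post; the field name
// "search" is hypothetical and must match the real <form> on the page.
function buildSearchRequest(term, extraFields = {}) {
  const body = new URLSearchParams({ search: term, ...extraFields });
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: body.toString(), // spaces become "+", other characters are percent-encoded
  };
}

// Usage in the Scripting extension (not executed here):
// const query = await remoteFetchAsync(myurl, buildSearchRequest('some item'));
// const items = await query.text();
```

If the site expects JSON instead, the body would be JSON.stringify(...) with a matching Content-Type; the network tab shows which format the real form sends.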

 

 

Steve_Haysom
8 - Airtable Astronomer

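If you build this as a custom extension instead of a script, you can let the browser parse the HTML for you (html here is the text returned by the request):

let container = document.createElement("div")  // any element works as a throwaway container
container.innerHTML = html
let items = container.querySelectorAll("#items h3")  // all matches; querySelector returns only the first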

In scripting, there is no document object.
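Because the Scripting extension has no document object, one workaround is to pull rows out of the HTML text with regular expressions. The sketch below assumes the result list is a plain <table> whose rows hold a link (<a href="...">) and a plain-text status in one cell; the status value, the column index, and the base URL are hypothetical, and only a handful of common HTML entities are decoded. Regexes are brittle with nested tables or unusual markup, so treat this as a starting point rather than a general HTML parser.

```javascript
// Strip tags, decode a few common entities, and collapse whitespace.
function textOf(html) {
  return html
    .replace(/<[^>]*>/g, '')
    .replace(/&nbsp;/g, ' ')
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, '&') // decoded last so "&amp;lt;" becomes "&lt;", not "<"
    .replace(/\s+/g, ' ')
    .trim();
}

// Split HTML into rows of { cells, links }.
// ASSUMPTION: rows are <tr>...</tr> and cells are <td>/<th> with no nesting.
function parseRows(html) {
  const rows = [];
  for (const [, rowHtml] of html.matchAll(/<tr(?:\s[^>]*)?>([\s\S]*?)<\/tr>/gi)) {
    const cells = [...rowHtml.matchAll(/<t[dh](?:\s[^>]*)?>([\s\S]*?)<\/t[dh]>/gi)]
      .map(m => textOf(m[1]));
    const links = [...rowHtml.matchAll(/<a\s[^>]*href\s*=\s*["']([^"']+)["']/gi)]
      .map(m => m[1].replace(/&amp;/g, '&')); // hrefs often contain &amp;
    rows.push({ cells, links });
  }
  return rows;
}

// Absolute URLs of the first link in each row whose status cell equals `status`.
// statusColumn (0-based) and baseUrl are hypothetical; adjust to the real page.
function linksWithStatus(html, status, statusColumn, baseUrl) {
  return parseRows(html)
    .filter(row => row.cells[statusColumn] === status && row.links.length > 0)
    .map(row => new URL(row.links[0], baseUrl).href);
}

// Usage in the Scripting extension (not executed here):
// const html = await (await remoteFetchAsync(myurl)).text();
// for (const link of linksWithStatus(html, 'Active', 1, myurl)) {
//   const page = await (await remoteFetchAsync(link)).text();
//   console.log(textOf(page).slice(0, 200));
// }
```

Resolving each href with new URL(href, baseUrl) handles relative links such as /item?id=1, which is what most result lists contain.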