
Parsing HTML text returned by a webpage with the Scripting extension

  • April 6, 2023

Alexey_Gusev

Hi,
I need to build a script to parse a webpage. On the first run it crafts a POST request to search for an item, and the page returns a table-like list of several matching items with their statuses (I'm at that stage now). I then need to choose the item(s) with a certain status, follow their links, and get the info I need.
So far I have parsed such pages using my own functions, but it feels like I'm reinventing the wheel. Is there a 'best practice' or a free working solution for this? I just need an example that uses only the Scripting extension (I saw examples using DOMParser(), but I would like to stay inside Airtable without setting up additional servers, Node.js, etc.).
Note that I receive text, not JSON. Also, the page doesn't require credentials.
Example of the functions I'm using now (POST options omitted):

 

const myurl = 'https://sampleurl.com';
const [tagA, tagZ, tagX] = [`id="items">`, `</h3>`, `</span>`]; // single item start/end/exclude
const [DIVIDER, WASTE] = ['mw-headline', `<span `]; // items divider / waste pieces marker
const cutter = (txt, pattern) => txt.split(pattern).join(''); // same as replaceAll(pattern, '')
const cut = (txt, a, z, x) => cutter(txt.split(a, 2).pop().split(z, 2).shift(), x); // from A to Z, excluding X
const parse = txt => txt.split(DIVIDER).map(text => cut(text, tagA, tagZ, tagX)).filter(n => !n.includes(WASTE));
const query = await remoteFetchAsync(myurl);
const items = await query.text();
console.log(parse(items));

 

 

1 reply

If you do this in a custom extension rather than in a script, you have access to the DOM and can do this:

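// html is the text returned by the fetch
let a = document.createElement("div");
a.innerHTML = html;
// querySelectorAll returns every match; querySelector would return only the first
let items = a.querySelectorAll("#items h3");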

In scripting, there is no document object.
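Without a document object, one way to stay inside a script is to pull the pieces out with regular expressions. Below is a minimal sketch, not a definitive solution. It assumes the search result is a simple, non-nested HTML table whose rows each contain one link and a status cell. The sample markup, the helper names (`innerOf`, `parseRows`, `stripTags`, `decodeEntities`), and the position of the status column are all hypothetical and would need adjusting to the real page.

```javascript
// Hypothetical sample markup; the real page structure is an assumption.
const sampleHtml = `
<table>
  <tr><td><a href="/item/1">Widget A</a></td><td>Active</td></tr>
  <tr><td><a href="/item/2">Widget B &amp; Co</a></td><td>Archived</td></tr>
  <tr><td><a href="/item/3">Widget C</a></td><td> Active </td></tr>
</table>`;

// Decode a handful of common HTML entities (&amp; last, to avoid double decoding).
const decodeEntities = (s) =>
  s.replace(/&lt;/g, '<')
   .replace(/&gt;/g, '>')
   .replace(/&quot;/g, '"')
   .replace(/&#39;/g, "'")
   .replace(/&amp;/g, '&');

// Drop nested tags, decode entities, and collapse whitespace.
const stripTags = (s) =>
  decodeEntities(s.replace(/<[^>]*>/g, '')).replace(/\s+/g, ' ').trim();

// Inner HTML of every <tagName>...</tagName> element (does not handle nesting
// of the same tag).
const innerOf = (html, tagName) => {
  const re = new RegExp(`<${tagName}\\b[^>]*>([\\s\\S]*?)</${tagName}>`, 'gi');
  return [...html.matchAll(re)].map((m) => m[1]);
};

// Turn each table row into { href, cells }.
const parseRows = (html) =>
  innerOf(html, 'tr').map((row) => {
    const link = row.match(/<a\b[^>]*\bhref="([^"]*)"/i);
    return {
      href: link ? decodeEntities(link[1]) : null,
      cells: innerOf(row, 'td').map(stripTags),
    };
  });

const rows = parseRows(sampleHtml);

// Assumption: the status is in the second cell of each row.
const activeLinks = rows
  .filter((r) => r.cells[1] === 'Active')
  .map((r) => r.href);

console.log(activeLinks);
```

In a real script, `sampleHtml` would be replaced by the text from `await query.text()` after `remoteFetchAsync`, and each collected link could then be fetched the same way. Regex parsing like this is fragile: nested tables, single-quoted attributes, or unusual markup would need extra handling.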

