Automating open-source intelligence geolocation in Ukraine using Airtable and Simplescraper

Hey everyone,

There’s a large open-source intelligence (OSINT) community helping to document the invasion of Ukraine, many of whom focus on geolocation: identifying exactly where an event took place or where a piece of footage was recorded (here’s a recent article on the subject).

Given the volume of information, it’s sometimes difficult to keep up, so I’ve built a system that searches Twitter for tweets containing coordinates, extracts them, and plots the locations on a map. It’s fully automated and built on top of Airtable.

You can view the base here: Airtable - Simplescraper OSINT.

I’ve shared it with some OSINT communities and it seems to be useful. At the same time, it’s a neat example of how easy Airtable makes it to prototype new ideas quickly.

For those interested in how it works:

  • Simplescraper (disclaimer: this is a service I built) scrapes Twitter every 3 minutes and sends new tweets to a base using its Airtable integration
  • In Airtable the base has two automations that run whenever a new record is created:
    • Automation one triggers a script that searches the content of the tweet for coordinates and, if it finds any, updates the latitude and longitude fields in the table
    • Automation two checks whether the tweet includes an image URL and, if so, copies that URL to an attachment field, which displays the image in Airtable
  • A Google Maps app has been added to the base; it uses the latitude and longitude values extracted earlier to automatically pin each new location mentioned in a tweet
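The post doesn’t include the script behind automation one, so the following is only a hypothetical sketch of how the coordinate search could work. It assumes coordinates appear as a decimal-degree "lat, lon" pair such as 48.5890, 38.0012; the extractCoordinates helper and the Latitude/Longitude field names are made up for illustration. The function itself is plain JavaScript, and the Airtable write-back is shown only as a comment.

```javascript
// Hypothetical helper (not the author's script): find the first decimal-degree
// "lat, lon" pair in a tweet, e.g. "48.5890, 38.0012", or return null.
function extractCoordinates(text) {
  // Must not be preceded by a digit or dot; latitude has 1-2 integer digits,
  // longitude 1-3, and both need at least 3 decimal places so that ordinary
  // numbers in a tweet are not mistaken for coordinates.
  const pattern = /(?<![\d.])(-?\d{1,2}\.\d{3,})\s*,\s*(-?\d{1,3}\.\d{3,})/g;
  for (const match of text.matchAll(pattern)) {
    const lat = Number(match[1]);
    const lon = Number(match[2]);
    // Skip pairs outside the valid latitude/longitude ranges
    if (Math.abs(lat) <= 90 && Math.abs(lon) <= 180) {
      return { lat, lon };
    }
  }
  return null;
}

// In an Airtable automation script the result could then be written back with
// something like this (field names are assumptions):
//   await table.updateRecordAsync(recordId, { Latitude: coords.lat, Longitude: coords.lon });

const tweet = "Footage geolocated to 48.5890, 38.0012 #Ukraine";
console.log(extractCoordinates(tweet)); // { lat: 48.589, lon: 38.0012 }
```

Requiring three or more decimal places is a deliberate trade-off: it misses coarse coordinates like "48.5, 38.0" but avoids matching prices, scores and similar numbers.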

And that’s it. Let me know what you think.

To make it more user-friendly for people not familiar with Airtable, it would have been neat to extend this using Interfaces, but shared links and apps are not available in Interfaces (yet, fingers crossed).

This is an amazing and useful example! Thanks for sharing!

Great example of a real-life task.
In my use case, I would add a view with a time filter such as ‘after 2 or 3 days ago’ or ‘this week’.
Sitting in Kyiv, it’s quite mentally draining to read all that news while working or doing other kinds of volunteer work. On the other hand, we can’t ignore the whole picture; it’s vitally important to monitor the current situation.
By grouping missile strikes on a city, we may be able to identify the launch site(s), since they are usually the same for a given target city. This lets us choose the safer rooms in a flat, or take cover in a shelter during an air-raid alert if the flat is generally unsafe.
The most valuable part is a map of land battles near a city. Currently our army can repel attacks far enough away, but in the worst case it really is a matter of life and death: retreating to a safer place before the roads are closed and it’s too late.

You can actually filter on each field, including the time of the tweet.
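A view filter covers this without any code. If the same "last few days" filter is ever needed inside a script (for example, to export only recent tweets), a minimal sketch in plain JavaScript could look like this. It assumes each tweet object has a postedAt value that Date.parse can read, such as an ISO 8601 string; the postedWithin helper and the record shape are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical helper: keep only tweets posted within the last `days` days
// before `now` (defaults to the current time).
function postedWithin(tweets, days, now = new Date()) {
  const end = now.getTime();
  const cutoff = end - days * DAY_MS;
  return tweets.filter((t) => {
    const posted = Date.parse(t.postedAt);
    // Drop rows whose timestamp is missing or cannot be parsed
    return !Number.isNaN(posted) && posted >= cutoff && posted <= end;
  });
}

const sample = [
  { text: "recent", postedAt: "2022-03-19T12:00:00Z" },
  { text: "old", postedAt: "2022-03-10T12:00:00Z" },
];
console.log(postedWithin(sample, 3, new Date("2022-03-20T12:00:00Z")).map((t) => t.text)); // [ 'recent' ]
```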

I believe this belongs here in the replies rather than in a new topic.
For me, without much cloud experience, the hardest part of such tasks is deciding where and how to host a scraper so that it runs periodically.
Here is an example of my code that fetches the HTML of the oryxspioenkop.com pages (Equipment Losses) and puts the headlines into a new table. It will not work if the table already exists, but you can change the name in the first line.

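const TNAME = 'Oryx'; // name of the new table; change it if a table with this name already exists
const urlRus = 'https://www.oryxspioenkop.com/2022/02/attack-on-europe-documenting-equipment.html';
const urlUkr = 'https://www.oryxspioenkop.com/2022/02/attack-on-europe-documenting-ukrainian.html';
// HTML markers used to slice each page into rows
const [S1, S2, SX] = [`id="Pistols">`, `</h3>`, `</span>`]; // single item: start / end / text to strip
const [T1, T2] = [`<span style="color: red;">`, `<br /></span>`]; // totals: start / end
const [TX, DIVIDER, WASTE] = [SX + T1, 'mw-headline', `<span `]; // totals text to strip / section divider / marker of leftover markup

// Create the output table (this fails if a table named TNAME already exists)
const tableId = await base.createTableAsync(TNAME, [
  {name: 'Side', type: 'singleLineText'},
  {name: 'Loss', type: 'richText'},
]);
if (!tableId) throw new Error(`Cannot create table ${TNAME}`);
const table = base.getTable(tableId);

// Remove every occurrence of pattern from txt
const cutX = (txt, pattern) => txt.split(pattern).join('');
// Take the text after the first a and before the next z, then strip x from it
const cut = (txt, a, z, x) => cutX(txt.split(a, 2).pop().split(z, 2).shift(), x);
// Markdown bold, which the rich text field renders
const bold = text => `**${text}**`;
// Sections containing the totals span become bold totals; all others become single items
const total = text => (text.indexOf(T1) > 0) ? bold(cut(text, T1, T2, TX)) : cut(text, S1, S2, SX);
// Split a page at each headline and drop pieces that still contain markup
const parse = txt => txt.split(DIVIDER).map(total).filter(n => !n.includes(WASTE));

// Build record objects in the shape createRecordsAsync expects
const create = (el, side) => ({fields: {'Side': side, 'Loss': el}});
const rows = (arr, side) => arr.map(el => create(el, side));

// Fetch both pages (remoteFetchAsync is provided by the Scripting extension)
const queryRus = await remoteFetchAsync(urlRus);
const lostRus = await queryRus.text();
const queryUkr = await remoteFetchAsync(urlUkr);
const rawUkr = await queryUkr.text();

// The Ukrainian page's markup differs slightly; clean it up before parsing
const CLN1 = `&nbsp;</span></div><h3>`;
const CLN2 = `<span class="mw-headline" id="Pistols">`;
const CLN3 = CLN1 + CLN2;
const CLNU = `Ukraine - `;
const lostUkr = cutX(rawUkr.replace(CLN3, '').replace(CLNU + TX, CLNU), CLN1);

// createRecordsAsync accepts at most 50 records per call, so insert in batches
const crt = [...rows(parse(lostRus), 'Russia'), ...rows(parse(lostUkr), 'Ukraine')];
while (crt.length) await table.createRecordsAsync(crt.splice(0, 50));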
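To see what the two string-slicing helpers in the Oryx script do, here is a standalone sketch that copies cutX and cut unchanged and runs them on a made-up HTML fragment. The fragment only imitates the shape of a headline section; it is not the real page markup. It runs in Node or a browser console.

```javascript
// Copied unchanged from the script: remove every occurrence of pattern
const cutX = (txt, pattern) => txt.split(pattern).join('');
// Copied unchanged: text after the first a and before the next z, with x removed
const cut = (txt, a, z, x) => cutX(txt.split(a, 2).pop().split(z, 2).shift(), x);

// Made-up fragment imitating one headline section (not real Oryx markup)
const sample = '" id="Pistols">Example category (12)</span></h3>';

console.log(cut(sample, 'id="Pistols">', '</h3>', '</span>'));
// Example category (12)

// Start marker missing: the whole input comes back unchanged
console.log(cut('no markers here', 'id="Pistols">', '</h3>', '</span>'));
// no markers here
```

The second call shows the fallback: when the start marker is missing, split returns the whole string, so the piece passes through untouched. That is presumably why parse then drops any piece that still contains leftover <span markup.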