There are numerous challenges with this requirement. For example, to know whether anything has changed across an entire site, you must index the entire site, because the home page at the domain's root address may not be updated when deep-linked pages within the site are. So, unless you have a complete list of every page's URL in the domain, you may get false negatives - i.e., the site was actively changing, but the home page was never modified.
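Building that complete URL list means crawling the site itself. A minimal sketch of extracting a page's same-domain links with the standard library is below; the function names are illustrative, and a real crawler would also honor robots.txt, rate-limit itself, and miss nothing rendered by JavaScript.

```python
# Sketch: collect a page's internal links so every URL in the domain
# can eventually be enumerated. Illustrative only.
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def internal_urls(html, base_url):
    """Resolve links against base_url and keep only same-domain pages."""
    parser = LinkExtractor()
    parser.feed(html)
    domain = urlparse(base_url).netloc
    return {
        urljoin(base_url, href)
        for href in parser.links
        if urlparse(urljoin(base_url, href)).netloc == domain
    }
```

Feeding each newly discovered URL back through this function (breadth-first, with a visited set) yields the page inventory the false-negative problem demands.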
Checking an HTTP response to sense whether a given page has been updated can produce false positives, because many web sites are “touched” daily even though the content has not changed at all - an attempt to influence search engines. As a result, your competitive intelligence system will be inaccurate simply because a content management system is aggressively touching its pages.
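One way around this, as a minimal sketch, is to fingerprint the page content itself rather than trusting timestamps or `Last-Modified` headers: a touched-but-unchanged page then produces the same hash. The normalization here (collapsing whitespace) is deliberately simple; production code would also strip out rotating ads, session tokens, and embedded timestamps before hashing.

```python
# Sketch: hash normalized page content so a CMS "touching" a page
# does not register as a change.
import hashlib
import re


def content_fingerprint(html: str) -> str:
    """Hash the page after collapsing insignificant whitespace."""
    normalized = re.sub(r"\s+", " ", html).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Two fetches of a page only count as a change when the fingerprints differ.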
Lastly, web servers are likely to get in the way of your intelligence-gathering logic, because many are configured to optimize responses and to defend against certain types of crawling activity. Either behavior can give you misleading results.
Computing Content Deltas
The only reasonably reliable way to determine content change over time is to compute the delta (i.e., the difference) for each page in the site between two dates. This is a big task - essentially version control for content - and building it is non-trivial.
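The diff step itself can be sketched with the standard library's difflib; the hard part the text alludes to is storing a snapshot of every page on every crawl so there are two dated versions to compare. The snapshot dates below are illustrative.

```python
# Sketch: compute the delta between two dated snapshots of one page.
import difflib


def page_delta(old_snapshot: str, new_snapshot: str) -> list[str]:
    """Return unified-diff lines; an empty list means no content change."""
    return list(
        difflib.unified_diff(
            old_snapshot.splitlines(),
            new_snapshot.splitlines(),
            fromfile="2024-01-01",  # snapshot dates are illustrative
            tofile="2024-02-01",
            lineterm="",
        )
    )
```

An empty delta across every page of the site is the reliable signal that nothing changed between the two dates.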
Google Alerts - build out your competitive intelligence solution by targeting specific keywords, so alerts arrive only for the terms that really matter.
Visual Ping - this platform is also very good at detecting page changes.
For each of these services, you can forward the alerts to Zapier or Integromat and then on to Airtable.
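As a hedged sketch of that last hop: Zapier's "Webhooks by Zapier" trigger exposes a catch-hook URL that accepts a JSON POST, and the Zap can then create an Airtable record. The hook URL below is a placeholder you would copy from your own Zap, and the payload fields are illustrative, not a required schema.

```python
# Sketch: forward a change notification to a Zapier catch-hook webhook.
import json
import urllib.request


def build_alert(page_url: str, summary: str) -> dict:
    """Assemble the JSON payload for the Zap (fields are illustrative)."""
    return {"page": page_url, "summary": summary}


def forward_alert(hook_url: str, alert: dict) -> int:
    """POST the alert to the Zapier hook; requires a live Zap to succeed."""
    req = urllib.request.Request(
        hook_url,
        data=json.dumps(alert).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # network call
        return resp.status
```

From there, mapping the payload fields onto Airtable columns is configured inside the Zap rather than in code.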