Building a Python Web Scraper

Today we are setting up a simple Python web scraper. The idea is basic: start with one URL, fetch the page with the requests library, parse the HTML to find links, and crawl onward from there. A scraper can save whatever we choose, such as text or images, but for now we stick to the HTML text only.

We use two queues to pass work between two workers. The first worker takes URLs from the URL queue, fetches each page, and puts the HTML into a pages queue. The second worker takes HTML from the pages queue, pulls out the href links, and puts any new URLs back into the URL queue. We also track the URLs we have already handled so that we do not download the same page twice.

This is not a full browser, so it will miss content that is rendered by JavaScript. To avoid crawling the whole web by accident, we should add a domain limit and a maximum depth, and we need to resolve relative links against the URL of the page they came from. We also set a timeout so that a slow server cannot hang a request, and a User-Agent header so that our requests look like they come from a normal browser.
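To make the design concrete, here is a minimal sketch of the two workers and the two queues. It rests on several assumptions that are ours, not fixed parts of the design: the names extract_links, fetch_html, and crawl are hypothetical, the User-Agent string and the 10-second timeout are placeholder values, and the two workers take turns in a single thread rather than running concurrently. Link parsing uses the standard library's html.parser, which is one option among several.

```python
from html.parser import HTMLParser
from queue import Queue
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute http(s) links from html, resolved against base_url."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # urljoin handles relative links; urldefrag drops any "#fragment".
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme in ("http", "https"):
            links.append(absolute)
    return links


def fetch_html(url):
    """Download one page with requests; return None on any request error."""
    # Third-party package, imported here so the parsing helpers above
    # can be used even where requests is not installed.
    import requests

    headers = {"User-Agent": "simple-scraper-sketch/0.1"}  # placeholder value
    try:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
    except requests.RequestException:
        return None
    return response.text


def crawl(start_url, fetch=fetch_html, max_depth=2):
    """Crawl from start_url, staying on its domain; return {url: html}."""
    allowed_host = urlparse(start_url).netloc
    url_queue = Queue()    # (url, depth) pairs waiting to be fetched
    pages_queue = Queue()  # (url, depth, html) triples waiting to be parsed
    seen = {start_url}     # URLs already queued, so no page is fetched twice
    saved = {}

    url_queue.put((start_url, 0))
    while not url_queue.empty():
        # Worker 1: fetch every queued URL and hand the HTML to the parser.
        while not url_queue.empty():
            url, depth = url_queue.get()
            html = fetch(url)
            if html is not None:
                saved[url] = html
                pages_queue.put((url, depth, html))

        # Worker 2: parse each page and queue new same-domain links.
        while not pages_queue.empty():
            url, depth, html = pages_queue.get()
            if depth >= max_depth:
                continue  # depth limit reached: do not follow these links
            for link in extract_links(url, html):
                if urlparse(link).netloc == allowed_host and link not in seen:
                    seen.add(link)
                    url_queue.put((link, depth + 1))
    return saved
```

Calling crawl("https://example.com/") would return a dictionary that maps each fetched URL to its HTML text. The fetch parameter exists so that the crawl logic can be tried with a stand-in function that returns canned HTML, without touching the network. Running the two workers in separate threads would also need a way to tell each worker when the crawl is finished, which this sketch leaves out.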