💾 Archived View for station.martinrue.com › clseibold › 93925dca35894a12aa5b69c823720e60 captured on 2024-08-18 at 21:21:51. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

👽 clseibold

I have made some improvements to my crawler that open up some interesting ideas I have planned for AuraGem Search. For at least 2 years now, my search engine has been able to detect which pages can be used as gemsub feeds and which cannot. With slight changes, the crawler can now query the database for all URLs that are considered feeds and crawl only the internal links on those pages, meaning only the links that stay on the feed page's own host. This will let me run a constantly updated feed aggregator based on my search engine, with no censorship and no requirement to submit a URL.

5 months ago · 👍 maxheadroom
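
The "internal links only" rule described in the post can be illustrated with a minimal sketch. This is not the AuraGem crawler's code: the function name `internal_links`, the sample URLs, and the choice to compare hostnames exactly are all assumptions made for illustration. It takes a feed page's URL and its gemtext, resolves each link line against the page URL, and keeps only links that stay on the same host.

```python
import urllib.parse
from urllib.parse import urljoin, urlsplit

# urljoin only resolves relative references for schemes it knows about.
# Registering "gemini" in these module-level lists is a common workaround.
for known_schemes in (urllib.parse.uses_relative, urllib.parse.uses_netloc):
    if "gemini" not in known_schemes:
        known_schemes.append("gemini")

# Gemtext toggles preformatted mode with three backticks at line start.
PREFORMAT_TOGGLE = "`" * 3


def internal_links(page_url: str, gemtext: str) -> list[str]:
    """Return absolute gemini URLs linked from the page that share its host.

    Hypothetical helper: a sketch of same-host filtering, not the
    crawler's actual implementation.
    """
    page_host = urlsplit(page_url).hostname
    found: list[str] = []
    in_preformatted = False
    for line in gemtext.splitlines():
        if line.startswith(PREFORMAT_TOGGLE):
            in_preformatted = not in_preformatted
            continue
        # Link lines start with "=>"; they do not count inside preformatted text.
        if in_preformatted or not line.startswith("=>"):
            continue
        parts = line[2:].split(maxsplit=1)
        if not parts:
            continue  # "=>" with no URL
        target = urlsplit(urljoin(page_url, parts[0]))
        # Keep the link only if it is a gemini URL on the feed page's own host.
        if target.scheme == "gemini" and target.hostname == page_host:
            found.append(target.geturl())
    return found


# Example feed page (hypothetical hosts and paths).
FEED_URL = "gemini://example.org/log/index.gmi"
FEED_TEXT = """\
# My gemlog
=> 2024-03-01-crawler.gmi 2024-03-01 Crawler notes
=> /about.gmi About
=> gemini://other.example/post.gmi 2024-02-20 Elsewhere
=> https://example.org/web.html Web mirror
"""

if __name__ == "__main__":
    for url in internal_links(FEED_URL, FEED_TEXT):
        print(url)
```

In a crawl loop, a function like this would be applied to each feed URL returned by the database query, so that cross-host links on feed pages are never followed.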

1 Reply

👽 clseibold

Note: If you don't want your pages crawled by my search engine, be sure to use a robots.txt · 5 months ago
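
As a rough illustration of the robots.txt note, the sketch below checks a sample robots.txt against a few URLs using Python's standard `urllib.robotparser`. Several things here are assumptions: the `indexer` user agent is the virtual agent named in the Gemini robots.txt companion specification, and whether AuraGem Search matches on that name is not stated in the post; the `/private/` path, the host, and the helper name `may_crawl` are hypothetical; and Python's parser is only a stand-in for whatever the crawler actually uses.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt a capsule owner might serve at /robots.txt.
# "indexer" is assumed to be the agent a search crawler identifies as.
ROBOTS_TXT = """\
User-agent: indexer
Disallow: /private/
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the rules in robots_txt let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "gemini://example.org/private/notes.gmi",
        "gemini://example.org/log/index.gmi",
    ):
        print(url, may_crawl(ROBOTS_TXT, "indexer", url))
```

To keep a search crawler away from the whole capsule rather than one directory, the `Disallow` line would be `/` instead.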