💾 Archived View for kennedy.gemi.dev › docs › crawling.gmi captured on 2023-09-28 at 15:49:05. Gemini links have been rewritten to link to archived content
Kennedy builds its search index by crawling content within Geminispace only. It does not crawl or index content served over other protocols such as Gopher or HTTP.
Kennedy crawls Geminispace using the following IP addresses:
Kennedy throttles itself and waits 1.5 seconds between making requests to the same IP address. This increases the amount of time it takes to crawl multiple capsules hosted from the same IP address, such as Flounder.online.
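The per-IP delay described above can be sketched roughly as follows. This is an illustrative sketch only, not Kennedy's actual code: the class name PerHostThrottle and its parameters are hypothetical, and the only value taken from this page is the 1.5 second interval.

```python
import time


class PerHostThrottle:
    """Enforce a minimum delay between requests to the same IP address.

    Hypothetical helper for illustration; not part of Kennedy itself.
    """

    def __init__(self, min_interval=1.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_request = {}  # IP address -> time of the most recent request

    def wait(self, ip):
        """Block until a request to `ip` is allowed; return the seconds slept."""
        now = self._clock()
        last = self._last_request.get(ip)
        delay = 0.0
        if last is not None:
            # Sleep only for whatever part of the interval has not yet elapsed.
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record when this request is actually allowed to go out.
        self._last_request[ip] = now + delay
        return delay
```

Because the delay is keyed on the IP address rather than the hostname, two capsules that resolve to the same address share one timer, which is why hosts serving many capsules take longer to crawl.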
Kennedy will respect sites that are using the simplified robots.txt protocol defined for Gemini.
Specifically, Kennedy will follow the Deny rules defined for the following user-agents:
Note: There are a number of robots.txt files in Geminispace which use rules outside of the simplified standard above. These include:
Kennedy does not currently respect these rules.
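As a rough illustration of the robots.txt behavior described above, the sketch below parses only User-agent and Disallow lines and ignores everything else. It assumes the "Deny rules" correspond to Disallow lines matched as path prefixes. The function names and the default agents ("*" and "indexer") are placeholders chosen for the example and are not taken from Kennedy.

```python
def parse_robots(text):
    """Return {user_agent: [disallowed path prefixes]} for a simplified robots.txt."""
    rules = {}
    current = []             # user-agents of the group being read
    in_group_header = False  # True while reading consecutive User-agent lines
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if not in_group_header:
                current = []  # a new group starts here
            current.append(value.lower())
            rules.setdefault(value.lower(), [])
            in_group_header = True
        elif field == "disallow":
            in_group_header = False
            if value:  # an empty Disallow blocks nothing
                for agent in current:
                    rules[agent].append(value)
        else:
            # Lines outside the simplified format (Allow, Crawl-delay, ...) are skipped.
            in_group_header = False
    return rules


def is_allowed(rules, path, agents=("*", "indexer")):
    """Return False if any rule for the given (placeholder) agents covers `path`."""
    for agent in agents:
        for prefix in rules.get(agent, []):
            if path.startswith(prefix):
                return False
    return True
```

With this approach an Allow line under a blocked prefix has no effect, which matches the statement that rules outside the simplified standard are not respected.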
Kennedy has the following limits: