Back in July 2019 I was investigating some bad bots [1] on my website when I came across a bot that identified itself simply as “The Knowledge AI (Artificial Intelligence)” and was the number one robot hitting my site [2]. Most bots that identify themselves will give a URL to a page that describes their usage, like Barkrowler [3] (to pick one that recently crawled my site). But not so “The Knowledge AI”. That was all it said, “The Knowledge AI”. It was very hard to Google, but I wouldn’t be surprised if it was OpenAI.
The earliest I can find “The Knowledge AI” crawling my site was April of 2018, and despite starting on April 16th, it was the second most active robot that month. In May it was the number one bot, and it stayed there through October of 2022 (about 4½ years), after which it pretty much dropped off, from 32,000+ hits in October of 2022 to 85 in November of 2022. After that it was sporadic, showing up in single digit hits until January of 2024. It may still be crawling my site, but if it is, it is no longer identifying itself.
I don’t know if “The Knowledge AI” was an LLM company crawling, but if it was, not giving a link to explain the bot is suspicious. It’s the rare crawler that doesn’t identify itself with at least a URL to describe it. The fact that it took the number one crawling spot on my site for 4½ years is suspicious. As robots go, it didn’t affect the web server all that much (I’ve come across worse ones), and well over 90% of its requests were valid (unlike MJ12, which had a 75% failure rate). And my /robots.txt file doesn’t exclude any robot from scanning, so I can’t really complain about it.
“My comment on “Mitigating SourceHut's partial outage caused by aggressive crawlers | Lobsters” [4]”
Even though the log data is a few years old, I don't think that IPs change from ASN (Autonomous System Number) to ASN all that much (but I could be wrong on that). I checked the IPs used by “The Knowledge AI” in May 2018, and in October 2022, and they didn't change that much. They were still the same /24 networks across that time.
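For anyone wanting to repeat that comparison, here is a minimal sketch in Python, not the script I actually used. The function name `networks_24` and the addresses are hypothetical; the addresses come from the reserved documentation ranges, not from my logs. It assumes the IPv4 addresses for each month have already been pulled out of the web server logs.

```python
import ipaddress


def networks_24(addresses):
    """Return the set of /24 networks covering the given IPv4 addresses."""
    return {
        # strict=False lets us pass a host address and get its enclosing network
        ipaddress.ip_network(f"{addr}/24", strict=False)
        for addr in addresses
    }


# Placeholder addresses from the documentation ranges, NOT real log data.
may_2018 = ["192.0.2.10", "192.0.2.77", "198.51.100.5"]
oct_2022 = ["192.0.2.200", "198.51.100.9", "203.0.113.1"]

early = networks_24(may_2018)
late = networks_24(oct_2022)

# Networks seen in both months, and networks that only show up later.
print("in both months:", sorted(early & late))
print("only in the later month:", sorted(late - early))
```

If most of the networks land in the first list, the crawler was still coming from the same address blocks years later.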
Looking up who owns those networks today is very disappointing: Hurricane Electric LLC. [5], a backbone provider.
So no real information about who “The Knowledge AI” might have been.
Sigh.
[3] https://www.babbar.tech/crawler
[4] https://lobste.rs/s/dmuad3/mitigating_sourcehut_s_partial_outage#c_mygeyl