I received a reply [1] about MJ12Bot [2]! Let's see …
From: Majestic <XXXXXXXXXXXXXXXXXXXXXXXXXXXXXX>
To: Sean Conner <sean@conman.org>
Subject: [Majestic] Re: Your robot is making bogus requests to my webserver
Date: Thu, 11 Jul 2019 08:34:13 +0000
> ##- Please type your reply above this line -##
Oh … really? Sigh.
Anyway, the only questionable bit in the email was this line:
The prefix // in a link of course refers to the same site as the current page, over the same protocol, so this is why these URLs are being requested back from your server.
which is … somewhat correct. It does mean “use the same protocol” but the double slash denotes a “network path reference” (RFC-3986 [3], section 4.2) where, at a minimum, a hostname is required. If this is just a misunderstanding on the developers' part, it could explain the behavior I'm seeing.
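To make the distinction concrete, here is a minimal sketch using Python's standard urllib.parse, which resolves references along the lines of RFC-3986. The base URL and the links are made up for illustration; nothing here is taken from MJ12Bot's code or from my actual pages.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical page that contains the links below.
BASE = "https://blog.example.com/2019/07/11/post.html"

# A network path reference: the text after "//" is a host name,
# and only the scheme is inherited from the base page.
network_path = urljoin(BASE, "//cdn.example.net/style.css")

# An absolute path reference (single slash): same scheme AND same host.
absolute_path = urljoin(BASE, "/style.css")

# If "//" is written where a single "/" was meant, the first path
# segment ends up being treated as the host name.
mistaken = urljoin(BASE, "//images/logo.png")

for label, url in [("network path", network_path),
                   ("absolute path", absolute_path),
                   ("mistaken //", mistaken)]:
    print(f"{label:14} {url}  (host: {urlsplit(url).netloc})")
```

Only the single-slash form stays on the same site. The double-slash form keeps the protocol but goes to whatever host follows it.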
And speaking of behavior, I decided to check the logs (again, using last month) one last time for two reports.
Table: User Agents, sorted by most requests, for June 2019

404 (not found)  200 (okay)  Total requests  User agent
---------------  ----------  --------------  ------------------------------
            170       42676           46334  The Knowledge AI
             21       36088           38097  Mozilla/5.0 (compatible; SemrushBot/3~bl; +http://www.semrush.com/bot.html)
             46       16633           17130  Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
              5       15840           15928  Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)
              3       12304           12353  Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
             36        8412            8929  Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
              7        8428            8908  Gigabot
           5680        2015            7872  Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
             28        6604            6942  Barkrowler/0.9 (+http://www.exensa.com/crawl)
              0        4705            4737  istellabot/t.1.13

Table: User Agents, sorted by most bad requests (404), for June 2019

404 (not found)  200 (okay)  Total requests  User agent
---------------  ----------  --------------  ------------------------------
           5680        2015            7872  Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
            656         109             768  Mozilla/5.0 (compatible; MJ12bot/v1.4.7; http://mj12bot.com/)
            177          45             553  Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2)
            170       42676           46334  The Knowledge AI
            120           0             120  Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)
(Note: The number of 404s and 200s might not add up to the total—there might be other requests that returned a different status not reported here.)
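For anyone wanting to run similar reports, here is a rough sketch of the tallying. It is not the script I used. It assumes the logs are in Apache's "combined" format, and the helper names (tally, report, fake_line) and the sample lines are invented purely for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumed format: Apache "combined" log line. Only the status code
# and the user agent are captured.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def tally(lines):
    """Count responses per status code for each user agent."""
    counts = defaultdict(Counter)
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that do not look like combined format
        counts[m["agent"]][m["status"]] += 1
    return counts

def report(counts, column):
    """Rows of (404s, 200s, total, agent), largest first by the given column."""
    rows = [(c["404"], c["200"], sum(c.values()), agent)
            for agent, c in counts.items()]
    return sorted(rows, key=lambda row: row[column], reverse=True)

def fake_line(status, agent):
    """Build a made-up log line for the demonstration below."""
    return (f'192.0.2.1 - - [01/Jun/2019:00:00:01 -0400] '
            f'"GET / HTTP/1.1" {status} 0 "-" "{agent}"')

# Made-up sample data, not taken from my real logs.
sample = ([fake_line(404, "ExampleBot/1.0")] * 2
          + [fake_line(200, "ExampleBot/1.0")]
          + [fake_line(200, "OtherBot/2.0")] * 3
          + [fake_line(304, "OtherBot/2.0")])

counts = tally(sample)
by_total = report(counts, column=2)  # sorted by total requests
by_404 = report(counts, column=0)    # sorted by bad requests

for row in by_404:
    print("%5d %5d %5d  %s" % row)
```

The 304 in the sample shows why the 404 and 200 columns need not add up to the total.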
MJ12Bot is the 8th most active client on my site, yet it has the top two spots for bad requests. Its two versions together made 6,336 bad requests against 177 for #3, beating it by over an order of magnitude (over 35 times the amount in fact).
But I don't have to worry about it since the email also stated they removed my site from their crawl list. Okay … I guess?