Yet more observations about the MJ12Bot

I received a reply [1] about MJ12Bot [2]! Let's see …

From: Majestic <XXXXX­XXXXX­XXXXX­XXXXX­XXXXX­XXXXX>
To: Sean Conner <sean@conman.org>
Subject: [Majestic] Re: Your robot is making bogus requests to my webserver
Date: Thu, 11 Jul 2019 08:34:13 +0000
> ##- Please type your reply above this line -##

Oh … really? Sigh.

Anyway, the only questionable bit in the email was this line:

The prefix // in a link of course refers to the same site as the current page, over the same protocol, so this is why these URLs are being requested back from your server.

which is … somewhat correct. It does mean “use the same protocol” but the double slash denotes a “network path reference” (RFC-3986 [3], section 4.2) where, at a minimum, a hostname is required. If this is just a misunderstanding on the developers' part, it could explain the behavior I'm seeing.
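
To make the difference concrete, here is a minimal sketch using Python's standard urllib.parse, which follows the RFC 3986 resolution rules. The base URL and both references are made-up examples, not URLs from my logs, and this says nothing about how MJ12Bot itself is implemented.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical page URL, used only for illustration.
BASE = "https://blog.example.com/2019/07/10.1"

def resolve(ref: str) -> str:
    """Resolve a reference against BASE using urljoin's RFC 3986 rules."""
    return urljoin(BASE, ref)

# Network-path reference: scheme comes from the base, host from the reference.
network_path = resolve("//cdn.example.net/style.css")

# Absolute-path reference: scheme AND host both come from the base.
absolute_path = resolve("/style.css")

print(network_path)
print(absolute_path)

# The part right after the double slash is parsed as the authority (hostname).
print(urlsplit("//cdn.example.net/style.css").netloc)
```

Only the single-slash form stays on the same site. The double-slash form keeps the protocol but names a host of its own.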

And speaking of behavior, I decided to check the logs one last time (again, using last month's) and pull two reports.

Table: User Agents, sorted by most requests, for June 2019
404 (not found)	200 (okay)	Total requests	User agent
------------------------------
170	42676	46334	The Knowledge AI
21	36088	38097	Mozilla/5.0 (compatible; SemrushBot/3~bl; +http://www.semrush.com/bot.html)
46	16633	17130	Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
5	15840	15928	Mozilla/5.0 (compatible; AhrefsBot/6.1; +http://ahrefs.com/robot/)
3	12304	12353	Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
36	8412	8929	Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
7	8428	8908	Gigabot
5680	2015	7872	Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
28	6604	6942	Barkrowler/0.9 (+http://www.exensa.com/crawl)
0	4705	4737	istellabot/t.1.13

Table: User Agents, sorted by most bad requests (404), for June 2019
404 (not found)	200 (okay)	Total requests	User agent
------------------------------
5680	2015	7872	Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)
656	109	768	Mozilla/5.0 (compatible; MJ12bot/v1.4.7; http://mj12bot.com/)
177	45	553	Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2)
170	42676	46334	The Knowledge AI
120	0	120	Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)

(Note: The number of 404s and 200s might not add up to the total—there might be other requests that returned a different status not reported here.)
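
For anyone wanting to pull similar reports, here is a rough sketch of one way to tally requests per user agent. It assumes the Apache "combined" log format, which may not match the format these logs actually use; the sample lines, the agent names, and the tally and report helpers are all invented for illustration.

```python
import re
from collections import Counter, defaultdict

# Assumption: Apache "combined" log format. Adjust the pattern for other formats.
LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)

def tally(lines):
    """Count requests per user agent, broken down by HTTP status code."""
    counts = defaultdict(Counter)
    for line in lines:
        m = LINE.match(line.strip())
        if m is None:
            continue  # skip lines that do not fit the assumed format
        counts[m["agent"]][m["status"]] += 1
        counts[m["agent"]]["total"] += 1
    return counts

def report(counts, key):
    """Return (404s, 200s, total, agent) rows sorted by `key`, highest first."""
    rows = sorted(counts.items(), key=lambda kv: kv[1][key], reverse=True)
    return [(c["404"], c["200"], c["total"], agent) for agent, c in rows]

# Made-up sample lines (hypothetical agents, documentation-range addresses).
SAMPLE = [
    '192.0.2.1 - - [01/Jun/2019:00:00:01 +0000] "GET / HTTP/1.1" 200 1234 "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [01/Jun/2019:00:00:02 +0000] "GET /a HTTP/1.1" 200 512 "-" "ExampleBot/1.0"',
    '192.0.2.1 - - [01/Jun/2019:00:00:03 +0000] "GET /b HTTP/1.1" 304 - "-" "ExampleBot/1.0"',
    '198.51.100.7 - - [01/Jun/2019:00:00:04 +0000] "GET /x HTTP/1.1" 404 0 "-" "OtherBot/2.0"',
    '198.51.100.7 - - [01/Jun/2019:00:00:05 +0000] "GET /y HTTP/1.1" 404 0 "-" "OtherBot/2.0"',
]

counts = tally(SAMPLE)
for key in ("total", "404"):
    print(f"sorted by {key}:")
    for row in report(counts, key):
        print("\t".join(str(field) for field in row))
```

The 304 in the sample shows why the 404 and 200 columns need not add up to the total.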

MJ12Bot is the 8th most active client on my site, yet it holds the top two spots for bad requests. Together, its two versions made 6,336 bad requests, beating out #3 (with 177) by over an order of magnitude (over 35 times the amount, in fact).

But I don't have to worry about it since the email also stated they removed my site from their crawl list. Okay … I guess?

[1] /boston/2019/07/10.1

[2] https://mj12bot.com/

[3] https://www.ietf.org/rfc/rfc3986.txt
