Notes on blocking the MJ12Bot

The MJ12Bot [1] is the first robot listed in Wikipedia's [2] robots.txt [3] file, which I find amusing for obvious reasons [4]. In the Hacker News comments [5] there's a thread [6] specifically about the MJ12Bot, and I replied to a comment about blocking it [7]. Blocking it isn't that easy, because it's a distributed bot that used 136 unique IP (Internet Protocol) addresses just last month. Because of that comment, I decided to expand on some of those numbers here.

The first table is the number of addresses from January through June 2019, to show they're not all from a single netblock. The address format “A.B.C.D” will represent a unique IP address, like 172.16.15.2; “A.B.C” will represent the IP addresses 172.16.15.0 to 172.16.15.255; “A.B” will represent the range 172.16.0.0 to 172.16.255.255; and finally “A” will represent the range 172.0.0.0 to 172.255.255.255.

Table: Number of distinct IP addresses used by MJ12Bot from January through June 2019 when hitting my site
Address format	number
------------------------------
A.B.C.D	312
A.B.C	256
A.B	86
A	53

Next are the unique addresses from all of 2018 used by MJ12Bot:

Table: Number of distinct IP addresses used by MJ12Bot in 2018 when hitting my site
Address format	number
------------------------------
A.B.C.D	474
A.B.C	370
A.B	125
A	66
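
Counts like the ones in these tables can be derived by collecting the distinct prefixes of each address at all four levels. What follows is only a minimal sketch of that idea, not the script actually used for this post; the function name and the sample addresses are made up for illustration, and it assumes the addresses have already been pulled out of the web server log as dotted-quad strings.

```python
def count_prefixes(addresses):
    """Count the distinct A.B.C.D, A.B.C, A.B and A prefixes in addresses."""
    seen = {"A.B.C.D": set(), "A.B.C": set(), "A.B": set(), "A": set()}
    for addr in addresses:
        octets = addr.strip().split(".")
        if len(octets) != 4:
            continue  # skip anything that is not a dotted-quad IPv4 address
        seen["A.B.C.D"].add(tuple(octets))
        seen["A.B.C"].add(tuple(octets[:3]))
        seen["A.B"].add(tuple(octets[:2]))
        seen["A"].add(tuple(octets[:1]))
    return {fmt: len(values) for fmt, values in seen.items()}


# Hypothetical sample data, not taken from the real logs.
sample = [
    "172.16.15.2",
    "172.16.15.9",
    "172.16.200.1",
    "172.31.0.5",
    "10.0.0.1",
]
counts = count_prefixes(sample)
for fmt, number in counts.items():
    print(f"{fmt}\t{number}")
```

The closer the “A.B.C” count is to the “A.B.C.D” count, the fewer addresses share a /24, which is what makes blocking by netblock impractical.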

This wide distribution can easily explain why Wikipedia found that it ignores rate limits. Each individual node of MJ12Bot probably followed the rate limit, but coordinating that limit across … what? 500 machines around the world? That's a hard problem.

It seems the best bet is to ban MJ12Bot via robots.txt:

User-agent: MJ12bot
Disallow: /
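
One way to sanity-check a rule like this before deploying it is to feed it to a robots.txt parser. The sketch below uses Python's standard urllib.robotparser module; the example.com URL and the "SomeOtherBot" name are placeholders. It only shows how a compliant parser reads the rule, and it can't prove that the bot itself honors it.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule from above, parsed from a string instead of fetched.
rules = """\
User-agent: MJ12bot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# example.com and SomeOtherBot are placeholders for illustration only.
mj12_allowed = parser.can_fetch("MJ12bot", "http://example.com/any/page")
other_allowed = parser.can_fetch("SomeOtherBot", "http://example.com/any/page")

print("MJ12bot allowed:", mj12_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

With only this rule in the file, MJ12bot is denied everything while robots that aren't named remain unrestricted.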

While I haven't added MJ12Bot to my own robots.txt [8] file, it hasn't hit my site since they removed me from their crawl list [9], so it appears it can be tamed.

[1] https://mj12bot.com/

[2] https://www.wikipedia.org/

[3] https://en.wikipedia.org/robots.txt

[4] /boston/2019/07/09-12

[5] https://news.ycombinator.com/item?id=20453189

[6] https://news.ycombinator.com/item?id=20453542

[7] https://news.ycombinator.com/item?id=20455003

[8] http://boston.conman.org/robots.txt

[9] /boston/2019/07/12.1
