A deeper dive into mapping web requests via ASN, not by IP address

I went ahead and replaced the IP (Internet Protocol) addresses in the log file with ASNs (Autonomous System Numbers) to find the networks that sent the most requests to my blog in February.

Table: Top 10 networks requesting a page from blog
network	requests
------------------------------
MICROSOFT-CORP-MSN-AS-BLOCK, US	78889
OVH, FR	31837
ALIBABA-CN-NET Alibaba US Technology Co., Ltd., CN	25019
HETZNER-AS, DE	23840
GOOGLE-CLOUD-PLATFORM, US	21431
CSTL, US	17225
HURRICANE, US	15495
AMAZON-AES, US	14430
FACEBOOK, US	13736
AKAMAI-LINODE-AP Akamai Connected Cloud, SG	12673
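
A tally like the one above can be sketched in a few lines of Python. This is only a rough illustration, not the script that produced the table: the prefix table, the network names, and the sample addresses are all made up (they use the reserved documentation ranges), and a real prefix table would have to come from a routing dump or a whois-style lookup service.

```python
import ipaddress
from collections import Counter

# Hypothetical prefix-to-network table, for illustration only.
# The ranges are reserved documentation ranges, not real networks.
PREFIXES = [
    (ipaddress.ip_network("192.0.2.0/24"), "EXAMPLE-NET-A, US"),
    (ipaddress.ip_network("198.51.100.0/24"), "EXAMPLE-NET-B, FR"),
    (ipaddress.ip_network("198.51.100.128/25"), "EXAMPLE-NET-C, DE"),
]

def network_for(ip: str) -> str:
    """Return the name of the most specific prefix that contains ip."""
    addr = ipaddress.ip_address(ip)
    matches = [(net.prefixlen, name) for net, name in PREFIXES if addr in net]
    # The longest prefix wins, as it would in a routing table.
    return max(matches)[1] if matches else "UNKNOWN"

def requests_per_network(ips):
    """Count requests per network, given one client IP per request."""
    return Counter(network_for(ip) for ip in ips)

if __name__ == "__main__":
    sample = ["192.0.2.7", "192.0.2.7", "198.51.100.200",
              "198.51.100.5", "203.0.113.9"]
    for name, count in requests_per_network(sample).most_common():
        print(f"{name}\t{count}")
```

The linear scan over the prefix list is fine for a toy table; a full routing table has far more prefixes and would want a trie or a sorted lookup instead.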

Even though Alibaba US has the most unique IPs hitting my blog [1], Microsoft is still the network making the most requests. So let's see how Microsoft presents itself to my web server. Here are the user agents it sends:

Table: Web agents from the Microsoft Network
agent	requests
------------------------------
Go-http-client/2.0	43236
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)	23978
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36	7953
Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0	2955
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36; compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot	210
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; ChatGPT-User/1.0; +https://openai.com/bot	161
DuckDuckBot/1.1; (+http://duckduckgo.com/duckduckbot.html)	123
'DuckDuckBot-Https/1.1; (+https://duckduckgo.com/duckduckbot)'	122
Python/3.9 aiohttp/3.10.6	28
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.6478.36 Safari/537.36	14
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.114 Safari/537.36	14
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.68	10
DuckAssistBot/1.2; (+http://duckduckgo.com/duckassistbot.html)	10
DuckAssistBot/1.1; (+http://duckduckgo.com/duckassistbot.html)	10
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36	6
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.143 Safari/537.36	6
python-requests/2.32.3	5
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.6422.142 Safari/537.36	5
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36	4
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:77.0) Gecko/20100101 Firefox/77.0	4
DuckDuckBot-Https/1.1; (+https://duckduckgo.com/duckduckbot)	4
Twingly Recon	3
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)	3
Mozilla/5.0 (compatible; Twingly Recon; twingly.com)	3
python-requests/2.28.2	2
newspaper/0.9.1	2
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36	2
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534+ (KHTML, like Gecko) BingPreview/1.0b	2
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36	2
http.rb/5.1.1 (Mastodon/4.2.10; +https://trystero.social/) Bot	1
http.rb/5.1.1 (Mastodon/4.2.10; +https://trystero.social/)	1
Mozilla/5.0 (Windows NT 6.1; WOW64) SkypeUriPreview Preview/0.5 skype-url-preview@microsoft.com	1
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36	1
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36	1
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36 Edg/112.0.1722.48	1
Mastodon/4.4.0-alpha.2 (http.rb/5.2.0; +https://sns.mszpro.com/) Bot	1
Mastodon/4.4.0-alpha.2 (http.rb/5.2.0; +https://sns.mszpro.com/)	1
Mastodon/4.3.3 (http.rb/5.2.0; +https://the.voiceover.bar/) Bot	1
Mastodon/4.3.3 (http.rb/5.2.0; +https://the.voiceover.bar/)	1
Mastodon/4.3.3 (http.rb/5.2.0; +https://discuss.systems/) Bot	1
Mastodon/4.3.3 (http.rb/5.2.0; +https://discuss.systems/)	1
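
For the curious, here is a minimal sketch of how a per-network user agent table like this could be built. I'm assuming Apache-style "combined" log lines (client IP first, user agent as the last double-quoted field), which may not match every server's format, and `network_of` is a hypothetical caller-supplied function that maps an IP address to a network name.

```python
import re
from collections import Counter

# Assumption: "combined" log format, with the client IP as the first field
# and the user agent as the last double-quoted field on the line.
# User agents containing escaped quotes are not handled.
LINE = re.compile(r'^(\S+) .*"([^"]*)"\s*$')

def agents_for_network(lines, network_of, wanted):
    """Tally user agents for requests whose IP maps to the wanted network.

    network_of is a caller-supplied function from an IP string to a
    network name (hypothetical here; any IP-to-ASN lookup would do).
    """
    tally = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and network_of(m.group(1)) == wanted:
            tally[m.group(2)] += 1
    return tally

if __name__ == "__main__":
    log = [
        '192.0.2.7 - - [01/Feb/2025:00:00:01 -0500] "GET / HTTP/1.1" 200 512 "-" "Go-http-client/2.0"',
        '192.0.2.7 - - [01/Feb/2025:00:00:02 -0500] "GET /a HTTP/1.1" 200 512 "-" "Go-http-client/2.0"',
        '192.0.2.9 - - [01/Feb/2025:00:00:03 -0500] "GET /b HTTP/1.1" 200 64 "-" "ExampleBot/1.0"',
        '203.0.113.5 - - [01/Feb/2025:00:00:04 -0500] "GET / HTTP/1.1" 200 512 "-" "OtherAgent/1.0"',
    ]
    # Stand-in lookup: treat 192.0.2.0/24 as one made-up network.
    lookup = lambda ip: "EXAMPLE-NET" if ip.startswith("192.0.2.") else "OTHER"
    print("agent\trequests")
    for agent, count in agents_for_network(log, lookup, "EXAMPLE-NET").most_common():
        print(f"{agent}\t{count}")
```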

The top result comes from a single IP address and probably deserves a separate post [2], since it's weird and annoying. As for the rest: you got Bing, you got OpenAI, you got several Mastodon instances. It seems like most of these are running on Microsoft's cloud offering, so it's a mixture of things.

What about Facebook?

Table: Web agents from Facebook
agent	requests
------------------------------
meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)	13497
facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)	207
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36	12
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36	4
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36	4
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36	4
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:58.0) Gecko/20100101 Firefox/59.0	4
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36 Edg/132.0.0.0	2
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36	2

Hmm … looks like I have a few readers at Facebook, but other than that, nothing terribly interesting.

Alibaba, on the other hand, is frightening. Out of 25,019 requests, it presented 581 different user agents. From looking at what was requested, I don't think it's 500 Chinese people reading my blog; it's definitely bots crawling my site. Amusingly, there are requests for /robots.txt, but without a proper user agent to go by, it's hard to block them via that file.
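
To illustrate why robots.txt is of little help here, the sketch below uses Python's standard `urllib.robotparser` with a made-up one-rule file. Rules in that file are keyed on a user agent token, so a client presenting a generic browser string matches no rule. The rule and the agent strings are illustrative assumptions, and a real crawler decides for itself whether to honor the file at all.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt with a single rule keyed on one crawler's token.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""

def allowed(user_agent: str, path: str = "/") -> bool:
    """Ask Python's parser whether user_agent may fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)

if __name__ == "__main__":
    # A crawler that names itself can be matched by the rule ...
    print("GPTBot:", allowed("GPTBot"))
    # ... but a browser-like string gives the file nothing to match on.
    print("browser-like:", allowed("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```

Note that Python's parser only looks at the part of the agent string before the first slash, which is why the bare token is passed in the first call.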

I can think of one conclusion here: filtering by ASN can help tremendously, but it comes with the risk of blocking legitimate traffic.
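
To put a rough number on that trade-off, here is a small sketch using the Facebook table from this post. The blocklist is hypothetical, and treating any agent that starts with "Mozilla/" as browser-like is a crude assumption of mine; some of those could be bots in disguise.

```python
# Request counts per user agent, copied from the Facebook table in this post.
FACEBOOK_AGENTS = {
    "meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)": 13497,
    "facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)": 207,
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36": 12,
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36": 4,
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/130.0.0.0 Safari/537.36": 4,
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36": 4,
    "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:58.0) Gecko/20100101 Firefox/59.0": 4,
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36 Edg/132.0.0.0": 2,
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36": 2,
}

# Hypothetical blocklist of network names.
BLOCKED_NETWORKS = {"FACEBOOK, US"}

def is_blocked(network: str) -> bool:
    """Decide on the network alone; the user agent is never consulted."""
    return network in BLOCKED_NETWORKS

def collateral(agent_counts):
    """Split a network's requests into (self-identified crawler, browser-like).

    Crude heuristic (an assumption): agents starting with "Mozilla/" are
    counted as browser-like, everything else as a self-identified crawler.
    """
    browser = sum(n for a, n in agent_counts.items() if a.startswith("Mozilla/"))
    crawler = sum(agent_counts.values()) - browser
    return crawler, browser

if __name__ == "__main__":
    if is_blocked("FACEBOOK, US"):
        crawler, browser = collateral(FACEBOOK_AGENTS)
        print(f"blocked: {crawler} crawler requests, {browser} browser-like requests")
        # prints "blocked: 13704 crawler requests, 32 browser-like requests"
```

In this case the collateral is small, 32 requests out of 13,736, but for a network that hosts real readers alongside crawlers the same rule would drop all of them.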

[1] /boston/2025/03/21.1

[2] /boston/2025/03/21.4
