2007-11-13 Bandwidth

I have an access log file for each of the last fourteen days on emacswiki.org. I added up all the bytes with a little Perl script I wrote (traffic), and here’s what I get: **0.89G per day** on average.
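
For readers without the script, here is a rough Python sketch of the same idea. It is not the actual Perl traffic script. It assumes the logs are in Common or Combined Log Format, with the byte count right after the status code, and it assumes the K, M and G columns are produced the way the raw data below suggests: one division by 1024, then two divisions by 1000.

```python
import re

# Common/Combined Log Format: ... "REQUEST" STATUS BYTES ...
# Assumption: the byte count is the field right after the status code.
LOG_LINE = re.compile(r'"[^"]*" (\d{3}) (\d+|-)')

def total_bytes(lines):
    """Add up the response sizes of all parseable log lines."""
    total = 0
    for line in lines:
        match = LOG_LINE.search(line)
        # A size of "-" means no body was sent; unparseable lines are skipped.
        if match and match.group(2) != "-":
            total += int(match.group(2))
    return total

def summary(total):
    """Format a byte total as bytes, K, M and G."""
    kilo = total // 1024   # K: integer division by 1024
    mega = kilo / 1000     # M: K divided by 1000
    giga = mega / 1000     # G: M divided by 1000
    return f"{total} {kilo}K {mega:.1f}M {giga:.2f}G"
```

Calling `summary(total_bytes(sys.stdin))` after `import sys` would let it sit at the end of a `zcat` pipeline.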

Strange. I wonder why this is. I was using an estimate of about 10G per month. At the current rate we’ll be using between 25G and 30G per month. I wonder what my web host thinks about that.

Strangely enough, this data is all from the end of September and the beginning of October, even though it is now mid-November. I’ll ask my web host whether log rotation is broken.

I also wonder whether I’ve made a mistake somewhere.

Here’s the raw data:

access.log.1
1217209942 1188681K 1188.7M 1.19G
access.log.2.gz
784404579 766020K 766.0M 0.77G
access.log.3.gz
1257662319 1228185K 1228.2M 1.23G
access.log.4.gz
759761181 741954K 742.0M 0.74G
access.log.5.gz
877536376 856969K 857.0M 0.86G
access.log.6.gz
639532616 624543K 624.5M 0.62G
access.log.7.gz
641691039 626651K 626.7M 0.63G
access.log.8.gz
785241214 766837K 766.8M 0.77G
access.log.9.gz
794831058 776202K 776.2M 0.78G
access.log.10.gz
1509654226 1474271K 1474.3M 1.47G
access.log.11.gz
873576240 853101K 853.1M 0.85G
access.log.12.gz
738810081 721494K 721.5M 0.72G
access.log.13.gz
849639516 829726K 829.7M 0.83G
access.log.14.gz
1054760511 1030039K 1030.0M 1.03G
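
To double-check the average and the monthly projection, here is a small Python sketch using the byte totals listed above. The 30-day month is my assumption, and the conversion to G follows what the columns appear to do (divide by 1024 once, then by 1000 twice).

```python
# Byte totals copied from the raw data, access.log.1 through access.log.14.gz.
DAILY_BYTES = [
    1217209942, 784404579, 1257662319, 759761181, 877536376,
    639532616, 641691039, 785241214, 794831058, 1509654226,
    873576240, 738810081, 849639516, 1054760511,
]

def to_giga(total):
    """Convert bytes to G: divide by 1024, then by 1000 twice (assumed convention)."""
    return total / 1024 / 1000 / 1000

average = sum(DAILY_BYTES) / len(DAILY_BYTES)
per_day = to_giga(average)
per_month = per_day * 30  # assumption: a 30-day month

print(f"{per_day:.2f}G per day, {per_month:.1f}G per month")
# prints: 0.89G per day, 26.8G per month
```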

#Web #EmacsWiki

Comments

1. Either the backups and tarballs are really big, and people or robots are retrieving them.

2. Or Oddmuse:Surge_Protection is disabled.

3. Or Emacs 22 was released and the site’s popularity has increased.

AaronHawley

Hm... There are robots making backup copies of the tarballs. That must be it...

Then again:

aschroeder@thinkmo:~/logs$ zcat access.log.4.gz | traffic
-1 38875743K 38875.7M 38.88G
aschroeder@thinkmo:~/logs$ zcat access.log.4.gz | grep '/archives/.*\.tar\.gz' | traffic
1831687078 1788756K 1788.8M 1.79G

That’s a very small part. More investigation is called for...
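
To put a number on "very small", here is a Python sketch of the same split that the `grep` pipeline does, under the same log format assumption as before (byte count right after the status code). The function name `traffic_share` is made up for this example. The last line just redoes the arithmetic with the K figures printed above, since the byte column of the first command reads -1.

```python
import re

LOG_LINE = re.compile(r'"[^"]*" \d{3} (\d+)')
ARCHIVE = re.compile(r"/archives/.*\.tar\.gz")  # same pattern as the grep call

def traffic_share(lines, pattern=ARCHIVE):
    """Return (matching_bytes, total_bytes) for the given log lines."""
    matching = total = 0
    for line in lines:
        found = LOG_LINE.search(line)
        if not found:
            continue  # no byte count on this line
        size = int(found.group(1))
        total += size
        # Like grep, match anywhere in the line, not only in the request path.
        if pattern.search(line):
            matching += size
    return matching, total

# Share of tarball traffic, from the K figures: 1788756K of 38875743K.
print(f"{100 * 1788756 / 38875743:.1f}%")  # prints: 4.6%
```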

A leech?

aschroeder@thinkmo:~$ zcat access.log.4.gz | leech-detector | head
         66.249.73.6     254504    11K  11%   13.5s  200 (87%), 404 (3%), 302 (3%), 301 (2%), 304 (1%), 503 (0%), 501 (0%), 403 (0%), 400 (0%), 500 (0%)
        64.1.215.164      30048     9K   1%  113.3s  200 (92%), 404 (5%), 302 (1%), 400 (0%), 501 (0%), 503 (0%), 301 (0%)
        65.214.44.29      23893    21K   1%  143.7s  200 (78%), 304 (20%), 503 (0%), 301 (0%)
        65.55.209.49      17042     1K   0%  201.2s  403 (96%), 200 (2%), 301 (1%), 302 (0%), 404 (0%)
     216.255.229.250      15564    13K   0%  164.0s  200 (97%), 302 (1%), 301 (0%), 404 (0%), 501 (0%), 500 (0%), 400 (0%), 503 (0%)
        64.1.215.165      14865     1K   0%  205.7s  403 (98%), 200 (1%), 301 (0%), 404 (0%), 302 (0%), 400 (0%)
       65.55.212.190      14113    11K   0%   63.1s  200 (62%), 503 (23%), 404 (7%), 301 (3%), 302 (2%), 400 (0%), 501 (0%), 500 (0%), 403 (0%)
        65.55.209.50      13861     1K   0%  247.5s  403 (95%), 200 (2%), 301 (1%), 302 (0%), 400 (0%), 404 (0%)
        38.99.44.104      13854     1K   0%  247.6s  403 (99%), 200 (0%), 301 (0%), 404 (0%)
        65.55.209.51      12940     3K   0%  265.1s  403 (95%), 200 (2%), 301 (1%), 302 (0%), 404 (0%)

This is data from my leech-detector script. The second column is the number of hits. That’s about 250’000 hits in 40 days for Google, or about 260 hits per hour. Hm...
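
The hit counting itself is simple. Here is a minimal Python sketch of that one part, not the real leech-detector script: it only counts hits per client address and each client's share of all hits, assuming the address is the first field of each log line. The function names are invented for the example.

```python
from collections import Counter

def hits_per_client(lines):
    """Count requests per client address (first whitespace-separated field)."""
    counts = Counter()
    for line in lines:
        fields = line.split(None, 1)
        if fields:  # skip empty lines
            counts[fields[0]] += 1
    return counts

def top_clients(lines, n=10):
    """Return (address, hits, percent of all hits) for the n busiest clients."""
    counts = hits_per_client(lines)
    total = sum(counts.values())
    return [(ip, hits, 100 * hits / total) for ip, hits in counts.most_common(n)]

# Hourly rate for the busiest client: 254504 hits over 40 days.
print(round(254504 / (40 * 24)))  # prints: 265
```

With the exact hit count instead of the rounded 250’000, the rate comes out at about 265 per hour, so the estimate holds.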

||66.249.64.0 - 66.249.95.255||Google||
||64.0.0.0 - 64.3.255.255||XO Communications – what are they up to?||
||65.192.0.0 - 65.223.255.255||MCI Communications Services, Inc. d/b/a Verizon Business||
||65.52.0.0 - 65.55.255.255||Microsoft Corp||
||216.255.224.0 - 216.255.239.255||International Digital Communications, Inc.||
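
Matching the addresses from the leech-detector output against these ranges can be automated. A small Python sketch, using only the ranges and owner names listed here (the function name `owner_of` is made up for the example):

```python
from ipaddress import ip_address

# Ranges and owners copied from the table in this comment.
RANGES = [
    ("66.249.64.0", "66.249.95.255", "Google"),
    ("64.0.0.0", "64.3.255.255", "XO Communications"),
    ("65.192.0.0", "65.223.255.255",
     "MCI Communications Services, Inc. d/b/a Verizon Business"),
    ("65.52.0.0", "65.55.255.255", "Microsoft Corp"),
    ("216.255.224.0", "216.255.239.255",
     "International Digital Communications, Inc."),
]

def owner_of(address):
    """Return the owner of the range containing address, or None if unlisted."""
    ip = ip_address(address)
    for first, last, owner in RANGES:
        if ip_address(first) <= ip <= ip_address(last):
            return owner
    return None
```

For example, `owner_of("66.249.73.6")` returns `"Google"`, while `owner_of("38.99.44.104")` returns `None` because that address is in none of the listed ranges.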

– Alex Schroeder 2007-11-15 18:09 UTC