I get an access logfile for each of the last fourteen days on emacswiki.org. I use a little Perl script I wrote (traffic) to add up all the bytes, and here’s what I get: **0.89G per day**.
Strange. I wonder why this is. I was using an estimate of about 10G per month. At the current rate we’ll be using between 25G and 30G per month. I wonder what my web host thinks about that.
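As a quick sanity check on that estimate, here is a minimal Python sketch that recomputes the daily average and the monthly projection from the fourteen byte counts in the raw data. The unit convention (K = 1024 bytes, M = 1000 K, G = 1000 M) is inferred from the columns the traffic script prints, and the names `daily_bytes` and `to_g` are made up for illustration.

```python
# Daily byte totals for access.log.1 through access.log.14.gz,
# copied from the raw data in this post.
daily_bytes = [
    1217209942, 784404579, 1257662319, 759761181, 877536376,
    639532616, 641691039, 785241214, 794831058, 1509654226,
    873576240, 738810081, 849639516, 1054760511,
]

def to_g(n_bytes):
    """Convert bytes to G. Assumption inferred from the columns:
    divide by 1024 for K, then by 1000 for M, then by 1000 for G."""
    return n_bytes / 1024 / 1000 / 1000

# Average over the fourteen days, then project onto a month.
average_g = to_g(sum(daily_bytes) / len(daily_bytes))
print(f"{average_g:.2f}G per day")
for days in (30, 31):
    print(f"{days} days: {average_g * days:.1f}G")
```

Both projections land near 27G, inside the 25G to 30G range quoted above.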
Strangely enough this data is all from the end of September and the beginning of October. I’ll ask my web host whether log rotation is broken.
I also wonder whether I’ve made a mistake somewhere.
Here’s the raw data:
    access.log.1      1217209942  1188681K  1188.7M  1.19G
    access.log.2.gz    784404579   766020K   766.0M  0.77G
    access.log.3.gz   1257662319  1228185K  1228.2M  1.23G
    access.log.4.gz    759761181   741954K   742.0M  0.74G
    access.log.5.gz    877536376   856969K   857.0M  0.86G
    access.log.6.gz    639532616   624543K   624.5M  0.62G
    access.log.7.gz    641691039   626651K   626.7M  0.63G
    access.log.8.gz    785241214   766837K   766.8M  0.77G
    access.log.9.gz    794831058   776202K   776.2M  0.78G
    access.log.10.gz  1509654226  1474271K  1474.3M  1.47G
    access.log.11.gz   873576240   853101K   853.1M  0.85G
    access.log.12.gz   738810081   721494K   721.5M  0.72G
    access.log.13.gz   849639516   829726K   829.7M  0.83G
    access.log.14.gz  1054760511  1030039K  1030.0M  1.03G
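The traffic script itself isn't shown in this post. As a rough Python sketch of the same idea (not the original Perl), the following sums the response-size field of each log line and formats the total in the four columns seen in the raw data. It assumes Apache common or combined log format, where the size follows the status code. `sum_bytes` and `format_traffic` are hypothetical names, and the two sample lines are made up.

```python
import re

# Matches the status code and response size that follow the quoted request,
# e.g. ... "GET /emacs/SiteMap HTTP/1.1" 200 2326 ...
STATUS_AND_SIZE = re.compile(r'" (\d{3}) (\d+|-)')

def sum_bytes(lines):
    """Add up the response sizes of all log lines; a size of "-" counts as zero."""
    total = 0
    for line in lines:
        match = STATUS_AND_SIZE.search(line)
        if match and match.group(2) != "-":
            total += int(match.group(2))
    return total

def format_traffic(total):
    """Format as bytes, K, M and G (assumed: K = 1024 bytes, M = 1000 K, G = 1000 M)."""
    k = total / 1024
    return f"{total} {int(k)}K {k / 1000:.1f}M {k / 1000000:.2f}G"

# Made-up sample lines in combined log format.
sample = [
    '192.0.2.1 - - [01/Oct/2007:00:00:01 +0000] "GET /emacs/SiteMap HTTP/1.1" 200 2326 "-" "ExampleBot"',
    '192.0.2.2 - - [01/Oct/2007:00:00:02 +0000] "GET /emacs/HomePage HTTP/1.1" 304 - "-" "ExampleBrowser"',
]
print(format_traffic(sum_bytes(sample)))
```

Passing `sys.stdin` to `sum_bytes` instead of the sample list would let the sketch sit at the end of a `zcat` pipeline, much like the real script.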
#Web #EmacsWiki
⁂
1. Either backups and tarballs are really big and people or robots are retrieving them.
2. Or Oddmuse:Surge_Protection is disabled.
3. Or Emacs 22 was released and the site’s popularity has increased.
Hm... There are robots making backup copies of the tarballs. That must be it...
Then again:
    aschroeder@thinkmo:~/logs$ zcat access.log.4.gz | traffic
    -1 38875743K 38875.7M 38.88G
    aschroeder@thinkmo:~/logs$ zcat access.log.4.gz | grep '/archives/.*\.tar\.gz' | traffic
    1831687078 1788756K 1788.8M 1.79G
That’s a very small part: about 1.79G out of 38.88G, under 5%. More investigation is called for...
A leech?
    aschroeder@thinkmo:~$ zcat access.log.4.gz | leech-detector | head
    66.249.73.6      254504  11K  11%   13.5s  200 (87%), 404 (3%), 302 (3%), 301 (2%), 304 (1%), 503 (0%), 501 (0%), 403 (0%), 400 (0%), 500 (0%)
    64.1.215.164      30048   9K   1%  113.3s  200 (92%), 404 (5%), 302 (1%), 400 (0%), 501 (0%), 503 (0%), 301 (0%)
    65.214.44.29      23893  21K   1%  143.7s  200 (78%), 304 (20%), 503 (0%), 301 (0%)
    65.55.209.49      17042   1K   0%  201.2s  403 (96%), 200 (2%), 301 (1%), 302 (0%), 404 (0%)
    216.255.229.250   15564  13K   0%  164.0s  200 (97%), 302 (1%), 301 (0%), 404 (0%), 501 (0%), 500 (0%), 400 (0%), 503 (0%)
    64.1.215.165      14865   1K   0%  205.7s  403 (98%), 200 (1%), 301 (0%), 404 (0%), 302 (0%), 400 (0%)
    65.55.212.190     14113  11K   0%   63.1s  200 (62%), 503 (23%), 404 (7%), 301 (3%), 302 (2%), 400 (0%), 501 (0%), 500 (0%), 403 (0%)
    65.55.209.50      13861   1K   0%  247.5s  403 (95%), 200 (2%), 301 (1%), 302 (0%), 400 (0%), 404 (0%)
    38.99.44.104      13854   1K   0%  247.6s  403 (99%), 200 (0%), 301 (0%), 404 (0%)
    65.55.209.51      12940   3K   0%  265.1s  403 (95%), 200 (2%), 301 (1%), 302 (0%), 404 (0%)
This is data from my leech-detector script. The second column is the number of hits. That’s about 250’000 hits in 40 days for Google, or roughly 260 hits per hour. Hm...
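The leech-detector script isn't shown either. A minimal Python sketch of the part described here, counting hits per IP address and summarising the status codes for each, might look like this. It assumes the client address is the first field of each log line and that the status code follows the quoted request. `count_hits` and `report` are hypothetical names, the sample lines are made up, and the other columns of the real output are not reproduced.

```python
import re
from collections import Counter, defaultdict

# First field is the client address; the status code follows the quoted request.
LINE = re.compile(r'^(\S+) .*?" (\d{3}) ')

def count_hits(lines):
    """Return (hits per IP, status-code counts per IP)."""
    hits = Counter()
    statuses = defaultdict(Counter)
    for line in lines:
        match = LINE.match(line)
        if match:
            ip, status = match.groups()
            hits[ip] += 1
            statuses[ip][status] += 1
    return hits, statuses

def report(lines, top=10):
    """One row per IP, busiest first: IP, hits, share of all hits, status codes."""
    hits, statuses = count_hits(lines)
    total = sum(hits.values())
    rows = []
    for ip, n in hits.most_common(top):
        codes = ", ".join(
            f"{status} ({100 * count // n}%)"
            for status, count in statuses[ip].most_common()
        )
        rows.append(f"{ip} {n} {100 * n // total}% {codes}")
    return rows

# Made-up sample lines in combined log format.
sample = [
    '192.0.2.1 - - [01/Oct/2007:00:00:01 +0000] "GET /a HTTP/1.1" 200 100 "-" "Bot"',
    '192.0.2.1 - - [01/Oct/2007:00:00:02 +0000] "GET /b HTTP/1.1" 200 100 "-" "Bot"',
    '192.0.2.1 - - [01/Oct/2007:00:00:03 +0000] "GET /c HTTP/1.1" 404 50 "-" "Bot"',
    '192.0.2.2 - - [01/Oct/2007:00:00:04 +0000] "GET /a HTTP/1.1" 403 20 "-" "Other"',
]
for row in report(sample):
    print(row)
```

Percentages are rounded down here; the real script may round differently.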
| IP range | Owner |
|:--:|:--:|
| 66.249.64.0 - 66.249.95.255 | Google |
| 64.0.0.0 - 64.3.255.255 | XO Communications – what are they up to? |
| 65.192.0.0 - 65.223.255.255 | MCI Communications Services, Inc. d/b/a Verizon Business |
| 65.52.0.0 - 65.55.255.255 | Microsoft Corp |
| 216.255.224.0 - 216.255.239.255 | International Digital Communications, Inc. |
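To match the addresses in the leech-detector output against these ranges without doing it by eye, something like the following Python sketch could be used. The ranges and owners are copied from the table above; `owner_of` is a hypothetical helper, and the comparison relies on the standard `ipaddress` module, whose address objects can be ordered.

```python
import ipaddress

# (first address, last address, owner), copied from the table above.
RANGES = [
    ("66.249.64.0", "66.249.95.255", "Google"),
    ("64.0.0.0", "64.3.255.255", "XO Communications"),
    ("65.192.0.0", "65.223.255.255",
     "MCI Communications Services, Inc. d/b/a Verizon Business"),
    ("65.52.0.0", "65.55.255.255", "Microsoft Corp"),
    ("216.255.224.0", "216.255.239.255",
     "International Digital Communications, Inc."),
]

def owner_of(ip):
    """Return the owner of the range containing ip, or None if no range matches."""
    addr = ipaddress.ip_address(ip)
    for first, last, owner in RANGES:
        if ipaddress.ip_address(first) <= addr <= ipaddress.ip_address(last):
            return owner
    return None

# A few addresses taken from the leech-detector output.
for ip in ["66.249.73.6", "64.1.215.164", "65.214.44.29",
           "65.55.209.49", "216.255.229.250", "38.99.44.104"]:
    print(ip, owner_of(ip) or "not in the table")
```

The last address, 38.99.44.104, falls outside every listed range, so its owner would need a separate whois lookup.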
– Alex Schroeder 2007-11-15 18:09 UTC