Trying to figure out what causes the occasional load problems on mirabel.epfarms.org. Some days the websites I host there – the most important ones are campaignwiki.org and communitywiki.org – will not work. Why? I have no idea. This is a Debian shared hosting system.
Sometimes one of the sites gets hit by spammers. You’ll notice it when a username you have never seen before is running a lot of php-cgi processes.
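A quick way to see which account is spawning them, as a sketch (nothing here is specific to this server):

```
# Count php-cgi processes per owner; an unfamiliar username at the top
# of the list is usually the account being abused.
ps -C php-cgi -o user= | sort | uniq -c | sort -rn
```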
In their htdocs, update or create the `.htaccess` file with the following line to disable PHP for that particular site:

```
php_flag engine off
```
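The `.htaccess` change only affects new requests; any php-cgi processes the account has already spawned keep running until they finish or are killed. Something like this would take care of them (`spammer` is just a placeholder for the actual username from `top`):

```
# End all php-cgi processes owned by the suspicious account.
# "spammer" is a placeholder -- substitute the real username.
sudo pkill -u spammer php-cgi
```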
Sometimes these people run multiple sites. Looking at the environment of their processes shows which ones.
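One way to do that, assuming you have the PID of one of the suspicious php-cgi processes (12345 below is just a placeholder):

```
# Print the environment of a running process; /proc separates the
# variables with NUL bytes, so turn those into newlines. Paths in there
# often show which site the process belongs to.
sudo cat /proc/12345/environ | tr '\0' '\n'
```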
When I connect via ssh, I like to run `top`. This is what I see minutes after a server restart:
```
top - 17:32:24 up 56 days,  2:38,  6 users,  load average: 85.19, 44.79, 64.64
Tasks: 273 total,   2 running, 262 sleeping,   0 stopped,   9 zombie
Cpu(s):  4.3%us,  3.4%sy,  0.0%ni,  0.0%id, 92.0%wa,  0.0%hi,  0.3%si,  0.0%st
Mem:   2062832k total,  2043120k used,    19712k free,     5792k buffers
Swap:  8388600k total,  3274276k used,  5114324k free,    35860k cached
```
Load is shooting up again even though practically no processes are running: 92% of the CPU time is spent *waiting* for I/O.
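When the load is driven by I/O wait rather than by busy processes, the culprits sit in uninterruptible sleep (state `D`). A sketch for listing them:

```
# Processes in uninterruptible sleep are usually stuck waiting on disk
# (or NFS) I/O; each of them adds 1 to the load average.
ps -eo state,pid,user,wchan:20,cmd | awk '$1 ~ /^D/'
```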
The admin tells me the RAID is OK, so this is not about failing disks:
```
alex@mirabel:~$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.03
  Creation Time : Mon Mar 24 03:42:25 2008
     Raid Level : raid5
     Array Size : 180153984 (171.81 GiB 184.48 GB)
  Used Dev Size : 60051328 (57.27 GiB 61.49 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun May 16 17:57:40 2010
          State : active
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : dd3b4aab:46063cd1:e368bf24:bd0fce41
         Events : 0.48049

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
       3       8       49        3      active sync   /dev/sdd1
```
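For a quicker check, `/proc/mdstat` shows the state of all arrays at a glance:

```
# All four members should show up as [UUUU]; an underscore would mark a
# failed disk, and a running resync would also be visible here.
cat /proc/mdstat
```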
Another friend told me to look at stalling HTTP requests.
```
alex@mirabel:~$ sudo lsof -i :80|wc -l
165
```
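That only counts how many sockets are open on port 80, though. A rough way to break the number down per client, to see whether a single crawler is hammering the server (a sketch using `netstat`):

```
# Count port-80 connections per remote address; one client with
# hundreds of connections would point at an abusive crawler.
netstat -tn | awk '$4 ~ /:80$/ {print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head
```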
But how to fix it?
Twice now the admin said that the problem was the system running out of disk space. I think I checked it the last time around using `df -h` and `df -i` but didn’t notice anything unusual.
A few days later, we had the same situation: `load average: 151.56`. But `sudo lsof -i :80|wc -l` reports only 9, so stalling HTTP requests are not the cause, and `df -h`, which I wanted to study the last time, says:
```
Filesystem            Size  Used Avail Use% Mounted on
/dev/md0              171G  140G   23G  86% /
varrun               1008M  120K 1008M   1% /var/run
varlock              1008M     0 1008M   0% /var/lock
udev                 1008M   60K 1008M   1% /dev
devshm               1008M     0 1008M   0% /dev/shm
lrm                  1008M   45M  964M   5% /lib/modules/2.6.24-24-server/volatile
/dev/sdd4             8.2G  221M  7.6G   3% /boot
tmpfs                1008M   45M  964M   5% /lib/modules/2.6.24-27-server/volatile
```
I don’t see a device running out of disk space.
There aren’t many processes running, either:
```
top - 12:50:21 up 58 days, 21:56,  5 users,  load average: 63.10, 106.94, 103.96
Tasks: 109 total,   2 running, 103 sleeping,   0 stopped,   4 zombie
Cpu(s):  0.3%us,  1.0%sy,  0.0%ni, 49.5%id, 49.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   2062832k total,   303556k used,  1759276k free,     4780k buffers
Swap:  8388600k total,  2850064k used,  5538536k free,    40736k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
32309 root      20   0  158m  16m 3168 D    2  0.8   0:00.22 apache2
21657 mysql     18  -2 3720m  59m 2340 S    0  2.9 234:30.98 mysqld
32211 alex      20   0 18992 1392  936 R    0  0.1   0:00.44 top
    1 root      20   0  4020  140   64 S    0  0.0   0:18.47 init
    2 root      15  -5     0    0    0 S    0  0.0   0:00.05 kthreadd
    3 root      RT  -5     0    0    0 S    0  0.0   0:28.23 migration/0
    4 root      15  -5     0    0    0 S    0  0.0   0:56.75 ksoftirqd/0
    5 root      RT  -5     0    0    0 S    0  0.0   0:01.36 watchdog/0
    6 root      RT  -5     0    0    0 S    0  0.0   0:50.89 migration/1
    7 root      15  -5     0    0    0 S    0  0.0   0:28.72 ksoftirqd/1
    8 root      RT  -5     0    0    0 S    0  0.0   0:01.06 watchdog/1
    9 root      15  -5     0    0    0 S    0  0.0   1:38.27 events/0
   10 root      15  -5     0    0    0 S    0  0.0   2:06.44 events/1
   11 root      15  -5     0    0    0 S    0  0.0   0:00.01 khelper
   44 root      15  -5     0    0    0 S    0  0.0   0:10.51 kblockd/0
   45 root      15  -5     0    0    0 S    0  0.0   0:51.72 kblockd/1
   48 root      15  -5     0    0    0 S    0  0.0   0:00.00 kacpid
   49 root      15  -5     0    0    0 S    0  0.0   0:00.00 kacpi_notify
  132 root      15  -5     0    0    0 S    0  0.0   0:00.00 kseriod
  184 root      15  -5     0    0    0 S    0  0.0  18:39.70 kswapd0
  227 root      15  -5     0    0    0 S    0  0.0   0:00.00 aio/0
  228 root      15  -5     0    0    0 S    0  0.0   0:00.00 aio/1
 1442 root      15  -5     0    0    0 S    0  0.0   0:00.00 ksuspend_usbd
 1445 root      15  -5     0    0    0 S    0  0.0   0:00.00 khubd
 1472 root      15  -5     0    0    0 S    0  0.0   0:00.00 ata/0
 1474 root      15  -5     0    0    0 S    0  0.0   0:00.00 ata/1
```

etc.
Hm, strange. Load seems to be dropping fast! `load average: 13.22, 77.95, 93.90` – I wonder what’s going on!
Must remember to run something like the following the next time this happens:
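A minimal sketch, assuming `iotop` is installed and the kernel supports I/O accounting; the point is to find out whether the machine is swapping or stuck waiting on the disks while the load is high:

```
# Snapshot what the machine is waiting on during a load spike.
vmstat 5 5              # watch si/so (swapping) and wa (I/O wait) for ~25 seconds
sudo iotop -o -b -n 3   # which processes are actually doing the I/O
dmesg | tail -n 30      # any disk or controller errors logged recently
```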
⁂
Hopefully all is back and working just fine once more. Thanks for your persistence, Alex :D
– greywulf 2010-05-17 05:52 UTC
---
The problem with virtualized hosts is that other virtual hosts (or the machine hosting them) can be the ones causing the load, and then you have no way to control your virtual machines’ behavior at all.
– Harald Wagener 2010-05-17 13:12 UTC