2014-12-24 Emacs Wiki Migration

(Continued from yesterday.)

Current status:

Read-only server is up.
As far as I can tell, using nginx as a caching proxy isn’t as effective as some people have been suggesting.
Monit restarts Apache practically every hour because of load. Is some other user on this shared host running an hourly job that borks the system? If so, I must have misconfigured Apache such that the resulting performance dip turns into a death spiral.

Here’s the munin graph that makes me suspect that I’m not getting all of the CPU I need. The gaps happen when load is so high that munin doesn’t get to run. Yikes!

https://alexschroeder.ch/pics/15911905077_b82b4056a0_o.png

**Field**: Info

**system**: CPU time spent by the kernel in system activities

**user**: CPU time spent by normal programs and daemons

**nice**: CPU time spent by nice(1)d programs

**idle**: Idle CPU time

**iowait**: CPU time spent waiting for I/O operations to finish when there is nothing else to do

**irq**: CPU time spent handling interrupts

**softirq**: CPU time spent handling “batched” interrupts

**steal**: The time that a virtual CPU had runnable tasks, but the virtual CPU itself was not running

I think what happens is that somebody else on this server is taking up CPU time, and this results in Apache processes piling up.
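To put a number on that suspicion, one could sample the aggregate `cpu` line of `/proc/stat` twice and look at how much of the elapsed CPU time was steal. What follows is a minimal Python sketch, assuming a Linux host; the function names are made up for illustration, and only the first eight counters (user through steal) are used.

```python
# Field order of the aggregate "cpu" line in /proc/stat (Linux).
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of tick counters."""
    parts = stat_line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def read_cpu_times(path="/proc/stat"):
    """Read the current counters; the first line of the file is the aggregate."""
    with open(path) as f:
        return cpu_times(f.readline())


def steal_percent(before, after):
    """Share of CPU time (in percent) that was steal between two samples."""
    delta = {name: after[name] - before[name] for name in FIELDS}
    total = sum(delta.values())
    return 100.0 * delta["steal"] / total if total else 0.0
```

Calling `read_cpu_times()`, waiting a few seconds, calling it again and passing both results to `steal_percent` gives the steal share for that interval. A consistently high value during the spikes would support the noisy-neighbour theory; a low one would point back at my own configuration.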

Here’s what the log files say:

Dec 24 09:46:37 localhost monit[29022]: 'apache' trying to restart
Dec 24 09:46:37 localhost monit[29022]: 'apache' stop: /etc/init.d/apache2
Dec 24 09:46:38 localhost sm-mta[27076]: rejecting connections on daemon MTA-v4: load average: 71
...
Dec 24 13:02:55 localhost monit[29022]: 'apache' trying to restart
Dec 24 13:02:55 localhost monit[29022]: 'apache' stop: /etc/init.d/apache2
Dec 24 13:03:01 localhost sm-mta[27076]: rejecting connections on daemon MTA-v4: load average: 37
...
Dec 24 15:55:35 localhost monit[29022]: 'apache' trying to restart
Dec 24 15:55:35 localhost monit[29022]: 'apache' stop: /etc/init.d/apache2
Dec 24 15:55:43 localhost sm-mta[27076]: rejecting connections on daemon MTA-v4: load average: 27
...
Dec 24 16:54:16 localhost monit[29022]: 'apache' trying to restart
Dec 24 16:54:16 localhost monit[29022]: 'apache' stop: /etc/init.d/apache2
Dec 24 16:54:25 localhost sm-mta[27076]: rejecting connections on daemon MTA-v4: load average: 34
...
Dec 24 17:56:52 localhost monit[29022]: 'apache' trying to restart
Dec 24 17:56:52 localhost monit[29022]: 'apache' stop: /etc/init.d/apache2
...
Dec 24 21:56:02 localhost monit[29022]: 'apache' trying to restart
Dec 24 21:56:02 localhost monit[29022]: 'apache' stop: /etc/init.d/apache2
Dec 24 21:56:03 localhost sm-mta[27076]: rejecting connections on daemon MTA-v4: load average: 37
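To check the hourly-job theory against these entries, one could pull out the monit restart timestamps and look at where in the hour they fall. This is a rough Python sketch, not something I ran on the server; the function name is made up, and the sample lines are the ones quoted above.

```python
from datetime import datetime


def restart_times(lines):
    """Return the timestamps of lines where monit reports an Apache restart attempt."""
    times = []
    for line in lines:
        if "monit[" in line and "'apache' trying to restart" in line:
            # Syslog timestamps carry no year; strptime falls back to 1900,
            # which does not matter when only the time of day is compared.
            times.append(datetime.strptime(line[:15], "%b %d %H:%M:%S"))
    return times


SAMPLE = [
    "Dec 24 09:46:37 localhost monit[29022]: 'apache' trying to restart",
    "Dec 24 13:02:55 localhost monit[29022]: 'apache' trying to restart",
    "Dec 24 15:55:35 localhost monit[29022]: 'apache' trying to restart",
    "Dec 24 16:54:16 localhost monit[29022]: 'apache' trying to restart",
    "Dec 24 17:56:52 localhost monit[29022]: 'apache' trying to restart",
    "Dec 24 21:56:02 localhost monit[29022]: 'apache' trying to restart",
]

minutes_past_hour = [t.minute for t in restart_times(SAMPLE)]
```

For the six entries shown, the minutes come out as 46, 2, 55, 54, 56 and 56. Five of the six fall within a few minutes of the full hour, which would fit an hourly job, although the excerpt is elided and proves nothing by itself.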

OK, so how do I make Apache more robust against those spikes?

The current settings:

# StartServers: initial number of server processes to start
# MinSpareThreads: minimum number of worker threads which are kept spare
# MaxSpareThreads: maximum number of worker threads which are kept spare
# ThreadLimit: ThreadsPerChild can be changed to this maximum value during a
#              graceful restart. ThreadLimit can only be changed by stopping
#              and starting Apache.
# ThreadsPerChild: constant number of worker threads in each server process
# MaxClients: maximum number of simultaneous client connections
# MaxRequestsPerChild: maximum number of requests a server process serves
# <IfModule mpm_worker_module>
#     StartServers          2
#     MinSpareThreads      25
#     MaxSpareThreads      75
#     ThreadLimit          64
#     ThreadsPerChild      25
#     MaxClients          150
#     MaxRequestsPerChild   0
# </IfModule>
<IfModule mpm_worker_module>
    StartServers          2
    ServerLimit           3
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadLimit         100
    ThreadsPerChild     100
    MaxClients          300
    MaxRequestsPerChild  10000
</IfModule>

Nic said I should drop those numbers to better match the cores I have available. Each thread that is actually running will also do IO, so that’s an additional issue.

Two cores → no more than 2 servers. When a thread waits for the disk, another thread can run. How many more threads? Not many, because they will also need to use the disk. How about a 90% reduction: 10 threads per child. Looking at this German blog post again.

MaxClients = ServerLimit × ThreadsPerChild. OK.
MinSpareThreads = ThreadsPerChild × 3 to handle spikes – but I think more threads jumping in will lead us to a death spiral. Keeping this lower than 30!
MaxSpareThreads ≥ MinSpareThreads + ThreadsPerChild.

    StartServers          1
    ServerLimit           2
    ThreadsPerChild      10
    MaxClients           20
    MinSpareThreads      10
    MaxSpareThreads      20
    MaxRequestsPerChild  10000
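The arithmetic behind these numbers can be written down as a small Python sketch. The function names are hypothetical, and the choices baked in (one server per core, MinSpareThreads equal to ThreadsPerChild instead of three times that) are my assumptions from the reasoning above, not a general recommendation.

```python
def worker_settings(cores, threads_per_child, max_requests_per_child=10000):
    """Derive worker MPM settings from the core count and a per-child thread budget."""
    server_limit = cores  # assumption: no more server processes than cores
    return {
        "StartServers": 1,
        "ServerLimit": server_limit,
        "ThreadsPerChild": threads_per_child,
        "MaxClients": server_limit * threads_per_child,
        # Deliberately lower than ThreadsPerChild * 3 to avoid a pile-up.
        "MinSpareThreads": threads_per_child,
        "MaxSpareThreads": 2 * threads_per_child,
        "MaxRequestsPerChild": max_requests_per_child,
    }


def check(settings):
    """Return a list of violated rules of thumb (empty if all hold)."""
    problems = []
    if settings["MaxClients"] != settings["ServerLimit"] * settings["ThreadsPerChild"]:
        problems.append("MaxClients != ServerLimit * ThreadsPerChild")
    if settings["MaxSpareThreads"] < settings["MinSpareThreads"] + settings["ThreadsPerChild"]:
        problems.append("MaxSpareThreads < MinSpareThreads + ThreadsPerChild")
    return problems
```

`worker_settings(2, 10)` reproduces the values listed above, and `check` finds nothing to complain about. Run against the previous settings (ServerLimit 3, ThreadsPerChild 100, MaxClients 300, MinSpareThreads 25, MaxSpareThreads 75), it flags the MaxSpareThreads rule.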

☯

Hours later. It seems to have worked? Load is stable, hovering around 2, which makes sense for two cores.

https://alexschroeder.ch/pics/16100535652_84b896b86a_o.png

All is not perfect, unfortunately. Monit says Apache has 6h uptime. That means somebody restarted Apache during the night. Damn.

#Emacs #Wikis #Oddmuse #mod perl #Apache #devops #Administration