2014-12-18 Emacs Wiki Migration

I’m having problems with the git integration. The problem is that the user running git commands on the new server will be both www-data and alex.

On #git, I was told to make sure that all the permissions for the files in the .git directory are set up correctly:

sudo find .git -type f -exec chmod g+rw {} \;
git config core.sharedRepository true

And it worked! Thanks, Seveas.
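For readers who want to see what the `find ... chmod g+rw` one-liner does, here is a minimal Python equivalent. It walks a directory tree and adds group read/write permission to every regular file, leaving all other mode bits alone. The function name `add_group_rw` is made up for illustration. The `git config core.sharedRepository true` step still has to be run separately.

```python
import os
import stat


def add_group_rw(root):
    """Add group read/write to every regular file below root.

    Same effect as running `chmod g+rw` on each file that `find root -type f`
    lists. Returns the number of files whose mode actually changed.
    """
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks and special files, as -type f does
            mode = stat.S_IMODE(st.st_mode)
            new_mode = mode | stat.S_IRGRP | stat.S_IWGRP
            if new_mode != mode:
                os.chmod(path, new_mode)
                changed += 1
    return changed
```

Calling `add_group_rw(".git")` from the repository root corresponds to the `find` command. A second run reports 0 changes because nothing is left to fix.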

☯

Wow, what an exercise in humility this has been! 😨 I had switched the DNS entry and was awaiting the slow increase in traffic, wondering how this system would handle it. Thomas Waldmann of Waldmann had hosted the Emacs Wiki since the very early days, and in recent years he had kept asking me to switch from CGI to FastCGI. It just never seemed that urgent. I was about to learn *how damn urgent* it was!

Soon, the load climbed to 40, then 50. It was getting hard to use Emacs. The load kept on climbing. When it reached 198, I gave up fiddling with the rewrite rules and just killed Apache. There was no point.

When the system had recovered, I started working on FCGI support. I started with some old info on the Oddmuse wiki and learned as I went. The result is on the Using mod_fastcgi page. The rewrite rules weren’t working correctly, and the language stuff was interfering. Holy cow, and I just kept on hacking.

The status right now:

1. The *other* sites are up (such as this one) and that’s already a win. 🙂

2. I can read the pages on Emacs Wiki. It is slow but it works. 👍

3. Sometimes I get a 503 error. 👎

4. When I’m on the server, running Emacs and stuff, these apps will sometimes get killed for no apparent reason. Out of memory errors? 😢

5. I realized that syslog had been uninstalled and the replacement, rsyslog, wasn’t running. That’s why I still don’t know whether this is a memory problem. 😣

6. I upgraded the hosting package but I’m not sure what I need to increase to solve that problem. Perhaps it can be resolved using some of the many fcgid parameters? 😪

I’m currently using:

FcgidMaxProcesses 20
FcgidProcessLifeTime 300
FcgidMaxRequestsPerProcess 100

`top` lists a lot of zombies. The default for the FcgidZombieScanInterval directive is supposed to be 3 seconds, so I wonder why they keep hanging around. Oh well.

# ps aux | grep fcgi
www-data 17386  0.0  0.0      0     0 ?        Z    19:55   0:05 [_emacs.fcgi] <defunct>
www-data 17394  0.0  0.0      0     0 ?        Z    19:55   0:05 [_emacs.fcgi] <defunct>
www-data 20349  0.0  0.0      0     0 ?        Z    20:16   0:04 [_emacs.fcgi] <defunct>
www-data 20420  0.0  0.0      0     0 ?        Z    20:17   0:03 [_emacs.fcgi] <defunct>
www-data 20438  0.1  0.0      0     0 ?        Z    20:17   0:05 [_emacs.fcgi] <defunct>
www-data 20969  0.0  0.0      0     0 ?        Z    20:22   0:00 [_emacs.fcgi] <defunct>
www-data 20970  0.0  0.0      0     0 ?        Z    20:22   0:00 [_emacs.fcgi] <defunct>
www-data 21394  0.0  0.0      0     0 ?        Z    20:26   0:02 [_emacs.fcgi] <defunct>
www-data 21446  0.1  0.0      0     0 ?        Z    20:27   0:05 [_emacs.fcgi] <defunct>
www-data 23966  0.1  0.0      0     0 ?        Z    20:47   0:04 [_emacs.fcgi] <defunct>
www-data 24036  0.7  0.0      0     0 ?        Z    20:48   0:19 [_emacs.fcgi] <defunct>
www-data 24050  0.2  0.0      0     0 ?        Z    20:49   0:06 [_emacs.fcgi] <defunct>
www-data 24055  0.0  0.0      0     0 ?        Z    20:49   0:00 [_emacs.fcgi] <defunct>
www-data 24056  0.0  0.0      0     0 ?        Z    20:49   0:00 [_emacs.fcgi] <defunct>
www-data 24475  0.1  0.0      0     0 ?        Z    20:50   0:03 [_emacs.fcgi] <defunct>
www-data 24482  0.9  0.0      0     0 ?        Z    20:51   0:23 [_emacs.fcgi] <defunct>
www-data 24483  1.2  0.0      0     0 ?        Z    20:51   0:32 [_emacs.fcgi] <defunct>
www-data 24486  2.7  0.0      0     0 ?        Z    20:51   1:08 [_emacs.fcgi] <defunct>
www-data 24581  0.2  0.0      0     0 ?        Z    20:54   0:05 [_emacs.fcgi] <defunct>
www-data 26669  0.3  0.0      0     0 ?        Z    21:02   0:05 [_emacs.fcgi] <defunct>
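A zombie is a child process that has already exited but whose parent has not yet collected its exit status with wait(). It holds no memory, which is why VSZ and RSS are 0 in the listing above. It only keeps an entry in the process table. The following minimal sketch creates a zombie on purpose and then reaps it. It is Linux-only because it reads `/proc/<pid>/stat`, and the helper names are made up for illustration.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the entry is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except FileNotFoundError:
        return None
    # The command name is in parentheses and may contain spaces,
    # so split after the last ')' and take the next field (the state).
    return data.rsplit(")", 1)[1].split()[0]


def demo_zombie(timeout=5.0):
    """Fork a child that exits at once, observe it as a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately
    # Parent: deliberately don't wait() yet, so the exited child stays a zombie.
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)
    before = state
    # Reaping: waitpid() collects the exit status and the kernel drops the entry.
    _, status = os.waitpid(pid, 0)
    after = proc_state(pid)
    return before, os.WEXITSTATUS(status), after
```

`demo_zombie()` should return `("Z", 0, None)`. The child shows up as a zombie until the parent calls `waitpid()`, and afterwards it is gone.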

I wonder how Nic Ferrier and friends will handle this load once they take over Emacs Wiki. After all, that’s their long-term plan. See this page from January 2013, for example.

☯

Having my programs killed unexpectedly while I’m on the server is disconcerting. I’m going to try this now:

FcgidMaxProcesses 5
FcgidProcessLifeTime 300
FcgidMaxRequestsPerProcess 100
FcgidZombieScanInterval 3

☯

More issues from the Apache error log:

[Fri Dec 19 00:15:48 2014] [warn] [client 68.180.228.96] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi
[Fri Dec 19 00:15:49 2014] [error] [client 162.243.99.58] (13)Permission denied: exec of '/home/alex/emacswiki.org/emacs.pl' failed
[Fri Dec 19 00:15:49 2014] [error] [client 162.243.99.58] Premature end of script headers: emacs.pl
[Fri Dec 19 00:15:53 2014] [warn] [client 91.39.71.18] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi
[Fri Dec 19 00:16:04 2014] [error] [client 65.49.14.147] File does not exist: /home/alex/emacswiki.org/cgi-bin, referer: http://www.emacswiki.org/test/?action=browse;diff=2;id=Comments_on_HomePage
[Fri Dec 19 00:16:05 2014] [error] [client 174.36.228.156] (13)Permission denied: exec of '/home/alex/emacswiki.org/emacs.pl' failed
[Fri Dec 19 00:16:05 2014] [error] [client 174.36.228.156] Premature end of script headers: emacs.pl

😲

“mod_fcgid: can’t apply process slot” → “This warning tells you that the FastCGI process pool is exhausted and it has a global limit of FcgidMaxProcesses and a per-script limit of FcgidMaxProcessesPerClass.” ¹
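To make the two limits in that warning concrete, here is a deliberately simplified Python model of a pool with a global cap (playing the role of FcgidMaxProcesses) and a per-class cap (playing the role of FcgidMaxProcessesPerClass). It is a hypothetical illustration, not mod_fcgid's actual algorithm. A request gets a slot only if both limits still have room. Otherwise it gets the equivalent of "can't apply process slot".

```python
from collections import Counter


class SlotPool:
    """Toy model of a process pool with a global and a per-class limit (hypothetical)."""

    def __init__(self, max_processes, max_per_class):
        self.max_processes = max_processes
        self.max_per_class = max_per_class
        self.in_use = Counter()  # class name -> number of occupied slots

    def try_acquire(self, cls):
        """Take a slot and return True if both limits allow it, else return False."""
        if sum(self.in_use.values()) >= self.max_processes:
            return False  # global pool exhausted
        if self.in_use[cls] >= self.max_per_class:
            return False  # this class (script) has reached its own limit
        self.in_use[cls] += 1
        return True

    def release(self, cls):
        """Give a slot back, e.g. when a process has exited and been cleaned up."""
        if self.in_use[cls] > 0:
            self.in_use[cls] -= 1
```

In this model a slot is freed only by `release()`, so any process that is never cleaned up keeps its slot occupied until the pool runs dry.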

I guess 5 is not a good number?

I found a recommendation elsewhere ², but I need to think about it.

☯

I keep staring at the error.log. Looking at a single IP address:

[Fri Dec 19 08:42:37 2014] [warn] [client XXX] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi
[Fri Dec 19 08:42:43 2014] [warn] [client XXX] mod_fcgid: error reading data, FastCGI server closed connection
[Fri Dec 19 08:42:43 2014] [error] [client XXX] Premature end of script headers: _emacs.fcgi

I have two cores and 2GB of RAM in this OpenVZ container. Experimenting some more:

# This determines how many processes each class (i.e. each script) can run:
FcgidMinProcessesPerClass 0
FcgidMaxProcessesPerClass 2
# The following depends on the RAM we have on the machine:
FcgidMaxProcesses 10
# Lifetime control:
FcgidIdleTimeout 60
FcgidIdleScanInterval 30
FcgidProcessLifeTime 120
# Restart after a while:
FcgidMaxRequestsPerProcess 1000
FcgidZombieScanInterval 3
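The lifetime settings interact with the scan interval. Assume, as the mod_fcgid documentation describes it, that the module looks for idle or expired processes only every FcgidIdleScanInterval seconds. In that case a process is not terminated the instant it passes FcgidIdleTimeout but at the next scan. The small Python model below computes that moment. The function names and the fixed scan phase (`first_scan`) are assumptions for illustration, not mod_fcgid internals.

```python
import math


def termination_time(threshold_reached, scan_interval, first_scan=0):
    """Return the first scan time at or after `threshold_reached` (seconds).

    Simplified model: scans happen at first_scan, first_scan + scan_interval, ...
    and a process found past its limit during a scan is terminated then.
    """
    if threshold_reached <= first_scan:
        return first_scan
    k = math.ceil((threshold_reached - first_scan) / scan_interval)
    return first_scan + k * scan_interval


def idle_kill_window(idle_timeout, scan_interval):
    """Earliest and (roughly) latest termination delay after a process goes idle."""
    return idle_timeout, idle_timeout + scan_interval
```

With the values above (FcgidIdleTimeout 60, FcgidIdleScanInterval 30), this model says an idle process lingers somewhere between 60 and about 90 seconds before it is stopped.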

The result is always the same: after restarting Apache, the site works for a while, but I have the feeling that it eventually stops working.

Let’s look at the processes over time:

alex@kallobombus:~$ ps aux | grep _emacs
www-data 10594  0.0  0.0  50160  1672 ?        S    Dec18   0:05 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
www-data 26812  0.7  0.0      0     0 ?        Z    08:33   0:07 [_emacs.fcgi] <defunct>
www-data 28892  112 77.2 1680376 1619920 ?     R    08:50   0:03 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ ps aux | grep _emacs
www-data 10594  0.0  0.0  50160  1672 ?        S    Dec18   0:05 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
www-data 26812  0.6  0.0      0     0 ?        Z    08:33   0:07 [_emacs.fcgi] <defunct>
www-data 28898  118 85.9 1867532 1801836 ?     R    08:50   0:03 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ ps aux | grep _emacs
www-data 10594  0.0  0.0  50160  1672 ?        S    Dec18   0:05 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
www-data 26812  0.6  0.0      0     0 ?        Z    08:33   0:07 [_emacs.fcgi] <defunct>
www-data 28902 74.3 97.4 2234656 2044004 ?     R    08:50   0:04 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ ps aux | grep _emacs
www-data 10594  0.0  0.0  50160  1672 ?        S    Dec18   0:05 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
www-data 26812  0.6  0.0      0     0 ?        Z    08:33   0:07 [_emacs.fcgi] <defunct>
www-data 28911 84.6 51.7 1142412 1084660 ?     R    08:51   0:02 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ ps aux | grep _emacs
www-data 10594  0.0  0.0  50160  1672 ?        S    Dec18   0:05 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
www-data 26812  0.6  0.0      0     0 ?        Z    08:33   0:07 [_emacs.fcgi] <defunct>
www-data 28916 67.5 97.5 2375680 2046584 ?     R    08:51   0:05 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi

So, what am I seeing?

1. PID 10594 keeps sleeping.

2. PID 26812 keeps being a zombie.

3. the third slot keeps doing all the work
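Reading these snapshots by eye gets tedious. A small Python sketch can parse `ps aux` lines, keep RSS (in KiB, as `ps` prints it) and group PIDs by the first letter of the STAT column. It assumes the standard 11-column `ps aux` layout, and the function names are made up.

```python
from collections import namedtuple

Proc = namedtuple("Proc", "user pid stat rss_kib command")


def parse_ps_aux(text):
    """Parse `ps aux` output (header and prompt lines are skipped) into Proc tuples."""
    procs = []
    for line in text.splitlines():
        fields = line.split(None, 10)  # the COMMAND column may contain spaces
        if len(fields) < 11 or fields[0] == "USER":
            continue
        user, pid, _cpu, _mem, _vsz, rss, _tty, stat_, _start, _time, command = fields
        procs.append(Proc(user, int(pid), stat_, int(rss), command))
    return procs


def summarize(procs):
    """Group PIDs by state: R running, S sleeping, D uninterruptible wait, Z zombie."""
    groups = {}
    for p in procs:
        groups.setdefault(p.stat[0], []).append(p.pid)
    return groups
```

Fed with the first snapshot, `summarize()` gives one sleeping, one zombie and one running process, the same three observations as the list above. Live input could come from `subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout`.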

I’m going to `kill` them.

alex@kallobombus:~$ sudo kill 10594
alex@kallobombus:~$ ps -U www-data u | grep _emacs
www-data 26812  0.5  0.0      0     0 ?        Z    08:33   0:07 [_emacs.fcgi] <defunct>
www-data 29457 73.5 95.8 2251504 2010956 ?     R    08:56   0:05 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ sudo kill 26812
alex@kallobombus:~$ ps -U www-data u | grep _emacs
www-data 26812  0.5  0.0      0     0 ?        Z    08:33   0:07 [_emacs.fcgi] <defunct>
www-data 29468 65.1 97.6 2375688 2048860 ?     R    08:56   0:05 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ sudo kill -9 26812
alex@kallobombus:~$ ps -U www-data u | grep _emacs
www-data 26812  0.5  0.0      0     0 ?        Z    08:33   0:07 [_emacs.fcgi] <defunct>
www-data 29475 71.7 46.2 1010184 970136 ?      R    08:56   0:02 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ ps -U www-data u | grep _emacs
www-data 26812  0.5  0.0      0     0 ?        Z    08:33   0:07 [_emacs.fcgi] <defunct>
www-data 29475 64.8 97.5 2248324 2046564 ?     D    08:56   0:05 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ ps -U www-data u | grep _emacs
www-data 26812  0.5  0.0      0     0 ?        Z    08:33   0:07 [_emacs.fcgi] <defunct>
www-data 29489 66.8 97.0 2251764 2034580 ?     D    08:57   0:06 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ ps -U www-data u | grep _emacs
www-data 26812  0.5  0.0      0     0 ?        Z    08:33   0:07 [_emacs.fcgi] <defunct>
www-data 29502 37.6  0.0      0     0 ?        Z    08:57   0:01 [_emacs.fcgi] <defunct>
alex@kallobombus:~$ ps -U www-data u | grep _emacs
www-data 26812  0.5  0.0      0     0 ?        Z    08:33   0:07 [_emacs.fcgi] <defunct>
www-data 29507 55.5  0.8  47356 18444 ?        R    08:57   0:01 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi

What am I seeing?

1. the zombie stays and isn’t cleaned up

2. no new fcgi process was started

I’m going to determine the zombie’s parent and kill it, too.
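The same lookup can also be done without `ps`. On Linux, every process has a `/proc/<pid>/stat` file whose third field is the state and whose fourth field is the parent PID. This minimal, Linux-only sketch (the function name is hypothetical) lists every zombie together with its parent, which is the process responsible for reaping it.

```python
import os


def zombies_with_parents():
    """Return {zombie_pid: parent_pid} by scanning /proc (Linux only)."""
    result = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/stat") as f:
                data = f.read()
        except OSError:
            continue  # the process vanished while we were scanning
        # The command name is in parentheses and may contain spaces,
        # so split after the last ')'; the next two fields are state and ppid.
        state, ppid = data.rsplit(")", 1)[1].split()[:2]
        if state == "Z":
            result[int(entry)] = int(ppid)
    return result
```

For the situation in the transcript, this would be expected to report the zombie 26812 with parent 26747.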

alex@kallobombus:~$ ps -U www-data ao pid,ppid,comm | head
  PID  PPID COMMAND
24999 24998 bash
25019 25018 bash
26746 12667 /usr/sbin/apach
26747 12667 /usr/sbin/apach
26812 26747 _emacs.fcgi <defunct>
26814 12667 /usr/sbin/apach
28394 25019 sudo
28395 28394 emacs
28486 12667 /usr/sbin/apach
alex@kallobombus:~$ sudo kill 26747
alex@kallobombus:~$ ps -U www-data u
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
www-data 31118  0.0  0.2  99856  4480 ?        S    09:03   0:00 /usr/sbin/apache2 -k start
www-data 31119  0.0  0.2 101008  4460 ?        S    09:03   0:00 /usr/sbin/apache2 -k start
www-data 31120  0.2  0.4 382116  8700 ?        Sl   09:03   0:00 /usr/sbin/apache2 -k start
www-data 31121  0.6  0.5 581604 11648 ?        Sl   09:03   0:00 /usr/sbin/apache2 -k start
www-data 31180 17.8  0.9  48168 19256 ?        S    09:03   0:02 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
www-data 31184 39.2 84.1 1833296 1765244 ?     R    09:03   0:03 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ ps -U www-data u
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
www-data 31118  0.0  0.0  99856  1844 ?        S    09:03   0:00 /usr/sbin/apache2 -k start
www-data 31119  0.0  0.0 101008  1988 ?        S    09:03   0:00 /usr/sbin/apache2 -k start
www-data 31120  0.2  0.1 382116  4120 ?        Sl   09:03   0:00 /usr/sbin/apache2 -k start
www-data 31121  0.4  0.1 581604  3840 ?        Sl   09:03   0:00 /usr/sbin/apache2 -k start
www-data 31180 21.3 75.1 1647300 1575160 ?     R    09:03   0:04 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
www-data 31184 31.6  0.0      0     0 ?        Z    09:03   0:06 [_emacs.fcgi] <defunct>
alex@kallobombus:~$ ps -U www-data u
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
www-data 31118  0.0  0.0  99856   392 ?        S    09:03   0:00 /usr/sbin/apache2 -k start
www-data 31119  0.0  0.0 101008   624 ?        S    09:03   0:00 /usr/sbin/apache2 -k start
www-data 31120  0.2  0.0 382116  1736 ?        Sl   09:03   0:00 /usr/sbin/apache2 -k start
www-data 31121  0.4  0.1 581628  2156 ?        Sl   09:03   0:00 /usr/sbin/apache2 -k start
www-data 31180 24.0 96.9 2303540 2033992 ?     R    09:03   0:06 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
www-data 31197 32.0  0.8  44444 17060 ?        R    09:03   0:00 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi

What am I seeing?

1. the two fcgi processes are running

2. the two fcgi processes are sharing %CPU

Back to the error log. I still see several of these per minute, but the error frequency seems to be much lower:

[Fri Dec 19 09:09:09 2014] [warn] [client XXX] mod_fcgid: error reading data, FastCGI server closed connection
[Fri Dec 19 09:09:09 2014] [error] [client XXX] Premature end of script headers: _emacs.fcgi

Let’s look at the output of `top`:

top - 09:11:29 up 56 days,  1:49,  1 user,  load average: 2.21, 1.55, 1.18
Tasks:  26 total,   3 running,  23 sleeping,   0 stopped,   0 zombie
%Cpu(s): 24.5 us, 10.8 sy,  0.0 ni, 64.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.4 st
KiB Mem:   2097152 total,  2093688 used,     3464 free,        0 buffers
KiB Swap:   524288 total,   421636 used,   102652 free,      292 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
32061 www-data  20   0  128m  97m 1672 R  30.7  4.8   0:00.93 _emacs.fcgi
32053 www-data  20   0 2323m 1.9g 1672 R  26.4 93.3   0:04.98 _emacs.fcgi
31593 www-data  20   0  374m 2348 1508 S   1.3  0.1   0:00.66 /usr/sbin/apach
31121 www-data  20   0  569m 1896 1488 S   1.0  0.1   0:01.45 /usr/sbin/apach
31120 www-data  20   0  440m 2864 1640 S   0.7  0.1   0:01.24 /usr/sbin/apach
13266 root      20   0 51560 1892 1724 S   0.3  0.1   0:02.76 munin-node

What am I seeing?

1. I have two cores and the load is around 2; I guess that’s good

2. I have very little memory free – an upgrade might be worth it

3. I still have two fcgi processes running, sharing %CPU, so that’s good
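The first two observations can be computed directly from the `top` header. This Python sketch turns the header into load per core and memory figures. Its regular expressions are written against the exact header format shown above, and the function name is hypothetical.

```python
import re


def top_summary(header, cores):
    """Extract load per core and memory figures (KiB) from a `top` header."""
    loads = re.search(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", header)
    mem = re.search(r"KiB Mem:\s*(\d+) total,\s*(\d+) used,\s*(\d+) free", header)
    swap = re.search(r"KiB Swap:\s*(\d+) total,\s*(\d+) used", header)
    if not (loads and mem and swap):
        raise ValueError("unexpected top header format")
    total, used, free = (int(x) for x in mem.groups())
    swap_total, swap_used = (int(x) for x in swap.groups())
    return {
        "load_per_core": tuple(float(x) / cores for x in loads.groups()),
        "mem_free_kib": free,
        "mem_used_pct": 100.0 * used / total,
        "swap_used_pct": 100.0 * swap_used / swap_total,
    }
```

For the header above with two cores, this gives a 1-minute load of about 1.1 per core, 3464 KiB of RAM free, and roughly 80% of swap in use.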

The site is still incredibly slow.

Another tweak of my `.htaccess` file had produced the error “Negotiation: discovered file(s) matching request: `/home/alex/emacswiki.org/emacs` (None could be negotiated).” I got rid of it by disabling MultiViews. That’s the part that will run “emacs.pl” when requesting “emacs”, I guess.

I also noticed that after a `sudo service apache graceful` I’m getting a lot of those “mod_fcgid: can’t apply process slot” errors again. At first, a `sudo service apache restart` seemed to fix it. But no. After a while:

[Fri Dec 19 09:36:22 2014] [warn] [client XXX] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi
[Fri Dec 19 09:36:23 2014] [warn] [client XXX] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi
[Fri Dec 19 09:36:24 2014] [warn] [client XXX] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi
[Fri Dec 19 09:36:25 2014] [warn] [client XXX] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi
[Fri Dec 19 09:36:27 2014] [warn] [client XXX] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi
[Fri Dec 19 09:36:30 2014] [warn] [client XXX] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi
[Fri Dec 19 09:36:32 2014] [warn] [client XXX] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi

What are the fcgi processes doing?

alex@kallobombus:~$ ps -U www-data u
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
www-data  1819  0.0  0.0  99856   388 ?        S    09:31   0:00 /usr/sbin/apache2 -k start
www-data  1820  0.1  0.0 101008   572 ?        S    09:31   0:00 /usr/sbin/apache2 -k start
www-data  1821  0.1  0.3 385656  7504 ?        Sl   09:31   0:00 /usr/sbin/apache2 -k start
www-data  1880  0.1  0.3 384996  8060 ?        Sl   09:31   0:00 /usr/sbin/apache2 -k start
www-data  1965  7.0  0.0      0     0 ?        Z    09:33   0:20 [_emacs.fcgi] <defunct>
www-data  1968  1.9  0.0      0     0 ?        Z    09:33   0:05 [_emacs.fcgi] <defunct>
www-data  1973  0.1  0.3 383852  8024 ?        Sl   09:33   0:00 /usr/sbin/apache2 -k start
www-data  2013  0.1  0.3 383692  8152 ?        Sl   09:34   0:00 /usr/sbin/apache2 -k start

Apparently, only a complete *sudo apachectl stop* will do what is required. The apache2 parent process isn’t reaping the zombies, and that’s why they hang around. Only when the parent itself is stopped are the zombies re-parented to init, which reaps them, and only then do new fcgi processes get started. How annoying!
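For comparison, a well-behaved parent does roughly the following. Whenever children may have exited (typically on SIGCHLD), it calls waitpid() with WNOHANG in a loop until no exited child remains. The Python sketch below shows only that loop. It is a generic illustration of reaping, not how Apache or mod_fcgid implement it, and the function name is hypothetical.

```python
import os


def reap_children():
    """Reap every already-exited child without blocking.

    Returns a list of (pid, exit_code) pairs. Children that are still
    running are left alone. exit_code is None if a child died from a signal.
    """
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None
        reaped.append((pid, code))
    return reaped
```

The loop matters because several children can exit before the parent gets around to checking. Reaping only one per notification would leave the rest as zombies.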

The reason for the zombie epidemic remains unknown. Restart:

alex@kallobombus:~$ ps -U www-data ao pid,ppid,comm
  PID  PPID COMMAND
 2941  2940 bash
 3150  2491 /usr/sbin/apach
 3151  2491 /usr/sbin/apach
 3152  2491 /usr/sbin/apach
 3153  2491 /usr/sbin/apach
 3237  3151 _emacs.fcgi
 3264  3151 _emacs.fcgi
 3268  3152 cw-en.pl
 3269  2941 ps

Wait for a bit:

alex@kallobombus:~$ ps -U www-data ao pid,ppid,comm
  PID  PPID COMMAND
 3150  2491 /usr/sbin/apach
 3151  2491 /usr/sbin/apach
 3153  2491 /usr/sbin/apach
 3660  3151 _emacs.fcgi <defunct>
 3675  3151 _emacs.fcgi <defunct>
 3679  2491 /usr/sbin/apach
 3714  2491 /usr/sbin/apach
 3759  3758 bash
 3798  3153 cw-en.pl <defunct>
 3799  3759 ps

Why?

Well... I’m running `sudo service apache2 restart` and waiting for it to happen again. If you have any ideas, let me know!

alex@kallobombus:~/emacswiki/git$ ps -U www-data ao pid,ppid,comm
  PID  PPID COMMAND
 3759  3758 bash
 7229  7225 /usr/sbin/apach
 7230  7225 /usr/sbin/apach
 7231  7225 /usr/sbin/apach
 7235  7225 /usr/sbin/apach
 7289  7230 _emacs.fcgi
 7290  7230 _emacs.fcgi
 7291  3759 ps

Soon enough:

alex@kallobombus:~/emacswiki/git$ ps -U www-data ao pid,ppid,comm
  PID  PPID COMMAND
 3759  3758 bash
 7229  7225 /usr/sbin/apach
 7230  7225 /usr/sbin/apach
 7235  7225 /usr/sbin/apach
 7626  7230 _emacs.fcgi <defunct>
 7633  7225 /usr/sbin/apach
 7666  7230 _emacs.fcgi <defunct>
 7691  7225 /usr/sbin/apach
 7734  3759 ps

Gaaah! Also note this thread: Unreaped Zombie Children Of Mod_fcgid. It sounds like my problem. I’m running Apache 2.2.22-13+deb7u3 and mod_fcgid 2.3.6-1.2+deb7u1.

Another theory: I restarted Apache, waited for a few seconds, got booted off my server (out of memory?), reconnected, and found the fcgi processes defunct. Perhaps this is related?

I just wonder whether upgrading from 2GB of RAM to 4GB of RAM will “magically” fix this.
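Here is a rough back-of-the-envelope check using the RSS values from the `ps` snapshots above. At its peak, a single `_emacs.fcgi` process reached 2,048,860 KiB, while one freshly started process used 18,444 KiB. The Python sketch below shows how many processes of each size fit into 2 GiB versus 4 GiB. The helper is hypothetical, and it ignores Apache, the OS and everything else that needs memory unless you pass `reserved_kib`.

```python
def processes_that_fit(ram_kib, per_process_kib, reserved_kib=0):
    """How many processes of a given resident size fit into RAM (integer division)."""
    available = max(ram_kib - reserved_kib, 0)
    return available // per_process_kib


GIB_IN_KIB = 1024 * 1024
PEAK_RSS_KIB = 2048860   # largest RSS seen in the ps snapshots
FRESH_RSS_KIB = 18444    # RSS of one process that had just started

for ram_gib in (2, 4):
    ram = ram_gib * GIB_IN_KIB
    print(ram_gib, "GiB:", processes_that_fit(ram, PEAK_RSS_KIB), "at peak,",
          processes_that_fit(ram, FRESH_RSS_KIB), "when fresh")
```

Under these assumptions, 4 GiB would hold two peak-sized processes instead of one, while fresh processes are small enough that RAM is not the limit for them.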

Continued 2014-12-19 Emacs Wiki Migration.

#Emacs #Git #Oddmuse #Administration #devops