I’m having problems with the git integration: on the new server, git commands will be run by two different users, www-data and alex.
On #git, I was told to make sure all the permissions for the files in .git are set up correctly.
sudo find .git -type f -exec chmod g+rw {} \;
git config core.sharedRepository true
And it worked! Thanks, Seveas.
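A slightly more thorough variant of the permission fix, as a minimal sketch. It also makes the directories group-writable and setgid, so that files created later keep the shared group. The helper name `share_git_dir` and the directory step are assumptions for illustration, not part of the advice from #git, and the sketch presumes that www-data and alex are both members of the group owning the repository.

```shell
#!/bin/sh
# Sketch: make an existing .git directory usable by every member of its group.
# share_git_dir is a hypothetical helper name; run it with sufficient privileges.
share_git_dir() {
    gitdir=${1:-.git}
    # Files: the group may read and write (the command quoted in the text).
    find "$gitdir" -type f -exec chmod g+rw {} +
    # Directories: the group may enter and write; setgid keeps the group on new files.
    # This step is an assumption; it was not part of the original advice.
    find "$gitdir" -type d -exec chmod g+rwxs {} +
}
```

Called as `share_git_dir /path/to/repo/.git`, followed by the `git config core.sharedRepository true` from above so that git itself keeps new files group-writable.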
☯
Wow, what an exercise in humility this has been! 😨 I had switched the DNS entry and was awaiting the slow increase in traffic, wondering how this system would handle it. Thomas Waldmann of Waldmann had hosted the Emacs Wiki since the very early days and in recent years he had kept asking me to switch from CGI to FastCGI. It just never seemed that urgent. I was about to learn *how damn urgent* it was!
Soon, load climbed to 40 and 50. It was getting hard to use Emacs. Load kept on climbing. When it reached 198 I gave up fiddling with the rewrite rules and just killed Apache. There was no point.
When the system had recovered, I started working on FCGI support. I was starting with some old info on the Oddmuse wiki and learning as I went. The result is on the Using mod_fastcgi page. Rewrite rules weren’t working correctly. The language stuff was interfering. Holy cow, and I just kept on hacking.
The status right now:
1. The *other* sites are up (such as this one) and that’s already a win. 🙂
2. I can read the pages on Emacs Wiki. It is slow but it works. 👍
3. Sometimes I get a 503 error. 👎
4. When I’m on the server, running Emacs and stuff, these apps will sometimes get killed for no apparent reason. Out of memory errors? 😢
5. I realized that syslog had been uninstalled and the replacement rsyslog wasn’t running—that’s why I still don’t know whether this is a memory problem. 😣
6. I upgraded the hosting package but I’m not sure what I need to increase to solve that problem. Perhaps it can be resolved using some of the many fcgid parameters? 😪
I’m currently using:
FcgidMaxProcesses 20
FcgidProcessLifeTime 300
FcgidMaxRequestsPerProcess 100
`top` lists a lot of zombies. The default for the FcgidZombieScanInterval directive is supposed to be 3s. I wonder why they keep hanging around? Oh well.
# ps aux | grep fcgi
www-data 17386 0.0 0.0 0 0 ? Z 19:55 0:05 [_emacs.fcgi] <defunct>
www-data 17394 0.0 0.0 0 0 ? Z 19:55 0:05 [_emacs.fcgi] <defunct>
www-data 20349 0.0 0.0 0 0 ? Z 20:16 0:04 [_emacs.fcgi] <defunct>
www-data 20420 0.0 0.0 0 0 ? Z 20:17 0:03 [_emacs.fcgi] <defunct>
www-data 20438 0.1 0.0 0 0 ? Z 20:17 0:05 [_emacs.fcgi] <defunct>
www-data 20969 0.0 0.0 0 0 ? Z 20:22 0:00 [_emacs.fcgi] <defunct>
www-data 20970 0.0 0.0 0 0 ? Z 20:22 0:00 [_emacs.fcgi] <defunct>
www-data 21394 0.0 0.0 0 0 ? Z 20:26 0:02 [_emacs.fcgi] <defunct>
www-data 21446 0.1 0.0 0 0 ? Z 20:27 0:05 [_emacs.fcgi] <defunct>
www-data 23966 0.1 0.0 0 0 ? Z 20:47 0:04 [_emacs.fcgi] <defunct>
www-data 24036 0.7 0.0 0 0 ? Z 20:48 0:19 [_emacs.fcgi] <defunct>
www-data 24050 0.2 0.0 0 0 ? Z 20:49 0:06 [_emacs.fcgi] <defunct>
www-data 24055 0.0 0.0 0 0 ? Z 20:49 0:00 [_emacs.fcgi] <defunct>
www-data 24056 0.0 0.0 0 0 ? Z 20:49 0:00 [_emacs.fcgi] <defunct>
www-data 24475 0.1 0.0 0 0 ? Z 20:50 0:03 [_emacs.fcgi] <defunct>
www-data 24482 0.9 0.0 0 0 ? Z 20:51 0:23 [_emacs.fcgi] <defunct>
www-data 24483 1.2 0.0 0 0 ? Z 20:51 0:32 [_emacs.fcgi] <defunct>
www-data 24486 2.7 0.0 0 0 ? Z 20:51 1:08 [_emacs.fcgi] <defunct>
www-data 24581 0.2 0.0 0 0 ? Z 20:54 0:05 [_emacs.fcgi] <defunct>
www-data 26669 0.3 0.0 0 0 ? Z 21:02 0:05 [_emacs.fcgi] <defunct>
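To track the zombie count over time instead of reading the listing by eye, something like the following sketch could help. It counts lines in state Z whose command matches a pattern. The helper name `count_zombies` is hypothetical, and the input is assumed to be "STAT COMMAND" pairs such as `ps -eo stat=,comm=` prints.

```shell
#!/bin/sh
# Sketch: count defunct (zombie) processes whose command matches a pattern.
# Reads "STAT COMMAND" lines on stdin, e.g. from: ps -eo stat=,comm=
# count_zombies is a hypothetical helper name.
count_zombies() {
    pattern=${1:-fcgi}
    # A state starting with Z marks a zombie; n + 0 prints 0 when nothing matched.
    awk -v pat="$pattern" '$1 ~ /^Z/ && $0 ~ pat { n++ } END { print n + 0 }'
}
```

For example: `ps -eo stat=,comm= | count_zombies _emacs.fcgi`.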
I wonder how Nic Ferrier and friends will handle this load once they take over Emacs Wiki. After all, that’s their long term plan. See this page from January 2013, for example.
☯
This unexpected quitting while I’m on the server is disconcerting. I’m going to try this, now:
FcgidMaxProcesses 5
FcgidProcessLifeTime 300
FcgidMaxRequestsPerProcess 100
FcgidZombieScanInterval 3
☯
More issues from the Apache error log:
[Fri Dec 19 00:15:48 2014] [warn] [client 68.180.228.96] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi
[Fri Dec 19 00:15:49 2014] [error] [client 162.243.99.58] (13)Permission denied: exec of '/home/alex/emacswiki.org/emacs.pl' failed
[Fri Dec 19 00:15:49 2014] [error] [client 162.243.99.58] Premature end of script headers: emacs.pl
[Fri Dec 19 00:15:53 2014] [warn] [client 91.39.71.18] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi
[Fri Dec 19 00:16:04 2014] [error] [client 65.49.14.147] File does not exist: /home/alex/emacswiki.org/cgi-bin, referer: http://www.emacswiki.org/test/?action=browse;diff=2;id=Comments_on_HomePage
[Fri Dec 19 00:16:05 2014] [error] [client 174.36.228.156] (13)Permission denied: exec of '/home/alex/emacswiki.org/emacs.pl' failed
[Fri Dec 19 00:16:05 2014] [error] [client 174.36.228.156] Premature end of script headers: emacs.pl
😲
“mod_fcgid: can’t apply process slot” → “This warning tells you that the FastCGI process pool is exhausted and it has a global limit of FcgidMaxProcesses and a per-script limit of FcgidMaxProcessesPerClass.” ¹
I guess 5 is not a good number?
Recommendation found elsewhere ², but I need to think about this.
☯
I keep staring at the error.log. Looking at a single IP address:
[Fri Dec 19 08:42:37 2014] [warn] [client XXX] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi
[Fri Dec 19 08:42:43 2014] [warn] [client XXX] mod_fcgid: error reading data, FastCGI server closed connection
[Fri Dec 19 08:42:43 2014] [error] [client XXX] Premature end of script headers: _emacs.fcgi
I have two cores and 2GB RAM in this openvz. Experimenting some more.
# This determines how many processes each user can run:
FcgidMinProcessesPerClass 0
FcgidMaxProcessesPerClass 2
# The following depends on the RAM we have on the machine:
FcgidMaxProcesses 10
# Lifetime control:
FcgidIdleTimeout 60
FcgidIdleScanInterval 30
FcgidProcessLifeTime 120
# Restart after a while
FcgidMaxRequestsPerProcess 1000
FcgidZombieScanInterval 3
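Since FcgidMaxProcesses depends on the RAM available, a back-of-the-envelope bound can be computed as (total memory minus a reserve) divided by the peak size of one FastCGI process. The sketch below does just that. The helper name `max_fcgi_procs` and the example numbers are hypothetical, and this is a rule of thumb rather than anything mod_fcgid prescribes.

```shell
#!/bin/sh
# Sketch: estimate an upper bound for FcgidMaxProcesses from available memory.
# All three inputs are in MB; max_fcgi_procs is a hypothetical helper name.
max_fcgi_procs() {
    total_mb=$1      # RAM in the container
    reserved_mb=$2   # kept free for Apache, sshd, Emacs and so on
    per_proc_mb=$3   # peak resident size of one FastCGI process
    usable=$((total_mb - reserved_mb))
    if [ "$usable" -le 0 ] || [ "$per_proc_mb" -le 0 ]; then
        echo 0
        return
    fi
    # Integer division rounds down, which is the safe direction here.
    echo $((usable / per_proc_mb))
}
```

With made-up figures, `max_fcgi_procs 2048 512 150` prints 10. If a single process can grow beyond the usable memory, the result is 0, which suggests that no value of the directive will fit until the process itself gets smaller.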
The result is always the same: After restarting Apache, the site works for a while. I still have the feeling that eventually it stops working.
Let’s look at the processes over time:
alex@kallobombus:~$ ps aux | grep _emacs
www-data 10594 0.0 0.0 50160 1672 ? S Dec18 0:05 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
www-data 26812 0.7 0.0 0 0 ? Z 08:33 0:07 [_emacs.fcgi] <defunct>
www-data 28892 112 77.2 1680376 1619920 ? R 08:50 0:03 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ ps aux | grep _emacs
www-data 10594 0.0 0.0 50160 1672 ? S Dec18 0:05 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
www-data 26812 0.6 0.0 0 0 ? Z 08:33 0:07 [_emacs.fcgi] <defunct>
www-data 28898 118 85.9 1867532 1801836 ? R 08:50 0:03 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ ps aux | grep _emacs
www-data 10594 0.0 0.0 50160 1672 ? S Dec18 0:05 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
www-data 26812 0.6 0.0 0 0 ? Z 08:33 0:07 [_emacs.fcgi] <defunct>
www-data 28902 74.3 97.4 2234656 2044004 ? R 08:50 0:04 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ ps aux | grep _emacs
www-data 10594 0.0 0.0 50160 1672 ? S Dec18 0:05 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
www-data 26812 0.6 0.0 0 0 ? Z 08:33 0:07 [_emacs.fcgi] <defunct>
www-data 28911 84.6 51.7 1142412 1084660 ? R 08:51 0:02 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ ps aux | grep _emacs
www-data 10594 0.0 0.0 50160 1672 ? S Dec18 0:05 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
www-data 26812 0.6 0.0 0 0 ? Z 08:33 0:07 [_emacs.fcgi] <defunct>
www-data 28916 67.5 97.5 2375680 2046584 ? R 08:51 0:05 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
So, what am I seeing?
1. PID 10594 keeps sleeping.
2. PID 26812 keeps being a zombie.
3. the third slot keeps doing all the work
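The memory columns in these snapshots are easy to miss among all the numbers. A small filter, sketched here under the assumption that the input has the column layout of `ps aux` (PID in column 2, RSS in KiB in column 6, STAT in column 8), prints PID, state and resident size in MB, largest first. The name `rss_by_pid` is hypothetical.

```shell
#!/bin/sh
# Sketch: list matching processes from `ps aux` output, largest resident set first.
# Columns used: $2 = PID, $6 = RSS in KiB, $8 = STAT. rss_by_pid is a hypothetical name.
rss_by_pid() {
    pattern=${1:-_emacs}
    # The pattern travels via the environment so that it does not show up in
    # awk's own command line, which `ps aux` would otherwise list and match.
    PAT=$pattern awk '$0 ~ ENVIRON["PAT"] && $2 ~ /^[0-9]+$/ {
        printf "%d %s %d\n", $2, $8, $6 / 1024
    }' | sort -k3,3nr
}
```

Usage: `ps aux | rss_by_pid _emacs`. Applied to the first snapshot above, it would list PID 28892 first with roughly 1.5 GB resident.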
I’m going to `kill` them.
alex@kallobombus:~$ sudo kill 10594
alex@kallobombus:~$ ps -U www-data u | grep _emacs
www-data 26812 0.5 0.0 0 0 ? Z 08:33 0:07 [_emacs.fcgi] <defunct>
www-data 29457 73.5 95.8 2251504 2010956 ? R 08:56 0:05 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ sudo kill 26812
alex@kallobombus:~$ ps -U www-data u | grep _emacs
www-data 26812 0.5 0.0 0 0 ? Z 08:33 0:07 [_emacs.fcgi] <defunct>
www-data 29468 65.1 97.6 2375688 2048860 ? R 08:56 0:05 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ sudo kill -9 26812
alex@kallobombus:~$ ps -U www-data u | grep _emacs
www-data 26812 0.5 0.0 0 0 ? Z 08:33 0:07 [_emacs.fcgi] <defunct>
www-data 29475 71.7 46.2 1010184 970136 ? R 08:56 0:02 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ ps -U www-data u | grep _emacs
www-data 26812 0.5 0.0 0 0 ? Z 08:33 0:07 [_emacs.fcgi] <defunct>
www-data 29475 64.8 97.5 2248324 2046564 ? D 08:56 0:05 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ ps -U www-data u | grep _emacs
www-data 26812 0.5 0.0 0 0 ? Z 08:33 0:07 [_emacs.fcgi] <defunct>
www-data 29489 66.8 97.0 2251764 2034580 ? D 08:57 0:06 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ ps -U www-data u | grep _emacs
www-data 26812 0.5 0.0 0 0 ? Z 08:33 0:07 [_emacs.fcgi] <defunct>
www-data 29502 37.6 0.0 0 0 ? Z 08:57 0:01 [_emacs.fcgi] <defunct>
alex@kallobombus:~$ ps -U www-data u | grep _emacs
www-data 26812 0.5 0.0 0 0 ? Z 08:33 0:07 [_emacs.fcgi] <defunct>
www-data 29507 55.5 0.8 47356 18444 ? R 08:57 0:01 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
What am I seeing?
1. the zombie stays and isn’t cleaned up
2. no new fcgi process was started
I’m going to determine the zombie’s parent and kill it, too.
alex@kallobombus:~$ ps -U www-data ao pid,ppid,comm | head
PID PPID COMMAND
24999 24998 bash
25019 25018 bash
26746 12667 /usr/sbin/apach
26747 12667 /usr/sbin/apach
26812 26747 _emacs.fcgi <defunct>
26814 12667 /usr/sbin/apach
28394 25019 sudo
28395 28394 emacs
28486 12667 /usr/sbin/apach
alex@kallobombus:~$ sudo kill 26747
alex@kallobombus:~$ ps -U www-data u
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
www-data 31118 0.0 0.2 99856 4480 ? S 09:03 0:00 /usr/sbin/apache2 -k start
www-data 31119 0.0 0.2 101008 4460 ? S 09:03 0:00 /usr/sbin/apache2 -k start
www-data 31120 0.2 0.4 382116 8700 ? Sl 09:03 0:00 /usr/sbin/apache2 -k start
www-data 31121 0.6 0.5 581604 11648 ? Sl 09:03 0:00 /usr/sbin/apache2 -k start
www-data 31180 17.8 0.9 48168 19256 ? S 09:03 0:02 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
www-data 31184 39.2 84.1 1833296 1765244 ? R 09:03 0:03 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
alex@kallobombus:~$ ps -U www-data u
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
www-data 31118 0.0 0.0 99856 1844 ? S 09:03 0:00 /usr/sbin/apache2 -k start
www-data 31119 0.0 0.0 101008 1988 ? S 09:03 0:00 /usr/sbin/apache2 -k start
www-data 31120 0.2 0.1 382116 4120 ? Sl 09:03 0:00 /usr/sbin/apache2 -k start
www-data 31121 0.4 0.1 581604 3840 ? Sl 09:03 0:00 /usr/sbin/apache2 -k start
www-data 31180 21.3 75.1 1647300 1575160 ? R 09:03 0:04 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
www-data 31184 31.6 0.0 0 0 ? Z 09:03 0:06 [_emacs.fcgi] <defunct>
alex@kallobombus:~$ ps -U www-data u
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
www-data 31118 0.0 0.0 99856 392 ? S 09:03 0:00 /usr/sbin/apache2 -k start
www-data 31119 0.0 0.0 101008 624 ? S 09:03 0:00 /usr/sbin/apache2 -k start
www-data 31120 0.2 0.0 382116 1736 ? Sl 09:03 0:00 /usr/sbin/apache2 -k start
www-data 31121 0.4 0.1 581628 2156 ? Sl 09:03 0:00 /usr/sbin/apache2 -k start
www-data 31180 24.0 96.9 2303540 2033992 ? R 09:03 0:06 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
www-data 31197 32.0 0.8 44444 17060 ? R 09:03 0:00 /usr/bin/perl /home/alex/emacswiki.org/_emacs.fcgi
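Looking up the parent by eye works for one zombie; for several, a sketch like this one groups them by parent PID. A zombie has already exited, so signalling it does nothing; only its parent can reap it, or init once the parent is gone. The helper name `zombie_parents` is hypothetical and the input is assumed to be "PID PPID STAT COMMAND" lines.

```shell
#!/bin/sh
# Sketch: print the parent PID of every zombie, with the number of zombies it holds.
# Expects "PID PPID STAT COMMAND" lines on stdin, e.g. from: ps -eo pid=,ppid=,stat=,comm=
# zombie_parents is a hypothetical helper name.
zombie_parents() {
    awk '$3 ~ /^Z/ { count[$2]++ }
         END { for (ppid in count) print ppid, count[ppid] }' | sort -n
}
```

Usage: `ps -eo pid=,ppid=,stat=,comm= | zombie_parents`. Each output line is a parent PID followed by how many defunct children it has not yet reaped.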
What am I seeing?
1. the two fcgi processes are running
2. the two fcgi processes are sharing %CPU
Back to the error log. I still see several of these per minute. But the error frequency seems to be much lower.
[Fri Dec 19 09:09:09 2014] [warn] [client XXX] mod_fcgid: error reading data, FastCGI server closed connection
[Fri Dec 19 09:09:09 2014] [error] [client XXX] Premature end of script headers: _emacs.fcgi
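"Several per minute" and "much lower" can be put into numbers. The following sketch counts matching error-log lines per minute, assuming the timestamp layout shown in these log excerpts. The helper name `errors_per_minute` and its default pattern are hypothetical.

```shell
#!/bin/sh
# Sketch: count matching Apache error-log lines per minute.
# Assumes the "[Fri Dec 19 09:09:09 2014] [level] ..." timestamp layout shown above.
# errors_per_minute is a hypothetical helper name.
errors_per_minute() {
    pattern=${1:-mod_fcgid}
    PATTERN=$pattern awk '$0 ~ ENVIRON["PATTERN"] && $4 ~ /^[0-9][0-9]:[0-9][0-9]:/ {
        minute = $2 " " $3 " " substr($4, 1, 5)       # e.g. "Dec 19 09:09"
        if (!(minute in count)) order[++n] = minute   # remember first-seen order
        count[minute]++
    }
    END { for (i = 1; i <= n; i++) print order[i], count[order[i]] }'
}
```

Usage: `errors_per_minute "process slot" < error.log`, where the path to the log depends on the Apache setup. Comparing the counts before and after a configuration change gives a firmer basis than a glance at the tail of the log.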
Let’s look at the output of `top`:
top - 09:11:29 up 56 days, 1:49, 1 user, load average: 2.21, 1.55, 1.18
Tasks: 26 total, 3 running, 23 sleeping, 0 stopped, 0 zombie
%Cpu(s): 24.5 us, 10.8 sy, 0.0 ni, 64.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.4 st
KiB Mem: 2097152 total, 2093688 used, 3464 free, 0 buffers
KiB Swap: 524288 total, 421636 used, 102652 free, 292 cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
32061 www-data 20 0 128m 97m 1672 R 30.7 4.8 0:00.93 _emacs.fcgi
32053 www-data 20 0 2323m 1.9g 1672 R 26.4 93.3 0:04.98 _emacs.fcgi
31593 www-data 20 0 374m 2348 1508 S 1.3 0.1 0:00.66 /usr/sbin/apach
31121 www-data 20 0 569m 1896 1488 S 1.0 0.1 0:01.45 /usr/sbin/apach
31120 www-data 20 0 440m 2864 1640 S 0.7 0.1 0:01.24 /usr/sbin/apach
13266 root 20 0 51560 1892 1724 S 0.3 0.1 0:02.76 munin-node
What am I seeing?
1. I have two cores and load is around 2, I guess that’s good
2. I have very little memory free – an upgrade might be worth it
3. I still have two fcgi processes running, sharing %CPU, so that’s good
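"Very little memory free" can be expressed as a percentage. This sketch reads input in the format of `/proc/meminfo` rather than parsing `top`, whose header layout varies between versions; that substitution is an assumption made here for robustness. The helper name `mem_used_pct` is hypothetical.

```shell
#!/bin/sh
# Sketch: percentage of RAM and swap in use, computed from /proc/meminfo-style input.
# "Used" here is simply total minus free, so page cache counts as used.
# mem_used_pct is a hypothetical helper name.
mem_used_pct() {
    awk '$1 == "MemTotal:"  { mt = $2 }
         $1 == "MemFree:"   { mf = $2 }
         $1 == "SwapTotal:" { st = $2 }
         $1 == "SwapFree:"  { sf = $2 }
         END {
             if (mt > 0) printf "mem %d%%\n", (mt - mf) * 100 / mt
             if (st > 0) printf "swap %d%%\n", (st - sf) * 100 / st
         }'
}
```

Usage: `mem_used_pct < /proc/meminfo`. Fed with the totals from the `top` output above (2097152 KiB RAM with 3464 free, 524288 KiB swap with 102652 free), it reports 99% of memory and 80% of swap in use, which suggests the machine is swapping heavily.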
The site is still incredibly slow.
I got rid of the “Negotiation: discovered file(s) matching request: `/home/alex/emacswiki.org/emacs` (None could be negotiated).” error that had shown up after another tweak of my `.htaccess` file by disabling MultiViews. That’s the part that will run “emacs.pl” when requesting “emacs”, I guess.
I also noticed that after a `sudo service apache2 graceful` I’m getting a lot of those “mod_fcgid: can’t apply process slot” errors again. Then I thought that a `sudo service apache2 restart` seemed to fix it. But no. After a while:
[Fri Dec 19 09:36:22 2014] [warn] [client XXX] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi
[Fri Dec 19 09:36:23 2014] [warn] [client XXX] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi
[Fri Dec 19 09:36:24 2014] [warn] [client XXX] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi
[Fri Dec 19 09:36:25 2014] [warn] [client XXX] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi
[Fri Dec 19 09:36:27 2014] [warn] [client XXX] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi
[Fri Dec 19 09:36:30 2014] [warn] [client XXX] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi
[Fri Dec 19 09:36:32 2014] [warn] [client XXX] mod_fcgid: can't apply process slot for /home/alex/emacswiki.org/_emacs.fcgi
What are the fcgi processes doing?
alex@kallobombus:~$ ps -U www-data u
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
www-data 1819 0.0 0.0 99856 388 ? S 09:31 0:00 /usr/sbin/apache2 -k start
www-data 1820 0.1 0.0 101008 572 ? S 09:31 0:00 /usr/sbin/apache2 -k start
www-data 1821 0.1 0.3 385656 7504 ? Sl 09:31 0:00 /usr/sbin/apache2 -k start
www-data 1880 0.1 0.3 384996 8060 ? Sl 09:31 0:00 /usr/sbin/apache2 -k start
www-data 1965 7.0 0.0 0 0 ? Z 09:33 0:20 [_emacs.fcgi] <defunct>
www-data 1968 1.9 0.0 0 0 ? Z 09:33 0:05 [_emacs.fcgi] <defunct>
www-data 1973 0.1 0.3 383852 8024 ? Sl 09:33 0:00 /usr/sbin/apache2 -k start
www-data 2013 0.1 0.3 383692 8152 ? Sl 09:34 0:00 /usr/sbin/apache2 -k start
Apparently, only a complete *sudo apachectl stop* will do what is required. The apache2 parent process isn’t reaping the zombies, and that’s why they hang around. It’s only if the parent is itself totally stopped that reaping happens and new fcgi processes will get started. How annoying!
The reason for the zombie epidemic remains unknown. Restart:
alex@kallobombus:~$ ps -U www-data ao pid,ppid,comm
PID PPID COMMAND
2941 2940 bash
3150 2491 /usr/sbin/apach
3151 2491 /usr/sbin/apach
3152 2491 /usr/sbin/apach
3153 2491 /usr/sbin/apach
3237 3151 _emacs.fcgi
3264 3151 _emacs.fcgi
3268 3152 cw-en.pl
3269 2941 ps
Wait for a bit:
alex@kallobombus:~$ ps -U www-data ao pid,ppid,comm
PID PPID COMMAND
3150 2491 /usr/sbin/apach
3151 2491 /usr/sbin/apach
3153 2491 /usr/sbin/apach
3660 3151 _emacs.fcgi <defunct>
3675 3151 _emacs.fcgi <defunct>
3679 2491 /usr/sbin/apach
3714 2491 /usr/sbin/apach
3759 3758 bash
3798 3153 cw-en.pl <defunct>
3799 3759 ps
Why?
Well... Running `sudo service apache2 restart` and waiting for it to happen again. If you have any ideas, let me know!
alex@kallobombus:~/emacswiki/git$ ps -U www-data ao pid,ppid,comm
PID PPID COMMAND
3759 3758 bash
7229 7225 /usr/sbin/apach
7230 7225 /usr/sbin/apach
7231 7225 /usr/sbin/apach
7235 7225 /usr/sbin/apach
7289 7230 _emacs.fcgi
7290 7230 _emacs.fcgi
7291 3759 ps
Soon enough:
alex@kallobombus:~/emacswiki/git$ ps -U www-data ao pid,ppid,comm
PID PPID COMMAND
3759 3758 bash
7229 7225 /usr/sbin/apach
7230 7225 /usr/sbin/apach
7235 7225 /usr/sbin/apach
7626 7230 _emacs.fcgi <defunct>
7633 7225 /usr/sbin/apach
7666 7230 _emacs.fcgi <defunct>
7691 7225 /usr/sbin/apach
7734 3759 ps
Gaaah! Also note this thread: Unreaped Zombie Children Of Mod_fcgid. Sounds like my problem. I’m running Apache 2.2.22-13+deb7u3 and mod_fcgid 2.3.6-1.2+deb7u1.
Another theory: I restarted Apache, waited for a few seconds, got booted off my server (out of memory?), reconnected, and found the fcgi processes defunct. Perhaps this is related?
I just wonder whether upgrading from 2GB of RAM to 4GB of RAM will “magically” fix this.
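Before paying for more RAM, it may be worth checking whether the container actually hit a resource limit. OpenVZ containers usually expose per-resource failure counters in `/proc/user_beancounters`, where the last column (failcnt) counts refused allocations. The sketch below lists resources with a non-zero counter. Whether this host provides that file is an assumption, and `beancounter_failures` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: list OpenVZ resources whose failure counter is non-zero.
# Expects the layout of /proc/user_beancounters: the last column is failcnt and
# the resource name is the sixth column from the right.
# beancounter_failures is a hypothetical helper name.
beancounter_failures() {
    # The "Version:" and header lines are skipped because their last field is not a number.
    awk 'NF >= 6 && $NF ~ /^[0-9]+$/ && $NF > 0 { print $(NF - 5), $NF }'
}
```

Usage: `sudo cat /proc/user_beancounters | beancounter_failures` (the file typically needs root to read). Non-zero counters on memory-related resources would support the out-of-memory theory; an empty result would point elsewhere.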
Continued 2014-12-19 Emacs Wiki Migration.
#Emacs #Git #Oddmuse #Administration #devops
(Please contact me if you want to remove your comment.)
⁂
Hello, why do you use apache, instead of nginx? For these tasks, nginx is best suited. And memory consumes much less and works faster.
– Denis Evsyukov 2014-12-21 06:31 UTC
---
I would have to relearn everything I know about Apache.
– Alex Schroeder 2014-12-21 08:52 UTC
---
May the Force be with you!
– AlexDaniel 2014-12-22 08:33 UTC
---
Thanks! 😄
To quote from 30 Most Memorable ‘Star Wars’ Quotes: “What a piece of junk!”
– Alex Schroeder 2014-12-22 12:25 UTC