Did I repeat myself? Did I repeat myself? Did I repeat myself? Let me fix that.

“Did you mean to post four copies of your post [1]?” asked Bunny.

“No, why?”

“Because there are four copies of your post [2], that's why,” said Bunny.

“How do you link to the post in question like that when you speak?” I asked. She shook her head and walked away. “Bunny?”

* * * * *

So yes, there was a glitch last night. The only way for four copies to show up like that would be for four copies to be sent via email (which is my preferred interface). And in checking the logs, I did indeed see four copies show up:

```
Jan 11 05:03:58 brevard postfix/local: 084412EB72E4: to=<XXXXXXXXXXXXXXXXXXXXX>, relay=local, delay=3, status=deferred (Command died with signal 6: "/home/spc/web/sites/boston.conman.org/htdocs/boston.cgi --config /home/spc/web/sites/boston.conman.org/journal/boston.cnf --email --cmd new". Command output: *** glibc detected *** double free or corruption (fasttop): 0x09cf8d38 *** )
Jan 11 05:27:10 brevard postfix/local: 084412EB72E4: to=<XXXXXXXXXXXXXXXXXXXXX>, relay=local, delay=1395, status=deferred (Command died with signal 6: "/home/spc/web/sites/boston.conman.org/htdocs/boston.cgi --config /home/spc/web/sites/boston.conman.org/journal/boston.cnf --email --cmd new". Command output: *** glibc detected *** double free or corruption (fasttop): 0x08646d38 *** )
Jan 11 06:00:29 brevard postfix/local: 084412EB72E4: to=<XXXXXXXXXXXXXXXXXXXXX>, relay=local, delay=3394, status=deferred (Command died with signal 6: "/home/spc/web/sites/boston.conman.org/htdocs/boston.cgi --config /home/spc/web/sites/boston.conman.org/journal/boston.cnf --email --cmd new". Command output: *** glibc detected *** double free or corruption (fasttop): 0x0934bd38 *** )
Jan 11 07:07:09 brevard postfix/local: 084412EB72E4: to=<XXXXXXXXXXXXXXXXXXXXX>, relay=local, delay=7394, status=sent (delivered to command: /home/spc/web/sites/boston.conman.org/htdocs/boston.cgi --config /home/spc/web/sites/boston.conman.org/journal/boston.cnf --email --cmd new)
```

The fact that four copies showed up means that mod_blog crashed when exiting. I find it disturbing that I've been running the latest version since the 1^st [3] and this is the first time it has crashed. Not only that, but it crashed three times in a row and then worked.

Such bugs are terrifying.

So, inspired by the crash reports in Redis [4], I decided to roll my own [5], and it was surprisingly easy, as the output (from a test of the code) shows:

```
CRASH: pid=6103 signal='Aborted'
CRASH: reason='Unspecified/untranslated error'
CRASH: CS=0073 DS=C02E007B ES=007B FS=0000 GS=0033
CRASH: EIP=00B767A2 EFL=00000246 ESP=BFE162F8 EBP=BFE1630C ESI=000017D7 EDI=00CBBFF4
CRASH: EAX=00000000 EBX=000017D7 ECX=000017D7 EDX=00000006
CRASH: UESP=BFE162F8 TRAPNO=00000000 ERR=00000000
CRASH: STACK DUMP
CRASH: BFE162F8: C5 78 BB 00 F4 BF CB 00
CRASH: BFE16300: F4 BF CB 00 00 00 00 00 C0 C6 F0 B7 3C 64 E1 BF
CRASH: BFE16310: 29 93 BB 00 06 00 00 00 20 63 E1 BF 00 00 00 00
CRASH: BFE16320: 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH: BFE16330: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH: BFE16340: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH: BFE16350: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH: BFE16360: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH: BFE16370: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH: BFE16380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH: BFE16390: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
CRASH: BFE163A0: A7 C4 CB 00 A8 C4 CB 00 F4 BF CB 00 A7 C4 CB 00
CRASH: BFE163B0: 60 C4 CB 00 D8 63 E1 BF 73 D0 C4 00 F4 BF CB 00
CRASH: BFE163C0: 4F E7 BE 00 02 00 00 00 B8 C6 54 08 42 00 00 00
CRASH: BFE163D0: B8 C6 54 08 42 00 00 00 14 64 E1 BF 9C E9 BE 00
CRASH: BFE163E0: 60 C4 CB 00 B8 C6 54 08 42 00 00 00 32 30 31 34
CRASH: BFE163F0: A7 C4 CB 00 00 00 00 00
CRASH: STACK TRACE
CRASH: /home/spc/source/boston/build/boston[0x805d3c4]
CRASH: /home/spc/source/boston/build/boston[0x805daa2]
CRASH: /lib/tls/libc.so.6[0xbb79b0]
CRASH: /lib/tls/libc.so.6(abort+0xe9)[0xbb9329]
CRASH: /lib/tls/libc.so.6(__assert_fail+0x101)[0xbb0e41]
CRASH: /home/spc/source/boston/build/boston[0x8053f30]
CRASH: /home/spc/source/boston/build/boston[0x805d0b9]
CRASH: /home/spc/source/boston/build/boston(ChunkProcessStream+0xc8)[0x805d32e]
CRASH: /home/spc/source/boston/build/boston(ChunkProcess+0xcb)[0x805d240]
CRASH: /home/spc/source/boston/build/boston(generic_cb+0x74)[0x8053a68]
CRASH: /home/spc/source/boston/build/boston(pagegen_days+0x1c4)[0x8057f5c]
CRASH: /home/spc/source/boston/build/boston(generate_pages+0xd0)[0x8057bfc]
CRASH: /home/spc/source/boston/build/boston[0x80501bb]
CRASH: /home/spc/source/boston/build/boston(main_cli+0x1e5)[0x80500cd]
CRASH: /home/spc/source/boston/build/boston(main+0x133)[0x8051eb3]
CRASH: /lib/tls/libc.so.6(__libc_start_main+0xd3)[0xba4e93]
CRASH: /home/spc/source/boston/build/boston[0x804cdbd]
```

POSIX lets you register a signal handler that receives additional contextual information. That context includes the contents of the CPU (Central Processing Unit) registers, and it's just a matter of finding the structure definition in a header file. Once I have the registers, it's not hard to get the stack dump.
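Here's a minimal sketch of that part (this is not the actual crashreport.c code; it assumes Linux on 32-bit x86, where a handler installed via sigaction() with SA_SIGINFO receives a ucontext_t holding the registers, and it uses fprintf() for brevity even though a real handler should avoid stdio, for reasons covered below):

```
/* Sketch only: install a handler with SA_SIGINFO so it receives a
 * ucontext_t, then pull the registers out of uc_mcontext.  The
 * REG_EIP/REG_ESP names are the Linux 32-bit x86 definitions from
 * <sys/ucontext.h>.                                                */

#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>

static void crash_handler(int sig,siginfo_t *info,void *context)
{
  ucontext_t    *uc = context;
  unsigned char *sp;

  (void)info; /* unused in this sketch */

  fprintf(stderr,"CRASH: pid=%d signal=%d\n",(int)getpid(),sig);
  fprintf(stderr,"CRASH: EIP=%08lX ESP=%08lX EBP=%08lX\n",
          (unsigned long)uc->uc_mcontext.gregs[REG_EIP],
          (unsigned long)uc->uc_mcontext.gregs[REG_ESP],
          (unsigned long)uc->uc_mcontext.gregs[REG_EBP]);

  /* the stack dump is just the raw bytes starting at the stack pointer */
  sp = (unsigned char *)uc->uc_mcontext.gregs[REG_ESP];
  for (int i = 0 ; i < 16 ; i++)
    fprintf(stderr,"%02X ",sp[i]);
  fprintf(stderr,"\n");

  _exit(EXIT_FAILURE); /* don't return into the code that just crashed */
}

int main(void)
{
  struct sigaction sa;

  memset(&sa,0,sizeof(sa));
  sigemptyset(&sa.sa_mask);
  sa.sa_sigaction = crash_handler;
  sa.sa_flags     = SA_SIGINFO;
  sigaction(SIGSEGV,&sa,NULL);
  sigaction(SIGABRT,&sa,NULL);

  abort(); /* force a crash to exercise the handler */
  return EXIT_SUCCESS;
}
```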

The stack trace was even easier than that. It turns out the Linux C library comes with a few functions to do just that: backtrace() collects the return addresses off the stack; backtrace_symbols() resolves the addresses back to function names; and backtrace_symbols_fd() does the same, only it writes the information to a file descriptor. I ended up using backtrace_symbols_fd() since I'm executing in the context of a signal handler. I know that open(), read() and close() are marked as “async-signal-safe” (that is, safe to call in a signal handler) by POSIX, but malloc() and free() are not, and backtrace_symbols() is known to call malloc() (which means I would also need to call free()), and I don't want to risk a crash in the crash-report code. I'm still taking a risk, because backtrace() and backtrace_symbols_fd() could call non-async-signal-safe functions, but I'm doing as much as I can to get as much context as I can, as safely as I can.
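A minimal sketch of that half (assuming glibc's <execinfo.h>; the executable also needs to be linked with -rdynamic for the function names to resolve):

```
/* Sketch only: dump a stack trace to a file descriptor from inside a
 * signal handler.  backtrace() gathers the return addresses, and
 * backtrace_symbols_fd() writes the resolved names directly to the
 * given descriptor, avoiding the malloc() that backtrace_symbols()
 * would do.  Both come from glibc's <execinfo.h>.                   */

#include <execinfo.h>
#include <unistd.h>

#define MAX_FRAMES 128

static void dump_stack_trace(int fd)
{
  void *frames[MAX_FRAMES];
  int   count;

  count = backtrace(frames,MAX_FRAMES);
  backtrace_symbols_fd(frames,count,fd);
}
```

Called from the crash handler with an already-open descriptor (or STDERR_FILENO), this writes out the raw trace; adding a prefix like the “CRASH:” in the output above takes additional plumbing on top of this.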

[1] /boston/2013/01/10.1

[2] /boston/2013/01/10.1

[3] /boston/2013/01/01.1

[4] http://antirez.com/news/43

[5] https://github.com/spc476/CGILib/blob/master/src/crashreport.c
