Heisenbugs … they're everywhere!

So I ran the greylist daemon [1] for over eight hours under valgrind [2] without it hanging once. I then restarted the daemon on its own, without valgrind.

A few hours later, it hung.

And just for the record, normally when I attach to the running process with gdb, it's right where I would expect it to be:

```
(gdb) where
#0 0x008067a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x003c2dd1 in recvfrom () from /lib/tls/libc.so.6
#2 0x08049411 in mainloop (sock=0) at src/main.c:88
#3 0x080493a6 in main (argc=1, argv=0xbfe5c084) at src/main.c:68
```
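For context, this healthy backtrace is what an idle datagram server looks like: blocked in the kernel inside `recvfrom()`, waiting for the next request. Below is a minimal sketch of such a loop, assuming a UDP-style datagram socket. It is not the daemon's actual code; only the names `mainloop` and `sock` come from the backtrace, while `receive_one()`, `process_request()`, the 1500-byte buffer and the `EINTR` retry are hypothetical.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical per-request handler; a real greylist daemon would
 * parse the request and consult its tables here. */
static void process_request(const char *buf, ssize_t len,
                            const struct sockaddr *from, socklen_t fromlen)
{
  (void)buf;
  (void)len;
  (void)from;
  (void)fromlen;
}

/* Hypothetical helper: wait for one datagram on sock.  An idle,
 * healthy process sits blocked in this recvfrom() call. */
ssize_t receive_one(int sock, char *buf, size_t size)
{
  struct sockaddr_storage from;
  socklen_t fromlen;
  ssize_t len;

  assert(buf != NULL && size > 0);

  do {
    fromlen = sizeof(from);
    len = recvfrom(sock, buf, size, 0, (struct sockaddr *)&from, &fromlen);
  } while (len < 0 && errno == EINTR); /* retry if a signal interrupted the wait */

  if (len >= 0)
    process_request(buf, len, (struct sockaddr *)&from, fromlen);
  return len;
}

/* Serve requests forever, one datagram per iteration. */
void mainloop(int sock)
{
  char buf[1500];

  for (;;) {
    if (receive_one(sock, buf, sizeof(buf)) < 0)
      perror("recvfrom");
  }
}
```

Splitting out `receive_one()` keeps the blocking call in one place, which makes it easy to exercise with a `socketpair()` instead of a real network socket.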

but when the process hangs:

```
(gdb) where
#0 0x00dff7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x00955e5e in __lll_mutex_lock_wait () from /lib/tls/libc.so.6
#2 0x008e7e4f in _L_mutex_lock_10231 () from /lib/tls/libc.so.6
#3 0x00000000 in ?? ()
```

I have no clue as to what's going on (and apparently neither does gdb). Running the program under valgrind obviously changes its environment enough to mask whatever condition triggers the bug in the first place.

This is proving to be a very difficult bug to find.

[1] /boston/2007/08/16.1

[2] /boston/2007/10/15.1

