💾 Archived View for bbs.geminispace.org › u › byte › 11561 captured on 2024-08-18 at 20:32:07. Gemini links have been rewritten to link to archived content
⬅️ Previous capture (2024-07-09)
-=-=-=-=-=-=-
Re: "help debugging likely disk write issue?"
first thing that came into my mind is that you have some internal deadlock issue, or you run out of file descriptors.
either way, try to reproduce it separately from the server part and print out the errno and strerror( errno ). also maybe it's better to read/write with pipe in a non-blocking async way, so server speed won't be affected by the logger.
2023-11-07 · 9 months ago
@tjp did it work out?
☕️ Morgan · 2023-11-09 at 14:16:
Usually with this kind of issue I would try to add any kind of alternative log output possible ... aka "printf debugging" :) ... to learn more next time it happens. In cases where printf is not visible I would append to a file somewhere.
🌊 tjp [OP] · 2023-11-09 at 15:59:
Thanks for the suggestions @byte. I'd be really surprised if there was a deadlock issue in svlogd as that's a heavily tested and used program as it's the logger in runit. I also don't think it should run out of file descriptors (it'd have to leak them), but is there a way I can see file descriptor counts for a process in FreeBSD? That's the kind of debugging I'm looking for.
🌊 tjp [OP] · 2023-11-09 at 16:02:
Nothing new to report as it hasn't happened again since I first posted here (it generally runs for a few weeks before this lock-up occurs). I'm hoping to gain a little knowledge for when it does happen again: how can I inspect a running process, for example to see how many file descriptors it has open? Or whether it's currently blocked on a file write call, or has some other syscall open but not completed?
just strip everything from the server part, and write some logging calls in a loop there, this way you can reproduce it without waiting for a week and try to debug it.
🌊 tjp [OP] · 2023-11-13 at 18:21:
I finally figured this out.
Based on the frequency of the problem arising I thought it might be connected with svlogd's log rotation, and the svlogd manpage refers to a possibility of it locking up when an external log file processor fails. So to induce it more quickly I reduced the max log file size and generated a bunch of traffic to my server with a shell script.
I was using the !processor functionality of svlogd to run rotated log files through a script (`!gzip`), but also starting svlogd with `chpst -p 1`, which sets a soft limit of 1 process. I removed that limit and it works fine now.
help debugging likely disk write issue? — I have a server service running in a freebsd server I manage, which is connected via a pipe to a logging service which writes to disk. Periodically (generally after a few weeks), the server entirely stops doing anything. It doesn't respond to any requests, and even after a restart it doesn't bind to any ports and none of the usual startup log messages come through. The only way I've found to get it un-stuck is to restart the logging service in the...