💾 Archived View for thrig.me › tech › debugging.gmi captured on 2024-09-29 at 01:01:22. Gemini links have been rewritten to link to archived content


On the Rejection of Debuggers

It is not that I never use debuggers. Rather, they are among the last tools I might employ when working a problem. Others may put debuggers more towards the fore of their toolbox.

GDB Briefly Entertained

The tool analogy is apt; not all tools suit a problem and opinions differ as to their use. Debuggers are a specific tool for specific tasks; given a problem, a debugger may not be relevant, or might fit only a small role within a larger problem. An example may help. Consider a problem in production. Sure, you could attach gdb to a random httpd process, but that would generally not be a first step in debugging a problem. Perhaps there was noise from a monitoring system. The on-call might look at a dashboard, or could check the logs. Time and resources are limited, or maybe the on-call is burnt out; a misbehaving httpd could simply be killed--assuming the issue is not due to DNS, a load balancer, power outage, or a false alarm.

A persistently misbehaving httpd might eventually have a debugger attached to it. This would probably happen well after strace, increased log levels, and various other steps are taken, or

    */5 * * * * /root/clear-memory-leaks

is something I have seen multiple times in production; for those who do not speak cron, that is code to kill the service every five minutes because it leaks too much memory, and anyways was never intended to run for so long or become so popular, and management never did approve the time nor resources required to fix it. What are you going to do? Attach your debugger to the political layer?

Assuming we limit our problems to just programming problems, gdb may still not be relevant; with a good problem report a programmer might understand what the problem is and know where to look in the code to fix it. No need for a debugger. Indeed, if one does not understand the codebase, gdb may be useless, many steps through a mysterious forest. In this case the issue should be turned over to someone who knows the code, one who may or may not use a debugger as they see fit. And would you learn a huge codebase with a debugger? Probably not.

Even where a debugger might be used, I am more prone to write a minimal test case and work on that. One method here is vi in one terminal and entr(1) in another; on file save entr re-runs some code, and I can then inspect the output and think about what to do next. Yes, thinking!

"I recognize this is largely a matter of style. Some people insist on line-by-line tool-driven debugging for everything. But I now believe that thinking--without looking at the code--is the best debugging tool of all, because it leads to better software." -- Rob Pike

One result of the many minimal test cases is several directories full of example uses of various functions, libraries, algorithms, data structures, etc in various languages which can be put to future study and use, not lost in some unversioned and unsearchable REPL backlog. With a minimal test case there is generally no problem sharing that should a bug or curiosity surface; this is not the case with a debugger attached to who knows what code under who knows what state. You would need to spend time to extract the relevant bits into a minimal example. Why not start with the minimal example?

As for live coding, I mostly reject that: debugging is hard enough without a persistent mutable global state complicating matters. Shell one liners might be considered live coding, but that code is by definition throwaway (no strict, no warnings, and mostly no tests) and is to solve some very specific and easy to visualize task. Anything more complicated or oft used will be rewritten into a script, ideally with tests and documentation. My shell histories are very short (64 entries, not saved); the shell configuration hasn't got much state; the shells are frequently reset to a known good state. Anything else, it's time for a text file with vi.

When would I use gdb? A backtrace might be handy to know, and sometimes I will step through something to look at the registers--"layout regs" might help with that. gdb is probably useful if one is dealing with black box libraries from third parties, but I encounter those almost never, and the scripts I do write tend to be simple unix programs, not run forever state mutating monsters. The state mutating monsters tend to be curses applications (rogue) that are difficult to attach a debugger to, so invariably suitable logs get written to stderr and reviewed. Or a minimal test case of the relevant code is extracted.

Backtrace As Considered Harmful

The debugger backtrace is sometimes of use, sometimes not. Something could blow up down call tree B on account of code run under tree A. Example: bitmap(1) of X11 parsed the -size flag incorrectly. Elsewhere in the code the incorrect value got used. On OpenBSD the process is killed; on other operating systems, who knows, maybe BadAlloc. Sometimes. The backtrace provided no help; understanding the source code did. Now, sometimes a backtrace will be useful, perhaps an orientation to where the code burst into flames at the bottom of the ravine, but I generally disfavor backtrace by default. Here's why.

Backtrace can be worse than useless, for example when the disk fills up because each and every log message comes with a 200 line curriculum vitae, or when the logging velocities tickle Linux kernel bugs but only when the Java process is launched from cron(8). Yeah. Could not upgrade, would have had to re-verify the system, and that would have taken too long. Experience plays a role here. I now assume by default the programmers will make a run on the disk, especially where debug levels of backtrace meet production levels of traffic, and would look for it, early. One might ponder why programmers are logging so much instead of, I don't know, using a debugger. ("Because they haven't accepted the debugger as their lord and savior!!" says the debugger fan here.)

Now with a system of certain complexity I could see a backtrace being of good use. Sometimes. Full backtraces everywhere all the time is malicious; do I really need to wade through 200 lines per transaction to hopefully not miss that a socket failed to connect? Please think about these things. Put another way: if a file is missing, how relevant is your call tree to that?

    USP PAIRC) DO (LET ((X #) (Y #)) (IF (STRING-EQUAL X "--rules") (LOA
    D-RULES Y) (SETF # Y))) (DECF PAIRC 2)) (WHEN ARGV (SETF *AXIOM* (CH
    ECK-TERM (POP ARGV)))) (LIMIT-SPAM (FORMAT NIL "1  ~a" *AXIOM*)) (LO
    OP FOR X FROM 2 TO 11 DO (SETF *AXIOM* (L-SYSTEM *AXIOM*)) (LIMIT-SP
    AM (FORMAT NIL "~%~2@<~D~> ~a" X *AXIOM*))) (FRESH-LINE)) :CURRENT-I
    NDEX 13)
    13: (SB-C::%DO-FORMS-FROM-INFO #<FUNCTION (LAMBDA (SB-KERNEL:FORM &K
    EY :CURRENT-INDEX &ALLOW-OTHER-KEYS) :IN SB-INT:LOAD-AS-SOURCE) {100
    1835CDB}> #<SB-C::SOURCE-INFO {1001835CA3}> SB-C::INPUT-ERROR-IN-LOA
    D)
    14: (SB-INT:LOAD-AS-SOURCE #<SB-SYS:FD-STREAM for "file /opt/system/
    bin/amd64/run/lsystem" {1001830B93}> :VERBOSE NIL :PRINT NIL :CONTEX
    T "loading")
    15: ((LABELS SB-FASL::LOAD-STREAM-1 :IN LOAD) #<SB-SYS:FD-STREAM for
     "file /opt/system/bin/amd64/run/lsystem" {1001830B93}> NIL)
    16: (SB-FASL::CALL-WITH-LOAD-BINDINGS #<FUNCTION (LABELS SB-FASL::LO
    AD-STREAM-1 :IN LOAD) {58BF80B}> #<SB-SYS:FD-STREAM for "file /opt/s
    ystem/bin/amd64/run/lsystem" {1001830B93}> NIL #<SB-SYS:FD-STREAM fo
    r "file /opt/system/bin/amd64/run/lsystem" {1001830B93}>)
    17: (LOAD #<SB-SYS:FD-STREAM for "file /opt/system/bin/amd64/run/lsy
    stem" {1001830B93}> :VERBOSE NIL :PRINT NIL :IF-DOES-NOT-EXIST T :EX
    TERNAL-FORMAT :DEFAULT)
    18: ((FLET SB-IMPL::LOAD-SCRIPT :IN SB-IMPL::PROCESS-SCRIPT) #<SB-SY
    S:FD-STREAM for "file /opt/system/bin/amd64/run/lsystem" {1001830B93
    }>)
    19: ((FLET SB-UNIX::BODY :IN SB-IMPL::PROCESS-SCRIPT))
    20: ((FLET "WITHOUT-INTERRUPTS-BODY-11" :IN SB-IMPL::PROCESS-SCRIPT)
    )
    21: (SB-IMPL::PROCESS-SCRIPT "/opt/system/bin/amd64/run/lsystem")
    22: (SB-IMPL::TOPLEVEL-INIT)
    23: ((FLET SB-UNIX::BODY :IN SB-IMPL::START-LISP))
    24: ((FLET "WITHOUT-INTERRUPTS-BODY-3" :IN SB-IMPL::START-LISP))
    25: (SB-IMPL::START-LISP)

You would need to scroll up to find the error. I've seen worse. And now the same error, minus the stack trace spam:

    $ lsystem --rules nope
    lsystem: load-rules "nope" failed: #<FILE-DOES-NOT-EXIST {1001FCDDB3

See how much more readable and relevant that is?
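For comparison, here is a minimal sketch of the same terse style in Go. The loadRules function and the hard-coded default filename are hypothetical stand-ins; the real lsystem is a Lisp script and none of this is its code. The idea is only that the error names the operation and the file, and nothing else.

```go
package main

import (
	"fmt"
	"os"
)

// loadRules is a hypothetical stand-in for the load-rules step of lsystem.
func loadRules(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		// Say what was attempted and on which file; %w keeps the
		// underlying error available to errors.Is and errors.As.
		return nil, fmt.Errorf("load-rules %q failed: %w", path, err)
	}
	return data, nil
}

func main() {
	path := "nope" // assumed default, mirrors the example above
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	if _, err := loadRules(path); err != nil {
		// One line on stderr, no call tree.
		fmt.Fprintf(os.Stderr, "lsystem: %v\n", err)
		os.Exit(1)
	}
}
```

On a unix system a missing file should produce a single line along the lines of: lsystem: load-rules "nope" failed: open nope: no such file or directory.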

There are trade-offs here; a programmer might want a backtrace as maybe it helps find some rare bug lost by laconic logging. Or maybe the code is so stupidly complicated that things are quite hopeless without a backtrace. With skill, one might learn to more quickly process the backtrace--or to write a tool to collapse or throw away "enough" of all that noise so that the good bits are more in the fore--but being nearly two orders of magnitude worse than a sensible error message, 4841 versus 70 characters, isn't a good place to start. And do you really need to be firing backtraces every time, all the time?
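Such a noise-discarding tool could be as small as the following sketch. It assumes, perhaps wrongly for your logs, that every backtrace frame sits on a single line beginning with a frame number and a colon, as in "13: (...)"; frames that wrap across lines, like the folded output above, would need more work. The collapse function and the frameLine pattern are names made up for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"regexp"
)

// frameLine matches lines such as "13: (SB-C::...)". This is an assumed
// format: optional indent, frame number, colon, space.
var frameLine = regexp.MustCompile(`^\s*[0-9]+: `)

// collapse returns the lines that are not backtrace frames, plus a count
// of the frames that were thrown away.
func collapse(lines []string) (kept []string, dropped int) {
	for _, line := range lines {
		if frameLine.MatchString(line) {
			dropped++
			continue
		}
		kept = append(kept, line)
	}
	return kept, dropped
}

func main() {
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintf(os.Stderr, "collapse: read failed: %v\n", err)
		os.Exit(1)
	}
	kept, dropped := collapse(lines)
	for _, line := range kept {
		fmt.Println(line)
	}
	if dropped > 0 {
		fmt.Fprintf(os.Stderr, "(%d backtrace frames dropped)\n", dropped)
	}
}
```

The filter reads a log on standard input and writes the non-frame lines to standard output. Whether what survives is the good bits depends entirely on the log format, which is rather the point of the complaint.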

Go Panic

Less verbose than SBCL. Still bad.

    panic: invalid argument to Int63n

    goroutine 1 [running]:
    math/rand.(*Rand).Int63n(0xc0000741b0?, 0xa?)
            /usr/local/go/src/math/rand/rand.go:121 +0xe8
    math/rand.Int63n(...)
            /usr/local/go/src/math/rand/rand.go:348
    main.irand(...)
            .../orbit.go:64
    main.make_orbit()
            .../orbit.go:98 +0x1e8
    main.with_gif({0x4c7413?, 0xc0000061a0?})
            .../orbit.go:135 +0x217
    main.main()
            .../orbit.go:112 +0xd6

Note how the panic has helpfully omitted what, exactly, the invalid argument was. Debugger time? Maybe, or one could add a print statement right before the problematic line,

    fmt.Fprintf(os.Stderr, "dbg %v -> ", size)
    orb.iters = irand( int( size / 1000 ) )

which on the next run showed that the size was (sometimes) small enough for int( size / 1000 ) to become zero, therefore we need an irand( 42 + int( ... to avoid that. Others might reach for a single-step debugger here, or I could type PUFF and then fill in some printf flags. Quick, easy, mostly gets the job done:
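A self-contained sketch of the failure and the guard follows. Only the name irand and the 42 offset come from the text above; the body of irand, the iterBound helper, and the integer type of size are guesses made for illustration. rand.Int63n panics when its argument is zero or negative, so the offset keeps the bound positive for any non-negative size.

```go
package main

import (
	"fmt"
	"math/rand"
	"os"
)

// irand is a guess at the helper named in the panic trace: a random
// integer in [0, max). rand.Int63n panics if max <= 0.
func irand(max int) int {
	return int(rand.Int63n(int64(max)))
}

// iterBound is a hypothetical helper holding the guarded expression.
// Without the 42, any size under 1000 would yield a bound of zero.
func iterBound(size int) int {
	return 42 + int(size/1000)
}

func main() {
	for _, size := range []int{0, 999, 250000} {
		bound := iterBound(size)
		// printf debugging: show the input and the bound before the call
		fmt.Fprintf(os.Stderr, "dbg %v -> %v\n", size, bound)
		fmt.Println(irand(bound))
	}
}
```

The dbg line goes to stderr so that it can be split from normal output with a 2> redirect.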

    $ grep PUFF ~/.exrc.go
    ab PUFF fmt.Fprintf(os.Stderr, "dbg

Still haven't learned the Go debugger. Doubtless would forget how to use it. Why should I learn it, when printf debugging has solved everything I've had problems with, so far?

Interactivity Theorem

In a LISP system one could potentially enter a debugger, correct the filename, and then go on your merry way. Maybe I need to practice that more. But the SBCL REPL is bad; I assume it must be wired up to emacs to be not terrible? No less difficult in this instance is to re-run the command with a corrected filename--escape, k, f, n, C; other shells or emacs mode will vary--something I have lots of experience with. Maybe if the LISP machine debugger was the only choice, for all softwares and at all layers, but that's not the world that happened, not even when LISP machines were a thing (and were being rebooted every few days because of memory leaks).

Imagine if a heavyweight operating system such as Chrome or Windows dropped into a debugger on error. Maybe Bill Gates could have fixed up that Windows 98 USB issue on stage, live. Nope, did not happen. Could not happen, unless Bill Gates knew that USB stack (and the hypothetical all-system debugger) very well, or if there was some programmer around who knew the space. What if it was a hardware problem? No problem, just use the debugger and flash the ROM, (add-hook 'high-voltage-programmer-mode) and away you go. What are the implications of having a pervasive debugger everywhere? How would that balance against, say, corporations fond of Digital Rights Management, or concerns from the security team?

External

https://ntietz.com/blog/how-i-debug-2023/

in valgrind we trust?