It's still cargo cult computer security

My first question to you, as someone who is, shall we say, “sensitive” to security issues: why are you exposing a network-based program that hasn't been updated in the past 14 years to the Internet?

Granted, measures such as ASLR (Address Space Layout Randomization) and W^X (Write exclusive-or eXecute) can make life more difficult for an attacker, and you might notice w3m crashing as the attackers try to get the stars to line up for their ROP (Return-Oriented Programming) gadget to work as you (or some automation) try to download a malicious page over and over. Or, you could get unlucky and they are now running whatever code they want, or reading all your files.

“Attacks [1]”

I have my own issues with ASLR (I think it's the wrong thing to do—much better would have been to separate the stack into two, a return stack and a parameter (or data) stack, but I suspect we won't ever see such an approach because of the entrenchment of the C ABI (Application Binary Interface)) so I won't get into this.

> > What I would like to see is how opening a text editor with the contents of an HTML (HyperText Markup Language) <TEXTAREA> could be attacked. What are the actual attack surfaces? And no, I won't accept “just … bad things, man!” as an answer. What, exactly?
>
> Where is your formal verification for the lack of errors?

I did not assert the code was free of error. I was asking for examples of actual attacks.

Otherwise, there is some amount of code executed to make that textarea work, all of which is the “actual attack surface”. If you look at the CVE (Common Vulnerabilities and Exposures) for w3m (nevermind the code w3m uses from SSL (Secure Sockets Layer), curses, iconv, intl, libc, etc.) one may find:
* Format string vulnerability in the inputAnswer function in file.c in w3m before 0.5.2, when run with the dump or backend option, allows remote attackers to execute arbitrary code via format string specifiers in the Common Name (CN) field of an SSL certificate associated with an https URL (Uniform Resource Locator).
* w3m before 0.3.2.2 does not properly escape HTML tags in the ALT attribute of an IMG tag, which could allow remote attackers to access files or cookies.
* Buffer overflow in w3m 0.2.1 and earlier allows a remote attacker to execute arbitrary code via a long base64 encoded MIME header.

Was that so hard?

The first bug you mention, the “format string vulnerability” seems to be related to this one-line fix (and yes, I did download the source code for this):

@@ -1,4 +1,4 @@
-/* $Id: file.c,v 1.249 2006/12/10 11:06:12 inu Exp $ */
+/* $Id: file.c,v 1.250 2006/12/27 02:15:24 ukai Exp $ */
 #include "fm.h"
 #include <sys/types.h>
 #include "myctype.h"
@@ -8021,7 +8021,7 @@ inputAnswer(char *prompt)
 	ans = inputChar(prompt);
     }
     else {
-	printf(prompt);
+	printf("%s", prompt);
 	fflush(stdout);
 	ans = Strfgets(stdin)->ptr;
     }

It would be easy to dismiss this as a rookie mistake, but I admit, it can be hard to use C safely, which is why I keep asking for examples and, in some cases, even a proof-of-concept, so others can understand how such bugs work and how to mitigate them.
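
To make the fix above concrete, here is a minimal sketch of the same pattern outside of w3m. The function name format_prompt and the sample string are made up for illustration; the only point is that untrusted text goes in as an argument to "%s" and never as the format itself.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from w3m): format an untrusted prompt into buf.
 * The prompt is passed as data for "%s", never as the format string,
 * so any '%' directives inside it are copied literally.
 * Returns what snprintf returns: the length the full prompt needs.
 */
int format_prompt(char *buf, size_t bufsize, const char *untrusted_prompt)
{
    assert(buf != NULL && bufsize > 0 && untrusted_prompt != NULL);

    /*
     * UNSAFE variant: snprintf(buf, bufsize, untrusted_prompt);
     * There "%x" would read whatever sits where the arguments should be,
     * and "%n" would write to memory through a bogus pointer.
     */
    return snprintf(buf, bufsize, "%s", untrusted_prompt);
}
```

Called with a prompt such as "Accept cert for %x%x%n? ", the buffer ends up holding exactly those characters. Both gcc and clang also have a -Wformat-security warning that flags the unsafe form.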

But just keep crying pledge() and see how things improve.

The second bug you mentioned seems to be CVE-2002-1335 [2], which is 23 years old by now, and none of the links on that page show any details about this bug. I also fail to see how this could lead to an “arbitrary file access” back to the attacker unless there's some additional JavaScript required. The constant banging on the pledge() drum does nothing to show how such an attack works so as to educate programmers on what to look for and how to think about mitigations. When I asked “What are the actual attack surfaces?” I actually meant that. How does this lead to an “arbitrary file access?” It always appears to be “just assume the nukes have been launched” type of rhetoric. It doesn't help educate us “dumb” programmers. Please, tell me, how is this exploitable? Or is that forbidden knowledge not to be given out for fear it will be used by those less well-intentioned?

This is the crux of my frustration here—all I see is “programs bad, mmmmmmkay?” and magic pixie dust to solve the issues.

I've had to explain to programmers in a well regarded CSE (Computer Science and Engineering) department recently why their code was … sub-optimal. Less polite words could be used. They were running remote, user-supplied strings through a system(3) call, and it took a few emails to convince them that this was kind of bad.
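
As a rough illustration of why system(3) and user-supplied strings don't mix, and of one common alternative, the sketch below runs a program through posix_spawnp(3) with an argument vector, so no shell ever parses the untrusted text. The helper name run_without_shell is hypothetical, and this is not the code from that department.

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * Hypothetical helper: run `program` (looked up in PATH) with a single
 * untrusted argument. The argument is handed over as one argv entry,
 * so shell metacharacters such as ';' or '|' have no special meaning.
 * Returns the child's exit status, or -1 if it could not be run
 * or did not exit normally.
 */
int run_without_shell(const char *program, const char *untrusted_arg)
{
    assert(program != NULL && untrusted_arg != NULL);

    char *const argv[] = { (char *)program, (char *)untrusted_arg, NULL };
    pid_t pid;
    int status;

    if (posix_spawnp(&pid, program, NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With system(), a string like "true ; exit 7" is handed to the shell, which runs both commands and reports status 7. Here "; exit 7" is just an odd argument to true, so the result is 0. The spawned program can still misread a hostile argument as an option, so this narrows the problem rather than solving it.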

And I can bitch about having to teach operations how to configure syslog and “no, we can't have a single configuration file for two different geographical sites and besides, we maintain the configuration files, not you!” so this cuts both ways.

Moreover, it's fairly simple to pledge and unveil a process to remove classes of system calls (such as executing other programs) or remove access to swathes of the filesystem (so an attacker will have a harder time to run off with your SSH keys).
And how, exactly, is adding pledge and unveil onerous? …

Easy huh?

The man page [3] doesn't say anything about limiting calls to open(). It appears that is handled by unveil() [4] which doesn't seem all that easy to me:

… Directories are remembered at the time of a call to unveil(). This means that a directory that is removed and recreated after a call to unveil() will appear to not exist.
unveil() use can be tricky because programs misbehave badly when their files unexpectedly disappear. In many cases it is easier to unveil the directories in which an application makes use of files.

“unveil(2) - OpenBSD manual pages [5]”

To me, I read “in some cases, code may be difficult to debug.”

And while it may be easy for you to add a call to unveil() or pledge(), I assure you that it's not at all easy for the kernel to support such calls. Now, in addition to all the normal Unix checks that need to happen (and that have, in the past, gone wrong on occasion), a whole slew of new checks needs to be added, which complicates the kernel. Just as an example, pass the “dns” promise to pledge() and the calls to socket(), connect(), sendto() and recvfrom() are disabled until the file /etc/resolv.conf is opened. Then they're enabled, but probably only to allow UDP (User Datagram Protocol) port 53 through. Unless the “inet” promise is given, in which case socket(), connect(), etc. are allowed. That's … a lot of logic to puzzle through. And as someone who doesn't trust programmers (as you stated), this isn't a problem for you?

As a programmer, it can also make it hard to reason about some scenarios. For instance, if I use the “stdio” promise, but not the “inet” promise, can I open files served up by NFS (Network File System)? I mean, probably, but “probably” isn't “yes” and there are a lot of programming sins committed because “it worked for me.”

I did say that using pledge() helps, but it doesn't solve all attacks. For instance, there's no special promise I can give to pledge() that states “I will not send escape codes to the terminal” even though that's an attack vector, especially if the terminal in question supports remapping the keyboard! Any special recommendations for that attack? Do I really need to embed \e[13;"rm -rf ~/*"p to drive the point home?
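
One mitigation that has nothing to do with pledge() is to neuter control characters before untrusted text reaches the terminal. The following is a minimal sketch under some assumptions: the helper name strip_terminal_controls is made up, it only handles the C0 control bytes and DEL, and it leaves 8-bit C1 sequences alone because those byte values also occur in UTF-8 text.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy `in` to `out`, replacing every C0 control
 * byte (including ESC, 0x1b) and DEL with '?'. Newline and tab are kept.
 * Output is truncated to fit and always NUL-terminated.
 * Returns the number of bytes that were replaced.
 */
size_t strip_terminal_controls(const char *in, char *out, size_t outsize)
{
    size_t replaced = 0;
    size_t o = 0;

    assert(in != NULL && out != NULL && outsize > 0);

    for (size_t i = 0; in[i] != '\0' && o + 1 < outsize; i++) {
        unsigned char c = (unsigned char)in[i];

        if ((c < 0x20 && c != '\n' && c != '\t') || c == 0x7f) {
            out[o++] = '?';      /* control byte made visible and harmless */
            replaced++;
        } else {
            out[o++] = (char)c;
        }
    }
    out[o] = '\0';
    return replaced;
}
```

Fed a key-remapping sequence, the ESC byte comes out as a plain '?' and the rest is displayed as ordinary text instead of being interpreted by the terminal.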

Also (because I do not use OpenBSD) do I still have access to every system call after this?

pledge(
    " stdio rpath wpath cpath  dpath     tmppath inet   mcast"
    " fattr chown flock unix   dns       getpw   sendfd recvfd"
    " tape  tty   proc  exec   prot_exec settime ps     vminfo"
    " id    pf    route wroute audio     video   bpf    unveil"
    "  error", NULL);

If not, why not? That's a potential area to look for bugs.

How, exactly, is adding pledge and unveil to w3m “helplessness”, and then iterating on that design as one gains more experience?

As you said yourself: “I do not trust programmers (nor myself) to not write errors, so look to pledge and unveil by default, especially for ‘runs anything, accesses remote content’ browser code.” What am I to make of this, except for “Oh, all I have to do is add pledge() and unveil() to my program, and then it'll be safe to execute!”

In my opinion, banging on the pledge() drum doesn't help educate programmers on potential problems. It doesn't help programmers to write code to be anal when dealing with input. It doesn't help programmers to think about potential exploits. It just punts the problem with magic pixie dust that will solve all the problems.

… It took much less time to add to w3m than writing this post did; most of the time for w3m was spent figuring out how to disable color support, kill off images, and to get the CFLAGS aright. It is almost zero maintenance once done and documented.

What, exactly, is your threat model? Because that's … I don't know what to say. You remove features just because they might be insecure. I guess that's one way to approach security. Another approach might be to cut the network cable.

I only ask as I was hacked once. Bad. Lost two servers (file system wiped clean), almost lost a third. And you know what? Not only did it not change my stance around computer security, there wasn't a XXXXXXXXXX thing I could do about it either! It was an inside job [6]. Is that part of your threat model?

By the way, /usr/bin/vi -S is used to edit the temporary file. This does a pledge so that vi cannot run random programs.

But what's stopping an attacker from adding commands to your ~/.bashrc file to do all the nasty things it wants to do the next time you start a shell? That's the thing: pledge() by itself won't stop all attacks, but dismissing the question of “what attack surfaces” can lead one to believe that all that's needed is pledge(). It leads (in my opinion) to a false sense of security.

It is rather easy to find CVEs for errors in HTML parsing code, besides the “did not properly escape HTML tags in the ALT attribute” thing w3m was doing that led to arbitrary file access.
CVE-2021-23346, CVE-2024-52595, CVE-2022-0801, CVE-2021-40444, CVE-2024-45338, CVE-2022-24839, CVE-2022-36033, CVE-2023-33733, …

You might want to be more careful in the future, as one of those CVEs you listed has nothing to do with parsing HTML. I'll leave it as an exercise for you to find which one it is.

I also get the feeling that we don't see eye-to-eye on this issue, which is normal for me. I have some opinions that are not mainstream, are quite nuanced, and thus, aren't easy to get across (ask me about defensive programming sometime).

My point with all this: talk about computer security is all cargo cultish and is not helping with actual computer security. And what is being done is making other things way more difficult than they should be.

[1] gemini://thrig.me/blog/2025/01/04/attacks.gmi

[2] https://www.cvedetails.com/cve/CVE-2002-1335

[3] https://man.openbsd.org/pledge.2

[4] https://man.openbsd.org/unveil.2

[5] https://man.openbsd.org/unveil.2

[6] /boston/2004/09/19.1
