Archived View for gemini.kaction.cc › log › 2022-03-31.1.gmi captured on 2024-08-31 at 11:45:53.

Too much freedom for the layman

A Unix file name is not a string in the sense people usually think of it. It is actually a byte string that can contain any byte except 0x2F (the slash, which is the directory separator) and 0x00 (the C string terminator). You can create a file whose name contains every valid byte with the following simple C program:

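#include <stdio.h>

int main(void)
{
	unsigned char buffer[256];
	FILE *fp;

	// Fill buffer[0..254] with bytes 1..255; the slash (0x2F) becomes 0x01.
	for (unsigned int i = 1; i != 256; ++i) {
		buffer[i-1] = (i != 0x2F) ? (unsigned char)i : 1;
	}
	buffer[255] = '\0'; // explicit terminator instead of relying on 256 wrapping to 0
	buffer[0] = 'x'; // for autocompletion tests
	fp = fopen((const char *)buffer, "w");
	if (!fp) {
		perror("failed to open file for writing");
		return 1;
	}
	fputs("Hello world", fp);
	fclose(fp);
	return 0;
}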

The resulting file will upset many other programs.

This is perfectly in the spirit of Unix. Unix just provides the mechanism and never tries to bar the user from doing dangerous things, because that would also bar the user from doing smart things. We are all responsible adults, right?

Used to be, not anymore. People do crazy shit, like putting "$" or "`" into file names or creating zip archives under Windows, and they don't see the problem because it happens to work in their favorite file manager or office package.

This world would have been a much better place had file names been restricted to "^[._a-zA-Z][._0-9a-zA-Z]*$" from day one, but who, at the dawn of time, could have anticipated that computers would fall into the hands of the layman?

Today we see and feel it every day. The world of computers belongs to the layman, the breed of Unix wizards is on the brink of extinction, and yet the same mistake is being committed once again.

Unicode in domain names, Unicode in source files, Unicode fucking everywhere. People think it is neat to have domains in their native language, but what they really should think about is the difference between "ё.com", "ë.com" and "ë.com", and the confusion, attack vectors and unnecessary work it will create. Yes, I know what Punycode is -- a solution to a problem that should not have existed in the first place.

Don't get me wrong, Unicode is definitely an improvement over the zillions of single-byte encodings, but that is it. No Unicode for the sake of Unicode, please.

Yet I understand why it happens. Being able to do something, even something as insignificant as putting fancy symbols in unexpected places, is shiny and impresses people, while the problems it creates... just kick the can down the road.

The hand of Hari Seldon feels so heavy.