One of these late nights again. I should sleep. We both should sleep. I’m typing on the laptop as fast as possible, trying to finish this post before she realizes that it is long past our bedtime, and I feel bad about that. She also does it, of course, but that thought is not helping.
I just saw a talk by @nasser called “A Personal Computer for Children of All Cultures”, via a boost by @neauoire. It’s about English being the language you need to know if you want to program and how English names and its alphabet influence our thinking, our culture. It’s all over the world because it’s our current lingua franca, but it’s also a requirement if you want to be a programmer. If then else. Begin end. For loop print. Keywords are a problem. But also libraries. The way we build libraries these days is that the function names end up in the binaries. You cannot use the library without using the language of its authors, or using a translation layer. Their names are the real ones. If you translate them, your names are secondary names. From there we go to character encodings, and so on.
I remember deciding that I’d be Alex Schroeder instead of Schröder because I was using email when MIME messages were new. I was using computers at a time when ASCII was right and proper and the upper part, the bytes 128 to 255, would mean whatever, depending on your current code page or locale or font. The battles that had to be fought until people finally admitted that perhaps 7-bit systems just had to go. The rage I saw when Emacs added multibyte buffers. ISO-2022 and friends are a way to encode text using escape sequences so that you can shift character encoding as you go, and there’s an option to do it all in 7 bits, too. It was hard, of course. Unicode was so much better. The anger around Han unification. Do you remember? Traditional Chinese, Simplified Chinese, Japanese Kanji, all using the same code points? And what about the 50,000 variants all over history? And that’s just the resistance in tech circles.
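To make the “would mean whatever” part concrete, here is a minimal Python sketch. It takes the single byte that Latin-1 uses for ö and reads it under three legacy code pages. The choice of code pages and the helper name `decode_under` are just illustrative assumptions; the codec names are the ones shipped with Python.

```python
RAW = bytes([0xF6])  # one byte from the "upper half" (128 to 255)


def decode_under(code_pages, raw=RAW):
    """Decode the same bytes under each named code page (hypothetical helper)."""
    return {name: raw.decode(name) for name in code_pages}


# Same byte, three readings, depending on who is looking.
readings = decode_under(["latin-1", "cp437", "koi8-r"])
for name, text in readings.items():
    print(f"{name:8} {text}")

# A strict 7-bit system cannot represent the name at all ...
try:
    "Schröder".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# ... while UTF-8 spends two bytes on the ö, and everybody agrees on them.
utf8_bytes = "ö".encode("utf-8")
print(ascii_ok, utf8_bytes)
```

Only the Latin-1 reading gives back ö. The other two code pages show an unrelated character for the very same byte, which is why “Schroeder” was the safe choice.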
Anyway, the excellent example in Ramsey Nasser’s talk is how git manages to have all the canonical names be hashes and all the branches and tags and all that be in whatever language you want. The example he gives is a programming language where the symbols in the binaries just have hashes as names and the human-readable dictionary is a separate artifact that gets created. You could use a different language mapping, and use the same binary in your own programming.
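Here is a rough Python sketch of how I understand that idea. It is not the design from the talk. Everything in it is an assumption for illustration: naming a definition by a truncated SHA-256 of its source text, the two tiny functions, and the English and German dictionaries.

```python
import hashlib


def content_hash(source: str) -> str:
    """Name a definition by a hash of its source text (an assumption of this sketch)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:12]


# The "binary": definitions are stored under their hash, with no human-readable names.
# eval() is only used to keep the sketch short.
SOURCES = ["lambda x: x * 2", "lambda x: x + 1"]
binary = {content_hash(src): eval(src) for src in SOURCES}

# The dictionaries are separate artifacts. Neither one is the "real" set of names.
double_id, increment_id = (content_hash(src) for src in SOURCES)
english = {"double": double_id, "increment": increment_id}
german = {"verdoppeln": double_id, "erhöhen": increment_id}


def call(dictionary, name, *args):
    """Resolve a human-readable name through a dictionary, then call the hashed definition."""
    return binary[dictionary[name]](*args)


print(call(english, "double", 21))
print(call(german, "verdoppeln", 21))
```

Both calls end up at the same entry in `binary`. Adding a third language means writing a third dictionary, and the binary itself does not change.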
Just last week I asked about the language convention in code comments for this new project I’m in. Everybody in this company speaks German. You cannot get a job in this company if you don’t speak German. Our customer for this new project speaks German. We are in the German speaking part of Switzerland. The official languages of Switzerland are German, French, Italian, and Rumantsch. But of course, you know how it goes. All code comments are to be in English. The customer specifies the label of a field in German. The class implementing gets an English name. The database column gets an English name. The translation layer uses an English key. The English used is often recognisably German. This is the programming culture we adopt when we train to be programmers.
The problem is also a problem of our culture. We have internalised the need for English because it permeates everything. It’s how we look for help online, because everybody else is also using English, of course. We ridicule efforts like translated Visual Basic because programmers run into problems running these programs on their English machines even though the users who wrote these programs for their own use in Excel or Word are happy. We forget that learning English imposes a cost just because we were willing and able to pay it, and we are justifiably proud of our achievement, but it still is a cost. When I see my German speaking friends in tabletop role-playing games struggling with English rule books, I start to remember. People are still paying the cost.
English is everywhere. Its limitations are everywhere. Nobody thinks about bidirectional text, double width characters, mandatory ligatures, or vertical writing, when they start with ASCII characters. I certainly didn’t.
If you know me, you know I love Emacs. It is the great lisp machine that actually exists. You can also edit texts with it, but I mostly use it to browse stuff online, to listen to music, to chat, to manage files. It’s great. It’s self documenting. It comes with an English Emacs manual. It comes with an English Emacs Lisp manual. It comes with an English Emacs Lisp introduction. The functions and variables have English names. The functions and variables have English documentation strings. The menus are in English. It’s ideal for English reading programmers. I don’t know how to change that. I follow the same conventions in my code.
And of course I wrote this depressing article in English, too, just as my blog is written almost entirely in English. And certainly not in French or Portuguese. So I’m contributing to the misery myself.
Here’s an interesting thought by @wim_v12e:
It goes further, most programming languages are based on English grammar as well. Even assembly language usually has verb-object syntax. This is one of the reasons that led me to create Haku, my Japanese programming language that I have been posting about recently: what happens if we use a different language with a different writing system and a different grammar?
It does look interesting:
A toy functional programming language based on literary Japanese. – Haku, on Codeberg
Now I’m trying to remember that programming language that looks like classical Chinese.
文言, or wenyan, is an esoteric programming language that closely follows the grammar and tone of classical Chinese literature. – 文言 / wenyan‑lang
The new language’s developer, Lingdong Huang, previously designed an infinite computer-generated Chinese landscape painting. He also helped create the first and so far only AI-generated Chinese opera. He graduated with a degree in computer science and art from Carnegie Mellon University in December. – World's First Classical Chinese Programming Language, by Charles Q. Choi
Then again, I don’t actually want to replace English with a different language. I want a multilingual world.
#English #Languages #Emacs #Programming
(Please contact me if you want to remove your comment.)
⁂
Using internal hashes for names is not a new idea—Cornerstone (a database system written by Infocom—yes, the text adventure game company) did that back in the mid-80s. But the problem with such a language is that you’ll need a specialized editor specifically for the language. Could you even edit a Cornerstone program these days?
– Sean Conner 2021-09-23 00:41 UTC
---
Would it really be an issue? It sounds like you’d need a *compiler* translation layer. For a library concept in the original post you write it in whatever language you want (the source code is stuck in the OG language), and then when it’s compiled it swaps everything for hashes and uses an external translation file. To then translate the library so all languages can use it, you just copy and update the language file.
Obviously not helpful for people working on the original source code, but it’d be a start that doesn’t require ripping up all existing programming infrastructure in order to work.
For source code itself as a stop-gap hack there could be something like CVS where the original hash-based source code is stored, and then you “check out” the code which will translate it all for you, and then check it back in which will de-translate it.
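A minimal Python sketch of that check-out and check-in round trip, under heavy assumptions: the stored identifiers (`h_3f2a`, `h_9c41`), the two language files, and the idea of swapping whole words with a regular expression are all invented for illustration, and a real tool would need to parse the language properly.

```python
import re

# Hypothetical language files: stored identifier -> human-readable name.
ENGLISH = {"h_3f2a": "total", "h_9c41": "price"}
GERMAN = {"h_3f2a": "summe", "h_9c41": "preis"}

# What the repository would store: identifiers only, no natural-language names.
STORED = "h_3f2a = h_3f2a + h_9c41"

WORD = re.compile(r"\w+")


def check_out(stored, names):
    """Replace stored identifiers with the names from one language file."""
    return WORD.sub(lambda m: names.get(m.group(), m.group()), stored)


def check_in(text, names):
    """Reverse the mapping: replace human-readable names with stored identifiers."""
    ids = {name: ident for ident, name in names.items()}
    return WORD.sub(lambda m: ids.get(m.group(), m.group()), text)


german_view = check_out(STORED, GERMAN)
english_view = check_out(STORED, ENGLISH)
print(german_view)
print(english_view)
print(check_in(german_view, GERMAN) == STORED)
```

The round trip only works while every name in a language file is unique and does not collide with a keyword or with another identifier in the code, which is one of the things a real implementation would have to enforce.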
---
I like Nasser’s ideas around this so the following isn’t meant as a general rage, more a nitpick around the specific example of git hashes not being a great example. We use branch names all the time. You can have multiple branch names pointing to the same commit hash, yes, but then they are separate branches. Not the same branch.
– Sandra 2021-09-23 06:23 UTC
---
For creating universally translated programming languages, the ideas of Chuck Moore deserve to be studied. Whitespace as the prime delimiter, concatenative style, compositional techniques, exporting complexity to edit-time and relying on the programmer to factor so as to keep the compiler simple, etc.
– jouka 2021-09-23 15:29 UTC