-- date: 2023-02-15T00:00:00 tags: [ "Philosophy" ]
--------------------------------------------------------------------------------
I've found a few problems in the FOSS space that philosophers may be interested in.
AI Dungeon is technically FOSS, given the MIT licence; anyone can download the code and run it as they please. However, the code is useless without the data-set - you can't actually run the program.
You might say:
This code is there to train a data-model, which you can do, and after that you can play with the data-model. You have access to all the same tools as those who wrote the code, except for their big graphics card (not covered under the MIT licence), and their website which lets people feed in more training data (also not covered under the licence, and how could it be?).
But this seems a little disingenuous. Surely "the program" refers to what people actually play online. Is that program FOSS? Can I run it on my computer if I have a big enough GPU?
Just like downloading GIMP will not let you get someone's source-files for an image created by GIMP, this program will not give you what other people have made with it. FOSS licensing is about code, not about being able to demand anyone's work, such as music, or training-data, even if it was created with a FOSS program, and even if you only want that FOSS program so that you can play with the output file they made.
But it's not clear if this argument holds either. You don't need a .png file to use GIMP. The training data for AI Dungeon seems more like a binary blob (albeit a very large binary blob), which is necessary to play the game. If you downloaded a computer game but the AI element had to be trained with thousands of hours of play, then you wouldn't have the full game. Instead, you would have part of a game, and would have to make the rest yourself - the program is incomplete.
The training data is necessarily a binary output, and something which began life as a binary file has never been included under any code licence, and all FOSS-compatible licences are indeed licences about code. Training data is not code.
This seems technically correct, but it also sounds like weasel-words. Perhaps a GPLv4 is required to get round these problems, or a GPLAv2.
Let's say you take clean FOSS code and compile it. The output should be safe to run, by definition, but that is not guaranteed. The outlier here is the compiler - if the compiler sneaks in a back-door, then the resulting binary will not be safe to run, however clean the source was.
Now we can solve this by compiling a compiler, but how do we do that? This will need another compiler.
How do we confirm that the code is definitely clean, when each step only seems to push the question back? Various modern compilers share common roots - old compilers created each of them, and the farther back we go, the fewer compilers we find. If one ancestor had a backdoor, and included instructions to place a backdoor in other compilers, all compilers would be tainted.
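To make the regress concrete, here is a toy sketch in Python. It models no real compiler: `Binary`, `build`, `descend` and the `TARGETS` list are all names I've made up for illustration, and "compiling" is reduced to a single rule, namely that clean source can only come out tainted if the compiler doing the build is tainted and chooses to tamper with it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Binary:
    name: str      # the source this binary was built from
    tainted: bool  # True if a backdoor was added at build time


# Hypothetical: the kinds of source a tainted compiler chooses to tamper with.
TARGETS = ("compiler", "login")


def build(compiler: Binary, source_name: str) -> Binary:
    """Toy compilation. The source is assumed clean, so the output can only
    be tainted if the compiler performing the build is itself tainted."""
    tampers = compiler.tainted and source_name.startswith(TARGETS)
    return Binary(source_name, tampers)


def descend(ancestor: Binary, generations: int) -> Binary:
    """Build each compiler generation using the previous generation."""
    current = ancestor
    for n in range(1, generations + 1):
        current = build(current, f"compiler-v{n}")
    return current


if __name__ == "__main__":
    for root_tainted in (False, True):
        newest = descend(Binary("compiler-v0", root_tainted), generations=5)
        login = build(newest, "login")
        editor = build(newest, "editor")
        print(root_tainted, newest.tainted, login.tainted, editor.tainted)
```

In this model, reading the source of `compiler-v5` or `login` tells you nothing, because the taint only ever lives in the binaries. A tainted `compiler-v0` carries through every generation and into `login`, while leaving `editor` alone, which is part of what would make such a thing hard to notice.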
This is hardly your standard epistemological problem, but I think it's a fun problem nevertheless.