Osiris 0.1.0 Release

After some discussion on BBS with the community (the idea mainly belongs to u/ps), I decided to implement a file sharing thingie for gemini, so that people can share files without leaving the protocol or having to host their own capsule.

initial discussion

continuation

I've just recently added titan protocol support to my server gabbro, so it was time to put it to a real test.

First, let's think about what operations this platform would need to support. We'll need a way to upload and view files (ofc), and also a way to delete them. Viewing should be open to everyone, but deletion should only be available to the original uploader, which I think makes sense.

And to store this original owner we'll need some sort of metadata file. Also, the gemini protocol needs files to be sent out by the server like this:

20 mime/type\r\n{content}

We'll also need to store somewhere the mime type that the user (or browser) set when the file was uploaded, so we'll stick it into the same metadata file.
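A minimal sketch of building that response in Rust, assuming the CGI emits the header itself. The function name success_response is hypothetical and not taken from the osiris source:

```rust
/// Build a gemini success response: status 20, a space, the mime type,
/// CRLF, and then the raw file content.
/// `success_response` is a hypothetical helper name used for illustration.
fn success_response(mime_type: &str, content: &[u8]) -> Vec<u8> {
    let mut response = format!("20 {}\r\n", mime_type).into_bytes();
    response.extend_from_slice(content);
    response
}
```

A viewing CGI could then write these bytes to stdout as they are, using the mime type read back from the metadata file.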

I decided to approach this as a simple problem for starters: don't mess with a multi-directory setup, just store all the files in one directory and all the metadata in another. I named this way of organizing things "protocol v1". It can be useful for a small team of people working on some project, or for sharing files temporarily and then deleting them. It could even be used by one person as a gateway to their blog, managing it by uploading and deleting files without the need to SSH into the server and do it directly.

V1

So the file structure looks like this:

root
|-files
| |-file1 # titan uploads files as raw bytes so we lose extension
| |-file2
|-metadata
  |-file1.json
  |-file2.json

And the metadata is just something like this, where owner is the client certificate fingerprint and uploaded is a unix timestamp kept for potential cleaning:

{
   "mime_type": "text/gemini",
   "owner": "...",
   "uploaded": 12312314234
}

This way we'll be able to track the original creator for deletion, the upload time for cleaning, and the mime type for viewing/downloading the file.
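As a rough sketch, the metadata and the "only the uploader may delete" rule could be modelled like this. The struct, the hand-rolled JSON writer and the method names are assumptions made for illustration (a real implementation may well use a JSON crate instead):

```rust
/// Metadata stored for every upload; field names follow the example above.
struct Metadata {
    mime_type: String,
    owner: String, // client certificate fingerprint
    uploaded: u64, // unix time in seconds
}

/// Escape backslashes and double quotes for a JSON string.
/// Assumption: the values contain no control characters.
fn escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

impl Metadata {
    /// Serialize by hand to keep the sketch free of external crates.
    fn to_json(&self) -> String {
        format!(
            "{{\"mime_type\":\"{}\",\"owner\":\"{}\",\"uploaded\":{}}}",
            escape(&self.mime_type),
            escape(&self.owner),
            self.uploaded
        )
    }

    /// Deletion is allowed only when the requester's fingerprint matches the owner.
    fn may_delete(&self, fingerprint: &str) -> bool {
        !fingerprint.is_empty() && self.owner == fingerprint
    }
}
```

Rejecting an empty fingerprint up front keeps a request without a client certificate from ever matching.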

Here's the first problem with it: how to choose the file name on the server side? Titan just gives us raw bytes to work with, so we'll need to come up with some unique file name each time.

I decided to go with a simple random base62 string of length 16. This gives us 62^16 (roughly 4.8 * 10^28) possible combinations, which should be enough until the end of time and is definitely overkill lol.

First, the set of CGIs generates this file name, making sure that it's unique and that we won't overwrite some file already on the disk.
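One way to sketch this with only the standard library is shown below. It assumes a Unix host with /dev/urandom, and it reserves the name by creating an empty file with create_new, which is an assumption about how uniqueness could be enforced rather than a description of the actual osiris code. The function names are hypothetical:

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Read};
use std::path::{Path, PathBuf};

const BASE62: &[u8; 62] = b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

/// Generate `len` random base62 characters from /dev/urandom (Unix only).
fn random_base62(len: usize) -> io::Result<String> {
    let mut urandom = File::open("/dev/urandom")?;
    let mut id = String::with_capacity(len);
    let mut byte = [0u8; 1];
    while id.len() < len {
        urandom.read_exact(&mut byte)?;
        // Rejection sampling: 248 = 62 * 4, so bytes below 248 map evenly
        // onto the 62 symbols and the remaining bytes are skipped.
        if byte[0] < 248 {
            id.push(BASE62[(byte[0] % 62) as usize] as char);
        }
    }
    Ok(id)
}

/// Pick a fresh 16-char ID and reserve it by creating an empty file.
/// `create_new` fails if the file already exists, so an existing upload
/// is never overwritten; on a collision we simply try another ID.
fn reserve_id(files_dir: &Path) -> io::Result<(String, PathBuf)> {
    loop {
        let id = random_base62(16)?;
        let path = files_dir.join(&id);
        match OpenOptions::new().write(true).create_new(true).open(&path) {
            Ok(_) => return Ok((id, path)),
            Err(e) if e.kind() == io::ErrorKind::AlreadyExists => continue,
            Err(e) => return Err(e),
        }
    }
}
```

Creating the file at generation time closes the gap between "check that the name is free" and "write the upload", at the cost of leaving empty files behind when nobody follows up with an upload.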

This ID is used to generate 3 links: one to upload the file, one to view it and one to delete it.

The whole system is split like this:

1. CGI to generate id (rust)

2. CGI to upload files (rust)

3. CGI to view files (rust)

4. CGI to delete files (rust)

5. CGI to glue it all together and serve as a dynamic page (python)

Why rust? I know it pretty well at this point, it's fun to use and very stable. I am running both my server and osiris without even daemonizing them, they're just running in the background, and they have never crashed. There's always a possibility that I'm using some library wrong and it can crash, but the code that I'm writing doesn't contain unwrap()-s where it shouldn't, panics that can crash the whole thing, etc.

So after doing all that and testing it to make sure that my idea works as a whole, it was time to do it in a more serious way, with separate directories for each user. I called it "protocol v2" (very original, I know).

V2

File structure changes a bit:

root
|-files
  |-1a2b3b4b5c1a2b3b4b5c
  | |-5225
  | | |-file1
  | | |-file1.json
  |-01234abcd01234abcd
    |-5226
      |-file1
      |-file1.json

This time we generate user directories based on their client certificate, using 16 chars. This may potentially lead to some collisions, but it's unlikely, and it still won't allow users to delete each other's files because we still have the metadata.
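How exactly the 16 chars are derived isn't spelled out here, so the following is only a guess at one simple scheme: drop separators such as ':' from the fingerprint and keep its first 16 alphanumeric characters. The function name user_dir is hypothetical:

```rust
/// Derive a user directory name from a client certificate fingerprint.
/// Assumption (not confirmed by the osiris source): keep only the ASCII
/// alphanumerics and take the first 16 of them.
fn user_dir(fingerprint: &str) -> Option<String> {
    let name: String = fingerprint
        .chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .take(16)
        .collect();
    // Refuse fingerprints that are too short to fill the directory name.
    if name.len() == 16 {
        Some(name)
    } else {
        None
    }
}
```

Two fingerprints sharing the same first 16 characters would land in the same directory under this scheme, which is the kind of collision mentioned above. The full fingerprint kept in the metadata is what still tells the owners apart.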

The whole point of separating directories like that is to make it easier to implement user-specific functions in the future, and to make it easier on the underlying FS.

We won't need to go through all of the theoretical 10k files to find the ones that belong to some user, somehow maintain that list and then do something with it (like generate a gallery).

Metadata files are now placed in the same directory as the file itself, which also makes things simpler. The directories will be smaller this way, so we can just stuff everything into the same dir.

Attentive readers might also notice that the structure now includes some mysterious directories like "5225" and "5226", what could that be?! :D

It's the number of the week, counted since the unix epoch, in which the file was uploaded. Janky, I know, but this way we won't have overinflated user directories. Instead, it will all be a collection of smaller dirs that are automatically created by the uploading CGI.
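The week number can be sketched as plain integer division of the unix time by the number of seconds in a week. That is an assumption about how the weeks are counted, and the function names are hypothetical:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

const SECONDS_PER_WEEK: u64 = 7 * 24 * 60 * 60;

/// Number of whole weeks elapsed since the unix epoch.
fn week_number(unix_seconds: u64) -> u64 {
    unix_seconds / SECONDS_PER_WEEK
}

/// Week number for "now"; falls back to week 0 if the system clock
/// reports a time before the epoch instead of panicking.
fn current_week() -> u64 {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0);
    week_number(now)
}
```

The uploading CGI could then create the directory for the current week on demand, for example with std::fs::create_dir_all.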

This whole structure is then used as a unique ID for the file, like this: "1a2b3b4b5c1a2b3b4b5c5225file1", where "file1" is also a randomly generated base62 string. This ID is then used to view and delete the file without revealing the underlying FS structure and without having to deal with slashes in the links' parameters.

To make sure that every part of the ID keeps the same length and that it's always 24 chars in total, I used the crate named "arraystrings", which turned out to be a really amazing discovery. It helped me a lot!
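Because every part has a fixed width, the ID can be split back into a path without any separators. The sketch below uses plain string slices instead of the crate mentioned above, and it assumes a 16 + 4 + 4 split (user, week, file), which is only one possible way to reach 24 chars. The function names are hypothetical:

```rust
use std::path::PathBuf;

// Assumed widths; only the total of 24 characters is stated in the post.
const USER_LEN: usize = 16;
const WEEK_LEN: usize = 4;
const FILE_LEN: usize = 4;

/// Split a v2 ID into (user dir, week dir, file name).
/// Returns None for a wrong length or any non-alphanumeric character,
/// so slashes and dots can never reach the file system.
fn split_v2_id(id: &str) -> Option<(&str, &str, &str)> {
    if id.len() != USER_LEN + WEEK_LEN + FILE_LEN
        || !id.bytes().all(|b| b.is_ascii_alphanumeric())
    {
        return None;
    }
    let (user, rest) = id.split_at(USER_LEN);
    let (week, file) = rest.split_at(WEEK_LEN);
    Some((user, week, file))
}

/// Map a v2 ID onto the directory layout shown above.
fn v2_path(root: &str, id: &str) -> Option<PathBuf> {
    let (user, week, file) = split_v2_id(id)?;
    let mut path = PathBuf::from(root);
    path.push("files");
    path.push(user);
    path.push(week);
    path.push(file);
    Some(path)
}
```

Validating the characters before slicing also means a crafted ID such as one containing "../" is rejected before any path is built.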

Initially I also wanted to add some statistics to the metadata, like a viewer count, but it's not really in the spirit of gemini to do so. It would also be a headache to increment it atomically with some file locks, and I'd hate it. Let's keep it simple and reliable.

Using both of them

I decided that I don't want to maintain two versions of those CGIs, as I see a use for both protocols. Also, some files had already been uploaded to the server during testing, and as a challenge I decided to keep them accessible instead of cleaning them up. It's too easy to deprecate what's already been done, and we don't look for easy ways :)

So now when the server gets an old request to view a file (v1 protocol, 16 chars ID), it will still lead users to the file just OK.

When someone tries to upload a new file, they'll be handed a v2 ID that's 24 chars long.
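Since the two kinds of IDs differ in length, telling them apart can be as simple as the sketch below. The enum and the function name are hypothetical:

```rust
/// Which storage layout an ID refers to.
#[derive(Debug, PartialEq)]
enum Protocol {
    V1,
    V2,
}

/// Decide the protocol version from the ID alone:
/// 16 chars means v1, 24 chars means v2, anything else is rejected.
fn protocol_for_id(id: &str) -> Option<Protocol> {
    if !id.bytes().all(|b| b.is_ascii_alphanumeric()) {
        return None;
    }
    match id.len() {
        16 => Some(Protocol::V1),
        24 => Some(Protocol::V2),
        _ => None,
    }
}
```

The viewing and deleting CGIs can then branch on the result and look the file up in the flat v1 directories or in the per-user v2 tree.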

And I think it's neat.

Code

This whole thing can be run by any server that allows setting env vars for CGIs. It relies on a couple of them: QUERY_STRING, TITAN_EXTRA, HOSTNAME and OSIRIS_PROTOCOL.

The first two are set automatically by my server implementation "gabbro" and just contain what comes after the "?" sign when the CGIs are called. OSIRIS_PROTOCOL is there to tell the ID generator which version we'd like to use: "v1" or "v2".
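Reading the protocol version from the environment could look roughly like this. Treating a missing or unknown value as an error is an assumption, as the post doesn't say what the generator does in that case, and the function names are hypothetical:

```rust
use std::env;

/// Interpret the value of OSIRIS_PROTOCOL: "v1" or "v2".
/// Assumption: anything else, including an unset variable, is an error.
fn parse_protocol(value: Option<&str>) -> Result<u8, String> {
    match value {
        Some("v1") => Ok(1),
        Some("v2") => Ok(2),
        Some(other) => Err(format!("unsupported OSIRIS_PROTOCOL: {}", other)),
        None => Err("OSIRIS_PROTOCOL is not set".to_string()),
    }
}

/// Read OSIRIS_PROTOCOL from the CGI's environment.
fn protocol_from_env() -> Result<u8, String> {
    parse_protocol(env::var("OSIRIS_PROTOCOL").ok().as_deref())
}
```

Keeping the parsing separate from the environment lookup makes it easy to check without touching real env vars. QUERY_STRING can be read the same way with env::var.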

v0.1.0 on sourcehut

gabbro server
