Self-Contained Version Controlled Plain Text

I read this status about "a file format to store multiple versions of a plain text file in the same file", and it's one of those ideas that just inspires me instantly.

Why? Well... The use-case is pretty limited, but that also means a solution can be simple and tailored exactly to it. As Valenoern puts it:

there are a number of people out there who don't seem to like git and prefer other version control systems, and I like the notion of a simple revision format which doesn't lock you into a specific version control program

I have a few ideas for implementation, depending on your specific goals.

First of all, if the files are meant to be entirely readable by humans without the "correct tools", I'd suggest representing each version in full, separated only by special marker lines. Something like this:

---20210308T12:07:31Z
This is the first version of this file.
(Actually the second, but I just made a short addendum)
---20210301T08:51:02Z
This is the first version of this file.
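
A reader for this layout can be very small. The following is only a sketch, assuming a marker is a line consisting of three dashes followed by a timestamp shaped like the ones above; the names `parse_versions` and `add_version` are made up for illustration.

```python
import re

# Assumption: a marker line is "---" plus a timestamp like 20210308T12:07:31Z.
MARKER = re.compile(r"^---(\d{8}T\d{2}:\d{2}:\d{2}Z)$")

def parse_versions(text):
    """Return a list of (timestamp, content) tuples in file order."""
    versions = []
    stamp, lines = None, []
    for line in text.splitlines():
        match = MARKER.match(line)
        if match:
            # A new marker closes the version collected so far.
            if stamp is not None:
                versions.append((stamp, "\n".join(lines)))
            stamp, lines = match.group(1), []
        elif stamp is not None:
            lines.append(line)
    if stamp is not None:
        versions.append((stamp, "\n".join(lines)))
    return versions

def add_version(text, stamp, content):
    """Prepend a new version so the newest one stays on top."""
    return f"---{stamp}\n{content}\n{text}"
```

One caveat of this sketch: a content line that happens to look exactly like a marker would be misread as the start of a new version, so a real tool would need some escaping rule.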

This marker-separated format is arguably the simplest, and my first go-to. If you want to use existing libraries to parse it, you might benefit from using XML markup of some kind instead. That would also make it easier to add metadata. Either your own format:

<?xml version='1.0' encoding='UTF-8'?>
<file>
  <version>
    <updatetime>20210308T12:07:31Z</updatetime>
    <content>This is the first version of this file.
(Actually the second, but I just made a short addendum)</content>
  </version>
  <version>
    <updatetime>20210301T08:51:02Z</updatetime>
    <content>This is the first version of this file.</content>
  </version>
</file>

Or even just Atom:

<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns='http://www.w3.org/2005/Atom'>
  <id>The intended filename here</id>
  <title>Maybe the same as 'id', mostly?</title>
  <updated>2021-03-08T12:07:31Z</updated>
  <author>
    <name>Björn Wärmedal</name>
    <email>bjorn.warmedal@gmail.com</email>
  </author>
  <link href='https://warmedal.se/~bjorn/atom.xml' rel='self'/>
  <link href='https://warmedal.se/~bjorn/' rel='alternate'/>
  <entry>
    <id>2</id>
    <title>This could be a 'commit comment'</title>
    <author>
      <name>Björn Wärmedal</name>
      <email>bjorn.warmedal@gmail.com</email>
    </author>
    <updated>2021-03-08T12:07:31Z</updated>
    <content>This is the first version of this file.
(Actually the second, but I just made a short addendum)</content>
  </entry>
  <entry>
    <id>1</id>
    <title>First version</title>
    <author>
      <name>Someone Else</name>
      <email>someone@else.com</email>
    </author>
    <updated>2021-03-01T08:51:02Z</updated>
    <content>This is the first version of this file.</content>
  </entry>
</feed>
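
To illustrate the point about existing libraries, here is a rough sketch that reads both XML variants with Python's standard `xml.etree.ElementTree`. The function names `read_custom` and `read_atom` are hypothetical, and the element names are simply the ones used in the examples above, not any established schema.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this namespace, so lookups must be qualified.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

def read_custom(xml_data):
    """Read the home-made <file>/<version> layout.

    Returns (updatetime, content) tuples in document order.
    """
    root = ET.fromstring(xml_data)
    return [
        (version.findtext("updatetime"), version.findtext("content"))
        for version in root.findall("version")
    ]

def read_atom(xml_data):
    """Read an Atom feed where each entry is one revision."""
    root = ET.fromstring(xml_data)
    revisions = []
    for entry in root.findall("atom:entry", ATOM):
        revisions.append({
            "id": entry.findtext("atom:id", namespaces=ATOM),
            "comment": entry.findtext("atom:title", namespaces=ATOM),
            "author": entry.findtext("atom:author/atom:name", namespaces=ATOM),
            "updated": entry.findtext("atom:updated", namespaces=ATOM),
            "content": entry.findtext("atom:content", namespaces=ATOM),
        })
    return revisions
```

With the Atom variant the entry title doubles as a commit comment and the per-entry author gives you attribution for free, which is most of what the extra markup buys you.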

Depending on how much tooling you're willing to tailor and use, and which information you want in the metadata, you may want to save deltas instead of the full content for each revision. That would definitely be preferable for large files with many small changes, but we're talking plain text here. You may save more space and CPU time by just compressing the file.
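
To make the trade-off concrete, here is a small sketch using only Python's standard library: `difflib` to produce a delta between two revisions, and `zlib` to compress a file that stores every revision in full. The helper names `make_delta` and `compressed_size` are invented for this example, and the standard library has no matching patch function, so applying such a delta would need extra tooling.

```python
import difflib
import zlib

def make_delta(old_text, new_text):
    """Return a unified diff that describes how old_text became new_text."""
    return "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="previous",
        tofile="current",
    ))

def compressed_size(text):
    """Size in bytes of the UTF-8 text after zlib compression."""
    return len(zlib.compress(text.encode("utf-8")))

old = "This is the first version of this file.\n"
new = old + "(Actually the second, but I just made a short addendum)\n"

# The delta only carries the added line plus a little context.
print(make_delta(old, new))

# Full copies of every revision repeat a lot of text, which compresses well.
history = new + old
print(len(history.encode("utf-8")), compressed_size(history * 20))
```

For short texts the diff headers can outweigh the savings, which is part of why plain compression may be the better deal here.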

Disclaimer: I haven't run the above Atom through a validator. It could be that I've missed some mandatory element or got the timestamp format wrong; Atom expects RFC 3339 timestamps such as 2021-03-08T12:07:31Z. Validators will generally want the entry id tags to contain a URL, but using some other kind of unique identifier there is not uncommon.

-- CC0 ew0k, 2021-03-26