________________________________________________________________________________
It seems this was posted very shortly after the webpage went live, and the website is probably not in a fully fleshed-out state, so it's missing some details. It's probably mostly aimed at people who already know what Pijul is.
Here are some attempts at short descriptions of what I think this aims to achieve:
- pijul 1.0. A stable repo format with performance problems resolved and a good foundation for further work
- darcs except the algorithm is more convincingly correct and merges don’t take exponential time
- A version control system where certain things behave in reasonable ways, avoiding potentially strange behaviour, e.g. it doesn't matter what order you merge things in; you always get the same result.
- A version control system that provides a good user interface to humans, a simple mental model, and asymptotically good performance.
Good luck. I mean that.
Git isn't perfect, but I've been using version control since Apple Projector (in the late 1980s), and Git has done the best for me. I've been using it for many years.
I don't miss Projector one tiny bit.
VSS (Visual SourceSafe) was a dog. It was direct file-based, and server connections would get _very_ busy. It was the old-fashioned kind, with the need to check out files.
But it had one very cool feature: You could create "aliases" of repo components; essentially creating a virtual repo that pointed into several other repos, taking just a couple of files from each.
I could see how that would be a technical nightmare to implement, but I like it a lot more than "the whole kit & kaboodle" approach that Git takes.
I also used Perforce for many years. It was a robust and dependable system, but had that need to check out files to work on them, and that drove me nuts.
I like Git, because it is "team-friendly," and has a really light touch. It encourages many small checkins, which is how I think I should usually work.
I wish it handled big files better, but that's not really a big deal to me. I think this might be why Perforce is still preferred for game development (their asset libraries get _big_).
Oh, also Submodules _suck_ like a supermassive, galaxy-core black hole.
> I've been using version control since Apple Projector (in the late 1980s)
Never heard of Apple Projector before. I've always been interested in the history of version control systems, so I would like to learn more about it. But when I search for the term, almost all I find is stuff about using projectors with Macs/iPhones/iPads/etc. Can anyone point to any information sources on it?
Yah...you're right. I'll see if I can scare it up. It was integrated into MPW.
_UPDATE:_ It gets a brief mention in the "Legacy" section of the MPW Wikipedia page:
https://en.wikipedia.org/wiki/Macintosh_Programmer%27s_Works...
It's obscure for a reason. It was the best back then, but was still a nasty bear.
It's mentioned in the book Programmer's Guide to MPW: Exploring the Macintosh Programmer's Workshop by Mark Andrews [0].
[0]
https://vintageapple.org/macprogramming/pdf/Programmers_Guid...
Oh...Gods. Reading that, brings back the horror...
After Metrowerks' CodeWarrior became popular, Projector was still used. There was a CodeWarrior plugin for it:
https://www.mactech.com/1997/12/20/md1-cwprojector-1-0-relea...
In later versions of MPW, Projector was split off as a separate executable, SourceServer. Searching for MPW SourceServer finds a number of hits.
> Oh, also Submodules suck like a supermassive, galaxy-core black hole.
They suck like a chainsaw would suck for cutting twigs. But they are very effective (together with symlinks) when you need to stitch together an amalgamation of multiple repositories (which themselves could be stitched from multiple repositories). That's not to say you don't lose a finger or two now and again.
No, no, no. They just suck. It's not that the concept is bad - it is in fact very good, for the exact reason you suggest.
They are just abysmally badly implemented. They barely work at all and frequently get your repo into nonsensical states for NO REASON other than that the tooling is utterly worthless.
Mercurial has subrepos that offer conceptually the exact same functionality, but is actually implemented in a way that works WITH you rather than against you, and that will not constantly break.
Git submodules are just a completely unfinished feature that is nearly unusable.
True, dat. I used them for this project:
https://riftvalleysoftware.com/work/open-source-projects/#ba...
It's a long chain of submodules, and making tweaks to the lowest layer means a _lot_ of pulls. I was able to semi-automate it with a couple of batch files.
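The kind of thing those batch files do is roughly this (a sketch, not the actual scripts; the branch name and commit message are assumptions):

    # In each repo along the chain: update every submodule to its branch tip,
    # then record the new submodule pointers in the superproject
    git submodule foreach --recursive 'git checkout master && git pull'
    git add -A
    git commit -m "Bump submodule chain"
    git push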
I didn't use Composer, because I figured that this was a project that would remain fairly static, and submodules are "native" (I am always a bit leery about depending on third-party package managers).
Which has been the case, except that I'm writing an app that uses it as a backend, so I've had to make a few changes lately.
VSS - I was using it as well, but I bought some product (I forget the name) that was a server facing the Internet, handling all the communication to VSS on behalf of multiple clients.
Perforce - I used it too but after their upgrade killed the repo I got rid of it. I understand that it could've been my fault but still ...
After that it was Git
For lazy people: Anu is Pijul (a modern distributed version control system) rewritten from scratch, also in Rust.
It's at a very early stage and currently interesting mainly for academic/research use.
Is Anu GPL2 like Pijul? I can't find a license.
Edit: crates.io says gpl2
So can I read Pijul repos with Anu and vice versa?
It's rewritten from scratch and the authors didn't answer about compatibility, so I checked it just now.
No, it doesn't work with a pijul repo, at least in my scenarios.
So... pijul is dead?
Pijul has been dead waiting for pijul 1.0 for about a year. People were already expecting the file format to change with pijul 1.0, though I think the name change was unexpected.
It's like a zombie now: the whole body is working, but there has been no brain activity for at least the last 6 months.
This project would be what the zombies were working on, from what I can tell...
I’m excited to see there is still work being done on new VCS. Git will be hard to beat, but it looks like Anu is hitting on some of its weak points.
What are those weak points and how does Anu fix them? I found a technical example in https://anu.dev/documentation/associativity.html, but it was unconvincing. In their example, three-way merge should produce a conflict for a human to resolve. Which is good; we don't want a 'smart' tool to do the wrong thing and silently introduce bugs. If the argument is that pijul/anu reduce the number of conflicts by exploiting history, is there a quantification of this benefit for practical workloads, e.g. popular git repos?
That example, while technically correct, is a little bit misleading. From a practical point of view, the thing that Pijul/Anu both do is not "automatically resolve conflicts" but rather "allow repo operations to happen even in the presence of conflicts". In Git, if you've got a conflict, Git will require you to fix it before doing anything else. In Pijul or Anu, you can continue applying changes—possibly creating more conflicts!—in a way that's guaranteed to never throw away changes. At the end of that, a human still needs to resolve those merges manually.
But there are scenarios in which this avoids tedious human merges. Consider that I'm applying a series of patches which make changes in a file and later on walk those changes back, and run into a merge conflict there. In Git, I could squash those changes to avoid dealing with the conflict, but then I've lost history. I could apply the changes, skipping the relevant patches, but if those patches still contained useful work elsewhere, then I'd have to go in and resolve those problems manually.
In contrast, this same scenario in Pijul and Anu would just trivially work in a way that didn't produce conflicts. I would apply the sequence of patches, and one patch would produce a conflict… but because they can keep doing work in the presence of conflicts, they could keep applying subsequent patches, apply the patches which walk back the changes, and in doing so resolve the conflict automatically. Unlike the Git approach where I flattened the changes first, I would still have the full commit history associated with that sequence.
Now, that doesn't mean that Pijul or Anu will automatically fix all merges. If you have two separate code edits to reconcile, you might still need a human in the loop to reconcile them. But the fact that they can keep making changes in the presence of conflicts allows them to avoid a certain kind of "busywork" that comes with managing git history.
> That example, while technically correct, is a little bit misleading. From a practical point of view, the thing that Pijul/Anu both do is not "automatically resolve conflicts" but rather "allow repo operations to happen even in the presence of conflicts". In Git, if you've got a conflict, Git will require you to fix it before doing anything else. In Pijul or Anu, you can continue applying changes—possibly creating more conflicts!—in a way that's guaranteed to never throw away changes. At the end of that, a human still needs to resolve those merges manually.
I don't think it is really true that git requires you to fix conflicts before continuing. There are strategies that can let you emulate deferred merging. I rarely let merging hold me back.
If I don't have time to merge into master, just do git push server master:synced/master.
Git requires you to fix conflicts to continue in that branch, as commits will not have been applied until you resolve them. You can work around that, but the longer you postpone the harder it becomes to resolve them, and the result will depend on the order you decide to merge / rebase everything.
Lemme give you one example of one workflow I've encountered a bunch of times that's super annoying with git.
I port/test a project to a new OS. I run into a bunch of issues, like linkers, environments, etc. that are broken that I need to fix. I try and make clean commits tackling one issue at a time, so we get 1 commit for the linker issues, 1 for the docs, 1 for the environment, etc. eventually I have 5 commits of fixes.
These fixes are all orthogonal, so ideally I want to make separate PRs and separate reviews for them. But locally they're all tied together (in chronological order), since I need all of them there to continue development.
In git I can either open 1 big PR including all fixes at once (annoying) or I can make a PR for one commit, wait until it's merged, then PR the next, etc. The only way to get them nicely separate is if I take all those 5 commits, rebase each of them onto master in its own branch and PR those 5 branches. But that ruins my ability to work locally.
The associativity of (non-conflicting) patches in a patch based VCS like Anu/Pijul means that this "ordering" of patches doesn't exist. I have 5 patches you don't, I can PR each of them independently without needing to manipulate history or anything, because fundamentally the patches aren't related and therefore there's no reason for an ordering like the git DAG forces upon you.
Of course this is a fairly niche (but hopefully concrete enough) example of how this enforced ordering of commits can actively harm workflows/collaboration.
Personally, with git, I would work on one "work" branch with the 5 orthogonal commits in succession. For each commit that is finished, I would create a new branch and cherry-pick that specific commit on it. I would create a PR for those branches.
Once the PR (possibly with some changes still) is approved and merged, I would rebase my "work" branch on top of the updated master. Since it was an orthogonal commit, you'll typically see that the merged work just disappears from your branch during the rebase. If it really was orthogonal work, you'll not have any conflicts.
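In command form, that loop looks roughly like this (the branch names and commit reference are just examples):

    # All five orthogonal commits live on one local work branch
    git checkout -b work

    # Offer one finished commit as its own PR branch
    git checkout -b fix-linker master
    git cherry-pick <sha-of-linker-commit>
    git push origin fix-linker

    # Once that PR is merged, rebase the work branch;
    # the already-merged commit normally just drops out
    git checkout work
    git fetch origin
    git rebase origin/master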
This is an approach I've used whenever I need to work on something that depends on other work that was not merged yet. You have a work branch that includes all the work, since you depend on it, but you want to offer smaller pieces as PR to make the review process more efficient.
Yes, except that I don't think that's niche. With darcs, I have a collection of patches available from developments/contributions, and a release is a set I'm happy with on top of the previous release tag. If one of them causes trouble during testing, it can typically just be removed from the set. (Obviously it doesn't always work that cleanly, but it typically does.) When happy, tag and release. It's a pleasant model that's straightforward enough for someone who's only been maintaining large projects since the early days of networked CVS.
Thanks for the example. History manipulation is a no-no. While I'm not a git/diff3 expert, could one say 'make patch K from commit triple K0,K1,K2', meaning 'all changes between commits K1 and K2, with respect to origin K0', then make independent PRs out of them? Maybe it will be only 95% correct, but that's what tests are for.
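Something along these lines might get close, I think (a sketch; `pr-k` is a made-up branch name, and I'm not certain it captures the K0 origin exactly):

    # Export the K1..K2 range as standalone patch files
    git format-patch K1..K2 -o patches/
    # Replay them onto a branch cut from the origin point,
    # falling back to a 3-way merge where contexts have drifted
    git checkout -b pr-k K0
    git am --3way patches/*.patch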
No, 3-way merge does _not_ produce a conflict in that case, nor should it. Alice adds lines at the very beginning of the file, Bob adds lines at the end, and 3-way merge merges Bob's lines into Alice's new version.
That is incorrect and is not a conflict.
You are correct, thanks for the clarification. From https://www.gnu.org/software/diffutils/manual/diffutils.html...:
> You can think of this as subtracting older from yours and adding the result to mine, or as merging into mine the changes that would turn older into yours.
The point stands: taking history into account is unconvincing, as it assumes intent where none is explicitly given. In the example, if the top branch had a single commit, AB => ABGAB, then there is no way to reliably infer that A 'intended' [+ABG]AB, vs. AB[+GAB]. Even with the given history, maybe the author of the top branch really intended AB => AB[+GAB], but made an error, corrected in the 2nd commit. The problem is fundamentally ambiguous. We can argue about which of the diff3 or anu/pijul heuristics is better. In practice I suspect the difference is not that large.
For completeness, here's a guess at what diff3 does, assuming merging top into bottom, following https://blog.jcoglan.com/2017/05/08/merging-with-diff3.

    [bottom] => A [+X] B
    [top]    => AB [+GAB]

    B    O    T
    A    A    A
         +X
    B    B    B
              +G
              +A
              +B
Resulting in A[+X]B[+GAB] without conflicts.
> Which is good, we don't want a 'smart' tool to do the wrong thing and silently introduce bugs
Then that example you link means that you should stop using 3-way merge / git / svn / mercurial.
You are correct, there are no diff3 conflicts, thanks again for the clarification. The problem is fundamentally ambiguous, and the argument a bit more subtle. As a rule of thumb, when two heuristics are available, pick the simplest one (e.g. the one requiring fewer inputs), especially if a human needs to wrap their head around the rare cases where the heuristic goes wrong.
I agree. Also in their example they lose a nice property of git - invariance to squashing. If Alice squashes her two changes then the final merge behaves differently to if she hadn't. Very confusing!
Git definitely could be much smarter about merge conflicts, but it's a hard research problem so I'm not surprised it isn't.
> Git will be hard to beat
If I were the Anu people, I would focus on having a seamless compatibility layer that could manage Git <-> Anu repositories (there are undoubtedly many headaches that would occur synchronizing the two different models). This would allow developers to silently interact with ongoing git repos using the "better" tool. Getting wholesale migration to a new platform seems a significant challenge, but allowing developers to slowly build mind share with an improved workflow would be possible.
Disclaimer: I hate git.
`cargo install anu --version 1.0.0-alpha --features git` gives you a one-way incremental import, with the command `anu git`.
Compatibility with old projects is the main reason why modern software sucks; look at zsh, C++. Sometimes we need to shift to a brand new paradigm.
Git (and less popular competitors that are close in age and design principles, like Mercurial, Darcs and Fossil) is what everyone is using and therefore what needs compatibility, not an "old project" holding back progress.
That role is filled by Subversion, CVS, VSS etc. with their tragic anti-features.
For what it's worth, darcs can consume and produce git import/export streams. I've synced a darcs repo with git and mercurial mirrors with cron (to avoid slowing down commits by doing it in a hook).
> Git will be hard to beat.
With the latest kerfuffle at Github, I've started moving to fossil. Having everything (wiki, pull requests, etc.) as part of the repo is looking like a good move.
Why let yet another corporation have control over something they should have never been given?
github != git ... I'm not sure why people strongly conflate these two so much.
It would be equally valid to self-host gitea/gogs/sourcehut/gitlab and/or an issue tracker of your choice, which is arguably preferable to adopting a completely different tool over what is a provider issue.
I think all git has going for it is its existing inertia and GitHub. I think the foundations were a bigger deal when git was newer. Other DVCSes have decent foundations.
Going against git is an atrocious user interface (if it were good then [1] would be neither funny nor sad). Most people just memorise a few commands, and if they stop working they transfer their changes elsewhere, delete the repo, and start again. Sometimes a team will have a "git expert" who has merely memorised a few more commands and is better able to get a repo out of a broken state. Git fails badly at something important for a developer tool: largely getting out of the way.
[1]
https://git-man-page-generator.lokaltog.net/
The light finally came on for me when I bit the bullet and dug into git's internals. It's a beautifully simple model. Git's internals are much easier to learn than its dumpster fire UI, and once you understand the internals, the UI can be understood as something that started out reflecting the internals but then had a boatload of edge case convenience features glued on without an overall design plan.
Can you enlighten me how git's internal model is more beautiful than, say, Mercurial? Which has the added bonus of having a UI which isn't a dumpster fire...
I haven't studied Hg's internals. I much prefer Hg to git (because its UI makes sense) but I no longer have the happy option of using it. So rather than continue to fight a battle that was already lost, I decided to embrace git.
You'll never hear me say that git's interface is good, but this seems to be blowing things way out of proportion. I haven't seen someone blow up and recreate a git repo in maybe a decade.
I've definitely pulled out the BFG here and there to clean up credentials but that's an issue in any VCS.
Maybe I'm biased because I'm "better able to get a repo out of a broken state", but for the record it's definitely not because I've "memorised a few more commands".
Your attitude seems to mirror this xkcd, and I honestly never understood it.
I'm by no means a git expert (I've actually just recently learned about bisect for instance), but I have _never_ in my entire career been in a state where I'd just delete the repo and recreate it from scratch.
I've only ever used a handful of commands, the most advanced of which could probably be considered `reflog`, when I wanted to revert some changes; or `rebase` (because strictly speaking it is more complex than merge, I guess), but I never ran a command I did not understand or had to memorize.
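For what it's worth, that reflog usage is nothing exotic; it's roughly:

    # List where HEAD has been recently
    git reflog
    # Move the current branch back to where it was two steps ago
    git reset --hard 'HEAD@{2}'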
I actually do share the sentiment about the tool getting out of your way, and my knee-jerk reaction to learning about git internals is just repulsion, because you're right! I'm not there to tinker around with version control, I'm there to solve problems. That said, I've never felt like Git got in my way.
I find the vast majority of complaints come from people who refuse to put any time into learning a mildly complex tool, and thus I have no sympathy for them. The documentation is right there in your terminal, in addition to many websites, videos, and books.
The one and only time I messed up a repo beyond repair was when I deleted some git pack files while trying to delete some binary files from the git history. This is known as user error.
In my day to day use I find that I rarely have to venture beyond rebase, bisect, reflog, cherry-pick, and the standard commands.
I nearly did once, because I couldn't stage a file. Git's content-hash store had gotten corrupted, so the object for the staged blob had a bunch of zeros in it and didn't match its sha. That was more the filesystem's fault than Git's, though.
While that's a common conflation I don't think the GP was doing that. While I tend to self-host git, I can see the value they're claiming fossil has.
Whether self-hosted git or hosting on Github, your issue trackers and such are typically separate from your main repository. Most platforms offer wikis as a side-by-side repository so that should be easy to move, but the rest is at the whims of the platform.
The GP is claiming they moved to fossil because the one repository contains all of this data.
I think minerjoe is trying to emphasize that fossil has all the features of GitHub included in the VCS itself, eliminating the need for any of the tools you listed above.
I haven't followed Fossil, so hearing that it includes things like a wiki is news to me.
minerjoe is mentioning Fossil specifically because unlike Git it also provides the features (in a broad sense) that GitHub provides (wiki, bug tracker, discussion forum, news/blog, "released" files and of course version control) while remaining fully distributed - these are stored and versioned as part of the repository itself.
As a nice bonus, Fossil is a single executable/binary file you can drop anywhere, and it can act both as the CLI for working with the repository and as the web backend, with a bunch of ways to access it including CGI, its own web server, or even as a fake script parser (you can upload the Linux binary to any shared host that supports custom script parsers - many do - and use a "script" with a shebang that calls the binary with the path to the repository file, thus allowing you to use Fossil with shared hosting services that do not even know about it).
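To make that concrete, everything below is the same single binary (the URL and file names are placeholders):

    # A repository is one SQLite file; clone it and open a working checkout
    fossil clone https://example.org/project project.fossil
    mkdir project && cd project
    fossil open ../project.fossil
    # Serve the full web UI (timeline, wiki, tickets, forum) from that same file
    fossil ui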
>Why let yet another corporation have control over something they should have never been given?
I'm pretty sure github doesn't control git.
That's not what I was implying. They have control over the pull-requests, the wiki, and all the other meta information.
The lack of a clear “why this rewrite was needed” somewhere accessible is a pretty big “f u” to anyone that evangelized for Pijul in the past.
My understanding is that they wanted to change the algorithm and that the codebase needed a major refactoring. I think there were performance issues with the first implementation and design, so they had to make very large changes.
https://discourse.pijul.org/t/is-this-project-still-active-y...
More on that very soon on the website. The reason is performance and scalability.
Pijul was always advertised as experimental, and Anu is the result of that experiment.
> Pijul was always advertised as experimental, and Anu is the result of that experiment.
So it should actually be "safe" to buy in to now? If I put my project into Anu, I shouldn't get stranded in 5 years?
Experimental doesn’t mean leaving everyone in the dark to go develop a complete replacement in secret with nary an update or heads-up.
On the other hand, libpijul received 0 external contributions over the three years during which it was public.
The biggest contribution I got was from someone who rediscovered it independently after asking me about it, and didn't even care to respect the license.
Moreover, the formats of an experimental tool, especially when it is based on new math, need to change frequently. Every single time we've done it in the past, we heard weird comments here, on Reddit and Twitter that our theory would never work because the implementation was not there yet.
There were also unfortunate professional choices I don't want to comment on, which forbade me to work on Pijul other than sometimes on the weekends. I ended up resigning in July 2020, and have worked on the new Pijul more or less 100% since then.
Best of luck, from a fellow open source developer and maintainer.
> _why this rewrite was needed_
We all know _why_
No we don't. Care to enlighten us?
My guess would be Rust. That is assuming the original reason was not in fact making a "better Git".
The old code base was also in Rust.
I said "assuming". In no way was I pretending to be correct. Just a wild guess.
Totally! Just letting you know that’s not it.
The new codebase is apparently in Rust. Perhaps the parent is being sarcastic with respect to the quantity of results here:
https://hn.algolia.com/?q=rewrite+in+Rust
The old codebase was also in Rust. So I don't think this applies here.
Why though?
Speaking as someone who is completely unfamiliar with Pijul, the explanation on this page is pretty lackluster.
"It is based on changes rather than snapshots"
Well, every VCS I'm familiar with is based on changes/deltas. I assume that these terms have specific meanings here that I'm not familiar with, but it manages to sound like the author has never heard of git or Mercurial.
Git actually does not work with deltas; each commit contains the hash of a tree object [0], which contains the hashes of the files within it [1].
This tree is effectively a snapshot, since it contains every file hash in the working directory at a certain point in time.
Since Git uses the file hash instead of just the file path, it doesn't have to download the files whose hashes haven't changed since the last commit, which is what makes it behave somewhat as if it is operating on a delta.
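You can see this with git's plumbing commands; for example (the hashes, names, and file names below are placeholders):

    $ git cat-file -p HEAD
    tree 83baae61804e65cc73a7201a7252750c76066a30
    parent 3c4e9cd789d88d8d89c1073707c3585e41b0e614
    author A U Thor <author@example.com> 1610000000 +0000
    committer A U Thor <author@example.com> 1610000000 +0000

    Commit message

    $ git cat-file -p 'HEAD^{tree}'
    100644 blob 5716ca5987cbf97d6bb54920bea6adde242d87e6    README.md
    040000 tree d8329fc1cc938780ffdd9f94e0d364e0ea74f579    src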
[0]
https://git-scm.com/book/en/v2/Git-Internals-Git-Objects#_gi...
[1]
https://git-scm.com/book/en/v2/Git-Internals-Git-Objects#_tr...
git fundamentally tracks and stores snapshots, not changes. I believe mentioning this is meant to emphasize a difference between Anu (or Pijul) and git.
This is correct. The common misunderstanding is due to the fact that git's pack files are delta-compressed, but that's an implementation detail.
It's important to keep in mind that while the Pijul/Anu model brings many improvements over the git model, we still need to pair it with semantic merge algorithms if we hope to avoid nonsensical merges in the context of programming languages. Pijul/Anu alone cannot solve this problem.
Where can one find examples of that?
I can't remember for sure whether Toolpack's revision control component actually operated with ASTs, but I think it did. Certainly there was a semantic diff and patch. (Toolpack was quite an advanced Fortran77 engineering tool set from the 1980s.)
I don't know how general they could be, but darcs can have different patch types. However, the only extra one implemented, as far as I know, is token replacement.
I'm personally aware of https://www.semanticmerge.com/, though I'm not affiliated and have used it only briefly.
Plastic SCM is already at least partially aware of programming language semantics when diffing and merging. But that's a commercial offering (acquired by Unity recently).
It would be interesting to have a comparison of the bases of this and Darcs3 (which is still in development).
Hooray for work on patch-based systems anyway.
I was hoping to get a VCS that distributes its files via sound
And I was expecting version control for sound files.
And I was expecting sound version control for files
I'll file this version under sound control.
Your feelings about this version seem out of control...
I was expecting people making silly puns around the word "sound".
When I interviewed Jim Blandy, the creator of Subversion, he said one of his mistakes was trying to be clever with merges. In Git a merge is whatever you say it is, and that is actually probably the flexibility that people want.
Is that where the innovation is here, Darcs style patch sets?
Anu is not "trying to be clever" with merges; it implements a mathematical theory. This contrasts with Git and its default 3-way merge, which can sometimes shuffle your files up, as shown here:
https://anu.dev/documentation/associativity.html
Do you have a link to that interview?
It's surprising to hear that svn tried to be clever with merges, because its merge support was no better than CVS until svn 1.5, which was released some years after git. (This is one of the reasons I went straight from CVS to git.)
SVN started to try to be clever with merges by tracking past merges between copies of files. This required tracking a lot of new information for each file. The developers used a combination of implicitly inferred merge history and annotations stored in SVN properties on each file for that.
Combined with the fact that merge and commit were two separate steps in the workflow, with manual conflict resolution in between, this led to sometimes severe usability issues. The merge would adjust the mergeinfo properties along with the files. Users would sometimes go and do svn revert on some of those files as part of conflict resolution. This would also silently reset the mergeinfo properties that were just updated. The current merge would still work out OK, but future merges involving the current branch or its descendants would get horribly mangled, because SVN ends up applying sets of changes that are out of sync with the actual state of the files involved. That's part of what gave SVN its bad reputation.
AFAIK the default use case is to only track merge info on the top level directory, as long as you only do branch/merge operations on the top level. By not doing that, yes, you can shoot yourself in the foot. But it's not necessary at all.
Maybe we're talking about different versions, though. I used to use SVN 1.6+ IIRC.
Yep, here it is:
https://corecursive.com/software-that-doesnt-suck-with-jim-b...
I totally agree with that. Git's flexibility is great. In practice, I can synthesize the functionality I want.
Unless a merge is fairly straightforward, I often use --no-commit and only select the files I want to merge, deferring a complete merge for later. It is so easy to make intermediate branches, or just pick what I want from one branch to another.
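Roughly, that kind of selective merge looks like this (the branch name and paths are examples):

    # Merge without committing, keep only the files I want right now
    git merge --no-commit other-branch
    # Put back the files whose merge I'd rather defer
    git checkout HEAD -- path/to/defer
    git commit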
I haven't used Darcs or Pijul, but I feel like trying to "solve" merging and source code isn't fully possible.
> Git's flexibility is great. In practice, I can synthesize the functionality I want.
Not if the functionality you want is a patch-based system, which is just so much easier to use. After 30 years or so of experience using revision control systems with distributed projects, I can't see the appeal of git.
Author explains the state of documentation / mentions name change:
There’s a frightening amount of stuff to write, and it will take me a while to explain everything. My current plan is, I’m actually releasing it as I’m writing this answer. There is almost no documentation, but I’ll write it one page at a time in the next few days. I’ll also write a blog post tomorrow to explain what I’ve been doing. Oh, and after talking to Florent, we have finally decided to change the name. More on that in my blog post tomorrow.
https://discourse.pijul.org/t/is-this-project-still-active-y...
https://nest.anu.dev/anu/manual
This is returning "Not found" for me.
I still have the same question about Pijul/Anu that I had before:
Can somebody give me a real world situation in which Anu would work better than git?
The underlying model sounds like a big improvement, but I still can't map it to the benefit that I as a user would have.
Why are they starting at version 1.0? A few months of public use might catch some important bugs before you commit to version 1.0.
This is technically 1.0.0-alpha, and it is a rewrite, by the same authors, of pijul, which went from 0.1.0 to 0.12.1 over several years.
The rebranding seems to be for several reasons including:
- Breaking changes from pijul releases
- "Quicker" writing of the command "anu" vs "pijul" (they also mentioned on Birdsite that anu is easier and faster to type on Dvorak)
- Easier to spell out and pronounce than pijul for non-Latin-language-speaking users.
Mods: "systema" title typo
Is there demand for this?
Most real complaints about git are around scalability of giant monorepos, and a lot of work has gone into various solutions.
The secondary complaints about usability seem to be papered over by popularity, and of course the relevant xkcd:
The site has https://anu.dev/documentation/why.html
For me, I'm excited about better, more rigorous merging and being able to cherry-pick & rollback changes without causing conflicts later on. (Cherry-pick in Git makes a new, unrelated commit, so merging with the branch you cherry-picked from can often cause merge conflicts, etc)
In general, tracking and working with actual dependence between patches seems to open up more workflows, and less hacky ones.
There's a lot of internet content praising git's 'simple immutable data tree structure' and how simple it is to implement. git's data structure is also the #1 reason behind limitations that many people try to bypass...
e.g. giant monorepos weren't a problem even for ancient centralized VCs because these were file based, and if you wanted to work on a part of the tree it didn't matter - just push and pull the part you care about. But git has a data structure that forces everything in a single tree, so you have to use hacks (submodules etc.).
Same thing for much of the UX and the other complaints (changeset/patch model). When you get down to it, the data structure is behind 90% of the difficulties with git.
I’d take a new git with just the UX fixed. If the underlying implementation of a new VCS is also better, that’s great, but my main problem with git isn’t that it’s not sound but that the UX is a steaming pile of legacy cruft, and it’s really more an API for version control than a polished interactive app for humans.
> the UX is a steaming pile of legacy cruft
This is due to the lack of a solid theory to match the intuition. The "git way" of trying to match each use case is to add a new command for each new use case.
I don't think you can fix "just the UX" without fixing the underlying algorithms.
> The "git way" of trying to match each use case is to add a new command for each new use case.
To me, git's problem is the exact opposite - the commands are based on how the underlying technology works, not on the use case for the user.
The best example is 'reset'. I understand git's internals reasonably well so I know the reason why, but it's not obvious that you need the same command to "un-add" a file and to wind history back two commits.
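For instance, these are the same command with two very different intentions (the path is an example):

    # "un-add": take a file out of the staging area, keep the working copy as-is
    git reset HEAD -- path/to/file
    # wind the branch back two commits, discarding them
    git reset --hard HEAD~2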
As an example of the UX issue: just making command line switches consistent (e.g. delete always being the same switch) would be a very easy fix and a UX win, and it's only compatibility holding that back.
The OpenBSD developers are apparently working on something[1] like this. Another major focus seems to be privilege separation and security.
[1]
Have a look at this talk introducing the issues with git and how gitless improves upon them:
https://www.youtube.com/watch?v=31XZYMjg93o
I think people should learn to use git rebase.
People definitely complain about merging difficulties with Git. The idea with Git is that the data model is simple enough that you can basically manually fix issues that come up. That unfortunately means that you have to take the time to understand Git's data model and not just memorize the interface and a _lot_ of people have complained about that over the years. I think the idea with Anu is that issues don't come up in the first place and that it's hopefully more intuitive to use in the long run.
Take a look at https://anu.dev/documentation/associativity.html
The patch model is simpler IMO
Wasn’t the patching fixed already with darcs and mercurial?
My understanding was that darcs fixed the model but had fundamental performance problems at _some scale_. I think pijul took the same concepts and tried to streamline them. And this rewrite does that...again?
I don't think Mercurial is patch based.
Pijul is more similar to Darcs. They claim to have a sounder and faster patch algorithm.
And Anu is apparently even sounder and faster?
https://pijul.org/manual/why_pijul.html#pijul-for-darcs-user...
Sounder, no. The theoretical complexity of Anu is improved compared to Pijul. The complexity of Pijul is in O(log l), where l is the number of lines written since the beginning of history, whereas Anu is in O(e), where e is the number of edits. Since each edit has at least one line, this is always better, and Anu can in fact handle large repositories (the Linux kernel, Nixpkgs) that Pijul couldn't handle.
Did you mean O(log e)?
How can something be sounder? Either it's sound or it isn't.
My complaints with git are a bit different, having never felt the burden of giant monorepos (but it's definitely related)
git was built as a tool for completely distributed source versioning, but most of us are using it in a centralized way. It's nice to be able to work offline, but when we need to synchronize there's always a huge dance of fetching first, seeing if it has moved, merge/rebase, etc. git is good at storing what we did, but it doesn't help at all at saving what we are _doing_: all changes to the working directory are ephemeral, like files stored in ramfs.

When working on public repositories, you can't push a branch prefixed with your name; you have to fork the whole project _and_ push a branch before you can start interacting. Instead of having one server and a client, you now have 1 central server, 1 other server that only _you_ can access and will in practice contain 1 branch and be abandoned as soon as you're tired of it, and a client. Rights can't be managed at the branch level, so I'm just going to copy-paste the whole thing from the beginning of history and give it to you.
What I would like to see in a VCS:
- There is one central place where people coordinate
- There is exactly one commit associated to a branch, and that association is the same on all machines at the same time (I don't want to git fetch)
- If you want to do changes to a branch, you do a sub-branch
- That sub-branch, along with your local changes in or out of the staging area, is synchronized to the server. If authorized, other clients can have a view of those as well
It seems it already exists with fossil (https://fossil-scm.org/home/doc/trunk/www/concepts.wiki#work...) and with older SCMs, although older SCMs are plagued with the locking problem.
In a way, the work that is done to handle giant monorepos is helping git move in this direction: all branches are automatically synchronized, and the vision with this kind of repo is that it's OK to commit often, even in small batches. But it's not quite there yet. I've read an account of how things are done at Google (https://cacm.acm.org/magazines/2016/7/204032-why-google-stor...) and it's closer to my dream system.
Sounds like you want what I want: A system where code development has the automatic synchronization and backup of Dropbox, coupled with the programmer-controlled atomic change sets, branching, and merging of git.
Exactly. Have the "local changes" in Intellij be synchronized, and let the developer rearrange them for commits.
> When working on public repositories, you can't push a branch prefixed with your name; you have to fork the whole project _and_ push a branch before you can start interacting.
I believe this is more of an issue with GitHub than with git itself.
Yes, but it's a model that every GitHub clone has copied since, and it's a problem in the overall process of developing. Even GitLab doesn't have a notion of rights per branch. gitolite can do it, but sadly its model hasn't caught on in the dev world.
Still, it's not enough for my taste, because I want all content to be synchronized everywhere.
Missing docs, ugh.
I can't get this to build on Ubuntu because it can't find libssl.
Taking into account the history of how lines have changed isn't much better, sorry Anu. (Or if you think it is, please give some very compelling real world examples).
I believe that you need to understand the semantics of the code to truly do what you are trying to do well, and for all other cases the snapshot model is more than good enough; given how we structure and modify code, it works out really well in practice. Code dealing with a single aspect should be, and almost always is, co-located, so getting a conflict of intention in a merge is very rare. There are other human aspects, like code ownership and collaborating teams, which make the issue even less of a problem.
I don’t think there are any open implementations of data type aware DVCS yet (would be glad to be proved wrong). However, I believe a reliable file/line DVCS based on sound patch theory would be a step in the right direction. A type-aware DVCS _not_ based on sound patch theory would probably be a disaster.
I don't know about Anu (haven't looked at it yet), but with Pijul it would be perfectly possible to take advantage of semantic knowledge. Line-based changes is a default, but you could certainly apply file deltas based on a richer understanding of the underlying filetype.
I’m not convinced by this but I’m also not convinced by the argument of the comment you’re replying to. The theoretical foundation Pijul/Anu works by starting with files as lists of lines (or some other thing) and patches as (injective) mappings from one list of lines to another which preserve the relative order between lines, then constructing the smallest generalisation of this structure to one where all merges exist and are, in some sense, well behaved. This generalisation is from lists of lines to partial orders of lines, where “B is preceded by A” becomes “A<B”.
To do something similar with more structured files, one must find the corresponding idea to “a list of lines”, and this must work in a good way (e.g. changes like x -> (x); [a; b] -> [a] foo [b]; [[p, q], [r, s]] -> [p, q, r, s] must in some sense be natural operations in your structure (and diffs need to be reasonably easy to compute)). And of course it still needs to work in a sane way for unstructured data in big comments. Therefore I don’t agree that Anu would be easily generalised to this.
I think this is basically impossible to do for situations where you want to capture all the structure (such that a patch to rename something merges well with other patches). I think it’s likely extremely hard for a part way solution.
Finally I’m not convinced that the change would be that useful. Much of the structure of computer programs is implicit in the scoping rules in such a way that the “move blocks around” changes that line-based VCSes often struggle with will still be invalid with structural diffs.
This is the same underlying theory as the “operational semantics” that is used by Google docs to merge out-of-order changes by simultaneous editors and resolve into a single consistent shared global state. So take that as a proof of principle that it works for more complex structured information.
The underlying theory is not really the same. The practice is also not the same.
Google doesn’t need a different representation where all push outs exist because they rely on a centralised server, low latency, and arbitrarily choosing how to resolve conflicts. In a DVCS, you can rely on none of these.
I'm not sure if Google still uses operational semantics for Docs, but that is not how operational semantics works. The theory allows you to take two quite different stacks of changes and interleave them in a consistent way. It does not rely on low latency or a centralized server. The choice of arbitrary tie-breaker vs. manual resolution in the case of conflicts is an application domain choice, not mandated by the theory. Obviously in the case of Docs the tie-breaker makes more sense.
I think this is getting off topic as Anu/Pijul is not doing operational transformations (I assume this is what you meant when you wrote operational semantics).
I still claim that the reason OT works well with google docs is that it can rely on a centralised server, low latency and tie breaking.
Tie breaking means one doesn’t need to worry about representations of conflicts (and allowing changes to merge in sound ways) which is in some sense the main thing pijul does.
Low latency means that users are able to cope with the tie breaking rules doing the wrong thing
A centralised server means that there is less need for the merges to work in the sound way that pijul aims to make them work.
Therefore I put it to you that Google Docs is neither an example of the same theory that pijul is based on nor evidence that OT would work for some kind of well-behaved structure-aware DVCS.
I've been very disappointed with the pains of using git. I would really like something like this, but the steps to install it are:
Anu is written in Rust, and can be installed by first installing Rust, and then...
Yeah, I'm not installing an entire language just to use your tool. I don't need to install a C or C++ compiler to run Photoshop or Microsoft Word. Why do I need to install a compiler, libraries, etc. just to try out your tool? No thanks.
Because that's traditionally how open source has been done? For decades? Especially if you're in a Unix/Linux environment? Eventually we started getting package managers and the distro maintainers started creating binaries of most of the packages you'd want to install, but the distro maintainers usually build those packages from source. This is new software and I imagine the distro maintainers will package it up if it starts to gain steam.
Doing it from source also has the advantage that the package maintainers can customize the build process so that it works better with their system. Photoshop and MS Word are closed source and proprietary, which creates issues if you want to package them.
I mean, it's apparently still in alpha. I imagine there will be installers and it'll be included in package managers when it gets to 1.0.