Here I'm republishing an old blog post of mine originally from June 2016. The article has been slightly improved.

Version control (pt. 2): Generations and intended use

In the previous post I gave a short introduction to version control, explaining a few basics that help in understanding the topic. This post assumes you know the things that version control systems have in common. So now it's time to discuss what sets them apart - not individually, yet, but in terms of characteristics that some of them share with each other.

Version control (pt. 1): An introduction

Generations

Each version control system (VCS) has its unique advantages and disadvantages. However, there are some traits which are common to several of them. And since the programs that share those traits were typically released rather close to each other, it makes sense to speak of _generations_ of version control systems.

So far there are three of them, with the first obviously being the oldest and the third the newest. What may surprise you is that the tools of the earlier generations, even though they are much more limited, have not disappeared completely. How come? Well, just keep it in mind while we take a look at those generations. Perhaps you can see where tools of older generations may still make sense!

The previous article discussed manual "version control" and its limitations. In short: It can work for you if your project is rather small, you're working on it alone and you've got the discipline to always make a proper backup of your files before you make bigger changes. In a world of more sophisticated software it's quite unlikely that many projects meet all of those requirements. Therefore it made sense to develop programs that assist you in doing proper version control on your projects.

The first generation

The first generation did exactly that: It preserved any changes you made to a _single file_. Yes, it is one characteristic trait of the first generation that it works on a per-file basis. Version control is entirely separate for each and every file that you choose to record changes for. For each file it manages, the VCS creates a "history file" which contains all the differences from one version to the next plus a comment.
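To make the idea of a per-file "history file" a bit more tangible, here is a minimal sketch in Python. It is not how any particular first-generation tool stores its data; the class `FileHistory` and its methods are made up for illustration. It only assumes what the paragraph above says: one history per file, holding the differences from one version to the next plus a comment.

```python
import difflib


class FileHistory:
    """Hypothetical history of ONE file: first revision in full, then deltas."""

    def __init__(self, initial_lines, comment):
        self.base = list(initial_lines)
        self.deltas = []            # one list of edit operations per later revision
        self.comments = [comment]   # comments[i] belongs to revision i + 1

    def head(self):
        """Number of the newest revision (revisions start at 1)."""
        return len(self.deltas) + 1

    def checkout(self, revision):
        """Rebuild a revision by replaying the stored deltas onto the base."""
        lines = list(self.base)
        for ops in self.deltas[: revision - 1]:
            rebuilt = []
            for tag, i1, i2, new_part in ops:
                if tag == "equal":
                    rebuilt.extend(lines[i1:i2])   # unchanged lines come from the old version
                else:
                    rebuilt.extend(new_part)       # replaced/inserted lines (empty for deletes)
            lines = rebuilt
        return lines

    def checkin(self, new_lines, comment):
        """Store only what changed relative to the newest revision."""
        old = self.checkout(self.head())
        matcher = difflib.SequenceMatcher(a=old, b=list(new_lines))
        ops = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            new_part = [] if tag == "equal" else list(new_lines[j1:j2])
            ops.append((tag, i1, i2, new_part))
        self.deltas.append(ops)
        self.comments.append(comment)
        return self.head()


history = FileHistory(["int main() {", "}"], "initial version")
history.checkin(["int main() {", "    return 0;", "}"], "add return value")
print(history.checkout(1))
print(history.checkout(2))
```

Note that a second file would need its own, completely independent `FileHistory` with its own revision numbers.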

Originally, it was common to use VCS of the first generation on multi-user systems. If multiple users can work on the same project at the same time (via different logins), it's quite possible that conflicts arise. If two people make changes to the same file, the one who saves last "wins" - overwriting all changes that somebody else may have made in the meantime. To avoid that, locking was invented. Files can be locked while they are being edited. If somebody decides to edit a file, acquires a lock and then does something else, the file remains locked. An administrator can, however, break a lock if something like this happens.
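The locking idea can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the names `LockTable`, `LockError` and `break_lock` are invented for this example and do not mirror the commands of any real tool.

```python
class LockError(Exception):
    """Raised when a file is locked by somebody else."""


class LockTable:
    """Hypothetical lock bookkeeping: at most one editor per file."""

    def __init__(self):
        self._locks = {}   # file name -> user holding the lock

    def holder(self, path):
        return self._locks.get(path)

    def lock(self, path, user):
        current = self._locks.get(path)
        if current is not None and current != user:
            raise LockError(f"{path} is locked by {current}")
        self._locks[path] = user

    def unlock(self, path, user):
        if self._locks.get(path) != user:
            raise LockError(f"{user} does not hold the lock on {path}")
        del self._locks[path]

    def break_lock(self, path):
        """What an administrator would do with a forgotten lock."""
        self._locks.pop(path, None)


locks = LockTable()
locks.lock("main.c", "alice")
try:
    locks.lock("main.c", "bob")      # bob has to wait: alice holds the lock
except LockError as err:
    print(err)
locks.break_lock("main.c")           # alice went on holiday, admin steps in
locks.lock("main.c", "bob")
```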

Today VCS of the first generation are more or less obsolete. There are niches where they managed to survive and are still being used. One example is the management of configuration files on *nix systems without centralized configuration management. These configuration files are only relevant to the system they exist on, so the missing networking capabilities (which the second generation introduced) are no disadvantage. And most of the time these configuration files are separate entities not related to other files, so not even the limitation of managing only a single file is a problem here.

The second generation

There are two important aspects that set tools of the second generation apart from those of the first: They offer _network capabilities_ and they can _manage multiple files_ in one project! The latter ability solves a whole bunch of problems which made first-generation tools hard to work with on anything but very small projects. Managing each file separately does not sound too bad at first. But think about it for a minute.

Let's imagine we work on a simple project. Nothing too fancy: A few C source code files, one header file. Currently the program is broken and we decided to go back to a working version. Good thing that we have version control, right? Right... Sort of. The file _main.c_ is currently at revision 96, _foo.c_ at revision 44, _bar.c_ at 24 and _baz.h_ at revision 7. See the problem? After we found out that revision 89 broke the program and we reverted to 88, how do we find out which revision numbers of the other files belong to revision 88 of _main.c_?

Yes, we have timestamps, and we can use them to find out which revisions our other files had at the time the main file was at revision 88. Maybe it's not even _that_ bad when we only have four files. But what if we have 20? 100? It's cumbersome and really a waste of time. Keep things like this in mind and you'll definitely come to appreciate the ability to manage multiple files together in one project where the revision number increases whatever file was changed and however many files were modified!
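For the curious, this is roughly the detective work that per-file histories force on you. The sketch below assumes that each file's history is simply a sorted list of check-in timestamps (revision 1 first); the function name `revisions_at` and the tiny data set are hypothetical and much smaller than the 96 revisions from the example.

```python
from bisect import bisect_right


def revisions_at(histories, when):
    """For every file, find the newest revision checked in at or before `when`.

    `histories` maps a file name to a sorted list of check-in timestamps,
    where index 0 belongs to revision 1. A result of 0 means the file did
    not exist yet at that point in time.
    """
    return {name: bisect_right(stamps, when) for name, stamps in histories.items()}


# Made-up timestamps (plain integers keep the example readable).
histories = {
    "main.c": [10, 20, 30],
    "foo.c": [5, 25],
    "baz.h": [40],
}

# Which revisions belong together with revision 2 of main.c?
when = histories["main.c"][2 - 1]
print(revisions_at(histories, when))
```

With a project-wide revision number none of this is needed: you just ask for revision 88 of the project and get every file as it was at that point.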

Now that larger projects are possible because the whole project is managed together in one repository, it makes sense to use the network as well. This networking capability is achieved by providing one centralized repository which all project members (or even everybody interested in the project) can _check out_ to create a _local working copy_ of the latest revision (or any older one if needed). Changes are made locally and then checked back into the centralized repository by committing them. Tools of the second generation will only allow a check-in if nobody else did one in the meantime. If somebody did, you need to update your working copy first and, if at least one file was modified by both of you, merge your changes with the remote ones. For that reason locking is not necessary anymore!
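Here is a small Python sketch of that check-in rule, combined with the project-wide revision number. It is a simplified model, not the protocol of any actual tool: `CentralRepo`, `OutOfDate` and the idea of passing the base revision along with a commit are assumptions made for this example, and the "merge" is just a dictionary update because the two users touched different files.

```python
class OutOfDate(Exception):
    """Raised when somebody else checked in since our checkout."""


class CentralRepo:
    """Hypothetical central repository with ONE revision number for all files."""

    def __init__(self):
        self.revisions = [{}]          # revision 0 is the empty project

    @property
    def head(self):
        return len(self.revisions) - 1

    def checkout(self, revision=None):
        """Return (revision number, working copy) for the requested revision."""
        if revision is None:
            revision = self.head
        return revision, dict(self.revisions[revision])

    def commit(self, base_revision, files):
        """Accept the check-in only if the working copy is based on the head."""
        if base_revision != self.head:
            raise OutOfDate("update your working copy first")
        self.revisions.append(dict(files))
        return self.head


repo = CentralRepo()
alice_base, alice_files = repo.checkout()
bob_base, bob_files = repo.checkout()

alice_files["main.c"] = "alice's change"
repo.commit(alice_base, alice_files)            # becomes revision 1

bob_files["foo.c"] = "bob's change"
try:
    repo.commit(bob_base, bob_files)            # rejected: based on revision 0
except OutOfDate:
    bob_base, updated = repo.checkout()         # update the working copy ...
    updated.update(bob_files)                   # ... merge (trivial here) ...
    repo.commit(bob_base, updated)              # ... and check in as revision 2
```

Nobody had to lock anything, and nobody's work was overwritten silently.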

Today tools of the second generation still play an important role. Their appeal is declining, however. This is due to a few shortcomings which the third generation tries to address.

The third generation

The big innovation that is common to all tools of the third generation is that they work in a _decentralized_ way. Users usually don't check out files from a central (probably remote) repository. Instead they _clone_ the full repository and then check out the files from their local clone. Since the local repository is exactly the same as the original one, there's no longer one central repository - at least from a technical point of view. And while cloning requires transferring a lot more data over the wire (especially for large projects), there are some huge benefits to it.

If you have a local clone, you can work on the project even when you're not online. You can see the complete history and check out earlier revisions if you need to - all without having to access a central repository. If you're online, you can always sync your local clone with the original repo (_pull_ down changes) or even the original one with your local repository (_push_ up changes) if you have write access.
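The relationship between clone, pull and push can be modelled with a toy `Repo` class in Python. Everything here is a deliberate simplification of my own: the history is a plain list, commits are identified by their message, and only the simple case is handled where one side is strictly ahead of the other. Real tools identify commits differently and can merge diverged histories.

```python
class Repo:
    """Hypothetical repository: the full history is just a list of commits."""

    def __init__(self, commits=None):
        self.commits = list(commits or [])

    def commit(self, message):
        self.commits.append(message)     # works offline, nothing leaves the machine

    def clone(self):
        return Repo(self.commits)        # the clone carries the complete history

    def pull(self, other):
        """Fetch commits we are missing (only if `other` extends our history)."""
        known = len(self.commits)
        if other.commits[:known] != self.commits:
            raise ValueError("histories diverged; a merge would be needed")
        self.commits.extend(other.commits[known:])

    def push(self, other):
        """Pushing is pulling seen from the other side."""
        other.pull(self)


origin = Repo(["initial import"])
local = origin.clone()

local.commit("add feature")              # done on the train, without network
origin_log_before = list(origin.commits)

local.push(origin)                       # back online, with write access
print(origin_log_before, "->", origin.commits)
```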

One of the biggest advantages of decentralized tools is that they make forking much easier. While forking a project was frowned upon in the past, today you'll often see projects asking you to fork and play with their code (e.g. the well-known "fork me on GitHub"). Experience has shown that quite a few people fork a project, add a feature they need and then give their code back to the project (this is done by creating a _pull request_ which invites the administrators of the original project to pull in the changes if they want).

Which one to use?

If you like software history, there's nothing wrong with trying out tools of the older generations. But if you're just starting out with version control and you want to learn something now, it makes sense to choose a tool of the third generation. Which one would I recommend? I can only give the usual answer to such a question: It depends. Each one has its strengths and weaknesses. In the next blog posts we'll take a closer look at some of the open source VCS of all generations. This might help you to choose the right one for your purpose. [For some reason I never got to publish that third post. I have written a detailed overview of multiple VCS but never got around to polishing it.]
