Here I'm republishing an old blog post of mine originally from May 2016. The article has been slightly improved.

Version control (pt. 1): An introduction

This post is the first part of a series on version control. It provides an introduction by explaining what version control actually is, why you should probably use it and how it works in general.

Important terms

_Version control_ (also: _revision control_) is a means of preserving various versions of a file or of multiple files. This can be done in a lot of different ways, but over time some best practices have emerged that are more or less followed in all modern version control systems.

There are lots of cases where version control makes sense. One of the most common ones is software development where using version control is virtually mandatory. That is why there's even a separate term describing this form of version control: _source (code) control_ or _source code management_.

Version control systems fall into two groups: _local_ and _network-based_ systems. The latter can be further divided into _centralized_ and _distributed_ ones.

In short: Why you may need it

Depending on the choice of tool, there are various situations where you can benefit from version control:

Or as a former colleague of mine put it in a presentation slide (in a very vivid way):

Why to use version control! (PNG)

Manual version control

The simplest form of version control is working with a backup copy. You make a copy of e.g. a configuration file before making a change to the live file. Afterwards you test your changes, and if they seem to work you either delete the backup or keep it for reference. If the changes had undesired effects, the live file is overwritten with the backup (and the latter is usually deleted again). This is actually a (rather primitive but sometimes sufficient) form of version control: Thanks to the backup copy you have two versions of the file at hand!
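The backup-copy routine described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names `make_backup` and `restore_backup` and the `.bak` suffix are made up for this example, not taken from any tool.

```python
import shutil
from pathlib import Path


def make_backup(live: Path) -> Path:
    """Copy the live file to '<name>.bak' next to it and return the backup path."""
    backup = live.with_name(live.name + ".bak")  # ".bak" is an arbitrary choice
    shutil.copy2(live, backup)  # copy2 also preserves timestamps where possible
    return backup


def restore_backup(live: Path, backup: Path) -> None:
    """Overwrite the live file with the backup, then delete the backup again."""
    shutil.copy2(backup, live)
    backup.unlink()
```

You would call `make_backup` before editing the file and `restore_backup` only if the change turned out badly. Note that there are never more than two versions around, which is exactly the limitation of this approach.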

Backup copy – we've all done it (PNG)

Another variant is to make a copy after a fixed amount of time (or at random). Often people prepend the date to the file name. If all you want to accomplish is that e.g. the data as of the first of each month is preserved for one year, that's also a sufficient method (along with rotating the backup copies so that you don't keep around more of them than you need to).
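Here is a minimal sketch of dated copies with rotation. The helper names `dated_copy` and `rotate`, the `YYYY-MM-DD_` prefix and the default of keeping 12 copies are assumptions for the sake of the example (12 monthly copies would cover one year).

```python
import shutil
from datetime import date
from pathlib import Path


def dated_copy(source: Path, backup_dir: Path, day: date) -> Path:
    """Copy source into backup_dir with the ISO date prepended to its name."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"{day.isoformat()}_{source.name}"
    shutil.copy2(source, target)
    return target


def rotate(backup_dir: Path, name: str, keep: int = 12) -> list[Path]:
    """Delete all but the newest `keep` dated copies of `name`.

    Returns the list of removed paths, oldest first.
    """
    # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
    copies = sorted(backup_dir.glob(f"????-??-??_{name}"))
    stale = copies[:-keep] if keep > 0 else copies
    for path in stale:
        path.unlink()
    return stale
```

Putting the date first in ISO format is a deliberate choice here: a plain alphabetical sort is then also a chronological one, so the rotation does not have to parse any dates.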

These means of manual version control are, however, pretty limited. And worse: There's plenty of room for making mistakes!

Manual version control (PNG)

How a version control system (VCS) works

Let's think about a simplified versioning process by pretending to do things by hand. For each recorded change of a file you'd make a copy of the file and keep all the old versions of it. A VCS does not forget a single version of a file it monitors! That's what it is meant to do, after all. Each new version of the file gets a _comment_ that is meant to briefly sum up the changes that were made.

Keeping dozens or even hundreds of files around because one file has that many revisions would really clutter your filesystem. It also would not make sense to keep nearly the same file twice if only one line was changed! That's why local VCS, which version on a per-file basis, keep a "history file" around for each versioned file. That file records only the changes between the various versions as well as the comments.
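To make the idea of a history file more tangible, here is a toy model in Python. It keeps the first version in full and stores every later version only as a list of changes plus a comment. The names `make_delta`, `apply_delta` and `History` are hypothetical, and the format is a simplification of my own, not the on-disk format of any real VCS.

```python
from difflib import SequenceMatcher

# One edit: replace old[start:end] with the given lines.
Edit = tuple[int, int, list[str]]


def make_delta(old: list[str], new: list[str]) -> list[Edit]:
    """Return only the edits that turn `old` into `new` (unchanged lines are skipped)."""
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(old: list[str], delta: list[Edit]) -> list[str]:
    """Rebuild the newer version from the older one and its delta."""
    result = list(old)
    # Apply edits back to front so that earlier line indices stay valid.
    for start, end, lines in reversed(delta):
        result[start:end] = lines
    return result


class History:
    """Toy per-file history: full first version, then one delta per revision."""

    def __init__(self, first_version: list[str], comment: str) -> None:
        self.base = list(first_version)
        self.comments = [comment]
        self.deltas: list[list[Edit]] = []

    def record(self, new_version: list[str], comment: str) -> None:
        latest = self.get(len(self.deltas))
        self.deltas.append(make_delta(latest, new_version))
        self.comments.append(comment)

    def get(self, revision: int) -> list[str]:
        """Return the file as it was at `revision` (0 is the first version)."""
        lines = list(self.base)
        for delta in self.deltas[:revision]:
            lines = apply_delta(lines, delta)
        return lines
```

If a revision changes one line of a thousand-line file, the history grows by a single small edit and a comment instead of another thousand lines, yet every old version can still be rebuilt.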

Network-based VCS are able to organize a whole project (multiple files together instead of each one separately). Like local systems, they record the changes for each revision rather than the full files, along with the comments. All of that data is collected in a so-called _repository_.

If a new team member wants to start working on the project, he or she first needs to get the files of that project. For centralized VCS this is done by checking out the most current revisions of all the project files from the _remote repository_. By doing so, a _local working copy_ of the project files is created to be worked on by the user. When using a _distributed VCS_, the remote repository is _cloned_ instead (thus receiving the full repository with all revisions and not just the most current version of each file). The working copy is then checked out from the local repository clone.
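The difference between checking out and cloning can be shown with a toy model. I'm assuming here that a repository is just a list of revisions and that each revision is a full snapshot of the files, which is a simplification for readability. The functions `checkout` and `clone` are illustrative only and do not correspond to the commands of any particular VCS.

```python
import copy

# A toy repository: a list of revisions, oldest first.
# Each revision maps file names to file contents.
Revision = dict[str, str]


def checkout(repository: list[Revision]) -> Revision:
    """Centralized style: fetch only the most current revision as a working copy."""
    return dict(repository[-1])


def clone(repository: list[Revision]) -> list[Revision]:
    """Distributed style: fetch the full repository with all revisions."""
    return copy.deepcopy(repository)


remote = [
    {"README": "draft\n"},
    {"README": "final\n", "main.py": "print('hi')\n"},
]

working_copy = checkout(remote)        # centralized: latest files only
local_repo = clone(remote)             # distributed: the whole history...
working_copy_2 = checkout(local_repo)  # ...then check out from the local clone
```

Both users end up with the same files to work on. The user of the distributed system additionally holds every old revision locally and can look at the history without contacting the remote repository.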

At the beginning of a new project there is no repository yet. In this case an empty repository is created and checked out, and the new working directory is populated with files. In a next step, those files are placed under version control (which means that the VCS is told to watch them and record changes). Then all changes (all of the files, since they are all new right now) are placed into the repository by doing a _commit_.

After every change made to the project you do a commit again, recording the changed state inside the repository. Other project members can now get a current copy from the repository. This way it's easy to work together on the same project without the risk of (unknowingly) getting in the way of somebody else.
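The whole cycle (empty repository, placing files under version control, first commit, later commits, a colleague fetching the current state) might look like this as a toy model. `ToyRepository` and `Commit` are hypothetical names, and for brevity each commit stores a full snapshot of the tracked files, whereas the systems described above record changes.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    comment: str
    files: dict[str, str]  # snapshot of all tracked files (a simplification)


@dataclass
class ToyRepository:
    commits: list[Commit] = field(default_factory=list)

    def commit(self, working_copy: dict[str, str], tracked: set[str], comment: str) -> int:
        """Record the current state of the tracked files; return the revision number."""
        snapshot = {name: working_copy[name] for name in tracked if name in working_copy}
        self.commits.append(Commit(comment, snapshot))
        return len(self.commits)

    def checkout(self) -> dict[str, str]:
        """Return a working copy of the most current revision (empty if none exists)."""
        return dict(self.commits[-1].files) if self.commits else {}


repo = ToyRepository()             # a new, empty repository
work = repo.checkout()             # the working copy starts out empty
work["main.py"] = "print('v1')\n"
work["notes.tmp"] = "scratch\n"    # present on disk, but not under version control
tracked = {"main.py"}              # tell the VCS which files to watch

repo.commit(work, tracked, "Initial import")
work["main.py"] = "print('v2')\n"
repo.commit(work, tracked, "Print v2 instead of v1")

colleague = repo.checkout()        # another project member gets the current state
```

The colleague receives only `main.py` in its second version. The scratch file never entered the repository because it was never placed under version control.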
