Here I'm republishing an old blog post of mine originally from October 2016. The article has been slightly improved.

Bacula on FreeBSD (pt. 4): Jobs, volumes, pools & a restore

This is part four of my Bacula tutorial. The first part covered some basics as well as installing Bacula and starting the three daemons that it consists of.

Part two dealt with modifying the default configuration files in a way that allowed all components of Bacula to interact with each other locally. A deliberate configuration error was debugged and finally a test backup was done (without knowing details like what exactly would even be backed up!) just to ensure that communication between the daemons really works.

In part three, the configuration was cleaned up and split into smaller parts, the first self-created resources (fileset, device and storage) were added and a backup job customized using the bconsole.

Bacula on FreeBSD (pt. 1): Introduction - Bacula backup basics

Bacula on FreeBSD (pt. 2): Bconsole - ruling the Director

Bacula on FreeBSD (pt. 3): Customizing configuration

Part four will discuss _jobs_ and show how to do a _restore_. We will change the default settings for the backup job so that it's no longer necessary to modify it using the bconsole. _Volumes_, _labels_ and _pools_ will also be discussed.

The fourth part continues where the third one left off. During this tutorial series we use _VirtualBox VMs_ managed by _Vagrant_ to make things as simple as possible. If you don't know how to use them, have a look at the two articles I wrote: they give a detailed introduction to using Vagrant and show how to prepare the FreeBSD VM template that is used throughout all parts of this Bacula tutorial.

Vagrant: Creating a FreeBSD 11 base box (virtualbox) - pt. 1

Vagrant: Creating a FreeBSD 11 base box (virtualbox) - pt. 2

Jobs

We've already seen and used backup jobs. There are also jobs for different actions like _restore_ (and some others). But what is a _job_? It basically is your way of telling Bacula what to do and how to do it. The _job type_ or _action_ defines what Bacula should do in the first place. Back up data? Restore it? Verify (compare) data? The _client_ determines who to operate on; it answers the question: Which host to back up data from or restore data to?

Then there's the _fileset_, which in case of a backup determines which files should be included, and the _pool_, which determines where to store the data. And finally there's the _schedule_ that determines when a job is run. So far we have only started jobs manually using the bconsole - but that's certainly not what you have in mind for your backup strategy! (For very simple backups it might be, but in that case Bacula is most likely overkill and you might want to look for a simpler backup utility.)
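To make those pieces a bit more concrete, here is a rough sketch of how they map onto a Job resource in the Director configuration. The resource names are the ones used in this tutorial, except for the schedule name _WeeklyCycle_, which is an assumption taken from Bacula's sample configuration. It is not a complete resource (further directives such as Messages are needed), and in the actual files most of these directives live in the JobDefs template rather than in the job itself:

```conf
Job {
  Name = "Backuphost.local"        # how the job shows up in bconsole
  Type = Backup                    # the action: Backup, Restore, Verify, ...
  Client = backuphost.local-fd     # who to operate on
  FileSet = "etc"                  # which files to include
  Pool = Default                   # where to store the data
  Storage = File3                  # storage resource from the previous part
  Schedule = "WeeklyCycle"         # when to run (assumed name)
}
```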

When we did our second backup, we changed a lot of settings using the bconsole. Now let's modify the configuration instead so that those will be the new defaults for the backup job. Of course we'll first have Vagrant spin up the VM, SSH into it and so on (you know the score by now):

% cd ~/vagrant/backuphost
% vagrant snapshot restore tut_3_end
% vagrant ssh
% sudo su -

Then we can edit the job defaults (if you haven't read the previous part(s) and wonder why you don't have that file or even the directory - that's because we've split the configuration for better readability!):

# vi /usr/local/etc/bacula/includes/dir_job.conf

The topmost resource should be the _JobDefs_ one that has the name _DefaultJob_. It is a kind of template for all jobs: each job only overrides the directives that differ from the defaults and uses the rest as set here. Change _FileSet_ to _etc_, _Storage_ to _File3_ and _Pool_ to _Default_. Save the file and exit the editor.
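After the change, the relevant part of the resource should look roughly like this. It is only a fragment: the remaining directives of your JobDefs stay exactly as they were, and whether the values are quoted depends on how your file was written.

```conf
JobDefs {
  Name = "DefaultJob"
  # ... other directives left unchanged ...
  FileSet = "etc"
  Storage = File3
  Pool = Default
}
```

If you want to catch typos before restarting, the Director can be asked to only parse its configuration with the -t switch, e.g. _bacula-dir -t -c /usr/local/etc/bacula/bacula-dir.conf_ (the path to the main configuration file is an assumption based on the directory used in this series).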

Now restart the director and prepare to run the backup job again using the bconsole:

# service bacula-dir restart
# bconsole
* run
Automatically selected Catalog: MyCatalog
Using Catalog "MyCatalog"
A job name must be specified.
The defined Job resources are:
1: Backuphost.local
2: BackupCatalog
3: RestoreFiles
Select Job resource (1-3):

Choose 1:

Run Backup job
JobName: Backuphost.local
Level: Incremental
Client: backuphost.local-fd
FileSet: etc
Pool: Default (From Job resource)
Storage: File3 (From Job resource)
When: 2016-09-23 23:48:01
Priority: 10
OK to run? (yes/mod/no):

That looks just as it should. No more need to use _mod_ multiple times! Type _no_ now as we don't actually need to do another backup at this time.

Restoring files

Instead we'll be doing a restore next. Issuing the command to initiate a restore job leads to a long list of choices:

* restore
[...]
To select the JobIds, you have the following choices:
1: List last 20 Jobs run
2: List Jobs where a given File is saved
3: Enter list of comma separated JobIds to select
4: Enter SQL list command
5: Select the most recent backup for a client
6: Select backup for a client before a specified time
7: Enter a list of files to restore
8: Enter a list of files to restore before a specified time
9: Find the JobIds of the most recent backup for a client
10: Find the JobIds for a backup for a client before a specified time
11: Enter a list of directories to restore for found JobIds
12: Select full restore to a specified Job date
13: Cancel
Select item: (1-13):

Pick option 5 - the one I use most often, BTW:

Defined Clients:
1: backuphost.local-fd
2: fbsd-template.local-fd
Select the Client (1-2):

Huh? Where does that _fbsd-template.local_ client come from? Haven't we removed it from the configuration completely? Yes, we have. However we did our very first backup when this was still the hostname of the virtual machine and the _catalog_ remembers that it holds a backup for that client! Ignore that for now and select 1:

[...]
424 files inserted into the tree.
You are now entering file selection mode where you add (mark) and
remove (unmark) files to be restored. No files are initially added, unless
you used the "all" keyword on the command line.
Enter "done" to leave this mode.
cwd is: /

Notice that the prompt symbol changed (to $)? You're in a _virtual shell_ now from which you can navigate through a filesystem rebuilt from the files contained in the selected backup. It is fairly limited, however. The most obvious limitation is that it does not provide auto-completion. You'll have to live with that. And of course it only provides a basic set of commands that allow you to change the current working directory, list files, etc.

Bacula said that we're in /. Let's see what we have there:

$ ls
etc/
usr/

Ok, so obviously our filesystem consists of a subset of both /etc and /usr (subset because we've excluded /etc/casper from /etc and only included /usr/local/etc and not the whole /usr, remember?).

Let's see what is in /usr/local/etc, shall we:

$ ls /usr/local/etc

Nothing? Ouch. Does that mean that the backup is broken for whatever reason? No, in fact everything is fine. The problem here is that even this rather simple command is too advanced for Bacula! You want to see the contents of some directory? __Go there__ and have a look again:

$ cd /usr/local/etc
cwd is: /usr/local/etc/
$ ls
X11/
bacula/
bash_completion.d/
drirc
man.d/
pam.d/
periodic/
pkg.conf
pkg.conf.sample
rc.d/
sudoers
sudoers.d/
sudoers.sample
xdg/

There you go, it's all there. Let's _mark_ the sudoers file so that it'll be added to the restore job (you can also use the mark command, but _add_ is shorter!):

$ add sudoers
1 file marked.

Ok, that worked. Just bear in mind that you always have to enter the directory first before you can mark (or view) any files. Even if you know where in the filesystem something is, Bacula can't cope with anything more complicated than the very basic way of doing things.

Now let's change to /etc:

$ cd /etc
cwd is: /etc/

I won't show an _ls_ here since that'd be too much output. But do it yourself and see if _/etc/casper_ was really left out from the backup. Alright. Now let's assume we want to restore _csh.cshrc_, _csh.login_ and _csh.logout_ as well. Thankfully Bacula's virtual shell does support _globbing_ (wildcard expansion):

$ add csh*
3 files marked.

After selecting a bunch of files, let's tell Bacula that we've finished adding files:

$ done
[...]
4 files selected to be restored.
Run Restore job
JobName: RestoreFiles
Bootstrap: /var/db/bacula/backuphost.local-dir.restore.1.bsr
Where: /tmp/bacula-restores
Replace: Always
FileSet: Full Set
Backup Client: backuphost.local-fd
Restore Client: backuphost.local-fd
Storage: File3
When: 2016-09-25 08:02:20
Catalog: MyCatalog
Priority: 10
Plugin Options:
OK to run? (yes/mod/no):

Bacula has prepared a restore job and shows us a summary so we can either run, modify or cancel it. One thing to take note of is the _Where:_ line. All files that are restored will have their path prefixed with _/tmp/bacula-restores_. You could choose another directory or set it to just / if you want Bacula to overwrite the current files in-place. For now accept the current settings by entering yes:

Job queued. JobId=3

Wait a moment and hit Enter to see if Bacula has any news for you. It should:

You have messages.

Let's take a look at those:

* mes

You know the job report by now. Look for the following line that shows that everything went right:

Termination: Restore OK

The restore job completed successfully. There are some more useful commands that you can use when you select the files for the restore. I just want to mention two of them: _unmark_ and _lsmark_. What the former does should be pretty obvious: It deselects files that were marked for restore before. This allows you to e.g. _add *_ and then unmark a few files, which is much less painful when you have more files to restore than files to leave out! The other one _shows marked files in and below the current directory_. That means if you want to see the full list of marked files, change to / before you use lsmark!
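Bacula's hint about the "all" keyword on the command line (see the file selection message above) points at a non-interactive variant of the restore. The following is only a sketch: the function name _restore_latest_ is made up, and the keyword sequence (select current all done yes) reflects my understanding of the bconsole restore command, so check it against the console documentation for your version. Note that it restores everything from the most recent backup instead of a few hand-picked files:

```shell
# restore_latest CLIENT WHERE
# Feeds a single restore command to bconsole instead of answering prompts.
# "all" marks every file, "done" leaves selection mode, "yes" confirms the job.
restore_latest() {
    client=$1
    where=$2
    printf 'restore client=%s where=%s select current all done yes\nquit\n' \
        "$client" "$where" | bconsole
}
```

Something like _restore_latest backuphost.local-fd /tmp/bacula-restores_ would then queue the job in one go.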

File examination

Let's quit the bconsole now and take a look at the files that we just recovered from the backup:

* exit
# ls -1 /tmp/bacula-restores/etc/
csh.cshrc
csh.login
csh.logout

Looks like something was indeed restored. Since the original files have not actually been modified since we've backed them up, comparing the original and the restored ones should assure us of the files being intact:

# diff -q /usr/local/etc/sudoers /tmp/bacula-restores/usr/local/etc/sudoers
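The same check can be extended to every restored file. This is a minimal sketch with a made-up function name (_verify_restore_); it assumes the restore prefix is passed without a trailing slash and, like diff -q, it prints nothing when everything matches:

```shell
# verify_restore RESTORE_ROOT [ORIG_ROOT]
# Compares each file below RESTORE_ROOT with its counterpart below ORIG_ROOT
# (default: the real root) and prints only files that differ or are missing.
verify_restore() {
    restore_root=$1
    orig_root=${2:-}
    find "$restore_root" -type f | while IFS= read -r restored; do
        # Strip the restore prefix to get the original path.
        original="$orig_root${restored#"$restore_root"}"
        cmp -s "$original" "$restored" || echo "DIFFERS: $original"
    done
}
```

For the restore above the call would be _verify_restore /tmp/bacula-restores_.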

No output means that the files match exactly. Good! But where did those files get restored from? Remember what we did when we configured our backup device. Let's take a look at the directory that we specified there:

# ls -lh /var/backup/
total 1952
-rw-r----- 1 bacula bacula 1.9M Sep 24 22:08 file3a

This is the volume that we specified in the configuration and that was actually created when we had Bacula label it.

For learning purposes our very simple setup (just one volume) worked great. But before we move on, it's time to take care of creating a storage system that's a little bit more advanced: We need a pool! But how do those work?

Volumes, pools and labels

Speaking of labels... In the previous part we had to create one before the job that we queued could actually start. To be able to come up with a sensible backup solution for your use case you will have to understand how Bacula stores backup data. It uses so-called _volumes_. Think of a _volume_ as some kind of storage medium. This could either be a tape or disk-backed storage (i.e. a file). Backup data can be written to a volume until the maximum capacity is reached. Additional data will have to be written to another volume.

We're not really talking about using tapes here (which comes with its own set of problems from what I've read in Bacula's manual). Still it makes sense to remember that tapes are the reason for some design choices of Bacula. Volumes are such a case. While supporting multiple files may not seem like a huge benefit (for one host that is), it's easy to see that supporting more than one tape does. Once it's full, write to the next. But to do that, Bacula needs some means of telling them apart. This is where the _label_ comes in. Labeling a medium does two things: it marks the medium as a volume that Bacula may use, and it gives it a unique name so that multiple volumes won't get confused. So each volume needs a label before Bacula will put any data on it.

If backup jobs were tied to a volume this _could_ work for some cases but would probably lead to problems sooner or later. Imagine the case where a volume is perhaps only half full but the next backup still won't fit on it. That backup would have to be written to the next volume, wasting the free space on the former. Issues and inflexibilities like that are solved by introducing _pools_. A pool is basically a list of volumes (plus some options). If your job targets a pool, it no longer matters which volume to put it on - Bacula can take care of that for you in a dynamic way. Pools also allow enforcing some restrictions (like maximum size, maximum time to use) on volumes depending on what your needs are and what you are trying to do.
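To give an idea of what such restrictions look like, here is a sketch of a Pool resource. The pool name and all values are invented for illustration and are not part of this tutorial's configuration. The directives themselves are standard Director directives for pools:

```conf
Pool {
  Name = ExamplePool               # hypothetical name
  Pool Type = Backup
  Maximum Volume Bytes = 1G        # maximum size of a single volume
  Volume Use Duration = 7 days     # stop writing to a volume after this time
  Maximum Volumes = 10             # cap on the number of volumes in the pool
  Label Format = "example-"        # lets Bacula label new volumes itself
}
```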

Since this post is already long enough, it's time to end this part. As always, let's save our progress by shutting down the VM and taking the next snapshot (once the status has reached poweroff):

# shutdown -p now
% vagrant status
% vagrant snapshot save tut_4_end
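If you don't feel like running vagrant status by hand until the VM is down, a small loop on the host can do the waiting. This is just a sketch: the function name and the two-second interval are my own choices, and it simply looks for the word poweroff in the status output:

```shell
# Poll "vagrant status" until the VM is reported as powered off.
wait_for_poweroff() {
    until vagrant status | grep -q 'poweroff'; do
        sleep 2
    done
}
```

Run from the VM's directory, _wait_for_poweroff && vagrant snapshot save tut_4_end_ then takes the snapshot as soon as the shutdown has finished.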

Intermission

After this part of the series we finally know how to restore files from a backup. We also have a better understanding of what jobs, volumes, labels and pools are.

In the next post we'll create and test a new pool, do some configuration cleanup and reset the catalog. This should conclude the single node part of the tutorial.

Bacula on FreeBSD (pt. 5): A day at the pool
