I like my servers to run as little code and as few services as possible: it eases maintenance and leaves room for other things to run. I recently wrote about monitoring software that gathers metrics and renders them, but all of it is overkill if you just want to keep track of a single value over time and graph it.
Fortunately, there is an old and robust tool that does the job fine. It's well documented, and it's called RRDtool.
RRDtool stands for "Round Robin Database Tool". It's a set of programs and a specific file format for storing metrics. The trick with RRD files is that they have a fixed size: when you create one, you define how many values you want to store in it, at which frequency, and for how long. This can't be changed after the file is created.
In addition, RRD files let you create derived time series that keep track of computed values over a longer time span, but at a lower resolution. Think of the following use case: you want to record your home temperature every 10 minutes for the past 48 hours, but you also want to keep some information for the past year. You can tell RRD to store the average temperature per hour for a week, the average per four hours for a month, and the average per day for a year. All of this has a fixed size.
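As a rough sketch of that temperature scenario, the row counts can be worked out with shell arithmetic before writing the `rrdtool create` command. The file name `temperature.rrd`, the data source name `temp`, the -50/60 bounds, the 1200 seconds timeout and the `rows` helper are all made up for the illustration, and a month is counted as 30 days.

```shell
#!/bin/sh
# Sketch: size the archives for the home temperature example.
# All names and bounds below are illustrative assumptions.
STEP=600   # one measurement every 10 minutes

# rows <span in seconds> <measurements per row>
# prints how many rows an archive needs to cover the span
rows() {
    echo $(( $1 / ($STEP * $2) ))
}

RAW=$(rows $((48 * 3600)) 1)          # every value for 48 hours
HOURLY=$(rows $((7 * 86400)) 6)       # 1-hour averages for a week
FOUR_H=$(rows $((30 * 86400)) 24)     # 4-hour averages for 30 days
DAILY=$(rows $((365 * 86400)) 144)    # daily averages for a year

echo "rows: $RAW $HOURLY $FOUR_H $DAILY"

# Only create the file when rrdtool is actually installed.
if command -v rrdtool >/dev/null 2>&1; then
    rrdtool create temperature.rrd --step "$STEP" \
        DS:temp:GAUGE:1200:-50:60 \
        "RRA:AVERAGE:0.5:1:$RAW" \
        "RRA:AVERAGE:0.5:6:$HOURLY" \
        "RRA:AVERAGE:0.5:24:$FOUR_H" \
        "RRA:AVERAGE:0.5:144:$DAILY"
fi
```

Each `RRA` only differs by how many measurements are merged into one row and by how many rows are kept, which is why the file size is known up front.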
RRD files can be dumped as XML, which gives you a glimpse inside and may make this special file format easier to understand.
Let's create a file to monitor the battery level of your computer every 10 seconds, keeping the last 5 values. Don't try to understand the whole command line for now:
rrdtool create test.rrd --step 10 DS:battery:GAUGE:20:0:100 RRA:AVERAGE:0.5:1:5
If we dump the created file with `rrdtool dump test.rrd`, we get this result (stripped a bit to make it fit better):
<!-- Round Robin Database Dump -->
<rrd>
    <version>0003</version>
    <step>10</step> <!-- Seconds -->
    <lastupdate>1676569107</lastupdate> <!-- 2023-02-16 18:38:27 CET -->
    <ds>
        <name> battery </name>
        <type> GAUGE </type>
        <minimal_heartbeat>20</minimal_heartbeat>
        <min>0.0000000000e+00</min>
        <max>1.0000000000e+02</max>
        <!-- PDP Status -->
        <last_ds>U</last_ds>
        <value>NaN</value>
        <unknown_sec> 7 </unknown_sec>
    </ds>
    <!-- Round Robin Archives -->
    <rra>
        <cf>AVERAGE</cf>
        <pdp_per_row>1</pdp_per_row> <!-- 10 seconds -->
        <params>
            <xff>5.0000000000e-01</xff>
        </params>
        <cdp_prep>
            <ds>
                <primary_value>0.0000000000e+00</primary_value>
                <secondary_value>0.0000000000e+00</secondary_value>
                <value>NaN</value>
                <unknown_datapoints>0</unknown_datapoints>
            </ds>
        </cdp_prep>
        <database>
            <!-- 2023-02-16 18:37:40 CET / 1676569060 --> <row><v>NaN</v></row>
            <!-- 2023-02-16 18:37:50 CET / 1676569070 --> <row><v>NaN</v></row>
            <!-- 2023-02-16 18:38:00 CET / 1676569080 --> <row><v>NaN</v></row>
            <!-- 2023-02-16 18:38:10 CET / 1676569090 --> <row><v>NaN</v></row>
            <!-- 2023-02-16 18:38:20 CET / 1676569100 --> <row><v>NaN</v></row>
        </database>
    </rra>
</rrd>
The most important thing to understand here is that we have a "ds" (data source) named battery of type GAUGE with no last value (I never updated it), and also an "RRA" (Round Robin Archive) for our average value that contains timestamps with no value associated with any of them. You can see that, internally, our 5 slots already exist, each holding an unknown value (NaN). If I update the file, the oldest unknown value will disappear, and a new record holding the actual value will be added at the end.
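The rotation described above can be pictured with a toy model in plain shell. This is not how rrdtool stores data internally, only an illustration of the fixed-size behavior; the `push` helper and the battery values are invented for the example.

```shell
#!/bin/sh
# Toy model of a 5-slot round robin archive; NOT how rrdtool stores data,
# only an illustration of the fixed-size rotation.
SLOTS="NaN NaN NaN NaN NaN"

# push <value>: drop the oldest slot, append the new value at the end
push() {
    SLOTS="${SLOTS#* } $1"
}

push 87
push 86
echo "$SLOTS"   # NaN NaN NaN 87 86

for v in 85 84 83 82; do
    push "$v"
done
echo "$SLOTS"   # 86 85 84 83 82
```

After six updates the first real value (87) has already been pushed out: the archive never holds more than 5 entries.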
In this guide, I would like to share my experience of using rrdtool to monitor my solar panel power output over the last few hours, so it can easily be displayed on my local dashboard. The data is also collected and sent to a Grafana server, but that server isn't local, and querying it just to display the latest values wastes resources and bandwidth.
First, you need `rrdtool` installed. You don't need anything else to work with RRD files.
Creating the RRD file is the trickiest part, because you can't change it afterward.
I want to collect a value every 5 minutes (300 seconds). It's an absolute value between 0 and 4000, so we define a step of 300 seconds to tell the file it must receive a value every 300 seconds. The type of the value is GAUGE, because it's just a value that doesn't depend on the previous one. If we wanted to monitor how the power changes over time, we would use DERIVE instead, because it stores the rate of change between successive values.
Furthermore, we need to configure the file to give up on a value slot and mark it as unknown if it isn't updated within 600 seconds.
Finally, we want to be able to graph each measurement. This is done by adding an AVERAGE archive to the file, with a resolution of 1 value and 240 stored measurements. This means that each time we add a value to the RRD file, the AVERAGE is calculated with only that last value as input, and we keep 240 of them, which lets us graph up to 240 * 5 minutes (20 hours) of data back in time.
rrdtool create solar-power.rrd --step 300 DS:value:GAUGE:600:0:4000 RRA:AVERAGE:0.5:1:240
The fields, from left to right:
`value`: variable name
`GAUGE`: measurement type
`600`: time before null, in seconds
`0`: min value
`4000`: max value
`AVERAGE`: function to apply, can be AVERAGE, MAX, MIN or LAST (the documentation lists a few more specialized ones)
`0.5`: xfiles factor, the share of unknown values (here up to half) we agree to use for calculating a value
`1`: how many previous values should be used in the function, 1 means just a single value, so averaging itself
`240`: number of values to keep
And then, you have your `solar-power.rrd` file created. You can inspect it with `rrdtool info solar-power.rrd` or dump its content with `rrdtool dump solar-power.rrd`.
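To double-check the sizing before committing to it, the numbers from the create command can be multiplied out in the shell. The variable names here are only labels chosen for this sketch.

```shell
#!/bin/sh
# Sketch: how far back the solar-power.rrd archive reaches.
STEP=300        # seconds between two measurements (--step)
PDP_PER_ROW=1   # measurements consolidated into one stored value
ROWS=240        # stored values in the archive
HEARTBEAT=600   # seconds without update before a value is unknown

SPAN=$(( STEP * PDP_PER_ROW * ROWS ))
echo "archive covers $SPAN seconds, or $(( SPAN / 3600 )) hours"
echo "heartbeat is $(( HEARTBEAT / STEP )) steps long"
```

With these values the archive covers 72000 seconds, which is the 20 hours mentioned earlier.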
Now that we have prepared the file to receive data, we need to populate it with something useful. This can be done using the command `rrdtool update`.
CURRENT_POWER=$(some-command-returning-a-value)
rrdtool update solar-power.rrd "N:${CURRENT_POWER}"
`N`: when the value has been measured, N equals to NOW
`${CURRENT_POWER}`: value of the first field of the RRD file (we created a single field)
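In practice the measuring command can fail or return garbage, so a small wrapper may help. The sketch below is one possible approach, not part of rrdtool: `read_power` is a hypothetical placeholder for the real measuring command, and `update_arg` is a helper invented here that falls back to `U`, the marker `rrdtool update` accepts for an unknown value.

```shell
#!/bin/sh
# Sketch of an update wrapper. read_power stands in for whatever command
# returns the current measurement; it is a hypothetical placeholder.
read_power() {
    echo "1234.5"
}

# update_arg <value>: print the argument for rrdtool update.
# Anything that is not a plain non-negative number becomes U (unknown).
update_arg() {
    case "$1" in
        ''|*[!0-9.]*|*.*.*|.) echo "N:U" ;;
        *) echo "N:$1" ;;
    esac
}

ARG=$(update_arg "$(read_power)")
echo "$ARG"

# Only touch the file when rrdtool and the RRD file are both present.
if command -v rrdtool >/dev/null 2>&1 && [ -f solar-power.rrd ]; then
    rrdtool update solar-power.rrd "$ARG"
fi
```

Saved as a script (say `/usr/local/bin/solar-update.sh`, a made-up path), it could be run from cron with a line such as `*/5 * * * * /usr/local/bin/solar-update.sh` to match the 300 seconds step.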
The trickiest part, although the least risky, is generating a usable graph from the data. The operation is not destructive since it doesn't modify the file, so we can experiment a lot without affecting the content.
We will generate something simple like the picture below. Of course, you can add a lot more information, colors, axes, legends, etc., but I need my dashboard to stay simple and clean.
A diagram displaying solar power over time (on a cloudy day)
rrdtool graph --end now -l 0 --start end-14000s --width 600 --height 300 \
    /var/www/htdocs/dashboard/solar.svg -a SVG \
    DEF:ds0=/var/lib/rrdtool/solar-power.rrd:value:AVERAGE \
    "LINE1:ds0#0000FF:power" \
    "GPRINT:ds0:LAST:current value %2.1lf"
I think most flags are self-explanatory; if not, you can look at the documentation. What interests us here are the last three lines.
The `DEF` line binds the AVERAGE RRA of the variable `value` in the file `/var/lib/rrdtool/solar-power.rrd` to the name `ds0`, which is used later in the command line.
The `LINE1` line associates a legend and a color with the rendering of this variable.
The `GPRINT` line adds text to the legend: here we take the last value of `ds0` and format it with the printf-style string `current value %2.1lf`.
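To preview what such a format string produces without rendering a graph, the shell's own `printf` can be used as a stand-in. This is only an approximation: the shell takes `%f` where rrdtool expects `%lf`, and the sample value is invented.

```shell
#!/bin/sh
# The shell's printf takes %f where rrdtool expects %lf.
# LC_ALL=C keeps the decimal separator a dot.
LABEL=$(LC_ALL=C printf 'current value %2.1f' 1234.567)
echo "$LABEL"   # current value 1234.6
```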
RRDtool is very nice: it's the storage engine behind monitoring software such as collectd or munin, but we can also use it on the spot with simple scripts. However, it has drawbacks. When you start to create many files, it doesn't scale well: it generates a lot of I/O, and it consumes CPU if you need to render hundreds of pictures. That's why a daemon named `rrdcached` was created, to help mitigate the load by handling updates to many RRD files in a more sequential way.
I encourage you to look at the official project website. All the other commands can be very useful, and rrdtool can also export data as XML or JSON if needed, which is perfect for plugging into other software.
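As a hedged sketch of the JSON export, the `xport` subcommand takes the same `DEF` syntax as `graph`. The path and time range below simply reuse the ones from the graph command, and the `power` legend is an arbitrary label; check the `rrdxport` manual page for the options your version supports.

```shell
#!/bin/sh
# Sketch: export the data behind the graph as JSON.
# The RRD path matches the one used in the graph command.
RRD=/var/lib/rrdtool/solar-power.rrd

if command -v rrdtool >/dev/null 2>&1 && [ -f "$RRD" ]; then
    rrdtool xport --json --start end-14000s --end now \
        "DEF:ds0=$RRD:value:AVERAGE" \
        "XPORT:ds0:power"
else
    echo "rrdtool or $RRD not available, nothing exported"
fi
```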