💾 Archived View for wilw.capsule.town › log › 2022-04-24-ledger-python-visualisation.gmi captured on 2024-08-18 at 17:20:10. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Simple Ledger visualisations using Python

Posted on 24 April 2022

If you're a current follower of this blog then you may already know that I'm a bit of a fan of using plain text accounting for managing finances.

I mainly use the Ledger [1] text file format and CLI tool for bookkeeping and reporting on finances. This works great, and I can quickly and easily generate different kinds of reports using a range of simple Ledger commands.

For example, to generate a quick income/expenses balance sheet for a particular date range I can run `ledger balance income expense -b 2022/03/01 -e 2022/03/31`, which produces something along the lines of the following:

             £381.50  Expenses
             £129.87    Hosting
             £185.00    Services:Accountancy
              £66.63    Software
          £-1,000.00  Income:Sales:Contracting
--------------------
            £-618.50

Whilst this approach allows for easy numeric reporting for accounting purposes, by default there are no built-in ways for _visualising_ the flow of money.

In this post I'll talk a little about how to use Python to generate a simple visualisation of financial flow when provided with a Ledger export. This consists of three key steps:

Outputting data from Ledger in a readable format;
Processing the data;
Outputting a diagram that visually represents the data.

Step 1: Outputting Ledger data

Ledger provides a number of commands that allow for exporting and reporting on a ledger file. I recommend reading the documentation [2] for a full idea. However, in this case, we can simlpy use the `csv` command to generate an output that can be easily piped into a Python program.

For example, `ledger csv income expense -b 2022/03/01 -e 2022/03/31` produces an output with each transaction on its own row in CSV format, along the lines of below:

"2022/03/01","","Linode","Expenses:Hosting","£","40.58","",""
"2022/03/01","","Company1","Income:Consultancy","£","-120.5","",""
"2022/03/01","","Accountancy Company","Expenses:Services","£","200.28","",""
"2022/03/03","","Company2","Income:Sales","£","-900.0","",""
...

We can write some simple Python now to receive this data into our program. Create a new Python source file (e.g. `viz.py`) and add these contents:

import sys

# Define a Transaction class
class Transaction:
  def __init__(self, account, value):
    self.account = account # The account name for this transaction
    self.value = 0 - float(value) # Income is negative

# Prepare a list of transactions by reading from stdin
transactions = []
for line in sys.stdin:
  line = line.replace('"', '').split(',')
  transactions.append(
    Transaction(line[3], line[6])
  )

The code above reads in lines from `stdin` and instantiates `Transaction` objects for each line. We can invoke this code by piping the output from Ledger directly into the program:

ledger csv income expense -b 2022/03/01 -e 2022/03/31 | python viz.py

In this post, I'm only really interested in the financial flow (i.e. the values of transactions), and so we don't need to capture payees or other information from the transactions.

Step 2: Data processing

The next step is to process the transaction data. My aim is to achieve a Sankey [3] diagram that shows the flow from income to expense accounts, and to illustrate any profit.

As such, we need to build some account data for the various accounts in our Ledger file.

To do so, add the following code to your `viz.py` file:

# Instantiate an Account class
class Account:
  def __init__(self, name, value):
    print(name, value)
    self.name = name
    self.value = value

  # In the Sankey diagram, show income coming in from above and expense going downwards:
  @property
  def orientation(self):
    if self.value < 0: return -1
    if self.value > 0: return 1

accounts = []
for transaction in transactions:
  t_accounts = transaction.account.split(':')
  top_level = t_accounts[0]
  account = next(filter(lambda a: a.name == top_level, accounts), None)
  if not account:
    account = Account(top_level, 0)
    accounts.append(account)
  account.value += transaction.value

# Calculate profit
profit = sum(map(lambda t: t.value, transactions))
if profit > 0:
  accounts.append(Account('Profit', 0 - profit))

In the above code, we declare an `Account` class and then iterate through our transactions to build up a list of `Account` objects, with each having a total value representing the sum of all transactions involving that account. In this code, for now, we only consider the top level accounts (i.e. `Income`, `Expense`) rather than deeper levels (like `Income:Sales`).

In the last few lines above, we work out if the reporting period is profitable. If so, we manually create a `Profit` account with the left over value so that this can be visualised in the diagram.

Step 3: Producing the diagram

We'll make use of the `matplotlib` package for creating its own version [4] of a Sankey diagram. Install this package in your environment (`pip install matplotlib`) and then import the required modules at the top of `viz.py` (below the `sys` import):

from matplotlib.sankey import Sankey
import matplotlib.pyplot as plt

Then, at the bottom of your `viz.py` file, add the following code:

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, xticks=[], yticks=[], title="Ledger Financial Data")
sankey = Sankey(ax=ax, unit='£')
sankey.add(
  flows=list(map(lambda a: a.value, accounts)),
  labels=list(map(lambda a: a.name, accounts)),
  orientations=list(map(lambda a: a.orientation, accounts)),
  patchlabel="Income and expense"
)
sankey.finish()
plt.show()

In the above code we instantiate a new figure, axis, and Sankey diagram. We then add a new section to the diagram (using `add()`) and to this pass a bunch of parameters:

`flows`: a list of total values in each account
`labels`: a list of account names for each account
`orientation`: a list of orientations for the Sankey arrows
`patchlabel`: a title for the Sankey section

We then finalise the diagram and ask for the figure to be shown.

If you now run your code, as described in "Step 1", you should see something like the following being produced.

The diagram shows us some basic information about income, expense, and profit, but it would be nice to get a bit more of a breakdown.

Step 4: Add sub-accounts to the visualisation

To display more of a granular breakdown in the diagram, we need to make changes to the processing step (Step 2) and the display step (Step 3) above.

To begin with, modify the `Account` class so we can provide it with a reference to the account's parent (if it exists):

class Account:
  def __init__(self, index, name, value, parent=None):
    self.name = name
    self.value = value
    self.parent = parent # A reference to another Account object
    ...

Next, modify the loop in which we create the accounts to detect parent accounts:

accounts = []
for transaction in transactions:
  t_accounts = transaction.account.split(':')
  for level, a in enumerate(t_accounts):
    account_name = ':'.join(t_accounts[:level + 1])
    parent_account_name = ':'.join(t_accounts[:level])
    account = next(filter(lambda a: a.name == account_name, accounts), None)
    if not account:
      parent_account = None
      if parent_account_name:
        parent_account = next(filter(lambda a: a.name == parent_account_name, accounts), None)
      account = Account(len(accounts), account_name, 0, parent_account)
      accounts.append(account)
    account.value += transaction.value

The code above performs various additional lookups to try and find parent accounts, and instantiates all accounts at each level in the Ledger file.

Finally, we can adjust the diagram-producing code to make use of the various accounts:

sankey = Sankey(ax=ax, unit='£')
for parent_account in filter(lambda a: not a.parent, accounts):
  children = list(filter(lambda a: (a.parent and a.parent.name) == parent_account.name, accounts))
  if len(children):
    sankey.add(
      flows=list(map(lambda a: a.value, children)),
      labels=list(map(lambda a: a.name, children)),
      orientations=list(map(lambda a: a.orientation, children)),
    )
  else:
    sankey.add(
      flows=[parent_account.value],
      labels=[parent_account.name],
      orientations=[parent_account.orientation],
    )
sankey.finish()
plt.show()

This code now iterates through each parent account (i.e. each account without a parent of its own). It then looks up the children of these accounts and creates a new Sankey segment containing the child accounts (or a single account for the parent if there are no child accounts).

Running the script again with the Ledger data should yield something like the following diagram.

We now get more useful information being displayed, and we can see - at a glance - where the money is going.

Conclusion

Although we have made improvements to the diagram, there are still possible further enhancements that you may wish to explore, depending on your needs. For example, the current approach only goes two levels deep; the display stage would need to be amended to recursively navigate account levels to get an even finer-grained view.

I can recommend taking a look through the `matplotlib` Sankey documentation [5] for more information on how you can tweak the diagram, or the documentation in general [6] should you wish to explore other types of visualisations that you can pass the processed transactions in to.

Reply via email

Back to gemlog