💾 Archived View for gem.sdf.org › jquah › extracurriculars › awk-books.gmi captured on 2022-01-08 at 14:02:44. Gemini links have been rewritten to link to archived content

View Raw

More Information

⬅️ Previous capture (2021-12-03)

🚧 View Differences

-=-=-=-=-=-=-

Bookkeeping with awk, R, and octave

Motivation

If Socrates had lived in the age of big data, would he be famous for saying "the unquantified life is not worth living"? In the second decade of the 21st century we already find it completely natural to measure with the utmost precision our distances walked, energy taken in through meals, oxygen inhaled through exercise, blood pressure and heart rate, all recorded automatically by the portable computers we insist on calling "phones" (despite almost never using them to conduct voice conversations across great distances) and other wearable devices. In the preceding decade, while software engineers were laying the groundwork for this trend, I had already allowed my personal record-keeping to encroach on such territory as cash transactions, SmarTrip travels, and music listening choices, but only with the primitive tools of pencil and paper. For easier post-processing of the data, I started making these entries in dedicated software programs, including Grisbi and Gnucash for the monetary transactions. When the lag time of X forwarding and the unavailability of such specialized programs on campus computer labs limited the use of such graphical interfaces to the ever-decreasing hours spent at home, I sought out a record-keeping system that would work within the text-based environment of a terminal window.

Implementation details

Each transaction occupies a single line of a tab-separated text file, with the date as the first field. The next two fields represent the account being debited, and the account being credited. The remaining fields give further details of the transaction, concluding with the dollar amount. For example, to record the payment of a $54 phone bill using my credit card, I might write the line

20201220	card	util	-	N	phone bill	54.

Once the payment appears in my online statement, the fifth field gets changed from N to Y, indicating that the bank and I should be in agreement about the balance.

Two invocations of awk can scan the line above and create separate "double entry" records, with the "credit card" account debited by $54 and the "utilities" account credited by $54. In order to know the starting balances (on which subsequent debits and credits are performed), the text file begins with one line for each asset, using similar syntax:

20200101	eqty	cash	-	Y	starting balance	10.
20200101	eqty	citi	-	Y	CitiBank balance	500.
20200101	eqty	card	-	Y	credit card debt	0.

Summaries of the week-by-week totals in selected expense accounts are then produced by running

this awk script

with the text file of transactions as its argument.

After saving as histwk.txt the first part of the awk output (everything up to the current balances), the following octave script will populate a vector of cumulative sums for each tracked expense category, and generate a stacked area plot to visualize where the money is going over the course of the current year.

#!/usr/bin/env octave

fid=fopen("histwk.txt")
numrows=52;

[dataFrame,count]=fscanf(fid,"%g %e %e %e %e %e %e",[7,numrows]);

t=dataFrame(1,:)';
Inco=cumsum(dataFrame(2,:)');
Haus=cumsum(dataFrame(3,:)');
Util=cumsum(dataFrame(4,:)');
Food=cumsum(dataFrame(5,:)');
Trvl=cumsum(dataFrame(6,:)');
Andr=cumsum(dataFrame(7,:)');

area(t,[Haus,Util,Food,Trvl,Andr]);
xlabel("weeks since Jan 1");
ylabel("Year-to-Date Expenses and Income");
legend("Housing","Utilities","Food","Transportation","Other");
legend("location", "north");

hold on
plot(t,Inco,"-;Income;");

A similar result may be obtained using base R and ggplot:

stackedArea.r

Transferrable skills learned from this project

Back to Extracurriculars