I think I have mentioned before how much I like Tab Separated Value (TSV) files. They are simple to write and read, which makes them easy to manipulate using standard Unix tools like awk. Here is an example TSV file:
Time (s)	Temperature	Distance (m)
0.0	29	0
0.5	29.5	0.8
2.2	31	3
Over the last few months my collection of TSV-related scripts grew. I thought I could write a few more and create a collection of Unix-like utilities for TSV files. The scripts try to mimic Unix utilities in behavior while taking into account the structure of TSV files. For example, tsvtail always prints the TSV header and doesn't count it in the number of lines to print, and tsvcut allows selecting columns by name instead of by index.
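To make the header handling concrete, here is a minimal Python sketch of the tsvtail behavior described above. It is not the actual tsvtail, which is a shell script; the function name tsv_tail and its n parameter are made up for illustration.

```python
from collections import deque


def tsv_tail(lines, n=10):
    """Return the header line followed by the last n data rows."""
    rows = iter(lines)
    header = next(rows, None)
    if header is None:
        return []  # empty input, nothing to print
    # A deque with maxlen keeps only the n most recent rows in memory.
    last_rows = deque(rows, maxlen=n)
    return [header, *last_rows]


# The example file from this post, one string per line.
sample = [
    "Time (s)\tTemperature\tDistance (m)\n",
    "0.0\t29\t0\n",
    "0.5\t29.5\t0.8\n",
    "2.2\t31\t3\n",
]

# The header comes back even though only 2 rows were requested.
tail = tsv_tail(sample, n=2)
print("".join(tail), end="")
```

Asking for 2 lines returns 3: the header plus the last two data rows. A plain tail -n 2 would drop the header.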
The utilities read data from standard input and write to standard output, so you can compose them into pipelines. They are all documented via man pages for quick, offline access to documentation. Most of the utilities are POSIX shell scripts. The exceptions are the Comma Separated Value (CSV) converters, where I used Python's csv module, and tsvplot, which uses gnuplot. I have a few more utilities planned, so the list will grow eventually.
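As a rough idea of why the csv module is useful for the converters, here is a minimal Python sketch of a CSV to TSV conversion. It is an illustration only, not the actual csv2tsv. The function name csv_to_tsv is hypothetical, and replacing tabs and newlines inside fields with spaces is an assumption about how to deal with characters that plain TSV cannot represent.

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Convert CSV text to TSV text, one output line per CSV record."""
    # newline="" leaves line endings untouched so the csv module can
    # handle quoted fields that span several lines.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = []
    for row in reader:
        # Assumption: tabs and line breaks inside a field become spaces,
        # because plain TSV has no quoting mechanism to protect them.
        cleaned = [
            field.replace("\t", " ").replace("\r", " ").replace("\n", " ")
            for field in row
        ]
        out.append("\t".join(cleaned) + "\n")
    return "".join(out)


# A quoted field containing a comma survives as a single TSV field.
example = 'Time (s),Note\n0.0,"warm, sunny"\n'
converted = csv_to_tsv(example)
print(converted, end="")
```

The csv module takes care of quoting and embedded commas, which are the parts that are awkward to get right with awk or sed alone.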
Plot the data in file.tsv using the first column as the x-axis data and all other columns as y-axis data and save the result to plot.png.
tsvplot file.tsv > plot.png
Convert the data in file.csv to TSV, keep the columns whose names match one of the extended regular expressions Time and Distance and then plot the data.
csv2tsv file.csv | tsvcut Time Distance | tsvplot
Keep the columns of file.tsv whose names contain (m), format them as an HTML table and display it in the lynx browser. The parentheses have to be backslash escaped because they are extended regular expression special characters.
tsvcut '\(m\)' < file.tsv | tsv2html | lynx -stdin
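The column selection by name can be sketched in Python as follows. This is not the actual tsvcut. The function name tsv_cut is hypothetical, Python's re module stands in for POSIX extended regular expressions (the two agree for the simple patterns used here), and keeping the selected columns in file order is an assumption.

```python
import re


def tsv_cut(lines, patterns):
    """Keep the columns whose header name matches any of the patterns."""
    rows = iter(lines)
    header = next(rows, None)
    if header is None:
        return []
    names = header.rstrip("\n").split("\t")
    regexes = [re.compile(p) for p in patterns]
    # search() matches anywhere in the name, so a pattern only needs to
    # match part of it unless it is anchored with ^ or $.
    keep = [
        i for i, name in enumerate(names)
        if any(r.search(name) for r in regexes)
    ]
    out = []
    for line in [header, *rows]:
        fields = line.rstrip("\n").split("\t")
        # Short rows get empty fields instead of raising an error.
        picked = [fields[i] if i < len(fields) else "" for i in keep]
        out.append("\t".join(picked) + "\n")
    return out


sample = [
    "Time (s)\tTemperature\tDistance (m)\n",
    "0.0\t29\t0\n",
    "0.5\t29.5\t0.8\n",
    "2.2\t31\t3\n",
]

escaped = tsv_cut(sample, [r"\(m\)"])   # literal "(m)"
unescaped = tsv_cut(sample, ["(m)"])    # a group that matches any "m"
print("".join(escaped), end="")
```

With the escaped pattern only the Distance (m) column is kept. Without the backslashes the parentheses form a group, the pattern matches any name containing the letter m, and all three columns of the example file are selected.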
Sort the rows of file.tsv in descending order based on the values of the column whose name is Temperature, keep the first 5 and save them as a Markdown table in top_5.md.
tsvsort -r '^Temperature