## Shell Scripting ############################################################

> ACM 2018-11-07

==== Topics ===================================================================

------ Introductory -----------------------------------------------------------


	* convert
	* pandoc
	* ps
	* lsof
	* sed basics (s///g)
	* awk basics (field extraction)
	* grep
	* find








------ Intermediate -----------------------------------------------------------



















------ Advanced ---------------------------------------------------------------






















==== Emperor Sh and the Traveller =============================================

A young traveller came to visit Emperor Sh, and found him sitting in his
sparsely furnished temple.

"Emperor Sh," he said, "I am told you are the greatest scholar of shell that
the world has known."

Emperor Sh made no reply. The traveller continued.

"I have come to ask your advice. I am thinking of developing a character-based
graphing tool. It will interactively change the plot based on key presses. What
shell commands should I use?"

"Don't do it in shell," said Emperor Sh, curtly.

The young traveller was confused. He tried again. "Well, I am also working on a
database audit script. It needs to validate that certain characters do not
appear in any fields in several tables."

"Don't do it in shell," repeated Emperor Sh.

The traveller began to despair. "I have journeyed one thousand miles, tried
thirteen distributions of my operating system, and waded through hundreds of
badly-written manual pages," he cried, "and now I am finally come to Emperor
Sh, the greatest shell programmer in the world, and am told to use no shell at
all! I may as well do what my friends told me, and just string together some
small Python scripts in virtualenv environments!"

"Good idea," said Emperor Sh. "Do it in shell."

Enlightenment crushed down on the young traveller and he bowed to the emperor,
sobbing with reverent joy.

> Source: https://sanctum.geek.nz/etc/emperor-sh-and-the-traveller.txt

==== Summary ==================================================================

------ Return Codes -----------------------------------------------------------























------ Flow Control -----------------------------------------------------------

if expression ; then
	command
fi

[ -f "$something" ]

[ -z "$somevar" ]

[ "thing" = "thing2" ]

while expression ; do
	command
done

for varname in list ; do
	command
done
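
A minimal sketch tying the three constructs together. The helper name
count_regular_files and the countdown are made up for illustration:

```shell
# Hypothetical helper: count how many of the given paths are regular files.
count_regular_files() {
	count=0
	for path in "$@" ; do              # for: iterate over a list
		if [ -f "$path" ] ; then       # if: is this path a regular file?
			count=$((count + 1))
		fi
	done
	echo "$count"
}

# while: keep looping until the test fails
i=3
while [ "$i" -gt 0 ] ; do
	echo "countdown: $i"
	i=$((i - 1))
done
```

Calling count_regular_files with a mix of files, directories, and missing paths
prints only the number of regular files.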






------ Command Substitution ---------------------------------------------------



echo "the current date is $(date)"
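
A short sketch of two other common uses: capturing output in a variable, and
nesting one substitution inside another. The path /usr/local/bin/tool is a
made-up example and does not need to exist:

```shell
# Capture a command's stdout in a variable; trailing newlines are stripped.
today=$(date +%Y-%m-%d)
echo "today is $today"

# Substitutions written as $(...) nest without extra escaping.
parent_name=$(basename "$(dirname /usr/local/bin/tool)")
echo "$parent_name"   # prints "bin"
```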




















------ Re-Direction -----------------------------------------------------------

Re-direct standard in

	command < stdin.txt

Standard out:

	command > stdout.txt

or

	command 1> stdout.txt

Standard error:

	command 2> stderr.txt

All at once:

	command < input.txt > stdout.txt 2> stderr.txt

Merge stdout and stderr:

	command > output.txt 2>&1

stdout of one command to stdin of another:

	command1 | command2

stdout of one command as a file name (process substitution; supported by bash,
zsh, and ksh, but not by POSIX sh):

	command1 <(command2)
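
To see the redirections in action, here is a small sketch. The function noisy
and the file names are invented for the example:

```shell
# Hypothetical command that writes one line to each output stream.
noisy() {
	echo "to stdout"
	echo "to stderr" >&2
}

workdir=$(mktemp -d)

noisy > "$workdir/out.txt" 2> "$workdir/err.txt"   # streams kept apart
noisy > "$workdir/both.txt" 2>&1                   # streams merged

cat "$workdir/out.txt"    # to stdout
cat "$workdir/err.txt"    # to stderr
cat "$workdir/both.txt"   # both lines
```

Order matters when merging: "command 2>&1 > output.txt" points stderr at the
original stdout first, so only stdout ends up in the file.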

------ Useful Tools -----------------------------------------------------------

~~~~~~~~ ImageMagick (convert) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Convert allows you to perform a wide variety of transformations and conversions
on various image formats. For example:

	convert foo.jpg foo.png

Would convert a JPEG image to a PNG.

>>> More here: http://www.imagemagick.org/Usage/

Combining with the loops we discussed already:

	for f in *.jpg ; do convert "$f" "$(basename "$f" .jpg).png" ; done

Would convert every JPEG in ./ to PNG, with the same name but a changed
extension. *.jpg is a path expansion, which we will discuss later. "$(basename
"$f" .jpg)" is a command substitution, as covered above.







~~~~~~~~ pandoc ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pandoc allows one to convert nearly any type of document to nearly any other
type, though not always very well (jack of all trades, master of none).

For example:

	pandoc README.md --output README.pdf --from markdown --to latex

Would convert README.md to a PDF via LaTeX.

>>> More examples here: https://pandoc.org/demos.html
















~~~~~~~~ ps ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ps lists running processes. It is relatively common across POSIX systems,
although the output format can vary in some cases. Usually, you want "ps aux",
which lists everything.

Hint: you can get a list of just process names and PIDs with

	ps aux | awk '{print($2, $11);}'

Note that this is not robust against commands with spaces in the name.














~~~~~~~~ lsof ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

lsof lists all open file handles and sockets. It is not specified by POSIX, but
it is widely available (Linux, macOS, the BSDs). Useful for figuring out which
process has a lock on a file (e.g. lsof | grep "somefile").





















~~~~~~~~ sed ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

sed exposes most of the functionality of the ed text editor in an automated,
stream-oriented interface. The capabilities of sed are expansive[1]; we'll just
cover the basics today.

The most common use for sed is re-writing patterns in streams, for example:

	echo -e "line 1\nline 2\nline 3" | sed 's/line/number/g'

This example would output:

	number 1
	number 2
	number 3

>>> See also the famous sed1line.txt: http://sed.sourceforge.net/sed1line.txt

>>> Danger: GNU sed and BSD sed are different and often incompatible. Be
careful out there!

1: https://aurelio.net/projects/sedarkanoid/



~~~~~~~~ awk ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Awk is an extremely powerful record-processing oriented DSL. The full depth of
awk is beyond the scope of this talk. For our purposes, it is most useful for
extracting fields from columnar data with irregular separators (e.g. ps aux).

	awk '{print($1, $2, $3... $n);}'

Will extract columns 1, 2, 3... up to n. You can change the order, separator,
or selection to your liking.
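
As a concrete sketch of changing the order and separator, using made-up
colon-separated records:

```shell
# Sample colon-separated records (invented for this example).
records='alice:1001:/home/alice
bob:1002:/home/bob'

# -F sets the input field separator; print fields 3 and 1, in that order.
printf '%s\n' "$records" | awk -F: '{ print $3, $1 }'
# /home/alice alice
# /home/bob bob

# -v OFS sets the output separator placed between comma-separated items.
printf '%s\n' "$records" | awk -F: -v OFS=',' '{ print $1, $2 }'
# alice,1001
# bob,1002
```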

>>> Highly suggest "The Awk Programming Language": https://www.amazon.com/Programming-Language-Kernighan-Weinberger-Paperback/dp/B00LLOFNOW

(hint: PDFs are relatively easy to find on the net)










~~~~~~~~ grep ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Grep is used to search for patterns based on regular expressions.

Select every line containing foo from a file:

	grep "foo" < file.txt

Find every file recursively under . containing the string foo:

	grep -R "foo" ./

Note that some versions of grep support PCREs via the -P flag - this is not
portable. Instead just invoke perl directly:

	grep -P "[a-z]{4}"

is equivalent to:

	perl -ne 'print if /[a-z]{4}/'





~~~~~~~~ find ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Find can be used to search for files by their name or path.

Find all files containing the string foo in the filename:

	find . -name "*foo*"

Use -iname for case insensitive, and -path/-ipath to search through other path
components.

If you want to use find as the input to a loop, use -print or -print0 - this
helps to resolve string escaping issues. A common idiom is:

	find . -iname "pattern" -print | while read -r line ; do something ; done

Or

	find . -iname "pattern" -print0 | xargs -0 somecommand

You can also use find -exec, but this is generally not preferred as it is less
flexible.
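
A slightly more careful version of the while-read idiom, as a sketch. The helper
name list_text_files and the file names are made up:

```shell
# Hypothetical helper: print the base name of every .txt file under a directory.
list_text_files() {
	# IFS= preserves leading/trailing blanks; -r keeps backslashes literal.
	find "$1" -name "*.txt" -print | while IFS= read -r path ; do
		printf '[%s]\n' "$(basename "$path")"
	done
}

# Scratch tree with a made-up file name containing consecutive spaces.
tree=$(mktemp -d)
touch "$tree/plain.txt" "$tree/with  two spaces.txt"
list_text_files "$tree"
```

This still breaks on file names containing newlines; the -print0 | xargs -0 form
is the one that copes with those.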



------ Check if a Command is in $PATH -----------------------------------------

Often, we want to check if a specific command is available for use - this is
especially useful for sanity checking in scripts. A common idiom is:

	if [ ! -x "$(which coolcommand 2> /dev/null)" ] ; then
		echo "oops, looks like coolcommand isn't installed!"
		exit 1
	fi

Note that which is not specified by POSIX, and its output and exit status vary
between systems. command -v is the POSIX-specified alternative, although very
old shells may lack it.

$(which coolcommand 2> /dev/null) gets the path to the coolcommand executable
found in $PATH. The 2>/dev/null discards any error message. If no such command
is found, then the test simplifies to [ ! -x "" ], which is always true.
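
For comparison, here is a sketch of the same check using command -v, which is
specified by POSIX. The helper name have_command is an invention for this
example; a real script would typically follow the message with exit 1:

```shell
# Hypothetical helper: succeed if the named command can be found.
have_command() {
	command -v "$1" > /dev/null 2>&1
}

if ! have_command coolcommand ; then
	echo "oops, looks like coolcommand isn't installed!" >&2
fi
```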









------ lockdirs ---------------------------------------------------------------

Often, we want to guarantee that only one instance of our program is running at
any given moment. There are many ways to accomplish this, but the simplest
would be a lockdir. A lockdir is the convention of creating a specific
directory when the program starts, and deleting it when it exits. This is
preferred over creating a file (e.g. with touch), since directory creation is
atomic on most systems.

Here is an example:

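	mkdir -p "/var/lockdir" # ensure /var/lockdir exists

	# mkdir fails if the directory already exists, so checking for the lock
	# and taking it happen in a single atomic step
	if ! mkdir "/var/lockdir/myprog" 2> /dev/null ; then
		echo "oops, myprog already running!"
		exit 1
	fi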

Note that the method shown above is not robust against a program crashing
before it deletes the lockdir.
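
One way to cover the normal-exit case is a trap that removes the lockdir. This
is only a sketch: the helper run_locked is made up, and a temporary directory
stands in for /var/lockdir so it can be tried without root:

```shell
lockbase=$(mktemp -d)        # stand-in for /var/lockdir
lockdir="$lockbase/myprog"

# Hypothetical wrapper: run a command while holding the lock.
# The body is a subshell, so exit and trap only affect the wrapper.
run_locked() (
	if ! mkdir "$lockdir" 2> /dev/null ; then
		echo "oops, myprog already running!" >&2
		exit 1
	fi
	# Remove the lock when this subshell exits.
	trap 'rmdir "$lockdir"' EXIT
	"$@"
	exit $?
)

run_locked echo "doing work while holding the lock"
[ -d "$lockdir" ] || echo "lock released"
```

Some shells do not run the EXIT trap when killed by a signal, and nothing can
clean up after SIGKILL or a power loss, so stale locks remain possible.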




------ Background Jobs --------------------------------------------------------

Any command you suffix with & will be run in the background. You can get a list
of jobs you have backgrounded with the "jobs" command.

A common idiom is:

	somecommand > /tmp/output.txt &
	# do other things
	wait # block until all background jobs exit
	# do things with output.txt

You can suspend a running program (e.g. vim) via <C-z>, and later resume it in
the foreground via the "fg" command. This is useful for the workflow of
edit->run->edit.

You can also detach a background job from your shell session with disown (i.e.
exiting your terminal will not halt the program). Example:

	atril something.pdf &
	disown
	exit

Would result in atril continuing to run, despite the terminal session being
closed.
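
If you need the exit status of a particular background job, $! holds the PID of
the most recent one and wait accepts a PID. A sketch, with two made-up jobs:

```shell
# Start two hypothetical background jobs and remember their PIDs via $!.
( sleep 1 ; exit 0 ) &
ok_pid=$!
( sleep 1 ; exit 3 ) &
bad_pid=$!

# wait PID blocks until that job finishes and returns its exit status.
ok_status=0  ; wait "$ok_pid"  || ok_status=$?
bad_status=0 ; wait "$bad_pid" || bad_status=$?

echo "first job exited with $ok_status, second with $bad_status"
```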

------ Math in Shell Scripts --------------------------------------------------

The best way to get accurate math results in shell scripts is via the bc
command. There are also various ways of performing math operations natively,
but these are cumbersome, and most shells do not support floating point math.

Add two numbers:

	A=7
	B=3
	echo "$A + $B" | bc

Floating point divide:

	A=7
	B=3
	echo "scale=2 ; (1.0 * $A) / $B" | bc

Hint: scale sets the number of digits kept after the decimal point. By default,
scale is 0, so division truncates to an integer. bc has a single
arbitrary-precision number type rather than separate ints and floats, so the
multiplication by 1.0 is harmless but not required; scale alone decides the
precision of the division.

You can use command substitution to capture the result of your calculations.

Note that this method (bc) can be very slow. If this is too slow for your
application, think long and hard about if it really needs to be implemented as
a shell script.
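
For integer work, the native route is POSIX arithmetic expansion, which avoids
spawning a process. The fixed-point trick below is one possible workaround for
the lack of floats, not a replacement for bc:

```shell
A=7
B=3

# Arithmetic expansion handles integers only; division truncates.
sum=$((A + B))
quotient=$((A / B))
echo "$sum $quotient"            # 10 2

# Fixed-point trick: scale up before dividing, then split the result into
# whole and fractional parts (two decimal places here, truncated not rounded).
hundredths=$((A * 100 / B))
ratio=$(printf '%d.%02d' "$((hundredths / 100))" "$((hundredths % 100))")
echo "$ratio"                    # 2.33
```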

------ xargs ------------------------------------------------------------------

xargs allows you to read the arguments to a program from standard input.

Let's look at an example:

	echo -e "line 1\nline 2\nline 3\nline 4" | xargs echo "my arguments are: "

Outputs:
	my arguments are:  line 1 line 2 line 3 line 4

We can force xargs to only provide one argument per call:

	echo -e "line 1\nline 2\nline 3\nline 4" | xargs -n1 echo "my arguments are: "

Outputs:

	my arguments are:  line
	my arguments are:  1
	my arguments are:  line
	my arguments are:  2
	my arguments are:  line
	my arguments are:  3
	my arguments are:  line
	my arguments are:  4

Or an arbitrary number:

	echo -e "line 1\nline 2\nline 3\nline 4" | xargs -n4 echo "my arguments are: "

Outputs:

	my arguments are:  line 1 line 2
	my arguments are:  line 3 line 4
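
If what you actually want is one call per input line, spaces and all, the -I
option is one way to get it. A small sketch (printf is used here instead of
echo -e purely for portability):

```shell
# -I {} runs the command once per input line and substitutes the whole
# line (spaces included) wherever {} appears.
printf 'line 1\nline 2\nline 3\n' | xargs -I {} echo "my argument is: {}"
# my argument is: line 1
# my argument is: line 2
# my argument is: line 3
```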

------ Officially Unofficial sh Strict Mode -----------------------------------

sh has an unofficial "strict mode", which you should usually use unless you have
a reason not to. Executing

	set -e
	set -u

At the beginning of your program will cause it to exit with an error if any
command fails without the failure being handled (set -e), or a variable you
attempt to use is undefined (set -u).

This can save you from cases like this:

	# ... some code ...
	rm -rf $HOME/$SOMEPATH

Without set -u, if SOMEPATH is undefined, this will simplify to:

	rm -rf $HOME/

Which is almost certainly not what you intend.
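
A harmless way to watch this happen, sketched with echo in place of rm. The
function names unset_demo and empty_demo are made up:

```shell
# Each demo runs in a subshell (note the parentheses around the function body)
# so the failure does not take the calling shell down with it.
unset_demo() (
	set -u
	unset SOMEPATH
	echo "would remove: $HOME/$SOMEPATH"   # never reached
)

# ${VAR:?message} guards a single expansion and also rejects an empty value,
# which set -u alone lets through.
empty_demo() (
	SOMEPATH=""
	: "${SOMEPATH:?must be set and non-empty}"
	echo "would remove: $HOME/$SOMEPATH"   # never reached
)

for demo in unset_demo empty_demo ; do
	if "$demo" 2> /dev/null ; then
		echo "$demo: ran to completion"
	else
		echo "$demo: aborted early"
	fi
done
```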



------ Debugging Shell Scripts ------------------------------------------------

A good way to debug your shell scripts is via "set -x". This causes every
line of shell which is executed to also be printed.
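
A tiny sketch of what that looks like. The function trace_demo is made up, and
its body is a subshell so the tracing does not leak into the rest of the
script:

```shell
# Hypothetical function traced in a subshell so set -x stays contained.
trace_demo() (
	set -x
	name="world"
	echo "hello $name"
)

trace_demo
```

The trace goes to standard error, each line prefixed with "+ " by default, so
the normal output on standard out is left untouched. The exact quoting shown in
the trace differs between shells.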





















------ Abusing /proc for Fun and Profit ---------------------------------------

On Linux only (or FreeBSD with procfs emulation enabled), /proc allows you to
enumerate the file handles and sockets of all running processes. In particular,
/proc/$PID/fd/ contains all open file descriptors for the process with ID $PID.
Remember that FD 0 will be standard in, FD 1 standard out, and FD 2 standard
error. Prank your friends by writing to standard out of their shell sessions!

You can also use this method to send input into the standard in of a detached
process. Consider a case like this:

	apt install somepackage &
	disown

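Then apt prompts for input and hangs until you provide it. Just use ps aux to
find the PID of your wayward apt instance and printf 'y\n' into its standard
input! Note that this only works when that standard input is a pipe or FIFO; if
it is a terminal, the text is merely displayed on that terminal.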








------ Path Expansions and YOU ------------------------------------------------

When writing shell scripts, it is important to understand path expansions.
Generally, in any word which is not quoted, characters like '*' and '~' will be
expanded by the shell before being passed to any child processes.

Consider this shell session:

	$ ls
	a     b     c     d.txt e.txt
	$ echo *
	a b c d.txt e.txt
	$ echo *.txt
	d.txt e.txt

Notice that echo itself does not expand the argument '*' - the shell does this
before echo even starts running.

This can cause some interesting behaviors. For example, consider this command:

	rm -rf ./.*

What's wrong here? 


		.
		.
		.
		.
		.
		.
		.
		.
		.

The issue with this command is that ./.* expands to every child of ./ that
begins with '.' - including '..'. rm -rf will thus descend into '..' and
process the parent directory of ./ as well, destroying everything under it,
siblings of ./ included. Newer versions of rm refuse to operate on '.' and
'..', however, so this behavior has been disabled for safety.
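
If you really do want to match only the hidden entries, one commonly used pair
of patterns is sketched below. The helper name list_dotfiles and the file names
are made up:

```shell
# Hypothetical helper: list hidden entries of a directory, never '.' or '..'.
list_dotfiles() (
	cd "$1" || exit 1
	# .[!.]* matches a dot followed by a non-dot character;
	# ..?*   matches two dots followed by at least one more character.
	for entry in .[!.]* ..?* ; do
		# A pattern that matches nothing is left as-is, so skip it.
		if [ -e "$entry" ] ; then
			echo "$entry"
		fi
	done
)

scratch=$(mktemp -d)
touch "$scratch/.hidden" "$scratch/..double" "$scratch/visible"
list_dotfiles "$scratch"
# .hidden
# ..double
```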

------ Portable Path Normalization: Why We Can't Have Nice Things -------------

Often, when we deal with paths provided to us by the user, we want to normalize
the path to be "nice". For example, converting "../../foo/../bar/baz/././" into
its more comprehensible equivalent: "../../bar/baz/". Unfortunately, there
isn't a single standardized method of accomplishing this. The `realpath`
command was written long ago to handle this specific case, but it has yet to
become standard even across Linux distributions, let alone across other UNIX
variants.

This can cause all kinds of havoc for unsuspecting shell scripts, and there
isn't really a clean or non-kludgey solution. The only real, portable way of
handling the path normalization issue is to just bundle your own realpath
implementation that will run in POSIX sh. Fortunately, Mr. Michael Kropat has
made available a high-quality MIT licensed implementation.

>>> sh-realpath: https://github.com/mkropat/sh-realpath
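
For the narrow case of a directory that already exists, a cd plus pwd -P sketch
can get you partway. This is not a realpath replacement: it returns an absolute
path rather than a tidied relative one, and it fails for paths that do not
exist. The helper name abs_dir is made up:

```shell
# Hypothetical helper: print the absolute, symlink-free path of an existing
# directory. Runs in a subshell so the caller's working directory is untouched.
abs_dir() (
	cd "$1" 2> /dev/null && pwd -P
)

# Made-up scratch tree mirroring the example path above.
base=$(mktemp -d)
mkdir -p "$base/foo" "$base/bar/baz"
abs_dir "$base/foo/../bar/baz/././"
```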