💾 Archived View for freeside.wntrmute.net › log › 2021 › 20210525.gmi captured on 2023-07-22 at 16:35:13. Gemini links have been rewritten to link to archived content
⬅️ Previous capture (2022-03-01)
-=-=-=-=-=-=-
As I work through this book, I'm going to keep writeups going. They're in my Dendron notes, but I'll also publish them here.
Classic Computer Science Problems in Java
However, many people get into a rut, whether they be Android developers or enterprise web developers, where most of their time with the language feels like "API mashup." Instead of working on interesting problems, they find their time being spent learning every corner of an SDK or library. This book aims offer those programmers a respite.
My day job as an engineering tech lead doesn't involve much programming these days. It's mostly writing design docs, reviewing other people's code, and so forth. I wanted to learn a new language and not to learn a bunch of libraries first. This book stood out to me for that. I considered doing the Python version, but the idea of learning a new language appeals to me. I had started trying to learn Java for another project, and I think this book will give me some interesting problems to learn the language better and solve some interesting problems along the way.
In my writeups, I'll use Python for some of the examples because it's more pseudocode like and I feel like it's a better illustrative method. It'll also force me to understand the problem and not just have a rote memorization of some Java code.
The first chapter covers some of the fundamental building blocks used in solving problems: recursion, memoization, compression, rote translation, and bit manipulation.
Recursive methods call themselves; to avoid infinite recursion, we need to identify base cases. These are the cases where recursion stops. A canonical example is a recursive fibonacci sequence:
def fib(n): # Base case. if n < 2: return n # Recurse. return fib(n-1) + fib(n-2)
This example recomputes fibonacci values repeatedly. We can speed this up by caching results, which is called memoization.
__FIB_CACHE__ = {} def fib_memo(n): if n < 2: return n if not n in __FIB_CACHE__: __FIB_CACHE__[n] = fib_memo(n-1) + fib_memo(n-2) return __FIB_CACHE__[n]
The speedup from memoization can be pretty significant:
In [7]: %timeit fib(35) 2.73 s ± 41 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) In [8]: %timeit fib_memo(35) 125 ns ± 0.215 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
Recursion is often an elegant and intuitive way to solve a problem, but recursive solutions usually come with a performance hit. Any problem that can be solved recursively can also be solved iteratively.
def fib_iter(n): last = 0 current = 1 for i in range(n): prev_last = last last = current current += prev_last return last
We can compare the runtime for this approach with the recursive approaches above:
In [11]: %timeit fib_iter(35) 1.71 µs ± 33.6 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
The memoized approach is faster (~125 ns vs ~1710 ns), but it comes with the higher storage cost of the cache. In this case, it's trivial (~3K) but it's a factor in deriving solutions.
This chapter also covered trivial compression. In trivial compression, we reduced the representation space of a problem. Genetic sequences are comprised of sequences of one of four nucleotides, represented as A, C, G, and T. An uncompressed approach would be to represent these with only two bits: A->00, C->01, G->10, T->11. A single character in Java takes up 16 bits, so this represents a saving of 8x. For large sequences of genes, this could add up. In practice, real compression typically works by finding patterns or structures in the data to eliminate repeated data. Compression offers a way to trade computational complexity for storage space; this tradeoff will have to be evaluated when choosing whether or not (and how) to compress data.
The Towers of Hanoi is a well-known CS problem. We solved it recursively in this chapter, but with the note that as the number of discs increases, the time taken increases exponentially. Carl Burch wrote an article about the subject that goes into more depth.
I learned about the BitSet type; it's interesting, but not particularly ergonomic. Exercise #2 is
The BitSet class in the Java standard library has a flaw: while it keeps track of how many total bits have been set true, it does not keep track of how many bits have been set in total, including bits that have been set false (that’s why we needed the length instance variable). Write an ergonomic subclass of BitSet that keeps track of exactly how many bits have been set either true or false. Reimplement CompressedGene using the subclass.
This is sort of open-ended. At first, I took the approach of writing a class with two `BitSet`s; one containing the bit values, the other noting which bits had been set ever. However, in the context of this problem, I decided on a different approach. I wrote a `setNext` method that would set the next bit in the set. In both cases, I also added a `getValue` method that would handle reading multiple bits and returning the combined value. The class I wrote is
package net.kyleisom.ccspij.chapter1; import java.util.BitSet; public class Bits { private final BitSet bits; private int size; public Bits(int size) { bits = new BitSet(size); size = 0; } public void setNext(boolean value) { bits.set(size, value); size++; } public boolean get(int bit) { return bits.get(bit); } public int getInt(int bit) { return get(bit) ? 1 : 0; } public int getSize() { return size; } public int getValue(int start, int count) { int value = 0; final int stop = start + count - 1; for (int i = start; i < stop; i++) { value += getInt(i); value <<= 1; } value += getInt(stop); return value; } }
If I wanted to make it more generally useful, I could add a seek index, and maybe a max index to keep track of the furthest bit written.