💾 Archived View for thrig.me › blog › 2023 › 05 › 03 › memoization.gmi captured on 2023-11-14 at 08:17:16. Gemini links have been rewritten to link to archived content
Memoization is a fancy term for having a cache of already computed results for something expensive or often calculated. Ideally the cache lookup should be faster than doing the calculation, and hopefully there is enough storage available for the cache. Usually computer science folks trot out the Fibonacci sequence here, but we'll use something simpler to understand, at the cost of the code not making sense.
(defun slow-identity (x) (sleep 5) x)
The sleep makes no sense; it needlessly slows this poor implementation of the Common LISP IDENTITY function. In practice the sleep would instead be some sort of involved calculation, or a database or network request that could take some time. The slowdown does demonstrate the speed-up possible through memoization, which usually happens via a wrapper function that calls the original function only if the output for the given input has not already been cached. An implementation along these lines would look pretty similar in Perl or JavaScript: take a function pointer to call, and wrap that in a returned function that either does a hash lookup or calls the original function pointer to update the hash.
(defun slow-identity (x) (sleep 5) x)

(defun remember (fn)
  (let ((cache (make-hash-table)))
    (lambda (x)
      (multiple-value-bind (old-value exists?)
          (gethash x cache)
        (if exists?
            old-value
            (setf (gethash x cache) (funcall fn x)))))))

(let ((wrapped (remember #'slow-identity)))
  (time (funcall wrapped 'cat))
  (time (funcall wrapped 'cat)))
With a cache hit, subsequent calls of the wrapped function return a bit faster than the original does:
Evaluation took:
  5.004 seconds of real time
  10,581,244,842 processor cycles
  ...
Evaluation took:
  0.000 seconds of real time
  2,226 processor cycles
  ...
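The same wrapper idea sketched in Perl, as mentioned above. The cache is a hash closed over by the returned subroutine; this sketch assumes a single scalar argument that stringifies sensibly as a hash key. The $calls counter is only there to show that the original function runs once per distinct input.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Wrap a function reference in a caching closure. The hash maps an
# input to its previously computed result.
sub remember {
    my ($fn) = @_;
    my %cache;
    return sub {
        my ($x) = @_;
        $cache{$x} = $fn->($x) unless exists $cache{$x};
        return $cache{$x};
    };
}

my $calls = 0;
my $slow_identity = sub { my ($x) = @_; $calls++; return $x };

my $wrapped = remember($slow_identity);
print $wrapped->("cat"), "\n";    # computed
print $wrapped->("cat"), "\n";    # cache hit
print "calls: $calls\n";          # the original ran only once
```

Note that `exists` is checked rather than truthiness of the value, so that cached results such as 0 or the empty string are not recomputed.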
Not everything benefits from memoization; the computation might be fast enough that a cache would only add latency and memory use, or the cache hit rate might be too low to see any benefit, or there may be memory pressure that prohibits the use of a cache. Fibonacci repeats previous calculations, a lot, which is why the compsci types reach for it to showcase memoization. The function being memoized probably should return the same output for the same input (a pure function, if you want a fancy term), and deriving the cache key is more complicated when there are varying numbers of arguments to the function (arity, for yet another fancy term).
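A sketch of just how much Fibonacci repeats itself, in Perl. The call counters are illustrative additions; the memoized version computes each value once and then only pays for cache hits.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Naive recursion recomputes the same subproblems over and over.
my $naive_calls = 0;
sub fib_naive {
    my ($n) = @_;
    $naive_calls++;
    return $n if $n < 2;
    return fib_naive( $n - 1 ) + fib_naive( $n - 2 );
}

# Memoized version: each value lands in the cache exactly once.
my %cache;
my $memo_calls = 0;
sub fib_memo {
    my ($n) = @_;
    $memo_calls++;
    return $cache{$n} //= $n < 2 ? $n : fib_memo( $n - 1 ) + fib_memo( $n - 2 );
}

printf "fib(20) = %d after %d naive calls\n",    fib_naive(20), $naive_calls;
printf "fib(20) = %d after %d memoized calls\n", fib_memo(20),  $memo_calls;
```

The gap between the two call counts grows exponentially with the input, which is why this particular function makes for such a dramatic demo.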
Memoization is covered in more detail in the following, both of which have various interesting things in them.
Paradigms of Artificial Intelligence Programming (PAIP)
All done reading those? Great!
Another neat thing is lazy evaluation, which works on "infinite" lists. Results of some calculation are stored onto a list, the end of which is a function that will on access generate the next item in the list. Naturally bad things will happen if you try to generate too many items for the system to hold in memory, hence it's an "infinite" list, though workable if you stay within reasonable bounds. This applies to many things on computers.
Some languages, Haskell and Raku come to mind, have native support for infinite lists, though they are pretty easy to implement elsewhere. PAIP calls these pipes.
#!/usr/bin/env perl
use 5.36.0;

sub generator (@start) {
    my @seq = @start;
    push @seq, sub {
        # sneak the next value in just before the code reference that
        # points to this subroutine
        splice @seq, -1, 0, ( $seq[-3] + $seq[-2] );
        $seq[-2];
    };
    sub ( $index = -1 ) {
        if ( $index == -1 ) {
            $seq[-1]();    # generate the next entry
        }
        elsif ( 0 <= $index <= $#seq ) {
            return $seq[$index];    # we got this
        }
        else {
            die "we don't got this ($index)";
        }
    }
}

my @start = ( 1, 2 );
my $fn    = generator(@start);
say $_ for @start;
# generate a few values from the sequence
say $fn->() for 3 .. 10;
As for practical uses... uh, those are left to the reader.
On-Line Encyclopedia of Integer Sequences
tags #lisp #perl