💾 Archived View for gem.splatt9990.com › old_stuff › 20210104-Schwartzian_Transform.gmi captured on 2024-12-17 at 09:51:47. Gemini links have been rewritten to link to archived content

View Raw

More Information

⬅️ Previous capture (2023-05-24)

-=-=-=-=-=-=-

Schwartzian Transforms

So there's a useful coding technique called a Schwartzian Transform (named after Randall Schwartz) that's famous within the Perl community but is relatively unknown outside of it.

Randall Schwartz

The Schwartzian transform is a way to sort a list (generally of objects, but that's not required) by a computed key without needing to compute the key more than once per item. It comes from a similar lisp idiom called 'decorate, sort, undecorate'. More or less, the idea is that you map over the list to compute the sort key, returning an array with two elements (a 'pair' or 'tuple' in some langugages), the original item and the computed sort key. Using this new list of tuples, you then sort the list by the second element (i.e. the sort key.) Finally, you map over the now sorted list returning only the first item (i.e. the original items) which are now in the correct order.

It's a concept that's probably better demonstrated in code. I'll demonstrate it in Perl first. Don't worry if you're not familiar with Perl, I'll explain it at a high enough level. I'll also do an example in JavaScript as I'm familar with that language too.

# imaginary db call to fetch results
my @list = $dbh->get_table_list(); 

# perl's map and sort functions each take a small lambda and return a list.
# the $_ variable is the default variable which is set to the item be iterated on.
# $a and $b are the item's being compared and <=> is the numerical sorting operator (it returns 1, 0, or -1 appropriately for the two operands).

@list = map { $_->[0] }
  sort { $a->[1] <=> $b->[1] }
  map { [ $_, $_->expensive_calculated_field ] } @list;

In this example, we fetch a list of objects, ostensibly from an ORM which instantiates objects from a database query. These objects have a method called expensijve_calculated_field which is... expensive to calculate (I'm very creative at naming things, can you tell?)

Here's the Javascript version (using ES6.)

let list = getItems();
 
list = list.map( item => [ item, item.expensiveCalculatedField() ] )
           .sort( (a, b) => a[1] - b[1] )
           .map ( tuple => tuple[0] );

What's so special?

You might ask, what's the point of this? It iterates over the same list at least three times. Couldn't you just sort the list directly? Yes, you could. That code would like this:

my @list = $dbh->get_table_list();

@list = sort {
          $a->expensive_calculated_field <=> $b->expensive_calculated_field
        } @list;

or in JS:

let list = getList();

list.sort(
  (a, b) => (
    a.expensiveCalculatedField() - b.expensiveCalculatedField()
  )
);

and that code will in fact do the same thing without needing to iterate over the list as much. However, the expensive calculation will end up being called for every item, every time there's a comparison. Let's say the field takes about 5ms to calculate (a pretty long time for modern CPU's). That would mean that for every comparison it would take 10ms. Perl and Javascript use a quicksort algorithm to do it's sorting so it will do a fairly minimal n*log(n) comparisions (on average). Even still, that means you'll be doing at least a comparison for every item in the list. Usually you'll do even more. If your list has a 1,000 elements then, you'll take at least 10,000ms (or 10 regular seconds) and probably longer than that.

If instead, you pre-calculate the key (as in the Schwartzian Transform), you only have run that function once for each item. In the case of a 1,000 element list, that's 5,000 ms (or 5 seconds) plus the overhead of sorting the list (which honestly would be pretty negligible in this case.) In this case that's *at least* a 50% time savings, but more than likely even more.

Conclusion

I hope this has been helpful. Like I said earlier, I don't see techniques like this one mentioned often outside of the Perl community but it's a very useful thing to know. Hopefully, I added another tool to the toolbox for you.