learn by teaching

This is just a quick note for the day, but I learned something new today because of a comment on an answer I posted on Stackoverflow yesterday.

A Schwartzian Transform is a technique used for sorting objects that’s useful when the sort key on objects is expensive to calculate. The idea is just to calculate the sort key for each object only once and store it so it doesn’t need to be calculated again. An analogy would make this clearer.

You have 20 boxes of jellybeans to sort by bean count. You don’t know the number in any of them, so you have to count the contents of each box. Lets say this takes you about 5 minutes. In that time, you’ll forget how many are in past boxes you’ve counted, except the last one.

This is good enough for you to know whether the current box should go before or after the last box you encountered, but it doesn’t tell you exactly where a box should go in the lineup. To figure that out, you’ll have to recount some boxes.

The Schwartzian Transform in this case would be to tape some paper to each box and write down the count on that label. You’ll have to count the contents of each box once and only once, then just glance at the numbers and compare.

The O-notation of the key access portion of the Schwartzian approach is O(n). By comparison, the non-Schwartzian approach is O(n log n) to access the sort keys. There’s more to the sort than just reading keys, and the total sort complexity varies by the sort algorithm used. Still, going back to the jellybean example, you’d count through about 26 boxes instead of 26 (I’m computing log n using log base 10, natural log would give different results), which would mean 130 minutes of counting beans instead of 100.