A recent PR to my one-true-path-experiment repository showed that some of my curve generation functions are not stack safe. There is an obvious solution to this problem, but I was afraid it might make the functions much slower. We can use the elm-benchmark package to find out.
The problem
The functions in the Curve
module take a list of points, and draw a smooth line through them. When given too many points, these functions could cause a RangeError
: the javascript exception thrown when there are too many recursive calls.
The offending function is:
catmullRomHelper :
Float
-> (Vec2 Float -> Vec2 Float -> Vec2 Float -> List (Vec3 (Vec2 Float)))
-> List (Vec2 Float)
-> List (Vec3 (Vec2 Float))
catmullRomHelper alpha ending points =
case points of
p0 :: p1 :: p2 :: [] ->
ending p0 p1 p2
p0 :: p1 :: p2 :: p :: rest ->
catmullRomPoint alpha p0 p1 p2 p
:: catmullRomHelper alpha ending (p1 :: p2 :: p :: rest)
_ ->
[]
The standard way of fixing a RangeError
in elm is to use tail-recursion.
Tail Recursion
Tail recursion is where a (branch of a function) calls itself as the outer (last-evaluated) expression.
This implementation of fac
is not tail-recursive: the outer expression of the else branch is a multiplication.
fac n =
if n == 0 then
1
else
n * fac (n - 1)
And this implementation is, because the outer expression is facHelper
, the function itself.
fac n = facHelper n 1
facHelper n accumulator =
if n == 0 then
accumulator
else
facHelper (n - 1) (n * accumulator)
Tail-recursion is more efficient because the compiler rewrites the recursion into a while-loop. Every function call allocates some memory on the stack to store variables and do other bookkeeping. Recursion that is too deep will eventually fill the stack completely and crash. A while-loop runs in constant space and thus does not suffer from this problem.
Benchmarking
So the solution is to make the offending catmullRomHelper
function tail-recursive, right?
Well, there is a wrinkle when working with lists, as catmullRomHelper
is: the transformation to a tail-recursive function reverses the order in which the result is generated. This does not matter in the fac
example because multiplication is commutative (i.e. x * y == y * x
).
But for lists, [x, y]
is not the same as [y, x]
. The simple solution is to just reverse the output list, but I was afraid that this extra pass over the output would make the function much slower. Only one way to find out.
I benchmarked these versions.
- The old implementation - it’s obviously broken but gives a performance base line
- the tail-recursive implementation - sound, but does reversing the result give bad performance?
- a reversed implementation - draws the curve from end to start. a bit weird but does not need the reverse and thus should be faster
Writing the benchmarks
Benchmarking in elm is actually ridiculously easy thanks to the elm-benchmark package.
suite1 : Benchmark
suite1 =
let
boringPoints : List (Vec2 Float)
boringPoints =
List.range 0 1000
|> List.map (\i -> ( toFloat i, toFloat i ))
in
describe "Curve"
[ Benchmark.compare "catmullRom"
"old/broken"
(\_ -> catmullRom 0.5 boringPoints)
"tail-recursive"
(\_ -> catmullRomTailRecursive 0.5 boringPoints)
]
Full code in this Ellie
Old vs. Tail-recursive
Tail-recursive is actually faster! At least, in Chrome. I was quite surprised by this, but it makes sense. Even when they don’t overflow the stack, recursive calls need a lot of memory. This memory needs to be cleaned up and this garbage collection is relatively slow.
In Firefox, the tail-recursive variant is ~10% slower, but firefox is faster overall (more runs/second). So either FF can optimize recursive calls better, or its garbage collection is faster than Chrome’s.
So the tail-recursive variant is stack-safe and actually faster (in Chrome) or a little slower (Firefox). I still wanted to know how much the reverse of the list costs, so I ran another test
Tail-recursive vs. Reversed
In Firefox
The reversed implementation is ~18% faster for lists of length 100.
And ~18% faster for lists of length 1000.
The exact numbers shift a little between runs, but I wanted to know how the speed increase scales with input size, and it looks like it’s reasonably constant.
For Chrome, reversed and tail-recursive perform about equally and again Firefox is faster overall.
Conclusion
The difference between tail-recursive and reversed is significant. But, I don’t think the speed increase justifies switching the default implementation to reversed
. The tail-recursive variant can run over 1500 times in one second with lists of length 1000. I think that should be good enough for pretty much any user of my package. When someone does run into performance problems in the future, I know that switching to the reversed implementation is an option.
Predicting performance in Elm can be hard. Functional languages often hide what’s really going on, and the JS engine adds a second layer of unpredictable performance. Most of the time, performance doesn’t really matter for elm apps, but when writing a package, a benchmark can be very helpful.
So, run some experiments on your packages - make sure you run them in multiple browsers - and make more informed decisions.
The code from this post is available as an Ellie