A recent PR to my one-true-path-experiment repository showed that some of my curve generation functions are not stack safe. There is an obvious solution to this problem, but I was afraid it might make the functions much slower. We can use the elm-benchmark package to find out.
The offending function is:
catmullRomHelper :
    Float
    -> (Vec2 Float -> Vec2 Float -> Vec2 Float -> List (Vec3 (Vec2 Float)))
    -> List (Vec2 Float)
    -> List (Vec3 (Vec2 Float))
catmullRomHelper alpha ending points =
    case points of
        p0 :: p1 :: p2 :: [] ->
            ending p0 p1 p2

        p0 :: p1 :: p2 :: p :: rest ->
            catmullRomPoint alpha p0 p1 p2 p
                :: catmullRomHelper alpha ending (p1 :: p2 :: p :: rest)

        _ ->
            -- fewer than three points: no curve segments
            []
The standard way of fixing a RangeError (a stack overflow) in Elm is to use tail recursion.
Tail recursion is when a function (or a branch of it) calls itself as its outermost, last-evaluated expression.
This implementation of fac is not tail-recursive: the outer expression of the else branch is a multiplication, not the recursive call.
fac n =
    if n == 0 then
        1

    else
        n * fac (n - 1)
And this implementation is tail-recursive, because the outer expression of the recursive branch is a call to facHelper, the function itself.
fac n =
    facHelper n 1

facHelper n accumulator =
    if n == 0 then
        accumulator

    else
        facHelper (n - 1) (n * accumulator)
Tail-recursion is more efficient because the compiler rewrites the recursion into a while-loop. Every function call allocates some memory on the stack to store variables and do other bookkeeping. Recursion that is too deep will eventually fill the stack completely and crash. A while-loop runs in constant space and thus does not suffer from this problem.
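To get a feel for why this helps, here is a rough JavaScript sketch of the kind of loop a compiler can turn the tail-recursive facHelper into. This is illustrative only, not the Elm compiler's actual output:

```javascript
// Sketch: the tail call becomes argument reassignment plus a loop,
// so the stack never grows no matter how large n is.
function facHelper(n, accumulator) {
  while (true) {
    if (n === 0) {
      return accumulator;
    }
    // Instead of calling facHelper again, update the "arguments"
    // in place and go around the loop once more.
    accumulator = n * accumulator;
    n = n - 1;
  }
}

function fac(n) {
  return facHelper(n, 1);
}
```

Each recursive call has simply become one more iteration of the while loop, which runs in constant stack space.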
So the solution is to make the offending catmullRomHelper function tail-recursive, right?
Well, there is a wrinkle when the function builds up a list, as catmullRomHelper does: the standard transformation to a tail-recursive function reverses the order in which the result is generated. This does not matter in the fac example because multiplication is commutative (i.e. x * y == y * x).
But for lists, [x, y] is not the same as [y, x]. The simple solution is to just reverse the output list at the end, but I was afraid that this extra pass over the output would make the function much slower. Only one way to find out.
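To see the reversal happen in a small example, here is a hypothetical tail-recursive function that copies a list (the names copy and copyHelper are made up for illustration):

```elm
copyHelper : List a -> List a -> List a
copyHelper accumulator items =
    case items of
        [] ->
            -- without this List.reverse, copy [ 1, 2, 3 ]
            -- would produce [ 3, 2, 1 ]
            List.reverse accumulator

        x :: rest ->
            -- the tail call prepends to the accumulator,
            -- so the first input element ends up last
            copyHelper (x :: accumulator) rest


copy : List a -> List a
copy items =
    copyHelper [] items
```

Each step prepends to the accumulator, so the elements come out back to front; the final List.reverse is the extra pass over the output that worried me.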
I benchmarked these versions:
- the old implementation: obviously broken for long lists, but it gives a performance baseline
- the tail-recursive implementation: sound, but does reversing the result hurt performance?
- a reversed implementation: draws the curve from end to start; a bit weird, but it does not need the reverse and thus should be faster
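As a sketch, the tail-recursive variant could look something like this, assuming an extra accumulator argument (the name catmullRomTailRecursiveHelper is hypothetical; the real implementation in the package may differ):

```elm
-- Hypothetical sketch: segments are accumulated in reverse order
-- and flipped once at the end.
catmullRomTailRecursiveHelper alpha ending accumulator points =
    case points of
        p0 :: p1 :: p2 :: [] ->
            List.reverse accumulator ++ ending p0 p1 p2

        p0 :: p1 :: p2 :: p :: rest ->
            -- the recursive call is now the outer expression,
            -- so the compiler can turn it into a loop
            catmullRomTailRecursiveHelper alpha
                ending
                (catmullRomPoint alpha p0 p1 p2 p :: accumulator)
                (p1 :: p2 :: p :: rest)

        _ ->
            List.reverse accumulator
```

The reversed implementation would skip the List.reverse and simply walk the input from the other end.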
Writing the benchmarks
Benchmarking in Elm is actually ridiculously easy thanks to the elm-benchmark package.
suite1 : Benchmark
suite1 =
    Benchmark.describe "catmullRom"
        [ Benchmark.compare "catmullRom"
            (\_ -> catmullRom 0.5 boringPoints)
            (\_ -> catmullRomTailRecursive 0.5 boringPoints)
        ]

boringPoints : List (Vec2 Float)
boringPoints =
    List.range 0 1000
        |> List.map (\i -> ( toFloat i, toFloat i ))
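To actually run a suite in the browser, elm-benchmark ships a runner; the module plumbing looks roughly like this (assuming suite1 is defined as above):

```elm
module Main exposing (main)

import Benchmark.Runner exposing (BenchmarkProgram, program)


-- compiling and opening this program in a browser runs the suite
-- and renders the runs-per-second results
main : BenchmarkProgram
main =
    program suite1
```

Compile it like any other Elm app and open the page in the browsers you care about.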
Tail-recursive is actually faster! At least, in Chrome. I was quite surprised by this, but it makes sense. Even when they don’t overflow the stack, recursive calls need a lot of memory. This memory needs to be cleaned up and this garbage collection is relatively slow.
In Firefox, the tail-recursive variant is ~10% slower, but Firefox is faster overall (more runs per second). So either Firefox can optimize recursive calls better, or its garbage collection is faster than Chrome’s.
So the tail-recursive variant is stack-safe and actually faster (in Chrome) or a little slower (in Firefox). I still wanted to know how much reversing the list costs, so I ran another test.
Tail-recursive vs. Reversed
[benchmark results: runs / second and goodness of fit, lists of length 100]
The reversed implementation is ~18% faster for lists of length 100.
[benchmark results: runs / second and goodness of fit, lists of length 1000]
And ~18% faster for lists of length 1000.
The exact numbers shift a little between runs, but I wanted to know how the speed increase scales with input size, and it looks like it’s reasonably constant.
For Chrome, reversed and tail-recursive perform about equally and again Firefox is faster overall.
The difference between tail-recursive and reversed is significant, but I don’t think the speed increase justifies switching the default implementation to reversed. The tail-recursive variant can run over 1500 times in one second with lists of length 1000. I think that should be good enough for pretty much any user of my package. When someone does run into performance problems in the future, I know that switching to the reversed implementation is an option.
Predicting performance in Elm can be hard. Functional languages often hide what’s really going on, and the JS engine adds a second layer of unpredictability. Most of the time, performance doesn’t really matter for Elm apps, but when writing a package, a benchmark can be very helpful.
So, run some experiments on your packages - make sure you run them in multiple browsers - and make more informed decisions.