A recent PR to my one-true-path-experiment repository showed that some of my curve generation functions are not stack safe. There is an obvious solution to this problem, but I was afraid it might make the functions much slower. We can use the elm-benchmark package to find out.

## The problem

The functions in the `Curve`

module take a list of points, and draw a smooth line through them. When given too many points, these functions could cause a `RangeError`

: the javascript exception thrown when there are too many recursive calls.

The offending function is:

```
catmullRomHelper :
Float
-> (Vec2 Float -> Vec2 Float -> Vec2 Float -> List (Vec3 (Vec2 Float)))
-> List (Vec2 Float)
-> List (Vec3 (Vec2 Float))
catmullRomHelper alpha ending points =
case points of
p0 :: p1 :: p2 :: [] ->
ending p0 p1 p2
p0 :: p1 :: p2 :: p :: rest ->
catmullRomPoint alpha p0 p1 p2 p
:: catmullRomHelper alpha ending (p1 :: p2 :: p :: rest)
_ ->
[]
```

The standard way of fixing a `RangeError`

in elm is to use tail-recursion.

## Tail Recursion

Tail recursion is where a (branch of a function) calls itself as the outer (last-evaluated) expression.

This implementation of `fac`

is not tail-recursive: the outer expression of the else branch is a multiplication.

```
fac n =
if n == 0 then
1
else
n * fac (n - 1)
```

And this implementation is, because the outer expression is `facHelper`

, the function itself.

```
fac n = facHelper n 1
facHelper n accumulator =
if n == 0 then
accumulator
else
facHelper (n - 1) (n * accumulator)
```

Tail-recursion is more efficient because the compiler rewrites the recursion into a while-loop. Every function call allocates some memory on the stack to store variables and do other bookkeeping. Recursion that is too deep will eventually fill the stack completely and crash. A while-loop runs in constant space and thus does not suffer from this problem.

## Benchmarking

So the solution is to make the offending `catmullRomHelper`

function tail-recursive, right?

Well, there is a wrinkle when working with lists, as `catmullRomHelper`

is: the transformation to a tail-recursive function reverses the order in which the result is generated. This does not matter in the `fac`

example because multiplication is commutative (i.e. `x * y == y * x`

).

But for lists, `[x, y]`

is not the same as `[y, x]`

. The simple solution is to just reverse the output list, but I was afraid that this extra pass over the output would make the function much slower. Only one way to find out.

I benchmarked these versions.

- The old implementation - it’s obviously broken but gives a performance base line
- the tail-recursive implementation - sound, but does reversing the result give bad performance?
- a reversed implementation - draws the curve from end to start. a bit weird but does not need the reverse and thus should be faster

**Writing the benchmarks**

Benchmarking in elm is actually ridiculously easy thanks to the elm-benchmark package.

```
suite1 : Benchmark
suite1 =
let
boringPoints : List (Vec2 Float)
boringPoints =
List.range 0 1000
|> List.map (\i -> ( toFloat i, toFloat i ))
in
describe "Curve"
[ Benchmark.compare "catmullRom"
"old/broken"
(\_ -> catmullRom 0.5 boringPoints)
"tail-recursive"
(\_ -> catmullRomTailRecursive 0.5 boringPoints)
]
```

*Full code in this Ellie*

**Old vs. Tail-recursive**

Tail-recursive is actually faster! At least, in Chrome. I was quite surprised by this, but it makes sense. Even when they don’t overflow the stack, recursive calls need a lot of memory. This memory needs to be cleaned up and this garbage collection is relatively slow.

In Firefox, the tail-recursive variant is ~10% slower, but firefox is faster overall (more runs/second). So either FF can optimize recursive calls better, or its garbage collection is faster than Chrome’s.

So the tail-recursive variant is stack-safe and actually faster (in Chrome) or a little slower (Firefox). I still wanted to know how much the reverse of the list costs, so I ran another test

**Tail-recursive vs. Reversed**

In Firefox

The reversed implementation is ~18% faster for lists of length 100.

And ~18% faster for lists of length 1000.

The exact numbers shift a little between runs, but I wanted to know how the speed increase scales with input size, and it looks like it’s reasonably constant.

For Chrome, reversed and tail-recursive perform about equally and again Firefox is faster overall.

## Conclusion

The difference between tail-recursive and reversed is significant. But, I don’t think the speed increase justifies switching the default implementation to `reversed`

. The tail-recursive variant can run over 1500 times in one second with lists of length 1000. I think that should be good enough for pretty much any user of my package. When someone does run into performance problems in the future, I know that switching to the reversed implementation is an option.

Predicting performance in Elm can be hard. Functional languages often hide what’s really going on, and the JS engine adds a second layer of unpredictable performance. Most of the time, performance doesn’t really matter for elm apps, but when writing a package, a benchmark can be very helpful.

So, run some experiments on your packages - make sure you run them in multiple browsers - and make more informed decisions.

*The code from this post is available as an Ellie*