Folkertdev - Benchmarking curve generation

Benchmarking curve generation

Posted on Mar 17, 2018 · by Folkert de Vries

Tags: elm

A recent PR to my one-true-path-experiment repository showed that some of my curve generation functions are not stack safe. There is an obvious solution to this problem, but I was afraid it might make the functions much slower. We can use the elm-benchmark package to find out.

The problem

The functions in the Curve module take a list of points, and draw a smooth line through them. When given too many points, these functions could cause a RangeError: the javascript exception thrown when there are too many recursive calls.

The offending function is:

catmullRomHelper :
    Float
    -> (Vec2 Float -> Vec2 Float -> Vec2 Float -> List (Vec3 (Vec2 Float)))
    -> List (Vec2 Float)
    -> List (Vec3 (Vec2 Float))
catmullRomHelper alpha ending points =
    case points of
        p0 :: p1 :: p2 :: [] ->
            ending p0 p1 p2

        p0 :: p1 :: p2 :: p :: rest ->
            catmullRomPoint alpha p0 p1 p2 p 
                :: catmullRomHelper alpha ending (p1 :: p2 :: p :: rest)

        _ ->
            []

The standard way of fixing a RangeError in elm is to use tail-recursion.

Tail Recursion

Tail recursion is where a (branch of a function) calls itself as the outer (last-evaluated) expression.

This implementation of fac is not tail-recursive: the outer expression of the else branch is a multiplication.

fac n = 
    if n == 0 then 
        1
    else
        n * fac (n - 1)

And this implementation is, because the outer expression is facHelper, the function itself.

fac n = facHelper n 1


facHelper n accumulator = 
    if n == 0 then
        accumulator
    else
        facHelper (n - 1) (n * accumulator)

Tail-recursion is more efficient because the compiler rewrites the recursion into a while-loop. Every function call allocates some memory on the stack to store variables and do other bookkeeping. Recursion that is too deep will eventually fill the stack completely and crash. A while-loop runs in constant space and thus does not suffer from this problem.

Benchmarking

So the solution is to make the offending catmullRomHelper function tail-recursive, right?

Well, there is a wrinkle when working with lists, as catmullRomHelper is: the transformation to a tail-recursive function reverses the order in which the result is generated. This does not matter in the fac example because multiplication is commutative (i.e. x * y == y * x).

But for lists, [x, y] is not the same as [y, x]. The simple solution is to just reverse the output list, but I was afraid that this extra pass over the output would make the function much slower. Only one way to find out.

I benchmarked these versions.

The old implementation - it’s obviously broken but gives a performance base line
the tail-recursive implementation - sound, but does reversing the result give bad performance?
a reversed implementation - draws the curve from end to start. a bit weird but does not need the reverse and thus should be faster

Writing the benchmarks

Benchmarking in elm is actually ridiculously easy thanks to the elm-benchmark package.

suite1 : Benchmark
suite1 =
    let
        boringPoints : List (Vec2 Float)
        boringPoints =
            List.range 0 1000
                |> List.map (\i -> ( toFloat i, toFloat i ))
    in
        describe "Curve"
            [ Benchmark.compare "catmullRom"
                "old/broken"
                (\_ -> catmullRom 0.5 boringPoints)
                "tail-recursive"
                (\_ -> catmullRomTailRecursive 0.5 boringPoints)
            ]

Full code in this Ellie

Old vs. Tail-recursive

catmullRom

name

old/broken

tail-recursive

runs / second

1,287

1,357

% change

+5.48%

goodness of fit

99.9%

99.88%

Tail-recursive is actually faster! At least, in Chrome. I was quite surprised by this, but it makes sense. Even when they don’t overflow the stack, recursive calls need a lot of memory. This memory needs to be cleaned up and this garbage collection is relatively slow.

In Firefox, the tail-recursive variant is ~10% slower, but firefox is faster overall (more runs/second). So either FF can optimize recursive calls better, or its garbage collection is faster than Chrome’s.

So the tail-recursive variant is stack-safe and actually faster (in Chrome) or a little slower (Firefox). I still wanted to know how much the reverse of the list costs, so I ran another test

Tail-recursive vs. Reversed

In Firefox

catmullRom

name

tail-recursive

reversed

runs / second

15,315

18,206

% change

+18.88%

goodness of fit

99.22%

99.32%

The reversed implementation is ~18% faster for lists of length 100.

catmullRom

name

tail-recursive

reversed

runs / second

1,595

1,886

% change

+18.26%

goodness of fit

99.75%

And ~18% faster for lists of length 1000.

The exact numbers shift a little between runs, but I wanted to know how the speed increase scales with input size, and it looks like it’s reasonably constant.

For Chrome, reversed and tail-recursive perform about equally and again Firefox is faster overall.

Conclusion

The difference between tail-recursive and reversed is significant. But, I don’t think the speed increase justifies switching the default implementation to reversed. The tail-recursive variant can run over 1500 times in one second with lists of length 1000. I think that should be good enough for pretty much any user of my package. When someone does run into performance problems in the future, I know that switching to the reversed implementation is an option.

Predicting performance in Elm can be hard. Functional languages often hide what’s really going on, and the JS engine adds a second layer of unpredictable performance. Most of the time, performance doesn’t really matter for elm apps, but when writing a package, a benchmark can be very helpful.

So, run some experiments on your packages - make sure you run them in multiple browsers - and make more informed decisions.

The code from this post is available as an Ellie