In my opinion, the article already has enough numbers, tables and comments, so here are the appendices. Below are two trampoline implementations, tailcatch and tailtramp, along with their results. As expected, they are much slower than tailopt. Note that both trampoline implementations support mutual recursion, which tailopt currently does not.
// Copyright (c) 2010 Guillaume Lathoud
// MIT License
//
// tailcatch.js
//
// Implement full tail-call optimization in JavaScript through
// a throw/catch trampoline.
//
// Heavily inspired by Sjoerd Visscher's work:
// http://w3future.com/weblog/2006/02/#tailCallEliminationInJavascript
//
// Difference: I brought some speed improvements:
// - removed the caller search for self.
// - reduced the length of the throw/catch chain from n to 1.
/*jslint evil:false */
function tailcatch(g) {
    return (function (n) {
        return function () {
            if (n > 5) { // Jump off a small Empire State Building
                throw { tailCallArgs: arguments, tailCallThis: this };
            }
            var args = arguments;
            var me = this;
            var ret;
            while (true) {
                if (n < 1) {
                    try {
                        n++;
                        ret = g.apply(me, args);
                        n--;
                        return ret;
                    } catch (e) { // Hit 33rd Street and bounce again
                        if (!e.tailCallArgs) {
                            throw e;
                        }
                        n = 0;
                        args = e.tailCallArgs;
                        me = e.tailCallThis;
                    }
                } else {
                    n++;
                    ret = g.apply(me, args);
                    n--;
                    return ret;
                }
            }
        };
    })(0);
}
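A usage sketch may help: the snippet below wraps a hypothetical accumulator-style factorial (facttc is my own example name, not one of the benchmark functions) with tailcatch. The key point is that the tail call goes through the wrapper, so deep recursion bounces off the throw/catch trampoline instead of growing the call stack. The tailcatch definition from the listing above is repeated so the snippet runs on its own.

```javascript
// Repeated from the listing above so this sketch is self-contained.
function tailcatch(g) {
    return (function (n) {
        return function () {
            if (n > 5) {
                throw { tailCallArgs: arguments, tailCallThis: this };
            }
            var args = arguments, me = this, ret;
            while (true) {
                if (n < 1) {
                    try {
                        n++;
                        ret = g.apply(me, args);
                        n--;
                        return ret;
                    } catch (e) {
                        if (!e.tailCallArgs) { throw e; }
                        n = 0;
                        args = e.tailCallArgs;
                        me = e.tailCallThis;
                    }
                } else {
                    n++;
                    ret = g.apply(me, args);
                    n--;
                    return ret;
                }
            }
        };
    })(0);
}

// My own example: an accumulator-style factorial. The tail call is
// made through the wrapper `facttc`, which is what lets tailcatch
// intercept it and keep the stack depth bounded.
var facttc = tailcatch(function (n, acc) {
    acc = acc || 1;
    if (n <= 1) { return acc; }
    return facttc(n - 1, n * acc); // tail call, caught by the wrapper
});

facttc(10);     // 3628800
facttc(100000); // completes without a stack overflow (the numeric
                // value overflows to Infinity, but the stack does not)
```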
And here is tailtramp:
// Copyright (c) 2010 Guillaume Lathoud
// MIT License
//
// tailtramp.js
//
// Implement full tail-call optimization in JavaScript through a
// trampoline.
/*jslint evil:false */
function tailtramp(g) {
    function TailCall(arr) {
        this.arr = arr;
    }
    return (function (n) {
        return function () {
            var arr = [g, this, arguments], ret;
            if (n > 5) { // Jump off a small Empire State Building
                return new TailCall(arr);
            }
            while (true) {
                n++;
                ret = arr[0].apply(arr[1], arr[2]);
                n--;
                if ((n > 0) || (!(ret instanceof TailCall))) {
                    return ret;
                }
                // Hit 33rd Street and bounce again
                arr = ret.arr;
            }
        };
    })(0);
}
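As a sketch of the mutual-recursion support mentioned at the top (isEven and isOdd are my own example names, not benchmark functions): each call to tailtramp creates its own TailCall constructor, so a wrapper only bounces its own trampoline requests and returns foreign ones upward to the wrapper that created them. The tailtramp definition from the listing above is repeated so the snippet runs on its own.

```javascript
// Repeated from the listing above so this sketch is self-contained.
function tailtramp(g) {
    function TailCall(arr) {
        this.arr = arr;
    }
    return (function (n) {
        return function () {
            var arr = [g, this, arguments], ret;
            if (n > 5) {
                return new TailCall(arr);
            }
            while (true) {
                n++;
                ret = arr[0].apply(arr[1], arr[2]);
                n--;
                if ((n > 0) || (!(ret instanceof TailCall))) {
                    return ret;
                }
                arr = ret.arr;
            }
        };
    })(0);
}

// My own example of mutual recursion: each wrapper has a distinct
// TailCall class, so the `instanceof` test makes a wrapper pass a
// foreign TailCall up the stack and only bounce its own.
var isEven = tailtramp(function (n) {
    return n === 0 ? true : isOdd(n - 1);  // tail call into the other wrapper
});
var isOdd = tailtramp(function (n) {
    return n === 0 ? false : isEven(n - 1);
});

isEven(100000); // true, without a stack overflow
isOdd(100000);  // false
```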
System: Ubuntu 8.04 on a Thinkpad T43 (Intel Pentium M, 2 GHz, 1.5 GB RAM)
V8 version: 2.1.0
Test: 10 runs
d8 -e "load('tailopt-js-rerun.js');rerun(10);"
...
fact_rec_________________ iter_sec: 2.680e-7 (100: base)
fact_rectailcatch________ iter_sec: 6.104e-6 (2277)
fact_rectailtramp________ iter_sec: 2.146e-6 (801)
fact_tail________________ iter_sec: 4.190e-7 (156)
fact_tail2_______________ iter_sec: 4.150e-7 (155)
fact_tailcatch___________ iter_sec: 6.691e-6 (2496)
fact_tailtramp___________ iter_sec: 3.204e-6 (1195)
fact_tailopt_____________ iter_sec: 1.897e-7 (71)
fact_tailopt2____________ iter_sec: 1.878e-7 (70)
fact_iter________________ iter_sec: 1.339e-7 (50)
gcd_rec__________________ iter_sec: 2.633e-6 (100: base)
gcd_rectailcatch_________ iter_sec: 5.829e-5 (2214)
gcd_rectailtramp_________ iter_sec: 2.468e-5 (937)
gcd_tailopt______________ iter_sec: 1.092e-6 (41)
gcd_iter_________________ iter_sec: 1.220e-6 (46)
treeforeach_rec__________ iter_sec: 2.255e-4 (100: base)
treeforeach_tail_________ iter_sec: 2.921e-4 (130)
treeforeach_tailcatch____ iter_sec: 4.555e-4 (202)
treeforeach_tailtramp____ iter_sec: 3.590e-4 (159)
treeforeach_tailopt______ iter_sec: 2.843e-4 (126)
To verify that the performance measurements are somewhat meaningful, I ran each test 500 times and computed the standard deviation. Thanks to Sam Mason for the suggestion.
Reminder: the result iter_sec is the duration of a single call, in seconds: the lower, the better.
System: Ubuntu 8.04 on a Thinkpad T43 (Intel Pentium M, 2 GHz, 1.5 GB RAM)
Java version: 1.5.0, gij (GNU libgcj) version 4.2.4 (Ubuntu 4.2.4-1ubuntu3)
Rhino version: 1.6 release 7 2008 10 14
Test: 500 runs
rhino -e "load('tailopt-js-rerun.js');rerun(510,10);"
...
rerun() started, subset: null , nrunmin: 510 , nruntoforget: 10 , global.runs: 509
..
fact_rec_____________ iter_sec: 3.984e-4 (100: base) [mean:4.037e-4, std:9.700e-6, ratio:2.403e-2]
fact_tail____________ iter_sec: 5.986e-4 (150) [mean:5.803e-4, std:1.330e-5, ratio:2.292e-2]
fact_tailopt_________ iter_sec: 4.862e-4 (122) [mean:4.919e-4, std:1.298e-5, ratio:2.638e-2]
fact_tail2___________ iter_sec: 5.714e-4 (143) [mean:5.783e-4, std:1.364e-5, ratio:2.358e-2]
fact_tailopt2________ iter_sec: 4.745e-4 (119) [mean:4.749e-4, std:1.066e-5, ratio:2.245e-2]
fact_iter____________ iter_sec: 1.167e-4 (29) [mean:1.188e-4, std:3.392e-6, ratio:2.857e-2]
gcd_rec______________ iter_sec: 3.000e-3 (100: base) [mean:3.039e-3, std:9.211e-5, ratio:3.031e-2]
gcd_tailopt__________ iter_sec: 2.964e-3 (99) [mean:2.977e-3, std:4.393e-5, ratio:1.476e-2]
gcd_iter_____________ iter_sec: 8.356e-4 (28) [mean:8.045e-4, std:1.424e-5, ratio:1.770e-2]
treeforeach_rec______ iter_sec: 5.363e-2 (100: base) [mean:5.414e-2, std:2.201e-3, ratio:4.065e-2]
treeforeach_tail_____ iter_sec: 5.474e-2 (102) [mean:5.509e-2, std:6.106e-4, ratio:1.108e-2]
treeforeach_tailopt__ iter_sec: 6.244e-2 (116) [mean:6.269e-2, std:1.004e-3, ratio:1.602e-2]
The ratio std/mean is about 2-3% in most cases; only for treeforeach_rec is it about 4%. Conclusion: the duration differences are meaningful for fact and gcd, and the three variants of treeforeach perform similarly on average.
System: Ubuntu 8.04 on a Thinkpad T43 (Intel Pentium M, 2 GHz, 1.5 GB RAM)
V8 version: 2.1.0
Test: 500 runs
d8 -e "load('tailopt-js-rerun.js');rerun(510,10);"
...
rerun() started, subset: null , nrunmin: 510 , nruntoforget: 10 , global.runs: 509
...
fact_rec_____________ iter_sec: 2.657e-7 (100: base) [mean:2.636e-7, std:1.087e-8, ratio:4.126e-2]
fact_tail____________ iter_sec: 4.011e-7 (151) [mean:4.040e-7, std:5.537e-9, ratio:1.371e-2]
fact_tailopt_________ iter_sec: 1.855e-7 (70) [mean:1.866e-7, std:4.243e-9, ratio:2.274e-2]
fact_tail2___________ iter_sec: 4.078e-7 (153) [mean:4.103e-7, std:6.752e-9, ratio:1.645e-2]
fact_tailopt2________ iter_sec: 1.857e-7 (70) [mean:1.866e-7, std:3.168e-9, ratio:1.698e-2]
fact_iter____________ iter_sec: 1.311e-7 (49) [mean:1.322e-7, std:3.324e-9, ratio:2.514e-2]
gcd_rec______________ iter_sec: 2.659e-6 (100: base) [mean:2.651e-6, std:4.979e-8, ratio:1.878e-2]
gcd_tailopt__________ iter_sec: 1.061e-6 (40) [mean:1.068e-6, std:2.215e-8, ratio:2.073e-2]
gcd_iter_____________ iter_sec: 1.197e-6 (45) [mean:1.204e-6, std:2.985e-8, ratio:2.479e-2]
treeforeach_rec______ iter_sec: 2.224e-4 (100: base) [mean:2.222e-4, std:4.170e-6, ratio:1.877e-2]
treeforeach_tail_____ iter_sec: 2.855e-4 (128) [mean:2.860e-4, std:5.037e-6, ratio:1.761e-2]
treeforeach_tailopt__ iter_sec: 2.754e-4 (124) [mean:2.778e-4, std:2.647e-6, ratio:9.528e-3]
Except for fact_rec, the standard deviation is only about 2% of the mean, so comparisons between means seem pretty safe! Indeed, for most comparisons the difference between means amounts to much more than 3 times the standard deviation of each compared value (which, under a Gaussian assumption, roughly means more than 99% confidence). Only gcd_tailopt and gcd_iter may be deemed similar (which is quite good!), as well as treeforeach_tail and treeforeach_tailopt (not so great).
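The 3-standard-deviation criterion can be checked numerically. The helper below, clearlyDifferent, is my own shorthand for that comparison (it is not part of the benchmark code), applied to four rows of the V8 table above:

```javascript
// My own helper: two means are deemed clearly different when their
// gap exceeds 3 standard deviations of each compared value.
function clearlyDifferent(a, b) {
    return Math.abs(a.mean - b.mean) > 3 * (a.std + b.std);
}

// Mean and std values copied from the V8 500-run table above.
var fact_tailopt = { mean: 1.866e-7, std: 4.243e-9 };
var fact_iter    = { mean: 1.322e-7, std: 3.324e-9 };
var gcd_tailopt  = { mean: 1.068e-6, std: 2.215e-8 };
var gcd_iter     = { mean: 1.204e-6, std: 2.985e-8 };

clearlyDifferent(fact_tailopt, fact_iter); // true: the gap is wide
clearlyDifferent(gcd_tailopt, gcd_iter);   // false: may be deemed similar
```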
These are the detailed results for the String Replication section on 500 runs, in the form of:
- mean: duration of a single iteration (in seconds), averaged over the 500 runs - the lower, the better. In brackets, relative values, where base 100 is the recursive result.
- std: standard deviation of the single-iteration duration over the 500 runs.
- ratio = std / mean: indicates the stability of the result (high for Firefox 3.6, probably because of some chronic garbage collection for the tailopt and iter versions).
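As a quick sanity check of the ratio column, recomputing std / mean for the Firefox 3.6 string_repli_rec row reproduces the reported value:

```javascript
// Mean and std taken from the Firefox 3.6 string_repli_rec row.
var mean = 5.972e-5;
var std = 5.302e-6;
var ratio = std / mean; // about 8.878e-2, matching the table
```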
Firefox 3.6, Thinkpad Linux Ubuntu, 510 runs total (the first 10 runs ignored):
string_repli_rec______________ mean: 5.972e-5 (100.0), std:5.302e-6, ratio:8.878e-2
string_repli_tail_____________ mean: 1.582e-5 ( 26.5), std:1.212e-6, ratio:7.660e-2
string_repli_tailopt__________ mean: 1.105e-5 ( 18.5), std:1.261e-6, ratio:1.141e-1
string_repli_iter_____________ mean: 9.818e-6 ( 16.4), std:1.268e-6, ratio:1.291e-1
Google Chrome 4.0.249.43, Thinkpad Linux Ubuntu, 510 runs total (the first 10 runs ignored):
string_repli_rec______________ mean: 5.312e-5 (100.0), std:2.287e-6, ratio:4.305e-2
string_repli_tail_____________ mean: 4.075e-6 ( 7.7), std:2.415e-7, ratio:5.926e-2
string_repli_tailopt__________ mean: 3.479e-6 ( 6.5), std:2.178e-7, ratio:6.261e-2
string_repli_iter_____________ mean: 3.361e-6 ( 6.3), std:2.337e-7, ratio:6.955e-2
Google V8, Thinkpad Linux Ubuntu, 510 runs total (the first 10 runs ignored):
$ d8 -e 'load("test-string_repli.js");'
string_repli_rec______________ mean: 3.324e-5 (100.0), std:4.964e-7, ratio:1.493e-2
string_repli_tail_____________ mean: 2.381e-6 ( 7.2), std:1.742e-7, ratio:7.314e-2
string_repli_tailopt__________ mean: 2.231e-6 ( 6.7), std:1.242e-8, ratio:5.565e-3
string_repli_iter_____________ mean: 2.166e-6 ( 6.5), std:4.501e-8, ratio:2.078e-2
Rhino, Thinkpad Linux Ubuntu, 510 runs total (the first 10 runs ignored):
$ rhino -e 'load("test-string_repli.js");'
string_repli_rec______________ mean: 1.337e-2 (100.0), std:2.329e-4, ratio:1.742e-2
string_repli_tail_____________ mean: 1.243e-3 ( 9.3), std:7.140e-5, ratio:5.746e-2
string_repli_tailopt__________ mean: 1.100e-3 ( 8.2), std:6.673e-5, ratio:6.068e-2
string_repli_iter_____________ mean: 8.072e-4 ( 6.0), std:1.941e-5, ratio:2.405e-2
Produced on 2014-08-13 by tailopt-js-appendices.scm - by Guillaume Lathoud (glathoud _at_ yahoo _dot_ fr)