In my opinion, the article already has enough numbers, tables and comments, so here are the appendices. Below are two trampoline implementations, tailcatch and tailtramp, along with their results. As expected, they are much slower than tailopt. Note that both trampoline implementations support mutual recursion, which tailopt currently does not.
// Copyright (c) 2010 Guillaume Lathoud
// MIT License
//
// tailcatch.js
//
// Implement full tail-call optimization in JavaScript through
// a throw/catch trampoline.
//
// Heavily inspired by Sjoerd Visscher's work:
// http://w3future.com/weblog/2006/02/#tailCallEliminationInJavascript
//
// Difference: I brought some speed improvements:
// - removed the caller search for self.
// - reduced the length of the throw/catch chain from n to 1.
/*jslint evil:false */
function tailcatch(g) {
    return (function (n) {
        return function () {
            if (n > 5) { // Jump off a small Empire State Building
                throw { tailCallArgs: arguments, tailCallThis: this };
            }
            var args = arguments;
            var me = this;
            var ret;
            while (true) {
                if (n < 1) {
                    try {
                        n++;
                        ret = g.apply(me, args);
                        n--;
                        return ret;
                    } catch (e) { // Hit 33rd Street and bounce again
                        if (!e.tailCallArgs) {
                            throw e;
                        }
                        n = 0;
                        args = e.tailCallArgs;
                        me = e.tailCallThis;
                    }
                } else {
                    n++;
                    ret = g.apply(me, args);
                    n--;
                    return ret;
                }
            }
        };
    })(0);
}
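A usage sketch may help: the snippet below wraps a hypothetical accumulator-style factorial (facttc is my own example name, not one of the benchmark functions) with tailcatch. The key point is that the tail call goes through the wrapper, so deep recursion bounces off the throw/catch trampoline instead of growing the call stack. The tailcatch definition from the listing above is repeated so the snippet runs on its own.

```javascript
// Repeated from the listing above so this sketch is self-contained.
function tailcatch(g) {
    return (function (n) {
        return function () {
            if (n > 5) {
                throw { tailCallArgs: arguments, tailCallThis: this };
            }
            var args = arguments, me = this, ret;
            while (true) {
                if (n < 1) {
                    try {
                        n++;
                        ret = g.apply(me, args);
                        n--;
                        return ret;
                    } catch (e) {
                        if (!e.tailCallArgs) { throw e; }
                        n = 0;
                        args = e.tailCallArgs;
                        me = e.tailCallThis;
                    }
                } else {
                    n++;
                    ret = g.apply(me, args);
                    n--;
                    return ret;
                }
            }
        };
    })(0);
}

// My own example: an accumulator-style factorial. The tail call is
// made through the wrapper `facttc`, which is what lets tailcatch
// intercept it and keep the stack depth bounded.
var facttc = tailcatch(function (n, acc) {
    acc = acc || 1;
    if (n <= 1) { return acc; }
    return facttc(n - 1, n * acc); // tail call, caught by the wrapper
});

facttc(10);     // 3628800
facttc(100000); // completes without a stack overflow (the numeric
                // value overflows to Infinity, but the stack does not)
```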
And here is tailtramp:
// Copyright (c) 2010 Guillaume Lathoud
// MIT License
//
// tailtramp.js
//
// Implement full tail-call optimization in JavaScript through a
// trampoline.
/*jslint evil:false */
function tailtramp(g) {
    function TailCall(arr) {
        this.arr = arr;
    }
    return (function (n) {
        return function () {
            var arr = [g, this, arguments], ret;
            if (n > 5) { // Jump off a small Empire State Building
                return new TailCall(arr);
            }
            while (true) {
                n++;
                ret = arr[0].apply(arr[1], arr[2]);
                n--;
                if ((n > 0) || (!(ret instanceof TailCall))) {
                    return ret;
                }
                // Hit 33rd Street and bounce again
                arr = ret.arr;
            }
        };
    })(0);
}
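As a sketch of the mutual-recursion support mentioned at the top (isEven and isOdd are my own example names, not benchmark functions): each call to tailtramp creates its own TailCall constructor, so a wrapper only bounces its own trampoline requests and returns foreign ones upward to the wrapper that created them. The tailtramp definition from the listing above is repeated so the snippet runs on its own.

```javascript
// Repeated from the listing above so this sketch is self-contained.
function tailtramp(g) {
    function TailCall(arr) {
        this.arr = arr;
    }
    return (function (n) {
        return function () {
            var arr = [g, this, arguments], ret;
            if (n > 5) {
                return new TailCall(arr);
            }
            while (true) {
                n++;
                ret = arr[0].apply(arr[1], arr[2]);
                n--;
                if ((n > 0) || (!(ret instanceof TailCall))) {
                    return ret;
                }
                arr = ret.arr;
            }
        };
    })(0);
}

// My own example of mutual recursion: each wrapper has a distinct
// TailCall class, so the `instanceof` test makes a wrapper pass a
// foreign TailCall up the stack and only bounce its own.
var isEven = tailtramp(function (n) {
    return n === 0 ? true : isOdd(n - 1);  // tail call into the other wrapper
});
var isOdd = tailtramp(function (n) {
    return n === 0 ? false : isEven(n - 1);
});

isEven(100000); // true, without a stack overflow
isOdd(100000);  // false
```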
System: Ubuntu 8.04 on a Thinkpad T43 (Intel Pentium M, 2 GHz, 1.5 GB RAM)
V8 version: 2.1.0
Test: 10 runs
d8 -e "load('tailopt-js-rerun.js');rerun(10);"
...
fact_rec_________________ iter_sec: 2.680e-7 (100: base)
fact_rectailcatch________ iter_sec: 6.104e-6 (2277)
fact_rectailtramp________ iter_sec: 2.146e-6 (801)
fact_tail________________ iter_sec: 4.190e-7 (156)
fact_tail2_______________ iter_sec: 4.150e-7 (155)
fact_tailcatch___________ iter_sec: 6.691e-6 (2496)
fact_tailtramp___________ iter_sec: 3.204e-6 (1195)
fact_tailopt_____________ iter_sec: 1.897e-7 (71)
fact_tailopt2____________ iter_sec: 1.878e-7 (70)
fact_iter________________ iter_sec: 1.339e-7 (50)
gcd_rec__________________ iter_sec: 2.633e-6 (100: base)
gcd_rectailcatch_________ iter_sec: 5.829e-5 (2214)
gcd_rectailtramp_________ iter_sec: 2.468e-5 (937)
gcd_tailopt______________ iter_sec: 1.092e-6 (41)
gcd_iter_________________ iter_sec: 1.220e-6 (46)
treeforeach_rec__________ iter_sec: 2.255e-4 (100: base)
treeforeach_tail_________ iter_sec: 2.921e-4 (130)
treeforeach_tailcatch____ iter_sec: 4.555e-4 (202)
treeforeach_tailtramp____ iter_sec: 3.590e-4 (159)
treeforeach_tailopt______ iter_sec: 2.843e-4 (126)
To verify that the performance measurements are somewhat meaningful, I ran each test 500 times and computed the standard deviation. Thanks to Sam Mason for the suggestion.
Reminder: the result iter_sec is the duration of a single call, in seconds: the lower, the better.
System: Ubuntu 8.04 on a Thinkpad T43 (Intel Pentium M, 2 GHz, 1.5 GB RAM)
Java version: 1.5.0, gij (GNU libgcj) version 4.2.4 (Ubuntu 4.2.4-1ubuntu3)
Rhino version: 1.6 release 7 2008 10 14
Test: 500 runs
rhino -e "load('tailopt-js-rerun.js');rerun(510,10);"
...
rerun() started, subset: null , nrunmin: 510 , nruntoforget: 10 , global.runs: 509
..
fact_rec_____________ iter_sec: 3.984e-4 (100: base) [mean:4.037e-4, std:9.700e-6, ratio:2.403e-2]
fact_tail____________ iter_sec: 5.986e-4 (150) [mean:5.803e-4, std:1.330e-5, ratio:2.292e-2]
fact_tailopt_________ iter_sec: 4.862e-4 (122) [mean:4.919e-4, std:1.298e-5, ratio:2.638e-2]
fact_tail2___________ iter_sec: 5.714e-4 (143) [mean:5.783e-4, std:1.364e-5, ratio:2.358e-2]
fact_tailopt2________ iter_sec: 4.745e-4 (119) [mean:4.749e-4, std:1.066e-5, ratio:2.245e-2]
fact_iter____________ iter_sec: 1.167e-4 (29) [mean:1.188e-4, std:3.392e-6, ratio:2.857e-2]
gcd_rec______________ iter_sec: 3.000e-3 (100: base) [mean:3.039e-3, std:9.211e-5, ratio:3.031e-2]
gcd_tailopt__________ iter_sec: 2.964e-3 (99) [mean:2.977e-3, std:4.393e-5, ratio:1.476e-2]
gcd_iter_____________ iter_sec: 8.356e-4 (28) [mean:8.045e-4, std:1.424e-5, ratio:1.770e-2]
treeforeach_rec______ iter_sec: 5.363e-2 (100: base) [mean:5.414e-2, std:2.201e-3, ratio:4.065e-2]
treeforeach_tail_____ iter_sec: 5.474e-2 (102) [mean:5.509e-2, std:6.106e-4, ratio:1.108e-2]
treeforeach_tailopt__ iter_sec: 6.244e-2 (116) [mean:6.269e-2, std:1.004e-3, ratio:1.602e-2]
The ratio std/mean is about 2-3% in most cases; only for treeforeach_rec is it about 4%. Conclusion: the duration differences are meaningful for fact and gcd, and the three variants of treeforeach perform similarly on average.
System: Ubuntu 8.04 on a Thinkpad T43 (Intel Pentium M, 2 GHz, 1.5 GB RAM)
V8 version: 2.1.0
Test: 500 runs
d8 -e "load('tailopt-js-rerun.js');rerun(510,10);"
...
rerun() started, subset: null , nrunmin: 510 , nruntoforget: 10 , global.runs: 509
...
fact_rec_____________ iter_sec: 2.657e-7 (100: base) [mean:2.636e-7, std:1.087e-8, ratio:4.126e-2]
fact_tail____________ iter_sec: 4.011e-7 (151) [mean:4.040e-7, std:5.537e-9, ratio:1.371e-2]
fact_tailopt_________ iter_sec: 1.855e-7 (70) [mean:1.866e-7, std:4.243e-9, ratio:2.274e-2]
fact_tail2___________ iter_sec: 4.078e-7 (153) [mean:4.103e-7, std:6.752e-9, ratio:1.645e-2]
fact_tailopt2________ iter_sec: 1.857e-7 (70) [mean:1.866e-7, std:3.168e-9, ratio:1.698e-2]
fact_iter____________ iter_sec: 1.311e-7 (49) [mean:1.322e-7, std:3.324e-9, ratio:2.514e-2]
gcd_rec______________ iter_sec: 2.659e-6 (100: base) [mean:2.651e-6, std:4.979e-8, ratio:1.878e-2]
gcd_tailopt__________ iter_sec: 1.061e-6 (40) [mean:1.068e-6, std:2.215e-8, ratio:2.073e-2]
gcd_iter_____________ iter_sec: 1.197e-6 (45) [mean:1.204e-6, std:2.985e-8, ratio:2.479e-2]
treeforeach_rec______ iter_sec: 2.224e-4 (100: base) [mean:2.222e-4, std:4.170e-6, ratio:1.877e-2]
treeforeach_tail_____ iter_sec: 2.855e-4 (128) [mean:2.860e-4, std:5.037e-6, ratio:1.761e-2]
treeforeach_tailopt__ iter_sec: 2.754e-4 (124) [mean:2.778e-4, std:2.647e-6, ratio:9.528e-3]
Except for fact_rec, the standard deviation is only about 2% of the mean, so comparisons between means seem pretty safe! Indeed, for most comparisons the difference between means amounts to much more than 3 times the standard deviation of each compared value (which, under a Gaussian assumption, roughly means more than 99% confidence). Only gcd_tailopt and gcd_iter may be deemed similar (which is quite good!), as well as treeforeach_tail and treeforeach_tailopt (not so great).
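The 3-standard-deviation criterion can be checked numerically. The helper below, clearlyDifferent, is my own shorthand for that comparison (it is not part of the benchmark code), applied to four rows of the V8 table above:

```javascript
// My own helper: two means are deemed clearly different when their
// gap exceeds 3 standard deviations of each compared value.
function clearlyDifferent(a, b) {
    return Math.abs(a.mean - b.mean) > 3 * (a.std + b.std);
}

// Mean and std values copied from the V8 500-run table above.
var fact_tailopt = { mean: 1.866e-7, std: 4.243e-9 };
var fact_iter    = { mean: 1.322e-7, std: 3.324e-9 };
var gcd_tailopt  = { mean: 1.068e-6, std: 2.215e-8 };
var gcd_iter     = { mean: 1.204e-6, std: 2.985e-8 };

clearlyDifferent(fact_tailopt, fact_iter); // true: the gap is wide
clearlyDifferent(gcd_tailopt, gcd_iter);   // false: may be deemed similar
```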
These are the detailed results for the String Replication section on 500 runs, in the form of:
- mean: duration of a single iteration (in seconds), averaged over the 500 runs - the lower, the better. In brackets, relative values, where base 100 is the recursive result.
- std: standard deviation of the single-iteration duration over the 500 runs.
- ratio = std / mean: indicates the stability of the result (high for Firefox 3.6, probably because of some chronic garbage collection for the tailopt and iter versions).
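As a quick sanity check of the ratio column, recomputing std / mean for the Firefox 3.6 string_repli_rec row reproduces the reported value:

```javascript
// Mean and std taken from the Firefox 3.6 string_repli_rec row.
var mean = 5.972e-5;
var std = 5.302e-6;
var ratio = std / mean; // about 8.878e-2, matching the table
```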
Firefox 3.6, Thinkpad Linux Ubuntu, 510 runs total (the first 10 runs ignored):
string_repli_rec______________ mean: 5.972e-5 (100.0), std:5.302e-6, ratio:8.878e-2
string_repli_tail_____________ mean: 1.582e-5 ( 26.5), std:1.212e-6, ratio:7.660e-2
string_repli_tailopt__________ mean: 1.105e-5 ( 18.5), std:1.261e-6, ratio:1.141e-1
string_repli_iter_____________ mean: 9.818e-6 ( 16.4), std:1.268e-6, ratio:1.291e-1
Google Chrome 4.0.249.43, Thinkpad Linux Ubuntu, 510 runs total (the first 10 runs ignored):
string_repli_rec______________ mean: 5.312e-5 (100.0), std:2.287e-6, ratio:4.305e-2
string_repli_tail_____________ mean: 4.075e-6 ( 7.7), std:2.415e-7, ratio:5.926e-2
string_repli_tailopt__________ mean: 3.479e-6 ( 6.5), std:2.178e-7, ratio:6.261e-2
string_repli_iter_____________ mean: 3.361e-6 ( 6.3), std:2.337e-7, ratio:6.955e-2
Google V8, Thinkpad Linux Ubuntu, 510 runs total (the first 10 runs ignored):
$ d8 -e 'load("test-string_repli.js");'
string_repli_rec______________ mean: 3.324e-5 (100.0), std:4.964e-7, ratio:1.493e-2
string_repli_tail_____________ mean: 2.381e-6 ( 7.2), std:1.742e-7, ratio:7.314e-2
string_repli_tailopt__________ mean: 2.231e-6 ( 6.7), std:1.242e-8, ratio:5.565e-3
string_repli_iter_____________ mean: 2.166e-6 ( 6.5), std:4.501e-8, ratio:2.078e-2
Rhino, Thinkpad Linux Ubuntu, 510 runs total (the first 10 runs ignored):
$ rhino -e 'load("test-string_repli.js");'
string_repli_rec______________ mean: 1.337e-2 (100.0), std:2.329e-4, ratio:1.742e-2
string_repli_tail_____________ mean: 1.243e-3 ( 9.3), std:7.140e-5, ratio:5.746e-2
string_repli_tailopt__________ mean: 1.100e-3 ( 8.2), std:6.673e-5, ratio:6.068e-2
string_repli_iter_____________ mean: 8.072e-4 ( 6.0), std:1.941e-5, ratio:2.405e-2
Produced on 2014-08-13 by tailopt-js-appendices.scm - by Guillaume Lathoud (glathoud _at_ yahoo _dot_ fr)