The Computer Language
Benchmarks Game

A History

Once upon a time…

Doug Bagley had a burst of crazy curiosity: "When I started this project, my goal was to compare all the major scripting languages. Then I started adding in some compiled languages for comparison…"

That project was abandoned in 2002, restarted in 2004 by Brent Fulgham, continued from 2008 by Isaac Gouy, and interrupted in 2018 by the Debian Alioth hosting service EOL. Everything has changed; several times.

April 2017 through March 2018, Google Analytics saw 477,419 users.

Enough that many web search results show web spam and phishing - be careful!

Dismiss, Distract

You will probably come across stuff that people have said about the benchmarks game. Did they show that what they claim is true? If they provided a URL, does the content actually confirm what they said?

Perhaps they were genuinely mistaken. Perhaps the content changed.

I heard that one is pretty bad…

Perhaps they heard wrong.

… neither scientific nor indicative of expected performance…

… That having been said … on the current benchmarks, Rust already outperforms C++, which is a pretty big deal…

No, we have to choose —

- either we accept the "neither scientific nor indicative" dismissal and don't even consider "a pretty big deal";

- or we reject the "neither scientific nor indicative" dismissal and consider "a pretty big deal".

(What "scientific" is supposed to mean in this context is unexplained; it seems just to be used as a dismissal.)

… nor indicative of expected performance on real-world idiomatic code.

We've certainly not attempted to prove that these measurements, of a few tiny programs, are somehow representative of the performance of any real-world applications. Whether they are is simply not known.

(Why we should only care about "idiomatic code" is unexplained; it seems just to be used as a dismissal.)

There's a reason they call it a game

Not really. The name "benchmarks game" signifies nothing more than the fact that programmers contribute programs that compete (but try to remain comparable) for fun, not money.

…to dispute a decision you basically need to pray the maintainer reopens it for some reason.

Never true. Follow-up comments could always be made in the project ticket tracker; there was a public discussion forum; and so on.

Someone's brilliant hack was rejected. Someone saw the opportunity to push traffic to their personal blog.

The guy that runs it arbitrarily decided to stop tracking some languages he didn't care about…

The guy that runs it arbitrarily decided to start tracking those languages he supposedly didn't care about.

Measurements are no longer made for these —

ATS, FreeBASIC, CINT, Cyclone, Intel C, Tiny C, Mono C#, Mono F#, Intel C++, CAL, Clean, Clojure, Digital Mars D, GNU D, Gwydion Dylan, SmartEiffel, bigForth, GNU GForth, G95 Fortran, Groovy, Hack, Icon, Io, Java -client, Java -Xint, gcj Java, Rhino JavaScript, SpiderMonkey, TraceMonkey, Lisaac, LuaJIT, Mercury, Mozart/Oz, Nice, Oberon-2, Objective-C, Pike, SWI Prolog, YAP Prolog, IronPython, PyPy, Rebol, Rexx, Scala, Bigloo Scheme, Chicken Scheme, Ikarus Scheme, GNU Smalltalk, Squeak Smalltalk, Mlton SML, SML/NJ, Tcl, Zonnon.

I know it will take more time than I choose to give. Been there; done that.

Be curious

Wtf kind of benchmark counts the jvm startup time?

How much difference does it make for these tiny programs?

Compare the times against [pdf] Java Microbenchmark Harness reports:

                  benchmarks game secs    JMH average secs
n-body                         21.54        23.367 ± 0.062
spectral-norm                   4.29         4.061 ± 0.054
meteor-contest                  0.24         0.112 ± 0.001

otoh JVM start-up, JIT, OSR… quickly take effect, and a typical cold / warmed-up comparison on most of these workloads will show a minuscule difference.

otoh For measurements of a few tenths of a second, a few tenths of a second is a huge difference.

(In stark contrast to the traditional expectation of warmup, some benchmarks exhibit slowdown, where the performance of in-process iterations drops over time.)
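For contrast with whole-process times, the JMH figures above come from in-process measurement: the harness forks a fresh JVM, runs warmup iterations until JIT compilation has settled, and averages only the later measurement iterations, so JVM start-up never appears in the reported number. A minimal sketch of such a benchmark follows; the class, method, and workload here are illustrative placeholders, not taken from the report above.

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.SECONDS)
@Warmup(iterations = 5)        // discarded: lets JIT, OSR, class loading settle
@Measurement(iterations = 10)  // only these in-process iterations are averaged
@Fork(1)                       // fresh JVM, but its start-up is not counted
public class NBodySteadyState {

    @Benchmark
    public double nbody() {
        // stand-in for the real n-body workload from the benchmarks game program
        return advance(50_000_000);
    }

    private double advance(int steps) {
        double energy = 0.0;
        for (int i = 0; i < steps; i++) {
            energy += Math.sin(i) * 1e-9;   // placeholder arithmetic only
        }
        return energy;
    }
}

Measured this way, start-up and warmup are excluded entirely, which is why the 0.24 s meteor-contest time (whole process) is roughly twice the 0.112 s JMH average, while for the long-running n-body and spectral-norm programs the two figures differ by well under 10%.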

Apples and Oranges

We compare programs against each other, as though the different programming languages had been designed for the exact same purpose — that just isn't so.

The problems introduced by multicore processors, networked systems, massive computation clusters, and the web programming model were being worked around rather than addressed head-on. Moreover, the scale has changed: today's server programs comprise tens of millions of lines of code, are worked on by hundreds or even thousands of programmers, and are updated literally every day. To make matters worse, build times, even on large compilation clusters, have stretched to many minutes, even hours. Go was designed and developed to make working in this environment more productive.

Most (all?) large systems developed using Erlang make heavy use of C for low-level code, leaving Erlang to manage the parts which tend to be complex in other languages, like controlling systems spread across several machines and implementing complex protocol logic.

Lua is a tiny and simple language, partly because it does not try to do what C is already good for, such as sheer performance, low-level operations, or interface with third-party software. Lua relies on C for those tasks.

a good starting point

How does Java compare in terms of speed to C or C++ or C# or Python? The answer depends greatly on the type of application you're running. No benchmark is perfect, but The Computer Language Benchmarks Game is a good starting point.