The Computer Language Benchmarks Game

Why do people just make up stuff sometimes?

Trust, and verify

Sometimes people just make up stuff because the facts don't fit the story they want to tell you.

You will probably come across stuff that people say about the benchmarks game. Do they show that what they claim is true? If they provide a URL, does the content actually confirm what they say?

Perhaps they are genuinely mistaken. Perhaps the content changed.

The guy that runs it arbitrarily decided to stop tracking some languages he didn't care about…

… and stop tracking other languages I did care about (and other languages I contributed programs for).

Because I know it will take more time than I choose to spend.
Been there; done that.

Measurements are no longer made for these language implementations —
ATS, FreeBASIC, CINT, Cyclone, Intel C, Tiny C, Mono C#, Mono F#, Intel C++, CAL, Clean, Clojure, Digital Mars D, GNU D, Gwydion Dylan, SmartEiffel, bigForth, GNU GForth, G95 Fortran, Groovy, Icon, Io, Java -client, Java -Xint, gcj Java, Rhino JavaScript, SpiderMonkey, TraceMonkey, Lisaac, LuaJIT, Mercury, Mozart/Oz, Nice, Oberon-2, Objective-C, Pike, SWI Prolog, YAP Prolog, IronPython, PyPy, Rebol, Rexx, Scala, Bigloo Scheme, Chicken Scheme, Ikarus Scheme, GNU Smalltalk, Squeak Smalltalk, Mlton SML, SML/NJ, Tcl, Zonnon.

If you're interested in something not shown on the benchmarks game website then please take the program source code and the measurement scripts and publish your own measurements.

…a lot more publicity than some 3rd party mini benchmark site…

Yes, and when you make measurements for programming language implementations that are popular on proggit, publish them, and promote them, they too will attract attention and could be the basis of another successful website.

So, start — make measurements for yourself, for the language implementations that interest you. Maybe Crystal or Nim or [pdf] Julia; maybe something else.

Then, follow through — update the language implementations, update the programs, update the measurements, update the presentation: again, and again, and again.
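
For instance, a measurement loop can be as small as the following Python sketch; the program path, workload argument, and repeat count are assumptions to replace with whatever you actually want to measure, not the benchmarks game's own scripts.

    # Minimal sketch of a repeated whole-program measurement loop.
    # CMD and RUNS are assumptions -- substitute the runtime, program,
    # and workload argument you actually want to measure.
    import statistics
    import subprocess
    import time

    CMD = ["./nbody", "50000000"]   # hypothetical compiled program + workload size
    RUNS = 6                        # repeat to see run-to-run variation

    elapsed = []
    for _ in range(RUNS):
        start = time.perf_counter()
        subprocess.run(CMD, check=True, stdout=subprocess.DEVNULL)
        elapsed.append(time.perf_counter() - start)

    print(f"fastest {min(elapsed):.3f}s   median {statistics.median(elapsed):.3f}s")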

Wtf kind of benchmark counts the jvm startup time?

Compare the times against [pdf] Java Microbenchmark Harness reports:

                 benchmarks game secs    JMH average ± error (secs)
n-body                          21.54            23.367 ± 0.062
spectral-norm                    4.29             4.061 ± 0.054
meteor-contest                   0.24             0.112 ± 0.001

JVM start-up, JIT, OSR… take effect quickly, so a typical cold versus warmed-up comparison of most of these workloads will show a minuscule difference. Note the exception.

(In stark contrast to the traditional expectation of warmup, some benchmarks exhibit slowdown, where the performance of in-process iterations drops over time.)
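
If you want to check the start-up claim for yourself, one rough approach is to time a whole JVM process running an empty main and compare that with the whole-process time of a real workload. The class names and workload argument in this Python sketch are hypothetical placeholders, not the measurement method used for the table above.

    # Sketch: roughly estimate the JVM start-up share of a whole-program time.
    # "EmptyMain" and "nbody" are hypothetical class names -- substitute your own.
    import subprocess
    import time

    def wall_time(cmd):
        """Return the elapsed wall-clock seconds of one whole process run."""
        start = time.perf_counter()
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        return time.perf_counter() - start

    startup = min(wall_time(["java", "EmptyMain"]) for _ in range(5))
    workload = wall_time(["java", "nbody", "50000000"])

    print(f"start-up ~{startup:.2f}s of {workload:.2f}s total "
          f"({100 * startup / workload:.1f}%)")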

…to dispute a decision you basically need to pray the maintainer reopens it for some reason.

Never true. Follow-up comments could always be made in the project ticket tracker, and there was also a public discussion forum, among other channels.

Someone's brilliant hack was rejected. Someone saw the opportunity to push traffic to their personal blog.

There's a reason they call it a game

Not really. The name "benchmarks game" signifies nothing more than the fact that programmers contribute programs that compete (but try to remain comparable) for fun, not money.

Once upon a time…

Doug Bagley had a burst of crazy curiosity: "When I started this project, my goal was to compare all the major scripting languages. Then I started adding in some compiled languages for comparison…"

That project was abandoned in 2002, restarted in 2004 by Brent Fulgham, continued from 2008 by Isaac Gouy, and interrupted in 2018 by the Debian Alioth hosting service EOL. Everything has changed; several times.

From April 2017 through March 2018, Google Analytics saw 477,419 users.

Enough that many web search results show web spam; be careful!