too simple

Swap globals for local variables in a function and see the Python code speed up. Swap language implementations and see the C code speed up.
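
As a minimal sketch of the first swap, the following runs the same loop once against a module-level global and once against a local variable. The names `N`, `sum_with_global`, `sum_with_local` and `time_both` are hypothetical, made up for illustration, and the loop is not the snippet that was actually measured. On CPython the local version typically runs faster, but the size of the gap depends on the interpreter and version.

```python
import timeit

N = 100_000  # hypothetical loop count, small enough to run quickly
total = 0    # module-level global updated by sum_with_global


def sum_with_global():
    """Accumulate into a module-level global variable."""
    global total
    total = 0
    for i in range(N):
        total += i
    return total


def sum_with_local():
    """Accumulate into a local variable of the function."""
    acc = 0
    for i in range(N):
        acc += i
    return acc


def time_both(repeats=5):
    """Return (global_seconds, local_seconds) for `repeats` calls of each."""
    global_seconds = timeit.timeit(sum_with_global, number=repeats)
    local_seconds = timeit.timeit(sum_with_local, number=repeats)
    return global_seconds, local_seconds


global_seconds, local_seconds = time_both()
print(f"global: {global_seconds:.4f}s  local: {local_seconds:.4f}s")
```

Both functions return the same sum; only the place where the accumulator lives changes, which is the point: a measurement this small mostly reports on that one detail.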

Take a tiny ten-line snippet of code, loop through it 10,000 million times, and the usually brittle performance measurements may become micro-benchmark-broken.

source	process	mem	in-process
Intel C	21.98	19,436	21.980
C clang	43.98	19,772	43.974
Go	44.01	19,436	44.012
C gcc	44.02	19,772	44.019
C# .NET	44.09	29,772	44.047
Java	44.20	42,012	44.151
Java -Xint	5 min	37,792	323.444
Ruby yjit	11 min	19,620	688.632
PHP	11 min	19,768	716.994
Python 3 #3	29 min	19,316	1,744.378
Python 3	54 min	19,316	3,245.662
Matz's Ruby	1h 38 min	19,440	5,895.362

Something to think about when you're tempted to draw broad conclusions from ten-line snippets of code.

The "process" measurements (the usual elapsed-seconds) were made without doing warm-up iterations and include startup costs.

The "in-process" measurements were made without doing warm-up iterations and exclude startup costs.
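
The difference between the two kinds of measurement can be sketched as follows. This is an illustration under stated assumptions, not the harness used for the table: `CHILD_SOURCE` and `measure` are hypothetical names, and the child loop is a stand-in workload. The parent times the whole child process, startup included, while the child times only its own loop and reports that figure on standard output.

```python
import subprocess
import sys
import time

# Hypothetical child program: times only its own loop, excluding startup.
CHILD_SOURCE = """
import time
start = time.perf_counter()
total = 0
for i in range(1_000_000):
    total += i
print(time.perf_counter() - start)
"""


def measure():
    """Return (process_seconds, in_process_seconds) for one cold run."""
    start = time.perf_counter()
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SOURCE],
        capture_output=True,
        text=True,
        check=True,
    )
    # Elapsed time seen from outside: interpreter startup + loop + exit.
    process_seconds = time.perf_counter() - start
    # Elapsed time reported by the child itself: the loop only.
    in_process_seconds = float(result.stdout.strip())
    return process_seconds, in_process_seconds


process_seconds, in_process_seconds = measure()
print(f"process: {process_seconds:.3f}s  in-process: {in_process_seconds:.3f}s")
```

Neither number involves warm-up iterations: the child runs its loop exactly once, from a cold start.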