Who wrote the programs
The programs have been crowdsourced: contributed to the project by an ever-changing, self-selected group.
How programs could be measured better
There are some examples where the measurement script fails to make CPU time and memory use measurements: OCaml fannkuch-redux #3, OCaml fannkuch-redux #4, OCaml reverse-complement, OCaml reverse-complement #3.
BenchExec successfully makes measurements for those programs. For example, OCaml fannkuch-redux #3:
starttime=2024-07-16T12:24:19.263290-07:00
returnvalue=0
walltime=9.037729900999693s
cputime=35.813672s
memory=31322112B
blkio-read=0B
blkio-write=0B
pressure-cpu-some=8.862072s
pressure-io-some=0s
pressure-memory-some=0s
It seems that krun or BenchExec would now be a better framework for this kind of project.
How programs are measured
Each program is run and measured at the smallest input value, with program output redirected to a file and compared to the expected output. As long as the output matches the expected output, the program is then run and measured at the next larger input value, until measurements have been made at every input value.
If the program gives the expected output within an arbitrary cutoff time (120 seconds), the program is measured again (5 more times) with output redirected to /dev/null.
If the program doesn't give the expected output within an arbitrary timeout (usually one hour), the program is forced to quit. If measurements at a smaller input value have been successful within an arbitrary cutoff time (120 seconds), the program is measured again (5 more times) at that smaller input value, with output redirected to /dev/null.
The measurements shown on the website are either:
For sure, programs taking 4 and 5 hours were only measured once!
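The procedure described above can be sketched roughly as follows. This is only an illustration, not the project's actual measurement script: the names `run_once` and `measure_program` are hypothetical, the program is assumed to take its input value as a command-line argument, and the repeat runs are assumed to happen at the largest input value that produced the expected output (one reading of the description).

```python
import os
import subprocess
import time

CUTOFF_SECS = 120      # the "arbitrary cutoff time" from the text
TIMEOUT_SECS = 3600    # the "usually one hour" timeout
REPEATS = 5            # "5 more times"


def run_once(cmd, n, out_path, timeout=TIMEOUT_SECS):
    """Run cmd with input value n, redirecting stdout to out_path.

    Returns (elapsed seconds, True if the program exited before the timeout).
    """
    start = time.time()
    with open(out_path, "wb") as out:
        try:
            subprocess.run(cmd + [str(n)], stdout=out, timeout=timeout, check=False)
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child when the timeout expires
            return time.time() - start, False
    return time.time() - start, True


def measure_program(cmd, inputs, expected, out_path="out.txt"):
    """inputs: ascending input values; expected: dict of input value -> bytes."""
    results = {}
    last_good = None
    for n in inputs:
        secs, finished = run_once(cmd, n, out_path)
        with open(out_path, "rb") as f:
            matches = finished and f.read() == expected[n]
        if not matches:
            break  # stop at the first input value that fails
        results[n] = [secs]
        last_good = n
    # Repeat only if the first measurement was within the cutoff (assumption).
    if last_good is not None and results[last_good][0] <= CUTOFF_SECS:
        for _ in range(REPEATS):
            secs, _finished = run_once(cmd, last_good, os.devnull)
            results[last_good].append(secs)
    return results
```

Only elapsed time is recorded in this sketch; CPU time and memory use are left out to keep the control flow visible.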
How programs are timed
Each program is run as a child-process of a Python script using Popen:
secs - The time is taken before forking the child-process and after the child-process exits, using time.time().
This will include processing done by the measurement script and whatever else happens on the idle isolated test machine.
cpu secs - The script child-process usr+sys rusage time is recorded using os.wait3.
This may not include all processing done by multi-process programs, and when that seems likely busy time is shown instead.
busy - The psutil.cpu_percent is taken before forking the child-process and after the child-process exits, scaled by secs.
This will include processing done by the measurement script, and whatever else happens on the idle isolated test machine.
Note: These measurements include startup time. Compare with a dozen examples that exclude startup cost and are made after warm-up iterations.
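As a minimal sketch of the secs and cpu secs measurements, assuming a POSIX system: the function name `time_child` and the returned dictionary are made up for illustration, and the real script may differ in its details.

```python
import os
import subprocess
import time


def time_child(cmd):
    """Return elapsed and CPU seconds for one run of cmd (POSIX only)."""
    start = time.time()                      # taken before forking
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL)
    while True:
        # os.wait3 blocks until some child exits and returns its rusage
        pid, status, rusage = os.wait3(0)
        if pid == proc.pid:
            break
    secs = time.time() - start               # taken after the child exits
    # The child has already been reaped, so record the exit code by hand.
    if os.WIFEXITED(status):
        proc.returncode = os.WEXITSTATUS(status)
    else:
        proc.returncode = -os.WTERMSIG(status)
    cpu_secs = rusage.ru_utime + rusage.ru_stime   # usr + sys
    return {"secs": secs, "cpu_secs": cpu_secs, "returncode": proc.returncode}
```

Because the clock starts before the fork, interpreter or runtime startup is part of `secs`, which matches the note above about startup time being included.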
How program memory use is measured
The script child-process maximum resident set size is recorded using os.wait3.
This is not the maximum resident set size of the process tree and may underestimate the memory use of multi-process programs.
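A small sketch of reading that value, assuming a POSIX system. The helper name `max_rss_bytes` is hypothetical, and the unit conversion reflects the fact that `ru_maxrss` is reported in kilobytes on Linux but in bytes on macOS.

```python
import os
import subprocess
import sys


def max_rss_bytes(cmd):
    """Run cmd and return the child's maximum resident set size in bytes."""
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL)
    while True:
        pid, status, rusage = os.wait3(0)    # rusage of the exited child
        if pid == proc.pid:
            break
    # The child is already reaped; record its exit code on the Popen object.
    proc.returncode = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    # ru_maxrss units differ by platform: kilobytes on Linux, bytes on macOS.
    scale = 1 if sys.platform == "darwin" else 1024
    return rusage.ru_maxrss * scale
```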
How source code size is measured
We start with the source-code markup you can see, remove comments, remove duplicate whitespace characters, and then apply minimum GZip compression. The measurement is the size in bytes of that GZip compressed source-code file.
Thanks to Brian Hurt for the idea of using size of compressed source code instead of lines of code.
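The steps can be approximated in a few lines. This is a rough sketch under several assumptions, not the project's tool: comments are removed with a naive regular expression for `#` line comments (which would also strip a `#` inside a string literal), "duplicate whitespace" is read as runs of the same whitespace character, and "minimum GZip compression" is taken to mean compression level 1. The function names are hypothetical.

```python
import gzip
import re


def strip_for_measurement(source, comment_pattern=r"#.*"):
    """Remove comments and collapse repeated whitespace characters."""
    no_comments = re.sub(comment_pattern, "", source)
    # Collapse a run of the same whitespace character into one occurrence.
    return re.sub(r"(\s)\1+", r"\1", no_comments)


def gzip_source_size(source, comment_pattern=r"#.*"):
    """Size in bytes of the gzip-compressed, stripped source text."""
    stripped = strip_for_measurement(source, comment_pattern)
    return len(gzip.compress(stripped.encode("utf-8"), compresslevel=1))
```

A different `comment_pattern` would be needed for each language's comment syntax, and block comments would need a pattern that spans lines.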
median source code gzip (July 2018)
| Language | gzip size (bytes) |
|---|---|
| Perl | 513 |
| TypeScript | 532 |
| Lua | 553 |
| Ruby | 568 |
| Dart | 610 |
| Chapel | 632 |
| Racket | 638 |
| Python | 672 |
| PHP | 736 |
| Hack | 745 |
| JavaScript | 779 |
| Erlang | 792 |
| Go | 829 |
| Haskell | 842 |
| Pascal | 846 |
| F# | 876 |
| OCaml | 914 |
| Java | 945 |
| Smalltalk | 950 |
| Lisp | 1004 |
| Fortran | 1019 |
| C++ | 1044 |
| C# | 1059 |
| C | 1115 |
| Swift | 1164 |
| Rust | 1319 |
| Ada | 1819 |
(Note: There is some evidence that complexity metrics don't provide any more information than SLoC or LoC.)
How CPU load is measured
The psutil.cpu_percent is taken before forking the child-process and after the child-process exits.
This will include processing done by the measurement script, and whatever else happens on the idle isolated test machine.
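A sketch of how such a before-and-after reading might look, assuming the third-party psutil package is installed. The first `psutil.cpu_percent(interval=None)` call only primes the counters; the second returns utilisation since the first. The names `measure_load` and `busy_seconds` are hypothetical, and the scaling shown (summed per-core percentages times elapsed seconds) is an assumption about how a busy time could be derived, not a statement of the script's formula.

```python
import subprocess
import time


def busy_seconds(per_core_percent, secs):
    """Scale per-core utilisation percentages by elapsed seconds (assumed formula)."""
    return sum(per_core_percent) / 100.0 * secs


def measure_load(cmd):
    """Return (per-core CPU percentages, busy seconds) for one run of cmd."""
    import psutil  # third-party; imported here so busy_seconds works without it

    psutil.cpu_percent(interval=None, percpu=True)   # prime; result discarded
    start = time.time()
    subprocess.run(cmd, stdout=subprocess.DEVNULL, check=False)
    secs = time.time() - start
    # Utilisation of each core since the priming call above.
    load = psutil.cpu_percent(interval=None, percpu=True)
    return load, busy_seconds(load, secs)
```

Since the percentages are system-wide, anything else running on the machine during the interval is counted too, which is why the text stresses an idle isolated test machine.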