The Computer Language Benchmarks Game 24.11

Energy Efficiency of Programming Languages

The project continues to be a convenient source of tiny-tiny programs written in various programming languages. Independent researchers take the programs and do their independent research no-questions-asked.


Let's ask questions.

syntax and semantics

Programming languages define the syntax and semantics, but it is their implementations that primarily influence performance. (page 4)
Fig. 4. … This simple model captures the relationship implied in Pereira et al., namely that the choice of programming language has a direct impact on total energy consumption. (page 9)

Q: Is this intended to suggest that Pereira et al. claim syntax and semantics have a direct impact on total energy consumption?

Pereira et al. actually say —

… the reused language libraries, the quality of the compiler, and its (aggressive) optimizations all greatly influence the performance of the resulting programs. Thus, a programming language may become faster, not by changing its programs, but by "just" improving its libraries and or its compiler (or virtual machine). (page 1)

Pereira et al. say that the impact comes through differences in programming language implementation and in application implementation. Not programming language as syntax and semantics; rather, programming language as the-whole-ball-of-wax.

conclusion validity

… we can see several examples where language A consumes more energy to run a benchmark than language B, while taking less time to do so.

Q: Are these examples where application A written for multicore CPUs consumes more energy to run while taking less time to do so?

Examples where application A written for multicore CPUs takes more CPU-secs to run, while taking less time to do so? Like this —

fannkuch-redux    CPU-secs     secs   :     joules     secs
Lua                 603.36   603.40   :    6732.44   634.87
Perl #2           1,766.92   442.57   :   10526.19   249.35

Pereira et al. do not seem to consider differences in CPU core usage when discussing their results. That missing factor may threaten the validity of their conclusions.
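The missing factor can be read straight off the fannkuch-redux numbers with two divisions: CPU-secs over secs approximates how many cores were kept busy, and joules over secs gives an average power draw. The Python sketch below does only that arithmetic. It assumes each row's joules/secs pair measured the same program as its CPU-secs/secs pair, and the helper names are hypothetical.

```python
# Hypothetical helpers: plain arithmetic on the fannkuch-redux table above.
# Assumption: each row's joules/secs pair measured the same program as its
# CPU-secs/secs pair (the measurements may come from different runs).

FANNKUCH_REDUX = {
    # name: (CPU-secs, secs, joules, secs)
    "Lua": (603.36, 603.40, 6732.44, 634.87),
    "Perl #2": (1766.92, 442.57, 10526.19, 249.35),
}


def busy_cores(cpu_secs: float, elapsed_secs: float) -> float:
    """Average number of cores kept busy: total CPU time / wall-clock time."""
    return cpu_secs / elapsed_secs


def average_watts(joules: float, elapsed_secs: float) -> float:
    """Average power draw over the run: energy / wall-clock time."""
    return joules / elapsed_secs


if __name__ == "__main__":
    for name, (cpu, secs, joules, secs_e) in FANNKUCH_REDUX.items():
        print(f"{name:8} cores ~{busy_cores(cpu, secs):.2f}  "
              f"watts ~{average_watts(joules, secs_e):.1f}")
```

Run as-is, it reports about 1.00 busy core at about 10.6 W for Lua, and about 3.99 busy cores at about 42.2 W for Perl #2: roughly four times the cores, roughly four times the power, so Perl #2 finishes sooner yet uses more joules.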

comparable programs

… the really interesting study assumes the Computer Language Benchmarks Game is a source of comparable programs, which is not remotely true if you know that site.

Q: Do they just assume?

Pereira et al. tell us —

While the different languages contain different implementations, all produced the same exact output and each are implemented to be the fastest and most efficient as possible. … allows us to compare the different programming languages in a quite just manner …

Long ago, some of these programs were part of the Go install. Presumably the Go programs were intended to be comparable to, and more likely the same as, the C programs.

assumptions (again)

… if your study claims that C++ uses … more energy, 56% more time, and … more memory than C, it’s time to reexamine your assumptions.

Q: Should summary information give most weight to middle values or outliers?

That "56%" statistic comes from a screenshot that has been posted and re-posted, often without context. The original "1.56" statistic (C++ time normalized to C = 1.00, hence "56% more time") is from the "Time" column of "Table 4. Normalized global results for Energy, Time, and Memory" in the Energy Efficiency across Programming Languages SLE’17 conference paper, page 263.

Pereira et al. tell us —

The obtained solutions were the best performing ones at the time we set up the study. … we expect that more advanced and more efficient solutions will substitute the ones we obtained as time goes on, and even the languages’ compilers might evolve.

For one outlier benchmark (regex-redux), there is a 12x difference between the measured times of the selected C program (pcre) and C++ program (boost/regex).

As Pereira et al. assumed, faster (pcre2) C and (pcre2) C++ programs were contributed to the benchmarks game in the following years. As Pereira et al. assumed, the updated programs were included in a later study and the outlier disappeared.

        ratio of averages             average of ratios
        Mean   GeoMean   Median       Mean   GeoMean   Median
C       1.00   1.00      1.00     :   1.00   1.00      1.00
C++     1.56   1.34      1.03     :   2.37   1.34      1.00

Had the original summary presented medians rather than means, the differences would have been unremarkable.
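How a single outlier moves each summary can be reproduced in a few lines of Python with the standard `statistics` module. The times below are made-up numbers, not Pereira et al.'s measurements: five hypothetical benchmarks where C++ matches C closely except for one 12x outlier. The hypothetical `summarize` helper computes both the ratio of averages and the average of ratios for the mean, geometric mean, and median.

```python
# Hypothetical data, NOT Pereira et al.'s measurements: elapsed secs for five
# made-up benchmarks, with one 12x C++ outlier (last entry).
from statistics import geometric_mean, mean, median

C_SECS = [10.0, 20.0, 30.0, 40.0, 50.0]
CPP_SECS = [10.0, 20.0, 31.0, 40.0, 600.0]


def summarize(base, other):
    """Return {name: (ratio_of_averages, average_of_ratios)} for other vs base."""
    ratios = [o / b for b, o in zip(base, other)]
    return {
        name: (avg(other) / avg(base), avg(ratios))
        for name, avg in (("Mean", mean), ("GeoMean", geometric_mean), ("Median", median))
    }


if __name__ == "__main__":
    for name, (of_averages, of_ratios) in summarize(C_SECS, CPP_SECS).items():
        print(f"{name:8} ratio of averages {of_averages:5.2f}   "
              f"average of ratios {of_ratios:5.2f}")
```

With these made-up numbers the mean is pulled far above 1 by the one outlier (4.67 and 3.21), the median of ratios stays at 1.00, and the geometric mean gives the same value (1.65) either way, because a ratio of geometric means equals the geometric mean of the ratios. That identity is why the GeoMean column reads 1.34 in both halves of the table above.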