Energy Efficiency of Programming Languages

The concept of programming languages predates electrical computational machines. Paper tapes and punch cards were first used in mechanical weaving machines such as the Jacquard machine to describe instructions for patterns during the Industrial Revolution (1). This principle was, therefore, also applied to reading in code on the first computers, as paper holes lend themselves well to the binary encoding of information. The first abstract programming language, "Plankalkül", was designed on paper in the 1940s by Konrad Zuse, who independently created the first general-purpose computers during World War II. It was based on a one-bit primitive type (on / off) and supported several modern elements such as loops, conditional statements, subroutines, exceptions and objects (2). The first language to feature a working compiler was Fortran, which is still used today for performance comparisons.

Language execution models

While all programming languages are used to describe instructions for a computer to execute, there are two important properties that allow grouping them. The first one distinguishes between different execution models, which define when and how the programmer's source code is translated into machine code:

Compiled languages come with a separate program, the compiler. It is used to transform the source code into more abstract representations first, and then to generate hardware-specific machine code. Compilers have the advantage that they can perform static optimisations on the code and use platform-specific improvements. Furthermore, the amount of time they need to run does not have any impact on the consumers of the software, as only the compiled executables are distributed. However, this also implies that the program needs to be compiled for each platform individually.

Virtual machines solve this problem by introducing an additional layer of abstraction. The source code is compiled into an intermediate form, the bytecode, which is a language-specific format that is similar to machine code but resides on a higher level. The consumers of the software have to install an additional virtual machine, which converts the bytecode into the platform-specific machine code at runtime; this creates a performance overhead for the user. If the format is standardised, a program only needs to be written once before being able to run on any platform with the virtual machine.

Interpreted languages are an extension of virtual machines. Instead of using an intermediate bytecode format, they often directly translate the source code into machine instructions at runtime. This approach is the most dynamic and allows, for example, modifying code during its execution. However, it is also the most resource-consuming in most use cases, as all translation work has to be processed at the same time as the actual program. Furthermore, significantly fewer static optimisations can be made, because the runtime often does not know about the entire code yet.

Language paradigms

The second characteristic is the set of paradigms the language makes use of, which fundamentally define what effect code can have on the state of the computer, as well as how the control flow is structured. Four key paradigms are:

Functional programming does not allow procedures to have any side effects and is very similar to the notion of mathematical functions. During their execution, procedures are not allowed to modify any variables but only to read existing variables and create new ones. Functional programming is often more formal and precise, as the effect of code is easier to reason about. However, critical parts of programs like user interaction or file input/output are far more difficult to model.

Imperative programming languages are the exact opposite of functional ones, as their main feature is the side effects code can have. They are mostly more efficient because information in memory can simply be modified instead of having to create altered copies of everything. Imperative languages can also give direct access to low-level features like memory, as computers themselves operate imperatively. Still, the possibility of data being modified from outside poses extra challenges for the verification of their code and makes them less safe.

Object-oriented programming tries to model problems similarly to reality by grouping code together based on the data it modifies. It thereby provides an intuitive way to model real problems by encapsulating procedures and attributes into objects, which can then interact with one another in pre-defined manners. The object-oriented approach can greatly reduce redundancy in the code. Furthermore, it can be used to define permissions on which part of a program is allowed to alter a certain part of its state. Thereby it increases the safety, for example, of imperative code, with which it is often combined.

Scripting languages are often used to execute a sequence of simple instructions dynamically. They are mostly used to bind together other programs or procedures into a more complex application. Therefore, scripts are often written in a domain-specific or embedded language that is interpreted on a line-by-line basis. As they originate from control languages used to automate common tasks for example in programming, they can be classified as higher-level languages, whose performance is often bound by the programs they call.
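The efficiency contrast between the functional and imperative paradigms described above can be sketched in a few lines of Python (the function names are illustrative): a pure function returns an altered copy of its input, while an imperative one mutates it in place.

```python
# Functional style: the input is never modified; each call returns a
# new list, at the cost of allocating and copying.
def increment_all_pure(values):
    return [v + 1 for v in values]

# Imperative style: the list is mutated in place, so no copy is made,
# but every other reference to the same list observes the change.
def increment_all_in_place(values):
    for i in range(len(values)):
        values[i] += 1

data = [1, 2, 3]
result = increment_all_pure(data)   # data is left unchanged
increment_all_in_place(data)        # data itself is now [2, 3, 4]
```

The in-place version avoids the extra allocation, which is exactly the efficiency argument made for imperative languages, while the pure version keeps the original data intact and is easier to reason about.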

The Computer Language Benchmarks Game

"The Computer Language Benchmarks Game" (3) is an online software project that compares the performance of 27 programming languages in 13 toy benchmarks. It tries to enable comparisons based on the execution time, memory usage, file size and CPU load by having an open competition in which everyone can submit their implementation for one of the tests. However, all benchmarks also strictly define how the algorithms may be implemented, which makes side-by-side comparison easier but also disallows certain language specific optimisations to be used. Furthermore, due to the simplicity of the tests, the achieved performance cannot stand for the actual efficiency of large applications. Still, they provide an easy way to see differences in implementations of the same problems and different languages. Because of a large number of contributions and frequent updates of the scores, the project has also become a source for very optimised code in the respective languages.

RAPL

The Running Average Power Limit interface (4) has been supported on Intel processors starting with the Sandy Bridge microarchitecture released in 2011 (5). One of its functionalities allows reading an internal counter register, which holds an estimate of the energy, in fractions of a joule, consumed since its last reset. It has been used to implement several power monitoring tools on Linux (6). Furthermore, the high accuracy of its fine-grained estimates has also made it a popular meter in research on the energy efficiency of programs. Implementations working with C and Java have also made it possible to measure the performance of any piece of code either written in those languages or executed in a program invoked by them (7: p. 258).
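As a rough sketch of how such tools read the counter: on Linux, the powercap subsystem exposes RAPL energy values as cumulative microjoule counters in sysfs. The path and helper names below are assumptions for illustration; the exact domain layout varies by machine.

```python
import time

# Assumed path of the package-0 RAPL domain under the Linux powercap
# interface; the actual layout depends on the CPU and kernel.
ENERGY_FILE = "/sys/class/powercap/intel-rapl:0/energy_uj"

def read_energy_uj(path=ENERGY_FILE):
    """Read the cumulative energy counter in microjoules."""
    with open(path) as f:
        return int(f.read())

def measure(func):
    """Return (result, joules, seconds) for one call of `func`.

    Note: the hardware counter wraps around periodically, which a
    robust tool would have to handle (see max_energy_range_uj).
    """
    e0, t0 = read_energy_uj(), time.perf_counter()
    result = func()
    e1, t1 = read_energy_uj(), time.perf_counter()
    return result, (e1 - e0) / 1e6, t1 - t0
```

Subtracting two counter readings around a workload yields its approximate package energy, which is essentially what the RAPL-based measurement tools mentioned above do.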

"Energy Efficiency across Programming Languages"

Seven researchers from the University of Minho, the Nova University of Lisbon and the University of Coimbra published the paper "Energy Efficiency across Programming Languages - How Do Energy, Time and Memory Relate?" in 2017 (7). They used ten benchmarks from The Computer Language Benchmarks Game as well as the RAPL interface to measure the performance of 27 programming languages of different execution models and paradigms regarding execution time, peak memory usage and the energy consumption of both the CPU and DRAM. As energy is calculated as Energy = Time × Power, one of the most significant questions they asked was whether increasing energy efficiency could be reduced to only making programs run faster or use less memory. The setup of their tests, including the hardware and compiler versions used, can be found on their website.

The following paragraphs only show graphs of the key findings of the paper. However, as the researchers published their entire testing data (8), this website also provides the opportunity to visually explore the data set in its entirety. On the right-hand side of each graph title, there is a symbol. Clicking or tapping on it opens the language comparison tool using that diagram as a starting point. With the tool, the benchmark as well as the languages and measurements can be selected. Additionally, the code of how the particular algorithm was implemented in a language can be shown. This visualisation is an intuitive way to compare the 27 languages in 10 benchmarks using four atomic and twelve derived measurements based on individual project constraints or preferences. More information about the individual benchmarks can be found on the website of The Computer Language Benchmarks Game.

Energy consumption and execution time

A first approach to answer this question is to look at five of the most used programming languages and compare the total energy consumption with the execution time:

This example would suggest that the direct proportionality of Energy = Time × Power holds in reality as well, even if the power is unknown. However, looking at the same ratio in the Regex Redux benchmark, a larger variance between the same languages can be seen:

This phenomenon can be further stressed by observing the results of the Binary Trees benchmark. Even though Pascal consumes roughly 10% less energy, it still takes about twice as much execution time as Chapel (7: p. 260):

A second approach is to compare all energy-time ratios of one benchmark. In the N-Body test, it can be seen that most data points lie between 10W and 15W, with only JRuby and Lua standing out as exceptions. It can, therefore, be concluded that while the direct proportionality between energy and time generally holds, it still depends on the benchmark, i.e. the type of algorithm that needs to be performed:

Another way to look at the data is to separate between the compiled (C and C++), virtual machine (F# and Java) and interpreted languages (Python and Ruby). While Java almost matches the performance of C and C++, there still is a clear progression between the execution models. Indeed, on average, compiled languages consumed 120J in 5103ms, virtual machine languages 576J in 20623ms and interpreted languages 2365J in 87614ms of execution time (p. 261).
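Since Energy = Time × Power, these averages can be cross-checked by deriving the average power P = E / t for each execution model; a small Python sketch using the figures above:

```python
# Average energy (J) and execution time (ms) per execution model,
# as reported in the paper (p. 261).
averages = {
    "compiled": (120.0, 5103.0),
    "virtual machine": (576.0, 20623.0),
    "interpreted": (2365.0, 87614.0),
}

# Average power in watts: P = E / t, with t converted to seconds.
power = {
    model: round(energy / (ms / 1000.0), 1)
    for model, (energy, ms) in averages.items()
}
```

All three derived powers fall into a fairly narrow band of roughly 23 to 28 W, which illustrates why, on average, energy consumption largely tracks execution time.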

The paradigms of the programming languages also significantly affect the resulting performance. On average, imperative languages (e.g. Ada, C, F#) consumed 125J in 5585ms, object-oriented languages (e.g. Ada, F#, Python) 879J in 32965ms, functional languages (e.g. Erlang, F#) 1367J in 42740ms and scripting languages 2320J in 88322ms of execution time (p. 261):

Energy consumption and memory usage

The second interesting relationship is the one between peak memory usage and energy consumption. One typical tradeoff for performance is to cache precomputed values or to use faster but more memory-intensive algorithms instead. A starting point is to look at the languages with the highest and lowest memory usage:
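The caching tradeoff mentioned above can be illustrated with a classic memoization sketch in Python: the cached variant stores one result per distinct argument, spending extra memory to avoid recomputation.

```python
from functools import lru_cache

# Naive recursion: the same subproblems are recomputed
# exponentially often.
def fib_naive(n):
    return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

# Cached version: one stored result per distinct argument trades
# memory for a linear number of computations.
@lru_cache(maxsize=None)
def fib_cached(n):
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)
```

Whether the additional memory also pays off in energy terms is exactly the question the following measurements address.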

It can be seen that the five best-performing languages C, C++, Fortran, Go and Pascal are all compiled and imperative. Dart, Erlang, JRuby, Lua and Perl are interpreted or scripting languages, with Erlang being an exception. It can also be observed that while the three best languages seem to have a similar energy-to-memory ratio, this does not hold, for example, for Lua. Therefore, the next graph compares this ratio for almost all languages in the Spectral Norm benchmark. Note that Lua, Perl, PHP, Python and Ruby are excluded, as they would distort the chart because of their high ratios:

High variations between the languages are observable, and the data shows almost no correlation between the DRAM energy consumption and the peak memory usage. However, the paradigms of a language can still affect the ratio between the amount of energy consumed by the CPU and the DRAM. While the dynamic random-access memory accounts for between five and ten percent of the total energy on average, the languages that stand out negatively all feature scripting:

Conclusion

The choice of the programming language can have a significant impact on the energy efficiency of the resulting program. While compiled, imperative code has the best performance on average, very well optimised virtual machines like the JVM still present a viable alternative. However, interpreted scripting languages perform worse in most use cases, even though the actual difference depends on the benchmark. Memory-intensive implementations do not proportionally increase the energy consumption; therefore, precomputing certain values is a good way to increase both the speed and the energy efficiency of a program. Lastly, while faster programs do not consume less energy in general, the most efficient languages still perform best in both areas because of their extensive optimisations.

Using the data collected from all benchmarks, a normalised rating can be calculated for energy, time and memory (p. 263). It is based on the respective language with the best performance, which is C for the first two measures and Pascal for the last one. The last graph shows the energy-related rating of all languages. As with all diagrams in this article, clicking or tapping on the symbol to the right of the title gives access to all datasets in the interactive graph mode. While testing the energy efficiency of multiple languages for a specific problem remains the most accurate way to find the most performant choice, this article can serve as a useful tool to explore and narrow the scope in advance.
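The normalisation scheme described above can be sketched as follows; the energy totals used here are hypothetical, and only the relative rating mirrors the paper's approach of dividing every language's result by the best one.

```python
# Illustrative energy totals in joules -- the values are made up for
# this sketch; only the normalisation scheme follows the paper:
# every language is rated relative to the best (lowest) total,
# which therefore receives a rating of 1.0.
totals = {"C": 57.0, "Rust": 59.6, "Java": 114.1}

baseline = min(totals.values())
rating = {lang: round(total / baseline, 2) for lang, total in totals.items()}
```

A rating of 2.0 thus means a language needed twice as much energy as the baseline language across the benchmark suite.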