This year, supercomputers have reached a milestone — performing a billion billion operations per second. Why and how did they get there?

On May 27, 2022, the high-performance computing (HPC) community announced with great fanfare the arrival of the first “exascale” supercomputer, i.e. one able to perform 10¹⁸ “FLOPS”, or a billion billion operations per second (on real numbers in floating-point notation, to be precise).

The new supercomputer, Frontier, operated by the US Department of Energy at Oak Ridge National Laboratory in Tennessee and built from several million cores, supplants the Japanese supercomputer Fugaku, which drops to second place in the TOP500 ranking of the most powerful machines.

Frontier, not content with being (for now) the most powerful computer in the world, also ranks well in terms of energy efficiency… at least relative to its power, because it consumes an enormous amount of energy, the equivalent of a city of several tens of thousands of inhabitants. And the problem does not stop at Frontier, since it is only the flagship of a flourishing global fleet of several thousand supercomputers.

The Fugaku supercomputer, for a time the world's number one. // Source: Riken

Battle between the Americans, the Chinese… and the Europeans

This return of the Americans to the front of the race highlights a new battleground between the American and Chinese superpowers, which the Europeans watch from the sidelines. China had indeed created a surprise in 2017 by taking first place from the United States: we then witnessed a massive arrival of more than 200 Chinese supercomputers in the TOP500. Today, the leading Chinese machine is relegated to sixth place, and China has chosen to withdraw its machines from the ranking.

In 2008, the Roadrunner supercomputer at the American Los Alamos National Lab was the first to reach the petaflops, or one million billion FLOPS (10¹⁵). Exascale then became a strategic objective for the Americans, even though the goal seemed technically unattainable.

To achieve exascale, it was necessary to rethink the architecture of the previous, petaflops generation. For example, at these extreme scales, the reliability of millions of components becomes crucial: like a grain of sand jamming a gear, the failure of a single element can prevent the entire machine from functioning.

The challenge of exascale facing the “energy wall”

But the US Department of Energy (US DoE) added a constraint to this technological development by imposing a maximum power of 20 megawatts for deploying exascale, a constraint known as the “energy wall”. The American Exascale Computing Initiative was funded at over $1 billion in 2016.

To get past this “energy wall”, it was necessary to rethink all the software layers (from the operating system to the applications) and to design new algorithms to manage heterogeneous computing resources, that is to say standard processors and accelerators, memory hierarchies and interconnects in particular.

The Frontier supercomputer. // Source : Oak Ridge National Laboratory

In the end, the electricity consumption of Frontier is measured at 21.1 megawatts, or 52.23 gigaflops per watt, which roughly translates to 150 tons of CO2 emissions per day given the energy mix of Tennessee, where the platform is located. This is just below the 20-megawatt wall set as the DoE target (if we scale the 21.1 megawatts to Frontier's 1.102 exaflops, we arrive at 19.15 megawatts per exaflop).
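For readers who want to check the arithmetic, here is a minimal sketch that recomputes these figures from the two numbers quoted above (21.1 megawatts and 1.102 exaflops); the variable names are ours:

```python
# Back-of-the-envelope check of the figures quoted above (values from the article).
peak_performance_flops = 1.102e18   # Frontier's measured performance: 1.102 exaflops
power_watts = 21.1e6                # measured electrical power: 21.1 megawatts

# Energy efficiency in gigaflops per watt
efficiency_gflops_per_watt = (peak_performance_flops / power_watts) / 1e9
print(f"Efficiency: {efficiency_gflops_per_watt:.2f} GFlops/W")  # ~52.23

# Power normalized to one exaflop, to compare against the 20 MW "energy wall"
power_per_exaflop_mw = (power_watts / (peak_performance_flops / 1e18)) / 1e6
print(f"Power per exaflop: {power_per_exaflop_mw:.2f} MW")  # ~19.15
```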

This places Frontier second in the Green500, the ranking of the supercomputers that deliver the most operations (Flops) per watt; this ranking was launched in 2013 and marks the emergence of the community's concern for energy issues. This place in the Green500 is good news: the performance gain of Frontier is also accompanied by a gain in energy efficiency.

An estimate of energy consumption that is too optimistic?

But these estimates of digital energy consumption are, as is often the case, underestimates: they only take usage into account and neglect the significant share due to the manufacture of the supercomputer and its associated infrastructure, such as buildings, and to its future dismantling. My research experience and that of my academic and industrial colleagues lead us to estimate that usage represents only about half of the total energy cost, taken over an average lifespan of 5 years. There are few studies on the subject, because of the systemic difficulty of the exercise and the low availability of data, but let us cite the recent study measuring the consumption of one core-hour on the Dahu computing platform, which concludes that usage accounts for barely 30% of the total energy cost.
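To make the order of magnitude concrete, here is a minimal sketch of how such a usage share can be computed; the 5-year lifespan and the roughly 50% usage share come from the estimate above, while the absolute energy figures are invented for the purpose of the example:

```python
# Hypothetical illustration of a lifecycle energy breakdown.
# The 5-year lifespan and the ~50% usage share come from the article;
# the absolute energy figures below are invented for the example.
lifespan_years = 5
use_energy_per_year_gwh = 185   # roughly 21 MW running around the clock for a year (hypothetical)
embodied_energy_gwh = 925       # manufacturing, buildings, dismantling, etc. (hypothetical)

use_energy_gwh = use_energy_per_year_gwh * lifespan_years
total_energy_gwh = embodied_energy_gwh + use_energy_gwh
usage_share = use_energy_gwh / total_energy_gwh
print(f"Usage share of total energy over the lifespan: {usage_share:.0%}")  # 50% with these numbers
```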

In addition, technological improvements that allow energy savings generate an overall surplus of consumption: this is called the “rebound effect”. New features and increased usage ultimately result in increased energy consumption. A recent example in computer science is that of natural language processing (NLP) models, which gain new functionalities as computing performance increases.

The tree that hides the forest

Technological progress towards exascale is indisputable, but the direct and indirect cost it imposes on global warming remains significant, despite the optimists who say that it is a drop in the bucket compared to the 40 billion tonnes of CO2 emitted each year by all human activities.

Moreover, it is not just a single supercomputer: Frontier is the tree that hides the forest. Indeed, the community has long observed that the progress obtained by building a new generation of high-performance computers spreads rapidly: new platforms very quickly replace those already deployed in university computing centers or in businesses. If the replacement is premature, the effective life of the replaced machines is shortened, and their environmental impact increases, since their manufacturing cost is amortized over fewer years of service.

Inside a “cabinet”. // Source : Oak Ridge National Laboratory

The TOP500 represents only a part of the galaxy of HPC platforms deployed around the world. It is very difficult to estimate their number, because many platforms fly under the radar: a large number of large-scale platforms belong to private companies, and many smaller-scale ones are deployed locally.

A small study carried out directly on TOP500 data shows that the effective performance of the most powerful platform has been multiplied by 33 over the last ten years (the average performance of the 500 machines has only progressed by a factor of 20). Over the same period, the energy efficiency of the Green500 leader has improved by barely a factor of 15 (and 18 on average). The overall balance in terms of energy consumed is therefore negative: it has, in the end, increased.
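The reasoning can be made explicit: the power a machine draws is its performance divided by its energy efficiency, so comparing the two growth factors quoted above gives the change in consumption (a minimal sketch; the factors come from the article, the variable names are ours):

```python
# Combining the factors quoted above: if performance grows faster than
# energy efficiency (Flops/W), the power drawn necessarily increases.
performance_factor_top = 33   # performance of the #1 machine, x33 in ten years
efficiency_factor_top = 15    # Flops/W of the Green500 leader, x15 over the same period
print(f"Power drawn by the top machine: x{performance_factor_top / efficiency_factor_top:.1f}")  # ~x2.2

performance_factor_avg = 20   # average performance factor over the 500 machines
efficiency_factor_avg = 18    # average efficiency factor
print(f"Average power per ranked machine: x{performance_factor_avg / efficiency_factor_avg:.1f}")  # ~x1.1
```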

What to do with these advances in computing?

A counter-argument can be made: progress towards ever more powerful platforms could make it possible to find technical solutions to fight climate change. This way of thinking is representative of the mindset of our technocentric society, but it is unfortunately almost impossible to measure the impact of these new technologies on the reduction of the carbon footprint. Indeed, most of the time, these measurements focus on the usage phases and ignore the side costs, such as the manufacture of new equipment.

One can legitimately wonder what mechanism drives this race for performance. One reason cited by the designers of Frontier is scientific progress: the more complex the phenomena we seek to model and understand, the more simulations are needed, and the only way to conduct these simulations is to build ever more powerful HPC platforms…


Denis Trystram, University Professor in computer science, Grenoble Alpes University (UGA)

This article is republished from The Conversation under a Creative Commons license. Read the original article.
