Scientists at the Institute for Computational Cosmology (ICC) in England spend most of their time trying to get to the bottom of the secrets of deep space, from the Big Bang and the origins of the universe to mysteries like dark matter. Much of this work is done through simulations running on the COSMA family of supercomputers.
The family’s latest system, COSMA8, has more than 50,000 compute cores and 360 TB of RAM. COSMA7, an older system, has more than 12,000 Xeon SP 5120 processor cores and 250 TB of memory across 452 nodes. The massively parallel workloads running on these systems, which investigate everything from black holes and planet formation to collisions, are extremely compute- and data-intensive.
So says Alastair Basden, technical manager of the COSMA HPC cluster at Durham University.
“We can put models into our simulations of things we don’t understand, things like dark matter and dark energy,” Basden said at a recent virtual meeting with journalists and analysts. “By refining our models and the parameters we put into the simulations, we check whether the output of the simulations fits what we see in the sky with telescopes. That takes some time, though; a simulation requires a lot of compute time on tens of thousands of compute cores. But you also want a fast network connecting all those compute nodes.”
In HPC, more compute and storage can always be added to a cluster; networking can be a particular challenge. As emerging workloads such as artificial intelligence become increasingly important, with demand for compute resources coming from disparate groups and “noisy neighbors” affecting the performance of other workloads on the same system, networks in the HPC space can run into bandwidth problems, performance degradation, and congestion.
“Everything that happens on a node here also affects what happens on a node over there, all while data is moving around the cluster,” Basden said. “So we want a faster network interconnect.”
In the models of the universe created by the ICC, a change to a star in one part of the simulation will have an effect on stars in distant parts of the simulation. It is vital that messages pass through the model as quickly as possible.
We first covered the ten-year-old startup Rockport Networks when it came out of stealth in November 2021 with a switchless network architecture designed to meet the growing performance demands of enterprises and HPC organizations running such workloads. The ICC, hosted at Durham University, is working with Rockport to test the architecture on the COSMA7 supercomputer and see whether it can reduce the effect of network congestion on the applications it runs, including the modeling codes that will run on future exascale systems.
The Rockport installation is funded through DiRAC (Distributed Research utilising Advanced Computing), an integrated supercomputing facility with four sites across the country, including Durham, and through the ExCALIBUR grant, a five-year program that is part of the UK’s exascale effort, which concentrates mainly on software but has money for novel hardware architectures. The grant money for the Rockport work is aimed at testing unique network technologies.
Rockport fits that definition of unique. As we reported earlier, the Rockport architecture does away with traditional switches to create a highly scalable and efficient network that addresses the cost, latency, bandwidth, and congestion problems of typical networks, with twelve network links per node.
“We moved from a centralized switching approach to a distributed switch, where each and every node in the network contains switching capability,” said Matt Williams, field CTO at Rockport. “Instead of being connected to centralized switches, our nodes are connected directly to other nodes through twelve dedicated links. There are no external switches; the nodes connect directly, which gives us huge architectural performance benefits when we are talking about performance-intensive applications. Directly connected nodes, with the switching capability distributed out to the endpoints. In our environment, nodes do not connect to the network. The nodes are the network.”
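To make the geometry concrete, here is a minimal Python sketch (not Rockport code; the dimension sizes are invented for illustration) of why a direct-connect 6D torus, the topology used in the DINE testbed described below, gives each node exactly twelve links: one neighbor in each direction along each of the six dimensions.

```python
def torus_neighbors(coord, dims):
    """Direct neighbors of a node in a multi-dimensional torus:
    one step forward and one step back along each dimension, with wrap-around."""
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size
            neighbors.append(tuple(n))
    return neighbors

# Hypothetical 6D torus sizes (4 x 4 x 4 x 3 x 3 x 3 = 1,728 nodes), purely illustrative.
dims = (4, 4, 4, 3, 3, 3)
links = torus_neighbors((0, 0, 0, 0, 0, 0), dims)
print(len(links))  # 12 -- each node forwards traffic over its twelve direct links
```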
Rockport’s switchless network includes the NC1225 network adapter, which gives each node 300 Gb/s of network capacity and presents a standard Ethernet interface to the host, and the passive SHFL device that houses the direct node-to-node links. All of the technology is implemented in an FPGA. The Autonomous Network Manager oversees the direct-connect network, while the rNOS operating system automates discovery, configuration, and self-healing of the systems and ensures the workload takes the best path through the network.
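Because there is no central switch, traffic between non-adjacent nodes hops across intermediate nodes, and the fabric chooses the path. The sketch below is my own illustration, not rNOS logic (which also accounts for congestion and link health); it only shows how a minimum hop count between two torus nodes can be computed by taking the shorter way around the ring in each dimension.

```python
def torus_hops(a, b, dims):
    """Minimum number of node-to-node hops between two torus coordinates,
    taking the shorter direction around the ring in each dimension."""
    return sum(min(abs(x - y), size - abs(x - y)) for x, y, size in zip(a, b, dims))

dims = (4, 4, 4, 3, 3, 3)                      # same illustrative sizes as above
print(torus_hops((0, 0, 0, 0, 0, 0), (2, 3, 1, 0, 2, 1), dims))  # 2 + 1 + 1 + 0 + 1 + 1 = 6 hops
```

The point is only that a path is composed of short node-to-node hops rather than a trip through a central switch; a real fabric manager would also weigh multiple equal-length paths and steer around congested links.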
The ICC has been testing Rockport’s infrastructure for more than a year in the Durham Intelligent NIC Environment (DINE), an experimental system that includes 24 compute nodes and Rockport’s 6D torus switchless Ethernet fabric, and has been comparing the results.
The scientists ran code on the system and artificially increased network congestion to simulate other code running at the same time. The ICC ran different scenarios, with the Rockport technology running on a varying number of nodes and with the number of noisy neighbors running on nodes changed to increase congestion. The scientists found that performance degraded as network congestion increased, according to Basden.
They also found that workloads running over the conventional network took 28 percent longer to complete than those on the Rockport nodes.
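The shape of that experiment is easy to reproduce in miniature. The sketch below is not the ICC’s harness (their tests ran real modeling codes and generated real network congestion on DINE); it simply times a stand-in workload with and without synthetic noisy neighbors and reports the percentage slowdown, which is the kind of comparison behind a figure like the 28 percent above.

```python
import time
import multiprocessing as mp

def workload():
    """Stand-in for a simulation kernel (the real tests used ICC modeling codes)."""
    s = 0.0
    for i in range(1, 5_000_000):
        s += 1.0 / i
    return s

def noisy_neighbor(stop):
    """Stand-in for background jobs that contend for shared resources."""
    while not stop.is_set():
        sum(range(10_000))

def timed_run(n_neighbors):
    """Time the workload with a given number of noisy neighbors running."""
    stop = mp.Event()
    procs = [mp.Process(target=noisy_neighbor, args=(stop,)) for _ in range(n_neighbors)]
    for p in procs:
        p.start()
    t0 = time.perf_counter()
    workload()
    elapsed = time.perf_counter() - t0
    stop.set()
    for p in procs:
        p.join()
    return elapsed

if __name__ == "__main__":
    quiet = timed_run(0)
    congested = timed_run(4)          # four noisy neighbors, an arbitrary choice
    slowdown = 100.0 * (congested - quiet) / quiet
    print(f"quiet {quiet:.2f}s, congested {congested:.2f}s, {slowdown:.0f}% slower")
```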
With COSMA7, Basden will split the supercomputer in two, with 224 nodes running Rockport’s switchless network architecture and the other 224 running InfiniBand. The Lustre and Ceph filesystems will be connected directly into both fabrics, and scientists will be able to directly compare the two networks running across thousands of compute cores and 100 TB of RAM. The performance of workloads on congested networks will be studied and compared with workloads running on an otherwise idle system.
“What’s important is that these nodes are identical: the same RAM, the same processors, and so on,” Basden said. “We’ll be able to make direct comparisons of the codes in a quiet environment, where we can keep other users off and say, ‘just run this code.’ We will introduce congestion artificially, but also run in unrestricted environments, and then we’ll be able to get an idea of which network generally works better.”
They will also scale up the number of cores and the amount of RAM and “run real scientific simulations and look at the performance of those when the network is congested,” he said.
Williams is confident that Basden and the ICC will see improvements running on Rockport’s network.
“When we think about performance under load, what matters is not really the best-case number on the spec sheet, it’s how you maintain performance under a very heavy load,” he said. “We cut all of those packets into small pieces. We take the latency-sensitive critical messages and move them forward in the queue, allowing us to ensure that the critical messages that determine workload completion time get through the network first.”
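The sketch below illustrates that queuing idea in a few lines of Python. It is a toy model, not Rockport’s implementation: the 64-byte cell size and the two-level priority scheme are assumptions made up for the example. Chopping messages into small cells and always popping the highest-priority cell means a tiny, latency-critical message is never stuck behind a long bulk transfer.

```python
import heapq
from itertools import count

CELL_BYTES = 64          # assumed cell size, chosen only for illustration

def to_cells(msg_id, length, priority):
    """Chop one message into fixed-size cells; each cell carries the message's priority."""
    n_cells = -(-length // CELL_BYTES)          # ceiling division
    return [(priority, msg_id, i) for i in range(n_cells)]

def transmit(messages):
    """Interleave cells from all queued messages, always sending the
    highest-priority (lowest number) cell next; FIFO order breaks ties."""
    heap, order, wire = [], count(), []
    for msg_id, length, priority in messages:
        for cell in to_cells(msg_id, length, priority):
            heapq.heappush(heap, (cell[0], next(order), cell))
    while heap:
        _, _, cell = heapq.heappop(heap)
        wire.append(cell)
    return wire

# A 64 KB bulk transfer is queued before a tiny latency-critical sync message:
wire = transmit([("bulk", 64_000, 1), ("sync", 64, 0)])   # lower number = higher priority
print(wire[0])   # (0, 'sync', 0): the critical message's cell crosses the network first
```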
This comes back to the importance of having messages from some of those giant simulations delivered to the other parts as quickly as possible. In testing, messages on Rockport’s architecture take 25 nanoseconds to reach their targets, compared with 15,000 nanoseconds over InfiniBand and 220,000 nanoseconds over Ethernet.
In institutional and commercial HPC environments alike, “the network is the problem,” Williams said. “Compute and storage are accelerating. The network really hasn’t kept up. When you look at aggregate bandwidth, it’s quite inefficient, and you can get up to 60 percent less bandwidth than you think. What we see is that because of congestion on the network, because of the challenge of using that bandwidth efficiently, you get a degradation of workload performance. On an idle network, you can get very good performance for your code. When other people start using it, that congestion occurs and your workload slows down because of those noisy neighbors. We have a very different architecture. We need to make sure our technology is easy to adopt, simple to deploy, and easy for people to use.”