Hot Chips 2020: Marvell Details ThunderX3 CPUs: Up to 60 Cores Per Die, 96 Dual-Die in 2021

Today, as part of Hot Chips 2020, we saw Marvell finally reveal some details about the microarchitecture of its new ThunderX3 server CPUs and their cores. The company shared more concrete specifications on how its in-house CPU design team promises to differentiate itself from the growing competition in the Arm server market.

We reviewed the ThunderX2 back in 2018; at the time it was still a Cavium product, before the designs and teams were acquired by Marvell a few months later that year. Since then, the Arm server ecosystem has been revitalized by Arm's Neoverse N1 CPU core and by partner designs such as Amazon's Graviton2 and Ampere's Altra. Together with AMD's successful return to the market, that makes for a very different landscape.

Marvell began its Hot Chips presentation with a product roadmap, detailing that the ThunderX3 generation is not just a single design but a flexible approach that uses multiple dies. The first-generation 60-core CN110xx SKU uses a single monolithic die in 2020, followed next year by a 96-core dual-die variant aimed at higher performance.

A dual-die approach like this is attractive because it represents a midpoint between a fully monolithic design and a chiplet approach such as AMD's. The two dies are identical, so each can also be used independently as a standalone product.

From an SoC perspective, the ThunderX3 scales up to 60 cores, with the dual-die variant reaching 96. The first question that comes to mind when you see those numbers is why the dual-die variant does not scale up to a full 120 cores. Marvell did not address this in the talk, but there were some clues in the presentation.

Marvell had claimed a 2 to 3x performance improvement over the ThunderX2 at equivalent power levels. The latter had a 180W TDP; if the TX3 maintains this thermal envelope, a full dual-die design would have to raise the TDP to 360W, far beyond what can be air-cooled in a typical server form factor or supported in a rack in terms of power density. Assuming a simple linear reduction to the 96 cores advertised, we would end up at roughly 288W, which is more in line with existing high-end server CPU deployments without water cooling. Of course, this is all our own analysis and take on the matter.
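The back-of-the-envelope estimate above can be written out as a short calculation. This is only a sketch of our own reasoning: it assumes that a 60-core die stays at the ThunderX2's 180W TDP and that power scales linearly with core count, neither of which Marvell has confirmed. The helper name is made up for illustration.

```python
# Assumption: a 60-core ThunderX3 die keeps the ThunderX2's 180 W TDP,
# and power scales linearly with the number of active cores.
CORES_PER_DIE = 60
TDP_PER_DIE_W = 180.0


def estimated_tdp_w(total_cores: int) -> float:
    """Linear per-core power estimate (hypothetical helper)."""
    watts_per_core = TDP_PER_DIE_W / CORES_PER_DIE  # 3 W per core
    return total_cores * watts_per_core


full_dual_die = estimated_tdp_w(2 * CORES_PER_DIE)  # two full dies, 120 cores
advertised_dual_die = estimated_tdp_w(96)           # the announced 96 cores

print(f"120 cores: {full_dual_die:.0f} W")
print(f" 96 cores: {advertised_dual_die:.0f} W")
```

Under these assumptions the full 120-core configuration lands at 360W and the 96-core configuration at 288W, matching the figures in the text.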

A single die supports 8 channels of DDR4-3200, which is standard for this generation of server products and essentially in line with everyone else in the market. As for I/O, we see 64 lanes of PCIe 4.0 disclosed, which is again in line with some competitors, but only half of what the high-end offerings from Ampere or AMD can achieve.
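For a sense of scale, the theoretical peak bandwidth implied by these figures can be estimated. This is a minimal sketch, assuming standard 64-bit DDR4 channels and PCIe 4.0's 16 GT/s per lane with 128b/130b encoding. These are generic interface parameters rather than numbers from Marvell's presentation, and sustained bandwidth in practice will be lower.

```python
# Generic interface parameters (assumptions, not taken from Marvell's slides).
DDR4_TRANSFERS_PER_S = 3200e6  # DDR4-3200: 3200 mega-transfers per second
DDR4_BYTES_PER_TRANSFER = 8    # 64-bit data bus per channel
PCIE4_TRANSFERS_PER_S = 16e9   # PCIe 4.0: 16 GT/s per lane
PCIE4_ENCODING = 128 / 130     # 128b/130b line encoding overhead


def ddr4_peak_gb_s(channels: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s."""
    return channels * DDR4_TRANSFERS_PER_S * DDR4_BYTES_PER_TRANSFER / 1e9


def pcie4_peak_gb_s(lanes: int) -> float:
    """Theoretical peak PCIe 4.0 bandwidth in GB/s, per direction."""
    bits_per_s = lanes * PCIE4_TRANSFERS_PER_S * PCIE4_ENCODING
    return bits_per_s / 8 / 1e9


print(f"8x DDR4-3200: {ddr4_peak_gb_s(8):.1f} GB/s")
print(f"64x PCIe 4.0: {pcie4_peak_gb_s(64):.1f} GB/s per direction")
```

This works out to roughly 205 GB/s of peak memory bandwidth and about 126 GB/s of PCIe bandwidth per direction for a single die.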

A big unknown right now is how the dual-die product will segment the I/O and memory controllers: whether resources will be split 50-50 between the two dies, whether we will see an unbalanced configuration, or whether the platform can expose all the resources of each die and become a 16-channel, 128-lane beast.

On paper at least, the ThunderX3 looks similar to Amazon's Graviton2, as the two support a similar number of CPU cores and similar I/O and memory configurations. The biggest difference that is immediately apparent is that the ThunderX3 uses SMT4 on its CPU cores and therefore supports up to 240 threads per die. There is also a difference in TDP; however, I attribute this to the Graviton2 being conservative with its clock frequencies, whereas Ampere's SKUs are more in line with the ThunderX3, in particular the 64-core, 3.0GHz, 180W Q64-30, which is the closest in specifications.

Another thing that stands out about the ThunderX3 is its 90 MB of L3 cache, which dwarfs the previous generation's 32 MB as well as the 32 MB configurations of Ampere and Amazon.

Marvell has opted here to develop its own interconnect microarchitecture, which has now evolved from a simple ring design to a switched ring with 3 sub-rings, or columns. This gives a full die 15 ring stops (3×5 columns) and gives the full 60 cores 90 MB of total L3 cache, which is a respectable amount.
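The arithmetic behind these figures is easy to check. The sketch below uses the 90 MB L3 figure quoted earlier in the article and assumes that cores and cache are spread evenly across the ring stops. The exact tile layout is not described here, so the per-stop numbers are derived estimates rather than disclosed specifications.

```python
SUB_RINGS = 3        # columns in the switched ring
STOPS_PER_RING = 5   # ring stops per column
TOTAL_CORES = 60
TOTAL_L3_MB = 90     # L3 figure quoted earlier in the article
SMT_WAYS = 4         # SMT4

ring_stops = SUB_RINGS * STOPS_PER_RING      # 3 x 5 = 15 stops
cores_per_stop = TOTAL_CORES // ring_stops   # assumes an even split
l3_per_stop_mb = TOTAL_L3_MB / ring_stops    # assumes an even split
l3_per_core_mb = TOTAL_L3_MB / TOTAL_CORES
threads_per_die = TOTAL_CORES * SMT_WAYS

print(f"ring stops:      {ring_stops}")
print(f"cores per stop:  {cores_per_stop}")
print(f"L3 per stop:     {l3_per_stop_mb} MB")
print(f"L3 per core:     {l3_per_core_mb} MB")
print(f"threads per die: {threads_per_die}")
```

With an even split, each ring stop would hold 4 cores and 6 MB of L3, or 1.5 MB of L3 per core.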

During the Q&A session, Marvell explained its rationale for choosing a switched ring topology over a single ring or a mesh design. A single ring would not have been able to scale performance and bandwidth to a higher number of cores, while a mesh design would have been a big change and would have required a reduction in the number of cores. A switched ring represented a sensible compromise between the two architectures. Indeed, if this is what allowed Marvell to include up to 3 times more cache than its nearest competitors, it appears to have been a smart choice.

One odd thing I noticed is that the system still uses a snoop-based coherency protocol, in contrast to the directory-based systems used elsewhere in the industry. This can reduce complexity and implementation area, but it can lag behind in terms of power and the efficiency of coherency traffic on the chip.
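To illustrate why snoop-based coherency can generate more on-chip traffic, here is a deliberately simplified toy model. It is not a description of Marvell's protocol: it assumes one coherent agent per ring stop, that a snooping system broadcasts a probe to every other agent, and that a directory-based system sends one directory lookup plus one targeted probe per actual sharer. All names are hypothetical.

```python
def snoop_messages(num_agents: int) -> int:
    """Broadcast model: the requester probes every other agent."""
    return num_agents - 1


def directory_messages(num_sharers: int) -> int:
    """Directory model: one lookup plus a probe per current sharer."""
    return 1 + num_sharers


AGENTS = 15  # assumption: one coherent agent per ring stop

for sharers in (0, 1, 4):
    print(
        f"sharers={sharers}: "
        f"snoop={snoop_messages(AGENTS)} msgs, "
        f"directory={directory_messages(sharers)} msgs"
    )
```

In this model the snoop cost is fixed by the number of agents, while the directory cost tracks how widely a line is actually shared. That is the intuition behind the traffic and power trade-off, though real protocols use filters and other optimizations that this sketch ignores.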

The memory controllers sit on the rings, and Marvell's CCPI3 interface between sockets or dies provides up to 84GB/s of bandwidth.
