With the annual Hot Chips conference taking place this week, many of the industry’s biggest chip design firms are at the show, talking about their latest and/or upcoming wares. For Intel, it’s a case of the latter, as the company is at Hot Chips to talk about its next generation of Xeon processors, Granite Rapids and Sierra Forest, which are set to launch in 2024. Intel has previously revealed this processors on its data center roadmap – most recently updating it in March of this year – and for Hot Chips the company is offering a bit more in the way of technical details for the chips and their shared platform.
While there’s no such thing as an “unimportant” generation for Intel’s Xeon processors, Granite Rapids and Sierra Forest promise to be one of Intel’s most important updated to the Xeon Scalable hardware ecosystem yet, thanks to the introduction of area-efficient E-cores. Already a mainstay on Intel’s consumer processors since 12th generation Core (Alder Lake), with the upcoming next generation Xeon Scalable platform will finally bring E-cores over to Intel’s server platform. Though unlike consumer parts where both core types are mixed in a single chip, Intel is going for a purely homogenous strategy, giving us the all P-core Granite Rapids, and the all E-core Sierra Forest.
As Intel’s first E-core Xeon Scalable chip for data center use, Sierra Forest is arguably the most important of the two chips. Fittingly, it’s Intel’s lead vehicle for their EUV-based Intel 3 process node, and it’s the first Xeon to come out. According to the company, it remains on track for a H1’2024 release. Meanwhile Granite Rapids will be “shortly” behind that, on the same Intel 3 process node.
As Intel’s slated to deliver two rather different Xeons in a single generation, a big element of the next generation Xeon Scalable platform is that both processors will share the same platform. This means the same socket(s), the same memory, the same chiplet-based design philosophy, the same firmware, etc. While there are still differences, particularly when it comes to AVX-512 support, Intel is trying to make these chips as interchangeable as possible.
As announced by Intel back in 2022, both Granite and Sierra are chiplet-based designs, relying on a mix of compute and I/O chiplets that are stitched together using Intel’s active EMIB bridge technology. While this is not Intel’s first dance with chiplets in the Xeon space (XCC Sapphire Rapids takes that honor), this is a distinct evolution of the chiplet design by using distinct compute/IO chiplets instead of stitching together otherwise “complete” Xeon chiplets. Among other things, this means that Granite and Sierra can share the common I/O chiplet (built on the Intel 7 process), and from a manufacturing standpoint, whether a Xeon is Granite or Sierra is “merely” a matter of which type of compute chiplet is placed down.
Notably here, Intel is confirming for the first time that the next gen Xeon Scalable platform is getting self-booting capabilities, making it a true SoC. With Intel placing all of the necessary I/O features needed for operation within the I/O chiplets, an external chipset (or FPGA) is not needed to operate these processors. This brings Intel’s Xeon lineup closer in functionality to AMD’s EPYC lineup, which has been similarly self-booting for a while now.
Altogether, the next gen Xeon Scalable platform will support up to 12 memory channels, scaling with the number and capabilities of the compute dies present. As previously revealed by Intel, this platform will be the first to support the new Multiplexer Combined Ranks (MCR) DIMM, which essentially gangs up two sets/ranks of memory chips in order to double the effective bandwidth to and from the DIMM. With the combination of higher memory bus speeds and more memory channels overall, Intel says the platform can offer 2.8x as much bandwidth as current Sapphire Rapids Xeons.
As for I/O, a max configuration Xeon will be able to offer up to 136 lanes general I/O, as well as up to 6 UPI links (144 lanes in total) for multi-socket connectivity. For I/O, the platform supports PCIe 5.0 (why no PCIe 6.0? We were told the timing didn’t work out), as well as the newer CXL 2.0 standard. As is traditionally the case for Intel’s big-core Xeons, Granite Rapids chips will be able to scale up to 8 sockets altogether. Sierra Forest, on the other hand, will only be able to scale up to 2 sockets, owing to the number of CPU cores in play as well as the different use cases Intel is expecting of their customers.
Along with details on the shared platform, Intel is also offering for the first time a high-level overview of the architectures used for the E-cores and the P-cores. As has been the case for many generations of Xeons now, Intel is leveraging the same basic CPU architecture that goes into their consumer parts. So Granite and Sierra can be thought of as a deconstructed Meteor Lake processor, with Granite getting the Redwood Cove P-cores, while Sierra gets the Crestmont E-Cores.
As noted before, this is Intel’s first foray into offering E-cores for the Xeon market. Which for Intel, has meant tuning their E-core design for data center workloads, as opposed to the consumer-centric workloads that defined the previous generation E-core design.
While not a deep-dive on the architecture itself, Intel is revealing that Crestmont is offering a 6-wide instruction decode pathway as well as an 8-wide retirement backend. While not as beefy as Intel’s P-cores, the E-core is not by any means a lightweight core, and Intel’s design decisions reflect this. Still, it is designed to be far more efficient both in terms of die space and energy consumption than the P cores that will go into Granite.
The L1 instruction cache (I-cache) for Crestmont will be 64KB, the same size as on Gracemont. Meanwhile, new to the E-core lineup with Crestmont, the cores can either be packaged into 2 or 4 core clusters, unlike Gracemont today, which is only available as a 4 core cluster. This is essentially how Intel is going to adjust the ratio of L2 cache to CPU cores; with 4MB of shared L2 regardless of the configuration, a 2-core cluster affords each core twice as much L2 per core as they’d otherwise get. This essentially gives Intel another knob to adjust for chip performance; customers who need a slightly higher performing Sierra design (rather than just maxing out the number of CPU cores) can instead get fewer cores with the higher performance that comes from the effectively larger L2 cache.
And finally for Sierra/Crestmont, the chip will offer as close to instruction parity with Granite Rapids as possible. This means BF16 data type support, as well as support for various instruction sets such as AVX-IFMA and AVX-DOT-PROD-INT8. The only thing you won’t find here, besides an AMX matrix engine, is support for AVX-512; Intel’s ultra-wide vector format is not a part of Crestmont’s feature set. Ultimately, AVX10 will help to take care of this problem, but for now this is as close as Intel can get to parity between the two processors.
Meanwhile, for Granite Rapids we have the Redwood Cove P-core. The traditional heart of a Xeon processor, Redwood/Granite aren’t as big of a change for Intel as Sierra Forest is. But that doesn’t mean they’re sitting idly by.
In terms of microarchitecture, Redwood Cove is getting the same 64KB I-cache as we saw on Crestmont, which unlike the E-cores, is 2x the capacity of its predecessor. It’s rare for Intel to touch I-cache capacity (due to balancing hit rates with latency), so this is a notable change and it will be interesting to see the ramifications once Intel talks more about architecture.
But most notably here, Intel has managed to further shave down the latency of floating-point multiplication, bringing it from 4/5 cycles down to just 3 cycles. Fundamental instruction latency improvements like these are rare, so they’re always welcome to see.
Otherwise, the remaining highlights of the Redwood Cove microarchitecture are branch prediction and prefetching, which are typical optimization targets for Intel. Anything they can do to improve branch prediction (and reduce the cost of rare misses) tends to pay relatively big dividends in terms of performance.
More applicable to the Xeon family in particular, the AMX matrix engine for Redwood Cove is gaining FP16 support. FP16 isn’t as quite as heavily used as the already-supported BF16 and INT8, but it’s an improvement to AMX’s flexibility overall.
Memory encryption support is also being improved. Granite Rapids’ flavor of Redwood Cove will support 2048, 256-bit memory keys, up from 128 keys on Sapphire Rapids. Cache Allocation Technology (CAT) and Code and Data Prioritization (CDP) functionality are also getting some enhancements here, with Intel extending them to be able to control what goes in to the L2 cache, as opposed to just the LLC/L3 cache in previous implementations.
Ultimately, it goes without saying that Intel believes they’re well-positioned for 2024 and beyond with their upcoming Xeons. By improving performance on the top-end P-core Xeons, while introducing E-core Xeons for customers who just need lots of lighter CPU cores, Intel believes they can address the complete market with two CPU core types sharing a single common platform.
While it’s still too early to talk about individual SKUs for Granite Rapids and Sierra Forest, Intel has told us that core counts overall are going up. Granite Rapids parts will offer more CPU cores than Sapphire Rapids (up from 60 for SPR XCC), and, of course, at 144 cores Sierra will offer even more than that. Notably, however, Intel won’t be segmenting the two CPU lines by core counts – Sierra Forest will be available in smaller core counts as well (unlike AMD’s EPYC Zen4c Bergamo chips). This reflects the different performance capabilities of the P and E cores, and, no doubt, Intel looking to fully embrace the scalability that comes from using chiplets.
And while Sierra Forest will already go to 144 CPU cores, Intel also made an interesting comment in our pre-briefing that they could have gone higher with core counts for their first E-core Xeon Scalable processor. But the company decided to prioritize per-core performance a bit more, resulting in the chips and core counts we’ll be seeing next year.
Above all else – and, perhaps, letting marketing take the wheel a little too long here for Hot Chips – Intel is hammering home the fact that their next-generation Xeon processors remain on-track for their 2024 launch. It goes without saying that Intel is just now recovering from the massive delays in Sapphire Rapids (and knock-on effect to Emerald Rapids), so the company is keen to assure customers that Granite Rapids and Sierra Forest is where Intel’s timing gets back on track. Between previous Xeon delays and taking so long to bring to market an E-core Xeon Scalable chip, Intel hasn’t dominated in the data center market like it once did, so Granite Rapids and Sierra Forest are going to mark an important inflection point for Intel’s data center offerings going forward.