💾 Archived View for arcanesciences.com › gemlog › 22-07-07 captured on 2023-04-26 at 13:06:15. Gemini links have been rewritten to link to archived content


sunset's gemlog

Server Chip Roll Call!

In light of some of the questions I get on a regular basis, I decided to do a writeup on server processors - all of them, or at least all of the ones I know - with some notes on their future outlook. No guarantees of usefulness or even accuracy.

x86

AMD Epyc

Description: High core counts (64 today, 96 tomorrow), especially by x86 standards, with high-performance dual-threaded OoO cores. Optional large off-die L3 extension.

Competitiveness: Pretty dang good. There are areas where they lose to Intel (AVX-512 or socket counts above 2) or to ARM (highly threaded loads with minimal LLC pressure) but Epyc is probably the most well-rounded high-end server CPU that exists.

Outlook: Generally good. AMD seems reluctant to be a very early mover on new nodes, but their primary x86 competition has suffered from huge enough delays that it mostly hasn't mattered. Genoa should be a solid improvement due to platform improvements despite conservative microarchitectural changes.

Operating systems: Mostly Linux and NT, plus fringe operating systems (*BSD probably being the largest.)

Intel Xeon

Description: Mid-high core counts (40 today, 56 tomorrow) with high-performance dual-threaded OoO cores. High socket scalability and very wide SIMD.

Competitiveness: Good at lower core counts, typically, but aside from AVX-512 codes, full-socket performance in high-end SKUs tends to lag behind AMD. Glueless to 8 sockets, unlike AMD's 2-socket max; systems above two sockets remain a niche - though a declining one.

Outlook: Decent, contingent on Intel's ability to execute. Sapphire Rapids would have been great if it launched in 2021. The roadmap is interesting, though, and Intel has some useful IP that AMD doesn't (namely, inhouse small cores for building high-core-density server parts like the future Sierra Forest.)

Operating systems: Mostly Linux and NT, plus fringe operating systems (*BSD probably being the largest; Nonstop, Solaris, and VMS are also available.)

ARM

Ampere Altra

Description: 80-128 Neoverse N1 cores licensed from ARM.

Competitiveness: The N1 core is actually pretty decent, though increasingly obsolete. Single-thread performance is okay, though not spectacular; throughput performance varies widely depending on how LLC-sensitive the workload is. Small L3 can crater performance on some streams.

Outlook: Hard to say. Ampere is transitioning to custom cores later in 2022, which - based on gcc/llvm commits - look like they'll be fairly narrow. I expect very high core density per socket, likely 192+.

Operating systems: Essentially just Linux. Probably Windows Server down the road.

Amazon / Annapurna Graviton

Description: 64 Neoverse V1 cores licensed from ARM.

Competitiveness: Good. I've done a SPEC run for the V1 cores - available in my gemlog - and liked what I saw. Hard to do full-socket tests until bare-metal instances are available, but my guess is that full-socket performance often falls short of Epyc even when individual cores compare favorably.

Outlook: Amazon doesn't discuss its roadmaps, but I would guess we'll see Graviton generally tracking new Neoverse IP. This could be a good thing (cores so far have been great!) or a bad thing (ARM's improvement cadence on Cortex in recent years has become shaky; still no sign of the Neoverse "Poseidon" cores as of this writing.)

Operating systems: Linux.

Alibaba Yitian 710

Description: 128-core ARMv9 server processor with custom cores. Alibaba hasn't discussed microarchitecture.

Competitiveness: Their claimed SPEC results are not great, not terrible - probably pretty decent for an initial inhouse microarchitecture, but not particularly groundbreaking.

Outlook: Difficult to predict.

Operating systems: Linux.

Huawei Kunpeng 920

Description: 64-core custom ARMv8 server chip with scalability to four sockets.

Competitiveness: Probably pretty decent at release, but hasn't had a followup. Newer ARMs have passed it by.

Outlook: At one point there was an aggressive roadmap with SMT and support for SVE. Given sanctions, I have no expectation that those ambitious future Kunpeng parts will ship. The microarchitecture may still have a future in SMIC-fabricated desktop parts.

Operating systems: Linux.

Phytium Feiteng

Description: 64 custom OoO ARMv8 cores at 2.0-2.5GHz depending on product. Shared L2 between quads, shared L3 across chip. Feiteng is used as the main node CPU in the Tianhe-3 exascale supercomputer, supporting Matrix accelerator cards.

Competitiveness: The FT2000 numbers presented at Hot Chips in 2016 were mediocre, but by no means awful. The microarchitecture doesn't seem to have received major updates since then, though there have been some uncore changes.

Outlook: Due to US sanctions, future improvements will be limited to what SMIC can build.

Operating systems: Linux.

HPC

Fujitsu A64FX

Description: 52 specialized ARMv8 cores with 512b SVE pipes, arranged in 13-core clusters and attached to a wide HBM interface and an integrated Tofu inter-node fabric. Replaces SPARC64 XIfx, which used Hybrid Memory Cubes instead of HBM and only had 34 cores.

Competitiveness: For bandwidth-sensitive codes, A64FX does great. RAM capacity per core (and, by extension, per node) is small enough to be annoying for some loads, and the cores themselves are unspectacular at many loads. NEC does similar bandwidth/FLOPS numbers with SX, but since SX-Aurora's release NEC no longer has an integrated fabric and now depends on a host system to carry the vector cards (though it is not an accelerator in the conventional sense.)

Outlook: Fujitsu has not extensively discussed A64FX's successor, but it's likely to have one. I expect to hear more at SC22 or SC23.

Operating systems: Linux.

Sunway SW26010

Description: Strange heterogeneous manycore, divided into blocks consisting of one heavy core and 64 light cores with 512b SIMD and local scratchpads. Independent DDR4 interface per block. Six blocks per chip. Used in the "BlueLight", "TaihuLight", and "Oceanlight" supercomputers, the last of which is an exascale system.
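As a quick sanity check on that layout, the per-chip core count falls straight out of the block structure. This is a minimal sketch that assumes the figures quoted above (six blocks, each with one heavy core and 64 light cores); the variable names are mine, not Sunway's.

```python
# Figures come from the description above; the names are illustrative only.
BLOCKS_PER_CHIP = 6
HEAVY_CORES_PER_BLOCK = 1
LIGHT_CORES_PER_BLOCK = 64

# Each block pairs one heavy core with its group of light cores.
cores_per_block = HEAVY_CORES_PER_BLOCK + LIGHT_CORES_PER_BLOCK

# Totals across the whole chip.
cores_per_chip = BLOCKS_PER_CHIP * cores_per_block
light_cores_per_chip = BLOCKS_PER_CHIP * LIGHT_CORES_PER_BLOCK

print(f"{cores_per_block} cores per block, {cores_per_chip} per chip, "
      f"{light_cores_per_chip} of them light")
```

Only the heavy cores run a general-purpose OS, so nearly all of that count sits in light cores that have to be fed work explicitly.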

Competitiveness: The cache/memory configuration looks fairly severely limiting to me, but papers have been published showing impressive performance on some applications. I can't imagine programming and optimization are particularly fun. Cell comes to mind as a comparison point.

Outlook: Sunway's now on the Entity List, so the same caveats apply as those affecting Phytium and Huawei.

Operating systems: Linux on the heavy cores.

RISC/UNIX

IBM Power

Description: The last RISC/UNIX player still seeing active silicon R&D. Power10 is an aggressive 8-wide enhancement of the Power9 microarchitecture, with enlarged structures throughout the core and a shift from massive eDRAM caches to SRAM dictated by the withdrawal of GlobalFoundries from cutting-edge lithography (outside of specialty nodes.) Up to 30 cores per die. Power10 had a fairly troubled development cycle.

Competitiveness: The SPEC results show moderate improvements over Power9, but still underwhelm, especially per socket, against current Epyc. Scales to 16 sockets and has some unique features that will keep it looking good in niche applications for a long time, but Power's cores are no longer dominant - especially when making actual apples-to-apples core comparisons.

Outlook: Power11 is in development. Competitiveness depends on a lot of factors, but the decline of the RISC/UNIX market probably doesn't bode well, especially since IBM seems to have abandoned conventional HPC (where Power9 was fairly successful.)

Operating systems: Low tens of thousands of users running AIX, over 100k running the object-oriented i operating system, and a fair bit of Linux mixed in alongside them. AIX users tend to run larger systems and upgrade more frequently than i users.

Oracle SPARC

Description: After increasingly dismal Sun missteps in the early Niagara family, SPARC made an impressive turnaround starting with the new out-of-order S3 core in the SPARC T4. The M8, announced in autumn of 2017, used a new 4-wide microarchitecture, rebalanced (and unusual) caches, and 32 8-threaded cores running at 5GHz, scalable to 8 sockets.

Competitiveness: In fall 2017, likely very good, though Oracle was cagey about M8 benchmarks. Likely less competitive today due to comparatively low core count - though there are probably niche applications where the M8 still does well.

Outlook: The SPARC design group was gutted alongside the M8's release in autumn of 2017, and the M9 - which would have targeted higher clocks and contained 64 cores - was cancelled. To the best of my knowledge, no new SPARC generations from Oracle are in development. The installed base is still large - SPARC was the RISC/UNIX volume leader even when it was languishing in third place in revenue - but system sales have greatly declined.

Operating systems: Mostly Solaris.

Fujitsu SPARC64

Description: SPARC64 XII is, according to Fujitsu, a 12-core 8-thread processor at up to 4.35GHz. Earlier roadmaps referred to the XII as 24-core 4-thread, but Fujitsu has - presumably for licensing reasons - opted to market it as 12-core in a similar manner to "SMT8 cores" in Power9/Power10. Pairs of "instruction pipelines" share an L2, an L1I, and a TLB, but other structures are private.
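The two ways of counting describe the same silicon, which is easy to confirm with a little arithmetic. A minimal sketch, using only the core and thread figures quoted above; the function name is hypothetical and nothing here reflects an actual licensing formula.

```python
def hardware_threads(cores: int, threads_per_core: int) -> int:
    """Total hardware threads for a given way of counting cores."""
    return cores * threads_per_core

# Fujitsu's marketed description: 12 cores, 8 threads each.
marketed = hardware_threads(cores=12, threads_per_core=8)

# The earlier roadmap description: 24 cores, 4 threads each.
roadmap = hardware_threads(cores=24, threads_per_core=4)

print(marketed, roadmap)

# Anything priced per core sees half as many cores under the marketed count.
core_count_ratio = 24 / 12
print(core_count_ratio)
```

The thread total is identical either way; only the number that a per-core price would multiply against changes.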

Competitiveness: Unspectacular. Core performance and socket performance both lag behind its approximate contemporary, Power9, and look even worse by comparison to newer cores.

Outlook: Fujitsu has announced its intention to exit the UNIX market in the late 2020s. There may be one final server rev, but it's unclear if it will include a new silicon generation; my expectation is that it will not.

Operating systems: Solaris.

Intel/HP Itanium

Description: After a decade of increasingly unimpressive revisions of the once-mighty Itanium2 core, Intel finally announced a new microarchitecture in the Poulson processor in 2012, and released a minor rev of it in 2017. Up to 8 cores at 2.66GHz, with a 32MB shared LLC, built on an Intel 32nm process and inhabiting the Boxboro-MC server platform.

Competitiveness: Grim. Poulson generally lost to Power7 at release in 2012, often by significant margins. In 2022, the situation has not improved.

Outlook: No.

Operating systems: Mostly HP-UX, low-volume but very high margin Nonstop, some VMS (mostly small systems), and historically a sprinkling of since-migrated Linux and NT users. Also niche things like GCOS, which are very important to a few companies and governments but move next to no volume, and the successful-but-specialized Secure64.

Mainframe

IBM Z

Description: The only performance-relevant mainframe chip left. The new "Telum" processor has eight cores, 32MB SRAM L2 each, massive L1s, and no waiting.

Competitiveness: Good, as long as you don't look at the pricetag. Very high ST performance, high bandwidth throughout the system, superb RAS features.

Outlook: It'll be with us for a long time.

Operating systems: Lots of z/OS (MVS) and z/Linux. Some niche z/TPF, z/VM, and VSEn (which was recently spun out from IBM.) Hitachi VOS3 in the Japanese market via Hitachi rebadges of IBM systems.

Fujitsu GS21 / BusinessServer

Description: 8-core chips descended from Amdahl's out-of-order GS8800 design. Up to 15 activated cores and 256GB RAM in a chassis. 390-compatible, though with fairly significant divergence from IBM's system architecture.

Competitiveness: Performance targets and max system sizes are both far below IBM's. Fujitsu claims ~550 MIPS (390-equivalent) per core, whereas IBM has been in the thousands of rated MIPS per core for some years.
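To put the per-core gap in chassis terms, here is a rough back-of-the-envelope sketch. The 550 MIPS and 15-core figures are the ones quoted above (the 550 is approximate); the 2000 MIPS per core used for comparison is a placeholder I picked to stand in for "thousands", not a published IBM rating, and the names are hypothetical.

```python
import math

# Figures quoted above for a fully populated GS21 chassis (550 is approximate).
GS21_MIPS_PER_CORE = 550
GS21_MAX_ACTIVE_CORES = 15

gs21_chassis_mips = GS21_MIPS_PER_CORE * GS21_MAX_ACTIVE_CORES


def cores_to_match(total_mips: int, mips_per_core: int) -> int:
    """Whole cores needed at a given per-core rating to reach total_mips."""
    return math.ceil(total_mips / mips_per_core)


# ASSUMPTION: placeholder per-core rating, chosen only for illustration.
PLACEHOLDER_MIPS_PER_CORE = 2000

print(gs21_chassis_mips)
print(cores_to_match(gs21_chassis_mips, PLACEHOLDER_MIPS_PER_CORE))
```

Swap in a real rated figure for whichever IBM machine you care about; vendor MIPS ratings are not measured the same way, so treat the result as an order-of-magnitude comparison at best.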

Outlook: Until recently, the Fujitsu roadmap depicted at least three new generations of GS21, but in 2022 Fujitsu announced that they would end mainframe sales by 2030 and only one new generation would be produced, to be released around 2024. This came as something of a surprise to me, since Fujitsu is still the mainframe volume leader in Japan and has significant reach internationally as well (especially in Germany and Australia.)

Operating systems: MSP, XSP, AVM in Japan and Australia; BS2000 and VM2000 in most other markets, especially Europe.

NEC NOAH

Description: The latest chip, NOAH-7, has 8 cores running NEC's proprietary mainframe instruction set (which is distantly descended from Bull's DPS-7000 ISA.) Probably 28nm. Up to 48 cores per system with 256GB RAM in the new i-PX AKATSUKI mainframe, released in June 2022.

Competitiveness: No direct competition, since there are no compatible systems from any other vendor. Probably unimpressive on pure compute, but hey, good luck running ACOS-4 on that shiny new M2 Macbook.

Outlook: NEC's roadmap shows at least two more generations of ACOS-4 hardware after i-PX AKATSUKI. They account for about a quarter of mainframe volume in Japan and very little elsewhere, but they have some very large customers and I think at least one of those generations is safe.

Operating systems: ACOS-4 only.