Last week was the birthday of Poulson, the final[1] Itanium microarchitecture to be released. I remember its release on 8 Nov 2012. I also remember reading the manual a few months prior and being underwhelmed. Some of the early materials released about the Poulson microarchitecture - and heavily boosted by a certain someone on RWT's forums, as well as by David Kanter himself - had led me to expect a significant across-the-board performance boost that would finally make IPF competitive with Power, earn it a place at the top of Intel's product lineup, and ensure a long future.
The long future part happened - for about twelve weeks. Intel announced the Modular Development Model as part of the Poulson launch presentation; the next-generation, 22nm, "Kittson" processor would go into a Xeon-compatible socket and share its uncore with future Xeons. Things looked good for an extended roadmap and a soft landing, until the end of January 2013, when Intel suddenly canceled Kittson. I've never been clear on what happened; my best guess is that early Poulson sales were bad enough that HP opted not to fund Kittson's continued development, and since HP made up almost all IPF volume by late 2012, that was it. No new Itanium processor was ever released.
The Poulson microarchitecture was an oddity. Unlike the very-VLIW-oid Merced and Itanium 2 cores, Poulson largely abandoned grouping in its mid-end and backend and behaved like a far more conventional in-order core. In most regards the new microarchitecture looked fairly aggressive, though there were certainly red flags raised by the increase in latencies across the core and the removal of two load/store units (which were also capable of simple integer operations). Intel claimed almost no performance-per-core improvement against the previous-gen Tukwila family; I've always found that claim a little suspicious, especially in light of Poulson's much higher clocks. It would not surprise me if Intel was deliberately sandbagging performance claims about a processor it wanted to stop selling. I've never been able to confirm or refute my suspicions on that front, as I haven't (yet) done a SPEC run or other benchmarking on a Poulson machine, and neither Intel nor HP ever submitted any industry-standard benchmarks for it.
In other news, I've been trying to track down two HP system codenames mentioned in the internal emails from the Oracle/HP suit. The first, Octane, is brought up in an internal email from Martin Fink, who ran HP's Business Critical Systems unit:
This was a high-tension call. [Intel] found out about Octane. So, they're saying we're building our next generation Mission Critical system on their competitor's product, you're not porting HP-UX, so why should they help us with a soft landing.
The implication that Octane was an Opteron-based mission-critical system is hard to ignore. It's surprising to me that HP would go in that direction, but I can't think of any other company that would be considered "Intel's competitor" in the same way; if Octane were Power, for instance, that would be HP's competitor far more than Intel's. I can only conclude that HP designed, then canceled, a large mission-critical Opteron system to replace Integrity.
This brings us to our next fuel-named HP system, Hydrazine, in another email from Martin Fink, dated May 2009:
He [Bob Kelly @ Microsoft] really liked the idea that they could exit Itanium, move to Hydrazine (8/16 socket x86 with our own chipset) and not give the market the impression that they are exiting the enterprise space. The trigger here is that Microsoft will not support Windows 8 on Itanium. Our goal is to delay any visibility of that fact until we can announce Hydrazine (early next year).
This one is less mysterious. Hydrazine is almost certainly none other than the ProLiant DL980, the 8-socket Xeon monster that HP released in 2010. I never looked closely at the DL980 until recently; I had always assumed it was just a bog-standard gluelessly-connected system built around Intel's Boxboro-EX chipset. I was wrong. The DL980 is built on a custom HP node controller architecture, which HP calls PREMA, with a close resemblance to the XNC in the Superdome 2 - albeit lacking the SD2's attached eDRAM L4. Unlike the SD2's, PREMA's node controllers are connected directly to one another; there is no central crossbar. I've seen references to the potential scalability of the PREMA architecture to 32 sockets, though I'm not sure it would have scaled especially well with direct node-to-node connections instead of a crossbar.[2] It's odd that the Hydrazine email mentions "8/16 socket" when the DL980 only shipped at 8s - it could point to errata that prevented it from scaling higher, or just underwhelming performance at 16s that made it an unviable system in that class.
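To put rough numbers on that scaling worry, here is a back-of-the-envelope sketch of how link counts grow in a directly connected (all-to-all) design versus one uplink per node into a central crossbar. The function names are made up for illustration, the crossbar is idealized as a single switch, and the node counts in the loop are hypothetical; nothing here reflects how many sockets actually sit behind each PREMA node controller, which I don't know.

```python
def full_mesh_links(nodes: int) -> int:
    """Point-to-point links needed so every node connects to every other node."""
    return nodes * (nodes - 1) // 2


def full_mesh_ports_per_node(nodes: int) -> int:
    """Fabric ports each node needs in an all-to-all topology."""
    return nodes - 1


def crossbar_links(nodes: int) -> int:
    """Links needed with one uplink per node to an idealized central crossbar."""
    return nodes


# Hypothetical node counts, chosen only to show the growth rate.
for n in (2, 4, 8):
    print(
        f"{n} nodes: all-to-all needs {full_mesh_links(n)} links "
        f"({full_mesh_ports_per_node(n)} ports per node), "
        f"crossbar needs {crossbar_links(n)} uplinks"
    )
```

All-to-all link count grows quadratically while crossbar uplinks grow linearly, which is the crux of the concern. The price of the crossbar is an extra hop on every remote access.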
[1] The Itanium 9700 released in 2017, rather cynically branded "Kittson" following the termination of the actual Kittson project, used a die identical to Poulson's.
[2] IBM's Power 795 uses all-to-all connections between eight 4-socket nodes, without even a local node controller to mediate, but also exposes heroic amounts of bandwidth to accomplish it.