sunset's gemlog
Almost an Embedded Itanium: ST200
In the period surrounding the release of the "Merced" Itanium, HP and ST Micro partnered closely on a new, programmer-friendly embedded CPU called ST200. While neither party ever emphasized the connection, ST200 has an undeniable similarity to Itanium, and one could probably make a strong case for it being an EPIC processor.
Some of the major similarities, from an examination of the ST231 manuals:
- Both use variable-length, VLIW-like encodings. Most VLIWs have a fixed-length instruction bundle with fixed-format instructions (sometimes called syllables) contained within it. ST231 has a single stop bit attached to each syllable indicating that the syllable is the end of a bundle, while Itanium uses a 5-bit template field in each 128-bit bundle to specify instruction types and the locations of group terminations.
- Both have speculative load instructions - load ops that can fail without disrupting program execution. On IPF, a speculative load that fails sets the target register's Not a Thing (NaT) bit - essentially a null value - and the program can check for NaT and optionally re-run the instruction non-speculatively. ST231 also has speculative loads, but no NaT; failed loads simply write back a result of zero, which is kind of weird because 0 can also be the result of a successful memory access.
- Both have long-immediate op forms that consume an additional slot within the bundle. In ST231's case, this requires use of an "immr" or "imml" opcode in a slot adjacent to the syllable requiring an expanded immediate. In Itanium's case, the template mechanism specifies whether an 82b syllable is to be used instead of a pair of 41b syllables.
- Both are register-rich; ST231 has 64 GPRs and Itanium has 128, though a primary function of Itanium's unusually large register set is to enable the fancy register-rotation and register-stack mechanisms that IPF provides. ST2xx has no equivalent to those features.
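To make the speculative-load difference concrete, here is a toy Python model of the two behaviours described in the list above. It is only a sketch: MEMORY, NAT, and the function names are made up for illustration, a missing dictionary key stands in for a faulting address, and Itanium's NaT is really an extra bit on the register rather than a separate value.

```python
# Toy model of the two speculative-load behaviours.
# MEMORY, NAT, and all function names are hypothetical, for illustration only.

MEMORY = {0x1000: 42, 0x1004: 0}   # mapped addresses -> stored words


class NaT:
    """Stand-in for Itanium's Not a Thing marker on a register."""

    def __repr__(self):
        return "NaT"


NAT = NaT()


def spec_load_itanium_style(addr):
    """A failed speculative load yields a distinct marker instead of faulting."""
    return MEMORY.get(addr, NAT)


def spec_load_st231_style(addr):
    """A failed speculative load simply yields zero."""
    return MEMORY.get(addr, 0)


def checked_value(addr):
    """Check for NaT, then fall back to a non-speculative load."""
    value = spec_load_itanium_style(addr)
    if value is NAT:
        # The KeyError raised here stands in for a real fault.
        return MEMORY[addr]
    return value
```

In this model, spec_load_st231_style(0x1004) (a mapped word that happens to hold zero) and spec_load_st231_style(0x2000) (an unmapped address) both return 0, so the caller cannot tell them apart from the value alone. The Itanium-style version returns 0 for the first and NAT for the second.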
There are some significant ways they differ, too.
- ST231 syllables are 32b and there's no template field. I suspect this means it's quite a bit denser than IPF, though I'd be curious to see what compilers emit in practice.
- An ST231 bundle - equivalent in the ways that matter to an Itanium instruction group, rather than an Itanium bundle - is 1 to 4 ops. An Itanium instruction group is arbitrarily long, limited only by the underlying parallelism and by ops that insist on being in the first or last slot of a group.
- Portions of the ST231 pipeline are exposed - for instance, changes to the link register won't be visible to the "goto link register" instruction until 2-4 bundles have executed. There are also minimum delays between writing to a predicate register and branching based on that predicate.
- ... and on that note, Itanium has a fully predicated ISA (every syllable has a predicate register field) while ST231's 8-register predicate register file is generally only used by branches (which are, themselves, unconditional). ST231 generally requires a flow of cmp-to-predicate -> branch-with-predicate.
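The ST231 bundle structure (32b syllables, a per-syllable stop bit, 1 to 4 ops per bundle) is simple enough to sketch in a few lines of Python. This is a minimal sketch, not a decoder: the position of the stop bit (the top bit of each word here) is my assumption rather than something quoted from the manuals, and split_bundles is a made-up helper name.

```python
STOP_BIT = 1 << 31    # assumption: the stop bit is the top bit of each syllable
MAX_SYLLABLES = 4     # from the text: a bundle is 1 to 4 ops


def split_bundles(words):
    """Group 32-bit syllables into bundles, ending each at a set stop bit."""
    bundles, current = [], []
    for word in words:
        current.append(word)
        if len(current) > MAX_SYLLABLES:
            raise ValueError("bundle exceeds 4 syllables without a stop bit")
        if word & STOP_BIT:
            bundles.append(current)
            current = []
    if current:
        raise ValueError("stream ends in the middle of a bundle")
    return bundles


# Placeholder words, not real opcodes: only the top bit matters here.
stream = [0x00000001, 0x80000002, 0x80000003,
          0x00000004, 0x00000005, 0x00000006, 0x80000007]
bundle_sizes = [len(b) for b in split_bundles(stream)]
```

For the placeholder stream above, bundle_sizes comes out as [2, 1, 4]. Every bit of fetch overhead beyond the ops themselves is that one stop bit per syllable, whereas Itanium spends a 5-bit template on every 128-bit bundle.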
Itanium is dead, and ST2xx looks like it isn't far behind; ST announced in 2016 that they would exit the set-top box market, and I'm not sure whether any part of ST's future roadmap includes ST2xx. It's interesting to see a likable, normal VLIW ISA in an embedded core, though, especially compared to the bizarre instruction sets that exist elsewhere in the embedded VLIW space.[1]
[1] I'm referring to, among others, an 8-wide VLIW split into two halves, each with its own register file with limited cross-connect, and with almost no concessions to programmers expecting the familiar comforts of home; after all, a near-complete lack of pipeline interlocks builds character. The family will remain nameless to protect the guilty, but the truth is out there...