
New SiFive RISC-V core P650 with 40% IPC increase

Author: FullyFunctional

Score: 175

Comments: 81

Date: 2021-12-02 16:21:26

Web Link

________________________________________________________________________________

snvzz wrote at 2021-12-02 17:28:22:

Some context: the RISC-V Summit is next week, and RISC-V International has just approved a batch of important extensions[0]. With these extensions, RISC-V is not missing anything relative to the ARM and x86 ISAs in terms of functionality.

I expect a lot of tape-outs to happen this month, as core vendors were probably holding off for the announced ratifications, for fear of last-minute changes. Next year is going to be exciting.

[0]:

https://riscv.org/announcements/2021/12/riscv-ratifies-15-ne...

monocasa wrote at 2021-12-02 17:51:35:

I wouldn't say RISC-V isn't missing anything. The lack of add/subtract with carry is an issue for efficient runtime of many JITed languages like JavaScript.

That being said, I don't think it's the worst thing in the world like some do. The focus now should be on compiled code, since JITs by definition can make runtime decisions on whether some future extension that fixes this deficiency exists or not. The J extension has stalled for the moment, but with these other extensions ratified there should hopefully be more bandwidth available.
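
To make the missing-ADC pattern concrete, here is a minimal C sketch of a double-word add, the kind of sequence a bignum routine or JIT emits constantly (an illustrative example with made-up names, assuming a 64-bit target):

```
#include <stdint.h>

/* Adding two 128-bit values held as 64-bit halves. With an ADC-style
 * instruction the carry out of the low half propagates in one op;
 * without one, the carry has to be materialized with an explicit
 * unsigned comparison and folded in with an extra add. */
void add128(uint64_t a_lo, uint64_t a_hi,
            uint64_t b_lo, uint64_t b_hi,
            uint64_t *r_lo, uint64_t *r_hi) {
    uint64_t lo = a_lo + b_lo;
    uint64_t carry = lo < a_lo;      /* carry-out of the low half */
    *r_lo = lo;
    *r_hi = a_hi + b_hi + carry;     /* ADC does the add+carry in one step */
}
```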

teruakohatu wrote at 2021-12-02 21:54:45:

Can't vendors making desktop/mobile class CPUs detect the equivalent pattern and optimize it in microcode or silicon?

Or is that what we are trying to get away from?

monocasa wrote at 2021-12-02 22:08:57:

Maybe, but it's a leap, IMO. The equivalent patterns are 3x as long, and modify tons of arch visible state for their intermediate results which leaves more work for those combined instructions to do.

The complaint is valid, IMO, and the pattern would have shown up in the filtering process they used to choose ops if they had been looking at JIT output too, rather than just what's in AOT code.

userbinator wrote at 2021-12-03 01:11:05:

It can try... but you're basically trying to "decompile" or "compress" code back to a higher level, and that's neither easy nor efficient. If something relatively simple like ADC is difficult, think of something like an entire encryption/hash round, which competing CISC processors already have dedicated instructions for. Even if you do manage to make that work, there's still the matter of those extra instructions taking up valuable space in caches and memory bandwidth.

Hence why I don't think "RISC is the future" unlike a lot of other proponents; I think a CISC with uop-based decoding will be more scalable and performant. Even ARMs have moved a little in that direction.

wbl wrote at 2021-12-03 13:20:29:

Classic CISC processors like the VAX had lots of memory-to-memory instructions, complex looping constructs, etc. Special ops that are register-to-register aren't anti-RISC.

adgjlsfhk1 wrote at 2021-12-03 01:54:15:

Note that RISC-V has just gotten the cryptography extension finalized.

throwaway81523 wrote at 2021-12-03 05:53:30:

> Can't vendor's making desktop/mobile class CPUs detect the equivalent pattern and optimize it in microcode or silicon?

The riscv stans keep saying that, but nobody has given a demo or shown benchmarks afaik, even under simulation. So it's just handwaving.

It's not only JavaScript, of course. Int overflow in C is an error condition (undefined behaviour) that compilers usually don't try to trap (the -ftrapv option in gcc and clang enables trapping at some performance cost, so it's rarely used, and we get continuing bugs and vulnerabilities as a result. Ada mandates trapping unless you enable an unsafe optimization, which is, um, enabled by default in GNAT). RISC-V increases that performance cost considerably from what I can tell. That's the opposite of what we needed.
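
Roughly what a per-operation overflow check implies, as a minimal C sketch using the GCC/Clang checked-arithmetic builtin (illustrative only; the function name is made up):

```
#include <stdint.h>
#include <stdlib.h>

/* A trapping signed add, similar in spirit to what -ftrapv or an
 * Ada-style constraint check requires for every signed operation.
 * With architectural overflow flags this is typically an add plus a
 * branch on the flag; without them, the overflow condition has to be
 * reconstructed from the operands and the result. */
int64_t checked_add(int64_t a, int64_t b) {
    int64_t result;
    if (__builtin_add_overflow(a, b, &result))
        abort();                     /* trap on signed overflow */
    return result;
}
```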

I'm no CPU architect but I know they are able to signal overflow in floating point arithmetic, since IEEE 754 requires that. So I don't understand why they can't do it for integers.

feffe wrote at 2021-12-03 13:29:08:

Interestingly, the MIPS CPU traps on overflow for the add and sub instructions. You have to use the addu or subu instructions to get the usual wrapping behavior on overflow.

MIPS is kind of the spiritual ancestor to RISC-V

jhgb wrote at 2021-12-03 12:57:44:

Isn't the obvious solution to the problem of overflows to define the behavior like pretty much all newer languages did it (presumably because they learned from the errors committed by C)?

socialdemocrat wrote at 2021-12-02 17:42:33:

That is great news! Is there any friendly intro/coverage anywhere of the new vector extension?

I am curious about the final design. Would be interesting to hear how people think it compares with ARMs scalable vector extensions.

snvzz wrote at 2021-12-02 17:54:49:

There's been a few talks on the topic. They're archived in e.g. youtube.

I like it. It's fairly simple and clean, yet powerful.

There was also some discussion here in HN months ago, about an article comparing RISC-V V extension and ARM SVE.

The article itself got several things wrong about V, but the discussion[0] was interesting.

[0]

https://news.ycombinator.com/item?id=27063748

sdbbp wrote at 2021-12-03 06:52:03:

This in-depth presentation is good:

https://www.youtube.com/watch?v=oTaOd8qr53U

marcodiego wrote at 2021-12-02 19:13:18:

Faster than ARM A-77:

https://www.phoronix.net/image.php?id=2021&image=sifive_p650...

Performance is comparable to Apple's Icestorm architecture, the 'efficiency' cores in the M1. Considering the A-710 is the fastest ARM core currently available and its successor will only be available next year, SiFive is just a few years away from real competition in an arena currently dominated by ARM.

This will be beautiful to watch.

zozbot234 wrote at 2021-12-02 20:36:31:

It will be interesting to see a comparison on power efficiency as well as performance. RISC-V implementations have shown a pretty sizeable advantage wrt. power use in the past, and we don't quite know how well that advantage carries over to these larger, performance-focused designs.

DeathArrow wrote at 2021-12-03 05:53:24:

Power efficiency and performance depend very much on the process node.

If Apple bought all future 3nm capacity from TSMC, good luck trying to compete.

dmitrygr wrote at 2021-12-02 21:13:47:

> just a few years before real competition starts

Are you assuming the competition will just sit and do nothing?

GhettoComputers wrote at 2021-12-02 22:09:10:

“Good enough” matters more than benchmarks. They can make supercomputers but it doesn’t matter to someone who wants a $100 computer.

DeathArrow wrote at 2021-12-03 05:54:29:

Raspberry Pi is $50.

StreamBright wrote at 2021-12-03 09:10:27:

Sure, have you actually used one? There are some challenges with the software support of RPis, especially the model 4 and its GPU drivers. I would like to see a platform (potentially RISC-V) with great software support, one I could finally use as a replacement for TV set-top boxes running Android.

klelatti wrote at 2021-12-03 09:24:52:

If the issue is with GPU drivers then why would a RISC-V cpu make any difference?

There is a lot of wishful thinking that using RISC-V magically makes all of the SOC more open. It doesn’t!

StreamBright wrote at 2021-12-03 12:17:04:

Because a CPU architecture does not exist in a vacuum, and RISC-V's marketing is about being open, so I expect they are going to have the rest of the SoC be as open as well. I guess it will be easy enough to have all the drivers be part of Linux.

I agree, this is wishful thinking.

dmitrygr wrote at 2021-12-02 22:23:16:

All the RISC-V thingies I see today are decidedly not $100. I do see plenty of ARM designs running Linux for under $10, though.

LeifCarrotson wrote at 2021-12-02 23:16:17:

There are several cheap ones listed here:

https://riscv.org/exchange/

such as the $30 Sparkfun Red or the $20 Lofive boards. Those are for running an RTOS, not Linux, but they compete with Arduino, mbed, Teensy, and other ARM Cortex-M series microcontrollers.

A price target of $10 is something you'll only hit with massive scale-up.

dmitrygr wrote at 2021-12-03 01:19:54:

I was responding to "someone who wants a $100 computer"

Those are not computers

bruce343434 wrote at 2021-12-02 18:21:33:

> With a projected score of 11+ SPECInt2006/GHz

That seems to imply a certain integer arithmetic performance, but I wonder what the floating point performance is. They could have just said "X flops".

Comparing to other benchmarks at [1], I have no idea, because they all have denormalized results (totals, rather than per GHz per core). Nice reporting.

How fast is this thing? Pentium? First gen i3? Current gen Ryzen 5? The fact that they are being so obtuse about it leads me to believe performance isn't great.

[1]

https://www.spec.org/cgi-bin/osgresults?conf=cint2006;op=dum...

DeathArrow wrote at 2021-12-03 05:58:20:

A little lower than Pentium 4 640.

wmf wrote at 2021-12-02 20:43:06:

I'd compare it to an Atom "efficiency" core.

danielEM wrote at 2021-12-02 17:40:37:

Once it gets to the shelves at a reasonable price I will be happy to work with/on it.

Curious how IP pricing compares to ARM in this case, and how much I would need to put on top of it to tape out my own batch of processors.

snvzz wrote at 2021-12-02 17:58:45:

The license to the ISA itself is free.

There are several vendors besides SiFive offering cores for licensing. There are even some OSHW cores that can be freely used.

Even if we choose to ignore the technical prowess of being a true 5th generation RISC ISA built with hindsight no other ISA has, what's IMHO a big deal in RISC-V is the mere availability of this market of cores.

It poses a threat to ARM's business model, where ARM licenses cores and the ISA, but nobody other than ARM can license cores to others.

fartcannon wrote at 2021-12-02 18:30:13:

So I guess we should expect to hear a lot of FUD about RISC-V over the coming years.

marcodiego wrote at 2021-12-02 18:52:05:

No need to wait. Already happened in 2018:

https://www.theregister.com/2018/07/10/arm_riscv_website/

https://www.extremetech.com/wp-content/uploads/2018/07/arm-r...

snvzz wrote at 2021-12-02 18:59:09:

And it is how many learned about RISC-V's existence.

It will be a PR disaster long remembered. One for the textbooks.

jhgb wrote at 2021-12-03 13:03:32:

I find it amusing that RISC-V allegedly creates "fragmentation risk" when platform fragmentation in the ARM ecosystem already exists and it's painful enough -- at least that's what I recall from some comparisons with the x86/PC platform with respect to Linux kernel development.

fartcannon wrote at 2021-12-03 04:04:58:

Why do people fall for this shit? Blows my mind.

snvzz wrote at 2021-12-02 18:41:12:

This is a real possibility, albeit a sad one.

No amount of FUD will save ARM. Only pivoting into a different business model could.

duskwuff wrote at 2021-12-02 20:06:27:

Honestly, ARM is fine. They're no longer the only game in town, but they've still got a huge head start.

snvzz wrote at 2021-12-02 21:06:00:

They'll be fine if they focus on their microarchitectures rather than the ISA (where IMHO they've already lost), and make the process for obtaining a license much more streamlined; I've heard it takes no less than 18 months of long negotiations to license anything from ARM. That's not sustainable now that there's competition.

duskwuff wrote at 2021-12-02 21:24:12:

That's already where their focus is. Most of ARM's customers are licensing specific cores from ARM, not the ISA as a whole.

klelatti wrote at 2021-12-03 09:30:05:

> where IMHO they've already lost

Given M1, Graviton etc etc that’s a bold statement.

snvzz wrote at 2021-12-03 12:51:33:

High performance implementations are possible even with bad ISAs, given enough resources.

x86-64 is much worse than ARM. It's a literal clusterfuck. And yet.

A high performance implementation of ARM, which is a much better ISA than x86-64, was something expected to happen sooner or later. It did not surprise me.

klelatti wrote at 2021-12-03 13:12:17:

Fair enough but I’m still not sure why you think the Arm ISA has ‘lost’?

Teknoman117 wrote at 2021-12-02 22:49:52:

As far as OSHW cores go, it's so very nice to be able to throw something together in verilog and be able to inherit a compiler and not be trampling on someone else's copyright...

dmitrygr wrote at 2021-12-02 18:43:59:

> built with hindsight no other ISA has

Why do all the riscv fans conveniently ignore aarch64 when they make statements like this?

It was in fact a completely clean new design, based on hindsight, by people who know what they are doing, and with no legacy cruft.

brucehoult wrote at 2021-12-02 22:05:02:

Aarch64 obviously _isn't_ a completely clean sheet design. It was constrained by having to execute on the same CPU pipelines as 32 bit code, at least for the first decade or so. And the 32 bit mode has to perform well. There are tens of millions of Raspberry Pi 3s and 4s (and later model Pi 2s) which have 64 bit CPUs but have never seen a 64 bit instruction in their lives. Android phones have been supporting both 32 and 64 bit apps for a long time.

The "by people who know what they are doing" thing is just pure FUD. Sure, ARM employs some competent people, but no more so than IBM, Intel, AMD or the various members of RISC-V International.

FullyFunctional wrote at 2021-12-02 19:19:47:

I'm a fan of RISC-V but the freedom is a large part of it. Aarch64 _is_ a very well designed ISA and _clearly_ has a lot of benefit of hindsight. The load pair/store pair instructions, the addressing modes, fixed 32-bit instruction size, etc. It all really helps. I suspect that Apple was actively part of designing it.

I think however that RISC-V isn't that much worse and because of the freedom we will almost certainly see more implementation of RISC-V. I'd be watching Tenstorrent, SiFive, Rivos, Esperanto, and maybe Alibaba/T-Head.

snvzz wrote at 2021-12-02 18:53:04:

>Why do all the riscv fans Conveniently ignore aarch64 when they make statements like this? It was in fact a completely clean new design, based on hindsight, by people who know what they are doing, and with no legacy Cruft.

aarch64 seems poorly designed to me.

ARMv7 had Thumb, but for some reason ARMv8 did not incorporate any lessons from that. As a result, code density is bad; ARMv8 binaries are huge.

ARMv9, to be available in chips next year, is just a higher profile of required extensions, and does nothing to fix that.

Ever wonder why M1 needs such huge L1 cache? Well, now you know.

Considering ARMv9 will be competing against RVA22, I don't have much hope for ARM.

adrian_b wrote at 2021-12-02 20:16:50:

ARMv8 code density is quite good for a fixed-length ISA and is of course much better than that of RISC-V.

RISC-V has only one good feature for code density, the combined compare-and-branch instructions, but even this feature was designed poorly, because it does not have all the kinds of compare-and-branch that are needed, e.g. if you want safe code that checks for overflows, the number of required instructions and the code size explode. Only unsafe code, without run-time checks, can have an acceptable size in RISC-V.

ARMv8 has an adequate unused space in the branch opcode map, where combined compare-and-branch instructions could be added, and with a larger branch offset range than in RISC-V, in which case the code size advantage of ARMv8 vs. RISC-V would increase significantly.

While the combined compare-and-branch of RISC-V are good for code density, because branches are very frequent, the rest of the ISA is bad and the worst is the lack of indexed addressing, which frequently requires 2 RISC-V instructions instead of 1 ARM instruction.
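
To make the indexed-addressing point concrete, consider an ordinary reduction loop (an illustrative sketch, nothing more):

```
#include <stddef.h>
#include <stdint.h>

/* The load of a[i] needs a base + (index << 3) address. AArch64 can
 * encode that in a single load (e.g. ldr x3, [x1, x2, lsl #3]); base
 * RV64 has no such addressing mode, so the address is either formed
 * with a separate shift+add, or the loop is rewritten to step a
 * pointer instead. */
int64_t sum(const int64_t *a, size_t n) {
    int64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}
```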

snvzz wrote at 2021-12-02 20:28:55:

>in which case the code size advantage of ARMv8 vs. RISC-V would increase significantly.

Many things could be said about ARMv8, but that it has good code size is not one of them. It does, in fact, have abysmal code density. Both RISC-V and x86-64 produce significantly smaller binaries. For RISC-V, we're talking about a 20% reduction in size.

There's a wealth of papers on this, but you can verify this trivially yourself, by either compiling binaries for different architectures from the same sources, or comparing binaries in Linux distributions that support RISC-V and ARM.

>where combined compare-and-branch instructions could be added, and with a larger branch offset range than in RISC-V

If your argument is that ARMv8 could get better over time, I hate to be the bearer of bad news. ARMv9 code density isn't any better.

>and the worst is the lack of indexed addressing, which frequently requires 2 RISC-V instructions instead of 1 ARM instruction.

These patterns are standardized, and they become one instruction after fusion.

RISC-V, unlike the previous generation of ISAs, was thoroughly designed with hindsight on fusion. The simplest microarchitectures can of course omit it altogether, but the cost of fusion in RISC-V is low; I have seen it quoted at 400 gates.

brucehoult wrote at 2021-12-02 21:36:24:

Instruction fusion is a possibility for the future, which has been discussed academically, but no one implements it at present. I'm not sure anyone will -- it's too much complexity for simple cores, and not needed for big OoO cores.

The one fusion implementation I'm aware of is the SiFive 7-series combining a conditional branch that jumps forward over exactly one instruction. It turns the instruction pair into predicated execution.

I agree with everything else. In particular the code density. Anyone can download Ubuntu or Fedora images for the same release for amd64, arm64, and riscv64. Mount them and run "size" on any selection of binaries you want. The RISC-V ones are consistently and significantly smaller than the other two, with arm64 the biggest.

brucehoult wrote at 2021-12-02 22:15:23:

I'm not sure how you missed RISC-V's big feature for code density -- the "C" extension, giving it arbitrarily mixed 16 and 32 bit opcodes.

I've heard of that feature before somewhere else. It gave the company that invented it unparalleled code density in their 32 bit systems and propelled them to the heights of success in mobile devices. What was their name? Wait .. oh, yes ... ARM.

Why they forgot this in their 64 bit ISA is a mystery. The best theory I can come up with is that they thought the industry had shaken out and amd64 was the only competition they were going to have, ever. Aarch64 does indeed have very good code density for a fixed-length 32 bit opcode ISA, and comes very close to matching amd64. They may have thought that was going to be good enough.

Note: the RISC-V "C" extension is technically optional, but the only CPU cores I know of that don't implement it are academic toys, student projects, and tiny cores for use in FPGAs where they are running programs with only a few hundred instructions in them. Once you get over even maybe 1 KB of code it's cheaper in resources to implement "C" than to provide more program storage.

lucian1900 wrote at 2021-12-02 23:11:06:

Unfortunately, variable length opcodes are a problem for wide superscalar machines, i.e. the fast ones.

ruslan wrote at 2021-12-03 00:56:58:

Speaking about RISC-V, no it is not. In RISC-V "C", all 16-bit instructions have 32-bit counterparts. When the front-end reads in an instruction word (32 bits) holding two compressed ops, it expands them into two 32-bit ops and feeds them serially to the decoder. So there's only one decoder that does the work for both 16-bit and 32-bit ops (basically it does not distinguish them), and that's also what makes macro-op fusion possible and easy to implement, unlike ARM's Thumb, which has two separate decoders, with all the consequences.

dmitrygr wrote at 2021-12-03 01:20:57:

ARM literally documented Thumb as using the exact mechanism you just claimed they do not have and RISC-V does. Suggest reading the ARMv4T spec.

seoaeu wrote at 2021-12-03 03:53:25:

But not _that_ much of a problem. x86 is way, way worse about variable length opcodes than RISC-V and there are plenty of fast x86 processors...

zozbot234 wrote at 2021-12-02 20:47:31:

The thing with lack of shifted indexed addressing is that it just might not matter all that much beyond toy examples. Address calculations can generally be folded in with other code, particularly in loops which are a common case. So it's only rarely that you actually need those extra instructions.

adrian_b wrote at 2021-12-02 21:56:12:

Shifted indexed addressing is needed less often, but indexed addressing, i.e. register + register, is needed in every loop that accesses memory.

There are 2 ways of programming a loop that addresses memory with a minimum of instructions.

One way, which is preferable e.g. on Intel/AMD, is to reuse the loop counter as the index into the data structure that is accessed, so each load/store needs a base register + index register addressing, which is missing in RISC-V.

The second way, which is preferable e.g. on POWER and which is also available on ARM, is to use an addressing mode with auto-update, where the offset used in loads or stores is added into the base register. This is also missing in RISC-V.

Because neither of the two methods works in RISC-V with a minimum number of instructions, as they do on all other CPUs, all such loops, which are very frequent, need pairs of instructions in RISC-V corresponding to single instructions on the other CPUs.

brucehoult wrote at 2021-12-02 22:26:01:

A big difference here is that the RISC-V instructions are usually all 16 bits in size while the Aarch64 and POWER instructions are all 32 bits in size. So the code size is the same.

Also, high performance Aarch64 and POWER implementations are likely to be splitting those instructions into two decoupled uops in the back end.

Performance-critical loops are unrolled on all ISAs to minimise loop control overhead and also to allow scheduling instructions to allow for the several cycle latency of loads from even L1 cache. When you do that, indexed addressing and auto-update addressing are still doing both operations for every load or store which, as well as being a lot of operations, introduces sequential dependency between the instructions. The RISC-V way allows the use of simple load/store with offset -- all of which are independent of each other -- with one merged update of each pointer at the end of the loop. POWER and Aarch64 compilers for high performance microarchitectures use the RISC-V structure for unrolled loops anyway.
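
A minimal C sketch of the unrolled-loop shape being described, with independent fixed-offset loads and a single pointer update per iteration (illustrative only, assuming 64-bit elements):

```
#include <stddef.h>
#include <stdint.h>

int64_t sum_unrolled(const int64_t *p, size_t n) {
    int64_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    /* Four loads at fixed offsets, independent of each other,
     * plus one pointer/counter update at the end of the iteration. */
    for (; n >= 4; n -= 4, p += 4) {
        s0 += p[0];
        s1 += p[1];
        s2 += p[2];
        s3 += p[3];
    }
    while (n--)                      /* remainder */
        s0 += *p++;
    return s0 + s1 + s2 + s3;
}
```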

So indexed addressing and auto-update addressing give no advantage for code size, and don't help performance at the high end.

crest wrote at 2021-12-03 00:35:35:

While I have no personal experience writing aarch64 assembler code, my experience with ARM v6m and v7m makes me doubt your implied insult that ARM just failed/didn't give a fuck about their instruction set. Thumb 1 and 2 are well designed instruction sets optimized for a certain kind of uarch. Almost all quirks exposed to the low level programmer are there for good reasons, and while some of the constraints pose a challenge for compiler writers, they are not beyond the capabilities of GCC or LLVM. There are several possible reasons for ARM to return to a fixed length 32 bit encoding, e.g. to allow very wide OoO designs like Apple's Firestorm cores, or because the gain is smaller for 64 bit code with larger constants that are better served by PC-relative constant pools. And while the quirky LDMIA function prologue is very flexible, appeals to me as an assembler programmer and saves code space, having a single instruction potentially modify most integer registers as well as change the program counter and the active instruction set is hard to implement well, while the easier-to-implement register pair load/store instructions are enough for most common instruction sequences. The tradeoff was different for in-order ARM2/3 CPUs with single ported memory and a tiny unified cache (if that).

throwaway81523 wrote at 2021-12-03 06:01:20:

> Thumb 1 and 2 are well designed instruction sets

I had thought that Thumb 1 had serious shortcomings, which is why they ended up needing Thumb 2.

pohl wrote at 2021-12-02 19:06:26:

_Ever wonder why M1 needs such huge L1 cache? Well, now you know._

I'm not sure I follow this, but it reminds me to ask: does RISC-V allow for designs to have both efficiency & performance cores like the ARM big.LITTLE concept? Has anyone made one yet?

brucehoult wrote at 2021-12-02 21:44:27:

Of course you can do it. SiFive has been allowing customers to configure core complexes with a mixture of different core types for years -- for example mixing U84 cores with U74 or U54. If you want to do a big.LITTLE thing, transferring a running program from one core type to another, that's just a software thing, using cores with the same ISA but a different microarchitecture.

To date the examples of this that have been shipped to the public have used cores with similar microarchitecture, but a different set of extensions.

For example, the U54-MC in the HiFive Unleashed and in the Microsemi PolarFire SoC FPGAs uses four U54 cores plus one E51 core for "real time" tasks. The E51 doesn't have an FPU or MMU or Supervisor mode. The U74-MC in the HiFive Unmatched is similar.

Alibaba's ICE SoC, which you may have seen videos of running Android, has two C910 Out-of-Order cores (similar to ARM A72/A73) implementing RV64GC, and a third C910 core that also has a vector processing unit with two pipes with 256 bit vector ALU each, plus 128 bit vector load and store pipes.

dmitrygr wrote at 2021-12-02 19:01:19:

> for some reason ARMv8 did not incorporate any lessons from that.

I used to think so too, until I asked some more knowledgeable people about it. Turns out the lesson _IS_ that not having it is better. Fixed-size instructions make decoding significantly simpler, making it much easier to build very wide front ends.

brucehoult wrote at 2021-12-02 21:55:54:

A little easier, not much easier. A number of organisations are making very wide RISC-V implementations, and one has already published how their decoder works. It's modular, with each block looking at 48 bits of code (the first 16 overlapping with the previous block) and decoding either two 16 bit instructions, or one aligned 32 bit instruction, or one misaligned 32 bit instruction with a following 16 bit instruction, or one misaligned 32 bit instruction followed by an ignored start of another misaligned 32 bit instruction.

You can put as many of these modules side by side as you want. There is a serial dependency between them, in that each block has to tell the next block whether its last 16 bits are the start of a misaligned 32 bit instruction or not. That could become an issue at really, really wide decode, but for something decoding e.g. 16 bytes at a time (4 to 8 instructions) it's not an issue.
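
For reference, the length rule those decode blocks rely on is simple; a minimal C sketch (illustrative only):

```
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* In the standard RISC-V encoding, a 16-bit parcel whose low two bits
 * are not 0b11 is a compressed ("C") instruction; 0b11 marks the start
 * of a 32-bit instruction. This is the alignment question each decode
 * block has to answer for its own window. */
static inline bool is_compressed(uint16_t parcel) {
    return (parcel & 0x3) != 0x3;
}

/* Count instructions in a buffer of 16-bit parcels. */
size_t count_instructions(const uint16_t *parcels, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; count++)
        i += is_compressed(parcels[i]) ? 1 : 2;
    return count;
}
```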

There is a trade-off between a little bit of decoder complexity and a lot of improved code density -- but nowhere near to the same extent as say x86.

jaas wrote at 2021-12-02 20:30:17:

Who exactly are the customers for this chip?

ruslan wrote at 2021-12-03 00:29:28:

The press release does not say anything about a physical chip, but about a licensable core that can be used to build SoCs. Here SiFive acts the same way ARM does: it sells cores.

socialdemocrat wrote at 2021-12-02 17:43:59:

Anyone able to put this in context? How fast are these cores compared to various ARM, Intel and AMD cores? At what level can they compete?

sanxiyn wrote at 2021-12-02 17:59:48:

> With a projected score of 11+ SPECInt2006/GHz, the SiFive Performance P650 brings RISC-V into a new category of high-end computing applications.

11+ SPECInt2006/GHz is comparable to Apple Icestorm microarchitecture. Apple Firestorm microarchitecture is roughly 2x better at 22 SPECInt2006/GHz.

Symmetry wrote at 2021-12-02 21:57:57:

How impressive that number is rather depends on how many GHz they're managing. In general, the slower you design your core to clock, the faster you can make all your caches. Plus, the slower you clock your core, designed in or not, the lower the number of clock cycles it takes to talk to main memory.

pantalaimon wrote at 2021-12-02 19:17:48:

Mind you that raw core performance is not everything; memory bandwidth and caches are crucial to make sure the CPU isn't waiting for data all the time.

sanxiyn wrote at 2021-12-02 19:28:25:

Yes, but SPECint includes all such effects. As long as SPECint benchmarks (such as GCC) are representative of your workload, it works fine.

tlb wrote at 2021-12-02 20:47:24:

I trust that the Apple benchmarks include all such effects. I'm less convinced that the RISC-V "projections" include them. SPECint2006 is supposed to be measured with real memory and an OS. Per-GHz numbers can't accurately reflect main memory latency, since its speed doesn't scale with the CPU clock.

spear wrote at 2021-12-02 21:26:52:

Right, and "per GHz" numbers are also not very useful because you can't just crank up the GHz when you need performance. Even with the same process technology, you can't assume different microarchitectures will max out at the same frequency.

snvzz wrote at 2021-12-02 23:16:19:

You're right, and remarkably Apple has found a major roadblock to clock speeds while using ARMv8.

M1's L1 cache is huge, as a workaround for ARMv8's poor code density. A larger cache means lower clocks; unfortunately there's no way around the speed of light.

ksec wrote at 2021-12-03 10:18:31:

>You're right, and remarkably Apple has found a major roadblock to clock speeds while using ARMv8.

What? Where is this claim coming from?

hajile wrote at 2021-12-03 00:57:47:

They claim it's slightly faster than A77. That would have the IPC getting pretty close to AMD's Zen 1 chips (though probably at a lower peak frequency).

sebow wrote at 2021-12-02 17:51:44:

If I recall correctly the SiFive Unmatched is still pretty slow compared to ARM:

https://www.phoronix.com/scan.php?page=article&item=hifive-u...

Now, this board is not the one in question (the P650), but we'll have to watch upcoming benchmarks [for which I recommend Phoronix].

Obviously you can't even think about comparing it further with Intel & AMD, but when you look at the history of something like ARM (which I believe is 30-40 years old), RISC-V came a long way pretty fast, and the good thing is that it's a solid choice for the future due to being open.

baybal2 wrote at 2021-12-02 17:54:25:

This is something genuinely interesting from the RISC-V crowd for the first time.

sebow wrote at 2021-12-02 17:47:09:

Sweet. Are there any resources on transitioning/migrating, or on the differences between x86_64 and RISC-V? Or are the ISAs so drastically different that it's just better to dive in head-first?

ruslan wrote at 2021-12-03 01:02:38:

It's the same as x86_64 vs aarch64. If you feel comfortable with aarch64, switching to RV64 will go almost unnoticed.

Note that switching to ARM from x86 is a pain, esp. if you depend on proprietary software.

DeathArrow wrote at 2021-12-03 06:00:53:

We've been expecting new developments for ages.

I guess RISC-V will conquer the desktop the same year Linux does.