💾 Archived View for dioskouroi.xyz › thread › 29415499 captured on 2021-12-03 at 14:04:38. Gemini links have been rewritten to link to archived content

-=-=-=-=-=-=-

Don't Use Inline Assembly

Author: optimalsolver

Score: 15

Comments: 10

Date: 2021-12-02 12:54:47

________________________________________________________________________________

nextaccountic wrote at 2021-12-02 16:23:24:

If you use inline asm for performance, you should probably also write portable code as a baseline, and use conditional compilation to select between the hand-optimized inline asm and the portable code. Maybe add in some unit tests too.

It's possible that in the future, with better compilers, the portable code will be actually faster. Even if not, having portable code is a huge plus.
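A minimal sketch of that layout, assuming GCC or Clang on x86-64 for the asm path. The byte-swap task, the function names, and the `FORCE_PORTABLE` opt-out macro are all hypothetical choices made for illustration; the comment above does not name any of them.

```c
#include <assert.h>
#include <stdint.h>

/* Portable baseline: always compiled, so it can always be tested. */
static uint32_t bswap32_portable(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Hand-written variant, selected only on x86-64 with GCC or Clang.
 * FORCE_PORTABLE is a hypothetical opt-out macro for this sketch. */
#if defined(__GNUC__) && defined(__x86_64__) && !defined(FORCE_PORTABLE)
static uint32_t bswap32(uint32_t x)
{
    /* "+r": x is both read and written, in a general-purpose register. */
    __asm__("bswap %0" : "+r"(x));
    return x;
}
#else
static uint32_t bswap32(uint32_t x)
{
    return bswap32_portable(x);
}
#endif

/* Unit test: whichever variant was selected must agree with the baseline. */
static void test_bswap32(void)
{
    static const uint32_t samples[] = {
        0u, 1u, 0x11223344u, 0xFFFFFFFFu, 0x80000001u
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(bswap32(samples[i]) == bswap32_portable(samples[i]));
}
```

Calling `test_bswap32()` from a test runner checks the selected path against the baseline. Building once normally and once with `-DFORCE_PORTABLE` also makes it easy to benchmark the two against each other.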

raxxorrax wrote at 2021-12-02 13:06:50:

It has applications on bare-bones embedded systems, for example in interrupt handlers. You still need to check your registers and restore them according to the processor's manual. You might also want to test performance, since C compilers often generate good, fast code themselves.

Otherwise it really should not be done. It is extremely unlikely that it helps with performance problems. The article lists the exceptions. Perhaps add interrupts to that, since you always want them to return as fast as possible. Not really needed on higher-level platforms though.

ncmncm wrote at 2021-12-02 21:04:09:

Anyone involved in coding such systems has nothing to learn from the article.

37ef_ced3 wrote at 2021-12-02 16:23:09:

NN-512 (https://NN-512.com) is an example of using intrinsics instead of inline assembly for a performance-critical application, and getting very, very good code generation results from GCC.
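For a rough idea of what intrinsics look like next to inline asm, here is a small sketch using SSE2 intrinsics from `<emmintrin.h>`. It is not taken from NN-512; the function names are made up for illustration, and the scalar loop is there so the same file still builds on targets without SSE2.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics header shipped with GCC and Clang */
#endif

/* Adds b[i] to a[i] for every i in [0, n). */
static void add_floats(float *a, const float *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Four floats per iteration; unaligned loads and stores keep it simple. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(a + i, _mm_add_ps(va, vb));
    }
#endif
    /* Scalar tail, and the whole loop on targets without SSE2. */
    for (; i < n; i++)
        a[i] += b[i];
}

/* Small self-check; the values are exactly representable as floats. */
static void check_add_floats(void)
{
    float a[6] = {1, 2, 3, 4, 5, 6};
    const float b[6] = {10, 20, 30, 40, 50, 60};
    add_floats(a, b, 6);
    for (int i = 0; i < 6; i++)
        assert(a[i] == 11.0f * (float)(i + 1));
}
```

Unlike an asm block, the compiler sees these as ordinary typed operations, so it stays free to allocate registers, schedule, and inline around them.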

productivepizza wrote at 2021-12-02 14:07:37:

Yes, it should not be the first option. But, under certain circumstances it can be very helpful. For example, if you have to support multiple toolchains and multiple architectures. Using inline assembly can let you consolidate many assembly files into 1 C file.

ncmncm wrote at 2021-12-02 14:01:19:

It doesn't say what to do instead: use compiler intrinsics, if portable code won't do. Those get tailored for different target architectures automatically.

nextaccountic wrote at 2021-12-02 16:19:53:

It does, but it calls them builtins instead:

> B) To access CPU-specific instructions.

> It is possible that the reason gcc isn't generating the specific instruction you expected is that it is known to be slow, problematical, not supported on the processors you are compiling for, etc. But if there is a reason you need a specific instruction, gcc has builtins for many of the more useful ones. Using these in preference to inline asm allows gcc's optimizers to produce better code. However it is possible that an instruction is new enough or obscure enough that there is no builtin for it.
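As a sketch of what the quoted passage recommends, the following uses GCC's `__builtin_popcountll` (also available in Clang) with a portable fallback for other compilers. The wrapper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Portable fallback: clears the lowest set bit until none are left. */
static unsigned popcount64_portable(uint64_t x)
{
    unsigned n = 0;
    while (x) {
        x &= x - 1;
        n++;
    }
    return n;
}

static unsigned popcount64(uint64_t x)
{
#if defined(__GNUC__)
    /* The compiler picks the instruction sequence for the current target. */
    return (unsigned)__builtin_popcountll(x);
#else
    return popcount64_portable(x);
#endif
}

/* Self-check: the builtin and the fallback must agree. */
static void check_popcount64(void)
{
    static const uint64_t samples[] = {
        0u, 1u, 0xFFu, 0x8000000000000001u, 0xFFFFFFFFFFFFFFFFu
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(popcount64(samples[i]) == popcount64_portable(samples[i]));
}
```

Whether the builtin becomes a single instruction depends on the target options in effect; on x86 that typically means enabling `-mpopcnt` or a suitable `-march`.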

ncmncm wrote at 2021-12-02 21:00:07:

The most common cause for failure to use the instruction you wanted is that the compiler is told, implicitly, to generate code that would run on a 2003-era AMD Opteron or similarly antediluvian target. Almost every instruction added since, at great expense, is there to solve a performance problem.

A quiet "-mavx" to unlock these particular shackles fixes numerous performance failings.
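One way to see which instruction sets the compiler has been allowed to use is to check the macros GCC and Clang predefine for the active target options. A small sketch follows; the function name and the returned labels are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* Reports the widest x86 SIMD level enabled at compile time,
 * based on macros that GCC and Clang predefine from -m... / -march. */
static const char *simd_level(void)
{
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "baseline";
#endif
}

/* Self-check: some label is always returned. */
static void check_simd_level(void)
{
    assert(strlen(simd_level()) > 0);
}
```

On a default x86-64 build this would normally report SSE2, and building the same file with `-mavx` should move it to AVX. The catch is that a binary built this way assumes the CPU it runs on actually supports those instructions.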

HelloNurse wrote at 2021-12-02 16:17:04:

The case of C with "inline" assembly that is actually assembly wrapped in the easy but tedious parts written in C is probably common, particularly for exotic instructions and instruction sequences as noted in other comments.

afr0ck wrote at 2021-12-02 14:51:26:

It is clear that inline assembly is mostly needed in situations where the compiler doesn't implicitly know how to generate assembly instructions for your use case.

For example, let's say an operating system's kernel has a procedure to disable cache snooping on the CPU. There is no way the compiler can implicitly generate this for you, except if a special built-in function is provided. But built-in functions do not exist for every case, and not for every supported CPU architecture.

In this example, each architecture-specific part of the kernel would implement an interface named _disable_cache_snooping()_ wrapping an inline assembly statement that invokes the underlying CPU's (x86, arm, mips, sparc, riscv, etc.) special instructions to disable cache snooping.
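The shape of such an interface can be sketched in user space. Since the instructions for disabling cache snooping are privileged and not given above, this sketch swaps in a harmless, unprivileged stand-in: a spin-wait hint. The names `cpu_relax` and `spin` are illustrative, and only the x86 and AArch64 branches carry a real instruction.

```c
#include <assert.h>

/* One interface, one inline-asm body per architecture.
 * Stand-in for the privileged operation described above. */
static inline void cpu_relax(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ __volatile__("pause" ::: "memory");   /* x86 spin-wait hint */
#elif defined(__GNUC__) && defined(__aarch64__)
    __asm__ __volatile__("yield" ::: "memory");   /* AArch64 hint */
#elif defined(__GNUC__)
    __asm__ __volatile__("" ::: "memory");        /* compiler barrier only */
#else
    /* No GNU-style inline asm available: do nothing. */
#endif
}

/* Architecture-independent caller: it only sees the interface. */
static unsigned spin(unsigned iterations)
{
    unsigned done = 0;
    while (done < iterations) {
        cpu_relax();
        done++;
    }
    return done;
}

/* Self-check: the loop runs the requested number of times. */
static void check_spin(void)
{
    assert(spin(8) == 8u);
}
```

In a real kernel each branch would more likely live in its own architecture directory behind a common header, but the idea is the same: callers depend on the function name, not on the instruction.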