💾 Archived View for dioskouroi.xyz › thread › 29359828 captured on 2021-11-30 at 20:18:30. Gemini links have been rewritten to link to archived content


-=-=-=-=-=-=-

Endianness, API Design, and Polymorphism in Rust

Author: lukastyrychtr

Score: 46

Comments: 14

Date: 2021-11-27 12:59:35


________________________________________________________________________________

glandium wrote at 2021-11-29 01:58:32:

Somehow this is connected to

https://news.ycombinator.com/item?id=29367649

I had never thought of this until that thread, but our writing of numbers comes from Arabic, and Arabic is written from right to left. And from our western perspective numbers in Arabic being written left to right is backwards. But in fact, it's us who are backwards: Arabic is written from right to left, and numbers too... from the least significant to the most significant. But we took the numbers from Arabic and kept their left-to-right order and now write them from MSD to LSD...

masklinn wrote at 2021-11-29 05:23:41:

> But we took the numbers from Arabic and kept their left-to-right order and now write them from MSD to LSD...

Roman numerals were already “big endian”, though.

mmastrac wrote at 2021-11-29 03:24:54:

Most significant digit first probably makes the most sense because you can immediately skip past the number after you identify the magnitude (the number of digits can be seen in the periphery of your vision without being read).

schuyler2d wrote at 2021-11-29 18:41:58:

My understanding is that numbers in Arabic were also written left-to-right, which allowed a left column of values aligned to the left margin and a right column of descriptions aligned to the right margin. Both columns were thus aligned (for easy scanning and summing), and the extra space fell in the middle of the page.

ttybird wrote at 2021-11-29 11:07:34:

"but our writing of numbers comes from Arabic"

Not India?

ekimekim wrote at 2021-11-29 06:02:57:

About this section:

Unfortunately, as the ??? shows, the different output arrays have different lengths – a u16 would be 2 bytes and a u64 8 bytes – and so the Rust trait system at the time of this writing is (to my knowledge) not powerful enough to represent this trait as is.

It seems to me that this is possible using const generics.

If you define your EndianBytes like this:

    trait EndianBytes<const N: usize> {
        fn be_bytes(self) -> [u8; N];
        fn le_bytes(self) -> [u8; N];
    }

Then provide a concrete length with each impl:

    impl EndianBytes<4> for u32 {
        fn be_bytes(self) -> [u8; 4] {
            self.to_be_bytes()
        }
        fn le_bytes(self) -> [u8; 4] {
            self.to_le_bytes()
        }
    }

(I've renamed their methods to avoid conflicting with u32::to_be_bytes to make it easier to read)

That then allows you to use the EndianBytes trait, as long as you remain generic over both the value type and its length:

    fn write<T: EndianBytes<N>, const N: usize>(val: T) {
        println!("{:?}", val.be_bytes());
    }

    fn main() {
        write(0x1234u16);
        write(0x12345678u32);
    }

This code (plus an impl of EndianBytes<2> for u16) works for me on 1.51 stable.
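
For reference, the fragments above can be assembled into one self-contained program, including the u16 impl that the comment mentions but does not show. This is a sketch rather than the commenter's exact code: be_vec is a hypothetical helper added here so the result can be inspected instead of printed.

```rust
// Trait parameterised by the byte length N of the implementing type.
trait EndianBytes<const N: usize> {
    fn be_bytes(self) -> [u8; N];
    fn le_bytes(self) -> [u8; N];
}

impl EndianBytes<2> for u16 {
    fn be_bytes(self) -> [u8; 2] {
        self.to_be_bytes()
    }
    fn le_bytes(self) -> [u8; 2] {
        self.to_le_bytes()
    }
}

impl EndianBytes<4> for u32 {
    fn be_bytes(self) -> [u8; 4] {
        self.to_be_bytes()
    }
    fn le_bytes(self) -> [u8; 4] {
        self.to_le_bytes()
    }
}

// Hypothetical helper (not in the original comment): generic over both the
// value type and its length, returning the big-endian bytes as a Vec.
fn be_vec<T: EndianBytes<N>, const N: usize>(val: T) -> Vec<u8> {
    val.be_bytes().to_vec()
}

fn main() {
    println!("{:?}", be_vec(0x1234u16));
    println!("{:?}", be_vec(0x12345678u32));
    println!("{:?}", 0x1234u16.le_bytes());
}
```

N is never spelled out at the call sites; the compiler infers it because each integer type has exactly one EndianBytes impl.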

EDIT: I thought about doing this with an associated constant, ie:

    trait EndianBytes {
        const LENGTH: usize;
        fn be_bytes(self) -> [u8; Self::LENGTH];
        fn le_bytes(self) -> [u8; Self::LENGTH];
    }

However it turns out that this kind of usage is not quite supported yet:

https://github.com/rust-lang/rust/issues/60551
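
One way around that limitation on stable Rust is an associated type instead of an associated constant. This is a different technique from the one in the comment, sketched under the assumption that callers only need to view the result as a byte slice; the Bytes associated type and the be_hex helper are hypothetical names introduced for illustration.

```rust
trait EndianBytes {
    // The concrete array type is chosen per impl, e.g. [u8; 4] for u32.
    type Bytes: AsRef<[u8]>;
    fn be_bytes(self) -> Self::Bytes;
    fn le_bytes(self) -> Self::Bytes;
}

impl EndianBytes for u16 {
    type Bytes = [u8; 2];
    fn be_bytes(self) -> [u8; 2] {
        self.to_be_bytes()
    }
    fn le_bytes(self) -> [u8; 2] {
        self.to_le_bytes()
    }
}

impl EndianBytes for u32 {
    type Bytes = [u8; 4];
    fn be_bytes(self) -> [u8; 4] {
        self.to_be_bytes()
    }
    fn le_bytes(self) -> [u8; 4] {
        self.to_le_bytes()
    }
}

// Hypothetical helper: generic over the value type only; the array length
// is hidden behind the associated type.
fn be_hex<T: EndianBytes>(val: T) -> String {
    let bytes = val.be_bytes();
    let hex: String = bytes.as_ref().iter().map(|b| format!("{:02x}", b)).collect();
    hex
}

fn main() {
    println!("{}", be_hex(0x1234u16));
    println!("{}", be_hex(0x12345678u32));
    println!("{:?}", 0x1234u16.le_bytes());
}
```

The trade-off is that generic code sees only what the AsRef<[u8]> bound exposes, so the length is no longer available as a compile-time constant.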

tyingq wrote at 2021-11-29 01:06:11:

While I know the performance cost is relatively minimal, it's interesting to think about the collective cycles wasted byte swapping back/forth to big endian for the network...now that little-endian machines dominate the landscape.

hermitdev wrote at 2021-11-29 05:51:05:

Not just instruction cycles, but how many human cycles wasted, too? The year is 2021 and I'm dealing with yet another random bug somewhere because two sides of a serialization don't agree on an order...
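
The class of bug described here is easy to reproduce. In this minimal sketch the sender writes little-endian and the receiver wrongly assumes big-endian; the function names are hypothetical and only the standard integer conversion methods are used.

```rust
// Sender side: serialises the value as little-endian bytes.
fn encode_le(value: u32) -> [u8; 4] {
    value.to_le_bytes()
}

// Receiver that wrongly assumes big-endian ("network order") input.
fn decode_be(bytes: [u8; 4]) -> u32 {
    u32::from_be_bytes(bytes)
}

// Receiver that agrees with the sender.
fn decode_le(bytes: [u8; 4]) -> u32 {
    u32::from_le_bytes(bytes)
}

fn main() {
    let wire = encode_le(0x12345678);
    println!("{:#010x}", decode_be(wire)); // prints 0x78563412
    println!("{:#010x}", decode_le(wire)); // prints 0x12345678
}
```

Nothing fails loudly in the mismatched case: the receiver simply gets a different, perfectly valid integer.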

VWWHFSfQ wrote at 2021-11-29 02:36:56:

At this point I would be surprised if there was any performance or energy cost at all. Every CPU has an endian bit-flipper now.

monocasa wrote at 2021-11-29 02:00:30:

x86 at least has movbe instructions these days for byte swapping loads/stores, so it's not the worst thing in the world.
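
In Rust the swap itself is expressed with the integer methods to_be, from_be and swap_bytes. The sketch below shows only the source-level operations; whether the compiler lowers them to bswap, movbe or something else depends on the target and optimisation settings and is not something this example demonstrates. The wrapper function names are hypothetical.

```rust
// Convert a host-order value to big-endian ("network order") representation.
// On a big-endian host this is a no-op; on a little-endian host it swaps bytes.
fn to_network_order(host: u32) -> u32 {
    host.to_be()
}

// Convert a big-endian value back to host order.
fn from_network_order(net: u32) -> u32 {
    u32::from_be(net)
}

fn main() {
    let host = 0x12345678u32;
    let net = to_network_order(host);
    // The in-memory bytes of `net` are the big-endian bytes of `host`.
    println!("{:?}", net.to_ne_bytes());
    println!("{:#010x}", from_network_order(net));
    // An unconditional swap, regardless of host endianness.
    println!("{:#010x}", host.swap_bytes());
}
```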

acomjean wrote at 2021-11-29 06:01:43:

I wrote a library to endian swap last century. It was when our swap-in test machine was x86 and our machines were PA-RISC or SPARC. That, combined with some serial port wiring fun, made an otherwise great project 10% sucky.

We did it in C and it was pretty fast.

IshKebab wrote at 2021-11-29 16:01:01:

Yeah there are still people who think network protocols have to use "network byte order".

Most recent weird case I discovered is CBOR, which is reasonably well designed (it's hard to get binary JSON very wrong to be fair), but the designers inexplicably decided to use big endian. In 2013.
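
To make the CBOR point concrete: an unsigned integer that needs 16 bits is encoded as the initial byte 0x19 followed by the two value bytes in big-endian order, so 1000 becomes 19 03 e8. The sketch below handles only that single form; real encoders pick the shortest encoding for each value, and the function names are hypothetical.

```rust
// Encode `value` as a CBOR unsigned integer in its 16-bit form:
// initial byte 0x19 (major type 0, additional info 25), then the value
// in big-endian order. Always uses the 16-bit form, even for small values.
fn encode_cbor_u16(value: u16) -> [u8; 3] {
    let [hi, lo] = value.to_be_bytes();
    [0x19, hi, lo]
}

// Decode the same form; returns None for any other initial byte.
fn decode_cbor_u16(bytes: [u8; 3]) -> Option<u16> {
    match bytes {
        [0x19, hi, lo] => Some(u16::from_be_bytes([hi, lo])),
        _ => None,
    }
}

fn main() {
    let encoded = encode_cbor_u16(1000);
    println!("{:?}", encoded);
    println!("{:?}", decode_cbor_u16(encoded));
}
```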

Gibbon1 wrote at 2021-11-30 08:31:59:

LoRaWAN's protocol, which is even more recent than that, also uses big endian.

Which is a serious head desk. It's seriously nothing but a pain in the ass.

ncmncm wrote at 2021-11-29 11:15:45:

The detour through "struct BigEndian" etc. is an artifact of a built-in limitation on expressiveness in Rust that I gather is soon to be remedied, at least in part, once proper support ("const generics", a weird name; in C++ that is "non-type template parameters") is considered stable enough to commit to.