💾 Archived View for dioskouroi.xyz › thread › 29359828 captured on 2021-11-30 at 20:18:30. Gemini links have been rewritten to link to archived content
-=-=-=-=-=-=-
________________________________________________________________________________
Somehow this is connected to
https://news.ycombinator.com/item?id=29367649
I had never thought of this until that thread, but our writing of numbers comes from Arabic, and Arabic is written from right to left. And from our western perspective numbers in Arabic being written left to right is backwards. But in fact, it's us who are backwards: Arabic is written from right to left, and numbers too... from the least significant to the most significant. But we took the numbers from Arabic and kept their left-to-right order and now write them from MSD to LSD...
> But we took the numbers from Arabic and kept their left-to-right order and now write them from MSD to LSD...
Roman numerals were already “big endian”, though.
Most significant digit first probably makes the most sense because you can immediately skip past the number after you identify the magnitude (the number of digits can be seen in the periphery of your vision without being read).
My understanding is that in Arabic, numbers were also left-to-right, which facilitated a left-aligned left column with values and a right-aligned right column with the descriptions. Thus both columns were aligned (for easy scanning and summing) and the extra space was in the middle of the page.
"but our writing of numbers comes from Arabic"
Not India?
About this section:
Unfortunately, as the ??? shows, the different output arrays have different lengths – a u16 would be 2 bytes and a u64 8 bytes – and so the Rust trait system at the time of this writing is (to my knowledge) not powerful enough to represent this trait as is.
It seems to me that this is possible using const generics.
If you define your EndianBytes like this:
trait EndianBytes<const N: usize> {
    fn be_bytes(self) -> [u8; N];
    fn le_bytes(self) -> [u8; N];
}
Then provide a concrete length with each impl:
impl EndianBytes<4> for u32 {
    fn be_bytes(self) -> [u8; 4] { self.to_be_bytes() }
    fn le_bytes(self) -> [u8; 4] { self.to_le_bytes() }
}
(I've renamed the methods to avoid conflicting with u32::to_be_bytes, for readability)
Then that allows you to use the EndianBytes type as long as you remain generic over both the value type and its length:
fn write<T: EndianBytes<N>, const N: usize>(val: T) {
    println!("{:?}", val.be_bytes());
}

fn main() {
    write(0x1234u16);
    write(0x12345678u32);
}
This code (plus an impl of EndianBytes<2> for u16) works for me on 1.51 stable.
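For reference, here are the snippets above assembled into one self-contained program, including the u16 impl mentioned in passing (compiles on stable Rust 1.51+, where const generics landed):

```rust
// Trait generic over the output array length via a const parameter.
trait EndianBytes<const N: usize> {
    fn be_bytes(self) -> [u8; N];
    fn le_bytes(self) -> [u8; N];
}

// Each impl pins down the concrete length: 2 bytes for u16, 4 for u32.
impl EndianBytes<2> for u16 {
    fn be_bytes(self) -> [u8; 2] { self.to_be_bytes() }
    fn le_bytes(self) -> [u8; 2] { self.to_le_bytes() }
}

impl EndianBytes<4> for u32 {
    fn be_bytes(self) -> [u8; 4] { self.to_be_bytes() }
    fn le_bytes(self) -> [u8; 4] { self.to_le_bytes() }
}

// Callers stay generic over both the value type and its byte length.
fn write<T: EndianBytes<N>, const N: usize>(val: T) {
    println!("{:?}", val.be_bytes());
}

fn main() {
    write(0x1234u16);     // prints [18, 52]
    write(0x12345678u32); // prints [18, 52, 86, 120]
}
```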
EDIT: I thought about doing this with an associated constant, ie:
trait EndianBytes {
    const LENGTH: usize;
    fn be_bytes(self) -> [u8; Self::LENGTH];
    fn le_bytes(self) -> [u8; Self::LENGTH];
}
However it turns out that this kind of usage is not quite supported yet:
https://github.com/rust-lang/rust/issues/60551
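Until that issue is resolved, one workaround that does compile on stable is to move the array into an associated type instead of an associated const — a sketch of my own, not from the article:

```rust
// The concrete byte array lives in an associated type, so nothing needs
// to appear in an array-length position generically.
trait EndianBytes {
    type Bytes: AsRef<[u8]>;
    fn be_bytes(self) -> Self::Bytes;
}

impl EndianBytes for u16 {
    type Bytes = [u8; 2];
    fn be_bytes(self) -> Self::Bytes { self.to_be_bytes() }
}

impl EndianBytes for u32 {
    type Bytes = [u8; 4];
    fn be_bytes(self) -> Self::Bytes { self.to_be_bytes() }
}

fn main() {
    // Generic callers go through AsRef<[u8]> to reach the bytes.
    assert_eq!(0x1234u16.be_bytes().as_ref(), &[0x12, 0x34]);
    assert_eq!(0x12345678u32.be_bytes().as_ref(), &[0x12, 0x34, 0x56, 0x78]);
}
```

The trade-off is that generic code sees only a slice, not a fixed-size array.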
While I know the performance cost is relatively minimal, it's interesting to think about the collective cycles wasted byte swapping back/forth to big endian for the network...now that little-endian machines dominate the landscape.
Not just instruction cycles, but how many human cycles are wasted, too? The year is 2021 and I'm dealing with yet another random bug somewhere because two sides of a serialization don't agree on an order...
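That bug class is easy to reproduce: the same four bytes decode to two different values depending on which byte order each side assumed. A minimal illustration, not from the thread:

```rust
fn main() {
    // One side writes these four bytes; the sides disagree on the order.
    let bytes = [0x12u8, 0x34, 0x56, 0x78];
    let as_big = u32::from_be_bytes(bytes);    // 0x12345678
    let as_little = u32::from_le_bytes(bytes); // 0x78563412
    assert_eq!(as_big, 0x1234_5678);
    assert_eq!(as_little, 0x7856_3412);
    assert_ne!(as_big, as_little); // the mismatch is the bug
}
```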
At this point I would be surprised if there was any performance or energy cost at all. Every CPU has an endian bit-flipper now.
x86 at least has movbe instructions these days for byte-swapping loads/stores, so it's not the worst thing in the world.
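In Rust that machinery is reached through swap_bytes / to_be, which compilers typically lower to a single bswap or movbe on x86 — the codegen is the parent's point, not something the language guarantees. A small sketch:

```rust
fn main() {
    let x: u32 = 0x1234_5678;
    // Pure byte-order reversal; a candidate for one bswap/movbe.
    assert_eq!(x.swap_bytes(), 0x7856_3412);
    // On a little-endian machine, to_be() is exactly swap_bytes();
    // round-tripping through from_be recovers the original value.
    assert_eq!(u32::from_be(x.to_be()), x);
}
```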
I wrote a library to endian-swap last century, back when our swap-in test machine was x86 and our machines were PA-RISC or SPARC. That, combined with some serial-port wiring fun, made an otherwise great project 10% sucky.
We did it in C and it was pretty fast.
Yeah there are still people who think network protocols have to use "network byte order".
Most recent weird case I discovered is CBOR, which is reasonably well designed (it's hard to get binary JSON very wrong to be fair), but the designers inexplicably decided to use big endian. In 2013.
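Concretely, CBOR carries multi-byte integer payloads big-endian: the header byte 0x19 (major type 0, additional info 25) announces a big-endian u16 to follow. A minimal hand-decoding sketch, not a real CBOR library:

```rust
fn main() {
    // CBOR encoding of the unsigned integer 0x1234 (4660):
    // 0x19 = major type 0 (uint), additional info 25 (big-endian u16 follows).
    let encoded = [0x19u8, 0x12, 0x34];
    assert_eq!(encoded[0] >> 5, 0);    // major type 0: unsigned integer
    assert_eq!(encoded[0] & 0x1f, 25); // additional info 25: 2-byte payload
    let value = u16::from_be_bytes([encoded[1], encoded[2]]);
    assert_eq!(value, 4660);
}
```

A little-endian decoder on a little-endian machine could otherwise have been a plain memcpy.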
Lorawan's protocol which is even more recent than that also uses big endian.
Which is a serious head desk. It's seriously nothing but a pain in the ass.
The detour through "struct BigEndian" etc. is an artifact of a built-in limitation on expressiveness in Rust that I gather will soon be remedied, at least in part, once proper support for "const generics" (a weird name; in C++ these are "non-type template parameters") is considered stable enough to commit to.