
Big-Endian “is effectively dead” - userbinator
http://thread.gmane.org/gmane.linux.kernel/1930358/focus=1937733
======
weland
Linkbait much? Further down, Torvalds explains precisely why BE is
anything _but_ dead. All that's dead is its use in general-purpose CPUs,
mostly because x86 happened.

[http://geekandpoke.typepad.com/geekandpoke/2011/09/simply-
ex...](http://geekandpoke.typepad.com/geekandpoke/2011/09/simply-
explained-1.html)

(Edit: which is precisely why I prefer BE for anything, unless there is a
pressing - usually hardware- and performance-related - requirement for the
opposite. Computers are good at reading bytes in swapped order. I suck at it.)

~~~
justincormack
Well, and IBM switched Power to little endian recently (the machines are dual
endian, but all the new OSes are little endian only). Power was the main BE
architecture left. Most ARM machines are dual endian, and NetBSD supports
both, but almost everyone uses little endian on ARM. There is still a bit of
big-endian MIPS around, e.g. the Cavium network appliances.

~~~
bandrami
IBM also for years screamed at devs that you can't rely on a PPC system being
BE. Which means all oldworld MacOS software was at least theoretically written
by people who couldn't tell you what byte order they were writing for. (How
seriously the devs took that admonition is a different question.)

~~~
duskwuff
The PowerPC _architecture_ was bi-endian, but classic Mac OS was very
definitely big-endian. The only situation where the CPU would switch to
little-endian would be if it was running Virtual PC. (And even then, it'd
switch back for other applications.)

~~~
bandrami
Sorry, yeah, I stated that badly:

 _Apple_ kept screaming at devs that _IBM_ couldn't be guaranteed to keep the
arch bi-endian, so _Apple_ couldn't guarantee a big-endian platform and
developers shouldn't assume one.

------
bandrami
ARM still lets you switch endianness, but nobody other than me ever seems to
(and they claim they will probably deprecate that going forward).

What is really alarming to me is that I occasionally run into middle-endian
systems on 64-bit chips (two little-endian 32-bit halves stored in big-endian
relative order to signify a single 64-bit quantity). This is an abomination
and must be killed with fire.

~~~
w8rbt
They do. They are bi-endian and default to little on most systems I have seen.
The only real big-endian system I have left is an old Sun Netra.

~~~
tritium
And it's funny you should mention Sun Microsystems, because this is the
influence behind Java's endianness, and subsequently Dalvik (mentioned below
in another comment), which happens to be... you guessed it... big endian.

So the big endian gene lives on in the JVM and its relatives.

~~~
gradstudent
Java is seriously nasty if you need to work with bitpacked representations of
data structures. Big endian nonsense, no unsigned types, booleans that cannot
be converted to integer types, no typedefs... gahhhh! It's really freaking
hard to write optimised code in this stupid language.

~~~
Alupis
> no unsigned types

Java 8 has unsigned types[1]
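
A minimal sketch of what that support looks like in practice (the static
helpers below were added to java.lang.Integer in Java 8; the int still
carries the same 32 bits, the methods just reinterpret them as unsigned):

    int x = 0xFFFFFFFE;                           // -2 when read as signed
    long asUnsigned = Integer.toUnsignedLong(x);  // 4294967294
    int quotient = Integer.divideUnsigned(x, 3);  // unsigned division
    int cmp = Integer.compareUnsigned(x, 1);      // > 0: 4294967294 > 1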

> boolean that cannot be converted into integer types

Don't know why you would want to do that when a boolean is a primitive type...
and also has an object variant, but...

int myInt = (myBoolean) ? 1 : 0;

> no typedefs

A typedef is really like a java bean or class object... it's just a custom
data structure.

> It's really freaking hard to write optimised code in this stupid language.

Not true, some of the most highly performant systems on the planet run Java...
HFT, stock exchanges, banking, nuclear plant control systems, etc...

There's also the JVM with its optimizing compiler... one of the best (the
best?) optimizing compilers around. Long-running Java applications eventually
compile hot paths down to native machine instructions, achieving C performance
without a lot of the hassle.

But then again, Java wasn't intended to do bit-twiddling, it's a higher
abstraction.

[1]
[https://docs.oracle.com/javase/8/docs/api/java/lang/Integer....](https://docs.oracle.com/javase/8/docs/api/java/lang/Integer.html#toUnsignedLong-int-)

~~~
TheLoneWolfling
> Java 8 has unsigned types

Right. Now try that with a long, and you're replacing every piece of code with
something that's, what, 20x more verbose? If not more? And because of how bad
the JVM is, substantially slower.

Look at JGit.
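
To make the verbosity concrete, here is a rough sketch (not taken from JGit)
of what treating two longs as unsigned 64-bit values looks like with the
Java 8 helpers, next to what the same thing is in C:

    // Every comparison and division goes through a static method call
    // instead of a plain operator.
    long a = 0xFFFFFFFFFFFFFFFEL;                          // unsigned 2^64 - 2
    long b = 3L;
    boolean aLessThanB = Long.compareUnsigned(a, b) < 0;   // C: a < b
    long quotient = Long.divideUnsigned(a, b);             // C: a / b
    long remainder = Long.remainderUnsigned(a, b);         // C: a % b
    System.out.println(Long.toUnsignedString(quotient));   // C: printf("%llu", q)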

> int myInt = (myBoolean) ? 1 : 0;

Now try to do that throughout the code. Not to mention that it's a
conditional branch for what should be (and is in bytecode) a no-op. (Well,
assuming your bytecode is well-conditioned. But the JVM being the JVM, there's
no such thing as a boolean, which means you can have the "boolean" value 2,
for instance, which seriously messes all sorts of things up.)

Sure, the JVM is supposed to optimize that out. But you can't rely on it being
able to do so. At least not if you're not going to use a mainstream JVM.

And no, the JVM is not "one of the best" optimizing compilers around. It's not
even a _good_ optimizing compiler. Nowhere near. For a quick counterexample,
this:
[https://github.com/RS485/LogisticsPipes/commit/bb8a57665c4f8...](https://github.com/RS485/LogisticsPipes/commit/bb8a57665c4f80a76fc1911d77ae39153783fdc3)

Halving the time taken. Why? Because the JVM wasn't smart enough to realize
that copying an EnumSet for a readonly foreach loop was unnecessary. Oh, and
there's more low-hanging fruit there w.r.t. the amount of work (read:
reflection) EnumSet has to do behind the scenes to work around Java's type
system. Simple. But no, the Java compiler doesn't optimize it out, and the JVM
doesn't either.

And that's _with_ adding an additional unnecessary layer of abstraction
(unmodifiableSet) that you wouldn't need if Java had a sane type system.

Copying an EnumSet of a small number of elements would be, in a sane language,
quite literally just a register move. Ditto, containsAll a bitwise-not and a
bitwise-and. But no, "one of the best" optimizing compilers around cannot even
do that.
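
For reference, here is a hedged sketch of the representation being described
(the class and field names are made up for illustration; this is not how
java.util.RegularEnumSet is actually exposed): a small enum set fits in one
machine word, so a copy is a plain assignment and containsAll is exactly a
bitwise-not plus a bitwise-and.

    // One long holds up to 64 enum constants: bit i set <=> constant i present.
    final class TinyEnumSet {
        final long bits;
        TinyEnumSet(long bits) { this.bits = bits; }

        // "Copying" is just passing the word around: effectively a register move.
        TinyEnumSet copy() { return new TinyEnumSet(bits); }

        // containsAll(other): no bit of `other` may be missing from `this`.
        boolean containsAll(TinyEnumSet other) {
            return (~bits & other.bits) == 0;
        }
    }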

Trying to write high-performance code in Java generally requires tossing out
all the supposed advantages of Java. Write your own object pools, whee!
Hardcode your own primitive types, whee! Avoid temporary objects, whee! Avoid
using polymorphism, whee! Manually unpack arrays inside objects, whee!

~~~
mreiland
My experience with a lot of 'Javaheads' is that they get a little _too_
emphatic about Java's optimizations, most especially the claim that it can
outperform C.

BUT

In this case, to be fair, I took his statement to include HotSpot, which can
do some pretty cool stuff provided you meet the requirements for it, i.e. be
long running, have enough memory and horsepower available on the machine to
run HotSpot, and have paths through the code that rarely jump around (meaning
most executions go through the same path).

If you can meet those requirements, my understanding is that the Java
_ecosystem_ does a damned fine job.

The issue is that a lot of javaheads will extrapolate that out to the rest of
the language and tech and start making claims about Java being the best
overall at X, or as fast as language Y (C or C++, take your pick).

~~~
TheLoneWolfling
This was supposed to be just about the single best case for HotSpot: a
single code path almost always, inside a single while loop that's doing the
same thing over and over, with a couple of sanity / bail-out checks that are
rarely (if ever) taken.

And it _still_ didn't optimize something as trivial as avoiding an unnecessary
copy that it was taking ~50% of the time doing.

It's too bad - Java is a fine language in many ways (though it tends to be
rather overly verbose for no good reason, but meh. Looking at you, getters and
setters and lack of operator overloading), but it's saddled with a reliance on
the arcane to actually get non-hideous performance out of it. I mean: 13ms per
copy of what should be a single integer? (Milliseconds! I'm not joking. 595ms
inside 47 calls to EnumSet.copyOf, mainly inside Object.clone.)

That's, and I'm saying this quite literally, more than a _million_ times
slower than what it should be.

Assuming it needs to be done at all, and you can trivially show that it
doesn't.

(That being said, I need to explicitly check that HotSpot does do its full
optimization pass on that chunk of code. I see no reason why it wouldn't, but
maybe HotSpot doesn't want to. Though that'd be a WTF in and of itself.)

(On a side note: does Java cache HotSpot optimizations? I think it does, in
which case there's definitely no excuse. And if it doesn't, that's a WTF in
and of itself.)

(On another side note: is there a Java bytecode-to-bytecode optimizer that'll
do optimizations based on the code you've got in front of you _now_?)

~~~
mreiland
I honestly don't know too much about Java and its technologies outside of a
general understanding of it. I specifically chose to stay out of the Java
ecosystem years ago because I disliked the Java community as a whole. They had
a real beef with C and C++ being more performant and constantly pushed and
railed against both C and C++ to the point of being what I considered
completely divorced from reality.

Java as a tech is strong, but Java as a community was full of pretentious
assholes who had a complex about performance (in my opinion of course).

I have no doubt your example was most likely due to some technical issue
preventing HotSpot from doing what it should have. When HotSpot can do its
work it's amazing, you just have to enable it, and you're right about doing
arcane things to get performance. That's true in any GC'd language though,
even .NET has its boogeymen.

~~~
TheLoneWolfling
This isn't anything to do with GC.

This is purely an optimization issue, and one that can be done regardless of
if a language is GC'd or not.

~~~
mreiland
You're speaking of this specific issue, I'm speaking in general.

------
tempodox
I'm not much into low-level CPU architectures, so I have to ask: Is there a
technical reason (performance or whatever) why LE would be objectively better
than BE?

I do read hex dumps & disassemblies, and there LE is a huge pain in the
backend. Everything is in reverse. Why would a sane programmer voluntarily use
LE?

God Save the Network Byte Order!

~~~
andrewla
I think the main advantage I see in LE architectures is that integral types
don't have to relocate. For example, the int32 value 30 is represented in
memory as "1E 00 00 00". The int16 value 30 is represented in memory as "1E
00". So casting downwards means that the pointers are the same, and you just
ignore the trailing 0s. Contrast this to BE architectures, where you have "00
00 00 1E" vs "00 1E" -- casting from the larger to the smaller type means
that you have to move the pointer.
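
Java hides raw pointers, but the same layout effect is visible through a
ByteBuffer (a small sketch, not the example above):

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    public class NarrowingDemo {
        public static void main(String[] args) {
            ByteBuffer le = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
            le.putInt(0, 30);                    // bytes: 1E 00 00 00
            System.out.println(le.getShort(0));  // 30 - narrower read, same offset

            ByteBuffer be = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN);
            be.putInt(0, 30);                    // bytes: 00 00 00 1E
            System.out.println(be.getShort(0));  // 0  - must move to offset 2
            System.out.println(be.getShort(2));  // 30
        }
    }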

~~~
cesarb
Little-endian also feels more natural for big integers: the byte at offset x
has weight 256^x.

Contrast with a big-endian representation, where the byte at offset x would
have weight 256^(length - 1 - x).
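
As a quick sketch of that, accumulating a big integer from little-endian
bytes is a direct sum over offsets (BigInteger used here only to keep the
arithmetic exact):

    import java.math.BigInteger;

    public class LittleEndianBignum {
        // value = sum over x of bytes[x] * 256^x: the byte at offset x has
        // weight 256^x, with no (length - 1 - x) index arithmetic needed.
        static BigInteger fromLittleEndianBytes(byte[] bytes) {
            BigInteger value = BigInteger.ZERO;
            for (int x = 0; x < bytes.length; x++) {
                value = value.add(
                        BigInteger.valueOf(bytes[x] & 0xFF).shiftLeft(8 * x));
            }
            return value;
        }

        public static void main(String[] args) {
            // 0x0201 == 513, stored little-endian as {0x01, 0x02}
            System.out.println(fromLittleEndianBytes(new byte[] {0x01, 0x02}));
        }
    }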

~~~
wyager
>Little-endian also feels more natural for big integers: the byte at offset x
has weight 256^x.

This is literally the opposite of how most programmers in the world write
numbers naturally.

See "1234". The digit at offset x _from the right_ has weight 10^x. That
corresponds to big-endian.

Considering that the only important difference between little and big endian
is when people have to read or write it by hand, we should probably model it
after common human representation...

~~~
hamstergene
By the way, this "natural" way of writing numbers comes unchanged from the
right-to-left writing system of Arabic. They naturally read 1234 starting from
the least significant digit (here, '4').

Think about the irony :)

~~~
KMag
I'm not sure about Arabic, but my understanding of Hebrew (another member of
the Semitic language family) is that they still pronounce the numbers left to
right, and if a line break forces digits to be split across lines, the most
significant digits are put on the top line.

~~~
hamstergene
You got me curious. I googled and found[1] that indeed they only pronounce
numbers 21-99 that way, e.g. for 31 the order of words is "one thirty", but
higher order components go left to right, e.g. for 25031 the order of words is
"five twenty thousand one thirty".

1. [http://arabic.tripod.com/VocabNumbers.htm](http://arabic.tripod.com/VocabNumbers.htm)

------
DonHopkins
I once heard somebody jokingly refer to MIPS as SPIM when booted in "other-
endian" mode. If SPARC could do that, then it would be CRAPS.

~~~
SeanLuke
An odd joke. SPIM is a well-known MIPS emulator, used in a great many
university compiler classes.

[http://spimsimulator.sourceforge.net/](http://spimsimulator.sourceforge.net/)

------
JosephRedfern
Going against what Linus suggests, the Dalvik DEX format (used by Android) has
a flag representing the endianness of the enclosed bytecode:
[https://source.android.com/devices/tech/dalvik/dex-format.ht...](https://source.android.com/devices/tech/dalvik/dex-format.html)
(see "ENDIAN_CONSTANT and REVERSE_ENDIAN_CONSTANT"). By default, it's Little
Endian.

I've always wondered WHY this flag is present - why would an implementation
wish to change the byte order? I could understand if the file contained
machine code, which would vary with different architectures - but given it's
running in a VM, why bother?

~~~
davtbaum
Even though the bytecode is being executed on a VM, the VM itself has been
ported to the architecture, which has a native endianness. Byte order in the
dex that matches the host OS's will likely result in a performance gain as the
VM can cast more efficiently.
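
As a rough sketch of that idea (not Dalvik's actual decoding code), a reader
that honors the order declared in the file does a straight load when that
order equals the host's native order, and pays for a byte swap otherwise:

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    public class DexOrderSketch {
        // Hypothetical helper: decode a 32-bit word using the order declared
        // in the file header. When fileOrder == ByteOrder.nativeOrder(), the
        // read needs no swap; otherwise every multi-byte read is swapped.
        static int readWord(byte[] fileBytes, int offset, ByteOrder fileOrder) {
            return ByteBuffer.wrap(fileBytes).order(fileOrder).getInt(offset);
        }

        public static void main(String[] args) {
            byte[] littleEndianFile = {0x78, 0x56, 0x34, 0x12};
            System.out.println(Integer.toHexString(
                    readWord(littleEndianFile, 0, ByteOrder.LITTLE_ENDIAN)));
            // prints 12345678
        }
    }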

~~~
JosephRedfern
That's true - do you suppose that this flag might be used if an app developer
was targeting a specific CPU architecture, then?

I guess it could be argued that if you're that concerned about performance and
know your target architecture, it might be worth going native rather than
running on a VM.

~~~
yincrash
This is just a guess, but I think the odex pass on Dalvik switches the
endianness to the architecture's native order.

------
hsivonen
The Web is now little-endian thanks to ArrayBufferViews exposing endianness
and everyone testing only the little-endian case. It doesn't make sense to
make big-endian hardware that needs to run a Web browser anymore, so chances
are that big-endian will never come back.

~~~
dtech
I have a hard time believing that the whole web uses ArrayBufferViews.

------
krazydad
From Wikipedia: [Gulliver's Travels] describes an intra-Lilliputian quarrel
over the practice of breaking eggs. Traditionally, Lilliputians broke boiled
eggs on the larger end; a few generations ago, an Emperor of Lilliput, the
Present Emperor's great-grandfather, had decreed that all eggs be broken on
the smaller end after his son cut himself breaking the egg on the larger end.
The differences between Big-Endians (those who broke their eggs at the larger
end) and Little-Endians had given rise to "six rebellions... wherein one
Emperor lost his life, and another his crown". The Lilliputian religion says
an egg should be broken on the convenient end, which is now interpreted by the
Lilliputians as the smaller end. The Big-Endians gained favour in Blefuscu.

[http://en.wikipedia.org/wiki/Lilliput_and_Blefuscu](http://en.wikipedia.org/wiki/Lilliput_and_Blefuscu)

------
restalis
This got me thinking whether, in general (outside computing), it was a good
choice to represent numbers the way we do. What if we had 1234 read as "four
thousand, three hundred and twenty one"? Or, even more different from how we
pronounce it today, reading the number left to right and spelling it as "one
and twenty and three hundred and four thousand"! This way the last digit
lingering in your mind would be the one representing the most significant part
of the given number (unlike the current practice of hearing it first and then
tuning out for the less interesting smaller parts). Would it have been
practical?

~~~
jerf
If you take into account the fact that you can obtain the order-of-magnitude
for most "human" numbers without actually focusing on them, then putting the
most-significant digit first in the stream makes sense. 30 is easy to see,
even if you look at the 3 you can at least tell there's two digits. 12,048 is
pretty easy. Even 8,293,254 isn't that hard to order-of-magnitude from the
near-fovea while you're focused on the 8. By the time you get into the larger
numbers that are hard, well, they're hard under either order so it's a wash.

~~~
restalis
At reading, one may indeed be hit at first by the most significant digit ('8'
in your last example), which is somewhat useful, but the reader can't easily
figure out that digit's order of magnitude without reading the entire number.
There's actually more. The reader can't spell the number right away, as it
has to be read entirely and compiled in mind first. This compilation involves
counting all the digits in the given number to determine its size, then
starting again, interpreting and spelling each digit and its designation in
order. This double pass doesn't appear that efficient to me. On the other
hand, if the number from your last example were written backwards (452'392'8),
its reading and interpretation would come efficiently in one pass ("four,
fifty and two hundred, three, ninety and two hundred thousand, and eight
million"). Even when all we're interested in is the most significant digit
and its order of magnitude, that would still be faster to determine (i.e.
again, in one pass).

~~~
jerf
"but the reader can't easily figure out digit's order of magnitude without
reading the entire number."

Yes, they can; that was my entire point. Human-sized numbers fit within the
eye's fovea. Numbers that don't fit easily within the fovea require effort
under either scheme. Your model of having to go digit-by-digit is a computer's
scanning model, not a human perception model; we do not scan that way, we scan
in fovea-sized chunks, which are smaller than people may think (because the
brain is _very_ good at interpolating before the information hits the
conscious mind) but still large enough to fit "millions" in quite comfortably.

------
antirez
I would not follow Linus' advice here. Stick with a given endianness for a
protocol? Sure! But why use BE if everything is LE? If you use LE, at compile
time you can turn the conversion calls into no-ops: zero cost. The cost will
be incurred only on BE systems, which are less common every day. So my
suggestion is to stick to a specific endianness in protocols and serialization
formats, but to pick LE instead of BE. Macros to convert the value that do
nothing if the host is already LE are trivial to write.
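
A Java sketch of the same idea (in C these would be macros or static inline
functions): the helper's branch is a host constant, so on a little-endian
machine the JIT can reduce it to nothing.

    import java.nio.ByteOrder;

    public final class LittleEndianWire {
        private static final boolean HOST_IS_LE =
                ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN;

        // On an LE host this is the identity function and can be inlined
        // away; only BE hosts pay for the byte swap.
        public static int toWire(int hostValue) {
            return HOST_IS_LE ? hostValue : Integer.reverseBytes(hostValue);
        }

        public static int fromWire(int wireValue) {
            return HOST_IS_LE ? wireValue : Integer.reverseBytes(wireValue);
        }
    }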

~~~
toyg
That's what Linus is saying: hey, BE is basically dead and you shouldn't use
it, but _if you really really want to use it_ , that's fine as long as you
don't go for dynamic switching.

------
rwmj
When POWER8 came out (ppc64le) the writing was pretty much on the wall. Is
there any major or even minor architecture which still defaults to BE? SPARC
possibly?

~~~
sanxiyn
Yes, SPARC and zSeries. Among architectures supported by recently released
Debian, zSeries is the only architecture that defaults to BE. (SPARC support
is dropped.)

------
nudpiedo
Another subjective opinion from someone who doesn't look much out of his box.

~~~
olm
As opposed to an objective opinion?

