I managed to ask Chris Lattner this very question at WWDC (during a moment when he wasn't surrounded by adoring crowds). "So, you're signaling a new CPU architecture?" But, "No; think more along the lines of 'adding a new multiply instruction'. By the time you're in Bitcode, you're already fairly architecture-specific" says he. My hopes for a return to big-endian are dashed. [Quotes are approximate.]

That sounds about right. No radical architectural shifts, but bitcode submissions should let Apple optimize apps automatically for whatever latest tweaks are available in issue width or fused instructions.

My most radical speculation is an iPhone A-series chip with an additional low power ARM core especially to support Watch apps without burning too much "host" device battery.

Why on earth would you ever want to _return_ to big-endian?

Because big-endian matches how most humans have done it for most of history ("five hundred twenty one" is written "521" or "DXXI", not "125" or "IXXD"). Because the left-most bit in a byte is the high-order bit, so the left-most byte in a word should be the high-order byte. Because ordering two 8-character ASCII strings can be done with a single 8-byte integer compare instruction (with the obvious generalizations). Because looking for 0x12345678 in a hex dump (visually or with an automatic tool) isn't a maddening task. Because manipulating 1-bit-per-pixel image data and frame buffers (shifting left and right, particularly) doesn't lead to despair. Because that's how any right-thinking person's brain works.
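For anyone who wants to see the string-compare point concretely, here is a minimal Python sketch. The helper name `as_int` and the sample strings are made up for illustration; it only models the idea of loading 8 bytes as one integer, not any particular CPU instruction.

```python
def as_int(s: bytes, byteorder: str) -> int:
    """Interpret 8 bytes as one unsigned 64-bit integer (hypothetical helper)."""
    assert len(s) == 8
    return int.from_bytes(s, byteorder)

# Three 8-character ASCII strings, chosen so the last byte disagrees
# with the lexicographic order.
words = [b"bpple_00", b"apple_02", b"apple_01"]

lexicographic = sorted(words)
by_big_endian = sorted(words, key=lambda s: as_int(s, "big"))
by_little_endian = sorted(words, key=lambda s: as_int(s, "little"))

# Big-endian: the first character is the most significant byte, so one
# integer compare orders the strings the same way a byte-wise compare does.
print(by_big_endian == lexicographic)

# Little-endian: the LAST character is the most significant byte, so the
# integer order is driven by the tail of the string instead.
print(by_little_endian == lexicographic)
```

On a little-endian machine you would have to byte-swap both words first to get the same effect.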

The one place I've seen little-endian actually be a help is that it tends to catch "forgot to malloc strlen PLUS ONE for the terminating NUL byte" bugs that go undetected for much longer on big-endian machines. Making such an error means the NUL gets written just past the end of the malloc, which may be the first byte of the next word in the heap, which (on many implementations of malloc) holds the length of the next item in the heap, which is typically a non-huge integer. Thus, on big-endian machines, you're overwriting a zero (high order byte of a non-huge integer) with a zero, so no harm done, and the bug is masked. On little-endian machines, though, you're very likely clobbering malloc's idea of the size of the next item, and eventually it will notice that its internal data structures have been corrupted and complain. I learned this lesson after we'd been shipping crash-free FrameMaker for years on 68000 and Sparc, and then ported to the short-lived Sun386i.
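The masking effect can be simulated in a few lines of Python. This is only a sketch under stated assumptions: a 4-byte size field sitting immediately after the allocation, a chunk size of 48, and a made-up helper `clobber_first_byte`. Real allocators lay out their metadata differently.

```python
import struct

def clobber_first_byte(size: int, fmt: str) -> int:
    """Write a stray NUL over the lowest-addressed byte of a size field.

    `fmt` is a struct format: ">I" for big-endian, "<I" for little-endian.
    Returns the size the allocator would read back afterwards.
    """
    header = bytearray(struct.pack(fmt, size))
    header[0] = 0  # the off-by-one NUL lands on the first byte past the buffer
    return struct.unpack(fmt, bytes(header))[0]

next_chunk_size = 48  # assumed "non-huge" size of the next heap item

big = clobber_first_byte(next_chunk_size, ">I")
little = clobber_first_byte(next_chunk_size, "<I")

print(big)     # big-endian: the byte hit was already zero, so the size survives
print(little)  # little-endian: the low-order byte is wiped, so the size changes
```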

Humans never chose to communicate via hex dumps either. We're making some concessions to hardware concerns here at the cost of human readability anyway, so one can debate whether endianness should also be considered that way. Protocol decoders are really great.

Little endian is more natural for data structures. The least significant bit goes in the byte with the least address, and the most significant bit goes in the byte with the greatest address, so you never have to remember which way you're going, which is particularly nice when working with bitvectors and bit fields.

"Left" and "right" can go either way, depending on which kind of diagram you draw, even on big-endian machines, so those words always end up ambiguous. Stick to bit significance and address order and everything is unambiguous and naturally inclined to little endian.

I'm not sure what you mean by the bit shifting case. The 8-char ASCII string compare is a neat trick with limited applicability these days.

Well for starters, ARM has runtime-selectable endianness, so if Apple had felt any reason to do so, they would have used a big-endian ABI by now.

This article covers the practical tradeoffs of little and big-endianness well: https://fgiesen.wordpress.com/2014/10/25/little-endian-vs-bi...

The tl;dr is that little-endian was a smart performance optimization in the early microprocessor days when nearly all arithmetic was effectively bignum arithmetic (because the ALUs were only 4 or 8 bits wide), but that doesn't really matter now, so we're stuck with little-endian despite big-endian having some small developer productivity benefits.

The thing is, little-endian won pretty much everywhere outside of network protocols, so almost all of the common data formats store words in little-endian format as a performance optimization. By going big-endian, you'd be both forced to eat a byte-swapping performance hit on every load or store to these formats, and you'd break a tremendous amount of software that assumes it is running on a little-endian architecture. Dealing with those headaches would absolutely not be worth the trouble for the almost insignificant benefit of slightly easier to read hex dumps, or the slightly more useful benefit of string comparison via memcmp that could be better performed by dedicated SIMD instructions anyway.
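To make the byte-swapping cost concrete, here is a rough Python sketch. The 4-byte field and the helper `bswap32` are assumptions for illustration: a big-endian host that loads a little-endian field natively gets the bytes reversed and has to swap them back on every access.

```python
def bswap32(x: int) -> int:
    """Reverse the byte order of a 32-bit value (hypothetical helper)."""
    return int.from_bytes(x.to_bytes(4, "little"), "big")

# A 32-bit field as it would sit in a little-endian file format.
raw = bytes([0x78, 0x56, 0x34, 0x12])

intended = int.from_bytes(raw, "little")      # what the format means
native_be_load = int.from_bytes(raw, "big")   # what a big-endian load sees
fixed = bswap32(native_be_load)               # the extra step on each load

print(hex(intended), hex(native_be_load), hex(fixed))
```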

> Because big-endian matches how most humans have done it for most of history ("five hundred twenty one" is written "521" or "DXXI", not "125" or "IXXD").

Actually, it is possible that that was nothing more than an accident. We use Arabic numerals, and Arabic languages are written right-to-left. Then there are languages like German where digits are read in reverse, so "42" is read as "two-and-forty".

The German "decade flip" is restricted to the tens and unit places; otherwise the order is English-like, with larger terms leading.

The cardinal number systems for most major languages lead with larger terms (as in English). I don't think there's anything deep about this, it's probably an accident. And there are languages which lead with smaller terms, such as Malagasy (the national language of Madagascar).

The ordering of digits in Arabic is not obviously relevant, per se, since spoken English ("one hundred twenty one") matches the order of the Arabic numbers, too.

It's funny how the Germans and Dutch (rightfully) ridicule Americans for writing dates in middle-endian order like 9/11/2001, yet they say numbers with the decade flip "two and forty". That's just as ridiculous.

MM/DD/YYYY is simply a direct transliteration of spoken English, which makes it easy to read and write dates. In other languages, the spoken version is little endian or big endian, and the written version aligns accordingly. (At least for the languages I know.)

ITYM "... spoken American".

Are you saying it is commonly referred to as "The 11th of September, 2001" in England?

We would normally refer to that as September 11 because it's much more talked about in the US, and that's the phrase used there.

Any other dates will likely be in the same order as written. For instance, the rhyme for bonfire night is 'remember remember, the fifth of November'. I believe that many in the US also talk about the fourth of July, rather than July fourth, so it's not like English has the hard-and-fast rule you were proposing.

Ok, fair enough. What do you say for non-special dates like July 3rd, 2015?

'Third of July, 2015' or more likely 'third of July'. The date format really isn't lying to us in UK English.

As a British English speaker, I'd say "yes".

Technically, I'd drop the "th of" and just say/write "11 September 2001".

Interesting. Would you choose "three July twenty fifteen", "July third twenty fifteen", or "the third of July twenty fifteen" (substitute "two thousand" for "twenty" if you like)? Assume someone has asked you the date and you're responding out loud.

I would say "three July twenty fifteen".

It took me a while to figure this out because actually it's quite rare to speak a date including a year without reading it - most spontaneously-spoken dates are this year (so the year is implied) and for a read date, I'd probably say whatever was written.

The clincher was how I'd say my birth date, which would be of the form above.

I'm not claiming to be the definitive British English speaker, though! ;)

...and as another poster commented, it might depend on context - for example, "September 11" is often used in British English because it refers to an American event.

As a Brit, IMO both are perfectly acceptable in English prose. It isn't unusual to say "October the 4th" as opposed to "the 4th of October".

Americans read and write dates in middle endian. Germans and Dutch only read numbers with the decade flip; they are written as usual. Furthermore, with "two and forty" there can be no confusion since it's not "two and four", so it's clear that "forty" refers to the tens position and "two" to the units position. It is of course not ideal, but not nearly as much cause of confusion as the middle endian dates, because there's simply no way to know what 9/11/2001 means.

At least they (we) stay consistent between 13 and 99, while a certain other language elects to switch the flip at 20.

I think we should all take a moment to admire the francophone Swiss for boldly dropping much of the madness that is French counting. (Yes, I am looking at you, quatre-vingt-dix-neuf!)

> The ordering of digits in Arabic is not obviously relevant, per se, since spoken English ("one hundred twenty one") matches the order of the Arabic numbers, too.

I think it is relevant. It is possible that Western mathematics copied the Arabic notation (with right-to-left numbers), without also copying the correct way to read it (also right-to-left). For a similar situation in language, think of accents and the many different ways you can pronounce the same word.

Unlikely. Cardinal numbers in Old English, long before the slightest chance of contact with Arabs, were virtually identical to the modern system with respect to order of terms.

"For a similar situation in language, think of accents and the many different ways you can pronounce the same word."

Could you be more specific as to what you mean?

I mean the writing is identical, but the pronunciation varies wildly.

I think the way we do it is more natural. Numbers which have an infinite decimal expansion towards the right side of the decimal point are relatively much more common and useful compared to numbers which have an infinite decimal expansion towards the left.

For example, you can write out e as 2.7182...

However, if we were to flip this notation, ...2817.2, it isn't clear where to begin writing the number if we read (and write) from left to right. With the regular representation, you write out the 'major' parts of the number first and then give as many details as you want. You have the beginning of your string in mind. With a reversed system, you have the end of the string in mind, not the beginning.

See https://en.wikipedia.org/wiki/P-adic_number for an overview of the system of numbers that actually works this way, similar to https://en.wikipedia.org/wiki/Two's_complement with infinitely long registers.
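The two's complement analogy can be checked with a short Python sketch (the helper names `all_ones` and `to_signed` are made up for illustration): a register full of 1 bits behaves like -1 at any width, which is the finite version of a number whose digits extend forever to the left.

```python
def all_ones(width: int) -> int:
    """The bit pattern 111...1 in a register of the given width."""
    return (1 << width) - 1

def to_signed(x: int, width: int) -> int:
    """Read an unsigned bit pattern as a two's complement signed value."""
    return x - (1 << width) if x >> (width - 1) else x

widths = (8, 16, 64, 256)

# Adding 1 to 111...1 carries off the left edge and leaves 0, so the
# pattern acts like -1 no matter how wide the register is.
wraps_to_zero = [(all_ones(w) + 1) % (1 << w) == 0 for w in widths]
signed_values = [to_signed(all_ones(w), w) for w in widths]
print(wraps_to_zero, signed_values)
```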

All you are pointing out is that mixing little-endian and big-endian may cause trouble. You're not saying anything about which of the two is better.

Anyway, it doesn't matter what you think is 'more natural.' Computing in binary probably feels less natural to you, but nobody is going to stop making binary computers because of that.

Do they also call 1042 "Forty two and one thousand"?

After looking up some German for beginners (German speakers, feel free to correct me), I found out that 1042 is read like "one thousand two and forty".

German speaker here, that's correct. When the number is between 1000-1999 the "one" in "one thousand" is sometimes omitted, so "thousand two and forty".

Also known as middle endian.

Even if it's weird, it's nothing like French numerals.

Isn't the only weirdness in French 70, 80 and 90? That's nothing compared to German :)

Georgian has a fun number system too - numbers between 20-90 are expressed as a multiple of 20 + a number between 1 and 19: http://blog.conjugate.cz/georgian-is-fun

It's weird, it's not consistent! :D

The German thing is just weird until you get used to it; with French I constantly go, wait, crap, this is over 70, what's the deal again. I blame the wine consumption.

Can't be worse than Dutch counting!

Yeah, but the way individual languages do numbers doesn't necessarily make sense. In French, 99 is read as "four-twenties and ten-plus-nine".

Little endian is easier to deal with at the machine level as you don't need to adjust pointer offsets when referencing a smaller sized variable. A pointer to an 8-, 16-, 32-, 64-, or 128-bit quantity will be the same address. With big endian you need to adjust the pointer up or down accordingly.
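A minimal Python sketch of the pointer-offset point, assuming a 64-bit value in an 8-byte buffer and hypothetical helpers `narrow_offset` and `read_narrow`: to read the low-order part of the value as a narrower integer, the little-endian offset is always 0, while the big-endian offset depends on both widths.

```python
def narrow_offset(total_bytes: int, narrow_bytes: int, byteorder: str) -> int:
    """Offset of the low-order `narrow_bytes` inside a wider stored value."""
    if byteorder == "little":
        return 0  # low-order bytes come first, whatever the width
    return total_bytes - narrow_bytes  # big-endian: low-order bytes are last

def read_narrow(buf: bytes, narrow_bytes: int, byteorder: str) -> int:
    """Read the low-order part of the stored value as a narrower integer."""
    off = narrow_offset(len(buf), narrow_bytes, byteorder)
    return int.from_bytes(buf[off:off + narrow_bytes], byteorder)

value = 0x1122334455667788
le = value.to_bytes(8, "little")
be = value.to_bytes(8, "big")

for n in (1, 2, 4, 8):
    mask = (1 << (8 * n)) - 1
    print(n,
          narrow_offset(8, n, "little"), hex(read_narrow(le, n, "little")),
          narrow_offset(8, n, "big"), hex(read_narrow(be, n, "big")),
          hex(value & mask))
```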

I always thought it was just one of those arbitrary choices made by companies with massive headaches resulting. I never thought to ask anyone if there was a good reason for one or the other. You've made quite the case for big-endian. I'm saving your comment (with attribution) for the next time this comes up in discussion. ;)

"Framemaker: it's riddled with features!"

>why on earth

there's your problem, you're living on earth. try living in the cloud. :) (network byte order)

Because BSD vs SysV just doesn't have that frisson of '80s nerdwar any longer?

Remember the Tie Fighter -vs- Death Star poster swag from Usenix?

4.x > V ∀ x from 0..∞

To think different

Exactly. Clang makes platform-specific lowerings and bakes in some ABI-specific gunk. Granted not so much as the object code on the other side of the back-end. Enabling target-specific vectorization and whole-program optimization are probably the goal.
