If there is no carry, then I don't understand. Four bytes will have the same value deducted, regardless of the byte order. If there is no carry implication (all are digits) then there is no issue.
ARM has supported unaligned access since v6 (introduced in 2001); if you're on Linux, unaligned accesses will be fixed up by the kernel (as was the case prior to ARMv6 and even for MIPS, afaik).
Anyway, the point of his post was about possible gains from removing validation, not about being portable or production code.
The subtraction isn't the issue, the cast is. The string "2016" is represented by the byte sequence [0x32, 0x30, 0x31, 0x36]. Casting this array to a uint32* and dereferencing it gives you the integer 0x32303136 (or 842019126) in big endian, while in little endian it gives you the integer 0x36313032 (or 909193266).
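To make the cast concrete, here is a minimal C sketch. The function names (`load_native`, `load_be`, `load_le`) are made up for illustration, and `memcpy` stands in for the pointer cast so the example doesn't depend on alignment or aliasing rules; the value it produces is what the cast would read on a typical platform.

```c
#include <stdint.h>
#include <string.h>

/* Reads 4 bytes as a native-order uint32_t, like the uint32* cast would,
   but through memcpy. The result depends on the host's byte order. */
uint32_t load_native(const char *s) {
    uint32_t v;
    memcpy(&v, s, sizeof v);
    return v;
}

/* Interprets the 4 bytes as big endian, whatever the host is:
   the first byte in memory becomes the most significant byte. */
uint32_t load_be(const char *s) {
    const unsigned char *p = (const unsigned char *)s;
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Interprets the 4 bytes as little endian, whatever the host is:
   the first byte in memory becomes the least significant byte. */
uint32_t load_le(const char *s) {
    const unsigned char *p = (const unsigned char *)s;
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

On a big-endian or little-endian host, `load_native("2016")` should match `load_be("2016")` or `load_le("2016")` respectively.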
When you show the string in memory order, they are the same. It's the operation on the string that's important, not the way you print the hex byte-order-dependent value. Both become 02 00 01 06
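A rough C sketch of that claim, assuming the input is exactly four ASCII digits (so no byte ever borrows from its neighbour). The helper name `strip_ascii` is hypothetical, and `memcpy` is used in place of a pointer cast.

```c
#include <stdint.h>
#include <string.h>

/* Subtracts '0' (0x30) from all four bytes with one 32-bit subtraction,
   then writes the result back out in memory order.
   Assumes s points at four ASCII digits; with anything else a byte could
   borrow from the next one and the result would be garbage. */
void strip_ascii(const char *s, unsigned char out[4]) {
    uint32_t w;
    memcpy(&w, s, sizeof w);       /* native-order load */
    w -= UINT32_C(0x30303030);     /* same constant on either byte order */
    memcpy(out, &w, sizeof w);     /* native-order store */
}
```

Because the constant has the same byte in every position, the load and store undo each other's byte order, and `out` ends up holding one digit value per byte in the same order as the input string on either kind of host.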
But we're trying to convert the string "2016" to the integer 2016.
We want to turn the sequence [0x32, 0x30, 0x31, 0x36] (same on both architectures) into [0x00, 0x00, 0x07, 0xe0] in big endian or [0xe0, 0x07, 0x00, 0x00] in little endian. You can't simply perform the same word-level procedure on both architectures, since it'll result in a reversed sequence on one of them...
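Here is a sketch of where the byte order actually bites, again assuming four ASCII digits. The names `parse4_bytes`, `parse4_word`, and `host_is_little_endian` are invented for this example, and it is not the code from the original post. The byte-indexed version is portable as-is; the word-based version has to know which end of the word holds the thousands digit.

```c
#include <stdint.h>
#include <string.h>

/* Portable: index the bytes in memory order, no word load at all. */
unsigned parse4_bytes(const char *s) {
    return (unsigned)(s[0] - '0') * 1000u + (unsigned)(s[1] - '0') * 100u +
           (unsigned)(s[2] - '0') * 10u   + (unsigned)(s[3] - '0');
}

/* Returns 1 on a little-endian host, 0 otherwise
   (mixed-endian hosts are ignored in this sketch). */
static int host_is_little_endian(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Word-based: one load, one subtraction, then pick the digits out of the
   word with shifts. Which shift yields the thousands digit depends on
   the host's byte order, so the weights must be swapped accordingly. */
unsigned parse4_word(const char *s) {
    uint32_t w;
    memcpy(&w, s, sizeof w);
    w -= UINT32_C(0x30303030);
    unsigned lo  = w & 0xFFu;          /* least significant byte of the word */
    unsigned ml  = (w >> 8) & 0xFFu;
    unsigned mh  = (w >> 16) & 0xFFu;
    unsigned hi  = (w >> 24) & 0xFFu;  /* most significant byte of the word */
    if (host_is_little_endian())
        return lo * 1000u + ml * 100u + mh * 10u + hi;  /* first char is lo */
    return hi * 1000u + mh * 100u + ml * 10u + lo;      /* first char is hi */
}
```

Dropping the `host_is_little_endian()` branch and using one fixed set of weights would turn "2016" into 6102 on one of the two architectures, which is the reversal described above.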