
What is the performance impact of using int64_t on 32-bit systems? - networked
http://stackoverflow.com/questions/16841382/what-is-the-performance-impact-of-using-int64-t-instead-of-int32-t-on-32-bit-sys
======
ChuckMcM
When the discussion was ongoing about what the 'native' length for an int
should be in Java, the group did some profiling on machines to compare 32 bit,
64 bit on 32 bit, 64 bit native, and 32 bit on 64 bit performance.
Generally 32 bit on native or 64 bit platforms was pretty much the same.
Emulated 64 bit (where the compiler had to generate extra code to do the math
operations) was about 0.1% slower (yes, one tenth of one percent). When we
investigated, the dominant factor in program execution turned out to be not
instructions per second but fetch time from DRAM into the cache. All of the
math operations fit in the cache and so executed at full clock rate. As the
typical CPU at the time was 1 GHz with an execution efficiency of 1.2 to 1.5
IPC, during the 110 ns where the CPU was waiting for the next bit of data from
the RAM to get into the cache it could execute 60-80 instructions. So the
"extra" instructions were essentially free for the most part. So Java got 64
bit integers regardless of the underlying processor architecture.

Footnote: I mis-remembered the system performance numbers: the SS10 was a
50 MHz machine, and the R4000 (a native 64 bit machine) was only 100 MHz. That
said, the CPU being 'starved' while it waited for the cache to fill was the
proximate explanation for why the performance differed very little.

~~~
boomzilla
Java has both int (32 bit) and long (64 bit) types. What did you refer to as
'native'?

Lucene, the popular core search library, has always used int for internal
document IDs. As a result, the number of documents in one Lucene index is
limited to about 2 billion (Java ints are signed). It's even worse if there
are many deleted docs in the index, as they are just marked as deleted and
still consume IDs. There are a number of reasons why no one is keen on moving
internal docIDs to long, and performance is one of them.

~~~
ChuckMcM
Wow, and thanks for that. You are absolutely correct that Java exported all of
the various integer sizes. Odd that I can clearly remember the debate and
misremember the final outcome. Bill Joy was pretty adamant about "fixing" the
various sizes of integers problem that C had been going through on x86 with
the whole sizeof(int) != sizeof(ptr) code, and we benchmarked it and it hardly
affected performance at all. Wow, it sucks getting old.

------
joosters
Something not mentioned there is locking. For the specific use case
(timestamps) that the OP wants, it might not be important, but for many common
use cases, like 64 bit counters, you can suddenly hit consistency problems in
programs. I think that older x86 processors couldn't atomically write 64 bit
numbers, so you might have to add locks around all the accesses. This could be
a huge performance hit.

ISTR this was a problem for Linux kernels? There used to be several kernel
counters that were stuck at 32 bits because the overhead of making them 64
bits was just too much. I think one example was network byte counters for
NICs? I might be misremembering this, though.

~~~
quincunx
The x86 instruction for this would be LOCK CMPXCHG8B, which allows atomically
writing 64 bits. (The x64 version would be LOCK CMPXCHG16B for writing 128
bits.) The instruction goes back a while; there was a bug involving it on the
Pentium:
[https://en.wikipedia.org/wiki/Pentium_F00F_bug](https://en.wikipedia.org/wiki/Pentium_F00F_bug)

~~~
qb45
Yep, but first you have to load this counter and increment it manually in some
CPU register. And you also need to repeat the whole sequence in case CMPXCHG8B
fails.

With 32 bit integers you just execute XADD and move on with your life.
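
To make the difference concrete, here is a minimal C++ sketch of the two increment styles. The function names `increment_cas` and `increment_xadd` are made up for illustration, and whether a compiler actually emits LOCK CMPXCHG8B or LOCK XADD for them depends on the target and the compiler, so treat this as a picture of the control flow rather than a statement about generated code.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: increment a 64-bit counter with an explicit
// compare-and-swap loop, mirroring the load / add / CMPXCHG8B / retry
// sequence described above. Returns the new value.
std::int64_t increment_cas(std::atomic<std::int64_t>& counter) {
    // Load the current value into a local (a register pair on 32-bit x86).
    std::int64_t expected = counter.load(std::memory_order_relaxed);
    // Try to publish expected + 1. On failure, compare_exchange_weak
    // refreshes `expected` with the value it found, and the loop retries.
    while (!counter.compare_exchange_weak(expected, expected + 1,
                                          std::memory_order_seq_cst,
                                          std::memory_order_relaxed)) {
        // Nothing to do here; `expected` already holds the fresh value.
    }
    return expected + 1;
}

// Hypothetical helper: increment a 32-bit counter with a single atomic
// fetch-and-add, the operation that maps naturally onto XADD.
std::int32_t increment_xadd(std::atomic<std::int32_t>& counter) {
    return counter.fetch_add(1) + 1;
}
```

On a given build, `std::atomic<std::int64_t>{}.is_lock_free()` reports whether the 64-bit version is done with atomic instructions or falls back to a lock.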

------
rewqfdsa
> [Using 64-bit registers in 32-bit mode] is not possible. If the system is in
> 32-bit mode, it will act as a 32-bit system, the extra 32-bits of the
> registers is completely invisible, just as it would be if the system was
> actually a "true 32-bit system".

That's not entirely true. Under Linux, with the x32 ABI (see
[https://lwn.net/Articles/456731/](https://lwn.net/Articles/456731/)) you have
access to the entire register file, but still use 32-bit pointers.

In many ways, it combines the advantages of 32- and 64-bit mode code.
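
For anyone who wants to check which of these ABIs a given build targets, here is a small sketch. It assumes a GCC-compatible compiler, which defines `__x86_64__` together with `__ILP32__` when building for x32. The helper names `abi_name` and `pointer_bits` are made up for this example.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: name the x86 ABI this translation unit was built for,
// based on predefined macros from GCC-compatible compilers.
const char* abi_name() {
#if defined(__x86_64__) && defined(__ILP32__)
    return "x32";      // 64-bit registers, 32-bit pointers
#elif defined(__x86_64__)
    return "x86-64";   // 64-bit registers, 64-bit pointers
#elif defined(__i386__)
    return "i386";     // 32-bit registers, 32-bit pointers
#else
    return "other";    // not an x86 target, or macros not available
#endif
}

// Hypothetical helper: pointer width in bits for the current build.
std::size_t pointer_bits() {
    return sizeof(void*) * CHAR_BIT;
}
```

Building the same file with `-m64`, `-mx32` and `-m32` (where the toolchain and kernel support them) should give "x86-64", "x32" and "i386" respectively, with only the first reporting 64-bit pointers.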

~~~
simcop2387
As addressed in the comments on SO[1], x32 mode is actually still long mode.
It's just a change in ABI to use the upper and lower 2GB of address space so
that you can fit pointers in 32 bits. It doesn't really act like a 32-bit
system because it isn't one; it's a 64-bit system with a restricted address
space.

That said it can be useful for certain loads where you aren't consuming large
amounts of memory but are doing lots of calculations.

[1] [http://stackoverflow.com/questions/16841382/what-is-the-perf...](http://stackoverflow.com/questions/16841382/what-is-the-performance-impact-of-using-int64-t-instead-of-int32-t-on-32-bit-sys#comment24292419_16841814)

~~~
joosters
x32 mode can be a huge win if your data structures contain lots of
pointers/references. It can shrink the memory usage and speed up computation
by surprising amounts (compared to running the same process in a 'full' 64 bit
mode). It's not just for when you are writing your own intricate code, either.
An x32 build of perl sped up some of my programs dramatically, and let me run
stuff on a 4GB machine that previously couldn't fit the data in.
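
A rough way to see where the saving comes from, as a sketch only: the `Node` layout and the `pointer_bytes` helper below are hypothetical and not taken from perl or any real program.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical binary-tree node: two child pointers plus a 32-bit key.
// Its size depends on the ABI; typically 24 bytes with 8-byte pointers
// (including padding) and 12 bytes with 4-byte pointers.
struct Node {
    Node* left;
    Node* right;
    std::int32_t key;
};

// Bytes spent on the two child pointers alone, for `nodes` nodes,
// when each pointer is `pointer_size` bytes wide.
constexpr std::uint64_t pointer_bytes(std::uint64_t nodes,
                                      std::uint64_t pointer_size) {
    return nodes * 2 * pointer_size;
}
```

For a million such nodes, `pointer_bytes(1000000, 8)` is 16 MB against 8 MB for `pointer_bytes(1000000, 4)`, and the smaller nodes also mean more of them fit in each cache line.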

Ubuntu has some x32 packages available for their standard 64 bit distribution,
but the selection is limited.

~~~
qb45
Did you compare x32 versus the old fashioned i386?

~~~
joosters
Nope, other stuff on the machine needed 64 bits so unfortunately it would be
difficult to do the comparison.

In _theory_, x32 should still be faster because the code gets to use all the
other x64 features, like a larger register set and so on. I've no idea how big
a difference that actually would make though.

~~~
qb45
You can run 32 bit binaries on a 64 bit system. I bet Ubuntu has several 32
bit packages available because they are needed by Wine and closed-source 32
bit binaries.

The theoretical possibility for speedup is exactly why I asked. The x32
website has benchmarks where it's as fast as i386 and 40% faster than amd64 on
pointers or as fast as amd64 and 40% faster than i386 on 64 bit math, but
what's missing to really justify x32 is some "20% better than either" case.

~~~
joosters
Sorry, I see what you mean now! For some reason, I completely forgot about
running a plain 32 bit version... sorry but I don't have the chance to easily
run the code again right now and compare.

------
pmjordan
I seem to remember Agner Fog mentioning that the `adc` x86 instruction (add
with carry) was considerably slower on many CPUs than `add` (regular addition)
because it essentially had 3 dependencies (2 operand registers and carry
flag), and had to be decomposed into 2 instructions with 2 inputs each for
register renaming purposes. Adding 2 64-bit values on x86 is an `add` followed
by a dependent `adc`, so that's considerably worse than 2 regular, independent
32-bit additions.
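
The dependency is easy to see when the 64-bit add is spelled out in terms of 32-bit halves. This is a sketch of the idea in C++, not what any particular compiler emits; the `U64Parts` type and `add64` function are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical representation of a 64-bit value as two 32-bit halves,
// the way a 32-bit CPU holds it in a register pair.
struct U64Parts {
    std::uint32_t lo;
    std::uint32_t hi;
};

// Add two 64-bit values using only 32-bit arithmetic.
U64Parts add64(U64Parts a, U64Parts b) {
    U64Parts r;
    // Low halves: a plain `add`, wrapping modulo 2^32.
    r.lo = a.lo + b.lo;
    // The carry out of the low add; unsigned wrap-around means the sum
    // ended up smaller than an operand exactly when a carry occurred.
    std::uint32_t carry = (r.lo < a.lo) ? 1u : 0u;
    // High halves: this is the `adc` step. It cannot start until the
    // carry from the low add is known, hence the dependency chain.
    r.hi = a.hi + b.hi + carry;
    return r;
}
```

For example, adding {lo = 0xFFFFFFFF, hi = 0} and {lo = 1, hi = 0} wraps the low half to 0 and carries 1 into the high half.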

~~~
emn13
IIRC that limitation was lifted in Haswell (the then-new fused multiply-add
would otherwise be pointlessly inefficient), though I have no idea how or
whether that affects things like adc.

~~~
pbsd
In the case of the carry flag it was only lifted in Broadwell, which also
introduced ADCX and ADOX to let you perform two carry chains concurrently.
Conditional moves were also improved; you can do 2 per cycle now.

------
GFK_of_xmaspast
The original question was in the context of storing times, and that makes me
wonder: how many freaking operations are you doing on time stamps that you
even care about this?

~~~
oliver_from_so
Original poster from SO here: there are some components that are doing
schedule calculations, so in some parts there are quite a lot of time-related
calculations, and performance can be important in these parts.

Also, I see now that the sentence "Our C++ library currently uses time_t for
storing time values" might have been misleading; by "storing" I meant "keeping
in RAM and registers, for calculations" rather than "keeping in non-volatile
memory, for archival".

------
dgreensp
Where would one encounter a 32-bit system today?

~~~
nextweek2
You probably have one in your pocket.

Most ARM chips are 32bit, so most phones are 32bit. There is little reason for
mobile chips to be 64bit.

