

It’s my stack frame, I don’t care about your stack frame - lt
http://blogs.embarcadero.com/abauer/2010/01/14/38904

======
dchest
After a bit of googling I found this:

 _The caller ensures that the stack is 16-byte aligned at the point of the
function call.

“At the point of the function call” means just before the CALL instruction is
executed, the top of the stack. No further explanation is given, but an
inference can be made: the Darwin kernel is using SIMD instructions to quickly
move data on the stack, possibly during context switches._

[http://pages.cs.wisc.edu/~weinrich/papers/method_dispatch.pd...](http://pages.cs.wisc.edu/~weinrich/papers/method_dispatch.pdf)

------
sern
The x86_64 ABI mandates a 16 byte aligned stack. I think Apple's just doing it
in their 32-bit ABI as well to keep it consistent.

Regardless of the rationale, it _really_ isn't a big deal at all. He'll need
to jump through the stack alignment hoops anyway when he gets to finishing his
64-bit compiler, and so what if he has to do the same with the 32-bit
compiler?

~~~
barrkel
It really is a big deal when you have an unknown amount of customers'
hand-coded 32-bit assembly that needs to be reviewed in order to target the
platform. 64-bit is an orange in this comparison of the Apple; that's a
separate architecture.

------
jey
Eh? If the MacOS ABI says your stack alignment must be 16 bytes, then it must
be 16 bytes. It's not any different than other platforms mandating a
particular stack alignment, except that stack alignment on x86 is "usually" 4
bytes. Either way, some alignment is dictated by the platform's ABI, so the
MacOS 16-byte stack alignment really shouldn't be _that_ surprising. Sure,
it's a bit different than "most" x86 ABIs, but so what?

I agree that the ABI could use the convention that the callee sets ESP to have
whatever alignment it wants, if needed for its SSE ops or whatever. But
really, why care? Sure, it'd be interesting to hear the rationale... but in
the end, we just have to follow the ABI, and mandating 16-byte stack alignment
isn't all that crazy.

~~~
barrkel
It's "so what" because it hurts the portability of other software to the
platform. It's a bad business decision because it increases the costs of third
parties supporting the platform.

------
m_eiman
Could it be that it's a PPC emulation thing? If an Intel binary can load a PPC
lib, then the calls between them would have to follow the same rules (and
perhaps PPC wants things 16-byte aligned?).

~~~
JoachimSchipper
Intel binaries can't load PPC libs. Rosetta can be used to run (most) PPC
binaries, but that's translation (in userland, I understand.)

Consider a call to "void f(long x)": this means very different things for i386
code (long is 32-bit, little-endian) and Mac powerpc code (long is 64-bit,
big-endian).

------
ssp
Presumably, Apple is relying on SSE for floating point, like x86-64 does.
_That_ certainly is both faster and simpler than dealing with the bizarre x87.

And you don't have to insert extra code at all call sites. You can assume that
_your_ stack is correctly aligned at entry too, so if you just make your stack
frame a multiple of 16 bytes, you'll be fine.

~~~
jedbrown
Most compilers produce exclusively SSE instructions for floating point these
days (unless you're explicitly generating code for machines that don't support
it). But the stack doesn't need to be aligned at the call site for this; it
can be done in the preamble of whatever function uses it (it costs one AND
instruction).

The stack is not aligned at function entry, because the CALL instruction
pushes the return address on the stack. Alignment at the call site therefore
forces the stack to be unaligned at function entry. However, it is unaligned
by a known amount, so you can produce aligned addresses by subtracting a
constant value (skipping the AND instruction). The featured rant misses this
point: you get aligned addresses just as easily at the bottom of your frame as
at the top.
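
A rough C model of the two points above, offered only as a sketch: the masking step (a bitwise AND with -16) and the fixed 4-byte offset that CALL leaves at entry when the call site is 16-byte aligned. The function names and the sample addresses are made up for illustration; this is plain integer arithmetic, not real stack manipulation.

```c
#include <assert.h>
#include <stdint.h>

/* Round a stack address down to a 16-byte boundary.
   This models the effect of `and esp, -16` on 32-bit x86. */
uint32_t align_down_16(uint32_t sp)
{
    return sp & ~(uint32_t)15;
}

/* Model of CALL on 32-bit x86: it pushes a 4-byte return address,
   so ESP at function entry is 4 below ESP at the call site. */
uint32_t esp_after_call(uint32_t esp_at_call)
{
    /* Assumes the caller followed the 16-byte rule under discussion. */
    assert(esp_at_call % 16 == 0);
    return esp_at_call - 4;
}
```

With an aligned call site such as 0xbffff7e0, the entry ESP is 0xbffff7dc, which is 12 mod 16. Subtracting the constant 12 gives the same address that masking would, which is why the mask can be skipped when the offset is known.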

~~~
ssp
Right, "just make your stack frame a multiple of 16 bytes" was wrong; I should
have said "just make your stack frame 16 * k - 4 bytes long". This is
basically free because the size of the stack frame is known at compile time.

Realigning the stack at _runtime_ though, is expensive and complicated.
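
The "16 * k - 4" arithmetic can be sketched in C as below. This is only an illustration of the compile-time calculation; `padded_frame_size` and its parameters are hypothetical names. `already_pushed` is assumed to be the number of bytes pushed since the aligned call site: 4 for the return address alone, or 8 if the function also pushes EBP.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest frame size >= locals such that, together with the bytes
   already pushed since the 16-byte-aligned call site, the stack
   pointer lands on a 16-byte boundary again. */
size_t padded_frame_size(size_t locals, size_t already_pushed)
{
    size_t total = locals + already_pushed;
    size_t rounded = (total + 15) & ~(size_t)15;  /* round up to 16 */
    size_t frame = rounded - already_pushed;

    assert((frame + already_pushed) % 16 == 0);
    assert(frame >= locals);
    return frame;
}
```

For example, 13 bytes of locals with only the return address pushed gives a 28-byte frame (16 * 2 - 4), and 20 bytes of locals after a return address plus a saved EBP gives 24 bytes (16 * 2 - 8).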

~~~
jedbrown
Aligning the stack at runtime is one instruction in the preamble so it's not a
big deal, especially if the function is big enough to have non-register SSE
locals. Also, when using VLAs, it's not optional. I use SSE all the time and
have no problems with the alignment requirements, but it's not really a
performance issue and I'm doubtful that it was worth the time for people to
learn and deal with this idiosyncrasy.

~~~
ssp
You need to restore the stack to what it was before on return. Normally you
can easily do that because you know the stack pointer delta at compile time.
But when realigning at runtime, you don't, so you need to compute the
difference and then store that somewhere, and on return you have to read that
back and add it to the stack pointer.

Or is there some trick that I'm missing?

~~~
jedbrown
It's in the frame pointer.

Edit: You're right, optimized code might not save the frame pointer and this
wouldn't provide any benefit if your frame had unknown size. But usually the
preamble saves the old ESP in EBP and it is restored at the end. Maybe there
is another alternative.

~~~
ssp
Hmm, I was actually about to edit my comment to say that _you_ are right
because usually you'll store the old stack pointer as the new base pointer.

I guess the answer is that realigning at runtime costs a register, but most
code has to pay that cost anyway.
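
To make the frame-pointer point concrete, here is a small C simulation of a prologue that realigns at runtime and an epilogue that restores ESP from EBP. It is a sketch under stated assumptions: the `struct cpu` model, the 256-byte fake stack, and the function names are all invented for illustration, and the comments name the x86 instruction each step stands for.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 256

/* Toy machine state: two registers and a tiny byte-addressed stack. */
struct cpu {
    uint32_t esp;
    uint32_t ebp;
    uint8_t stack[STACK_BYTES];
};

static void push32(struct cpu *c, uint32_t v)
{
    assert(c->esp >= 4 && c->esp <= STACK_BYTES);
    c->esp -= 4;
    memcpy(&c->stack[c->esp], &v, 4);
}

static uint32_t pop32(struct cpu *c)
{
    uint32_t v;
    assert(c->esp + 4 <= STACK_BYTES);
    memcpy(&v, &c->stack[c->esp], 4);
    c->esp += 4;
    return v;
}

/* Prologue with runtime realignment; locals should be a multiple of 16. */
void prologue_realign(struct cpu *c, uint32_t locals)
{
    push32(c, c->ebp);          /* push ebp      */
    c->ebp = c->esp;            /* mov  ebp, esp */
    c->esp &= ~(uint32_t)15;    /* and  esp, -16 */
    c->esp -= locals;           /* sub  esp, locals */
}

/* Epilogue: the unknown realignment delta never has to be computed,
   because the pre-alignment ESP is still sitting in EBP. */
void epilogue(struct cpu *c)
{
    c->esp = c->ebp;            /* mov esp, ebp */
    c->ebp = pop32(c);          /* pop ebp      */
}
```

Starting from an entry ESP of 204 (12 mod 16), the prologue leaves ESP at 160, which is 16-byte aligned, and the epilogue returns ESP to 204 with the caller's EBP intact. The cost is that EBP stays reserved for the whole function.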

------
GeneralMaximus
What I find more interesting is that this is from the Embarcadero blog. Does
this mean we'll have Delphi on the Mac soon?

~~~
lt
Yes, that has been announced for a while.

There are no specific timeframes (and it looks a bit outdated), but here's the
latest roadmap:

<http://edn.embarcadero.com/article/39934>

