Hacker News new | past | comments | ask | show | jobs | submit login
What is “the stack”? (jvns.ca)
225 points by jor-el on March 3, 2016 | hide | past | favorite | 110 comments

It started with the x86 and PDP-11 thing. Some CPU architectures have a specific stack pointer register, and some don't. IBM mainframes don't have a dedicated stack pointer; the stack is a software construct. Neither do SPARC machines.

There are also real stack machines, where instructions take operands from the stack and push results on the stack. The Burroughs 5000 and the English Electric LEO-Marconi KDF9 introduced that architecture around 1958. It lives on in the x86 FPU register stack.

FORTRAN and COBOL originally forbade recursion. Compilers could allocate all temporary memory at compile time. Stack overflow is impossible. Some real-time systems still work that way.

The x87 FPU "stack" is not truly a stack though: it's rather a ring buffer of 8 registers.

And you can model it pretty well even without a "head pointer", just with moves between all registers, e.g. push(x) is ST(7) = ST(6), ST(6) = ST(5), ..., ST(1) = ST(0), ST(0) = x.

I've implemented it as such in a lifter from x86 machine code and all the redundant moves just disappear, if the uses of the pseudo-stack are balanced, and you're left with traditional registers.

Some microcontrollers (e.g. some PICs) even have a hardware stack for storing return addresses. This stack is on the processor die like the registers and not in the RAM which makes it even more fundamental.

Yes, and the PIC's return point stack is tiny. 2 (!) on the smallest machines, 8 on most, 31 on the "high end" PICs.[1]

[1] http://embeddedgurus.com/stack-overflow/2009/04/pic-stack-ov...

Regarding BM mainframes don't have a dedicated stack pointer; the stack is a software construct., Gene Ahmdahl, when asked why the 360 architecture didn't have a stack, he said "Too expensive". One might suspect that wasn't the whole story, as the 8085 (back when $5 retail quantity one) did have a stack.

The Burroughs 5000 and the rest of their line was quite interesting architecture. Arthur Sale once told me that he would use the B6700 architecture as a "universal counterexample" when talking about language conventions, such as C using zero as NULL.

This only scratches the surface of the topic. Here's a fine article that talks about implementing a crash handler for JNI that goes into some of the more complex issues of multiple stacks, signal handling, stack unwinding, and calling across languages:


Aside: it's interesting how complex things which I learned in my younger years have become. From understanding how a car operates and being able to perform maintenance on it (compare rebuilding a carburetor or adjusting the timing to fuel injection and computer controlled distributorless electronic ignition, etc), to this topic, computer architecture. I'm glad I first learned these things when they were simpler. I think they might seem daunting to start learning about today. Though I guess like anything else, you start at the surface and keep digging till you're satisfied you understand the topic well enough. It's just that if you like to get to the bottom of those rabbit holes, they're all so much deeper now. Shrug.

p.s. I really enjoy reading the author's posts. They are often about something I'm familiar with, but it's neat to read someone writing about learning these things through fresh eyes. I also sometimes realize I don't know something as well as I thought. So thank you for sharing.

I think it's entirely appropriate that the article just scratches the surface; as you say, going into all the crazy details just makes the topic scary to a newcomer.

In college I met a guy that had co-authored Marathon, the first big hit for Bungie Studios. (This was Ryan Martell, not Jason Jones.) Ryan taught me how my C code compiled to assembly, and we were using 68K Macs back in that day, so it was pretty straightforward to understand once you had somebody to show you the basics.

Oh man, that was a revelation. Stepping through compiled code made concepts like pointers, stack/heap, registers, etc. just as clear as day. Those handful of hours with Ryan did more for my understanding of computing than the whole rest of my college courses. (Well, excepting my operating systems class, that one was good.)

Funny enough, my first job was working on an embedded operating system that didn't have source-level debugging, I had to debug in 68K assembly. So I'll just shout this out to the ether: thanks again, Ryan.

Conversely, I often think how lucky I am to have entered the industry at this point.

I learnt HTML by initially downloading websites using ctrl-s, and then editing them in notepad. I learnt HTML/CSS, and then JavaScript, and then other languages.

I had the advantage that I had a GUI text editor. While I'm now quite comfortable using VIM and command-line editors, I sometimes think if I had tried to become a developer before the advent of GUI editors the learning curve would have been too steep.

Perhaps under the hood things are more complex, but the interfaces have improved.

Interfaces have improved because the scope of the thing has grown.

I learnt HTML in 1995, I think shortly before HTML 2 was released. It was tiny back then. Using anything other than a text editor was inconceivable (I doubt there was any tooling back then). CSS and JavaScript didn't even exist. We had CGI for emailing form content, but most of the web was static.

By 1998 I was convinced that every app ever would be online. Today I've come full circle, avoiding web dev like the plague (too many tools) and stick with desktop and phone apps.

Notepad? GUI? Are you kidding? Emacs was a GUI, Notepad is a joke of a GUI.

One of the really nice things about Julia and her writing style, is she makes quite complicated subjects fairly easy to grasp from a high level.

For beginners, or those who work much higher up the technology chain it provides a nice launching off point for further investigation.

I find her style quite fun and engaging.

"This isn't a theoretical question at all -- Rust actually has a different calling convention from C"

Technically correct, but not at the register/stack management level: LLVM still controls all of that, Rust only chooses the order of the arguments and whether they are passed "directly" (in registers or on the stack, when registers are exhausted) or "indirectly" (pointer to the value, whether it's on the stack or somewhere else).

So the following statement, "which means it sets up its stack differently" is incorrect: even if Rust wanted to, it couldn't do so with the LLVM backend.

In turn, isn't LLVM's stack management and argument passing dictated by the hardware's ABI? ARM, for one, publishes its calling convention in AAPCS (ARM Architecture Procedure Call Standard) which establishes the calling convention on the C stack.

If the hardware manufacturer publishes an ABI most people will use it for the sake of convenience and interoperability. But it's not like the ARM police can stop you from using a different calling convention for your language if you want.

ARM and modern x64 have pretty rigorously-defined calling conventions, yeah. x86, on the other hand, has six or seven different ones, and it was up to you as a C programmer to choose which one to use for each given function, by sticking e.g. a __stdcall or a __fastcall in your typespec.

x64 on Linux has a different calling convention than x64 on Windows.

I believe Go, however, uses its own stack rather than the C stack.

I think Rust used to do the same thing and use segmented stacks (rather than the C stack) until late 2013 or early 2014, point at which the language's base was drastically lowered (in the abstraction stack) away from builtin GC, green threads and the like.

At no point while using LLVM, could Rust have used a different calling convention.

It was managing its stack, but it does that even today: when creating a thread, it allocates a 8MB stack.

Within that stack, however, function calls work as they do in C, and they've been like that effectively forever (although the Ocaml bootstrap compiler might have had a non-LLVM x86 backend, not sure about that).

It couldn't? I'm by no means an expert on this, as my only experience with LLVM was for a undergraduate research project back when I was still in college, but I could swear that it allows custom calling conventions (would verify if I weren't on my phone).

Assuming I'm not mistaken about their existence, is there a reason Rust couldn't use one?

It has a predefined set of calling conventions (sadly, this doesn't list x86 and other architecture-specific ones): http://llvm.org/docs/LangRef.html#calling-conventions

GHC and HiPE appear to have gotten themselves LLVM calling conventions, I guess. But none of this can be customized further without modifying LLVM.

Besides, none of these choices seem to affect anything other than the (ordered) set of registers available for arguments, and the sets of caller/callee-save registers (i.e. what you expect from x86 calling conventions).

Once you get to the stack it's really all the same.

What I meant was that Rust could have picked any of those choices, which would make it incompatible with C (it already is for some argument types because of the simplistic handling around them), but it couldn't have used its own special one, not without adding it to all LLVM targets it wants to support.

`push` decrements the stack pointer on x86 architecture, unlike on ARM. The stack is strange in that it grows downwards. That is also why in the disassembly of the `blah` function the argument is temporarily stored in `ebp - 4` as ebp points to the top of the stack (which is at the lowest address).

I also wouldn't really say assembly follows the C stack pointer. The C language makes no assumptions about stacks, stack pointer or return addresses. It just describes "functions" to which the execution can be transferred and from which can be returned. How this is implemented, is up to the compiler designer and is platform dependent. I wouldn't be surprised to find out there were platforms without a stack/stack pointer but with a C compiler

> `push` decrements the stack pointer on x86 architecture, unlike on ARM.

Why do you say "unlike on ARM"? In 25 years I've never seen anything but a full, descending stack used as the program stack. You can have ascending stacks, but that's more an artifact of the fact that `pop` and `push` (which didn't exist in the assembly language until Thumb) are just specific uses of the general multiple load/store instructions `LDM` and `STM`, which can "write back" an incremented or decremented base register.

I always just drew the stack downwards in school. Makes it easier to for that assignment to diagram the stack at level of deepest recursion, at least.

There are. Parallax Propeller has no stack, no interrupts.

And just wait until you find out about Harvard architecture[1] CPUs that don't have a stack in the same sense that C has a stack—they generally have a call stack that is only manipulated implicitly by JSR and RET style instructions. It doesn't co-mingle return addresses with stack variables, like x86 (and von Neumann architecture[2] CPUs).

[1] https://en.wikipedia.org/wiki/Harvard_architecture

[2] https://en.wikipedia.org/wiki/Von_Neumann_architecture

At some point I was messing around with writing a toy VM and the choice of co-mingling return addresses and what where the functions arguments felt weird to me so I split them into two different stacks. It actually worked out. I mean it meant I diverged from the book I was using to do this but that was part of the fun to see which parts were simpler and which parts were more complicated in my design.

This was the book I used in case folks are curious http://www.springer.com/us/book/9783642149085.

See Forth, Factor a.o. which are "harvard" languages.

Separate call/local stacks is not really a property of Harvard architectures, which are distinguished by having different address spaces for code and data. The split is really orthogonal to the architecture type.

IA64 is Von Neumann architecture, but still has separate stacks. In fact, at least with clang with SafeStack enabled, you can have separate stacks independently from architecture.

Of course our a modern CPUs have generic enough instructions that let you do that. Harvard pretty much forces it though, so much so that many Harvard CPUs don't have a way to get code into data registers and vice versa. Since they are separate busses they may be different data widths and therefor incompatible.

This is great - I'm self taught and have been working for years with no idea what the stack is - I'd love one for this "heap" thing I keep hearing about, as well.

The heap is basically the memory that isn't your program code, your stack, or the space reserved by the compiler for your global variables (speaking of C, here). The heap memory traditionally gets allocated from a unix kernel with a syscall called sbrk(), which adds on to one end of the heap so that it stays one big contiguous chunk of memory. Traditionally, the heap begins at the opposite end of memory from the stack and grows towards it (some CPU's stacks grow up and some grow down). This may not be the case any more on 64 bit machines, since 64 bit address space is so big.

Technically it's up to your app to use that memory as it pleases. Of course, 99.9999% of programs use the standard C library functions malloc() and free() (or "new" and "delete") which handle all this for you, and keep track of which chunks of memory are in use. They're also cross platform so you don't have to worry about what the system API in Windows is for low level memory allocation. But you don't have to use them if you really don't want to (though in that case be careful calling shared library functions that might call malloc() behind your back).

This is a good comment because lots of folks confuse the heap and the stack.

Let's say you assign a variable in c, like so int i = 0; No matter where you do this, you're assigning a variable on the stack. A location on the stack is set aside and your variable goes in there.

This works great for small-ish things that you keep around to work on, but sometimes you want a big thing -- or you want something that lives on after the function dies.

In this case, you create something on the heap. In this case, you tell the computer to set aside a chunk of memory somewhere else -- the heap -- and let you store things there. You can continue using this location after your function is completed.

For an integer, it'd look like this.

int* j = malloc(sizeof(int));

So if you have a function creating bitmaps, you might have a lot of local (stack) variables, and you'd probably either create or use a pointer to a big hunk of memory where you kept all the bits for the bitmap (heap)

One way of thinking about it is if you're using a stack variable, only one piece of information gets stored in the computer -- the value. If you're using a heap variable, you're actually creating and storing two pieces of information, the value (on the heap), and a pointer to the value (on the stack). (hat tip to StackOverlow for the nice example http://stackoverflow.com/questions/6770596/memory-allocation... )

  > No matter where you do this, you're assigning a variable 
  > on the stack.
In the C standard this is referred to as "automatic storage duration", and it can be weird to some (though obvious in retrospect) to consider that C actually features automatic memory reclamation (albeit limited).

The important insight is that if you have a piece of memory that's scoped to a function, it's so trivial to free that memory that even C does it for you automatically. This is important because of what happens when you want to take this one step further: is it possible to make this same property apply to memory that's allocated on the heap rather than the stack?

For one, there's escape analysis, which is a static analysis that tries to determine if a piece of heap-allocated memory has the potential to leave a function's scope. If not, then you can allocate that memory on the stack instead without changing the semantics of your program. Go does this, and it's an important optimization for reducing GC traffic.

A step beyond that is linear typing (or maybe call it unique ownership), which is a static analysis that extends the "automatic storage duration" concept with one additional feature: a function can now hand off a piece of memory for some other function to free at the end of its scope. With this concept it's now possible to extend automatic storage duration to heap-allocated values as well, meaning you need neither a garbage collector nor explicit `free` calls. Rust does this, and it's one of the fundamental pillars of the language.

It's cool to see a bunch of really thorough, great answers here. The summary I use to keep them straight:

When a program starts, it asks the operating system for memory to store the program and all its initial data. The heap is the part of memory nothing is using yet -- it's just a heap of memory over in the corner for you to use later if you need it.

As well, I think about storing data in 2 ways:

A. When you only need the data as long as the current function is executing, you declare a variable in that function, and the OS includes it in the calculation for how big to make that function's stack. You use it then it gets cleared out when the function is done.

B. When you need the data to exist after the current function is done executing, but don't want to pass the data itself around as returns and arguments for other functions, you can ask the OS to give you part of that heap of memory to store the data in. But clearing heap data requires attention. Since we don't have the simplicity of clearing it when the function finishes, we generally have to keep track of what's been stored on the heap and make sure to free it up when we no longer need it.

Some languages implement a garbage control process to track and clear no-longer-needed heap memory for us, but really the more you have on the heap, the less memory available to other programs on the computer, which isn't cool.

If you remember back to the days of needing to restart Firefox after half a day because that one open tab was somehow using 1.8GB of your RAM, it's an example of what happens when a lot of things are allocated to the heap and aren't freed.

One slight correction: all of the memory setup is done by the operating system before the program starts at all. An executable file is basically a memory dump -- to start the program, the OS sets up some virtual memory space, copies the program into it(1), and then jumps to a predefined spot in the memory it just copied.

(1) This step is a bit more complicated due to dynamically-linked libraries

You're absolutely right and I should have been clearer there. Thanks!

That Firefox anecdote is ironically funny. I now experience the same thing with Chrome, except it's for disk space backed memory. So every day I have my "Your startup disk is almost full" warning. I then quit Chrome to see 10-15Gb of disk space freed up. Recently I managed to get more space on my disk so thankfully I won't see it for a while, but still...

You might also like this https://www.youtube.com/watch?v=1S0aBV-Waeo

Although call stacks aren't the main topic, it gives a really nice outline of them

Things like the stack and heap and many other things (like memory mapped I/O, interrupts, etc) are often taught in first year university courses under a name like "introduction to computer organization". There are many online courses you can do for free, or you can simply go to a book store and pick up a text book. Although none of it is something that you desperately need in order to work as a programmer these days, it's one of those things that will undoubtedly be useful almost every day, so I highly recommend it.

As for the heap, I noticed another description, but I'll have a go at describing it in a slightly different way. Back in the old days you had a big chunk of contiguous memory. It started at address 0 and ended at address 65535 (if you were lucky enough to have 64K of RAM!).

As you saw in the article, you need to set aside some of that memory for the stack. In most languages that are compiled to machine language, the stack holds storage for variables local to a function as well as the function return address (the object that holds all the information for one function is called a "stack frame"). As others have said, other organizations are possible, but this is the most common.

Allocating memory on the stack is ideal because it is really easy to allocate and deallocate. When you enter a new function, you look at how big each of the local variables are and you simply move the "stack pointer" that many bytes (either up or down, depending on the architecture of the machine). When the function is finished, you simply move the stack pointer back to the start of that stack frame. Super easy and super fast.

The problem is that as soon as your function finishes, the variable is deallocated. So let's say you have a function that loads a picture from disk. You make a big array in your picture loading function, put the picture data in it and return it from the function... uh oh... As soon as the function ends, the memory is deallocated :-(. That data is "gone" (what's worse is that the data is still there, so it will appear to work until the stack grows big enough to overwrite that data -- at which point your picture will be corrupted). So we need a way to allocate memory that will stick around until we tell the system that it is not needed any more.

That's what the heap is. I can call a system function (usually called "malloc", which stands for "memory allocate") and say "I want 4K of memory". It finds a contiguous chunk of 4K of memory, makes a note that the memory is in use and then gives me a pointer to it. I can then use that pointer to write into the memory. The system will not allocate any memory in that area unless I tell it to "free" the memory (usually with a call to something called "free"). After that it can be used again.

The place the system looks for in the memory for this kind of allocation is called the heap. So you can imagine the big block of memory in a computer is divided up into at least 2 sections: the stack and the heap. Often a computer system will put them at the opposite sides of the memory so that you can grow the stack as much as possible. So I might start the stack at 65535 and grow downward. And I might start allocating the heap from 0 upwards. In practice there are other things you need to reserve memory/addresses for, so this is not how it really works, but you get the idea.

Many programmers like allocating from the heap because it is easy. You don't have to worry about the scope of your program. You allocate a variable and it is there until you tell it to go away. But there are a couple of problems. First, if you forget to free the memory, it never goes away. This is called a "memory leak". Also, you might accidentally try to use the memory after you freed it, resulting in the same corruption noted earlier. But there is yet another problem.

Let's imagine that I allocated 1024 40 byte strings. That will take up 40K (almost all of my memory on this hypothetical old machine). Let's say that every second string is only needed at the beginning of the program, so let's free them all. That frees up 20K. However, the heap is now "fragmented". It has 40 bytes allocated, 40 bytes unallocated, 40 bytes allocated, 40 bytes unallocated... If I want to allocate 45 bytes, I will have to walk through 512 unallocated blocks, noting that each is too small, before I decide to allocate at the end. This is very slow. If there isn't a 45 byte chunk available, then my memory allocation will fail, even though I have 20K of free memory.

So, for many old timers like me, we favoured allocating on the stack. This leads to fast, efficient, safe code that is also surprisingly elegant. Unfortunately, it is also hard to design ;-)

These days, operating systems do a lot behind the scenes to make sure that you don't fragment your heap too much, etc, etc. Modern interpreted languages use garbage collection and allocate on the heap incrementally -- as if it were stack allocation. Occasionally they wander through and deallocate unused variables and compact the heap. Some of the algorithms used are very sophisticated and quick.

I keep thinking about designing a functional language with stack allocation, though... It appeals to my roots ;-).

Hope that description helps. Keep in mind that many details will be incorrect for any specific system. I mostly just tried to make the description simple, not accurate.

I don't know if it's still the case, but past versions of the Chicken Scheme system [1] implemented a remarkable stack-based allocation method from Henry Baker, fondly called "Cheney on the MTA" [2]. If you like thinking about how functional languages interplay with stack allocation, you would probably enjoy bending your head around this. :)

[1] http://www.call-cc.org/

[2] http://home.pipeline.com/~hbaker1/CheneyMTA.html

Sorry for the slow response. I'm reading this now and enjoying it very much. Thank you!

> I keep thinking about designing a functional language with stack allocation, though...

You might be interested in concatenative programming, effectively a modern functional take on Forth-like languages. Factor is a mature, dynamically typed concatenative language, with an excellent interactive environment reminiscent of Smalltalk. I have also been working on a statically typed concatenative language for a while now.

I found the Joy language to be pretty interesting - it's described as "Forth's functional cousin":


Thank you very much! (and to the parent as well!). FORTH programming was my first paying gig. I will look at this with much joy :-)

Mlkit is using region analysis to convert heap allocations to the stack allocations. Same with Harlan (it targets opencl, therefore no heap at all).

malloc almost certainly isn't a system call. It's a library call (provided by libc), and you can have different malloc functions (different allocators).

The underlying OS call is VirtualAlloc on Windows and brk/sbrk/mmap on Linux.

A heap is a data structure used by the malloc() and free() functions to find appropriately-sized chunks of RAM to give to the rest of the program.

The defining feature of heap-allocated memory is that it's persistent up the call stack; a common newbie error in C is to return a pointer to something allocated on the stack, which will become invalid the instant the other function sees it: If a() calls b(), everything in b()'s section of the call stack is cleaned up when b() returns to a(), so if a() tries to dereference a pointer to something b() allocated on the stack, the program will die.

The flip side is that programmers have to manage this by hand, which is the cause of memory leaks. Garbage collection is used to mitigate this; to, at the very least, turn a program from one with quickly-growing memory usage into one with slow-growing memory usage.

Technically (at least on x86 and ARM), if b() allocates memory on the stack and returns a pointer to a(), as long as the pointer is dereferned immediately after the call the b(), the value should still be the same. It's not something you should count on (and purposely relying on it is a sin), but it won't kill your program.


It is an undefined behavior. It is very likely that something will go wrong, especially when optimizations are on. It is something you should never really on.

Why would the program die? It can work depending on what happens between the invocations [1].

[1]: http://codepad.org/Z7Ojq9ml

What came first in modern systems was the ABI standards documentation, which for Linux is the System V ABI, which is apparently followed even on BSD.


So System V came out in 1983, which postdates the introduction of the x86 architecture because the IBM PC was x86 from day one and it first came out in 1981. (I'm sure the 8086 is older, but it doesn't really matter here.)

Of course, it isn't the only standard out there for x86. Microsoft exists, after all.


Anyway, all of these standards have to specify which registers get saved by the called function and which get clobbered (and, therefore, must be saved by the calling function if their values are important) and you can't know which registers will exist on a given ISA until it exists, so the specific System V ABI for the x86 couldn't possibly have existed until the chips did. Ditto the System V ABI for the x86-64, the MIPS, and all of the other ISAs for which there is such a document.

However, chip designers aren't stupid. They know that their chips will be used to run binaries generated by C compilers, and that C compilers like stack frames and registers. Therefore, modern ISAs are designed with those things in mind, and I wouldn't be surprised if at least a few chip designers worked with one eye on some ABI documentation.

(Compare modern chips with older minicomputer and mainframe ISAs, where call stacks weren't always available and where subroutine calls sometimes involved self-modifying code. The PDP-8, for example, did subroutine calls by writing the the return address to the first (12-bit) word of memory in the subroutine, and then jumping to the address just past it; returning was a simple matter of jumping to a memory location loaded from a known address, but recursion was effectively impossible. The PDP-8 was a fine machine for assembly language programmers, but almost perversely unfriendly to modern mid-level languages, such as C and Pascal.)

Burroughs, IBM mainframes, Ada Machines (by Rational), Xerox Parc and ETHZ workstations are probably the best examples of more higher level friendly architectures.

Or at least the better documented ones.

Bit of a nitpick, but ELF binaries are not used "only in Linux". OS X (Mach) and Windows (PE) are really quite exceptional here.

System V R4, HP-UX, NetBSD, Linux, Solaris, AIX, IRIX, FreeBSD, OpenBSD, DragonFly BSD, OpenVMS, QNX, Syllable, MINIX, BeOS/Haiku, RISC OS, GP2X, Dreamcast, Gamecube, PSX, PS2, PS3 (maybe NetBSD or FreeBSD ish), PS4(based on FreeBSD), AmigaOS 4, Morph OS, AROS, Open Firmware, and probably a whole bunch of other systems use ELF binaries.

I think you can had existing mainframe OSes to the exceptions list.

I would have thought so, but this https://www-01.ibm.com/support/knowledgecenter/SSLTBW_2.1.0/... suggest maybe otherwise.

So it seems, thanks for the hint.

This question reminds me of this Twitter thread, started by @raymondh:

Programmer survey: Do visualize consecutive memory locations as running left-to-right, right-to-left, top-to-bottom, or bottom-to-top?


Left-to-right for me; vertical is confusing because I like to put address 0 at the top and address MAX at the bottom, but then the stack "grows down" and I get confused.

Strangely, on the processors I'm familiar with, the stack would grow up in that orientation.

Left-to-right when I'm reasoning about general address spaces, but top-to-bottom when reasoning about the stack specifically.

Good reasoning for common processors that grow the stack downwards. Those are what I've worked with. Mentioning it because I saw about some other types in this thread.

> I found it pretty surprising that the x86 assembly instruction set knows about the C stack pointer and how the stack in C is laid out. Of course, you could invent your own new smart stack format, but you wouldn't be able to use the callq and retq instructions to call functions.

I find it pretty amazing how modern "programmers" are lacking in knowledge of basic computer science concepts. Let me nit-pick this statement as it is obviously wrong.

The x86 assembly language knows ABSOLUTELY nothing about the "C stack pointer and how the stack in C is laid out". I will now illustrate why the first part of this statement is wrong.

You can have a bare-metal x86 device that is only programmed though assembly and push/pop would still be available. You are clearly confusing the roles of the compiler, operating system, and the language your program is written in. On a modern operating system like Linux, each program, when started, runs in a new process, which has its own view of memory due to the virtual memory system. Each program has its view of memory split into several sections, and one of them is allocated as "stack space" (it's usually the high memory addresses but this is architecture and/or OS dependent). When the program is executed, %rsp would be set-up appropriately by the OS, and on context switches it would save/restore it. Moreover, the actual address of your stack in real memory is different for every program since each process/program has its own mapping of virtual to physical memory, again provided by the OS.

Now, the second part about the "layout of the stack". The language you write your program in (in this case C) knows ABSOLUTELY nothing about the layout of "the stack". However, when you give your source code to the compiler, it will analyse it and, because C is low-level and staticly typed, will infer HOW it actually needs to lay out the stack FOR you. The output of the compiler is assembly code, and the assembly code produced "knows" about the layout of the stack because it has been written that way, generated by the compiler. It is, actually, the compiler that knows about all of this, not the source or assembly language.

Also, the example with python was a bad one - python is an interpreted language, and AFAIK it has no concept of a stack, and even if it does, it is the interpreter that allocates frames for it's own computations, not the program that you are running. And, moreover, even if python does allocate stack frames, they are most probably on the heap since it needs to do so dynamically (again, since it's not compiled but interpreted).

> python is an interpreted language, and AFAIK it has no concept of a stack,

Here, I could say, "I find it pretty amazing that old-timey[1] 'programmers' fail to understand modern VM architecture". In fact, Python's interpreter (cpython) is partly based on the concept of a stack.

But instead, I'll just explain that most modern VM's are either stack-based or register-based. Stack-based VM's, like Python's, are (not surprisingly) based on a virtual machine that uses a stack as it interprets Python's bytecode. Yes, not the same level of abstraction as the call stack, but it's still a stack.

So instead of expressing dismay at how little modern programmers know about your little corner of the programming world (low-level systems programming), why not spend some time pondering about how so much programming has moved up the stack, and how abstractions have grown multifold.

In the 1970s and 1980s, it used to be that the typical self-taught programmer had to know all the details low-level C and assembly, because that was almost all there was to know, unless you were a high-level Lisp hacker.

I mean, I'm a little surprised at how little some C kernel hackers know about how the modern Web works (particularly regarding async programming patterns; eg, been asked by one why I didn't just use a mutex somewhere in JS program, wasn't trolling.). But then I realize that there's just too much for any but the most exceptional programmers to truly understand the full modern stack, from system architecture, to GPU, to networking, to Web, on so on.

Why not instead welcome how this particular web-focused programmer is interested in these low-level concepts? Is it that it wouldn't give you as much of a chance to show off your knowledge?

1. Said by another old-timey programmer.

VMs, and VM-based languages, are applications that run on top of all that kernel programming.

Applications come and go so there's no reason to feel amazed that not everyone know how the Python VM works - especially considering the details aren't considered a core part of the Python curriculum, and they're almost completely inaccessible to a typical Python user.

But not understanding the heap or the stack has real implications for application performance and stability in whatever language or VM you're working in. The abstractions have gotten more abstract, but developers who no longer understand what happens when you run out of memory, or have a buffer under/overflow, or your stack explodes because you're using recursion without tail calls, are definitely a bad thing.

You seem to be arguing that anyone who does web or VM dev doesn't really need to know what happens inside a computer. I think that's an unhelpful view, because it implies that all those abstractions "just work" and no one who uses them needs to care how.

Do they really "just work"? I think the evidence suggests they don't. Perhaps if developers understood hardware better that could be improved.

> VMs, and VM-based languages, are applications that run on top of all that kernel programming.

I'm guessing the vast majority of all modern programming is done using VMs: Python, Java, C#, Lua^, JavaScript^, etc. Basically, if your language uses a runtime, it's likely using a documented or undocumented bytecode and VM. They're pretty important to understand.

> You seem to be arguing that anyone who does web or VM dev doesn't really need to know what happens inside a computer.

That's not my argument. I highly recommend new programmers to understand as much of machine architecture as reasonable. I'm just arguing that there's no place for a systems programmer to be so dismissive toward a web programmer because he/she doesn't know his part of the stack as well as he does. There is always going to be a gap in knowledge about low-level programming between these two specializations. That gap is no reflection on the intelligence or competence of the higher-level programmer.

In this particular case, I believe that Julie has only been programming for a few years. If that's the case, and assuming she only retains a fraction of what she writes about, she's further along than most of us were at that point (me included). Just because many of us bluff and bluster our way through interactions with our peers, and Julie is up-front about her state of knowledge, doesn't mean that she's an inferior programmer (at all). I suspect her honest curiosity will carry her along to the top of our profession in due time.

So if the parent had written something like, "it's so important for new programmers to work on this type of material, and it's heartening to see these kinds of blog posts from younger programmers", then I would have applauded.

That also would have been more in line with the HN guidelines ("avoid gratuitous negativity"), rather than the approach that was actually taken.

^ If you want to advance the "Lua, Python, and JS" aren't VMs, but rather interpreters, you'll need to convince the likes of Mike Pall and Lars Bak that they're not working on a VM (both have explicitly referred to their creations as "VMs").

If everyone needed to have the same level of understanding of all of the technology they used, we would never get technological progress.

> "I find it pretty amazing that old-timey[1] 'programmers' fail to understand modern VM architecture"

Especially given that they've been "modern" from around 1970s, not changing much since then (tracing JITs are likely to be the only significant improvement).

> how little some C kernel hackers know about how the modern Web works

Uhm... Screw the web. It's boring and irrelevant. It does not really worth knowing. It's better to stay away from it, for a sake of sanity.

> async programming patterns

Had been around long, long before all that web crap.

> I find it pretty amazing how modern "programmers" are lacking in knowledge of basic computer science concepts.

IMO this is the natural evolution of programming. It has gotten remarkably easier to write complex programs that deliver real value to society. How fantastic is it that modern programmers don't need to understand the science of computers in order to be productive?

Yes, we will always need science and engineering and folks who understand the underlying technology. But the field is broadening into new places. Like it or not, when uncle Fred writes an excel macro, he's programming. And good for him!

I mostly agree with you, but I actually think the original poster was incorrect in calling such low-level details "basic computer science" concepts. They are basic systems programming concepts. One can know a lot of computer science without knowing such things - you can probably find computer science professors who do strictly theoretical work that wouldn't know these details well enough to explain it.

I'd be very surprised to meet a computer scientist who doesn't know what a stack is. It really is a fundamental part of computer science.

I agree with your statement. However, that is not what I meant. The poster gcc_programmer was talking about very specific points about how "the stack" (not a stack) for a stack-based language gets implemented.

Good point, thanks for the correction.

wellll if you are going to nitpick you ought do it a little more cleanly: you are confusing machine language with assembly for starters, and even at the level of machine code, processors such as the x86 family were designed keeping support for higher level language conventions in mind, so it's difficult to pretend that there is no commonality of stack conventions baked into the microcoded x86 virtual machine.

Concepts you refer to such as "assembly language" exist a level up from bare metal, and at the same level as object libraries, linkers, and loaders and that implies full cooperation WRT calling conventions.

Your post has good technical information, but the attitude is not helpful. Please do not scold people for having less knowledge than you. It is not welcoming.

Same here. I guess this relates to why it is so difficult to hire people. Just a couple days ago, somebody crashed and burned in the interview by not knowing how to do bit manipulation. Why are these people allowed to graduate?

Those apps ain't going to write themselves.

I have a lot of fun explaining to people that the architecture I work on (IBM Power aka PowerPC) uses calling conventions that for many function calls doesn't touch the stack whatsoever. :)

How does it work? Pass the return address in a register and have enough caller-saved registers that leaf functions can often do without spilling? Or is there a more sophisticated mechanism?

Apple's Developer documents[1] detail it pretty well

[1]: https://developer.apple.com/library/mac/documentation/Develo...

a) Return address is passed in the "link register" - our call instruction is "bl" (branch and link) and return is "blr" (branch link register)

b) We have 32 general purpose registers - the ABI defines a certain range of registers (8 general purpose registers + a bunch of floating point and vector registers) for passing parameters

c) The register range allocated for passing parameters is designated as "volatile", so the caller can't rely on them coming back unmodified. This gives you a few registers to play with.

(NB: I'm not hugely experienced in PPC assembly programming, I've just had to touch it a couple of times and so I've read bits and pieces of the ABI, don't trust what I say here...)

Aha, but if you have at least function 2 calls, assuming none is inlined, you HAVE to use the stack to save the prev LR. I actually think it's one of the default processes at the beginning of a function, but maybe the compiler optimizes in many cases.. I'm really curious if you can find some assembly examples of functions where you don't use the stack. I've seen it on Renesas MCU's and on PPC but only for veery simple functions.

Also, the architecture isn't exactly the one to blame for this, it's more dependant on compiler calling conventions. I see things in the arch that facillitate this, but I see it possible to implement this mechanism with or without this arch support. I've only seen certain registers being reserved for something in the compiler docs not the MCU docs.

Eh, maybe I'm rambling.

Oh, yeah, only for leaf functions. Obviously not for multiple levels of calls.

My understanding is that PPC encourages register based parameter passing just by virtue of having a tonne of registers.

Nope, non-leaf functions don't really need a stack. Normal compilers will do it, but there are plenty of registers that can be used instead.

"I found it pretty surprising that the x86 assembly instruction set knows about the C stack pointer and how the stack in C is laid out. Of course, you could invent your own new smart stack format, but you wouldn't be able to use the callq and retq instructions to call functions."

Exactly the problem that everyone runs into coming up with more secure alternatives to C or existing stack. One example was a 10+% penalty for MULTICS-style, reverse stack due to indirection necessary. It's why I tell people to use a RISC system if they want to build security on. Ideally, microcode will come back with another HLL-to-microcode compiler so we can implement secure alternatives directly as instructions. One could also implement abstract machines like JVM's that way with no attacks through abstraction gaps. Not to mention efficiency and arbitrary atomic actions.

This sort of reminds me of stack virtual machines vs register based virtual machines.

That is the low level code (assembly/bytecode) of function calls in a programming language is very different depending on whether its stack or register based.

Usually I browse quickly the HN comments before reading (or skipping) an article, in this case I browse through the blog post first and then I thouhgt, "now the HN comments are going to teach me a lot, pointing out all the errors in this post", wasn't dissapointed.

"I was really surprised that assembly instructions make assumptions about how memory is organized and what information is in which registers."

This layout is a part of the specific architecture the compiler is generating the assembly code for so it better know where the stack pointer is stored on your platform :)

It's basically the same way as if you were manually writing Assembly code yourself - you would need to know what registers are used for what purpose on your platform.

Partly off-topic, but interesting, IMO:

This thread (the word "stack") triggered a memory of stack machines, which I had read about much earlier, in computer mags such as BYTE. Looked it up just now, and saw:


which says that the JVM, CPython and a couple of Ruby VMs are virtual stack machines.

Yet another wrinkle or dimension to this is the C vs Pascal calling conventions [1]. I remember from Win 32 SDK days (MS VC, Borland C, etc.), e.g.:

"long far pascal" as part of the type decl of functions.

[1] which was about the order in which function args get pushed onto the stack before the jump to the function is taken - L to R or R to L.

Used to wonder what non-tech people would think if they heard a programmer saying such stuff out aloud :)

Great post!

If you liked this kind of descent to the low-level realm, the post author (Julia Evans) did a great presentation on Pycon 2015:


Sometimes when shit hits the fan it's nice to know what is going on behind all those layers of abstraction.

Many ISAs do not provide any specific stack handling instructions, and yet a C ABI is based on a stack model very similar to x86 one. Instructions are irrelevant, ABI is only a convention.

Sure. But I think it's fair to say that a structure that's part of the ABI is more distinguished/fundamental than an ordinary datastructure defined in the language.

If a stack grows down how do you generally draw it on paper? Do you start with some arbitrarily high address like 8000000 and just subtract from it? What happens when you get to 0?

FPGAs are a way to avoid stacks too. E.g use c\ash to convert Haskell to a circuit.

Most modern FPGAs got two-port block rams. Ideal for stacks and FIFOs. Many non-trivial designs are using stacks one way or another.

That's a radically different design domain, though.

After reading about FP, TCO, the question shifted from `what` to `why`.

In case anyone is as confused as I was at first, agumonkey is referring to the fact that Functional Programming languages are capable of Tail Call Optimization, which allows functions to be composed without adding a new frame to the stack.

It's kind of interesting that TCO goes together with FP, because logically it makes at least as much sense for OOP, as Guy Steele pointed out at http://www.eighty-twenty.org/2011/10/01/oo-tail-calls.html

The concept of combining many values to get another value of the same type is the heart of FP - function composition is just the most prominent example of it. That definition of IntSet is effectively a cons list, a very FP-oriented data structure. It's not an example of OOP needing tail calls because it's not an example of OOP style (at least, if we believe that's something disjoint from functional style) at all.

I don't consider OO to be disjoint from functional; the first OO language was the untyped lambda calculus (as William Cook's said). In the taxonomy at http://www.paulgraham.com/reesoo.html we're looking at 2. protection and 6. "all you can do (with an object) is send a message", like how all you can do with a function is call it.

I don't agree that FP owns compositionality. In the 80s and 90s I used to have to argue for the value of functional programming; nowadays the closeminded attitudes come more from the "other side", and from one POV that's heartening progress, but from another it's silly to be running culture wars around technical ideas.

I've never seen an OO article / book that managed to demonstrate composition even at 1% that the average Haskell book does. I'm trying not to take sides too much, but I grew up in OO-land, left and keep getting mesmerized by what the FP guys have been doing. Most recently monadic thinking (I know, monads) but bind as an abstract composition operator is maddeningly beautiful. Although I've heard that Monads themselve don't compose ~_~;

It's great how Haskell and friends keep spreading, isn't it? Here is an OO book I like: http://www.erights.org/talks/thesis/

Thanks for decyphering cryptic acronyms. To add to this, Tail call makes you realize that logically, function call != pushing on stack. Then I started looking at how people handled recursion on hardware before stack support was added. And how people managed the stack explicitely themselves. (can't find that article again sadly).

Uh, lots of languages are capable of tail call optimizations, return value optimizations. Things are not pushed to stack unless optimizations are impossible I believe

Well done.

I feel old, though.


Rudely dismissive comments are not allowed here. We ban accounts that do this, so please don't.

If you know more than other people, politely help them learn. If you can't or don't want to do that, please don't comment.

Very informative, thanks for the summary!

I hoped to find something about "the stack" as in "full stack development" (DB, middleware, frontend etc) because _that_ stack also needs some sorting out.

That hipstor buzzword hardly deserves "THE" in front of it.

Applications are open for YC Winter 2022

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact