There are also real stack machines, where instructions take their operands from the stack and push results back onto it. The Burroughs B5000 and the English Electric LEO-Marconi KDF9 introduced that architecture in the early 1960s. It lives on in the x86 FPU register stack.
FORTRAN and COBOL originally forbade recursion. Compilers could allocate all temporary memory at compile time. Stack overflow is impossible. Some real-time systems still work that way.
And you can model it pretty well even without a "head pointer", just with moves between all registers, e.g. push(x) is ST(7) = ST(6), ST(6) = ST(5), ..., ST(1) = ST(0), ST(0) = x.
I've implemented it that way in a lifter for x86 machine code: if the uses of the pseudo-stack are balanced, all the redundant moves just disappear and you're left with traditional registers.
The Burroughs 5000 and the rest of that line had quite an interesting architecture. Arthur Sale once told me that he would use the B6700 architecture as a "universal counterexample" when talking about language conventions, such as C using zero as NULL.
Aside: it's interesting how complex the things I learned in my younger years have become. From understanding how a car operates and being able to perform maintenance on it (compare rebuilding a carburetor or adjusting the timing to fuel injection and computer-controlled distributorless electronic ignition, etc.), to this topic, computer architecture. I'm glad I first learned these things when they were simpler. I think they might seem daunting to start learning about today. Though I guess, like anything else, you start at the surface and keep digging till you're satisfied you understand the topic well enough. It's just that if you like to get to the bottom of those rabbit holes, they're all so much deeper now. Shrug.
p.s. I really enjoy reading the author's posts. They are often about something I'm familiar with, but it's neat to read someone writing about learning these things through fresh eyes. I also sometimes realize I don't know something as well as I thought. So thank you for sharing.
In college I met a guy who had co-authored Marathon, the first big hit for Bungie Studios. (This was Ryan Martell, not Jason Jones.) Ryan taught me how my C code compiled to assembly, and we were using 68K Macs back in that day, so it was pretty straightforward to understand once you had somebody to show you the basics.
Oh man, that was a revelation. Stepping through compiled code made concepts like pointers, stack/heap, registers, etc. just as clear as day. Those handful of hours with Ryan did more for my understanding of computing than the whole rest of my college courses. (Well, excepting my operating systems class, that one was good.)
Funny enough, my first job was working on an embedded operating system that didn't have source-level debugging, I had to debug in 68K assembly. So I'll just shout this out to the ether: thanks again, Ryan.
I had the advantage that I had a GUI text editor. While I'm now quite comfortable using Vim and command-line editors, I sometimes think that if I had tried to become a developer before the advent of GUI editors, the learning curve would have been too steep.
Perhaps under the hood things are more complex, but the interfaces have improved.
By 1998 I was convinced that every app ever would be online. Today I've come full circle, avoiding web dev like the plague (too many tools) and stick with desktop and phone apps.
For beginners, or those who work much higher up the technology chain, it provides a nice launching-off point for further investigation.
I find her style quite fun and engaging.
Technically correct, but not at the register/stack-management level: LLVM still controls all of that; Rust only chooses the order of the arguments and whether they are passed "directly" (in registers, or on the stack when registers are exhausted) or "indirectly" (as a pointer to the value, whether it's on the stack or somewhere else).
So the following statement, "which means it sets up its stack differently" is incorrect: even if Rust wanted to, it couldn't do so with the LLVM backend.
I think Rust used to do the same thing and use segmented stacks (rather than the C stack) until late 2013 or early 2014, at which point the language's base was drastically lowered (in the abstraction stack), away from built-in GC, green threads and the like.
It was managing its stack, but it does that even today: when creating a thread, it allocates an 8 MB stack.
Within that stack, however, function calls work as they do in C, and they've been like that effectively forever (although the OCaml bootstrap compiler might have had a non-LLVM x86 backend; I'm not sure about that).
Assuming I'm not mistaken about their existence, is there a reason Rust couldn't use one?
GHC and HiPE appear to have gotten themselves LLVM calling conventions, I guess.
But none of this can be customized further without modifying LLVM.
Besides, none of these choices seem to affect anything other than the (ordered) set of registers available for arguments, and the sets of caller/callee-save registers (i.e. what you expect from x86 calling conventions).
Once you get to the stack it's really all the same.
What I meant was that Rust could have picked any of those conventions, which would make it incompatible with C (it already is for some argument types, because of the simplistic handling around them), but it couldn't have used its own special one without adding it to every LLVM target it wants to support.
I also wouldn't really say assembly follows the C stack pointer. The C language makes no assumptions about stacks, stack pointers or return addresses. It just describes "functions" to which execution can be transferred and from which it can return. How this is implemented is up to the compiler designer and is platform dependent. I wouldn't be surprised to find out there were platforms without a stack or stack pointer but with a C compiler.
Why do you say "unlike on ARM"? In 25 years I've never seen anything but a full, descending stack used as the program stack. You can have ascending stacks, but that's more an artifact of the fact that `pop` and `push` (which didn't exist in the assembly language until Thumb) are just specific uses of the general multiple load/store instructions `LDM` and `STM`, which can "write back" an incremented or decremented base register.
This was the book I used in case folks are curious http://www.springer.com/us/book/9783642149085.
IA-64 is a von Neumann architecture, but it still has separate stacks. In fact, at least with clang with SafeStack enabled, you can have separate stacks independently of architecture.
Technically it's up to your app to use that memory as it pleases. Of course, 99.9999% of programs use the standard C library functions malloc() and free() (or "new" and "delete" in C++), which handle all this for you and keep track of which chunks of memory are in use. They're also cross-platform, so you don't have to worry about what the low-level memory allocation API is on Windows. But you don't have to use them if you really don't want to (though in that case, be careful calling shared library functions that might call malloc() behind your back).
Let's say you assign a variable in C, like so
int i = 0;
As long as it's a local variable inside a function, you're assigning a variable on the stack. A location on the stack is set aside and your variable goes in there.
This works great for small-ish things that you keep around to work on, but sometimes you want a big thing -- or you want something that lives on after the function dies.
In that case, you create something on the heap: you tell the computer to set aside a chunk of memory somewhere else -- the heap -- and let you store things there. You can keep using this location after your function has completed.
For an integer, it'd look like this.
int* j = malloc(sizeof(int));
So if you have a function creating bitmaps, you might have a lot of local (stack) variables, and you'd probably create or use a pointer to a big hunk of memory where you keep all the bits for the bitmap (heap).
One way of thinking about it: if you're using a stack variable, only one piece of information gets stored in the computer -- the value. If you're using a heap variable, you're actually creating and storing two pieces of information: the value (on the heap), and a pointer to the value (on the stack). (hat tip to Stack Overflow for the nice example http://stackoverflow.com/questions/6770596/memory-allocation... )
> No matter where you do this, you're assigning a variable
> on the stack.
The important insight is that if you have a piece of memory that's scoped to a function, it's so trivial to free that memory that even C does it for you automatically. This is important because of what happens when you want to take this one step further: is it possible to make this same property apply to memory that's allocated on the heap rather than the stack?
For one, there's escape analysis, which is a static analysis that tries to determine if a piece of heap-allocated memory has the potential to leave a function's scope. If not, then you can allocate that memory on the stack instead without changing the semantics of your program. Go does this, and it's an important optimization for reducing GC traffic.
A step beyond that is linear typing (or maybe call it unique ownership), which is a static analysis that extends the "automatic storage duration" concept with one additional feature: a function can now hand off a piece of memory for some other function to free at the end of its scope. With this concept it's now possible to extend automatic storage duration to heap-allocated values as well, meaning you need neither a garbage collector nor explicit `free` calls. Rust does this, and it's one of the fundamental pillars of the language.
When a program starts, it asks the operating system for memory to store the program and all its initial data. The heap is the part of memory nothing is using yet -- it's just a heap of memory over in the corner for you to use later if you need it.
As well, I think about storing data in 2 ways:
A. When you only need the data as long as the current function is executing, you declare a variable in that function, and the compiler includes it in the calculation of how big to make that function's stack frame. You use it, and it gets cleared out when the function is done.
B. When you need the data to exist after the current function is done executing, but don't want to pass the data itself around as returns and arguments for other functions, you can ask the OS to give you part of that heap of memory to store the data in. But clearing heap data requires attention. Since we don't have the simplicity of clearing it when the function finishes, we generally have to keep track of what's been stored on the heap and make sure to free it up when we no longer need it.
Some languages implement a garbage collection process to track and clear no-longer-needed heap memory for us, but really, the more you have on the heap, the less memory is available to other programs on the computer, which isn't cool.
If you remember back to the days of needing to restart Firefox after half a day because that one open tab was somehow using 1.8GB of your RAM, it's an example of what happens when a lot of things are allocated to the heap and aren't freed.
(1) This step is a bit more complicated due to dynamically-linked libraries
Although call stacks aren't the main topic, it gives a really nice outline of them
As for the heap, I noticed another description, but I'll have a go at describing it in a slightly different way. Back in the old days you had a big chunk of contiguous memory. It started at address 0 and ended at address 65535 (if you were lucky enough to have 64K of RAM!).
As you saw in the article, you need to set aside some of that memory for the stack. In most languages that are compiled to machine language, the stack holds storage for variables local to a function as well as the function return address (the object that holds all the information for one function is called a "stack frame"). As others have said, other organizations are possible, but this is the most common.
Allocating memory on the stack is ideal because it is really easy to allocate and deallocate. When you enter a new function, you add up the sizes of the local variables and simply move the "stack pointer" that many bytes (either up or down, depending on the architecture of the machine). When the function is finished, you simply move the stack pointer back to the start of that stack frame. Super easy and super fast.
The problem is that as soon as your function finishes, the variable is deallocated. So let's say you have a function that loads a picture from disk. You make a big array in your picture loading function, put the picture data in it and return it from the function... uh oh... As soon as the function ends, the memory is deallocated :-(. That data is "gone" (what's worse is that the data is still there, so it will appear to work until the stack grows big enough to overwrite that data -- at which point your picture will be corrupted). So we need a way to allocate memory that will stick around until we tell the system that it is not needed any more.
That's what the heap is. I can call a library function (usually called "malloc", which stands for "memory allocate") and say "I want 4K of memory". It finds a contiguous chunk of 4K of memory, makes a note that the memory is in use and then gives me a pointer to it. I can then use that pointer to write into the memory. The allocator will not hand out any memory in that area until I tell it to "free" the memory (usually with a call to something called "free"). After that it can be used again.
The place the system looks for in the memory for this kind of allocation is called the heap. So you can imagine the big block of memory in a computer is divided up into at least 2 sections: the stack and the heap. Often a computer system will put them at the opposite sides of the memory so that you can grow the stack as much as possible. So I might start the stack at 65535 and grow downward. And I might start allocating the heap from 0 upwards. In practice there are other things you need to reserve memory/addresses for, so this is not how it really works, but you get the idea.
Many programmers like allocating from the heap because it is easy. You don't have to worry about scope. You allocate a variable and it is there until you tell it to go away. But there are a couple of problems. First, if you forget to free the memory, it never goes away. This is called a "memory leak". Also, you might accidentally try to use the memory after you freed it, resulting in the same corruption noted earlier. But there is yet another problem.
Let's imagine that I allocated 1024 40 byte strings. That will take up 40K (almost all of my memory on this hypothetical old machine). Let's say that every second string is only needed at the beginning of the program, so let's free them all. That frees up 20K. However, the heap is now "fragmented". It has 40 bytes allocated, 40 bytes unallocated, 40 bytes allocated, 40 bytes unallocated... If I want to allocate 45 bytes, I will have to walk through 512 unallocated blocks, noting that each is too small, before I decide to allocate at the end. This is very slow. If there isn't a 45 byte chunk available, then my memory allocation will fail, even though I have 20K of free memory.
So, for many old timers like me, we favoured allocating on the stack. This leads to fast, efficient, safe code that is also surprisingly elegant. Unfortunately, it is also hard to design ;-)
These days, memory allocators and language runtimes do a lot behind the scenes to make sure that you don't fragment your heap too much, etc, etc. Modern interpreted languages use garbage collection and allocate on the heap incrementally -- as if it were stack allocation. Occasionally they wander through, deallocate unused objects and compact the heap. Some of the algorithms used are very sophisticated and quick.
I keep thinking about designing a functional language with stack allocation, though... It appeals to my roots ;-).
Hope that description helps. Keep in mind that many details will be incorrect for any specific system. I mostly just tried to make the description simple, not accurate.
You might be interested in concatenative programming, effectively a modern functional take on Forth-like languages. Factor is a mature, dynamically typed concatenative language, with an excellent interactive environment reminiscent of Smalltalk. I have also been working on a statically typed concatenative language for a while now.
The underlying OS call is VirtualAlloc on Windows and brk/sbrk/mmap on Linux.
The defining feature of heap-allocated memory is that it persists up the call stack; a common newbie error in C is to return a pointer to something allocated on the stack, which becomes invalid the instant the function returns: if a() calls b(), everything in b()'s section of the call stack is cleaned up when b() returns to a(), so if a() tries to dereference a pointer to something b() allocated on the stack, the program will likely crash or silently corrupt data.
The flip side is that programmers have to manage this by hand, which is the cause of memory leaks. Garbage collection is used to mitigate this; to, at the very least, turn a program from one with quickly-growing memory usage into one with slow-growing memory usage.
So System V came out in 1983, which postdates the introduction of the x86 architecture: the IBM PC was x86 from day one, and it came out in 1981. (The 8086 itself is older still -- 1978 -- but that doesn't really matter here.)
Of course, it isn't the only standard out there for x86. Microsoft exists, after all.
Anyway, all of these standards have to specify which registers get saved by the called function and which get clobbered (and, therefore, must be saved by the calling function if their values are important) and you can't know which registers will exist on a given ISA until it exists, so the specific System V ABI for the x86 couldn't possibly have existed until the chips did. Ditto the System V ABI for the x86-64, the MIPS, and all of the other ISAs for which there is such a document.
However, chip designers aren't stupid. They know that their chips will be used to run binaries generated by C compilers, and that C compilers like stack frames and registers. Therefore, modern ISAs are designed with those things in mind, and I wouldn't be surprised if at least a few chip designers worked with one eye on some ABI documentation.
(Compare modern chips with older minicomputer and mainframe ISAs, where call stacks weren't always available and where subroutine calls sometimes involved self-modifying code. The PDP-8, for example, did subroutine calls by writing the return address to the first (12-bit) word of memory in the subroutine, and then jumping to the address just past it; returning was a simple matter of jumping to a memory location loaded from a known address, but recursion was effectively impossible. The PDP-8 was a fine machine for assembly language programmers, but almost perversely unfriendly to modern mid-level languages, such as C and Pascal.)
Or at least the better documented ones.
System V R4, HP-UX, NetBSD, Linux, Solaris, AIX, IRIX, FreeBSD, OpenBSD, DragonFly BSD, OpenVMS, QNX, Syllable, MINIX, BeOS/Haiku, RISC OS, GP2X, Dreamcast, Gamecube, PSX, PS2, PS3 (maybe NetBSD or FreeBSD ish), PS4(based on FreeBSD), AmigaOS 4, Morph OS, AROS, Open Firmware, and probably a whole bunch of other systems use ELF binaries.
Programmer survey: do you visualize consecutive memory locations as running left-to-right, right-to-left, top-to-bottom, or bottom-to-top?
I find it pretty amazing how modern "programmers" are lacking in knowledge of basic computer science concepts. Let me nit-pick this statement as it is obviously wrong.
The x86 assembly language knows ABSOLUTELY nothing about the "C stack pointer and how the stack in C is laid out". I will now illustrate why the first part of this statement is wrong.
You can have a bare-metal x86 device that is programmed only in assembly, and push/pop would still be available. You are clearly confusing the roles of the compiler, the operating system, and the language your program is written in.

On a modern operating system like Linux, each program, when started, runs in a new process, which has its own view of memory due to the virtual memory system. Each program has its view of memory split into several sections, and one of them is allocated as "stack space" (usually at the high memory addresses, but this is architecture- and/or OS-dependent). When the program is executed, %rsp is set up appropriately by the OS, and on context switches the OS saves/restores it. Moreover, the actual address of your stack in real memory is different for every program, since each process has its own mapping of virtual to physical memory, again provided by the OS.
Now, the second part about the "layout of the stack". The language you write your program in (in this case C) knows ABSOLUTELY nothing about the layout of "the stack". However, when you give your source code to the compiler, it will analyse it and, because C is low-level and statically typed, will infer HOW it actually needs to lay out the stack FOR you. The output of the compiler is assembly code, and the assembly code produced "knows" about the layout of the stack because it has been written that way, generated by the compiler. It is, actually, the compiler that knows about all of this, not the source or assembly language.
Also, the example with Python was a bad one - Python is an interpreted language, and AFAIK it has no concept of a stack, and even if it does, it is the interpreter that allocates frames for its own computations, not the program that you are running. And, moreover, even if Python does allocate stack frames, they are most probably on the heap, since it needs to do so dynamically (again, since it's not compiled but interpreted).
Here, I could say, "I find it pretty amazing that old-timey 'programmers' fail to understand modern VM architecture". In fact, Python's interpreter (cpython) is partly based on the concept of a stack.
But instead, I'll just explain that most modern VMs are either stack-based or register-based. Stack-based VMs, like Python's, are (not surprisingly) built around a virtual machine that uses a stack as it interprets Python's bytecode. Yes, not the same level of abstraction as the call stack, but it's still a stack.
So instead of expressing dismay at how little modern programmers know about your little corner of the programming world (low-level systems programming), why not spend some time pondering about how so much programming has moved up the stack, and how abstractions have grown multifold.
In the 1970s and 1980s, the typical self-taught programmer had to know all the details of low-level C and assembly, because that was almost all there was to know, unless you were a high-level Lisp hacker.
I mean, I'm a little surprised at how little some C kernel hackers know about how the modern Web works (particularly regarding async programming patterns; e.g., one once asked me why I didn't just use a mutex somewhere in a JS program, and he wasn't trolling). But then I realize that there's just too much for any but the most exceptional programmers to truly understand the full modern stack, from system architecture, to GPU, to networking, to Web, and so on.
Why not instead welcome how this particular web-focused programmer is interested in these low-level concepts? Is it that it wouldn't give you as much of a chance to show off your knowledge?
1. Said by another old-timey programmer.
Applications come and go, so there's no reason to feel amazed that not everyone knows how the Python VM works - especially considering the details aren't considered a core part of the Python curriculum, and they're almost completely inaccessible to a typical Python user.
But not understanding the heap or the stack has real implications for application performance and stability in whatever language or VM you're working in. The abstractions have gotten more abstract, but developers who no longer understand what happens when you run out of memory, or have a buffer under/overflow, or your stack explodes because you're using recursion without tail calls, are definitely a bad thing.
You seem to be arguing that anyone who does web or VM dev doesn't really need to know what happens inside a computer. I think that's an unhelpful view, because it implies that all those abstractions "just work" and no one who uses them needs to care how.
Do they really "just work"? I think the evidence suggests they don't. Perhaps if developers understood hardware better that could be improved.
> You seem to be arguing that anyone who does web or VM dev doesn't really need to know what happens inside a computer.
That's not my argument. I highly recommend new programmers to understand as much of machine architecture as reasonable. I'm just arguing that there's no place for a systems programmer to be so dismissive toward a web programmer because he/she doesn't know his part of the stack as well as he does. There is always going to be a gap in knowledge about low-level programming between these two specializations. That gap is no reflection on the intelligence or competence of the higher-level programmer.
In this particular case, I believe that Julie has only been programming for a few years. If that's the case, and assuming she only retains a fraction of what she writes about, she's further along than most of us were at that point (me included). Just because many of us bluff and bluster our way through interactions with our peers, and Julie is up-front about her state of knowledge, doesn't mean that she's an inferior programmer (at all). I suspect her honest curiosity will carry her along to the top of our profession in due time.
So if the parent had written something like, "it's so important for new programmers to work on this type of material, and it's heartening to see these kinds of blog posts from younger programmers", then I would have applauded.
That also would have been more in line with the HN guidelines ("avoid gratuitous negativity"), rather than the approach that was actually taken.
^ If you want to advance the claim that Lua, Python, and JS implementations aren't VMs, but rather interpreters, you'll need to convince the likes of Mike Pall and Lars Bak that they're not working on a VM (both have explicitly referred to their creations as "VMs").
Especially given that they've been "modern" since around the 1970s, not changing much since then (tracing JITs are likely the only significant improvement).
> how little some C kernel hackers know about how the modern Web works
Uhm... Screw the web. It's boring and irrelevant. It's not really worth knowing. It's better to stay away from it, for the sake of sanity.
> async programming patterns
Had been around long, long before all that web crap.
IMO this is the natural evolution of programming. It has gotten remarkably easier to write complex programs that deliver real value to society. How fantastic is it that modern programmers don't need to understand the science of computers in order to be productive?
Yes, we will always need science and engineering and folks who understand the underlying technology. But the field is broadening into new places. Like it or not, when uncle Fred writes an excel macro, he's programming. And good for him!
Concepts you refer to such as "assembly language" exist a level up from bare metal, and at the same level as object libraries, linkers, and loaders and that implies full cooperation WRT calling conventions.
b) We have 32 general purpose registers - the ABI defines a certain range of registers (8 general purpose registers + a bunch of floating point and vector registers) for passing parameters
c) The register range allocated for passing parameters is designated as "volatile", so the caller can't rely on them coming back unmodified. This gives you a few registers to play with.
(NB: I'm not hugely experienced in PPC assembly programming, I've just had to touch it a couple of times and so I've read bits and pieces of the ABI, don't trust what I say here...)
Also, the architecture isn't exactly the one to blame for this; it's more dependent on the compiler's calling conventions. I see things in the arch that facilitate this, but it seems possible to implement this mechanism with or without that arch support. I've only seen certain registers being reserved for something in the compiler docs, not the MCU docs.
Eh, maybe I'm rambling.
My understanding is that PPC encourages register based parameter passing just by virtue of having a tonne of registers.
Exactly the problem that everyone runs into when coming up with more secure alternatives to C or the existing stack. One example was a 10+% penalty for a MULTICS-style reverse stack, due to the indirection necessary. It's why I tell people to use a RISC system if they want to build security on top. Ideally, microcode will come back with another HLL-to-microcode compiler so we can implement secure alternatives directly as instructions. One could also implement abstract machines like the JVM that way, with no attacks through abstraction gaps. Not to mention efficiency and arbitrary atomic actions.
That is, the low-level code (assembly/bytecode) of function calls in a programming language is very different depending on whether it's stack-based or register-based.
This layout is a part of the specific architecture the compiler is generating the assembly code for so it better know where the stack pointer is stored on your platform :)
It's basically the same way as if you were manually writing Assembly code yourself - you would need to know what registers are used for what purpose on your platform.
This thread (the word "stack") triggered a memory of stack machines, which I had read about much earlier, in computer mags such as BYTE. Looked it up just now, and saw:
which says that the JVM, CPython and a couple of Ruby VMs are virtual stack machines.
"long far pascal" as part of the type decl of functions.
 which was about the order in which function args get pushed onto the stack before the jump to the function is taken - L to R or R to L.
Used to wonder what non-tech people would think if they heard a programmer saying such stuff out aloud :)
If you liked this kind of descent to the low-level realm, the post author (Julia Evans) did a great presentation on Pycon 2015:
Sometimes when shit hits the fan it's nice to know what is going on behind all those layers of abstraction.
I don't agree that FP owns compositionality. In the '80s and '90s I used to have to argue for the value of functional programming; nowadays the close-minded attitudes come more from the "other side", and from one POV that's heartening progress, but from another it's silly to be running culture wars around technical ideas.
I feel old, though.
If you know more than other people, politely help them learn. If you can't or don't want to do that, please don't comment.