Segmentation violation (core dumped)
Actually "violation" sounds much clearer to me. It's telling me that the code I'm running does something that was not part of the contract. With "fault"... well, it's someone's fault... probably someone else's fault... who knows what happened... ¯\_(ツ)_/¯
I wouldn't be surprised if it turned out that "fault" was picked because it sounds smoother to managerial ears.
[EDIT] fixed a typo.
Neither PDP-7 (original Unix machine) nor PDP-11 had segmented memory. Segmentation was fundamental in the design of Multics, and I suspect the term, if not the signal itself of course, was carried over from there. Memory segmentation was pretty common back then, though far from universal.
By the way, there were a lot of machines under the "PDP" umbrella with some very different architectures. The PDP-1, -7, and -9 had 18-bit words; the PDP-5, -8 and -12 had 12-bit words; the PDP-6 and -10 (the latter later rebranded as the DECSYSTEM-20) had 18-bit addresses and 36-bit words; and the PDP-11/LSI-11 had 16-bit words (this line was later pumped up into the 32-bit VAX). Unix only really ran on the 11s (and VAXes, which BTW gave us the MIPS metric) apart from the one PDP-7 development machine. We used a variety of OSes on all those machines, often homegrown.
In those days it was quite common for a company to write their own OS and programming language just for their computers, and for their customers to do the same!
So "segmentation fault" is a weird combination of terms.
And of course "core" is from "magnetic core memory" that nobody uses any more.
The SIGSEGV constant, having come from AT&T Unix, propagated into other variants for the sake of source code compatibility, even though those other variants didn't use segmented memory.
I suspect that what happened was that hackers not working with segmented memory at all somehow adopted the "segmentation" term from the AT&T code and documentation, but did not warm up to the "violation" part, sticking with the "fault" terminology of their paged world.
The "Segmentation fault" string you see displayed by the shell comes from the strsignal function that maps signals to descriptive strings. I think that originated in BSD Unix. "(core dumped)" is locally generated by the shell if that flag is true in the process status.
Bash internationalizes that with gettext. Here it is in Hungarian:
po/hr.po:msgid " (core dumped)"
po/hr.po-msgstr " (jezgra izbačena)"
po/eo.po:msgid " (core dumped)"
po/eo.po-msgstr "(nekropsio elŝutita)"
And yeah, a "necropsy" (synonym for autopsy) definitely sounds cooler than a "dump"...
Otherwise: great summary, almost more interesting (and more compact) than the original blog post!
And the translation is abysmal: "jezgra" = "core" (of nuclear reactor) or "nucleus" (of an atom); the closest translation for "izbačena" is "ejected". I believe that "core" comes originally from memory technology of the time (https://en.wikipedia.org/wiki/Magnetic-core_memory) so it should have been possible to find a more meaningful translation.
EDIT: As a native Croatian speaker, I have NEVER used any program in Croatian locale. It is simply unintelligible. Funnily, I find Norwegian (where I live now) translations more approachable.
You are probably more accustomed to Norwegian translations because you never learned Norwegian IT-speak before the translations appeared, whereas I imagine you grew up on English interfaces while speaking Croatian (we term that Srblish, maybe you've got Croglish :)).
> "избацити смеће" (izbaciti smeće, "to throw out the garbage") in Serbian, so I am surprised it's such a stretch in Croatian
But that's exactly it! You "izbacis smece" ("throw out the garbage") and you don't care what happens to it after. Likewise with "izbacivac" (a bouncer): he throws a person out and doesn't care what the person does afterwards. You don't "izbacis" something FOR someone to use afterwards. That's why the translation bothers me.
Whereas the kernel writes the core FOR THE USER to inspect it, back it up, delete it, whatever.
The closest word I can come up with for English "dump" https://www.merriam-webster.com/dictionary/dump is "ostaviti" ("leave around") or "(is)pustiti" (in the meaning of "drop", not "flushing the toilet" :D). A better literal translation would thus be "Jezgra pustena." ("Core released.") (Like a space probe is "pustena" into space and there it goes.) I guess the authors originally used "dump" because it's kinda random (system-wide config) where it ends up.
The most meaningful translation to Croatian would be, IMHO, "Memorija sacuvana." [or "Memorija spremljena."] ("Memory preserved".)
> I imagine you grew up on English interfaces while speaking Croatian
Indeed. Some professors at the university did use Croatian terms, but everything about it was awfully alien to me. "Thread" would be "dretva". Which is ironic, as the professor explained that it's borrowed and mangled from German, whereas we have a perfectly valid, even nice, Croatian word "nit" that is literally "thread".
If anything, the translation is too literal, and you are advocating for a better translation for the actual action. It's a common complaint, but it's a hard balance to strike: literal translation is (usually) easier to translate back (not the case here), but a translation that is more descriptive is easier to understand.
I think I am in the same boat as you, in that I usually prefer descriptive translations when a literal one uses a metaphor or concept that's alien to the local culture.
Still, in this particular case, I'd use a less commonly used term (I imagine spremiti/sačuvati is also used for "save") like "zapisana" ("written down") or "zabeležena" ("recorded"), just so there's a better chance to keep a 1-1 mapping between English and the Croato-Serbo-Bosnian-Montenegrin language.
What English has mostly done is keep using old concepts (like "core" to represent "memory") or repurpose seldom-used words, which I always found intriguing. Such an approach would require some re-learning for us who grew up on English IT terminology, but every profession has a specialised terminology like that too (I like to bring up the example of maths, where in Serbian it's "integral" and "izvod" for integral and derivative: always try coming up with a good native word, and if it _is_ good, it will stick :)).
Note that people translating free software are usually volunteers working without much local support, and without an established vocabulary for all these specialised terms, so they will frequently come up with awkward translations.
But this is exactly the point, it's hard work, it's not always done by people who understand the actual concepts or history, and they are doing it in their spare time. Imagine spending this much time and saying "I translated one message".
Thus, I'd encourage you to contribute your suggestions upstream :)
"In confidence": I miss the days when there was only Serbo-Croatian and Croato-Serbian :D (I grew up in Yugoslavia and still remember Cyrillic.) Politicizing the languages is just f*ed up. It says enough that I understand "urban" (i.e. newspapers/TV) Serbian better than heavy Croatian dialects from Dalmatia, Istra or Zagorje :p
> integral and izvod
Wow, "izvod" is really nice, I like it :D We used just "derivacija". If I had to guess what some eager translator would translate "integral" to, it'd be something like "ocjeljivanje" :D
Sad to say that's also not Estonian -- it's Esperanto. :-)
can confirm SEGV is in v7 signal.h from 1979:
where it is happily #defined as 11, and continues as such to this day:
(etc, in the other BSD-derived systems)
if anyone wants to dig further.
Thus, I always considered "paging" a shorthand for "swapping of memory pages".
(so does Hungarian, as sibling has pointed out)
It's a very nuanced and structured terminology, but it does make sense after reading all those hefty and well-written manuals. People like to use those volumes as monitor risers these days, but they are often true standards of quality technical writing. I was once so amazed by the clarity that the VMS doc volume migrated to my shelf, in exchange for some old conference proceedings volume of the same heft.
There's also the tennis fault. It's a noun corresponding to the adjective "faulty."
If the program wishes to do something different, it can register a handler, and you won't see the message. If you see the message, there was a violation. If you don't see the message... well, there was still a violation, but it got handled.
Depending on the program, a segmentation fault may be routine. That’s why it doesn’t make sense to call it a violation.
I'm not the only one who says this; look at the name: it ends in -V.
edit: actually, though hardware support is required for certain OS features, it's the OS that sets up the segmentation and the fault handlers so ... it is ultimately an OS contract.
The contract with the OS is that if you access unmapped memory, your program is sent SIGSEGV. Just like the contract with open() is that it returns -1 if the file is not found.
Also reminds me of this: https://news.ycombinator.com/item?id=4157777
Sounds like something from Neuromancer
An interrupt and a CPU exception are different things. UNIX treats them similarly because the PDP-11 did. An interrupt is something outside the CPU wanting to be serviced, like an I/O completion. An interrupt can be deferred during a critical section, which is what "disabling interrupts" does. Some machines direct interrupts to one of many CPUs, so whoever is free can handle the I/O. Interrupts have priorities and queuing, and are handled like events on a queue.
A hardware exception is the CPU doing something that stops execution. Inaccessible memory - could be the need to page something in from disk, or a program error. The OS has to decide that. Floating point overflow. Divide by zero. An illegal instruction. The CPU can't continue. So exceptions cannot be deferred, even if in a critical section. The CPU that raised the exception must handle the exception; it can't be handled by another CPU.
UNIX/Linux signals are rarely used for I/O completions in user space, but that is supported. See "aio". Apparently Oracle uses this.
NMIs are an example of interrupts from outside the CPU that can't be deferred, which cuts against the distinction you're making.
Additionally software interrupts are an example of interrupts that come from user space and can't be deferred from user space's perspective, but must be handled before their instruction stream continues.
You can also see processors like slave DSPs whose exceptions are routed to other processors to be handled just like any other interrupt on that other core. The N64's RSP and the Cell's SPEs are great examples of this.
You gave the example of AIO for delivering peripheral I/O completions to user space like an interrupt (and it's used by more than just Oracle), but the classic example is SIGALRM as the corollary to a timer interrupt.
This is not a Unix/PDP-11 thing, but pretty much every hardware arch and every OS out there. I say this as someone who's ported a non Unix derived RTOS to MIPS, PowerPC, ARMv7A, ARMv7M, ARMv8-A64, X86_64 linux user mode, Microblaze, and SH4, and has written drivers for Linux, FreeBSD, Windows CE, Windows NT, and that aforementioned RTOS.
That's more of a support processor thing, where the special-purpose processor doesn't really do interrupts. GPU exceptions usually create interrupts in the controlling CPU, for example, rather than being handled within the GPU. (How the GPUs should talk to the CPUs is a whole subject in its own right.)
Timer interrupts are usually deferrable.
The Cell. Is it totally gone now? (If they'd had, say, 16MB/SPE instead of 256K, it might have been good for something.)
I would say that's out of date. GPU exceptions typically don't exist for most shader code (unmapped memory loads are just RAZ, division by zero is defined and doesn't trap, etc.). For the ones that do exist, they're typically handled on GPU these days for latency reasons, but that's just a config register to route it externally or not.
> Timer interrupts are usually deferrable.
I didn't say they weren't. Just like SIGALRM can be masked.
> The Cell. Is it totally gone now? (If they'd had, say, 16MB/SPE instead of 256K, it might have been good for something.)
16MB would have never made sense. You're only supposed to keep the working set in memory, and 64x the amount of memory was never in the cards from a gate count perspective.
Terrifying but fun!
Next C statement is pretty cute, though.
I mean, it's not wrong, but it is crazy.
What you really need to do is figure out which instruction was executing and increment RIP appropriately.
colleague: "Caller says she's getting an error 'No Resumé' ?!?"
us: ... huh?..... it is a document management system, but still ...
me: Oh! On Error No Resumé
us: much hilarity. No Resumé indeed.
Beyond that there are sometimes very valid reasons for allowing segfaults to occur in certain conditions and catching/patching them. For instance in an emulator's dynamic recompiler you could optimize your generated code by assuming that most memory accesses target the emulated RAM region (generally a reasonable assuption). Then you map the RAM buffer in such a way that if it turns out that the emulated program was actually attempting to access an address outside of RAM a memory fault occurs, which you can then catch and recompile the offending code block with a slower but more comprehensive address decode.
Yeah yeah yeah but this ALSO fixes use-after-free bugs! Really an amazing little trick, I wonder why compilers don't just do it automatically.
ON ERROR RESUME NEXT
If you are using an Intel chip, they still do.
I don't recall exactly, but I don't think segmentation was heavily used after 2000 or so; it doesn't really do a lot for you if you have page tables.
(Having said that, I remember optimizing thread local storage away by explicit pointers some time ago in my code, because it was calling some function to get the address constantly, so maybe there are some subtleties there)
A lot of old OpenGL code uses 'hither' and 'yon' for the near and far clipping plane for this reason. :)
(Also funny how the article says "this is from around 1978" when the date on the listing says May 24 1976)
Also, when creating abbreviations, it feels weird to create one that is only one letter shorter than the full version.
This is Unix, which gave us the "creat" system call (an abbreviation of "create"). https://man7.org/linux/man-pages/man2/creat.2.html
> Ken Thompson was once asked what he would do differently if he were redesigning the UNIX system. His reply: "I'd spell creat with an e."

Kernighan, Brian W.; Pike, Rob (1984). The UNIX Programming Environment. Prentice-Hall, p. 204. ISBN 0139376992. OCLC 10269821.
It's not that far-fetched; there's a Referer header, after all.