
Why is there a “V” in SIGSEGV Segmentation Fault? - caution
https://blog.cloudflare.com/why-is-there-a-v-in-sigsegv-segmentation-fault/
======
dzsekijo
Well, some aspects are still not clear. If this thing was originally called
"segmentation violation", who switched to calling it "segmentation fault", and
when? Why don't we get

    
    
        Segmentation violation (core dumped)
    

when this thing fires?

Actually "violation" sounds much clearer to me. It's telling me that the code
I'm running does something that was not part of the contract. With "fault"...
well, it's someone's fault... probably someone else's fault... who knows what
happened... ¯\\_(ツ)_/¯

I wouldn't be surprised if it was found out to sound smoother to managerial
ears.

~~~
kazinator
Actually, the "segmentation" comes from segmented memory management, which was
integrated into AT&T Unix. BSD went with paged virtual memory, and there we
get "fault" from "page fault", which is an often correctable situation that is
invisible to the application, but is sometimes an access violation.

So "segmentation fault" is a weird combination of terms.

And of course "core" is from "magnetic core memory" that nobody uses any more.

The SIGSEGV constant having come from AT&T Unix propagated into other
variants, for the sake of source code compatibility, even though those other
variants didn't use segmented memory.

I suspect that what happened was that hackers not working with segmented
memory at all somehow adopted the "segmentation" term from the AT&T code and
documentation, but did not warm up to the "violation" part, sticking with the
"fault" terminology of their paged world.

The "Segmentation fault" string you see displayed by the shell comes from the
strsignal function that maps signals to descriptive strings. I think that
originated in BSD Unix. "(core dumped)" is locally generated by the shell if
that flag is true in the process status.
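
As a rough illustration of those two pieces, here is a minimal Python sketch (Python is just the language picked for this example, not something from the thread). `signal.strsignal` wraps the C `strsignal` function, and `os.WCOREDUMP` reads the core-dump flag from the wait status. The helper names `describe_status` and `run_and_describe` are made up for the sketch, and it assumes a Unix-like system with `fork`.

```python
import os
import signal


def describe_status(status: int) -> str:
    """Turn a raw wait status into a shell-style message."""
    if os.WIFSIGNALED(status):
        # strsignal maps the signal number to its descriptive string.
        text = signal.strsignal(os.WTERMSIG(status)) or "Unknown signal"
        # The suffix is added by the reporting side, only if the flag is set.
        if os.WCOREDUMP(status):
            text += " (core dumped)"
        return text
    return f"exited with status {os.WEXITSTATUS(status)}"


def run_and_describe(signum: int) -> str:
    """Fork a child that dies from `signum`, then describe how it ended."""
    pid = os.fork()
    if pid == 0:
        # Child: make sure the default action applies, then signal ourselves.
        signal.signal(signum, signal.SIG_DFL)
        os.kill(os.getpid(), signum)
        os._exit(1)  # only reached if the signal did not terminate the child
    _, status = os.waitpid(pid, 0)
    return describe_status(status)


if __name__ == "__main__":
    print(run_and_describe(signal.SIGSEGV))
```

Whether the " (core dumped)" suffix shows up depends on the system's core-dump settings, which is why the sketch checks the flag rather than assuming it.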

Bash internationalizes that with gettext. Here it is in Hungarian:

    
    
      po/hr.po:msgid " (core dumped)"
      po/hr.po-msgstr " (jezgra izbačena)"
    

Esperanto gets a cool one:

    
    
      po/eo.po:msgid " (core dumped)"
      po/eo.po-msgstr "(nekropsio elŝutita)"
    

Trivia: very early Linux kernel versions used 80386 segments for processes.

~~~
rob74
"hr" is Croatian, Hungarian would be "hu" (and no doubt it sounds cool too).

And yeah, a "necropsy" (synonym for autopsy) definitely sounds cooler than a
"dump"...

Otherwise: great summary, almost more interesting (and more compact) than the
original blog post!

~~~
zvrba
> "hr" is Croatian, Hungarian would be "hu" (and no doubt it sounds cool too).

And the translation is abysmal: "jezgra" = "core" (of nuclear reactor) or
"nucleus" (of an atom); the closest translation for "izbačena" is "ejected". I
believe that "core" comes originally from the memory technology of the time
([https://en.wikipedia.org/wiki/Magnetic-core_memory](https://en.wikipedia.org/wiki/Magnetic-core_memory)),
so it should have been possible to find a more meaningful translation.

EDIT: As a native Croatian speaker, I have NEVER used any program in Croatian
locale. It is simply unintelligible. Funnily, I find Norwegian (where I live
now) translations more approachable.

~~~
necovek
The goal of "izbačena" here is clearly to indicate that the memory has been
stored elsewhere. FWIW, "dumping trash" would be readily translated to
"избацити смеће" (izbaciti smeće) in Serbian, so I am surprised it's such a
stretch in Croatian. Perhaps the Croatian translation was based on the
Serbian one.

You are probably more accustomed to Norwegian translations because you never
learned Norwegian IT-speak before the translations appeared, whereas I imagine
you grew up on English interfaces while speaking Croatian (we term that
Srblish, maybe you've got Croglish :)).

~~~
zvrba
Croatian "izbaciti" is closest to this meaning:
[https://www.merriam-webster.com/dictionary/eject](https://www.merriam-webster.com/dictionary/eject)
or the phrase "throw out". For example, an "izbacivac" [sorry, haven't bothered
with setting up HR keys] is the guy who throws people who are too drunk out of
a club.

> "избацити смеће" (izbaciti smeće) in Serbian, so I am surprised it's such a
> stretch in Croatian

But that's exactly it! You "izbacis smece" and you don't care what happens to
it after. Likewise with "izbacivac", he throws out a person and doesn't care
what the person does afterwards. You don't "izbacis" something FOR someone to
use it after. That's why the translation is bothering me.

Whereas the kernel writes the core FOR THE USER to inspect it, back it up,
delete it, whatever.

The closest word I can come up with for English "dump"
[https://www.merriam-webster.com/dictionary/dump](https://www.merriam-webster.com/dictionary/dump)
is "ostaviti" ("leave around") or "(is)pustiti" (in the meaning of "drop",
not "flushing the toilet" :D). A better literal translation would thus be
"Jezgra pustena." (Like a space probe is "pustena" into space and there it
goes.) I guess the authors originally used "dump" because it's kinda random
(system-wide config) where it ends up.

The most meaningful translation to Croatian would be, IMHO, "Memorija
sacuvana." [or "Memorija spremljena."] ("Memory preserved".)

> I imagine you grew up on English interfaces while speaking Croatian

Indeed. Some professors at the university did use Croatian terms, but
everything about it was awfully alien to me. "Thread" would be "dretva". Which
is ironic, as the professor explained that it's borrowed and mangled from
German, whereas we have a perfectly valid, even nice, Croatian word "nit" that
is literally "thread".

~~~
necovek
Those are all great points, but in English, "dumping" is also used when you
want to get rid of something/someone (dump a boyfriend, dump into trash...).

If anything, the translation is too literal, and you are advocating for a
better translation for the actual action. It's a common complaint, but it's a
hard balance to strike: literal translation is (usually) easier to translate
back (not the case here), but a translation that is more descriptive is easier
to understand.

I think I am in the same boat as you, in that I usually prefer descriptive
translations when a literal one uses a metaphor or concept that's alien to
the local culture.

Still, in this particular case, I'd use a less commonly used term (I imagine
spremiti/sačuvati is also used for "save") like "zapisana" or "zabeležena",
just so there's a better chance to keep 1-1 mapping between English and
Croato-serbo-bosnian-montenegrin language.

What English has mostly done was keep using old concepts (like "core" to
represent "memory"), or repurposed seldom used words which I always found
intriguing. Such an approach would require some re-learning for us who grew up on
English IT terminology, but every profession has a specialised terminology
like that too (I like to bring up the example of maths, where in Serbian it's
integral and izvod for integral and differential: always try coming up with a
good native word, and if it _is_ good, it will stick :)).

Note that people translating free software are usually volunteers working
without much local support, and without an established vocabulary for all
these specialised terms, so they will frequently come up with awkward
translations.

~~~
necovek
I think I posted while you edited your post so you added a couple of good
examples for a better literal translation.

But this is exactly the point, it's hard work, it's not always done by people
who understand the actual concepts or history, and they are doing it in their
spare time. Imagine spending this much time and saying "I translated one
message".

Thus, I'd encourage you to contribute your suggestions upstream :)

------
coldpie
Never thought of that solution to segfaults before. Great trick for writing
bug-free programs, going to go integrate that into all my code now.

~~~
tom_mellior
You're joking, but this can be used in a semi-practical way to keep programs
alive and mostly functioning, see
[http://people.csail.mit.edu/rinard/paper/pldi14.pdf](http://people.csail.mit.edu/rinard/paper/pldi14.pdf)
for instance. The idea here is that you catch certain faulting operations and
drop/fix them: segfaulting store? ignore! segfaulting read? manufacture a
result value of 0, it's usually not too wrong. And by using LD_PRELOAD magic,
this can even be retrofitted onto existing applications without changing or
recompiling them.
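
To make the drop/fix policy concrete, here is a toy Python model. It is only an analogy: it does not catch SIGSEGV or use LD_PRELOAD, and the `ObliviousBuffer` class is a hypothetical name invented for this sketch. It just applies the same two rules (ignore the bad store, manufacture a 0 for the bad read) to out-of-range accesses on a fixed-size buffer.

```python
class ObliviousBuffer:
    """Fixed-size buffer that tolerates invalid accesses instead of failing."""

    def __init__(self, size: int) -> None:
        self._cells = [0] * size
        self.dropped_writes = 0
        self.manufactured_reads = 0

    def _valid(self, index: int) -> bool:
        # Negative indexes are treated as invalid, unlike a plain Python list.
        return 0 <= index < len(self._cells)

    def __getitem__(self, index: int) -> int:
        if not self._valid(index):
            # "Segfaulting read": manufacture a value of 0 and carry on.
            self.manufactured_reads += 1
            return 0
        return self._cells[index]

    def __setitem__(self, index: int, value: int) -> None:
        if not self._valid(index):
            # "Segfaulting store": silently drop it.
            self.dropped_writes += 1
            return
        self._cells[index] = value


if __name__ == "__main__":
    buf = ObliviousBuffer(4)
    buf[2] = 7    # valid store
    buf[10] = 99  # out of range, dropped
    print(buf[2], buf[10], buf.dropped_writes, buf.manufactured_reads)
```

The counters are there so the program can at least report how often it papered over a fault, which seems like the minimum you would want before trusting the results.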

~~~
RMPR
Signals always seem (at least to me) to be an early implementation of
exceptions

~~~
monocasa
Sort of? They're really an implementation of interrupts, but sitting on the
kernel/user boundary rather than the hardware/kernel boundary. It's a holdover
from when a process was really thought of as closer to a virtualized
computer rather than a distinct concept in its own right. And it's not
uncommon for the interrupts managing CPU faults to be called exceptions
[https://wiki.osdev.org/Exceptions](https://wiki.osdev.org/Exceptions) , so
their nomenclature does converge if you squint hard enough.

~~~
Animats
_"It's not uncommon for the interrupts managing CPU faults to be called
exceptions, so their nomenclature does converge..."_

An interrupt and a CPU exception are different things. UNIX treats them
similarly because the PDP-11 did. An interrupt is something outside the CPU
wanting to be serviced, like an I/O completion. An interrupt can be deferred
during a critical section, which is what "preventing interrupts" does. Some
machines direct interrupts to one of many CPUs, so whoever is free can handle
I/O. Interrupts have priorities, queuing, and are handled like events on a
queue.

A hardware exception is the CPU doing something that stops execution.
Inaccessible memory - could be the need to page something in from disk, or a
program error. The OS has to decide that. Floating point overflow. Divide by
zero. An illegal instruction. The CPU can't continue. So exceptions cannot be
deferred, even if in a critical section. The CPU that raised the exception
must handle the exception; it can't be handled by another CPU.

UNIX/Linux signals are rarely used for I/O completions in user space, but that
is supported. See "aio".[1] Apparently Oracle uses this.

[1] [https://man7.org/linux/man-pages/man7/aio.7.html](https://man7.org/linux/man-pages/man7/aio.7.html)

~~~
monocasa
CPU exceptions are very much a type of interrupt (and vice versa).

You can see NMIs for examples of interrupts outside the CPU that can't be
deferred like the distinction you're making.

Additionally software interrupts are an example of interrupts that come from
user space and can't be deferred from user space's perspective, but must be
handled before their instruction stream continues.

You can also see processors like slave DSPs whose exceptions are routed to
other processors to be handled just like any other interrupt on that other
core. The N64's RSP, and the Cell's SPEs are great examples of this.

You gave the example of AIO for peripheral interruption to user space, like an
interrupt (which is used by more than just Oracle), but the classic example
is SIGALRM as a corollary to a timer interrupt.

This is not a Unix/PDP-11 thing, but pretty much every hardware arch and every
OS out there. I say this as someone who's ported a non Unix derived RTOS to
MIPS, PowerPC, ARMv7A, ARMv7M, ARMv8-A64, X86_64 linux user mode, Microblaze,
and SH4, and has written drivers for Linux, FreeBSD, Windows CE, Windows NT,
and that aforementioned RTOS.

~~~
Animats
_You can also see processors like slave DSPs whose exceptions are routed to
other processors to be handled just like any other interrupt on that other
core. The N64's RSP, and the Cell's SPEs are great examples of this._

That's more of a support processor thing, where the special-purpose processor
doesn't really do interrupts. GPU exceptions usually create interrupts in the
controlling CPU, for example, rather than being handled within the GPU. (How
the GPUs should talk to the CPUs is a whole subject in its own right.)

Timer interrupts are usually deferrable.

The Cell. Is it totally gone now? (If they'd had, say, 16MB/SPE instead of
256K, it might have been good for something.)

~~~
monocasa
> GPU exceptions usually create interrupts in the controlling CPU, for
> example, rather than being handled within the GPU

I would say that's out of date. GPU exceptions typically don't exist for most
shader code (unmapped memory loads are just RAZ, division by zero is defined
and doesn't trap, etc.). For the ones that do exist, they're typically handled
on GPU these days for latency reasons, but that's just a config register to
route it externally or not.

> Timer interrupts are usually deferrable.

I didn't say they weren't. Just like SIGALRM can be masked.
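
For what masking SIGALRM looks like from user space, here is a small Python sketch. The function name `masked_alarm_demo` and the returned dictionary keys are invented for the example. It assumes a Unix-like system, a single-threaded process, and that it is called from the main thread (Python only allows installing signal handlers there).

```python
import signal
import time


def masked_alarm_demo(delay: float = 0.01) -> dict:
    """Block SIGALRM, let a timer expire, then unblock and watch delivery."""
    fired = []
    old_handler = signal.signal(
        signal.SIGALRM, lambda signum, frame: fired.append(signum)
    )
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGALRM})
    try:
        signal.setitimer(signal.ITIMER_REAL, delay)  # one-shot timer
        time.sleep(delay * 10)  # the timer expires while SIGALRM is masked
        pending = signal.SIGALRM in signal.sigpending()
        ran_while_masked = bool(fired)
        # Unmasking lets the pending signal through to the handler.
        signal.pthread_sigmask(signal.SIG_UNBLOCK, {signal.SIGALRM})
        time.sleep(delay)  # give the interpreter a moment to run the handler
        ran_after_unmask = bool(fired)
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any leftover timer
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
        signal.signal(
            signal.SIGALRM,
            old_handler if old_handler is not None else signal.SIG_DFL,
        )
    return {
        "pending_while_masked": pending,
        "handler_ran_while_masked": ran_while_masked,
        "handler_ran_after_unmask": ran_after_unmask,
    }


if __name__ == "__main__":
    print(masked_alarm_demo())
```

The signal is not lost while masked; it sits in the pending set and is delivered once the mask is lifted, much like a deferred timer interrupt.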

> The Cell. Is it totally gone now? (If they'd had, say, 16MB/SPE instead of
> 256K, it might have been good for something.)

16MB would have never made sense. You're only supposed to keep the working set
in memory, and 64x the amount of memory was never in the cards from a gate
count perspective.

------
mwcampbell
On a BBS forum in the 90s, I read some lyrics for a blues song where each
verse ended with "segmentation violation -- core dumped blues". Here is what
seems to be the definitive version of that song:

[https://www.netfunny.com/rhf/jokes/92q3/coredb.html](https://www.netfunny.com/rhf/jokes/92q3/coredb.html)

------
thomond
I always thought the V was actually 5, as in System V UNIX. Maybe to denote a
change that started in that version.

------
ktm5j
The author makes a big fuss about the old UNIX documentation using sigseg
instead of sigsegv, but then completely ignores the comment on the same line
that does use the word "violation".

------
waynecochran
> Long long time ago, computers used to have memory segmentation.

If you are using an Intel chip, they still do.

~~~
zaarn
While technically the modern 64bit CPUs still support segmentation in 16 and
32bit modes (not very well but it works), in 64bit if you're not setting the
segment registers to "everything" you're essentially operating outside
supported margins. Some strange things happen if you do that.

I don't recall exactly but I don't think segmentation was heavily used after
2000 or so, it doesn't really do a lot for you if you have page tables.

~~~
Erwin
One thing the segment registers are still used for is thread-local storage
(on Linux). So you read data from the FS segment (different per thread) but the same
address, if you've prefixed your variable with __thread.

(Having said that, I remember optimizing thread local storage away by explicit
pointers some time ago in my code, because it was calling some function to get
the address constantly, so maybe there are some subtleties there)

~~~
rkeene2
FWIW, there was a good LWN article recently on the work to expose FS to
userspace control safely

------
jdxcode
I've always read it like Dracula is telling me there was a seg fault: "A seg
vault! Muah hah hah hah!"

~~~
arooaroo
Lol. Couldn't resist
[https://i.redd.it/efmm0153po551.png](https://i.redd.it/efmm0153po551.png)

------
fred256
It's interesting to see all signal names in that early version had six letters
(SIGQIT instead of SIGQUIT, even) but SIGPIPE was the exception. Was that one
added later?

(Also funny how the article says "this is from around 1978" when the date on
the listing says May 24 1976)

~~~
adrianmonk
I don't know the real answer, but I've always assumed it's because there's no
way to get the right "I" vowel sound without that trailing "E".

Also, when creating abbreviations, it feels weird to create one that is only
one letter shorter than the full version.

~~~
cesarb
> Also, when creating abbreviations, it feels weird to create one that is only
> one letter shorter than the full version.

This is Unix, which gave us the "creat" system call (an abbreviation of
"create"). [https://man7.org/linux/man-pages/man2/creat.2.html](https://man7.org/linux/man-pages/man2/creat.2.html)
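
For reference, that man page describes `creat()` as equivalent to `open()` with the flags `O_CREAT|O_WRONLY|O_TRUNC`. As far as I know Python's `os` module has no `creat` wrapper, so the sketch below defines a stand-in with that name on top of `os.open`; the default mode of `0o644` is an assumption for the example, not part of the system call.

```python
import os
import tempfile


def creat(path: str, mode: int = 0o644) -> int:
    """Open `path` the way creat(2) does: create or truncate, write-only."""
    return os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        fd = creat(path)
        os.write(fd, b"hello\n")
        os.close(fd)
        print(os.path.getsize(path))
```

Calling it a second time on the same path truncates the file back to zero length, which is the part of the behavior that tends to surprise people.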

~~~
rkeene2
If it makes you feel any better, one of the creators of UNIX regrets this.

> Ken Thompson was once asked what he would do differently if he were
> redesigning the UNIX system. His reply: "I'd spell creat with an e.";
> Kernighan, Brian W.; Pike, Rob (1984). The UNIX programming environment.
> Prentice-Hall. ISBN 0139376992. OCLC 10269821., p. 204.

------
rkeene2
I started a project similar to the fictional "skip instructions that cause
segmentation violations" for SIGILL (illegal instruction) which tried to
implement SSE3 replacements on hosts without SSE3. It had two modes: replace
the illegal instruction in memory, or handle it in the signal handler:

[https://github.com/rkeene/sse3-emu](https://github.com/rkeene/sse3-emu)

------
anoncake
> Was there a "Segmentation Vault?"?

It's not that far fetched, there's a Referer header after all.

------
necovek
The original cited SIGSEG constant definition in the OP still has a
"segmentation violation" right there in the comment. Which suggests that
"violation" was the norm even then.

------
khm
Prior code is available. Before V4, there were no 'signals' per se; errors
were trapped individually with dedicated system calls.

------
fortran77
This didn't really answer the question! However, I've been using Unix since
the early 80s and never once wondered about this.

------
SomeoneFromCA
SIGSEG sounds inappropriate in some Turkic languages. Extra V kinda masks the
issue.

------
solarkraft
Huh, this doesn't explain why they added the V.

------
gcoguiec
Maybe V like in System V?

------
hbosch
The shape of a "V" is a fault.

------
staycoolboy
So much for the "do not change" comments. I love these archaeological digs
into Unix history.

