
Unleashing use-before-initialization vulnerabilities in the Linux kernel [pdf] - nickpsecurity
http://www.cc.gatech.edu/~klu38/publications/ubi-ndss17.pdf
======
zellyn
I used to think only crazy ivory-tower research projects proposed rebuilding
our operating systems from the ground up using sound methods, proven
correctness in the inner parts, and better languages: a nice idea, but a bit
crazy, and unlikely to succeed outside of academia.

The more time goes by, and the more ubiquitous our computing devices become,
the crazier it seems _not_ to take this approach. It sounds like Fuchsia is
(might be?) a step in the right direction… I'm excited to see what happens
there.

~~~
logicallee
But when it's trivial to try a QUADRILLION fuzzing operations (sixteen
four-gigahertz cores do this many operations every ~4.3 hours) do you really
need formal correctness? Can't you brute-force correctness, by literally
trying each and every possible input and output, and literally trying every
possible state?

We have so much power it's obscene. We're producing SHA-1 collisions without
damaging or analyzing the algorithm, just using brute force. Why not take that
approach to kernel security?

I'd sooner trust code whose quintillion input and output states have been
exhaustively brute-forced over a few months (so that literally every possible
path and value that code can take has ACTUALLY been run), than code which has
been formally proven to be correct but without having been subject to such a
test.

~~~
caf
_Can't you brute-force correctness, by literally trying each and every
possible input and output, and literally trying every possible state?_

No.

Take a simple function that has two 64-bit numbers as input and one output;
your sixteen 4 GHz cores would take over 11 billion times the current age of
the universe to verify it.

You can, instead, do things like symbolic execution... and it turns out, that
kind of thing is often a big part of "formally proven correct".

~~~
logicallee
What do fuzzers do in practice? (I'm not up on the state of the art.) They
try to explore edge cases, but how do they know what is likely to be an edge
case? Or how do they try to get coverage of their fuzzing?

~~~
cnvogel
They no longer just try out random combinations or cover the input space
exhaustively (which is not feasible anyway).

Nowadays they use instrumentation in the fuzzed/tested code for guidance on
how the input should be modified so that more of the code gets covered.

Good example:
[http://lcamtuf.coredump.cx/afl/](http://lcamtuf.coredump.cx/afl/)

------
Animats
_In particular, widely used systems such as OpenSSL explicitly use
uninitialized data for the generation of entropy._

Uh oh. Another OpenSSL bug.

If you're running OpenSSL in a container or VM, there's a good chance of
repeatability.

~~~
userbinator
AFAIK it's not the only source of entropy, but gets mixed in with all the
other entropy sources its RNG uses.

------
coldcode
Given all the process, code review, and so many eyes on Linux, I find it
amazing that there are so many issues with this.

~~~
gue5t
It's statistically inevitable: many eyes also means many fingers, and you'd
need a _lot_ of nines on programmer perfection to avoid any use-after-frees in
a project as big and frequently-changing as Linux. This is why people are so
hyped about memory safety recently--get a computer to do something perfectly,
rather than trying to push every human up the asymptotic slope to perfection
(when they have much more important things to spend their time on!).

~~~
milesrout
Except compilers are also big MLOC+ projects with their own bugs, and every
compiler bug is a potential source of memory errors.

Plus (at least in Rust) in order to ever do anything useful, you have to use
unsafe anyway, usually in the most complex bits of code that have the most
potential for subtle memory errors. Personally that's the biggest problem I
have with it: it seems like it helps avoid issues a lot, but only in the easy
cases where I'm very confident I can avoid them anyway. In the really tricky
subtle cases (or even fairly basic things like cyclic data structures, which
are _not_ niche) the compiler throws up its hands. Those, honestly, are the
cases where I want compiler help.

That's not to say I don't think it's useful. In fact, making all the easy
cases trivial does mean you can focus your effort, analysis, thought, design,
whiteboard space, etc. on the trickier cases. But it's not anywhere near as
useful as it is sometimes claimed to be.

~~~
vitiral
The easy cases... Like 99% of your entire project including checking for data
races?

Until we have perfect AI writing all our code, it's going to be pretty
difficult to fix the other cases. That's what code reviews and extensive unit
tests are for.

~~~
milesrout
If you're going to respond to my post with such trivial rubbish I'm not going
to respond to yours with anything other than this comment.

~~~
i336_
Hang on. That's not ideal.

Could you possibly go into further detail?

I feel that Rust is envisioned by its followers as the just-unveiled never-
before-known perfect solution to memory safety and related problems, when in
reality C and other languages continue to have similar issues _because the
issues themselves are not solved problems_. So I'm holding Rust at arm's
length until someone can thoroughly and objectively debunk the views that I
have.

But I'd love to be able to argue effectively. I agree with all the statements
you made two levels up. Can you provide further detail to counter the parent
comment?

~~~
gue5t
I think the problem you see in Rust evangelism is a conflation of overall
program correctness (aka "security"; any bug can be considered a security bug
in a contrived-enough threat model) with memory safety.

The C programs I encounter in the wild tend to be both insecure and not
memory-safe. The difference between regular bugs in memory-safe programs and
memory-safety bugs is that with a regular bug your program is still evaluating
the code you wrote; perhaps you forgot a permission check, or you made a silly
mistake in code that you didn't think was security-relevant but which a user
might be using to keep their house doors locked. These are frustrating, but
can usually be understood once found by established debugging techniques like
bisecting a codebase with a test case.

With a memory-safety bug, your program runs the buggy code and, if you're
unlucky, will diverge from the semantics you associate with its source code.
Whereas a C programmer thinks of "x->name = foo;" as making an assignment of a
struct field, if "x" is an invalid pointer, this code might write to an
unrelated integer variable, modify 4 or 8 bytes of a string buffer, cause a
segfault (this is nice, since we find out our program is buggy), alter a
function pointer (to produce a segfault much later in code which is itself
correct), or have any number of other effects.

This is much harder to debug, and it's effectively impossible to mitigate
the impact on _your_ code of memory safety bugs _outside_ your code. If you're
working in a memory-safe language and call a function which you do not know to
be implemented correctly, the worst thing it can do is issue an unexpected
system call, which shows up very quickly in debugging. It can't mangle
unrelated variables to make the program crash in correct, bug-free portions of
code.

As noted, Rust is not, as a whole, memory-safe. Unsafe code can set up bombs
that will cause segfaults and other mischief in safe code, just like the
situation in C if we substitute 'unsafe' for 'incorrect' and 'safe' for
'correct'. But, in Rust, if you see misbehavior you do not have to audit your
entire codebase. You only need to audit the code explicitly marked as unsafe.
The vast majority of computation in Rust is outside of unsafe blocks, and
using unsafe in the middle of a complicated, subtle algorithm raises big
warning bells in code review.

With respect to "C and other languages [having] similar issues", I don't think
this is the case. Remote native code execution is a security issue only
present in memory-unsafe programs. We do see remote (high-level) code
execution vulnerabilities in programming languages with "eval"-like facilities,
but it's not hard to audit your code and all its dependencies for calls to the
built-in "eval", whereas auditing C code for memory safety is incredibly
difficult.

All of this is in some sense a consequence of the principle of least
privilege: we want to make the world more secure by avoiding having an
omnipresent "do absolutely anything" escape hatch, so that we can trust
modules in proportion to the privileges they need. Rust and memory safety
aren't a silver bullet, but I believe they're a cost-effective step toward
correctness.

The other half of Rust is that having algebraic datatypes and a good
typesystem _helps you help yourself_ avoid many logic mistakes, and having an
expressive language with generics helps you write less code (which correlates
with having fewer bugs). This is why Haskell programmers like to claim they
see fewer bugs than in other languages. It's much harder to make an objective
argument here, because availability of tools for preventing bugs isn't the
same as lack of bugs. There are C codebases with fewer bugs than Haskell
codebases; pure bug count depends on a multitude of factors in the development
process and only formal verification can prove your program to be correct
relative to its specification.

And then your specification still might not mean what you wanted it to mean!
There aren't any silver bullets, but Rust is a useful tool that you should
consider making part of your toolkit, and there's been a lot of work to try to
make it able to do all the things people do in C.

~~~
i336_
Thanks very much for this.

(Some bits have been elided for brevity and to avoid the HN comment length
limit; do note I understand the points you've made in spite of shortening some
parts)

> _The difference between regular bugs in memory-safe programs and memory-
> safety bugs is that with a regular bug your program is still evaluating the
> code you wrote; (...) [t]hese are frustrating, but can usually be understood
> once found by established debugging techniques like bisecting a codebase
> with a test case._

Right. (It's sad the only truly generic way to apply this approach is with
fuzzing.)

> _With a memory-safety bug, (...) it's effectively impossible to mitigate
> the impact on your code of memory safety bugs outside your code._

Ooohh. I _see_ - faulty image parsing letting you break out of Chrome, or
faulty font parsing letting you run amok in the NT kernel (yay!).

> _As noted, Rust is not, as a whole, memory-safe. (...) But, in Rust, if you
> see misbehavior you do not have to audit your entire codebase. You only need
> to audit the code explicitly marked as unsafe. The vast majority of
> computation in Rust is outside of unsafe blocks, and using unsafe in the
> middle of a complicated, subtle algorithm raises big warning bells in code
> review._

That makes sense.

> _With respect to "C and other languages [having] similar issues", I don't
> think this is the case. Remote native code execution is a security issue
> only present in memory-unsafe programs._

It's so sad the Burroughs-style architecture
([https://news.ycombinator.com/item?id=14073079](https://news.ycombinator.com/item?id=14073079))
fizzled out :(

> _We do see remote (high-level) code execution vulnerabilities in programming
> languages with "eval"-like faculties, but it's not hard to audit your code
> and all its dependencies for calls to the built-in "eval", whereas auditing
> C code for memory safety is incredibly difficult._

Unless it's JavaScript -
[https://news.ycombinator.com/item?id=14013155](https://news.ycombinator.com/item?id=14013155)

(Related: [http://stackoverflow.com/questions/42776357/can-i-
effectivel...](http://stackoverflow.com/questions/42776357/can-i-effectively-
use-transpilation-code-tokenzation-regeneration-a-vm-or-a-si))

> _All of this is in some sense a consequence of the principle of least
> privilege: we want to make the world more secure by avoiding having an
> omnipresent "do absolutely anything" escape hatch, so that we can trust
> modules in proportion to the privileges they need. (...)_

Right. On the one hand that [escape hatch mentality] gets you Symbolics Lisp;
on the other hand those Lisp machines never ran banks or multiuser systems :P

> _The other half of Rust is that having algebraic datatypes and a good
> typesystem helps you help yourself avoid many logic mistakes, and having an
> expressive language with generics helps you write less code (which
> correlates with having fewer bugs)._

The only problem there is that the verbosity of the code can cause oldschool
hackers to flounder:
[http://esr.ibiblio.org/?p=7294](http://esr.ibiblio.org/?p=7294)

> _This is why Haskell programmers like to claim they see fewer bugs than in
> other languages. It's much harder to make an objective argument here,
> because availability of tools for preventing bugs isn't the same as lack of
> bugs._

Can't disagree there. Objectivity FTW.

> _There are C codebases with fewer bugs than Haskell codebases; pure bug
> count depends on a multitude of factors in the development process and only
> formal verification can prove your program to be correct relative to its
> specification._

> _And then your specification still might not mean what you wanted it to
> mean! There aren't any silver bullets, but Rust is a useful tool that you
> should consider making part of your toolkit, and there's been a lot of work
> to try to make it able to do all the things people do in C._

Thanks for this, it was an interesting read.

If anything, Rust badly needs a rockstar _documentation_ team who can
canonically communicate "the state of _4 minutes ago!_ " and keep everyone
updated in real time. A herculean effort, most definitely, but it would ensure
people had correct ideas about the language from the start, which would be
cool.

I also think I have a theory as to why the parent commentator ("Except
compilers are also...") has the views they have.

Rust requires you to be extremely verbose in describing every last iota of
abstraction (and you have to have the right number of levels) - you can't just
start banging around in the kitchen with random bits of hardware like you can
in C, or hammer strings together like you can in Perl, or do Python's
"executable pseudocode" thing, etc. You have to be neurotically specific (it
feels like it when coming from the familiarity of another language that most
_decidedly_ does not feel like that).

I think that oldschool people encountering this see it as "too much work",
because for the level of effort required, they expect the compiler to do more
work for them than it's doing (for example, wrangling cyclic data structures).
And so they go "I have to do _all this_?! And I only get _that_ as a result?!"
and throw their hands up in the air and find it hard not to criticise.

So this folds back into the documentation problem - people are collectively
memoizing the "Rust is awesome!" from all the impressionable foghorn types.
The foghorns need to be turned down a little and the nuance made a little
clearer.

PS. I must admit that when I saw your "x->name" my brain immediately went
"wait, what's _x_'s current value?" - I'm still learning C (I understand the
basics, but I'm still slow with it in many areas), and I'm happy I
instinctively reacted like that :)

~~~
gue5t
Related to "Unless it's JavaScript", what I mean by "not hard" is that a
conservative static analysis can determine whether your code can possibly
reference the built-in eval function. A static analysis can't tell whether an
eval will be _run_ if it is present (halting problem, and there are
escape-hatches like filesystem+exec()) but I think it's very reasonable to
expect people to forbid them outright in high-assurance code.

Anecdotally, I've rewritten fair amounts of code from Go to Rust and it seems
like Rust takes about 60% of the lines of Go, which is mostly due to library
reuse for control-flow constructs (higher-order functions rely heavily on
generics to be usable and not merely confusing footguns) and simplified error
handling. The biggest pain point when porting is usage of goroutines, because
these dictate the threading and communication architecture of your Rust code,
and unlike goroutines it isn't cheap to spawn large numbers of real threads.
This would be better with a coroutine library, but I haven't tried that
approach in a ported codebase.

I do think explaining the language clearly and precisely is a big challenge
and hope we can make it happen.

It's true that Rust forces you to be explicit and precise in many places C
doesn't, but coming from a long time of programming C I loved to see that Rust
was enforcing universally the guidelines that C projects slowly develop after
years of developer experience: explicitly state memory ownership (in comments,
in C), explicitly state which shared memory is protected by which locks
(again, in comments or variable names in C), call destructor functions for all
live local variables before returning from a function (done automatically by
the Rust compiler), and so on. Rust is, in many ways, shorthand for well-
written C. It makes you jump through hoops to express many sloppy C practices.

If you're just learning C, after you understand the basics I think the most
important thing to understand is the abstract machine around which the
language is defined; that's how you learn to reason about UB and which
optimizations and transformations compilers will or won't do, along with how
to guide the compiler to the translation you want. C isn't defined against a
single huge byte array of memory, and pointers aren't indices into such!

~~~
i336_
Sorry this reply took a bit long (if you see it at all) - had a bit of a
notification failure.

> _Related to "Unless it's JavaScript", what I mean by "not hard" is that a
> conservative static analysis can determine whether your code can possibly
> reference the built-in eval function. (...)_

Oooh, like _that_. I get what you mean now.

> _Anecdotally, I've rewritten fair amounts of code from Go to Rust and it
> seems like Rust takes about 60% of the lines of Go, which is mostly due to
> library reuse for control-flow constructs (higher-order functions rely
> heavily on generics to be usable and not merely confusing footguns) and
> simplified error handling._

Hmm.

> The biggest pain point when porting is usage of goroutines, because these
> dictate the threading and communication architecture of your Rust code, and
> unlike goroutines it isn't cheap to spawn large numbers of real threads.
> This would be better with a coroutine library, but I haven't tried that
> approach in a ported codebase.

I'm curious if this is because there's no consensus on _which_ coroutine
library to use yet. The example I linked last message
([http://esr.ibiblio.org/?p=7294](http://esr.ibiblio.org/?p=7294)) also whined
about Rust not having a select() primitive or equivalent functionality in the
core language - [https://github.com/rust-
lang/rust/issues/27800](https://github.com/rust-lang/rust/issues/27800)
remains open since Aug 2015 as of my writing this comment, but I don't know if
there are any standardized (ideally stdlib-level) solutions known outside of
this communication channel.

> I do think explaining the language clearly and precisely is a big challenge
> and hope we can make it happen.

Cool :D

> It's true that Rust forces you to be explicit and precise in many places C
> doesn't, but coming from a long time of programming C I loved to see that
> Rust was enforcing universally the guidelines that C projects slowly develop
> after years of developer experience: explicitly state memory ownership (in
> comments, in C), explicitly state which shared memory is protected by which
> locks (again, in comments or variable names in C), call destructor functions
> for all live local variables before returning from a function (done
> automatically by the Rust compiler), and so on.

That's fair. I get the impression Rust's vision is to be the go-to language
for low-level enterprise-scale (ie, long-term) stuff, while also being
accessible and suitable for hobbyist scripting. I can completely appreciate
that.

> Rust is, in many ways, shorthand for well-written C. It makes you jump
> through hoops to express many sloppy C practices.

That's pretty cool. :>

> If you're just learning C, after you understand the basics I think the most
> important thing to understand is the abstract machine around which the
> language is defined; that's how you learn to reason about UB and which
> optimizations and transformations compilers will or won't do, along with how
> to guide the compiler to the translation you want.

I've only very recently started to notice this and grasp the importance of the
difference between the hardware and what C abstracts that hardware as; thanks
very much for the recommendation, I'll focus more closely on it. On that note,
I find it remarkable how platform-agnostic yet hardware-accessible the C
abstract machine _is_; I'm not sure whether to be mildly scared at the
engineering ingenuity of the language or merely duly impressed that the
developers just happened to stumble upon a really good design :)

------
dpramone
I think I'm gonna drop by here more often. Interesting pdf.

------
erik1909hp
There always has to be a core group of knowledgeable people either developing
these systems or maintaining submissions to them. It's just the nature of the
game for the highest quality/reliability/security. So far, this has produced
almost entirely proprietary or academic solutions, with the FOSS ones not
receiving nearly the contributions they need. You can't just scratch an itch
and improve something that's nearly NSA-proof most of the time.

