Hacker News | adr_'s comments

I think Google already publicly disclosed the code to trigger this vulnerability: https://chromium.googlesource.com/v8/v8/+/fd29e246f65a7cee13...


For my part, I loved the long form, in-depth post, and I learned a lot. I'll admit breaking it up visually with some diagrams or photos is tempting (maybe a photo of your serial setup?) but I felt the explanations were all clear without it.


Ah, that's a good point. I actually need to port over a Hugo shortcode to handle little image boxes for this kind of thing; once I have that it probably makes sense to add a little photo of my setup as an aside, not so much as an explanation but rather for visual interest.


Clang 11 hasn't been released yet, right?


Right. But we've also observed non-determinism / undefined behavior in Clang 10:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=246630#c26

  ==120363== Conditional jump or move depends on uninitialised value(s)
  ==120363==    at 0x1634474: llvm::ConstantExpr::getGetElementPtr(llvm::Type*, llvm::Constant*, llvm::ArrayRef<llvm::Value*>, bool, llvm::Optional<unsigned int>, llvm::Type*) (Constants.cpp:2191)
  ==120363==    by 0x112D6D9: getGetElementPtr (Constants.h:1163)
  ==120363==    by 0x112D6D9: (anonymous namespace)::SymbolicallyEvaluateGEP(llvm::GEPOperator const*, llvm::ArrayRef<llvm::Constant*>, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.cpp:1005)
  ==120363==    by 0x112DF70: (anonymous namespace)::ConstantFoldInstOperandsImpl(llvm::Value const*, unsigned int, llvm::ArrayRef<llvm::Constant*>, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.cpp:1039)
  ==120363==    by 0x112C165: (anonymous namespace)::ConstantFoldConstantImpl(llvm::Constant const*, llvm::DataLayout const&, llvm::TargetLibraryInfo const*, llvm::SmallDenseMap<llvm::Constant*, llvm::Constant*, 4u, llvm::DenseMapInfo<llvm::Constant*>, llvm::detail::DenseMapPair<llvm::Constant*, llvm::Constant*> >&) [clone .part.0] (ConstantFolding.cpp:1114)
  ==120363==    by 0x112C5CF: llvm::ConstantFoldConstant(llvm::Constant const*, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.cpp:1194)
  ==120363==    by 0x188F410: prepareICWorklistFromFunction (InstructionCombining.cpp:3584)
  ==120363==    by 0x188F410: combineInstructionsOverFunction(llvm::Function&, llvm::InstCombineWorklist&, llvm::AAResults*, llvm::AssumptionCache&, llvm::TargetLibraryInfo&, llvm::DominatorTree&, llvm::OptimizationRemarkEmitter&, llvm::BlockFrequencyInfo*, llvm::ProfileSummaryInfo*, unsigned int, llvm::LoopInfo*) (InstructionCombining.cpp:3703)
  ==120363==    by 0x189205F: runOnFunction (InstructionCombining.cpp:3789)
  ==120363==    by 0x189205F: llvm::InstructionCombiningPass::runOnFunction(llvm::Function&) (InstructionCombining.cpp:3768)
  ==120363==    by 0x16F4352: llvm::FPPassManager::runOnFunction(llvm::Function&) (LegacyPassManager.cpp:1482)
  ==120363==    by 0x16F4DE8: llvm::FPPassManager::runOnModule(llvm::Module&) (LegacyPassManager.cpp:1518)
  ==120363==    by 0x16F51A2: runOnModule (LegacyPassManager.cpp:1583)
  ==120363==    by 0x16F51A2: llvm::legacy::PassManagerImpl::run(llvm::Module&) (LegacyPassManager.cpp:1695)
  ==120363==    by 0x1FF4CFE: EmitAssembly (BackendUtil.cpp:954)
  ==120363==    by 0x1FF4CFE: clang::EmitBackendOutput(clang::DiagnosticsEngine&, clang::HeaderSearchOptions const&, clang::CodeGenOptions const&, clang::TargetOptions const&, clang::LangOptions const&, llvm::DataLayout const&, llvm::Module*, clang::BackendAction, std::unique_ptr<llvm::raw_pwrite_stream, std::default_delete<llvm::raw_pwrite_stream> >) (BackendUtil.cpp:1677)
  ==120363==    by 0x2C471A8: clang::BackendConsumer::HandleTranslationUnit(clang::ASTContext&) (CodeGenAction.cpp:335)
  ==120363==  Uninitialised value was created by a stack allocation
  ==120363==    at 0x112C653: (anonymous namespace)::SymbolicallyEvaluateGEP(llvm::GEPOperator const*, llvm::ArrayRef<llvm::Constant*>, llvm::DataLayout const&, llvm::TargetLibraryInfo const*) (ConstantFolding.c


I’m curious, does clang promise to compile deterministically? (This is inspired by your bug but not directly relevant here, it seems?)


Non-deterministic compilation would be pretty bad.

First, producing different outputs from the same input raises the question of caching tools (e.g. ccache, FastBuild), which assume the compiler is a pure function of its inputs.

Moreover, assuming your code and the compiler are correct, you might still end up with a situation where the performance of the resulting binary differs depending on the planet's alignment at build time.

Worse, when the input code is wrong (which does happen when you're writing new code and trying it on your machine), you build your code locally and you're in "luck", as the compiler generates binary code "that won't crash". So you push your modifications, and then you start getting complaints from your coworkers, because they pulled your commit and now they're getting crashes. At this moment you don't know what's happening yet, so you might even tell them "it works on my machine, did you try to rebuild all?". And this might appear to solve the issue, if this time your coworkers are "lucky"!

Finally, let's suppose your code is correct, but the compiler has a code generation bug. Have you ever tried, as a user, to diagnose a compiler bug? You spend many hours trying to minimize the input file that triggers the bug, so it's executable without needing the rest of your project, so you can send it to the compiler devs. I wouldn't even try to do this if I knew the compiler was non-deterministic.

In short, non-deterministic compilation is an invitation for trouble and confusion.
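To make the caching point above concrete, here is a minimal sketch of a content-addressed compile cache. It is not how ccache or FastBuild are actually implemented; `cache_key`, `CompileCache`, and the `compile_fn` callback are hypothetical names made up for illustration. The only idea it shows is that the cache key is derived purely from the inputs, so the cache is only sound if the compiler really behaves as a pure function of those inputs.

```python
import hashlib
from typing import Callable, Dict, Sequence


def cache_key(compiler_version: str, flags: Sequence[str], source: bytes) -> str:
    """Hash everything the output is assumed to depend on (hypothetical scheme)."""
    h = hashlib.sha256()
    for part in (compiler_version.encode(), "\0".join(flags).encode(), source):
        # Length-prefix each part so adjacent fields cannot run together.
        h.update(len(part).to_bytes(8, "little"))
        h.update(part)
    return h.hexdigest()


class CompileCache:
    """Toy in-memory cache; compile_fn stands in for invoking a real compiler."""

    def __init__(self, compile_fn: Callable[[Sequence[str], bytes], bytes]):
        self._compile = compile_fn
        self._store: Dict[str, bytes] = {}
        self.hits = 0

    def build(self, compiler_version: str, flags: Sequence[str], source: bytes) -> bytes:
        key = cache_key(compiler_version, flags, source)
        if key in self._store:
            # Same inputs seen before: reuse the stored object without recompiling.
            self.hits += 1
            return self._store[key]
        obj = self._compile(flags, source)
        self._store[key] = obj
        return obj
```

If the compiler is non-deterministic, a cache hit returns one of several possible outputs, and which one you get depends on whoever populated the cache first.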


I always thought that big compilers (clang/gcc/msvc) have some degree of non-determinism due to the myriad optimization passes in combination with heuristics deciding when and where to use certain kinds of optimization. Is that true or not?

And certainly if you define deterministic compilation as always producing the same binary, this is already broken by compiler macros like "__DATE__" and randomly generated names during link-time optimization. [1]

[1] https://blog.conan.io/2019/09/02/Deterministic-builds-with-C...


It's not true. They are usually deterministic for the purpose of reproducibility. The same compiler version, flags, and sources should produce the same binary.

Yes, __DATE__ in sources can break reproducibility. That does not mean compilers get carte blanche to be nondeterministic.
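One rough way to check the "same compiler version, flags, and sources should produce the same binary" claim on your own setup is to build the same file twice and compare digests of the outputs. The sketch below only does the comparison step; `sha256_of` and `is_reproducible` are hypothetical helper names, and the output paths are whatever your build produced.

```python
import hashlib
from pathlib import Path


def sha256_of(path) -> str:
    """Return the SHA-256 hex digest of a file's full contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def is_reproducible(paths) -> bool:
    """True if every listed build output is byte-for-byte identical."""
    digests = {sha256_of(p) for p in paths}
    return len(digests) <= 1
```

A `False` result does not say where the difference comes from (an embedded date, a symbol name, or actual generated code), only that the outputs differ.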


I would think that profile-guided optimization (PGO) also makes compilation nondeterministic (unless the profile is stored for reuse.)

https://en.wikipedia.org/wiki/Profile-guided_optimization

I've been enabling PGO when building Python lately and I imagine the resulting binaries are a little different every time due to random events during profiling.


Except that gcc and clang are nondeterministic, to an extent.

Sometimes the compiler needs to generate a random value and base part of the compilation on that value. Things like trying to predict which is the best branch, or things done at compile time.

A lot of work has been done to reduce the nondeterminism, and some of it can only be reduced by using things like "-frandom-seed=$your_git_commit", for example.

Build determinism also goes deeper than that: for example, static libraries are archives that include the date of the archive creation, and so on.

The simplest programs might generate the same hash, but don't expect all code to generate the same-hash binary by default
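The point about static libraries carrying dates can be seen directly in the common `ar` layout: an 8-byte `!<arch>\n` magic followed by 60-byte member headers, each with a decimal modification-time field. Below is a small sketch that lists those timestamps. It assumes the plain System V/GNU header layout and ignores extended-name tables and other variants; `member_mtimes` is a hypothetical helper name.

```python
AR_MAGIC = b"!<arch>\n"
HEADER_SIZE = 60


def member_mtimes(data: bytes):
    """Return [(name, mtime), ...] for each member header in an ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    pos = len(AR_MAGIC)
    while pos + HEADER_SIZE <= len(data):
        header = data[pos:pos + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError("bad member header at offset %d" % pos)
        # Field layout: name[16] mtime[12] uid[6] gid[6] mode[8] size[10] magic[2]
        name = header[0:16].decode("ascii").rstrip()
        mtime_text = header[16:28].decode("ascii").strip()
        size = int(header[48:58].decode("ascii").strip())
        members.append((name, int(mtime_text) if mtime_text else 0))
        # Member data is padded to an even length.
        pos += HEADER_SIZE + size + (size % 2)
    return members
```

Two archives built from identical object files at different times would differ in these fields, and therefore in their hash, even though the code inside is the same.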


> Sometimes the compiler needs to generate a random value and base part of the compilation on that value.

As a compiler engineer with experience (among others) in LLVM and GCC this is the first time I'm hearing of this. Could you provide more details or a source?

I can't imagine where such behavior would be useful, let alone required. The only slightly plausible scenario I can think of would be representing some internal data structures as hash tables with random seeds to avoid denial-of-service attacks. But then the compiled code would still have to rely on, at some point, picking an arbitrary element out of such a hash table. I can't think of contexts inside a compiler where this would be a useful thing to do.


No experience here, but from the manpage:

-frandom-seed=string

  This option provides a seed that GCC uses in place of random numbers in generating certain symbol names that have to be different in every compiled file.  It is also used to place unique stamps in coverage data files and the object files that produce them.  You can use the -frandom-seed option to produce reproducibly identical object files.

though my example/guess on branch prediction is probably wrong


Ah, thanks. Looks like this only applies to symbol names, so the generated executable code should be deterministic, only symbol table entries might differ.


> The simplest programs might generate the same hash, but don't expect all code to generate the same-hash binary by default

Debian and others have put quite a lot of work into reproducible software builds:

https://wiki.debian.org/ReproducibleBuilds#Even_more

This of course only works if the compiler cooperates.

The bug linked earlier is a regression in Clang 10. Clang 9 was deterministic for the same file, flags, etc.


> First, producing different outputs from the same input raises the question of caching tools (e.g. ccache, FastBuild), which assume the compiler is a pure function of its inputs.

Why should they have to? Shouldn’t they just be able to reach for any valid compilation of this particular object file and slot it in?

> Moreover, assuming your code and the compiler are correct, you might still end up with a situation where the performance of the resulting binary differs depending on the planet's alignment at build time.

This is already the case due to your environment. If you have the wrong number of environment variables you might penalize your program’s performance by a significant amount already just because you misalign the stack!

> At this moment you don't know what's happening yet, so you might even tell them "it works on my machine, did you try to rebuild all?". And this might appear to solve the issue, if this time your coworkers are "lucky"!

This sounds like the situation already with nondeterministic bugs like races, albeit with the same binary?

> Have you ever tried, as a user, to diagnose a compiler bug? You spend many hours trying to minimize the input file that triggers the bug, so it's executable without needing the rest of your project, so you can send it to the compiler devs.

I deal with nondeterministic programs all the time…they’re a bit more difficult to file bugs for, but it’s still possible.


Nope, 10.0 was just released recently.


Why is OSSFuzz using such a bleeding edge compiler? That seems a little nuts.


Wouldn't you rather catch bugs before they're released in a stable version?


It is unfair to the authors of the software that is actually tested, in this case SQLite.

You are forced to investigate, otherwise people will attribute the bug to your software.

Toolchain bugs take an amazing amount of time and energy and happen more often than people think.


Exactly. This is precisely the point of nightly builds, is it not?


Clang 11 is still in early development stages. Release date is several months away. Clang 10 was released just a couple of months ago. 11 is expected to be buggy and not fit for use yet.

The SQLite devs now have to deal with "is it or isn't it a compiler bug" nonsense, taking their time away from fixing actual problems, working on features etc, from OSSFuzz deciding to use a compiler that the compiler devs themselves don't think is fit for use.

How much trust can you have that even fuzz results exposed are actually legitimate either? False positives, or worse still false negatives?


If you're going to go down that route, I would expect that they test using both the latest stable version and whatever unstable version they want. Bugs found using the stable compiler should be reported to the project, while bugs found only using the unstable version should be reported to the compiler.


Hydrolysis is chemical breakdown via reaction with water. This is electrolysis. But it's a popsci article, so they put that in the subtitle. "Splitting" makes sense to people who don't use terms like hydrolysis and electrolysis frequently.


Oops. I stand corrected. It's been a while since I've used hydrolysis and electrolysis.


https://code.google.com/p/google-security-research/issues/li...

They haven't published Cisco bugs, but they do have a pretty broad scope.



A similar technique is used to build the payload, but the linked paper does have a more sophisticated technique for setting the target's filename arbitrarily, without having to somehow craft a link with a "download" attribute on a target website.


It is a fun mental exercise. I love this stuff, and that's the motivation.

I once wrote an emulator for a 4-bit microprocessor in Befunge (a 2D esoteric programming language). Then, I was definitely showing off something that really does not matter. 100% useless.

This is a little different. The motivation is the same, but it also proves that you cannot sanitise Javascript by removing letters or words. It's very easy to assume that such sanitisation works, and such an assumption can be a security-critical mistake. I've actually read this article before because I needed to solve such a problem.
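As a rough illustration of why word-removal sanitisation fails, here is a toy filter that deletes banned words from its input. `strip_words` is a hypothetical function written for this example, not taken from the article or from any real sanitiser.

```python
def strip_words(text: str, banned) -> str:
    """Naive sanitiser: delete every occurrence of each banned word, one pass per word."""
    for word in banned:
        text = text.replace(word, "")
    return text


# Removing the word can reassemble it from the pieces left on either side.
reassembled = strip_words("<scrscriptipt>", ["script"])

# The banned word never appears contiguously, so the filter changes nothing.
payload = 'window["al" + "ert"](1)'
untouched = strip_words(payload, ["alert"])
```

Here `reassembled` ends up as `<script>` and `untouched` is identical to `payload`, so both inputs get past the filter with their meaning intact.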


Unsigned integer overflow is defined.

I agree that the carry flag shouldn't be exposed directly, but the ability to express "add and check for overflow" would be useful and allow for these optimisations where the architecture supports it.

Edit: But if you want to exploit the processor to its fullest, go with assembly. The compiler is good, but proving optimisations is a hard problem, and complex, single-instruction-specific optimisations aren't going to take off any time soon.
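For reference, the "add and check for overflow" operation mentioned above can be modelled like this. It is a sketch of the semantics only, written in Python where integers never overflow, so the 64-bit width is an assumption chosen for the example and `add_u64` is a hypothetical name. In C, GCC and Clang expose a comparable operation as `__builtin_add_overflow`.

```python
U64_MASK = (1 << 64) - 1


def add_u64(a: int, b: int):
    """Return (wrapped_sum, carried) for a 64-bit unsigned addition."""
    if not (0 <= a <= U64_MASK and 0 <= b <= U64_MASK):
        raise ValueError("operands must fit in 64 unsigned bits")
    total = a + b
    # Unsigned overflow is defined as wraparound modulo 2**64;
    # the second element plays the role of the carry flag.
    return total & U64_MASK, total > U64_MASK
```

Returning the flag alongside the wrapped result lets the caller ask for the check without the language exposing the carry flag itself.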


If the NSA is working with Intel, they're not going to bother with an RNG... The processor is the most trusted part of the computer security model - why would you choose bad random numbers as your attack vector?

Relevant talk: Hardware Backdooring is Practical - Jonathan Brossard https://www.youtube.com/watch?v=j9Fw8jwG07g


I wouldn't call it more verbose.

AT&T:

    sub    $0x8,%rbx
    callq  *%rax
    mov    (%rbx),%rax

Intel:

    sub    rbx, 0x8
    call   rax
    mov    rax, [rbx]

Intel can be written with:

    mov    rax, QWORD PTR [rbx]

But it's redundant and assemblers don't expect it. It's only necessary in a handful of places to avoid ambiguity, as opposed to the incessant size suffixes and $/% prefixes, which make AT&T feel more verbose to me. Definitely a matter of familiarity, though.

