Mistrust of commercial solutions does not translate into trust for open-source ones. Have you audited the crypto code of all your packages? Would you even know how?
Exactly. Even more interesting: all of the source code can be fine, and some subtle configuration tweak can still be enough to compromise you. Or just some build flag that you don't even see in the sources. Once you use binaries, you often don't know the build flags with which each binary was produced. You also don't know whether the compiler has been tweaked to do some preprocessing you don't know about (see Reflections on Trusting Trust by Ken Thompson).
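To make the build-flag point concrete: the same source can yield different binaries depending on a flag that appears nowhere in the code you review. A minimal sketch in Python (the WEAKEN_RNG define is invented for illustration; cc is whatever C compiler is on your PATH):

    import hashlib, pathlib, subprocess, textwrap

    # One source file, innocent-looking when reviewed on its own.
    src = pathlib.Path("probe.c")
    src.write_text(textwrap.dedent("""
        #include <stdio.h>
        int main(void) {
        #ifdef WEAKEN_RNG   /* hypothetical flag: the source never says it's set */
            puts("predictable seed");
        #else
            puts("strong seed");
        #endif
            return 0;
        }
    """))

    for flags, name in (([], "probe_normal"), (["-DWEAKEN_RNG"], "probe_weak")):
        subprocess.run(["cc", str(src), *flags, "-o", name], check=True)
        print(name, hashlib.sha256(pathlib.Path(name).read_bytes()).hexdigest()[:16])
    # Two different binaries from one source; reading the source alone cannot
    # tell you which behavior the binary you were shipped actually has.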
For the security conscious, the ideal is an OS which changes very, very slowly, fixing only security bugs, with binaries that are used by as many people as possible and that change so seldom that more people can even check them by disassembling them. You don't want to check only the sources; you want to disassemble the binaries and decide whether they match the sources.
And only then do you want to make sure that all configurations are what they should be. Not easy at all.
This only works if you are building things yourself or trust the group building things, of course, but it's still way easier than auditing by disassembling binaries.
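Where builds are reproducible, "trusting the group building things" even reduces to a hash comparison. A minimal sketch, assuming you have built the same source yourself and that the project builds deterministically (no embedded timestamps or paths, which many projects only achieve with effort); the file paths are hypothetical:

    import hashlib

    def sha256(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 16), b""):
                h.update(chunk)
        return h.hexdigest()

    # Your own build vs. the binary the distributor shipped.
    mine, theirs = sha256("my-build/bin/openssl"), sha256("/usr/bin/openssl")
    print("match" if mine == theirs else "MISMATCH: investigate build or distributor")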
Disassemblers produce assembly code, not HLL code, so they are many orders of magnitude easier to write from scratch than modern compilers. They typically require human involvement as soon as there is non-trivial, hand-engineered self-modifying code at the assembly level. Hopefully there isn't much such code in the output of the compilers we use.
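To see why, note that the core of a disassembler is a table lookup plus operand decoding. A toy sketch for a few fixed-length x86-64 encodings (real disassemblers must handle prefixes, ModRM bytes and much more, but nothing approaching a compiler's complexity):

    REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
    REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

    def disassemble(code):
        out, i = [], 0
        while i < len(code):
            b = code[i]
            if b == 0x90:
                out.append("nop"); i += 1
            elif b == 0xC3:
                out.append("ret"); i += 1
            elif 0x50 <= b <= 0x57:
                out.append(f"push {REGS64[b - 0x50]}"); i += 1
            elif 0x58 <= b <= 0x5F:
                out.append(f"pop {REGS64[b - 0x58]}"); i += 1
            elif 0xB8 <= b <= 0xBF:  # mov r32, imm32: opcode + 4-byte immediate
                imm = int.from_bytes(code[i + 1:i + 5], "little")
                out.append(f"mov {REGS32[b - 0xB8]}, {imm:#x}"); i += 5
            else:  # unknown byte: emit as data instead of guessing
                out.append(f".byte {b:#04x}"); i += 1
        return out

    print(disassemble(bytes([0xB8, 0x3C, 0, 0, 0, 0x50, 0xC3])))
    # -> ['mov eax, 0x3c', 'push rax', 'ret']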
Also if you check the whole discussion you'll see I already discussed Ken's work.
OK, I appreciate this information (and I'm trying to follow the discussion, but I didn't see you talking about Ken's work).
But I'm still curious: even though you can write the disassembler by hand, how can you be sure that you're compiling it with a non-compromised compiler? Or do you mean writing it in, e.g., ELF format directly (and that's assuming the OS isn't involved in filtering offending code, though it seems extraordinarily unlikely that the OS could be generally modified in such a way without detection)?
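Writing ELF directly is entirely doable, for what it's worth: the format is documented and a minimal executable is around 130 bytes. A sketch that emits a 64-bit Linux ELF whose twelve bytes of machine code just call exit(42), with no compiler or assembler involved (the constants follow the ELF64 spec: a 64-byte ELF header, one 56-byte program header, then the code):

    import struct

    # x86-64 machine code: mov eax, 60 (sys_exit); mov edi, 42; syscall
    code = bytes([0xB8, 0x3C, 0, 0, 0, 0xBF, 0x2A, 0, 0, 0, 0x0F, 0x05])
    vaddr, ehsize, phsize = 0x400000, 64, 56
    total = ehsize + phsize + len(code)

    ehdr = (b"\x7fELF" + bytes([2, 1, 1, 0]) + b"\x00" * 8    # 64-bit, little-endian
            + struct.pack("<HHIQQQIHHHHHH",
                          2, 0x3E, 1,                # ET_EXEC, EM_X86_64, EV_CURRENT
                          vaddr + ehsize + phsize,   # entry point: right after headers
                          ehsize, 0, 0,              # phoff, shoff, flags
                          ehsize, phsize, 1,         # header sizes, one program header
                          0, 0, 0))                  # no section headers
    phdr = struct.pack("<IIQQQQQQ",
                       1, 5,                         # PT_LOAD, flags = R + X
                       0, vaddr, vaddr,              # file offset, vaddr, paddr
                       total, total, 0x1000)         # filesz, memsz, alignment

    with open("tiny", "wb") as f:
        f.write(ehdr + phdr + code)
    # chmod +x tiny && ./tiny; echo $?   -> 42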
The more general and diverse the tools you use, the less likely they are all compromised in the same way, and the more likely any compromise will show up in other contexts. Using tools at different meta-levels may also be worthwhile (machine-code vs. interpreter).
> Mistrust of commercial solutions does not translate into trust for open-source ones.
Well, how well can you trust the commercial ones? At least with open source, you can look into it more easily and eventually find security holes. It's a step towards trust. There is no trust to be gained with commercial solutions, but with open source it's at least possible.
The fact that proprietary software fits a sound market economy makes it somehow more functional and more attractive, but when you're concerned about ethics, that's an entirely different matter.
> how well can you trust the commercial ones? At least with open source, you can look into it more easily and eventually find security holes. It's a step towards trust. There is no trust to be gained with commercial solutions, but with open source it's at least possible.
Ever heard of reverse engineering? It turns out you'd need that approach even with open source, as soon as you use binaries you haven't compiled yourself. And you'd have to verify the compiler and your disassembler that way too. It's all possible, but it requires more than is currently being done, at least at the level of what's openly available.
And even if you manage to verify everything, you still have to check the computer. Modern computers, be they servers or notebooks, are starting to ship with BIOSes that can even phone home and allow remote access outside your control (holding keys which you can't control!).
"Ever heard of reverse engineering? It turns out you'd need even that approach even with open source as soon as you use binaries you haven't compiled yourself."
This is true: reverse engineering can be used for verification, but it's a whole lot more work than inspecting source.
"And you'd have to verify the compiler and your disassembler that way too."
Am I missing something? Does this mean that to verify a compiler with DDC you need a trusted compiler that always produces the same binary output as the untrusted one, i.e. that to verify GCC you need a compiler that duplicates the whole of GCC's functionality but is trusted? How practical is that approach? Proving that "hello world" produces the same output doesn't prove that the crypto functions haven't been patched.
Please give a specific example of what would be needed to verify GCC and LLVM as they are now.
EDIT: I'm not interested in toy compilers and theoretical pie-in-the-sky examples; I want to know how practical it is for the systems in real use. GCC and LLVM as they are now, please. If the proposition is "suppose that we have something that can compile the gcc sources and we trust it," tell me what that is, whether it exists, and how hard it would be to make. Don't talk to me about your experiment where you change one line in TTC and then prove it's changed by comparing the binaries.
The idea is to take one compiler source (S) and compile it with a diverse collection of compilers (Ck being a compiler in C0..CK), producing a diverse collection of binaries that are compilations of S: Bk = Ck(S). Because the different compilers are almost certainly not functionally identical, the various Bk should not be expected to be bitwise identical. However, because they are compilations of the same source, they should be functionally identical, or one of the original compilers was broken (accidentally or deliberately).

So now we can compile that original source with the Bk compilers, and because these compilers are functionally identical, the results Bk(S) should be bitwise identical. There is certainly some chance of a false positive, due to bugs in the Ck compilers or exploitation of undefined behavior in S, but if you do get the same output Bk(S) from all of the Bk compilers, then you can be pretty confident that there is no Trusting Trust style attack present: exceedingly so when the various compilers have diverse histories, making it exceedingly unlikely that all the Ck compilers contain the same attack.

If there are any differences, you can manually inspect them to determine what the issue is and either issue a bug report to the appropriate compiler, change the source (S) to avoid undefined behavior, or notify people of the attack present in the compiler in question, depending on what you find. This does involve some binary digging, but quite targeted compared to a full audit, and it may well not be necessary at all.
Obviously, if you do have a trusted compiler, including it in the mix is great, but the technique doesn't rely on this, nor on any two compilers returning the same binary output except when they are compilations of the same source.
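A sketch of the bookkeeping in Python. The single-file source and one-command build are drastic simplifications (a real GCC build means configure/make and a multi-stage bootstrap), and the comparison assumes deterministic compilation with no embedded timestamps or paths:

    import hashlib, subprocess

    def build(cc, source, out):
        # Hypothetical one-command build of compiler source `source`
        # using compiler `cc`.
        subprocess.run([cc, source, "-o", out], check=True)
        return out

    def digest(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    def ddc(source, compilers):
        # Stage 1: Bk = Ck(S). These may legitimately differ bitwise.
        stage1 = [build(cc, source, f"stage1-{k}")
                  for k, cc in enumerate(compilers)]
        # Stage 2: Bk(S). All Bk were compiled from the same S, so they are
        # functionally identical and their outputs should be bitwise identical.
        stage2 = {digest(build(b, source, f"stage2-{k}"))
                  for k, b in enumerate(stage1)}
        return len(stage2) == 1   # False: inspect the differing binaries

    # Hypothetical invocation; the more diverse the compilers' origins, the better.
    print(ddc("gcc-4.8.1-combined.c", ["gcc", "clang", "icc"]))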
Please explain which exact steps and which assumptions would be needed to have a trusted GCC 4.8.1, both gcc and g++, and then to keep it trusted as new releases appear.
I don't know enough about the details of the build dependencies for any of these projects to give exact steps. To get a known-clean build (that is, a build guaranteed to match the source) of GCC 4.8.1, plug GCC 4.8.1 into the procedure I gave above:
In case it wasn't clear, k is used for indexing, and I use "function application" f(x) to mean compilation of x by compiler f.
"Take one compiler source (GCC 4.8.1), and compile it with a diverse collection of compilers (Ck being a compiler in { C0 = GCC 4.8.1, C2 = LLVM, C3 = icc, C4 = visual c/c++, ...}[1]), producing a diverse collection of binaries that are compilations of GCC 4.8.1: (Bk = Ck(GCC 4.8.1)). Because the different compilers are almost certainly not functionally identical, the various Bk should not be expected to be bitwise identical. However, because they are compilations of the same source, they should be functionally identical, or one of the original compilers was broken (accidentally or deliberately). So now we can compile that original source with the Bk compilers, and because these compilers are functionally identical, the results (Bk(GCC 4.8.1)) should be bitwise identical. If there are any differences, you can manually inspect them to determine what the issue is and either issue a bug report to the appropriate compiler, change the source (GCC 4.8.1) to avoid undefined behavior, or notify people of the attack present in the compiler in question, depending on what you find. This does involve some binary digging, but quite targeted compared to a full audit and it may well not be necessary at all."
Likewise for any of the others, but note that once you've got a known-clean build of any (sufficiently capable) compiler you could use it to build known-clean builds of the others.
[1] the more compilers and the more diverse the background of the compilers, the better; it may well be worth using quite slow compilers that are proven correct and/or implemented in other (possibly interpreted) languages for a high degree of confidence.
One of the most useful forms of diversity is the "my opponent does not have access to a time machine" defense. E.g., use some C compiler for the Amiga, or 1980s DEC Unix, or whatever, to bootstrap gcc3 for Windows, and use that to bootstrap clang for Linux, etc. The odds that hardware and binaries you've had for 30 years could carry a trojan that successfully applies to a compiler that had not been written yet, for an architecture that had not been designed yet, inserting a trojan for yet another such pair, seem low. Feel free to follow more than one such path if paranoia dictates. When you arrive at the end (some compiler, built with itself), the binaries should all match however you got there, presuming no undefined behavior in the compiler itself. If there is some, fix it.
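The endpoint check is mechanical: a self-hosting compiler at its fixpoint reproduces itself bit for bit, so independent bootstrap paths should converge on identical bits. A sketch under the same hypothetical one-command-build simplification as above (the seed compiler names and the single-file source are invented):

    import hashlib, subprocess

    def build(cc, src, out):   # hypothetical one-command compiler build
        subprocess.run([cc, src, "-o", out], check=True)
        return out

    def digest(p):
        return hashlib.sha256(open(p, "rb").read()).hexdigest()

    def fixpoint(seed_cc, source, tag):
        # gen1 is built by a foreign compiler, so its bits prove nothing,
        # but once the compiler builds itself the output should stabilize.
        gen1 = build(seed_cc, source, f"{tag}-gen1")
        gen2 = build(gen1, source, f"{tag}-gen2")
        gen3 = build(gen2, source, f"{tag}-gen3")
        assert digest(gen2) == digest(gen3), "no fixpoint: nondeterminism or worse"
        return digest(gen3)

    # Two independent bootstrap paths should agree at the end.
    assert fixpoint("cc-from-old-amiga-media", "clang.c", "a") == \
           fixpoint("cc-from-old-dec-tape", "clang.c", "b")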
Better yet if the chosen starting point(s), being old, are also small and simple.
I mostly agree, although be careful about cross-contamination if you intend to actually use DDC: clang bootstrapped by gcc3 is not going to be independent of gcc3.
You're not giving me a usable procedure. Let's say that only GCC can compile itself and its own libraries (e.g. version n-1 can compile version n). How can I make a trusted GCC 4.8.1 if other compilers won't compile the GCC sources? Do you agree that I would have to implement all the GCC features used in the GCC sources in one or more other compilers? If not, don't I have to have a trusted GCC from the start? And if I have such a GCC, then I don't need other implementations anyway?
I am not sure whether gcc has always been able to compile itself, but if it has, you can argue that some time ago there existed a smallest kernel of gcc that did not depend on any of the "features" of gcc that make it impossible for other compilers to compile gcc. Now, if such a thing existed before, it probably exists now, because any incremental "feature" that made gcc impossible for other compilers to compile would have made it impossible for gcc too. My bet would be that there is a logical separation somewhere, and there is still a small kernel in it that you can bootstrap with other compilers, from which point you can do what your parent says.
You do need other compilers that can compile the GCC source. These do not need to be trusted, just diverse in origin so that they are unlikely to contain the same attacks.
If GCC is in fact the only thing that can compile GCC, then you cannot use DDC to get a trusted version of GCC.
Yes, you're missing something, unfortunately. The author apparently states it several times, but many people must miss it when reading.
"I say it in the ACSAC paper, and again in the dissertation, but somehow it does not sink in, so let me try again.
Both the ACSAC paper and dissertation do not assume that different compilers produce equal results. In fact, both specifically state that different compilers normally produce different results. In fact, as noted in the paper, it’s an improvement if the trusted compiler generates code for a different CPU architecture than the compiler under test (say, M68000 and 80x86). Clearly, if they’re generating code for different CPUs, the binary output of the two compilers cannot always be identical in the general case!
This approach does require that the trusted compiler be able to compile the source code of the parent of the compiler under test. You can’t use a Java compiler to directly compile C code."
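In other words, the bitwise comparison in DDC is against the compiler under test's regeneration of itself, not between different compilers. A sketch for the common self-regenerating case (the compiler under test was built from its own source), under the same hypothetical one-command-build simplification as above:

    import hashlib, subprocess

    def build(cc, src, out):
        subprocess.run([cc, src, "-o", out], check=True)
        return out

    def digest(p):
        return hashlib.sha256(open(p, "rb").read()).hexdigest()

    def ddc_check(trusted_cc, source, binary_under_test):
        # stage1 is functionally equivalent to the compiler under test (same
        # source), even if trusted_cc is slow and impractical for daily use.
        stage1 = build(trusted_cc, source, "stage1")
        # stage1 and binary_under_test now compile the same source; if the
        # binary under test is honest, the two results are bit-identical.
        stage2 = build(stage1, source, "stage2")
        regen = build(binary_under_test, source, "regen")
        return digest(stage2) == digest(regen)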
Open source was thought to sweep away hidden code. I really doubt GCC or other compilers have that special code that is reproduced each time you recompile a compiler with it.
If there was such self-reproducing code in a compiled GCC, it would be quite easy to find. There are many eyes looking at a program like GCC.
And even with such a conspiracy theory, which is still possible, open source has a better margin of safety than proprietary software. It's not perfect, but it's much more transparent, if you get what I mean.
True, you cannot audit msft code. But have you checked the size of the Linux kernel lately? Yes, I could audit the source, but in reality only a select few will have deep knowledge of the code, and each of only small parts of it.
I for one have not done so and would not know how. However, I'd like to, so hopefully some wise hacker will respond with recommendations here. IMHO the best thing that could come out of the Snowden leaks would be a rallying cry for an explosion of crypto/privacy advances in the FOSS community.
The bug was introduced in September 2006. The discovery was published in May 2008. Affected: the most popular Linux distribution and all the keys generated on it in that period. Scary.
Moreover, the bug was not found by reading the source code. The keys generated by all the existing systems were analyzed. If I remember correctly, only the keys generated by the mentioned Linux distros stood out (along with some hardware devices using customized firmware or poor implementations). Windows and OS X weren't among them.
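That kind of survey is conceptually simple: collect public keys at scale and look for keys repeated across unrelated hosts. A minimal sketch, assuming a hypothetical hostkeys.txt with one "host base64-key" pair per line; with the broken generator seeded by roughly 32k possible process IDs, the duplicates stand out immediately:

    import hashlib
    from collections import Counter

    counts = Counter()
    with open("hostkeys.txt") as f:      # hypothetical scan results
        for line in f:
            host, key = line.split()
            counts[hashlib.sha256(key.encode()).hexdigest()] += 1

    dupes = {fp: n for fp, n in counts.items() if n > 1}
    print(f"{len(dupes)} distinct keys appear on more than one host")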
But the odds that it will be found (and publicly acknowledged) are higher than with closed-source software. Availability of the source is not a substitute for audit and care, but it is helpful, and you're not guaranteed audit or care with closed solutions.
The mentioned bug was not discovered by reading the sources. The sources were available for a year and a half and were used for the most popular Linux distributions. What can we expect for less popular ones, then?
I'm not saying that it's better to have closed source, even if we could discuss that too when we consider how often changes are introduced (for security, the less often the better, provided the starting point is good enough). I'm saying that believing something is secure simply because "it's open source" is pure hand-waving.
Paying someone to audit source I have available to me is going to be cheaper and easier than 1) paying the one group that has access to the source to audit it (in which case I still have to trust them), or 2) paying someone to audit binaries through disassembly.