Clang (LLVM C compiler) builds a working Linux kernel (gmane.org)
188 points by zdw 2553 days ago | 59 comments

That's quite the milestone to achieve. I'm actually really happy to see this happen in light of the negative rap C seems to be getting just about everywhere.

C is far from dead, it may have its warts but it is still very much in use in all kinds of places and a solid knowledge of C has never hurt anybody (as far as I know).

Being able to compile the Linux kernel and get it to boot is a non-trivial exercise in GCC compatibility. I'm very impressed with this.

I wonder if there ever will be an official 'blessing' by the kernel devs of the LLVM C compiler.

I don't see why not in the long run if the devs bend over backward to maintain GCC compatibility on x86. It is never a great idea to have a single point of failure even if that point is the wonderful and awe-inspiring GCC. It is amazing to think that nearly 20 years after conception Linux may be built with an alternate tool chain. This is an incredible feat that has far-reaching consequences for the health of the ecosystem. Maybe Linus wasn't keen on the idea but if some people step up and bake in the support it'll be de facto blessed.

Slightly off-topic, does anyone know if the Gold linker works with Clang?

Gold should work.

ICC has been able to compile Linux for a while, I think.

It's unlikely that it'll get that 'blessing' - a while back certain versions of GCC were recommended against by Linus.

Also, on some esoteric platforms, GCC dropped compatibility. For example, OpenBSD still supports an older version of GCC for VAX and m88k platforms IIRC.

The Linux kernel is explicitly built using GNU extensions, which means that in addition to implementing the C standard, they have to deal with kernel-specific issues as well as whatever ideas GCC has. Clang as a work-alike for GCC will be/is incredibly useful and an incredible accomplishment.
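For readers who haven't run into them, here is a rough sketch of the kind of GNU C extensions being talked about, assuming GCC or Clang in their default GNU mode. The macro, struct, and function names are made up for illustration; these are not the kernel's actual definitions, just the same general idioms (statement expressions and `__typeof__`).

```c
#include <assert.h>
#include <stddef.h>

/* Statement expression plus __typeof__: each argument is evaluated exactly
 * once. Both constructs are GNU extensions. MIN_ONCE is a hypothetical name. */
#define MIN_ONCE(a, b) ({            \
    __typeof__(a) _a = (a);          \
    __typeof__(b) _b = (b);          \
    _a < _b ? _a : _b;               \
})

/* Recover a pointer to the enclosing struct from a pointer to one of its
 * members. CONTAINER_OF is a hypothetical name for this idiom. */
#define CONTAINER_OF(ptr, type, member) ({                      \
    const __typeof__(((type *)0)->member) *_p = (ptr);          \
    (type *)((char *)_p - offsetof(type, member));              \
})

struct item {            /* hypothetical example type */
    int id;
    int payload;
};

/* Small self-check exercising both macros. */
static void demo_gnu_extensions(void)
{
    int calls = 0;
    int smallest = MIN_ONCE(++calls, 10);   /* ++calls runs only once */
    assert(smallest == 1 && calls == 1);

    struct item it = { .id = 7, .payload = 42 };
    struct item *owner = CONTAINER_OF(&it.payload, struct item, payload);
    assert(owner == &it && owner->id == 7);
}
```

Calling `demo_gnu_extensions()` from `main` exercises both macros. A compiler that only implements ISO C has no obligation to accept either one, which is why a GCC work-alike has to go beyond the standard.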

While being able to compile Linux is definitely helpful and a good sign of compiler maturity, the kernel developers target GCC and I'm not sure that will ever change. This (and similar projects) will probably somewhat limit Clang adoption as people will always need to keep GCC around rather than relying on the portability of using standards. :(

Hopefully the competition Clang brings to the table will push at least some open source projects to be a little less reliant on GNU.

On the other hand, Apple is pushing LLVM very hard — Xcode is growing more and more integrated with it in each release — and I would not be surprised to see Clang declared the one true Mac and iOS compiler while GCC is shown the way out. So I think we might have some healthy competition on our hands.

What's wrong with relying on GNU? It's Free software that will live on as long as the concept does. You can be assured that GNU will never have the rug pulled out from under it by a corporate affiliate, anyway.

I'm a hacker. This means I tend to have a lot of strong opinions about a lot of things. There are things I don't like about GNU, there are things I do like about GNU. The same can be said for Linux, Clang, LLVM, Apple, and a long list of other things. I mostly try to ignore my personal opinions, because I'd rather code than argue about epolitics.

I always get a chuckle out of the knuckleheads claiming "C is portable." Even ignoring all the libc and type size assumptions, there are so many "portable" programs out there that won't compile on anything but GCC because the extensions are on by default and people sometimes don't even know they're using them. This gets real amusing when a new version of GCC decides to deprecate them (one example of this I ran into recently was lvalue casts).
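To make the lvalue-cast example concrete, here is a minimal sketch of the old GNU-only spelling (shown only in a comment, since current compilers reject it) next to a portable rewrite. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Advance a void pointer by n bytes.
 *
 * Old GNU-only spelling, using a cast as an lvalue (not valid ISO C):
 *     (char *)p += n;
 *
 * Portable rewrite: do the arithmetic on a char * value and hand the
 * result back to the caller. */
static void *advance_bytes(void *p, size_t n)
{
    return (char *)p + n;
}

/* Small self-check: step over two ints and read the third. */
static void demo_advance(void)
{
    int words[4] = { 10, 20, 30, 40 };
    void *cursor = words;

    cursor = advance_bytes(cursor, 2 * sizeof words[0]);
    assert(*(int *)cursor == 30);
}
```

The rewrite costs one extra assignment at the call site, and it compiles the same way on any conforming compiler.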

I don't think he meant there's anything wrong with it, only that clang/llvm might not feel like implementing certain GNU C extensions.

Besides, pluralism is always good, right?

Nothing's wrong with GNU; the problem is when it becomes a de-facto standard. Having more than one compiler available will improve both as they compete, and it keeps code more time-proof and gives developers and users more options.

A lot of people have been making this claim in this thread, but I'm a little bit skeptical. For one thing, I'm skeptical of the idea that competition always improves products in a commercial setting (though it surely does under some circumstances). But more importantly, I think there is a significant disanalogy between Free software projects and businesses that sell products. Businesses compete for customer dollars. What are Free software projects competing for? Users? Credibility? I'm not sure. But it certainly doesn't seem like they have the same incentives in place. People write code for FS projects because they're interested in it, because they need the functionality, etc. -- it's something they would do even if their efforts don't attract a lot of "customers."

What FS projects do compete for is the time and interest of talented programmers. For that reason, having competing projects can be more harmful to them than good, because it divides the pool of programmers: more programmer time is spent achieving common goals (like having an excellent C compiler) than would be spent if the different groups pooled their efforts.

(This is not, of course, an argument that competing FS projects are always a bad thing. Obviously, competing projects, forks, etc. arise for a variety of reasons, both social and technical. But I am arguing that competition alone is not necessarily a good thing, either, especially when competing projects have common goals.)

Competing for the time and interest of talented programmers means being well-written, useful and interesting. Clang is more of a library than just a compiler, so it doesn't have the exact same goals as GCC. I think this is a situation where competition is a good thing: Clang will provide new, useful tools and GCC can try to improve on them or try to offer some other kind of advantage.

With all the flak Apple has taken over dropping its commitment to one "-VM," the JVM, I'd just like to point out Apple is the primary corporate sponsor of LLVM (I understand). How many plates must they keep in the air? This one is flying pretty high.

I'm a 19 year old college student, actually. I have no affiliation with Apple, nor a formal relation with Clang or LLVM.

Sorry, but your HN bio is empty and you didn't submit the article. Did you spearhead the effort? And is my comment regarding Apple and Clang incorrect? Thanks!

I'm not sure why he was downvoted. He appears to be the author of the linked email. You can see at the bottom of the mail, he signed off as "Bryce Lelbach aka wash".

Sure Apple is one of the bigger sponsors of LLVM but LLVM started out as an academic project and even now lots of academics contribute to it. Apple hired the guy who originally wrote LLVM and now uses it a lot. But many other companies also utilize LLVM and contribute back to it (though not as heavily). For example, some AMD compiler folks I know said they were going to commit some changes to LLVM (not sure if they did). Adobe also uses it AFAIK for some specific tools.

However, LLVM and JVM don't have much in common. JVM is popular more for its baked-in libraries than its instruction set or VM efficiency (though HotSpot is a darned good VM too).

The LLVM project is one of those open-source projects that originally aimed to do some great things and eventually turned out to do even greater ones.

FWIW, I use clang over gcc (unless there is a major overriding reason to use gcc) just for the fabulous error messages.

http://clang.llvm.org/diagnostics.html gives some good examples, and my experience has been that the examples given on that page are pretty spot on.

Why would one want to do this (aside from it being a cool project)? What advantages does clang have over gcc for kernel hacking?

Clang's diagnostics and static analysis? Take, for example, https://patchwork.kernel.org/patch/36060/, a serious problem in the kernel that was recently discovered. Given the size of Linux, it would be highly inefficient to manually find and fix every instance of this issue.

Currently, Clang doesn't support the work-around option that GCC provides to prevent the aforementioned issue. I could just implement the GNU work-around, but with Clang, it's far easier to write a scanner which will identify every place in the Linux source code where those dangerous semantics appear.

Clang might not be mature enough to compile Linux for distribution. The difference between GCC and Clang is that Clang is not a compiler. Clang is a modular API that provides the tools to build C language front-ends to the LLVM compiler infrastructure. The Clang compiler driver is just one implementation of an application built using the Clang libraries.

Why would one want to do this? Well...





Think outside of the box :)

I'm pretty sure that bug was the canonical example of the usefulness of Coccinelle in kernel development. This semantic patch was used to find and fix all the instances of it:

    // Copyright: (C) 2009 Gilles Muller, Julia Lawall, INRIA, DIKU.  GPLv2.
    @@
    type T;
    expression E;
    identifier i,fld;
    statement S;
    @@
    - T i = E->fld;
    + T i;
      ... when != E
          when != i
      if (E == NULL) S
    + i = E->fld;

See http://coccinelle.lip6.fr/impact_linux.php for more.
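In plain C, the pattern that semantic patch rewrites looks roughly like the sketch below. The struct and function names are hypothetical and not taken from the kernel; the point is only the ordering of the field read and the NULL test.

```c
#include <assert.h>
#include <stddef.h>

struct device {          /* hypothetical example type */
    int flags;
};

/* Before: the field is read before the NULL test. Dereferencing a null
 * pointer is undefined behavior, so an optimizing compiler is allowed to
 * assume dev is non-NULL here and drop the check that follows. */
static int flags_unchecked(struct device *dev)
{
    int flags = dev->flags;
    if (dev == NULL)
        return -1;
    return flags;
}

/* After: declare first, test for NULL, and only then read the field. */
static int flags_checked(struct device *dev)
{
    int flags;
    if (dev == NULL)
        return -1;
    flags = dev->flags;
    return flags;
}

/* Small self-check. The unchecked version is only ever given a valid
 * pointer, because passing NULL to it would be undefined behavior. */
static void demo_null_check(void)
{
    struct device d = { .flags = 5 };

    assert(flags_unchecked(&d) == 5);
    assert(flags_checked(&d) == 5);
    assert(flags_checked(NULL) == -1);
}
```

Both versions behave the same for valid pointers. Only the second has well-defined behavior when handed NULL.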

Oh. Well, I guess they beat me to the punch.

It may make certain bugs more apparent.

If you have development going on with a different compiler and it's actively maintained to work on both compilers then overall you'd get similar advantages in your code base to those that write cross-platform software. Incidentally, you get the same effect with cross-platform as you're usually targeting different compilers.

Also, with less compiler specific code it will be easier to get the kernel working with other compilers, perhaps ICC.

Unfortunately, Clang is still a bit shaky at times. In our (C++) project, we get a miscompilation in Clang that we don't get with GCC, the Visual Studio compiler, or the Intel compiler. It will be great when it is more stable: Clang cuts the compile time by half in comparison with GCC (from 18 minutes to 9 minutes on my computer).

I do a lot of C++ template metaprogramming, so I end up running into 3-5 Clang internal compiler errors (In my circles, "ICE"s) a week, if not more. With GCC I maybe run into 3-5 a month.

The difference is that I can pinpoint the error. Clang dies gracefully, with a stack trace, source file location, and, for parse errors, the tokens that are currently being parsed. The problem might not always be easy to fix, but I never have to waste time trying to find it.

If you want a more stable Clang, then build one. ;)

File a bug report. They are very receptive to them, especially if you get a miscompilation.

I hope to do so, I really like the project and want it to succeed. Unfortunately, I'm not yet sure how to reduce the 100000 lines to something manageable with the problem clear :)

Do you suspect the bug to be in clang (the front-end) itself, or in LLVM's optimizer or back-end? For the latter cases, bugpoint is included in the LLVM source tree and does a good job of narrowing things down, so long as you provide a program whose output should be deterministic (this may involve some work on your part if your program usually depends on data from the outside world). It is described here http://llvm.org/docs/Bugpoint.html . It may be useful for Clang bugs too, I'm not sure.

People may help you by suggesting strategies for finding a small test case that tickles the same problem if you post to the mailing list here http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev . I've posted a couple times and found people surprisingly responsive and helpful.

That looks like an interesting tool, I'll have to try it out. The program is deterministic given a fixed seed.

The bug depends on whether or not optimization is on, so it might be in the backend. What I've heard is that the C++ FE in particular still does have some troubles with miscompilations, so that is why I suspect it.

Clang is really good at C and Objective-C. C++ is kind of its weak point, most likely because C++ is a couple of orders of magnitude more complex than either of those languages.

I was actually surprised that it could compile our stuff at all, and that it runs mostly all right. We do use a lot of the language, and we need at least gcc version 4.2 for example.

(disclaimer: I can't answer this with authority)

The reasons I get excited:

* speed: faster compilation

* research: interesting code introspection you can't get from GCC.

* proof of concept: if you can build the Linux kernel, you can probably build almost any C project

* edit: as mentioned in other comments, competition is good.

This isn't a "hey, go do this" post. This is a milestone in the development of Clang.

Of course. I was just curious if this had any other value.

In general having 2 compatible implementations with little or no shared-code base allows for more easily finding bugs in your code. A lot of times bugs won't show up in testing because one compiler just happens to mask them. This isn't always a problem, since the bugs may stay masked. However, in a changing codebase they are likely to pop up in the future.
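One hedged illustration of a bug that a single compiler can mask: C leaves the order in which function arguments are evaluated unspecified, so code that depends on that order can look correct for years under one compiler and change behavior under another. The function names below are hypothetical.

```c
#include <assert.h>

static int counter;

/* Returns 1, 2, 3, ... on successive calls after counter is reset. */
static int next_id(void)
{
    return ++counter;
}

static int pair_diff(int first, int second)
{
    return second - first;
}

/* Latent bug: the two next_id() calls may run in either order, so the
 * result is 1 under one evaluation order and -1 under the other. */
static int diff_unspecified(void)
{
    counter = 0;
    return pair_diff(next_id(), next_id());
}

/* Fix: sequence the calls explicitly with separate statements. */
static int diff_sequenced(void)
{
    counter = 0;
    int first = next_id();
    int second = next_id();
    return pair_diff(first, second);
}

/* Small self-check: only the sequenced version has a single valid answer. */
static void demo_evaluation_order(void)
{
    int d = diff_unspecified();

    assert(d == 1 || d == -1);
    assert(diff_sequenced() == 1);
}
```

Building and testing with two independent compilers is one cheap way to flush out this kind of assumption before it shows up in the field.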

I think that http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-re... gives a number of nice features that clang brings to the table which could be very useful for kernel hackers.

I am very happy. I was very pleased when, back on May 20, 2010, they announced that clang passed its first fully-successful Boost regression test run. But this is another big step ahead.

The clang static analyzer is now part of my everyday life as a programmer and every time I read "did you mean..." on the console I am as amazed as the first time. It saves me a lot of time (even compile time.)

It looks like this was targeting an x86_64 version of the kernel. For a good reference on LLVM-supported backends, see this table: http://llvm.org/docs/CodeGenerator.html#targetfeatures

Clearly, x86 gets the most love. x86, ARM, PPC, and SPARC are listed as "Generally Reliable." Quite an accomplishment, given the difficulty of building up a new compiler from scratch. For now, MIPS is notably missing from the party.

Unfortunately, I'm a student with limited resources. I have three boxes, a macbook and two Atom desktops, all x86_64.

If you promise to get things working on ARM, I might be able to get you something.

I have a 4 week break in December; I'll have ample time then to work on this. Please shoot me an email at admin@thefireflyproject.us, so we can talk this over some more.

Wasn't tcc able to do the same[1] quite a while ago?

[1]: http://bellard.org/tcc/

Even better, it could compile and boot Linux directly from its source code: http://bellard.org/tcc/tccboot.html

I don't think tinycc and llvm are even in the same league in terms of optimizations.

Tinycc is, well, tiny. You can get a working compiler just using syntax-directed translation schemes and it would compile any valid program. The output would even boot, but it would make baby-Muchnick cry.

True, no one is comparing their performance characteristics, it's just that getting the Linux kernel to compile isn't a unique feat where you need to be on the same level as gcc and llvm. You "just" need ANSI C with GNU extensions. And not break on a couple dozen corner cases.

It's even self hosting. This is huge.

This is good news. FYI, Clang/LLVM's been building FreeBSD for a while now: http://wiki.freebsd.org/BuildingFreeBSDWithClang

My only worry, in the light of Oracle's spat with Google over Dalvik and its lack of teeth over the OpenJDK, is what kind of patent protection the BSD-like license of LLVM offers against Apple in case it decides every Linux kernel on every non-iPhone smartphone (something that will probably happen in a couple years) infringes some patent they have that touches LLVM in some way.

Any lawyers want to chime in?

The GPL doesn't offer any special protection against third party patents, and there's no way it could, given the way patents work.

I am not considering 3rd party patents, but Apple's. If Apple includes some patented tech in a contribution to LLVM or Clang, what guarantees do I have that they will not sue non-Apple downstream users?

...in case it decides every Linux kernel on every non-iPhone smartphone...infringes some patent they have that touches LLVM in some way.

Is there some legal precedent you know of wherein compiler patents extend to the output of the compiler?

What if the output of the compiler uses some method you patented?

Then I suppose the community can fork from the revision prior to where that patented technique landed in the tree, just as with any other OSI-approved license.


So, the license offers no protection at all...

Depends on one's perspective. Not long ago the meme du jour was that open-source transparency & fork-ability was the ultimate protection against corporate evils.

By the standard you're setting, the only safe compiler would be one you wrote yourself and revealed to no one. What compilers do you use?

Well, the technology used (the C language) is 30 years old. What is new is LLVM, but that was not created by Apple either.

…but it still can't compile Ruby 1.9 :-P
