LLVM 4.0.0 (llvm.org)
468 points by zmodem on Mar 13, 2017 | 141 comments

> thanks to Zhendong Su and his team whose fuzz testing prevented many bugs going into the release.

http://web.cs.ucdavis.edu/~su/ claims 1228 bugs found (counting both LLVM and GCC). Impressive!

With fuzzing it's possible to find distinct bugs (or at least bugs that trigger in distinct code locations) without ever further investigating the bug in person.

Your bug report can simply consist of "this input file causes a compiler crash".

Crashes are the easier bugs to report, but often the most important compiler bugs are silent miscompilations that go unnoticed. With C compilers in particular, it's not easy to conclusively demonstrate a bug and convince the developers that the compiler is actually buggy (a lot of C standard lawyering is at play when it comes to subtle undefined behaviors).

[disclosure: I was involved in the initial phase of the "EMI" compiler validation work that's linked] One of the great strengths of EMI has been its ability to identify a lot of miscompilations in particular. In fact, while we did not fix the bugs ourselves, I would say the majority of the time was probably spent reducing the bugs to a short program that manifested the issue and engaging with the developers to prove the bug was not undefined behavior. I remember from my days that the majority of bugs we reported were miscompilations, not crashes, and those took a lot of time to report. Even with simple crashes, you probably need to reduce the bug, as the developers don't usually appreciate a 7MB source file attached to your bug report!
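
To make the reduction step concrete, here is a minimal Python sketch of greedy chunk removal, a much simplified cousin of what tools like creduce automate. It is not creduce's actual algorithm. `reduce_lines` and the `still_fails` callback are hypothetical names; in practice `still_fails` would run the compiler on the candidate and check that the same crash or wrong output still shows up.

```python
from typing import Callable, List


def reduce_lines(lines: List[str],
                 still_fails: Callable[[List[str]], bool]) -> List[str]:
    """Shrink `lines` while `still_fails` keeps returning True.

    Tries to delete chunks of lines, starting with large chunks and
    halving the chunk size down to single lines.
    """
    if not still_fails(lines):
        raise ValueError("the starting input must reproduce the bug")
    chunk = len(lines) // 2 or 1
    while True:
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if still_fails(candidate):
                lines = candidate   # keep the smaller input, retry at the same index
            else:
                i += chunk          # this chunk is needed, move on
        if chunk == 1:
            break
        chunk //= 2
    return lines
```

For miscompilations the predicate is the hard part: besides checking that the wrong output persists, it also has to reject candidates that introduce undefined behavior, otherwise the "reduced" program no longer demonstrates a compiler bug.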

Indeed! On the Mill project we leave boxes crunching hours and hours of csmith/creduce, and we don't watch them do it :)

SPE looks to be very nice too.

I haven't seen a Mill press release hit the front page for a while.

I'm not good at subtly hinting.

What's the Mill project? Is that the "new computer" thing or am I missing something.

Yeap that's the project :)

We have our own LLVM backend which is a front end to our own "specialiser".

We use csmith to fuzz for compiler bugs and creduce to reduce them. This starts with C and we even validate the output of the sim against clang x86 output.
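
The kind of cross-checking described above (run the same program through two toolchains and compare the results) can be sketched in a few lines of Python. This is only an illustration of differential testing, not csmith or the Mill tooling: `differential_test`, `reference`, `candidate` and `gen_input` are made-up names, with plain functions standing in for "compile and run with toolchain A" and "compile and run with toolchain B".

```python
import random


def differential_test(reference, candidate, gen_input, trials=1000, seed=0):
    """Return (input, expected, actual) for the first mismatch, or None.

    `reference` and `candidate` are two implementations that should agree;
    `gen_input` takes a random.Random and returns one test input.
    """
    rng = random.Random(seed)          # fixed seed so failures are reproducible
    for _ in range(trials):
        x = gen_input(rng)
        expected = reference(x)
        actual = candidate(x)
        if expected != actual:
            return x, expected, actual
    return None
```

A mismatch found this way is only a starting point: the failing input still has to be reduced and checked for undefined behavior before it makes a useful bug report.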

How should someone go about contributing to your LLVM backend?

So it is meant to be a different architecture? What market is it targeting?

Is the Mill low power, high performance, embedded, radiation hardened, etc.? Will it come in small packages with RAM and ROM on the die? Will it support a different architectural view (maybe all the memory will be non-volatile)?

A lot of questions and very little answered in the page.

Give a man a bug report, he will fix his program one day. Give a man a fuzzer, he will be fixing bugs for the rest of his life.

Alternatively, you could do what I do and just not write bugs.

This works if you don't have coworkers.

I'm kidding! It doesn't work even then.

In those kinds of environments, typically I just will the program into existence, bug-free.

I agree with your cynicism when dealing with a software vendor like Oracle, where bugs are fixed only for the reported instance (if at all).

For a FOSS project like LLVM, I think getting >100 parser bugs might cause a slight re-evaluation of the current code structure. Maybe I'm a bit too optimistic, but the LLVM team has done a fine job of fixing outstanding issues.

Lastly, fuzzers like AFL (American Fuzzy Lop) attempt to model the input, doing a sort of lazy search for bugs (taking parser run-time into account). So when AFL finds a cluster of bugs, they're normally clustered in the source code as well.

i don't think the comment was meant to be cynical, i think it was meant to show how much time and effort fuzzing can save when trying to generate reproducible test cases.

I wonder how many of the bugs were squashed by Dr. Su himself and how many were DISs or other forms of work supervised by Su. Maybe he did kill that many bugs on his own but that seems outrageous (without taking time to actually see what kinds of bugs they were).

If you look at the homepage, he has 4 separate links for LLVM bugs which differ only in the reporter's email address. The first, and largest, is his which suggests that he substantially did the actual work since the other members of his team appear to be reporting problems using their own contact info

The professor himself very likely did none of it. My guess is that someone like Dr. Su probably wouldn't even have enough time to work on fuzzing LLVM, and would likely be more concerned with writing up the next proposal to fund his lab :P

From what I've seen in academia, the supervisor typically generates ideas, and the grad students and postdocs test them out and see if they have potential. The discussion goes back and forth until a decision is made on whether or not to pursue an idea.

This differs from group to group of course, and is not the case in smaller groups, but it's generally how things work.

I think this can vary across people/projects and is perhaps true in a large number of cases (this could also be said for management in business, and it is debatable whether spending time on the ground doing the actual thing would always be the most leveraged thing one can do). Irrespective of that debate, at least for this particular instance, I can confirm that the professor in question [who happens to have been my doctoral advisor :)] went through a whole lot of trouble personally, down to reducing many, many individual bugs, and you can just look at the bug tracker activity under his name to independently verify. (I graduated long ago, so this is not a "paid Glassdoor review" either; I ran into this as a frequent HN lurker!)

I happened to come across here... For this specific case mentioned in your first paragraph, it is not true.

Thanks for letting us know! That's quite impressive honestly :)

I can anecdotally confirm this to be true in my experience as well.


There won't be an academic paper supporting this until a senior academic thinks it's a worthy concept for investigation and follows up by asking his postgrads to work the idea up into a publishable paper. Quite meta really.

Where do you expect someone to find statistics to cite regarding the internal workings of university labs? Or are you looking for someone's blog post, which would ultimately also be anecdotal?

"[Citation needed]" is just noise when there's no reasonable expectation that the desired evidence actually exists. Moreover, we're not writing Wikipedia here, just trying to have a discussion. Take a chill pill.

It's just anecdotal. Take it with a grain of salt if you're skeptical.

You just said it was likely.

Based on my experience, it's likely that he didn't work on it personally. Is there something wrong with a statement like that?

Real life. Mine also.

Looks like it didn't make the release notes but one of the features new for this release is opt-viewer. It's useful for finding the rationale why some bit of code was/wasn't optimized. It's a WIP but usable today.

I made a demo [1] for this tool.

[1] https://github.com/androm3da/optviewer-demo

LLVM Coroutines - This is the most exciting thing for me. Gor Nishanov in his videos explains how coroutines are implemented and how they are optimized by LLVM. Asynchronous IO code will be so easy to write and so efficient. Context switches at the cost of a function call, billions of coroutines at once, heap allocation elision (in certain cases). Can't wait for coroutines to land in Clang.

I am a big fan of Go goroutines, so the Networking TS and Coroutines TS made me very happy; connecting both and having them in the standard will be great. Just a shame that for Networking TS integration we will need to wait for C++20.

Go-style coroutines will still be more efficient and more elegant than C++ coroutines. Goroutines are stackful, whereas in C++ you'll need to manually chain coroutines. That means a dynamic allocation and deallocation for _each_ call frame in the chain. That's more efficient than JavaScript-style continuation closures, but in many situations still far more work than would be required for stackful coroutines.

Everything described in Gor Nishanov's video applies equally to stackful vs non-stackful coroutines. That is, the code conciseness, composability, and performance advantages of non-stackful coroutines are even greater with stackful coroutines.

Nishanov dismisses stackful coroutines out-of-hand because to be memory efficient one would need relocatable (i.e. growable) stacks. But that begs the question of how costly it would be to actually make the stack relocatable. I would assume that in a language like C++ where stack layout must already be recorded in detail to support exceptions, that efficiently relocatable stacks wouldn't be too difficult to implement.

At the end of the day, without stackful coroutines networking still won't be as elegant as in Go. And for maximum performance, state machines (e.g. using computed gotos) will still be useful. You'll either need to sacrifice code clarity and composition by explicitly minimizing chains, or you'll sacrifice performance by having deep coroutine chains.
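
Python generators are stackless too, so they give a rough feel for the "manual chaining" point: every function between the driver and the actual suspension point has to be a generator itself and delegate with `yield from`, and each call creates its own generator object. A minimal sketch follows; all names are made up for illustration, and it says nothing about the cost model of the C++ proposal.

```python
def read_byte():
    # Innermost suspension point: hand control back to the driver.
    byte = yield "need-byte"
    return byte


def read_u16():
    # Must itself be a generator and chain explicitly to read_byte().
    hi = yield from read_byte()
    lo = yield from read_byte()
    return (hi << 8) | lo


def read_message():
    # Length-prefixed payload; again every nested call is chained by hand.
    length = yield from read_u16()
    payload = bytearray()
    for _ in range(length):
        payload.append((yield from read_byte()))
    return bytes(payload)


def drive(coro, data):
    """Feed bytes to `coro` until it returns; raise if the input runs out."""
    pending = list(data)
    pos = 0
    try:
        next(coro)                     # run to the first suspension point
        while True:
            if pos == len(pending):
                raise ValueError("input ended before the message was complete")
            byte = pending[pos]
            pos += 1
            coro.send(byte)            # resume with one byte
    except StopIteration as stop:
        return stop.value              # the coroutine's return value
```

With a stackful design, `read_byte` could suspend the whole call chain on its own, and `read_u16` and `read_message` would be ordinary functions.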

> Nishanov dismisses stackful coroutines out-of-hand because to be memory efficient one would need relocatable (i.e. growable) stacks. But that begs the question of how costly it would be to actually make the stack relocatable. I would assume that in a language like C++ where stack layout must already be recorded in detail to support exceptions, that efficiently relocatable stacks wouldn't be too difficult to implement.

No, this is a common misconception. Unwind tables only store enough information to be able to locate each object that has to be destroyed. It's an entirely different matter to move those objects, because a system that does that has to be able to find all outstanding pointers to a moved object in order to update them. It's legal, and ubiquitous, in C++ to hide pointers to objects in places that the compiler has no knowledge of. Thus moving GC as it's usually implemented is generally impossible (outside of very fragile and application-specific domains like Chrome's Oilpan).

The best you can do in an uncooperative environment like that of C++ is to allocate large regions of a 64-bit address space and page your stacks in on demand. (Note that this setup gets awfully close to threads, which is not a coincidence: I think stackful coroutines and threads are essentially the same concept.)

> I think stackful coroutines and threads are essentially the same concept.

I think "fibers" is the more correct term here: "stackful coroutines and fibers are essentially the same concept".

In Win32, each process consists of threads, and each thread consists of fibers. Each fiber has its own stack. Userspace code can switch fibers with SwitchToFiber() API, kernel never switches fibers (kernel's unit of scheduling is entire thread).

Is there actually some profound research material or experience from larger projects on whether stackful [g|c]oroutines or stackless ones perform better? I've been wondering about this for quite some time and would be very interested in some more facts about this.

During the last years the stackless style, in the form of event loops with promises, continuations and async/await sugar, has gained a lot of traction. Often the authors argue that it's better for performance, since no stacks need to be allocated, switched or resized and the only overhead is the object which is allocated for the state machines, which is true for sure. However, these systems still have lots of dynamic allocations, often one or more for each point where a suspension happens. With goroutines, in contrast, there should be no need for dynamic allocations at suspension points, and also less need for dynamic allocations in general, since lots of state can be put on the stack instead of being stored in various state machine objects in the heap. I guess data locality might also be a plus for goroutines: instead of having state possibly spread all over the heap, it's in a smaller region, which is the goroutine's stack. But of course the downside is that these stacks need to be allocated and sometimes resized (however, I think for lots of goroutines that either won't happen or won't happen often).

> However these systems have still lots of dynamic allocations, often one or more for each point where a suspension happens. And also less need for dynamic allocations in general, since lots of state can be put on the stack instead of being stored in various statemachine objects in the heap. I guess data locality might also be a plus for goroutines, instead of having state possibly spread all over the heap it's in a smaller region which is the goroutines stack

Rust's futures-rs library seems to only require a single allocation for a whole "task", rather than one at every suspension point. This implies that all of the relevant persistent-across-suspensions data is in one object on the heap, in a contiguous allocation that is reused for the whole processing (similar to a goroutine's stack).

> - None of the future combinators impose any allocation. When we do things like chain uses of and_then, not only are we not allocating, we are in fact building up a big enum that represents the state machine. (There is one allocation needed per “task”, which usually works out to one per connection.)

- https://aturon.github.io/blog/2016/08/11/futures/

It's certainly true that Rust's future model (which is readiness-based and not completion-based) is the one that requires fewer allocations, e.g. compared to JS promises or C# Tasks. There seem to be some allocations in the Task structure for parking/unparking. I don't have any idea how often these happen, as I don't have a deep understanding of that code.

Besides that, I think most of the more complex scenarios will use boxed futures, as combining the raw futures yields more and more complex future types, which at some point won't combine anymore unless there's some type erasure (through boxing), or probably only with intense effort and knowledge that the average user won't have. The comparable thing to lots of these combinators with goroutines would just be normal control flow (loops, conditions, etc.), which will not allocate.

Then there are some things where I'm not certain how they work out with Rust's approach. E.g. let's say I have some kind of parser with 20 input states, and for the duration of each state I need a temporary variable of 500 bytes. With the stackful approach I would enter a new scope (maybe in the form of a function call), put the temp variable on the stack, work with it in the blocking call, exit the scope, and the memory is freed. My understanding is that with Rust's state machine approach, where everything is allocated upfront, I would need to reserve the 500 bytes in each of the child futures which handles a step, and the total future would combine those, which results in 20*500 bytes. Or are these coalesced, since the combined future can be represented as an enum/union?
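
The arithmetic behind that last question can be shown with Python's `ctypes`, purely as a layout illustration. It does not settle what futures-rs or the Rust compiler actually do; the class names are made up, and a real tagged enum would add a small discriminant on top of the union size.

```python
import ctypes

STATES = 20
TEMP_BYTES = 500
Temp = ctypes.c_ubyte * TEMP_BYTES     # one 500-byte temporary buffer


class AllStatesAtOnce(ctypes.Structure):
    # Every state's temporary has its own slot: sizes add up.
    _fields_ = [(f"state{i}", Temp) for i in range(STATES)]


class OneStateAtATime(ctypes.Union):
    # Only one state is live at a time: the slots overlap.
    _fields_ = [(f"state{i}", Temp) for i in range(STATES)]


struct_size = ctypes.sizeof(AllStatesAtOnce)   # 20 * 500 bytes
union_size = ctypes.sizeof(OneStateAtATime)    # 500 bytes
```

So the question comes down to whether the combined state machine is laid out like the struct or like the union.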

It's still using callbacks, though. So while there may be a single allocation, code conciseness suffers.

Using Rust futures' and_then, or_else, etc interfaces, try writing a parser that consumes data byte-by-byte, where reading each input byte might block. (Similar to using fgetc() in a C state machine.) It won't be very easy to understand, especially compared to threaded code.[1] Doing that with coroutines, even stackless coroutines, would be much more clear.

That's the problem with futures. If futures were so great, we'd use them for all control flow. There's a reason we don't, just like there's a reason we don't use purely functional languages everywhere. But at least purely functional languages are consistent, and usually have other features (like lexical binding and garbage collection) to make alternative control flow patterns easier and more concise.

Ask yourself this: if resizeable, relocatable stacks were a freebie (i.e. already provided, or because there was 0 desire to support a pre-existing calling convention and most of the instrumentation needed for moving stacks was preordained by exceptions), would you ever not implement stackful coroutines? The future pattern API could be easily implemented as a library for when it made sense. Decorating calls with "async" would become superfluous, but people would still be free to do that. Message passing and consumer/producer inversions would become free and natural coding patterns. Stackful coroutines are clearly the more powerful abstraction. But implementing them is difficult, especially in the context of legacy environments (e.g. C runtimes), and so people compromise and often choose a less powerful abstraction.

[1] In reality you're likely to use a mishmash of patterns and language features because futures will just be too intrusive of a construct for fine-grained control flow manipulation. All of that adds to the complexity of a language and to the solutions you implement in the language.

Back when C10K was first written, the big debate was between threaded vs asynchronous (implicitly callback based) code. In terms of performance asynchronous won the day, but I think the threading proponents were always correct that threaded code was more natural, more clear, and less buggy (ignoring preemptive multithreading). Even though I've spent over 15 years writing asynchronous I/O networking code, I think the threading proponents were correct. For control flow, threading is the best default model and should be the gold standard. Stackful coroutines are similar to threads in terms of how they relate to normal calling conventions and function types. When it comes to the theoretical code conciseness and performance possibilities that's no coincidence.
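
As a rough illustration of the byte-by-byte parser mentioned above, here is how it might look with a stackless coroutine, using a Python generator as a stand-in (not Rust futures and not the C++ proposal; `header_parser` and `on_header` are hypothetical names). The parser state lives in ordinary local variables and in the position within the loops, so there is no explicit state enum or callback chain.

```python
def header_parser(on_header):
    """Parse 'Key: value' lines, receiving one byte per send() call."""
    while True:
        key = bytearray()
        byte = yield                    # may "block": suspend until a byte arrives
        while byte != ord(":"):
            key.append(byte)
            byte = yield
        value = bytearray()
        byte = yield
        while byte != ord("\n"):
            value.append(byte)
            byte = yield
        on_header(bytes(key), bytes(value).strip())


def feed(parser, chunk):
    # Push whatever bytes the network delivered, one at a time.
    for byte in chunk:
        parser.send(byte)
```

The caller primes the generator once with `next(parser)` and then calls `feed` with each chunk as it arrives; chunk boundaries can fall anywhere, including in the middle of a key or value.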

Once you have callbacks, it's not hard to wrap it in some nice syntax sugar that makes it look like regular code, more or less. Or did you mean something else?

Yes, I agree that it's a matter of "legacy environments". But the thing about it is that they will remain non-legacy for as long as the purported successor is not determined. The C model that everyone currently uses as the lowest common denominator persists precisely because of that - it's what everyone has been able to agree on so far. Beyond that, not so much. So Go-style coroutines may well be awesome, but it doesn't matter because of all the other languages that have incompatible models.

Futures, on the other hand, work for this purpose, because you can represent them even on C ABI level. Granted, it's a fairly expensive abstraction then, but it works. Look at WinRT, where futures are pervasive in platform APIs and code that consumes them, and it all works across .NET, C++ and JS, with a single future ABI underneath all three.

> Ask yourself this: if resizeable, relocatable stacks were a freebie (i.e. already provided, or because there was 0 desire to support a pre-existing calling convention and most of the instrumentation needed for moving stacks was preordained by exceptions), would you ever not implement stackful coroutines?

Don't stackful coroutines necessarily mean you need a runtime (with a GC) that is able to grow the stack ? Or am I missing something ?

Why would you need a garbage collector?

To allow for stack growing, the compiler would insert a small number of additional instructions into function entries that either do nothing or call out to a runtime library function that handles stack growing.

Nothing conceptually new in here, that isn't already done by compilers to deal with exception handling, initialize variables or perform division on machines that don't do it natively.

Sure, it requires some runtime support but so does almost every C or C++ program in existence.

edit: Take a look at the split stack support implemented by GCC: https://gcc.gnu.org/wiki/SplitStacks

FWIW, Rust used to use split stacks, as implemented by LLVM, but they had problems like unpredictable performance (if one happened to be unlucky and ended up repeatedly calling functions across the split boundary, each of those calls ended up doing far more work than desirable). Relocating stacks is a whole other issue, and almost impossible in the model of languages like C/C++/Rust where there's no precise tracking of pointer locations: the information in exception handlers (approximately) tells you where things are, but nothing about where pointers to those things are.
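
The "unlucky" case is easy to see in a toy model. The Python sketch below is not how LLVM or GCC implement split stacks; it just assumes that a fixed number of frames fits in one segment and that a segment is released as soon as it is empty, then counts segment allocations for a trace of call depths. `segment_allocations` is a made-up helper.

```python
def segment_allocations(depth_trace, frames_per_segment):
    """Count segment allocations for a sequence of call depths.

    Illustrative assumptions: every segment holds exactly
    `frames_per_segment` frames, and empty segments are freed at once.
    """
    allocations = 0
    live_segments = 0
    for depth in depth_trace:
        needed = -(-depth // frames_per_segment)   # ceiling division
        if needed > live_segments:
            allocations += needed - live_segments
        live_segments = needed
    return allocations


# A loop that calls one function from depth 4, with 4 frames per segment,
# crosses the boundary on every call...
hot = segment_allocations([4, 5] * 1000, frames_per_segment=4)
# ...while the same loop two frames shallower never leaves its segment.
cold = segment_allocations([2, 3] * 1000, frames_per_segment=4)
```

Under these assumptions the first loop allocates a segment on every iteration and the second allocates once, even though the code is identical. That is the hard-to-predict part.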

Yes, copying stacks are unsuitable for unrestricted C or C++ code. Segmented stacks might have problems (Go also switched away from them for the same hard-to-predict performance reason) but when you can't do stack moving I still think they're attractive.

> Goroutines are stackful, whereas in C++ you'll need to manually chain coroutines. That means a dynamic allocation and deallocation for _each_ call frame in the chain.

You can elide that heap allocation. Coroutines proposed for C++ are like functions: when you call a function inside a goroutine you also allocate a function frame, just as you allocate a frame for a coroutine.

Every goroutine has its own stack, which means you also allocate a minimum of 2 KB for every goroutine. When the stack grows over 2 KB you need to copy + reallocate in Go.

Coroutines are like functions, easily optimized by compilers which Gor showed. You can get coroutines inlined and state machines are not so easily optimized.

> At the end of the day, without stackful coroutines networking still won't be as elegant as in Go. And for maximum performance, state machines (e.g. using computed gotos) will still be useful. You'll either need to sacrifice code clarity and composition by explicitly minimizing chains, or you'll sacrifice performance by having deep coroutine chains.


EDIT: Citation from Gor discussing this in more detail:

"In an ordinary function, locals are stored in the registers and in the activation frame on the stack. Coroutines, in addition to the activation frame/registers, have a separate coroutine frame that stores things that need to persist across suspends. Activation frame is created every time coroutine is entered and torn down when it is suspended or completed. It looks as a normal function call and return. Coroutine frame is created the first time you activate a particular coroutine instance and persists across suspend until a control flow reaches the end of the function or hits a return statement.

In general, a call to a coroutine (with suspend points) is more expensive than a function call, since we need to allocate both the activation frame (sub rsp,X) and a coroutine frame (operator new). When the coroutine lifetime is fully enclosed in the lifetime of the caller we simply put the coroutine state as a temporary on the frame of the caller, thus avoiding heap allocation.

To summarize:

    suspend/resume = cost of the function call

    coroutine call/return >= cost of the function call

Coroutines proposed for C++ are indeed, "like functions". But they're not functions. It matters.

While the proposal will allow more optimization than the existing alternatives, it won't allow all the same optimizations that are possible for stackful coroutines. For example, you can't inline a non-stackful coroutine that has independent state. But a stackful coroutine could be inlined. If all you care about is using coroutines as an interface to an async API, then it's not a big deal. But that avoids questions of how and at what cost you could make use of coroutines to implement the API. (You know, the _hard_ part!) And without experience using coroutines in languages like Lua, you can't even _imagine_ all the possible programming patterns you're missing out on.

As a general matter I think it's really short-sighted to add a language abstraction and only use as your frame of reference one niche usage pattern--namely, async completion. As an abstraction coroutines are _much_ more general than that, but they're unlikely to be used in those ways if they're not transparent--that is, if they're a leaky abstraction that constantly forces you to keep in mind various tradeoffs.

I don't see why the reality needs to be controversial. Stackful coroutines are just more powerful. Full stop. But that doesn't mean non-stackful are useless or shouldn't be added to C++. Just don't expect to see the same dividends that you could theoretically get from stackful coroutines, and definitely not the same dividends as from Goroutines. Don't forget that Go has first-class functions with lexical binding and garbage collection. All of those language features compound the benefits of stackful coroutines.

> For example, you can't inline a non-stackful coroutine that has independent state

I'm a huge fan of stackful coroutines (having written multiple variants myself) and I have been lobbying hard for them. Having said that, stackless coroutines, as long as the next continuation is statically known or knowable, are trivially inlineable (they desugar to the equivalent of a switch statement).
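
To illustrate what "desugar to the equivalent of a switch statement" means, here is a small Python generator next to a hand-written state machine that behaves the same way. This is a sketch of the general idea only, not the code that Clang, LLVM or CPython actually generate, and `CountdownStateMachine` is a made-up name.

```python
def countdown(n):
    # The coroutine as the programmer writes it.
    while n > 0:
        yield n
        n -= 1


class CountdownStateMachine:
    """Roughly what the generator above turns into: saved locals plus a
    state tag that says where to resume."""

    START, SUSPENDED, DONE = range(3)

    def __init__(self, n):
        self.n = n                  # the local that survives suspension
        self.state = self.START

    def resume(self):
        # Dispatch on the saved state, like a switch on a resume label.
        if self.state == self.DONE:
            return None
        if self.state == self.SUSPENDED:
            self.n -= 1             # the code right after the `yield`
        if self.n > 0:              # the loop condition
            self.state = self.SUSPENDED
            return self.n           # the `yield n`
        self.state = self.DONE
        return None
```

Once the coroutine is an ordinary function over a small state object like this, a compiler that can see which `resume` call comes next has a plain function body to inline.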

I am not saying stackful coroutines are bad. I'm using Go on a daily basis, I've used libmill's stackful coroutines in C, etc. They have their use cases in which they will be better than stackless coroutines; there are tradeoffs in both designs. For my use case (async IO), stackless coroutines are ideal and can be easily optimized. I don't agree with what you wrote about async completion being a niche use of coroutines (and similar concepts). In most code bases I have seen, coroutines/tasks/goroutines etc. are used mainly for async IO.

I thought I read somewhere that Gor's coroutines could combine allocations as a compiler optimization in some cases. In particular this seems like it would be useful when the compiler can statically recognize deep, unconditional branching.

I can't find a link for that right now, though. Do you know anything about this?

I asked Gor if you can use LLVM's coroutines without a runtime. He answered in the affirmative (subject to some constraints), which I take as evidence, along with his presentations, that these optimizations do indeed take place.


Nit, that raises the question, you aren't begging anything.

Is this a limitation of C++ or LLVM?

It's a limitation of C++. The reason is to avoid breaking interoperability with C and the hosts' existing calling conventions.

They probably made the right decision. C++ is already heavy on dynamic allocation, so I don't really see it being a significant issue by itself.

But from a code composability standpoint there's a huge gulf between stackful and non-stackful coroutines. People don't see it because it's actually a rare feature supported by only a few mainstream languages (Lua, Go, Modula). It's hard to appreciate something you've never really used.

I like to think of stackful coroutines as similar to functions as first-class values. When you program in a language that supports functions as first-class values--especially with lexical binding--then function composition becomes much more natural. Similarly, when coroutines are stackful you'll use them more naturally, rather than relegate them to niche problems like async I/O. Put another way, when you don't need to decorate calls to coroutines, or when coroutines don't need to have a special type, coroutines become just like any other function. In particular, they can become blackboxes again, where you can use them as appropriate without the caller having to cooperate.

If all functions were first-class values in C++, then non-stackful coroutines would be a real loss for the language because it would break symmetry. Because C++ doesn't have first-class functions, it's already normal to juggle different kinds of functions or function-like objects, so programming patterns are unlikely to change very much. So in that sense it's not that much of a loss, but effectively a net gain.

I'd just like people to realize that not all coroutine implementations are the same. Many of the more elegant and seamless constructs just aren't possible with non-stackful coroutines, in the same way that some of the more elegant function composition patterns just aren't possible without functions as first-class values. And when writing languages from scratch it would be really nice for people to pay attention to these differences. Stackful coroutines are not easily added to a language design or implementation after the fact. It's why Python and JavaScript don't have them--because the existing implementations assume a fixed, contiguous C stack all over the code base, and refactoring those assumptions away is impractical, if not impossible without effectively a full rewrite.

> It's a limitation of C++. The reason is to avoid breaking interoperability with C and the hosts' existing calling conventions.

Nothing to do with that. Stackful coroutines in C++ have existed for decades and are proven technology. In fact the push for stackless coroutines is because they are perceived to be better optimizable.

> They probably made the right decision.

No decision has been made. While the MS proposal is in a more polished state, there is a lot of pushback and the other proposal is still under consideration.

> C++ is already heavy on dynamic allocation


> whereas in C++ you'll need to manually chain coroutines.

Boost has stackful coroutines:


But without relocatable stacks you're incurring a large, fixed cost per coroutine.

Of course, with stackful coroutines you'll have far fewer coroutine objects. But the real issue is that you have to decide ahead of time how large you want to make the stack. Too large and you waste memory, too small and you'll blow the stack. In the context of individual projects the programmer can usually make the call without much trouble. But removing that kind of burden from the programmer is usually the primary reason you add such abstractions to the language standard. And it's why the typical execution stack is so huge (1+ megabytes) on most non-embedded systems (and even many embedded, network-facing systems).

boost.coroutine can be optionally used with segmented stacks. Copyable stacks are a problem in C++ as objects can have custom copy semantics.

Anyway, large stacks (assuming you want at least 1 page per coroutine) do not waste memory, they only waste address space which is plentiful. If you want slim coroutines, there are the shallow generator style coroutines of the other proposal.

There is some hope that we will have the best of both worlds, stackful coroutine semantics plus annotations to opt in to the stackless coroutines optimization.

> There is some hope that we will have the best of both worlds, stackful coroutine semantics plus annotations to opt in to the stackless coroutines optimization.

That would be pretty amazing. Are there proposals or committee discussions you could link to? If C++ got stackful coroutines I might finally make the switch from C.


call/cc is the basic mechanism - useful to implement stackful coroutines and fibers (cooperative multi-tasking)

Here's the documentation: http://llvm.org/docs/Coroutines.html

Is this similar to fibers?

This was exciting to me too but I couldn't tell how it is exposed in 4.0. All I could see from the notes and the discussion below was on the underlying implementation. Or perhaps I am blind? Will there be a library (stdc++) interface or...?

It would be interesting to try these in some functional language backend, say, in GHC or Ocaml.

GHC compiles Haskell through about a million different IRs and abstract machines, until eventually reaching Cmm, which is effectively a similar idea to LLVM IR (a very simple C-ish imperative language). Any benefits to be had should be fairly straightforward for the GHC folks to use, which should be cool.

It's just Core, STG, Cmm. Anything else I'm missing?

I wasn't trying to be critical, it's just significantly more than most languages (With good reason!). This is a good thing, I dream of when we will have libraries/frameworks for doing these kind of high level program transformations in a modular way a la LLVM.

No. But that is probably two more than most other languages with a similar compilation model. (I think Rust has two IRs?)

Just for some more data, Swift is (Possibly still in the process of, but I don't follow swift so they could have finished years ago) moving to a higher level IR, for similar reasons to Rust and Haskell.

I thought Swift had multiple levels of SIL inside the compiler.

Love the improvements to clang-tidy!


Congratulations on the work. Also nice to see that OCaml bindings are still being taken care of.

Absolutely. The following two new checks are impressive and should be on by default.

New misc-move-forwarding-reference check Warns when std::move is applied to a forwarding reference instead of std::forward.


New misc-use-after-move check Warns if an object is used after it has been moved, without an intervening reinitialization.


The use-after-move check has its limitations, which make sense considering the "unspecified but valid state" moved-from objects are in. From reading the design document, the cases they handle should catch common ownership issues, though. Great step in the right direction for sure.

Implementation: https://github.com/llvm-mirror/clang-tools-extra/blob/master...

Related: https://doc.rust-lang.org/book/ownership.html

> Stable updates to this release will be versioned 4.0.x

/nit Semantic versioning (or communication) failure. I would think that "stable updates" would represent minor releases (i.e. 4.x.0), not bugfix-style patches. Unless all new features will be present in major releases instead of "stable updates"?

They have their reasoning behind this scheme over here: http://blog.llvm.org/2016/12/llvms-new-versioning-scheme.htm...

I personally do not agree with their line of reasoning.

OK, so indeed, no features in between major releases.

It's also somewhat unfortunate that, in their words, "every [six month] release is also API breaking". How can you create a stable product that targets a constantly breaking API (short of picking a version and sticking with it)?

Of course, I'm biased, since I consider stable to be measured in years, not months; certainly not the current trend.

> How can you create a stable product that targets a constantly breaking API (short of picking a version and sticking with it)?

You don't. This is a cost they explicitly push on consumers of the API. As a consumer, you either spend manpower to keep up-to-date, or if you don't have that luxury, you fall behind and update sporadically.

In the open source world for example, there are many volunteer driven projects that use stale versions of LLVM for exactly this reason. This causes a certain amount of churn since projects that aren't maintained end up bit-rotting much more quickly than they would otherwise.

This has been the de-facto way of things with LLVM for some time, but at least they're making it explicit now.

> How can you create a stable product that targets a constantly breaking API

Since it's an open source library, and not a remote resource where you don't have control of what's available and deprecated items being removed, it's less of a problem. You don't have to use the newest version of LLVM if you don't want to. I'm sure at least a few prior major versions will get bug/security fixes if the problems are large enough.

That's not to say this is the correct, or best option, but it is one of the ways to go forward, and I would say it's not necessarily a bad way forward when you're still trying to build usage through innovation and features rather than retain usage. Backwards compatibility comes with its own problems, which accrue over time and can eventually strangle a project.

Perhaps in the future they will decide to use the minor version number through retaining API compatibility, and thus the major version number, for 3-4 major releases (18-24 months), while still allowing non-breaking changes to go forward. Their versioning rationale doesn't really prevent that, it's just that their current development style and goals do not support it.

Unfortunately the assertion that you control which LLVM version you use isn't always the case. Debian 8 stable is still using LLVM 3.5, while homebrew on OSX and archlinux typically track the latest stable version (currently 3.9). Crystal, the programming language I work on, has to work on both versions and a few in between. (We're dropping support for < 3.8 soon though). Sure, we could bundle a LLVM version with the language, but that has several disadvantages, including size and extra work to build and distribute the package securely.

That being said, the changes are typically minor, and while the bindings are a little bit too dense on {% if LibLLVM::IS_38 %} for me [0], it's certainly not hellish yet. However the LLVM debug info generator is a different story... [1]

[0] https://github.com/crystal-lang/crystal/blob/c60df09b5082918... [1] https://github.com/crystal-lang/crystal/blob/master/src/llvm...

It is my understanding that this is semi-explicit in the culture of the LLVM project as a BSD analog to the architectural damage sometimes described with GCC and GPL: instead of "we don't like supported intermediate data formats, as it allows people to hoard code: if you want to use our code you have to link to it, which will force you to open source your code so it can be upstreamed" they play "we don't like supported APIs, as it allows people to hoard code: if you want to use our code you will spend a lot of your time on the merge treadmill, which will eventually demoralize you into begging for us to upstream your code".

Honestly I just think distros and package managers are the busted ones here. LLVM is strongly coupled to your language's semantics, and I'm unaware of any good reason to force every compiler to use the same version.

This is why I wish more package managers had a (more) formalized version of what Gentoo calls "slots". One of the major uses of slots in Gentoo is to represent major versions of libraries, s.t. LLVM 3.5 would be in the "3.5" slot, and LLVM 4 would be in the "4" slot. Dependencies are declared against slots, and the standard *nix naming convention for shared objects trivially lets multiple slots be installed at the same time.

(Other package managers tend to emulate slots by simply having multiple packages with the slot number embedded in the package name itself. Debian/Ubuntu, for example, does this, and it isn't clear to me why their scheme would preclude having multiple LLVMs installed. That said, Gentoo always seems to have difficulty with clang/llvm, so my gut wonders if there is something more at play here.)

Debian does exactly that - the unstable repo currently has:

llvm-3.4 llvm-3.5 llvm-3.6 llvm-3.7 llvm-3.8 llvm-3.9 llvm-4.0

Note that these are all package names. Each is then versioned separately - e.g. 4.0 is currently at v4.0-1, and 3.8 is at 3.8.1-18.

It's not a technical problem. All distro package managers allow for parallel-installed libraries one way or another.

What's necessary for your distro to actually ship multiple versions is someone stepping up to actually maintain the thing, dealing with bug reports and monitoring security issues.

For libraries with many users that happens all the time. See the parallel packages for OpenSSL 1.0 and 1.1 in most distro repositories. For less popular libraries it's often easier to just patch all users and only ship the most recent version.

Nowadays LLVM seems firmly placed in the first camp, as patching programs to work with different LLVM versions can be a lot of work and there are quite a lot of them.

Looking into the Debian repos, you find packages for LLVM 3.7, 3.8, 3.9 and 4.0, in addition there's even a 5.0 pre-release snapshot. It's similar with Fedora and probably other distributions.

Just curious - why not define a macro for some of those? For example DI_FLAGS_TYPE that resolves to DINode::DIFlags / uint as needed? It seems there are some similarities that could be abstracted.

Also a cleanup could be simpler later - find/replace the macro.

Yes, same for pony. Only DIFile and the AlignBits changed.

But major releases are always expected to be API-breaking, right? Isn't that basically the (SemVer, at least) definition of a major vs minor release?

Nothing's forcing anyone to keep up to date, though, so anyone can pick a version and stick with it as long as they like. (So long as they keep making patches for at least the previous version for major bugs...)

Major releases in SemVer can break the API, yes. But the alternative a lot of people would like, which is what SemVer contemplates, is to have a distinction between releases that add features without breaking the API, and less frequent releases that break API. For example, following this 4.0.0 release, in the SemVer model there would then be a series of 4.1.x, 4.2.x, etc. releases that add new features with backwards compatibility, followed only sometime considerably later by a 5.x.x that breaks API.

However, this isn't how LLVM does things. Instead, there are pure bugfix releases (4.0.x) with no new features, followed by relatively frequent major releases that both add features and break API (5.0.x, 6.0.x, etc.).
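To make the contrast concrete, here is a minimal sketch of what a version bump signals under each scheme. The function names are made up for illustration; the LLVM rule follows the versioning post linked earlier in the thread (the middle number stays 0, patch releases are 4.0.x, and every major release may break API).

```python
def parse(version):
    """Split 'X.Y.Z' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def semver_change(old, new):
    """What a bump from old to new promises under Semantic Versioning."""
    o, n = parse(old), parse(new)
    if n[0] != o[0]:
        return "may break API"
    if n[1] != o[1]:
        return "new features, backwards compatible"
    return "bug fixes only"


def llvm_change(old, new):
    """What a bump from old to new means under LLVM's post-4.0 scheme."""
    o, n = parse(old), parse(new)
    if n[0] != o[0]:
        return "new features, may break API"
    return "bug fixes only"


print(semver_change("4.0.0", "4.1.0"))
print(llvm_change("4.0.0", "4.0.1"))
print(llvm_change("4.0.1", "5.0.0"))
```

The difference is the missing middle tier: under the LLVM scheme there is no release that adds features while promising compatibility.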

That results in half a dozen versions of LLVM libraries installed on a given machine instead of 1.

When it comes to a compiler, I think I would rather have a statically linked LLVM in the compiler than a shared object anyway, which would make this moot.

It is supposed to be "mostly" API/ABI stable at the C API level. The C++ API is always (and has always been) completely unstable. One way to address it is to have your own wrapper around LLVM that provides the features you need; such a wrapper can retarget multiple versions of LLVM at will.
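A rough sketch of that wrapper idea, in Python for brevity. Everything here is hypothetical: `_OldApi` and `_NewApi` are toy stand-ins for two incompatible library versions (not real LLVM classes), and the flag names, bit values and the cut-over at major version 4 are invented for the example.

```python
class _OldApi:
    """Toy stand-in for an older library version (not real LLVM)."""

    def create_member(self, name, flags_bits):
        # The old API takes flags as a plain integer bitmask.
        return {"name": name, "flags": flags_bits}


class _NewApi:
    """Toy stand-in for a newer library version (not real LLVM)."""

    def create_member(self, name, flag_names):
        # The new API takes flags as a set of symbolic names.
        return {"name": name, "flags": frozenset(flag_names)}


FLAG_BITS = {"private": 1, "artificial": 2}  # hypothetical flag encoding


class Wrapper:
    """The only surface the rest of the compiler is allowed to call."""

    def __init__(self, major_version):
        self._new_style = major_version >= 4  # assumed cut-over point
        self._api = _NewApi() if self._new_style else _OldApi()

    def create_member(self, name, flags):
        if self._new_style:
            return self._api.create_member(name, flags)
        # Translate symbolic flags into the bitmask the old API expects.
        bits = 0
        for flag in flags:
            bits |= FLAG_BITS[flag]
        return self._api.create_member(name, bits)


print(Wrapper(3).create_member("x", ["private", "artificial"]))
print(Wrapper(4).create_member("x", ["private"]))
```

Callers always pass symbolic flags; the version-specific translation lives in one place, which is also where it gets deleted once the old version is dropped.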

How complete is the C API? I'd happily forgo any C++ to get a stable API in return.

Hard to say. I'm pretty sure you can create any piece of IR (almost), and Optimize/CodeGen. We're open to add new APIs when someone needs one that is missing. We only break the C API when there is really no choice. For example when I removed the C++ API getGlobalContext() I made sure to preserve the equivalent C API: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-201... ; but it is not always possible.

Ha, was just reading http://aosabook.org/en/llvm.html.

(Really like that LLVM IR. Does anyone code in it directly? Was also thinking it would be interesting to port Knuth's MMIX examples to it.)

Nobody actually programs in the IR of a compiler, except for (say) tests and debugging. The IR is not usually designed to be read by humans, although LLVM IR is actually very readable (but still rather tedious).

If you looked at, just for an example, the IR used in the dmd d compiler, using (Walter Bright's) Digital Mars backend then you'd notice that the IR is not trivial to access at all. Last time I checked, you had to actually dive into the backend code (I'm pretty sure there's an undocumented flag/switch that I'm missing somewhere) to get the IR out of the compiler. In fairly sharp contrast to this, getting the IR (in a textual format!) is normally as simple as using -emit-llvm when compiling. This is part of the revolution (~Slight~ exaggeration) that LLVM brought about in compiler design: Compared to most other compilers, LLVM is designed to be very portable. The only compiler, of similar stature, like it is GCC. GCC is much better than it was, I'd imagine because of the competition from LLVM/Clang.

Tangent Aside: The LLVM IR textual format is not guaranteed to be stable at all. It's also not particularly nice to program in, and it uses SSA form, so it's not suitable for human consumption. It looks like a high level programming language, but it really isn't. IR is designed to be generated, bullied and discarded by the compiler.

If you want to punish yourself, the LLVM project does contain interpreters and compilers just for IR in a file. If I remember correctly, Clang will compile any file with a ".ll" extension as IR.

You can see all the optimisation passes in action here: https://godbolt.org/g/uZH3UD

"Nobody actually programs in the IR of a compiler"

It's called LISP. It is a niche style, I'll give you. ;)

:-). I think GCC has/had IR's that are LISP like.

Parts of the libraries for the Clover OpenCL implementation seem to be programmed in LLVM IR directly.

The way I understand it, in the near future, the folk who write languages/compilers/build toolkits will be writing down to LLVM's Intermediate Representation rather than down to some specific machine code for some specific chipset.

Then the folk who build the processors/architecture/assembly languages (and open source advocates when businesses ultimately ignore LLVM for a bit) will be writing the conversion from IR down to some specific machine code. This allows Intel to convert the IR line "COMPLICATEDFUNCTION $r2 5" to some advanced x86 call that has some significant speed or memory increase and is only one line while TI can still call the 26 lines of MIPS they need to call and both will be semantically equivalent.

That way you can, from the software side, be relatively agnostic with how you're writing code and the chipset side is able to get every ounce of optimization using advanced functionality (if available). More importantly, software side is then able to ignore any processor features (be they speed ups or slowdowns) because every chipset manufacturer should have some toolset to convert your IR down to the best machine code they've got based on some desires (speed, space, power, etc). Inevitably, there will still be differences in performance between chipsets but being able to build down to IR (even with performance issues for some chipsets while everyone gets on board) and not some large set of assembly languages should be very nice.

Just as additional information, this is just the future catching up with the past.

Most mainframe architectures since the early 60's didn't really use pure Assembly, but rather bytecodes that were processed by microcode on the CPUs.

Hence why many old papers tend to mix both terms, bytecodes vs assembly.

This tradition carried on to mainframe architectures like the AS/400, nowadays IBM i, where the user space is pure bytecode (even C), called TIMI, and a kernel level JIT is used at installation time to convert the application into the actual machine code.

IBM i Java takes advantage of this, where the JVM bytecodes are converted into TIMI bytecodes.

It also provides some kind of common language runtime called ILE (Integrated Language Environment).

So the trend of using LLVM bitcode on iDevices, Dalvik on Android or MSIL on .NET, JavaScript/WebAssembly on browsers, is, as with containers, modern computing catching up with mainframe ideas.

IBM i Java used to run on top of TIMI. But the runtime overhead and IBM developer overhead were too high, and now it runs on top of PASE, which implements the AIX runtime without involving TIMI.

I wasn't aware of the change, thanks for the correction.

The change is a bit disappointing, in a way. Though I'm not sad to see the last of CRTJVAPGM.

> Just as additional information, this is just the future catching up with the past.

> Most mainframe architectures since the early 60's, didn't really used pure Assembly, rather bytecodes that were processed by microcode on the CPUs.

And was that microcode typed SSA-based IR with alias metadata and so forth?

You can't just lump all "bytecodes" together. x86 machine code is a bytecode that is processed by the microcode on the CPUs too.

I bet if anyone goes digging into SIGPLAN papers there might be one or two examples there.

Even if not, I see it as a very interesting MSc or PhD project to implement an FPGA to execute LLVM bitcode.

I am not the one lumping all bytecodes together; CS papers since the dawn of our industry are. Just go dig up a random paper about computer architectures from the '50s and '60s, for example.

Some of them even use bytecode and assembly interchangeably on the same document.

I guess we could eventually consider it a form of bytecode, given its CISC nature.

And when bytecode gets a pure hardware implementation, without any form of microcode support, is it still bytecode or has it become assembly?

What differentiates LLVM IR from, say, JVM bytecode? I'm curious because there's a stalled out GNU project under GCC called GCJ that would compile JVM bytecode to native. I wonder if the issue became that statically linking in the JVM in the binary resulted in a lot of bloat, or something more intrinsic to the suitability of JVM bytecode as a platform-independent IR...

Off the top of my head:

* JVM bytecode is stack-based, whereas LLVM uses "infinite registers" in SSA form

* being in SSA form makes it convenient to consume in compiler passes, but comes with quirks that mean you don't really want to write in that style manually: the mind-bending phi instructions, definitions must dominate uses, simple ops like incrementing a variable really means creating a new variable, etc.

* JVM bytecode carries a lot of Java-level information, for instance if you have N classes with M methods each in source, you will typically find N classes with M methods in bytecode too. A lot of keywords in Java have an equivalent in bytecode (e.g. private, protected, public, switch, new...)

* in contrast, LLVM IR feels closer to C (it only knows about globals, arrays, structs and functions). It exposes lower level constructs like vector instructions, intrinsics like memcpy

* JVM bytecode is well specified: anyone armed with the pdf [1] can implement a full JVM. LLVM IR is somewhat loosely defined and evolves based on the needs of the various targets

* JVM bytecode is truly portable, whereas target ABI details leak into LLVM IR. A biggie is 32 bit vs 64 bit LLVM IR.

[1] https://docs.oracle.com/javase/specs/jvms/se8/jvms8.pdf
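The first two bullets can be illustrated with a toy translator from a stack-machine program to SSA-style instructions. This is only a sketch: the opcodes (`load`, `push`, `add`, `mul`) and the untyped `%t0 = add a, b` notation are invented for the example and are neither real JVM bytecode nor valid LLVM IR.

```python
from itertools import count


def stack_to_ssa(ops):
    """Translate a tiny stack-machine program into SSA-style instructions.

    Returns (instructions, name of the final value).
    """
    stack = []       # operand stack, as in a stack-based bytecode
    out = []         # emitted SSA-style instructions
    fresh = count()  # source of never-reused temporary names
    for op in ops:
        kind = op[0]
        if kind == "push":       # push a constant
            stack.append(str(op[1]))
        elif kind == "load":     # push a named value
            stack.append(op[1])
        elif kind in ("add", "mul"):
            rhs = stack.pop()
            lhs = stack.pop()
            name = f"%t{next(fresh)}"  # every result gets a new name
            out.append(f"{name} = {kind} {lhs}, {rhs}")
            stack.append(name)
        else:
            raise ValueError(f"unknown op: {kind}")
    return out, stack.pop()


# (x + 1) * x, written for the stack machine
program = [("load", "%x"), ("push", 1), ("add",), ("load", "%x"), ("mul",)]
instructions, result = stack_to_ssa(program)
for line in instructions:
    print(line)
print("result:", result)
```

The stack disappears in the output: operands are referred to by name, and each intermediate result is defined exactly once, which is what makes the form easy for compiler passes to analyze and awkward to write by hand.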

> JVM bytecode is truly portable, whereas target ABI details leak into LLVM IR. A biggie is 32 bit vs 64 bit LLVM IR.

From a few LLVM meetings and Apple's work on bitcode packaging, I think there is some work to make LLVM IR actually architecture independent.

PNaCl is noteworthy here too, insofar as its IR is very much based on LLVM IR (but portable and stable).

LLVM IR is an IR format, :). Basically LLVM is just an abstract RISC, whereas the JVM is a lot of that with a truck load of very high level instructions. One could implement these as a superset of LLVM, but that's not what LLVM is. You, mostly, can JIT LLVM IR and use it as a generic bytecode vm: but it's really designed for static compilation.

When will we see processors with a subset of llvm ir in hardware ?

That would be a terrible idea. LLVM IR has an inefficient in-memory representation; every IR node is separately heap-allocated via new and chained together via pointers. This is probably a suboptimal design for the compiler, but it would go from suboptimal to disastrous if directly used for execution.

I don't think an implementation of LLVM IR for execution would require the same in-memory representation.

Exactly. My point was more that, just as we have C-optimized processors and microcontrollers, or even Java- or Lisp-based ones, once there is a lot of software readily compilable with LLVM, architectures could perhaps be optimized for it (not by porting it directly, just by having a tiny final LLVM-based microcode step). For example, of course you can't have infinite registers as in SSA, but it can influence your instruction set.

That's not the only reason why you wouldn't want to run LLVM IR directly (if it were possible). You still have the types, which are useless at runtime, and the unlimited register file to deal with.

You could make an ISA which is similar to LLVM IR, but there'd be little point when RISC-V (or even AArch64) already exists.

Likely never, LLVM IR uses SSA form. This means that optimisations are easier, but the "assembly" is significantly higher level than assembly a la MIPS. IR is for doing optimisations not executing code (although LLVM does have interpreters if that's what ya need)

A more interesting question is, when will we see operating systems using LLVM IR (or similar; some future version of WebAssembly, perhaps?) for binaries on disk, dynamically compiling them and caching the result for the current platform as needed.

In principle that could happen, but LLVM IR is really not designed for anything other than being transformed by LLVM. One could define an abstract RISC machine, to be jitted at either side. LLVM is not quite suitable for this purpose: It assumes quite a few details about the target. Also, this requires a huge amount of co-operation. So far this has only happened in the browser with e.g. ECMAScript standardization, asm.js and WebAsm. The JVM tried to do this, but it's not a good compilation target for languages like C/C++. Therefore, I think it will happen eventually: The web browsers will develop the tools and specifications to make this stuff, then it will get broken off and used outside of the web (I hope, god forbid all software has to be distributed via the web using overlyHypedWebScale.js v2)

iOS 9, ChromeOS PNaCL?

Although it is not really what you are describing.

There are lots of Java AOT compilers to native code, namely most commercial JDKs for embedded development.

The biggest problem with GCJ was that most people left the project when OpenJDK was made available, and decided to work on it instead.

BTW, GCJ was recently finally removed from the GCC code base. I think it hadn't been maintained in a while.

LLVM IR is a generalized assembly language whereas JVM byte code is quite specific to the Java language, i.e., it deals with objects and classes. This causes all kinds of troubles for someone wanting to translate other languages to the JVM.

If I recall correctly, the textual IR is unstable between releases. They recommend building ASTs with their APIs instead.

I was under the impression the API is also unstable between releases. Has the API broken less often than the text format over the last few releases?

Yes. Quotes from LLVM Developer Policy http://llvm.org/docs/DeveloperPolicy.html

On text format: "The textual format is not backwards compatible. We don't change it too often, but there are no specific promises." On C API: "The C API is, in general, a "best effort" for stability. This means that we make every attempt to keep the C API stable, but that stability will be limited by the abstractness of the interface and the stability of the C++ API that it wraps."

There is confusion because C++ API is unstable (very unstable). But C API is stable. And C API is more stable than text format. (In my experience, policy is accurate description of actual practice and not empty promise.)

Ah! I was under the impression that the C API was quasi-deprecated relative to the C++ API, but from what you're saying, it sounds like that's not the case?

Coding directly in LLVM IR is tedious. It's supposed to be used via the AST building APIs.

IR is better than assembly

It's a bit of a thought experiment but it's not totally insane. Let's just say that humanoids writing in IR makes more sense than humanoids writing in asm.js. Ultimately, IR is an assembly language. By contrast, web assembly is not an assembly language.


I'm trying to add support for lldb to a gdb frontend (https://github.com/cs01/gdbgui/), and need a gdb-mi compatible interface to do it.

lldb-mi exists, but its compatibility with gdb's mi2 api is incomplete. Does anyone know of a more compatible api to gdb-mi2 commands, or if there are plans to improve lldb-mi's?

Visual Studio is already at version 2017, and LLVM is only at 4, they need to catch up real quick! /ducks

VS is slowing down. VC++ 2017 is VC++ 14.1, not 15.0

Too bad they didn't use more aggressive aggressive grammar checking.

I wish they'd do what GCC does and just eliminate the middle number entirely.

But semver is a pretty meaningful standard. Why not stick to it, even if you don't plan on adding non-API-breaking new features?

Version number inflation strikes another software package, although at least it is not as bad as Chrome or Firefox.

It's not like there is a shortage of Integers, you know...

I bet you'll think differently when installing LLVM –2147483648.0 !

Unfortunately though, there is a shortage of capacity for our mental representation of them. It's pretty easy to remember whether I'm on version 3 or 4, less so for versions 3713 and 3714.

Your mind would just skip the 371 though, like it does with year dates, as the 371 would be the same for a long time. And you're back to version 3 or 4.

The previous scheme would have led to 3.3713 and 3.3714, which does not really help mental representation.


The demo page is not working. Is there any other page that would help me understand what it really is and where it is helpful?

LLVM is a compiler toolkit, used by, for example, Clang for C/C++/Objective-C, Rust, and various libraries like Mesa.

How did you find the demo page, and why did you expect it to be more useful than the front page at explaining what LLVM is?

The llvm.org front page already has "The LLVM Compiler Infrastructure Project" as its title and the first sentence of the body text is "The LLVM Project is a collection of modular and reusable compiler and toolchain technologies."

llvm is the "back-end" of a compiler turned into a reusable library, where the front-end of a compiler is what parses and understands a specific language (clang or rust being examples of primarily front-end things that leverage llvm).

if you make use of llvm, you "simply" have to parse your chosen language and hand off some intermediate form bits to llvm to create compiled binaries.
