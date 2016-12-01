Hacker News
LLVM 4.0.0 (llvm.org)
> thanks to Zhendong Su and his team whose fuzz testing prevented many bugs going into the release.

http://web.cs.ucdavis.edu/~su/ claims 1228 bugs found (counting both LLVM and GCC). Impressive!

With fuzzing it's possible to find distinct bugs (or at least bugs that trigger in distinct code locations) without ever further investigating the bug in person.

Your bug report can simply consist of "this input file causes a compiler crash".
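For anyone curious what that looks like in practice, here is a minimal sketch of the "does this input crash the compiler" check. It assumes a POSIX system where a crash shows up as death by signal; real compiler drivers may report crashes differently (for example through a special exit code), so the predicate would need adjusting. The function name `crashes` and the command you pass in are placeholders, not part of any real tool.

```python
import subprocess


def crashes(cmd, input_path, timeout=10):
    """Return True if running `cmd + [input_path]` is killed by a signal.

    `cmd` is a hypothetical compiler invocation given as a list of strings.
    """
    try:
        proc = subprocess.run(
            cmd + [input_path],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # A hang is a different class of bug; this sketch does not count it.
        return False
    # On POSIX, subprocess reports death by signal N as return code -N.
    return proc.returncode < 0
```

A reducer like creduce needs essentially this kind of yes/no "interestingness" check, which is why the whole loop can run unattended.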

Indeed! On the Mill project we leave boxes crunching hours and hours of csmith/creduce, and we don't watch them do it :)

SPE looks to be very nice too.

I wonder how many of the bugs were squashed by Dr. Su himself and how many were DISs or other forms of work supervised by Su. Maybe he did kill that many bugs on his own but that seems outrageous (without taking time to actually see what kinds of bugs they were).

The professor himself very likely did none of it.

In academia, the supervisor generates ideas, and the grad students and postdocs test them out and see if they have potential. The discussion goes back and forth until a decision is made on whether or not to pursue an idea.

Love the improvements to clang-tidy!

http://releases.llvm.org/4.0.0/tools/clang/tools/extra/docs/...

Congratulations on the work. Also nice to see that OCaml bindings are still being taken care of.

Looks like it didn't make the release notes, but one of the new features in this release is opt-viewer. It's useful for finding out why some bit of code was or wasn't optimized. It's a WIP but usable today.

I made a demo [1] for this tool.

[1] https://github.com/androm3da/optviewer-demo

> Stable updates to this release will be versioned 4.0.x

/nit Semantic versioning (or communication) failure. I would think that "stable updates" would represent minor releases (i.e. 4.x.0), not bugfix-style patches. Unless all new features will be present in major releases instead of "stable updates"?

They have their reasoning behind this scheme over here: http://blog.llvm.org/2016/12/llvms-new-versioning-scheme.htm...

I personally do not agree with their line of reasoning.

OK, so indeed, no features in between major releases.

It's also somewhat unfortunate that, in their words, "every [six month] release is also API breaking". How can you create a stable product that targets a constantly breaking API (short of picking a version and sticking with it)?

Of course, I'm biased, since I consider stable to be measured in years, not months; certainly not the current trend.

But major releases are always expected to be API-breaking, right? Isn't that basically the (SemVer, at least) definition of a major vs minor release?

Nothing's forcing anyone to keep up to date, though, so anyone can pick a version and stick with it as long as they like. (So long as they keep making patches for at least the previous version for major bugs...)
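To make the SemVer reading concrete, here is a small sketch that classifies the step between two version strings. The helper names are made up for illustration, and it only handles plain MAJOR.MINOR.PATCH strings (no pre-release tags). Under the scheme discussed here, 4.0.0 to 4.0.1 is a patch step and 4.0.1 to 5.0.0 is a major, potentially API-breaking step.

```python
def bump_kind(old, new):
    """Classify the change between two 'MAJOR.MINOR.PATCH' version strings."""
    o = tuple(int(part) for part in old.split("."))
    n = tuple(int(part) for part in new.split("."))
    if n[0] != o[0]:
        return "major"  # SemVer: incompatible API changes allowed
    if n[1] != o[1]:
        return "minor"  # SemVer: backward-compatible features
    if n[2] != o[2]:
        return "patch"  # SemVer: backward-compatible bug fixes
    return "same"


def api_may_break(old, new):
    """Under SemVer, only a major bump is allowed to break the API."""
    return bump_kind(old, new) == "major"
```

So a project pinned to 4.0.x can take stable updates without expecting API breakage, but has to budget porting work for every six-month release.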

That results in half a dozen versions of LLVM libraries installed on a given machine instead of 1.

http://llvm.org/demo/

The demo page is not working. Is there any other page that explains what LLVM really is and where it is helpful?

LLVM is a compiler toolkit, used by, for example, Clang for C/C++/Objective-C, Rust, and various libraries like Mesa.

I wish they'd do what GCC does and just eliminate the middle number entirely.

But semver is a pretty meaningful standard. Why not stick to it, even if you don't plan on adding non-API-breaking new features?

Ha, was just reading http://aosabook.org/en/llvm.html.

(Really like that LLVM IR. Does anyone code in it directly? Was also thinking it would be interesting to port Knuth's MMIX examples to it.)

If I recall correctly, the textual IR is unstable between releases. They recommend building ASTs with their APIs instead.

The way I understand it, in the near future, the folk who write languages/compilers/build toolkits will be writing down to LLVM's Intermediate Representation rather than down to some specific machine code for some specific chipset.

Then the folk who build the processors/architecture/assembly languages (and open source advocates when businesses ultimately ignore LLVM for a bit) will be writing the conversion from IR down to some specific machine code. This allows Intel to convert the IR line "COMPLICATEDFUNCTION $r2 5" to some advanced x86 call that is only one line and brings a significant speed or memory improvement, while TI can still call the 26 lines of MIPS they need to call, and both will be semantically equivalent.

That way you can, from the software side, be relatively agnostic about how you're writing code, and the chipset side is able to get every ounce of optimization using advanced functionality (if available). More importantly, the software side is then able to ignore any processor features (be they speedups or slowdowns) because every chipset manufacturer should have some toolset to convert your IR down to the best machine code they've got based on some desired goal (speed, space, power, etc). Inevitably, there will still be differences in performance between chipsets, but being able to build down to IR (even with performance issues for some chipsets while everyone gets on board) and not some large set of assembly languages should be very nice.
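A toy model of the idea, in case it helps. Everything here is made up for illustration: the `muladd` IR op, both "instruction sets", and the register names. It is not how LLVM's backends are actually structured, just the shape of "one IR, several lowerings".

```python
# Toy IR op: ("muladd", dst, a, b, c) meaning dst = a * b + c.

def lower_fancy(ir_op):
    """Hypothetical target with a single fused multiply-add instruction."""
    _, dst, a, b, c = ir_op
    return [f"FMA {dst}, {a}, {b}, {c}"]


def lower_simple(ir_op):
    """Hypothetical target that needs two instructions and a scratch register."""
    _, dst, a, b, c = ir_op
    return [f"MUL tmp, {a}, {b}", f"ADD {dst}, tmp, {c}"]


BACKENDS = {"fancy": lower_fancy, "simple": lower_simple}


def compile_ir(program, target):
    """Lower every IR op in `program` using the backend chosen by `target`."""
    lower = BACKENDS[target]
    out = []
    for ir_op in program:
        out.extend(lower(ir_op))
    return out
```

The front end only ever emits `muladd`; whether that becomes one instruction or two is entirely the backend's business, and both results mean the same thing.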

Just as additional information, this is just the future catching up with the past.

Most mainframe architectures since the early 60s didn't really use pure Assembly, but rather bytecodes that were processed by microcode on the CPUs.

Hence why many old papers tend to mix both terms, bytecodes vs assembly.

This tradition carried on to mainframe architectures like the AS/400, nowadays IBM i, where the user space is pure bytecode (even C), called TIMI, and a kernel level JIT is used at installation time to convert the application into the actual machine code.

IBM i Java takes advantage of this, where the JVM bytecodes are converted into TIMI bytecodes.

It also provides some kind of common language runtime called ILE (Integrated Language Environment).

So the trend of using LLVM bitcode on iDevices, Dalvik on Android, MSIL on .NET, or JavaScript/WebAssembly in browsers is, as with containers, modern computing catching up with mainframe ideas.

What differentiates LLVM IR from, say, JVM bytecode? I'm curious because there's a stalled out GNU project under GCC called GCJ that would compile JVM bytecode to native. I wonder if the issue became that statically linking in the JVM in the binary resulted in a lot of bloat, or something more intrinsic to the suitability of JVM bytecode as a platform-independent IR...


Coding directly in LLVM IR is tedious. It's supposed to be used via the AST building APIs.

LLVM Coroutines - This is the most exciting thing for me. Gor Nishanov explains in his videos how coroutines are implemented and how they are optimized by LLVM. Asynchronous IO code will be so easy to write and so efficient. A context switch costs about as much as a function call, you can have billions of those coroutines, and the heap allocation can be elided (in certain cases). Can't wait for coroutines to land in Clang.
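For readers who haven't used coroutines, here is a rough analogy using Python generators. This is not LLVM's implementation or the C++ Coroutines TS syntax; `worker` and `run_round_robin` are hypothetical names. It only shows the core idea: a function that suspends at a marked point and is resumed later by a scheduler, with no OS thread switch involved.

```python
from collections import deque


def worker(name, steps):
    """A coroutine-like task: each `yield` is a suspension point."""
    for i in range(steps):
        yield f"{name}:{i}"


def run_round_robin(tasks):
    """Resume each task in turn until all of them have finished."""
    queue = deque(tasks)
    trace = []
    while queue:
        task = queue.popleft()
        try:
            trace.append(next(task))  # resume until the next suspension point
        except StopIteration:
            continue  # task finished; do not requeue it
        queue.append(task)
    return trace


print(run_round_robin([worker("a", 2), worker("b", 1)]))
# prints ['a:0', 'b:0', 'a:1']
```

Resuming a task here is just a function call into saved state, which is the property that makes the "billions of coroutines" claim plausible.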

I am a big fan of Go goroutines, so the Networking TS and Coroutines TS made me very happy; connecting both and having them in the standard will be great. Just a shame that for Networking TS integration we will need to wait for C++20.

It would be interesting to try these in some functional language backend, say, in GHC or OCaml.

Here's the documentation: http://llvm.org/docs/Coroutines.html

