FreeBSD has a(nother) new C compiler: Intel oneAPI DPC++/C++ (briancallahan.net)
92 points by ingve 11 months ago | 35 comments



Good luck with that, because there are bugs, as the article also points out.

Previous release miscompiled Python [1]

Current release miscompiles bison [2]

It's also the type of compiler that defaults to -ffast-math to look good in benchmarks (well, all vendor compilers do). That breaks many packages that rely on signalling NaNs, or use e.g. Kahan summation for stable math with IEEE floats.
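
To make that concrete: below is a minimal Kahan summation sketch (illustrative code, not taken from any of the linked issues). Under -ffast-math the compiler is allowed to reassociate (t - sum) - y to zero and delete the compensation entirely:

    #include <stddef.h>

    /* Kahan compensated summation: c carries the low-order bits
       lost by each addition. -ffast-math permits reassociation,
       which can fold (t - sum) - y to 0 and optimize c away,
       silently degrading this to naive summation. */
    double kahan_sum(const double *x, size_t n)
    {
        double sum = 0.0, c = 0.0;
        for (size_t i = 0; i < n; i++) {
            double y = x[i] - c;
            double t = sum + y;
            c = (t - sum) - y;
            sum = t;
        }
        return sum;
    }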

These bugs get caught in the Spack package manager because it's probably one of the few package managers that lets users pick their own compiler for a full software stack.

[1] https://github.com/spack/spack/issues/38724 and https://github.com/python/cpython/issues/106424

[2] https://github.com/spack/spack/issues/37172#issuecomment-181...


So it's an LLVM front end + Intel backend, relatively new and not in common use. Not surprised there are issues, but wouldn't it be fair to attribute those to "growing pains"?

The first issue was resolved; the bison issue is still pending.

> It was a compiler bug. The test now passes when cpython is built with an internal build. The next compiler release (likely 2024.0) should have the fix.

So I wouldn't recommend running a oneAPI-built FreeBSD in production just yet, but I give them some credit and congratulations for reaching a milestone.


It does not look like correctness is their main concern; they mostly care about the performance of the vectorizer ;)

Also note that before Intel oneAPI there was the now-deprecated Intel Classic Compiler (icc), which dates back to at least 2003 and also had its share of bugs.

In my opinion it's unlikely that a closed source vendor compiler will ever be on the same level as gcc and clang.

It will never be tested as much, and it's hard to make actionable bug reports for closed source software.

Being based on LLVM is of course a good choice... hopefully fewer bugs.


My money's on llvm/clang and gcc as well, though Microsoft Visual C++ is still hanging in there.

Fwiw, icc once had a reputation for being fast, but I never thought highly of their software. The only hands-on experience I had was with the Intel Media SDK, where even basic stuff, like finding a legal download, hardware requirements, and installation guidance, was tough. And then with each iteration both software and hardware were deprecated faster than Google could dream of abandoning its products.


> Fwiw, icc once had a reputation for being fast, but I never thought highly of their software.

Aside from some GUI scaling issues on Linux, I was really impressed by Intel VTune.

Also, some of their low-level libraries like XED and Pin are pretty cool, and seemed (last I looked) well-maintained.


Their Embree raytracing library is at the heart of pretty much every high-end production CGI renderer, it's the gold standard at what it does. Intel does have some gems.


Ah yes, VTune. I had forgotten that one; pretty good stuff indeed.


A really high-performance vectorizer is dramatically easier to write if you hand-wave some of the correctness ideas. It's an interesting point in the design space. Classically, compilers considered miscompilation to be a serious problem - you disable the 'optimisation' while you work out who got the maths wrong. But maybe today you're better off with a compiler that produces fast wrong code, where the customer doesn't seem to be upset about the wrong part. -ffast-math is popular, so enabling it by default is a plausible thing to do.

Plenty of bugs in gcc and llvm. I'm really sure a closed source compiler could be higher quality. Probably not in the C++ world, but I expect there's an Ada compiler out there that gets the answer right whatever source you feed it.


Sure, you can reorder floating point operations and assume no aliasing. Now you can vectorize and avoid redundant memory access.

But it's a problem of the language really.

Alias analysis in Fortran is much easier than in C / C++.

Also, I'm reasonably sure that Rust's borrow checker could be used for alias analysis and improve performance without making unsafe assumptions; no clue if that is actually done.

For reordering of floating point operations, there should just be clear demarcations in code where it's safe. It's ridiculous that `-ffast-math` works at the level of the command line. But I think that ship has sailed, because although clang supports pragmas, they're not standard:

    #pragma clang fp exceptions(strict)
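
For what it's worth, clang also has block-scoped pragmas for the reassociation half of fast-math, which is roughly the demarcation I'd want. A sketch (clang extension, not standard C; function name made up):

    /* Clang-specific: allow fast-math-style reassociation inside
       this block only; the rest of the translation unit keeps
       strict IEEE semantics. */
    float dot(const float *a, const float *b, int n)
    {
    #pragma clang fp reassociate(on)
        float s = 0.0f;
        for (int i = 0; i < n; i++)
            s += a[i] * b[i];
        return s;
    }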


The semantics of C don't allow for alias analysis. For some cases, the compiler can prove that there are no aliases, but it will have to be conservative and throw up its hands most of the time.

An ownership/borrowing system is able to implicitly fulfill the requirements of restrict.
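
Today C makes you state that guarantee by hand with restrict, and lying is undefined behavior; a borrow checker proves it instead. A sketch (illustrative names):

    /* restrict promises the compiler that dst and src never
       overlap, so it can vectorize this loop without emitting
       runtime overlap checks. If they do overlap, behavior is
       undefined. */
    void scale(float *restrict dst, const float *restrict src, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = 2.0f * src[i];
    }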


Alias analysis can definitely be easier or harder for different languages. It's the independence of loop iterations which really does the trick for vectorising numerical code. That would be why CUDA is magic: if the lanes of the vector operations step on each other, it is defined to be the programmer's problem, not the compiler's. Thus the CUDA autovectoriser cannot fail; the code is already a vector.

Futhark is probably very good at vector operations.

Fast-math has crazy properties, like the fact that setting it for one file can affect other files (on gcc, linking any object built with -ffast-math pulls in startup code that sets the FPU's flush-to-zero flags for the whole process). Last time I checked, a more reasonable alternative set of constructs was being considered/implemented. Other faster-but-kind-of-wrong things exist too, like assuming signed integers don't overflow. Agreed that they're all less alarming when locally scoped.


It works at least well enough to build a working FreeBSD kernel; as the article notes, that was built and then booted as a test. I have to imagine that if the thing can build that, it's doing pretty well. That isn't a trivial piece of software. Makes me suspect any teething issues will be sorted and that it's not fundamentally compromised.


> It's also the type of compiler that defaults to -ffast-math to look good in benchmarks

Not sure if they're still doing that, but in the early days of Clear Linux they were using -ffast-math on everything they compiled, too, as a way to give their distro a big performance boost over the competition.


I think they do a lot of good stuff, like LTO and PGO.

But in benchmarks you sometimes see something like a 4x speedup compared to Ubuntu, which is obviously not due to superior compilers.

For example:

https://github.com/phoronix-test-suite/phoronix-test-suite/i...


> There is a workaround: I installed the compiler using the same command on my WSL

Actually, you only need a relatively complete Linux installation under /compat/linux. You could even chroot into that and do the oneAPI install.

The trillion-dollar question is: can you get the Intel discrete GPU to work, compiling for and running on the GPU?


And the billion-dollar question is: does the code generator treat AMD and Intel CPUs as equivalent when generating code, or does it only look at Intel-specific CPU information?

see https://techreport.com/news/does-intels-compiler-cripple-amd...

[update] and https://old.reddit.com/r/Amd/comments/440ze8/amd_cpus_and_in...


I haven't tried it myself, but you can use the Zig toolchain as a C and C++ compiler[0]. In this day and age, is it really necessary to have yet another C compiler? I believe you can do the same with D as well.

[0] https://medium.com/@edlyuu/zig-c-c-compiler-wtf-is-zig-c-790...


The value of Intel's oneAPI toolchain is that you can write code that runs on their GPUs, much like NVIDIA's CUDA or nvhpc.


One might almost guess from the name that it is a C++ compiler, and not so limited as to compile only the legacy C subset.


The year is 2024, and I'm using another newer, buggier C compiler.


The year is 2074, I'm still using gcc and emacs :)


!remindme 50 years


Serious question, why do we need yet another C compiler?


Different compilers rely on different undefined behaviors for code optimization. So it is very useful to compile OSS projects with different compilers; it helps find more bugs and makes sources more standards-compliant and portable.
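
A classic example (illustrative, not from any particular bug report): signed overflow is undefined, so one compiler may fold the comparison below to a constant 1, while another (or any compiler with -fwrapv) wraps INT_MAX + 1 to INT_MIN and prints 0:

    #include <limits.h>
    #include <stdio.h>

    /* UB: a compiler may assume x + 1 > x always holds for
       signed x and fold check() to 1. */
    static int check(int x) { return x + 1 > x; }

    int main(void)
    {
        printf("%d\n", check(INT_MAX));
        return 0;
    }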


Eventually we may get one that works properly and supports useful features like bounds checking, stacks that grow upward rather than downward, trapping on undefined behavior, etc.

We might even get a new and more robust libc - maybe one that even supports static compilation.


Not to crap on someone's hard work, but usually reinventing something is only warranted if you can do it >2x better than the best available solution. Whether that means requiring half as much hardware to do the same job or half as much manpower to use, it's usually a waste of time and money to make something that's only marginally better at best.

How much benefit did Intel expect to reap from rewriting an LLVM x64 compiler backend from the ground up?


The Intel compiler has a long history, going back to at least 2003.[1] At some point they realized that instead of maintaining their own frontend they could just use LLVM. So they ported their optimizers and backend over.

And on their own hardware the compiler is often significantly better.[2] This was part of their competitive advantage.

2x better is also a lot. It means that you only have to buy half as many servers. Even much smaller improvements are often worthwhile.

[1]: https://en.wikipedia.org/wiki/Intel_C%2B%2B_Compiler#Release... [2]: https://www.intel.com/content/www/us/en/developer/tools/onea...


> 2x better is also a lot.

I specifically drew the line at 2x.

>It means that you only have to buy half as many servers.

That's what I said in my initial comment: "Whether that means requiring half as much hardware to do the same job or requiring half as much manpower to use..."

But even if I didn't already point that out, I'm curious how you would react in your everyday life if all the people you interact with were as insufferably pedantic as you.


Let's look at a specific example: the Aurora supercomputer at Argonne National Laboratory. It has an estimated cost of US$500 million and a power draw of 24.6 MW.[1] Even 1% less hardware means saving several million dollars. And that is just a single system.

[1]: https://en.wikipedia.org/wiki/Aurora_(supercomputer)


Isn't this a GPU backend, not an x64 one?


According to their documentation, the backend builds both x86_64 machine code and GPU kernels/FPGA microcode, but that just raises another question:

How is this significantly better than the existing tools for compiling C into GPU and FPGA code?

It would help if Intel published some kind of chalk talk with a live demo showing how much faster you can build HPC applications using their new toolkit.

I'm not summarily writing it off, but I need a little convincing before I put 20+ hours into trying it out myself.


Yes, though the LLVM x64 backend may turn out to be very similar to the upstream one. As for existing tools for compiling C to Intel's GPUs: this is it; there are no others.


Is there anything more to it than that? If not, the documentation would be a lot more helpful if it led with something straight to the point. Here's something that could go directly under the title of the README:

"This compiler consists of a custom LLVM frontend and backend. The backend compiles LLVM IR code into machine code consisting of x86_64 instructions and Intel GPU code. The frontend works in conjunction with the backend to compile C and C++ code with special optimization which when enabled, compiles OpenMP routines into hardware-accelerated code targeting Intel GPUs, FPGAs, or AMD and NVIDIA GPUs."

As someone who only used OpenMP academically, I don't see much of a point in that. In the post-C++11 world, where we can write type-safe compile-time code, preprocessor macro definitions should stay in C code.

Until Intel GPUs are at least competitive with the big boys, interop with their products doesn't concern me a whole hell of a lot. I'm not going to plan my scientific computing applications around the integrated graphics found on cheap Wintel consumer devices.


The docs say it's a proprietary compiler for intel hardware. I'm inclined to believe it on that.

It's worth noting that OpenMP pragmas are a totally different thing to C preprocessor macros. A pragma like 'omp target parallel for' means something like "take the following loop, build a GPU kernel out of it, arrange for data to be copied back and forth and to launch that kernel when control flow gets here, and arrange to link in all the openmp libraries and also run a bunch of compiler optimisations". A macro means "replace these tokens with these other ones".
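
To illustrate, a minimal offload sketch (OpenMP 4.5+ syntax; the function is made up). The single pragma stands in for kernel outlining, data movement, and launch:

    /* The pragma asks the compiler to outline this loop into a
       GPU kernel, map x and y into device memory, launch the
       kernel, and copy y back when it finishes. */
    void saxpy(float a, const float *x, float *y, int n)
    {
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }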

OpenMP is essentially a really big runtime library dealing with threads, scheduling execution, running code on GPUs and so forth. It is sort-of usable in that form. If you're determined then making calls directly into libomp.so and libomptarget.so will make your will a reality. All the pragma syntax is about transforming application code into a lot of calls into that library with appropriately constructed tables of data. And then the compiler works hard to optimise this, e.g. removing calls that don't need to happen, simplifying others, deduplicating some.

Syntactically OpenMP is a really good fit for Fortran. The invocations look completely appropriate there. For C++, it does tend to upset the sensibilities of the programmers. I personally think it's wildly funny for people who are content with the syntactic horror show of C++ to decide the OpenMP extensions are ugly but there we go, normalisation of deviance and all that.

On a more philosophical level, and what drew me to implementing OpenMP originally: CUDA is a problem. Not only in the vendor lock-in to NVIDIA sense - it's also a deeply nasty language to program with. I especially dislike the warp intrinsics - they take a bitmap corresponding to the CFG of your program, which you are supposed to compute manually (across branches, loops and so forth) and pass around into library functions. GPUs are excellent machines and I want to be able to program them in something which is not CUDA.


Ok, now I'm learnding some interesting and/or valuable shit.

I'm familiar with compiler intrinsics (e.g. __sync_add_and_fetch), but I just assumed (incorrectly) that "#pragma omp parallel for" was just a macro that adds pthread API calls around a for loop to create new threads and join when finished.

>Syntactically OpenMP is a really good fit for Fortran.

I can get on board with Fortran for the niche of scientific computing, although again my qualms are with using it in C++. Too many people say they know C++, but then write "C++" code with raw pointers. I don't use C++17 for performance, and most "zero-cost abstractions" are a lie; I use it for type safety. If you buy into the modern C++ way, you'll catch a lot of stuff at compile time that systems programmers using C and web devs using a litany of other weakly or dynamically-typed languages catch in their production environment.

>syntactic horror show of C++

Other than "string" not being a native type, I'd reckon what you really hate is not the syntax itself, but the compiler errors. Granted, if you hire the kind of people who post on Stack Overflow, you can get wacky shit like this:

    template<typename Testicle, typename... Diseases>
    static std::optional<std::tuple<Diseases...>>
    deeply::nested::namespaced::classes::suck_balls(
        const Testicle& left_nut, Testicle&& right_nut) noexcept;
    // Is "classes" a class or another namespace?

But I've had to fix other peoples' spaghetti code in 4 other languages, so I stopped blaming the language many moons ago.

>what drew me to implementing OpenMP originally, CUDA is a problem

Did you try Vulkan Compute, and if so what problems did you run into? 200+ lines of "setup" code, similar to OpenCL programming?

I ask because the entirety of my systems programming career was not speeding up number crunching, but reducing IPC and making things run asynchronously.



