Hacker News
Adding ANSI C11 C compiler to D so it can import and compile C files directly (github.com/dlang)
165 points by ingve on May 10, 2021 | 105 comments



Context: Walter Bright created the first native C++ compiler, Zortech C++. This pull request actually implements its own C compiler! For Walter, considering he's implemented C++98 alone, implementing C must be nothing :)

He implemented the ANSI C compiler here in just 5 days.

Semi-related project, using libclang to import C++ directly (with C++ templates and all the other features, regular D has C++ linking and exception support built-in): https://github.com/Syniurge/Calypso

Some quotes from the forum https://forum.dlang.org/post/s7a37h$2bbv$1@digitalmars.com :

- Writing a C preprocessor is a nightmare. Fortunately, I already did that https://github.com/facebookarchive/warp and we can use it if we choose.

- The D code will be able to inline C code, and even CTFE it.

- There are a lot of wacky C extensions out there. We only need to implement currently used ones (not the 16 bit stuff), and only the stuff that appears in C headers.

- Without a C compiler, we're stuck with, wedded to, and beholden to libclang. I wouldn't be surprised that the eventual cost of adapting ourselves to libclang will exceed the cost of doing our own C compiler.


> ANSI C compiler here in just 5 days

This is good work but let's be realistic - it doesn't work yet on any real C code I've tried it with.


Hey, fellow Dlanger!

Yeah, it doesn't support C headers yet. But an excellent prototype nonetheless.


> Hey, fellow Dlanger!

Lol, he is a paid hype man for the Dlang community, and you should ask him about Symmetry Investments and if they are hiring.


> and you should ask him about Symmetry Investments and if they are hiring.

I'm starting in July.


That's great man, congratulations! People did not take my comment as the joke it was intended to be.


Congratulations


That was quick.


But dlang has a long-standing problem of constant change without backward compatibility, and a lot of projects simply die because of it; for example, the mentioned https://github.com/facebookarchive/warp :

====

make

dmd -c -O -inline -release -ofwarp.o cmdline.d constexpr.d context.d directive.d expanded.d file.d id.d lexer.d loc.d macros.d main.d number.d outdeps.d ranges.d skip.d sources.d stringlit.d textbuf.d charclass.d

context.d(222): Error: Implicit string concatenation is error-prone and disallowed in D

context.d(222): Use the explicit syntax instead (concatenating literals is @nogc): "# 1 \"%2$s//\"\x0a" ~ "# 1 \"<command-line>\"\x0a"

context.d(223): Error: Implicit string concatenation is error-prone and disallowed in D

context.d(223): Use the explicit syntax instead (concatenating literals is @nogc): "# 1 \"<command-line>\"\x0a" ~ "# 1 \"%1$s\"\x0a"

macros.d(329): Error: Built-in hex string literals are obsolete, use std.conv.hexString!"FD 61 FD" instead.

macros.d(333): Error: Built-in hex string literals are obsolete, use std.conv.hexString!"FD" instead.

macros.d(338): Error: Built-in hex string literals are obsolete, use std.conv.hexString!"FD 61 62 FD FD" instead.

ranges.d(71): Error: Implicit string concatenation is error-prone and disallowed in D

ranges.d(71): Use the explicit syntax instead (concatenating literals is @nogc): "Buffer overflowed. Possibly caused by forgetting to " ~ "complete a git merge in your code."

Makefile:44: recipe for target 'warp.o' failed

make: *** [warp.o] Error 1

====


> die because of it

That isn't why Warp was discontinued. (My last change to Warp was 7 years ago.) Even with my C code, it's unsurprising to have to update a few things after 7 years.

My version of the repository is:

https://github.com/DigitalMars/dmpp

and compiles with DMD 2.067. I'll see about updating it later today. Stay tuned!


Updated!


Those are 8 errors, all from 1 language change, which can be fixed by inserting 8 characters.


After following the compiler suggestions (several times more):

====

make

dmd -c -O -inline -release -ofwarp.o cmdline.d constexpr.d context.d directive.d expanded.d file.d id.d lexer.d loc.d macros.d main.d number.d outdeps.d ranges.d skip.d sources.d stringlit.d textbuf.d charclass.d

util.d(20): Error: function `loc.Loc.write(File* f)` is not callable using argument types `(File function() nothrow @nogc @property ref @system)`

util.d(20): cannot pass argument `& makeGlobal` of type `File function() nothrow @nogc @property ref @system` to parameter `File* f`

directive.d(470): Error: template instance `util.err_warning!(string, string)` error instantiating

context.d(264): instantiated from here: `parseDirective!(Lexer!(Context!(LockingTextWriter)))`

main.d(39): instantiated from here: `Context!(LockingTextWriter)`

Makefile:44: recipe for target 'warp.o' failed

make: *** [warp.o] Error 1

====


Less than 30 line changes to get it to compile with latest DMD. https://github.com/DigitalMars/dmpp/commit/a9b879983130cf3a3...

Small price to pay for an elegant language.


Put () after the stderr and it'll compile.


Adding the ability to just read header files, and all the definitions in them, without having to translate them.

I saw this addition in Zig and thought it was amazing. A lot of languages have this need to create definitions in their own language: C# with [DllImport](https://docs.microsoft.com/en-us/dotnet/api/system.runtime.i...), [Crystal with lib](https://crystal-lang.org/reference/syntax_and_semantics/c_bi...). Since C is so ubiquitous, being able to just consume header files removes so much boilerplate creation that is easy to get wrong.

zig also decided to just call an existing compiler (clang) and if you do zig cc --version then zig thinks it is clang, because the zig compiler just calls into clang main. D has gone with the route of calling gcc.


> D has gone with the route of calling gcc.

Not exactly. D is going the route of actually implementing a C compiler in the D compiler. Here's the C parser:

https://github.com/dlang/dmd/pull/12507/files#diff-2fc171ca9...

You could argue that writing a C compiler instead of using libclang is an indicator of brain-damage, but I lay out my reasons here:

https://digitalmars.com/d/archives/digitalmars/D/Add_ImportC...

And besides, C is such a simple language it's kinda fun.


> 9. Without a C compiler, we're stuck with, wedded to, and beholden to libclang.

> I wouldn't be surprised that the eventual cost of adapting ourselves to

> libclang will exceed the cost of doing our own C compiler.

This is a really insightful point. I had to learn this the hard way :)

We might follow your lead on this, as we have done with so many other great ideas implemented in D.

Ironically, Vexu started from the other side as you, with the preprocessor mostly done, but the backend left to-do: https://github.com/Vexu/arocc

One thing that might make libclang worth the cost, however, is its ability to compile C++ code as well. On Zig's end of things, all we have to do is provide libcxx, libcxxabi, libunwind, compiler-rt, and linking, and then libclang is really pulling a lot of weight by compiling C++ code into object files. Sadly this ability is just too useful in practice to ignore. For example, LLVM itself is C++ so if Zig wants to be able to bootstrap itself, it needs this capability.

Still, I think your maneuver here is the best long-term approach to tackle this problem, and I imagine as time goes on we'll start to migrate towards D's solution here. Maybe someday the Zig distribution that does not have LLVM extensions enabled will be the more popular one.

I'll be watching the evolution of this new feature in D with great interest!


I agree that this technique of writing my own C compiler isn't going to work so well with C++:

1. Even though I have written a C++ compiler, it's a C++98 compiler and C++, like the Winchester House, accretes new features constantly, meaning keeping it up to date is impractical.

2. C++ semantics are different enough from D that it is impractical to mechanically map C++ to D. You can do bits and pieces, and dpp does that, but seamlessly doing the bulk of it isn't going to work. (C can be mapped nearly 100% to D.)

dpp uses libclang to try and map what it can to D.


So how extremely could I exploit this? Could I write a program entirely in C, compile it with the D compiler, and use D's unit test framework? That would be insane (very good insane, obviously).


That is a great question. The point of ImportC is to import the declarations from a C file. But it turns out you can simply give the C file on the command line, and it'll compile it and make an executable:

    cat hello.c
    int printf(const char*,...);
    int main() { printf("hello world\n"); }

    dmd hello.c

    ./hello
    hello world

But then there's the question "should D features be allowed in the C code?" This is a very subversive question. Is it really C if that is allowed? Already, the C code allows for Compile Time Function Execution, which comes for free by passing the C code through the D front end. You can see it in action in the test cases in this file:

https://github.com/dlang/dmd/pull/12507/files#diff-b94776a02...

Scroll down a bit and you'll see it executes C functions at compile time. I decided to flirt with heresy and leave this in, because it sure makes adding tests convenient!

I'm less sure about adding unittests to the C compiler. Certainly, you can always add the unittests to the D code that is importing the C code. This will probably hold the charges of abomination to a minimum.


> Already, the C code allows for Compile Time Function Execution, which comes for free by passing the C code through the D front end.

Oh my good gosh. D and C could be seamless together. This is great.

> I decided to flirt with heresy and leave this in

I'd love to see more C language extensions in ImportC.


Do you produce a D ~object from C source, which is then processed by dmd?


It produces a D AST from C syntax.


Congrats! C is simple, sure, in some ways. But it is the complete standard library that makes it behave like "C" for writing programs. This was by design, of course.


You are a beast


Yes, Java is adding jextract [1] as a replacement for JNI (Java Native Interface); it directly consumes header files using clang and produces a jar wrapping the native calls.

It's not bolted into the compiler, so it's less nice than Zig's approach, but the jar produced by jextract is not ABI-dependent, and therefore more portable. The ABI-dependent code is generated by the JIT using the Foreign Linker API (the equivalent of C#'s DllImport) at runtime.

[1] https://inside.java/2020/10/06/jextract/


Actually the equivalent of @dll.import from J++, funny how things turn out.

https://docs.microsoft.com/en-us/previous-versions/visualstu...


That code also contains a nice C parser that is handy, backed by LLVM. I recently switched from JNA to Panama for a project, and so far no regrets, other than having to mess with the Java module system.


It is already stable enough?


Too early to tell. It clearly has received more care & attention so far than all the alternatives, so I’m not too worried about it being in „incubator“ status. The project lead knows his shit.


Scheme is a language where I wish this was part of the spec. So many Schemes are intended to be embedded or extended with C, and every one of them has a bespoke way of declaring C bindings and martialling data between C and Scheme.


There are ongoing efforts to stabilise an SRFI (like SRFI 198 [0]), which would be the first step towards this. (First get the SRFI; then, once that's stable and well implemented, it can be considered for the next spec.)

Unfortunately there's a lot of disagreement between the bigger players as to how this should be handled:

> > 1) The authors appear to be deadlocked

> No need to soften it with an "appear", we are intractably deadlocked.

[0] https://srfi.schemers.org/srfi-198/srfi-198.html


I agree.

to bystanders, odd autocorrect: it's marshalling. marshalling data. although fight and war are indeed also appropriate associations for binary data exchange (struct conversions) between languages.


TIL martial and marshal are unrelated etymologically, though marshalling data is by analogy with marshalling troops, and related to the military office.

Marshal = mariscalcus = "horse servant", compare mare and seneschal. Martial = from Mars, god of war.

https://en.wiktionary.org/wiki/marshal


TIL: marshalling == ceremonial arrangement and management of a gathering

these CS gals of olde really had a hand for naming :-)


I often court martial my data.

Not mocking the spelling of the previous post; it's a strange term anyway and in this instance either spelling works.


`zig cc` makes Zig into a more convenient C cross-compiler than upstream Clang!


Why is it more convenient?


Zig can cross compile and link libc on lots of different architectures out of the box, which is amazing given how much effort this takes to set up cross compilers normally.


saurik's reply to your comment feels to me like the epitome of the seasoned C/C++ developer _who is able and willing_ to put up with the tooling setup pains that make C/C++ a non-starter to the "average" developer today.

You can absolutely set up a cross-compiling C/C++ toolchain today, as he said - but you're going to have to find the right packages in your distro, deal with the macOS linking issue, get a copy of the Android NDK, make sure you're installing the right package versions, muck around with libc, etc. That doesn't make for a delightful start to a new weekend project.

It's often underestimated just how substantial "setup pains" are IMO. I suspect most people who have a strong distaste for C/C++ mostly despise the tooling and installation hell.

The novel part of what Zig, (and maybe Rust/Go to a lesser extent) are doing with cross-compilation is not 'making cross compilation possible' - it's having 'the best experience' out of the box, with zero installation pain.

Zig has a ton of issues and is super early stages still.

But shipping out-of-the-box with a Zig/C/C++ cross-compiler, with libc, static libc (musl), compiler-rt, libunwind, libcxx, etc. for ~50 cross-compilation targets, and solving the macOS cross-compilation linking issue (with their custom linker, zld) is IMO a huge deal because it lowers the barrier to entry to systems programming quite substantially.


> saurik's reply to your comment feels to me like the epitome of the seasoned C/C++ developer _who is able and willing_ to put up with the tooling setup pains that make C/C++ a non-starter to the "average" developer today.

For me the best example I currently have of such attitude is how an internal WinDev group managed to push for getting C++/CX replaced with C++/WinRT without it being production ready.

After 4 years there is still no Visual Studio support for IDL files; one has to manually copy/merge generated files into the Visual Studio project, there is no two-way editing with XAML files, and we are continuously told it will only be fixed when ISO C++ gets compile-time reflection.

Apparently they feel that is a productive experience and never came close to try out either C++ Builder or QtDesigner.

I guess they were still using raw C to deal with COM to consider this kind of downgrade as something positive.


FWIW, this isn't actually difficult: you just need a sysroot, and past that you can just use clang (maybe ironically, for any platform other than Apple ones, as there still isn't a good common linker for Mach-O; like, this still works fine, but it is in fact very annoying--but not impossible--to compile a macOS binary from any platform other than macOS). To cross-compile to Linux you can just download a handful of rpm files (the kernel and libc header packages, as well as the gcc package for the CRT) from CentOS or deb files from Ubuntu and extract them to a folder, and for Windows you just need a handful of Msys2 packages (the same general set). That's really all you need: my current major project--Orchid--compiles for Linux, macOS, iOS, Android, and Windows, and does so using the Android NDK copy of clang (just because it is easy to install and easy to ensure it's always "the same" on every machine: I do cross-compiled reproducible builds). As long as you are using C/C++ it really should be easy to cross-compile (and yes: this is saurik saying something not only good about clang but specifically saying they finally nailed this use case, which they really hadn't a while back when people kept touting this as a feature... they really did eventually nail it). (That said, I do appreciate that not having to download and extract those packages is certainly "more convenient" than doing anything at all, but I just want to point out that it really isn't that "much effort".)


This sounds a lot more difficult than using Zig where you just need to find the right build flag. Maybe it's easy once you already know which exact packages to get for each platform, where to put them and how to tell clang about them. But if you don't know this, researching this can easily take hours, at least for me.


your solution would not support:

* specifying target cpu features (important!)

* features such as thread sanitizer support

* LTO that crosses the libc boundary

* targeting any glibc version

...all of which are supported out of the box by zig.


1. Do you have a blog post with exact instructions for this for some of the platforms? 2. Does this include targeting any gnu libc version (backwards compatible)? This would be very surprising to me.


The macOS D compiler cross-compiles to iOS, x86_64, and arm64. It just ships with 3 standard libraries built for those architectures.


LuaJIT was "good enough" at this that it worked pretty well for me -- mostly for automatically binding to headers from my own code where I was careful to avoid some of the potential gotchas. With a macro the header was stringified and sent to LuaJIT; could've also had a build step. eg: https://github.com/nikki93/cgame/blob/master/src/sprite.h#L9

An added benefit is you get ffi-reflect -- https://github.com/corsix/ffi-reflect -- over the resulting types / functions...


Not that this is terribly relevant to the discussion, but back in the day, I did try compiling bzflag with zig cc and then with clang, and the first executable crashed, while the second worked.

Despite being told numerous compile-time options to use by different manuals and documents and people on IRC, nothing appeared to change the outcome. It might be due to some undefined behaviour in bzflag, of course, but it made zig cc a non-starter for C.


This is due to zig enabling clang's UBSan by default in unoptimized builds. This can be disabled with `-fno-sanitize=undefined` or by enabling optimizations with `-O2` or `-O3`.

https://github.com/ziglang/zig/issues/4830#issuecomment-6054...


Well, as mentioned, I got a whole list of different compile-time options to try - no-this and no-that, but the executable still crashed in the end.


Please be specific with version numbers. Did you file an upstream bug already? Do you mean https://github.com/BZFlag-Dev/bzflag ? That's C++ and must be compiled with zig c++.


Ah, well, it must have been zig c++, then, necessarily. Was I planning to file an upstream bug? I was unaware. I did contact them about it, actually, but it wasn't solved at the time.


Apropos of nothing, but found out that the zig compiler can run C++: https://twitter.com/andy_kelley/status/1391468499676008451?s...


Wow that’s really convenient. I was just thinking of how nice it would be to have that feature literally hours ago while working on a side project involving a lot of D/C interop.

Major props to Walter and the rest of the D developers for being so seemingly in tune with what people are looking for in a language.


I always said this: C is so huge that it makes no sense to try to "replace" it; you'd better make sure you can easily consume that ecosystem.

That's a step in the right direction! I love D!


True, but we can still try to fix it somehow, because certain eco-systems will never move beyond C and it would be nice not to have to worry about its famous issues as we do to this day.


Rewrite it all in Rust! I mean D! Or... Zig, maybe? Lots of valid IoT languages, but how many are actually in refrigerators or mice or keyboards or car door openers? That is a lot of inertia.


My point was fixing C, the ecosystems that value it after 50 years aren't going to rewrite their code into something else, as much as many of us would like it to happen.


Yes, I was agreeing with you and forgot to tag my snarkiness as such.


Ah, sorry I missed it then.

Here is one example from Apple in that regard, search for "Memory safe iBoot implementation".

https://manuals.info.apple.com/MANUALS/1000/MA1902/en_US/app...


> C is so huge

Are you sure you're not thinking about C++? C is a small language.


I think by being huge he/she means C is very popular and there is so much code out there already written in C.


In the sense of its ecosystem and current usage.


A bit more discussion on the hybrid-forum-and-NNTP: https://forum.dlang.org/thread/s79ib1$1122$1@digitalmars.com


Since C90, the standard has been ISO, not ANSI. C89 was the last ANSI C. Then ISO adopted essentially that same language as C90, and C95, C99, etc. have all been ISO standards.


Thanks. I renamed it to ISO in the documentation.


For me this makes Dlang a very obvious choice when it comes to code that calls C libraries, where I might have previously used C++.


I remember adding TinyCC to my C++ projects, so I could have some kind of scripting language in a large C++ project that is tedious to build.

In a way I'd rather use C than Lua. I wish I could easily integrate Python as a scripting language, but Python is slow and its runtime is heavy. TinyCC, however, is very light.


I agree with you. I like how easy it is to embed Lua into an application, but as a language I really prefer python. Sadly it is not as easy to integrate the python interpreter into a project.


Was there a problem with Micropython?


Not sure if it's possible to easily integrate micropython into a C++ project, because micropython doesn't seem to be designed for this.


Walter here. AMA.


No question, just thanks for your hard work and that of the D contributors. Diversity, competition, and choice are important when it comes to programming languages.


Hello Walter, sorry for the (mostly) off-topic question, but is there still hope for having a variant of the standard lib that can be used w/o the garbage collector?


There is real momentum behind Phobos v2 now, can't give any promises about a timeline, but we're thinking about it, and Andrei is back in the fray now.

The standard library won't be a variant you can use without the GC, but rather one that is allocator-agnostic wherever reasonable.


Do you think there's tension in Dlang trying to be both a systems language like C++ AND a high-level garbage-collected language like C#?

Which is more important?



I'm somewhat in agreement.

There are a lot of people in 2021 who - for whatever reason - feel like they need to program personal computers without using GC. As you point out, history shows that's a lot less true than people think, but nevertheless - that is the sentiment.

Dlang spends a lot of effort trying to attract these people, by making more and more of the stdlib nogc, having @nogc annotations, experimenting with deterministic manual memory management, etc. But I feel like they'll never convert most of them.


Meanwhile, C# and Go keep being improved to play in the same space that D wants to be in.

I also agree with the sentiment, the language issues with some not so well worked out features should be fixed, instead of trying to please everyone.


"... the language issues with some not so well worked out features should be fixed, instead of trying to please everyone."

This. For average programmers like myself, the latest and fanciest language features are of little value. C++ and (unfortunately) D are on the same quest: to incorporate every programming language paradigm available, the extra complexity be damned. In the specific case of D, I'd rather have a good assortment of native D libraries than more bells and whistles added to the language.


Obviously I'm not Walter, but IMO, D's approach to GC does not create any tension. It's just a systems language with a GC that's there if you want to use it. You can disable it completely or only where you need to. Yes, you lose some features when you do so (like automanaged memory for dynamic array operations), but the tradeoffs are well-known and worth the loss when you need to make them. Consider WekaIO, which used D for their filesystem. No GC anywhere:

https://dlang.org/blog/2018/12/04/interview-liran-zvibel-of-...


I think there's more to the tension than GC. GC is the red herring that everyone jumps to when it comes to such comparisons, but I think there are many more issues that could cause tensions. For example OOP features, inheritance, properties, that kind of things. For system programmer oriented folks these features are usually considered bloat, more high-level/application programmers consider them very valuable features.

For me currently D provides a good level of balance between those worlds, but the aforementioned GC isn't an issue for me at all so I have it easy.


Depends, the future could already have been here, instead of playing catchup with the past.

"Eric Bier Demonstrates Cedar"

https://www.youtube.com/watch?v=z_dt7NG38V4


If you use D across your organization, you can use the GC for internal tools and scripts, while keeping high-performance product code GC-free. So it's not really a hindrance, as you don't have to use different languages.


This!

I think D should latch onto and promote more the idea of having on-demand performance where it's necessary, with GC convenience for the rest of the application. Soon, having borrow-checker capability a la Rust and Cyclone will only strengthen the narrative.


That's what we already do; however, the popularity of things we thought were obvious implies we need to market D better.


I usually write D code using both GC and manually managed memory. Tension between the two is like which is better, a socket wrench or an end wrench? Most of the time you can make either wrench work, but sometimes ya just need one flavor. Even a clawfoot wrench now and then!


Thanks for all your work, screenshot of my 2000's era DigitalMars CD :D -> https://imgur.com/a/SwlpZBR ( sorry for the gorilla hands )


I'll be starting my new DigitalMarsCoin soon, and rather than being a digital currency, it will rely on original Digital Mars CDs. Hold on to those CDs! The value will surge!


Is there a comparison with the other approaches, like htod? The PR describes them but does not say why this is better.


I've found over and over that building something seamlessly into the compiler is transformative, easily a factor of 10 in adoption of it.

The obvious examples are Ddoc, the builtin documentation generator, and unittest, the builtin unit tester.


You've been a compiler/language developer for many years now (many decades?). How did you get into this field of work? Do you have any regrets about this becoming your niche? :)


The answer to the first question is detailed in the HOPL IV paper, 'Origins of the D Programming Language', freely available here: https://dl.acm.org/doi/abs/10.1145/3386323


Regrets? I should have made it open source from the beginning! Stupid me.


As someone who has painfully translated some of the STB headers to D over the years, this is good news. It's very easy for the translation to go slightly wrong, and then code you don't understand breaks, usually losing the time investment. That said, this work is pretty early to judge.


Now if we can get people to quit starting new projects in C++ (looking at you Google) we can finally make some progress in programming languages.


I think it is a great sign of the health of the dlang community and project that the creator of the language can submit a PR and get actual honest feedback just like any other participant.


Now we just need rock-solid automatic "extern (C++)" bindgen and we're g2g


Has there been any notable success story with extern(c++)? I know some folks use it to interop with their own C++ code, but I haven't seen any new bindings to any popular C++ libraries.


extern(C++) is what allows the LLVM D compiler to work for example.

Binding C++ libraries is quite risky in the sense that the libraries usually aren't meant to be consumed via that API. For that reason it's mainly used (publicly at least, there are things I know that I cannot say) for exposing D to C++ in a sane way (Mangling, vtables, templates, all work, exceptions even!).


Is the C compiler written in D?


Yes, dmd was originally written in C++, but it has been ported to D for quite some time now; this is an extension of said compiler.

ldc and gdc are a mix of D and C++ due to the respective backends.


If you can't beat them, join them.


D rox



