C2 Lang design (2014) [pdf] (c2lang.org)
41 points by asaka on Aug 6, 2015 | 21 comments



I understand why it's attractive to integrate more and more into the language proper, but I don't like it. Almost all tools which still embrace the unix way of doing things are written in C, and it's not just because that community is conservative. C feels like unix, because it's one of the only languages that abides by that philosophy. C-the-language is only part of the experience of programming in C. Programming in C includes DSLs and code generation and makefiles. Even what we consider C-the-language can be decomposed into preprocessor code and actual C.

That said, C needs to be improved, but I think it should be done via the build system, adding layers on top of C, in a tasteful, thought-out way. Some ideas below:

If we're expanding on C by adding build system complexity, Makefiles need to be improved. Make is a great tool, but it's unityped, like shell scripts. And it's essentially a preprocessor over shell, which leads to a mess of sigils. Maybe redesigning make as an embedded language in some lisp would do it. Additionally we could unify the notion of linking to a library and importing code into the makefile to allow dependencies to specify build steps. This could make some of the added build complexity implicit.

Namespaces could probably be implemented as a preprocessor that takes module declarations and import statements, converts all identifiers into their prefix-qualified equivalents, and emits a warning whenever two of them would collide (for example, if module "foo" declared "bar_baz" and module "foo_bar" declared "baz").
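A minimal sketch of the collision check such a preprocessor would need. It assumes the mangling rule is simply module + "_" + identifier; the `decl` struct, `mangle` and `check_collisions` are hypothetical names, not taken from any existing tool.

```c
#include <stdio.h>
#include <string.h>

/* One exported identifier, qualified by the module that declares it. */
struct decl {
    const char *module;
    const char *name;
};

/* Hypothetical mangling rule: module "foo" + identifier "bar" -> "foo_bar". */
static void mangle(const struct decl *d, char *out, size_t outlen) {
    snprintf(out, outlen, "%s_%s", d->module, d->name);
}

/* Warn about every pair of declarations that mangle to the same C
   identifier and return how many such pairs were found. */
static int check_collisions(const struct decl *decls, size_t n) {
    char a[128], b[128];
    int collisions = 0;
    for (size_t i = 0; i < n; i++) {
        mangle(&decls[i], a, sizeof a);
        for (size_t j = i + 1; j < n; j++) {
            mangle(&decls[j], b, sizeof b);
            if (strcmp(a, b) == 0) {
                fprintf(stderr, "warning: %s.%s and %s.%s both become %s\n",
                        decls[i].module, decls[i].name,
                        decls[j].module, decls[j].name, a);
                collisions++;
            }
        }
    }
    return collisions;
}
```

Given the declarations {"foo", "bar_baz"} and {"foo_bar", "baz"}, both mangle to foo_bar_baz, so the check prints one warning and returns 1.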

Rust-style syntax-case macros can be implemented as a preprocessor.

Go-style defer statements could be implemented in a preprocessor, avoiding the somewhat verbose goto-based error handling.
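For reference, this is the goto-based cleanup idiom a defer preprocessor would replace: every acquired resource gets a label, and each error path jumps to the label that releases exactly what has been acquired so far. A minimal sketch; `copy_twice` and its buffers are purely illustrative.

```c
#include <stdlib.h>
#include <string.h>

/* Copy `src` into two freshly allocated buffers and return their combined
   length, or -1 if an allocation fails. Each failure jumps to the label
   that frees only what was already allocated; a Go-style `defer free(a);`
   would state the same cleanup right next to the allocation instead. */
static long copy_twice(const char *src) {
    long result = -1;
    size_t len = strlen(src) + 1;
    char *a, *b;

    a = malloc(len);
    if (a == NULL)
        goto out;

    b = malloc(len);
    if (b == NULL)
        goto free_a;

    memcpy(a, src, len);
    memcpy(b, src, len);
    result = (long)(strlen(a) + strlen(b));

    free(b);
free_a:
    free(a);
out:
    return result;
}
```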

As I said, this approach requires a lot of care to avoid adding a huge amount of complexity to the language.


Thoughts as I read:

1. "uninitialized var usage is error": unfortunately impossible without at least one of the following compromises: Automatically initialize variables (wastes CPU); False alarms (see Java); Built-in formal proof system; or, Require compilers to solve the halting problem.

2. Removed keyword "static": kills one of my favorite tricks, "self-init'ing functions".

3. New keyword "as": A good invention in Pythonland. Good call to bring this in.

4. New keyword "nil": Redundant with NULL?

5. Example - Base Types: Uses uint8 in place of char. This obscures intent and makes code less readable. Compare: int library_fnc(char asterisk errmsg) versus int library_fnc(uint8 asterisk errmsg). (HN wants to turn my asterisks into italics...) In the former it's clear errmsg is a string, in the latter it's not clear (it could be a pointer to a flag).

6. Example - function types. Doesn't one usually typedef the function pointer, rather than the function itself? So making that require two lines is annoying. Aside from that, the author is right that C has confusing function pointer typedef syntax.

7. Multi-part array initialization: Encourages unmaintainable code. Depending on what's in those "..."'s, might it require the compiler to solve the halting problem?

8. Multi-pass parsing: Trades maintainability for instant gratification.

9. Symbol accessibility: The author makes "public" (and implicit "private") modify entire structs rather than individual fields...

10. Multi-file module: May lead to unmaintainable code

11. I'm worried about the language arbitrarily defining things like "the results of building are stored in the 'output' directory". OTOH the recipe.txt idea could help standardize what amounts to a lot of ad hoc Makefile programming.

12. Build process difference: Theoretically could speed up compilation. I'm worried for social reasons. In module-based languages, we tend to fall into module hell: one symptom being the infamous 20-page stacktrace (see: Java, Clojure, etc.) The nature of C's #include incentivizes shallow dependency trees (a very good thing).

13. "Language scope": trades portability for convenience

14. Tooling: This shouldn't be part of the language, it should be separate.
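On item 6, for comparison: standard C already lets you typedef either the function type or the pointer-to-function type, and the pointer form is the more common idiom. A minimal sketch; the names `binop`, `binop_fn`, `add`, `mul`, `apply` and `apply_fn` are illustrative only.

```c
/* Typedef of a function type: `binop` names "int(int, int)" itself. */
typedef int binop(int, int);

/* Typedef of a pointer-to-function type; the parentheses around
   *binop_fn are the part of C's syntax people find confusing. */
typedef int (*binop_fn)(int, int);

/* A function-type typedef can even declare functions... */
static binop add;
static binop mul;

static int add(int a, int b) { return a + b; }
static int mul(int a, int b) { return a * b; }

/* ...and a parameter of function type is adjusted to a pointer, so
   `apply` and `apply_fn` take exactly the same argument type. */
static int apply(binop f, int a, int b) { return f(a, b); }
static int apply_fn(binop_fn f, int a, int b) { return f(a, b); }
```

In C, `binop *` and `binop_fn` are interchangeable, so typedef'ing only the function type costs an extra `*` at each use.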


> 5. Example - Base Types: Uses uint8 in place of char. This obscures intent and makes code less readable. Compare: int library_fnc(char asterisk errmsg) versus int library_fnc(uint8 asterisk errmsg). (HN wants to turn my asterisks into italics...) In the former it's clear errmsg is a string, in the latter it's not clear (it could be a pointer to a flag).

You can always typedef it if you want it. The point is that naming a basic numeric type "char" is not only confusing but also wrong in a world that is no longer all ASCII.


>2. Removed keyword "static": kills one of my favorite tricks, "self-init'ing functions".

Removing that keyword, which had several different meanings, doesn't mean there isn't (or won't be) a replacement:

http://c2lang.org/site/language/variables/#local-keyword
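For context, a common form of the "self-init'ing function" trick uses function-scope statics that persist between calls, so the function performs its one-time setup on first use. This sketch assumes that is the pattern meant; `squares_lookup` and its table are hypothetical. Judging by the `#local-keyword` anchor in the link, C2 appears to provide a `local` keyword for this use of `static`.

```c
#include <stddef.h>

/* Return n*n for n in [0, 255] from a lookup table that is filled in
   lazily on the first call. `table` and `initialized` are function-scope
   statics: they keep their values between calls and are invisible
   outside this function. Not thread-safe. */
static unsigned squares_lookup(unsigned char n) {
    static unsigned table[256];
    static int initialized = 0;

    if (!initialized) {
        for (size_t i = 0; i < 256; i++)
            table[i] = (unsigned)(i * i);
        initialized = 1;
    }
    return table[n];
}
```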

>4. New keyword "nil": Redundant with NULL?

https://groups.google.com/d/msg/comp.std.c/fh4xKnWOQuo/IAaOe...

>5. Example - Base Types: Uses uint8 in place of char. This obscures intent and makes code less readable.

http://c2lang.org/site/language/basic_types/

C2 apparently still has char, but it doesn't seem to be as weird as C's, which is a distinct type that may be either signed or unsigned. In C2 it is simply int8.
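The C behaviour being contrasted can be demonstrated directly: `char`, `signed char` and `unsigned char` are three distinct types, and whether plain `char` is signed is implementation-defined (`CHAR_MIN` from `<limits.h>` tells you). A minimal C11 sketch; the `CHAR_KIND` macro and `plain_char_is_signed` are illustrative names.

```c
#include <limits.h>

/* Name the character type of an expression. _Generic accepts all three
   as separate associations precisely because they are distinct types,
   even though plain char shares its range with one of the other two. */
#define CHAR_KIND(x) _Generic((x),        \
    char: "char",                         \
    signed char: "signed char",           \
    unsigned char: "unsigned char",       \
    default: "other")

/* 1 if plain char is signed on this implementation, 0 otherwise. */
static int plain_char_is_signed(void) {
    return CHAR_MIN < 0;
}
```

One more C oddity shows up here: `CHAR_KIND('a')` yields "other", because a character constant has type `int` in C.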


I agree with a lot of your thoughts. Here are some areas where I disagree:

> 1. "uninitialized var usage is error": unfortunately impossible without at least one of the following compromises: Automatically initialize variables (wastes CPU); False alarms (see Java); Built-in formal proof system; or, Require compilers to solve the halting problem.

I don't think this is true. It should be pretty easy to detect if a variable is initialized or not. I can potentially see how a false alarm would arise, but I don't think that matters in practice. (All the situations I'm imagining involve writing bad code)
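A minimal case of the kind of false alarm at issue: `x` is read only on the path where it was assigned, but proving that requires correlating the two `if (flag)` tests. Java's definite-assignment rules reject the equivalent Java method ("variable x might not have been initialized"); whether GCC or Clang warn depends on version and optimisation level. The function `pick` is illustrative only.

```c
/* Safe: x is read only when flag is true, which is exactly when it was
   assigned. A checker that treats the two `if (flag)` tests independently
   sees a path (flag false, then flag true) on which x is read
   uninitialized, even though no execution can take that path. */
static int pick(int flag) {
    int x;
    if (flag)
        x = 42;
    /* ... unrelated work ... */
    if (flag)
        return x;
    return 0;
}
```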

> 10. Multi-file module: May lead to unmaintainable code

Go does this already and it is fine.

> 12. Build process difference: Theoretically could speed up compilation. I'm worried for social reasons. In module-based languages, we tend to fall into module hell: one symptom being the infamous 20-page stacktrace (see: Java, Clojure, etc.) The nature of C's #include incentivizes shallow dependency trees (a very good thing).

I can see why this is a concern, but I think it is more a problem with JVM languages because of the type of programming Java encourages.

> 14. Tooling: This shouldn't be part of the language, it should be separate.

I thought this way too. Then I used Go and realized the huge benefit tooling integrated into the language provides. (Go has other problems, but tooling is not one of them).


>I don't think this is true. It should be pretty easy to detect if a variable is initialized or not. I can potentially see how a false alarm would arise, but I don't think that matters in practice. (All the situations I'm imagining involve writing bad code)

It's easy to trace the code paths between a variable's declaration and its usage, as long as those paths don't involve procedure calls. Once they do, the problem becomes "static-analysis complete".


> 1. "uninitialized var usage is error": unfortunately impossible without at least one of the following compromises: Automatically initialize variables (wastes CPU); False alarms (see Java); Built-in formal proof system; or, Require compilers to solve the halting problem.

Isn't this just explaining why such usage can't be made a static error? It seems to me that raising runtime errors would avoid each of these compromises (but maybe that's clearly not what was meant).
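A hand-rolled sketch of what the runtime-error alternative could look like in C: each variable carries an "assigned" flag, and every read checks it. The `checked_int` type and its functions are hypothetical. Note that the flag and the check are paid for on every variable and every read.

```c
#include <stdio.h>
#include <stdlib.h>

/* An int that remembers whether it has been assigned. */
typedef struct {
    int value;
    int is_set;
} checked_int;

#define CHECKED_INT_UNSET { 0, 0 }

static void checked_set(checked_int *v, int value) {
    v->value = value;
    v->is_set = 1;
}

/* Reading an unassigned variable becomes a runtime error instead of
   silently producing garbage: report it and abort. */
static int checked_get(const checked_int *v, const char *name) {
    if (!v->is_set) {
        fprintf(stderr, "runtime error: '%s' used before initialization\n", name);
        abort();
    }
    return v->value;
}
```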


1. Works perfectly in Java. Note that in Java, fields are initialised to known default values, while reading a local variable before it has been explicitly initialised is a compile-time error. If the compiler can't prove your code is correct, then it's not obviously correct, which means I have to sit down and think carefully about whether it's correct or not. Just write simple code.

Though I concede that pointers make things very hard. Let's say you have

    void foo(int*);
    void bar() {
        int x;
        foo(&x); /* Is this an error? */
    }
You can't tell if x will be initialised or is expected to be initialised. You don't even know if foo will read one int, or expects an array of ints. I would have really preferred it if they did something on that front. Maybe you'd have to declare foo(int[n] a), which you'd then call as foo([1]&x). There was a paper on extending C with exactly this (though its syntax was foo(size_t n, int a[n])), but I haven't been able to find it. One big plus is that, when compiling in debug mode, the compiler could insert checks on every access. In general, I really want the successor to C to disambiguate between pointers and dynamically-allocated arrays.

4. Same reason C++ added nullptr (and I really wish they'd used the same keyword): if you have a function int foo(int), foo(NULL) compiles fine, but foo(nullptr) doesn't. This bites C++ in particular: because it disallows implicit conversion from void* to other pointer types, NULL can't be ((void*)0) and is typically defined as an integer zero instead. Any language that disallows that conversion runs into the same problem.

5. This is more a problem of C's strings being naked pointers to chars. Every serious piece of code I've seen uses some sort of string wrapper struct. Though they might want to alias char to uint8, for this.

6. Function types should really be pointers, I agree.

7. yeah...

8. Practically every other language does it. It's kind of ludicrous to want every function declared before its usage, when the compiler can just collect all declarations, then see where they're used.

9. I think that's about the right level of granularity. Seems like you'd use struct embedding to separate the public and private fields, e.g.

    type Foo_priv { private fields }
    type Foo { public fields; Foo_priv priv; }
The fact you can't create a Foo from outside the module seems like a bonus to me.

10. I'd be in favour of enforcing Java's "project directory structure must mirror module structure", i.e. module a.b.c's files are all in src/a/b/c. I've heard a lot of C# devs lament that the files for a certain package are strewn everywhere.

11. Kind of a sore point, but yeah. I think emulating Java would be the right thing again: give the compiler a list of directories to search for compiled modules during module resolution, and tell it where the output directory is.

12. yeah

13. Sounds like #pragma. You can't really escape the fact that when you have N compilers for your language, every one will support its own extensions.

14. I think his point is that the language makes these tools easy to write.
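On the array-parameter idea in item 1: C99 already accepts the foo(size_t n, int a[n]) spelling recalled above, and int a[static n] further promises that at least n elements are accessible. Inside the function the parameter is still just a pointer and standard C adds no bounds checks, although some compilers (Clang, for instance) can warn when a caller visibly passes an array that is too small for a `static` bound. The functions `sum` and `fill` are illustrative only.

```c
#include <stddef.h>

/* `a[static n]` tells the compiler (and the reader) that `a` points to
   at least n ints; inside the function `a` is still a const int *. */
static long sum(size_t n, const int a[static n]) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Set the first n elements of `a` to v. */
static void fill(size_t n, int a[n], int v) {
    for (size_t i = 0; i < n; i++)
        a[i] = v;
}
```

A call such as fill(1, &x, 5) is the single-int case from the foo(&x) example: the size is now stated at the call site, but nothing checks it at run time.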


> This is more a problem of C's strings being naked pointers to chars.

A C string is a contiguous sequence of characters terminated by a null character, not a pointer. C strings are manipulated using `char*` pointers.
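The "string wrapper struct" mentioned upthread usually pairs a pointer with an explicit length, so the length is O(1) to get and slicing needs no copy. A minimal sketch; the `str` type and its helpers are hypothetical, not from any particular library.

```c
#include <stddef.h>
#include <string.h>

/* A non-owning view of a byte string: pointer plus explicit length,
   no terminating NUL required. */
typedef struct {
    const char *ptr;
    size_t len;
} str;

/* Wrap a NUL-terminated C string (the NUL is not counted). */
static str str_from_cstr(const char *s) {
    str r = { s, strlen(s) };
    return r;
}

/* Equal if both the lengths and the bytes match. */
static int str_eq(str a, str b) {
    return a.len == b.len && (a.len == 0 || memcmp(a.ptr, b.ptr, a.len) == 0);
}

/* View of at most `count` bytes of s starting at `start`, clamped to the
   end of s. No copying is done. */
static str str_slice(str s, size_t start, size_t count) {
    if (start > s.len)
        start = s.len;
    if (count > s.len - start)
        count = s.len - start;
    str r = { s.ptr + start, count };
    return r;
}
```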


Is there an advantage to this over say a more modern and safe language like Rust? It seems to be just reducing the complexity of the language, but doesn't look like it will reduce memory related bugs.


Since you have to rewrite everything, you might as well switch to another language (we switched to OCaml). If there was an incremental path or a safe subset of C or something like that, that would be more interesting.


My thoughts exactly. If nothing bad happens, I think Rust will be a worthy successor of C in low-level land. If I need a higher abstraction level, I use either Lisp if I want dynamic typing, or OCaml if I want static typing. Together, those three cover a vast spectrum.

addendum: That doesn't mean C2 is not a worthwhile experiment. I might even try it out when a compiler is available.


We have pretty good C interop, so you don't have to rewrite everything; you can do it in chunks. Firefox isn't suddenly going to be reimplemented in Rust, for example; it will happen library by library, bit by bit.

Of course, you can do that with some other languages too, but having no runtime and no overhead makes Rust significantly better at it, in my opinion.


You can use D, which shares a lot of syntax with C, although you cannot directly reuse C code because there is no preprocessor in D. Some people use D as a "C development compiler".

The incremental path is the official C standards. C will probably gain modules for example.


In the project I'm working on right now, there are 277,617 lines of C (using gcc extensions). The incremental path either compiles all of that directly, or we rewrite it. There's hardly any middle ground, and actually rewriting it isn't a realistic option either.

(For comparison the same project has 63,972 lines of OCaml and 31,605 lines of Perl)


This is cool. I've often toyed with a similar idea of creating a language that improves/fixes the things C messed up. If you aren't worried about safety (memory bugs can largely be avoided by changing how you do memory allocation, i.e. switching from individual mallocs to region-based memory management), then C is actually a pretty nice language, since it is simple enough to hold the entire language in your head. Plus it's nice to know how things are actually laid out in memory. The problems C2 solves are really the main things that frustrate me about C: header files, the lack of a build system, no modules, and spiraling type signatures.
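A minimal sketch of the region-based (arena) allocation mentioned above: many small allocations are carved out of one block and released together, so there are no individual free() calls to forget or repeat. The `arena` API is hypothetical and uses a single fixed-size block for brevity.

```c
#include <stddef.h>
#include <stdlib.h>

/* A region: one malloc'd block handed out front to back. */
typedef struct {
    unsigned char *base;
    size_t cap;
    size_t used;
} arena;

/* Allocate the backing block; returns 0 on failure. */
static int arena_init(arena *a, size_t cap) {
    a->base = malloc(cap);
    a->cap = a->base ? cap : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Return `size` bytes aligned for any object type, or NULL if the region
   is full. Individual allocations are never freed on their own. */
static void *arena_alloc(arena *a, size_t size) {
    size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) / align * align;
    if (start > a->cap || size > a->cap - start)
        return NULL;
    a->used = start + size;
    return a->base + start;
}

/* Discard everything allocated so far but keep the block for reuse. */
static void arena_reset(arena *a) {
    a->used = 0;
}

/* Release the whole region with a single free(). */
static void arena_free(arena *a) {
    free(a->base);
    a->base = NULL;
    a->cap = a->used = 0;
}
```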


Looks extremely interesting. I'm skeptical of some of its claims (faster compilation when incremental compilation is removed sounds unlikely, for example), but nothing that worrisome.

Anybody have any experience? Is it still basically a toy language?


The main site makes it look like the language is about a year old.

I have no doubt that they can make the parsing stage faster, as they won't be parsing the same headers over and over again. This is often a bottleneck in C++, but much less often in C. (I've seen 10+MB C++ files after preprocessing).


If people used precompiled headers (which have been available for about 20 years), that problem wouldn't be there... However, I'm not sure parsing is such a bottleneck these days; I think all the complex constructs (templates etc.) are much more of a time sink. That and linking! It takes an hour to link the WebKit library with the regular linker :-)


It's been a while since I tried precompiled headers, but the last time I tried it, the experience was unpleasant and error-prone.


Very similar to Ark: www.github.com/ark-lang/ark. Except Ark has no GC, has tagged enums, ownership is enforced, and a few other smaller differences.



