Hacker News
C2 Programming Language (c2lang.org)
44 points by luu on Aug 14, 2015 | 49 comments

I'd love to see an evolved C that targets the most performance-obsessed programmers. Features like:

* fine-grained control over alignment

* automatic generation of SoA code from AoS code

* first-class SIMD types

* pragmas to switch the compiler into branch-avoiding codegen

* cache line size as a built-in constant, with ability to compile a binary that contains several versions of the complete program optimized for different cache line sizes

* user-defined calling conventions

* vectors, matrices, and quaternions (taking care of 90% of the cases where you really want operator overloading in C++)

* ability to treat an integer as an array of bits, bytes, or smaller integers, a la Terry Davis's "Holy C"

And the standard library could contain some nice performance-oriented stuff like:

* portable memory allocator that exposes pages, virtual memory indirection, reserve vs. commit, etc.

* fixed-point math

* approximate transcendental functions

* bitwise stuff a la the "Stanford Bit Twiddling Hacks"

while also adding modules and cleaning up declarations like C2 does.
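The bit-twiddling item, for instance, covers tricks like the classic "round up to the next power of two" from the Stanford collection. A minimal sketch:

```c
#include <stdint.h>

/* Round v up to the next power of two (from the Stanford Bit Twiddling
   Hacks). Smears the highest set bit into every lower position, then
   adds one. Returns v unchanged if it is already a power of two. */
static uint32_t next_pow2(uint32_t v)
{
    v--;
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    return v + 1;
}
```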

I think a lot of programmers would like it.

The main issue would be fixing the problem of pointers; high-performance code is hard to write in C because of pointer aliasing. Rust has an interesting approach in its ownership model but I think a performance-oriented C evolution would require another more general solution.

ANSI rules for aliasing, plus the restrict keyword, go a long way toward addressing this.

Yes. Pointers are only a problem if you do things with them that confuse the compiler.
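For example, restrict-qualifying the pointers promises the compiler they don't alias, which typically lets it vectorize the loop without reload-after-store hazards:

```c
/* With restrict, the compiler may assume dst and src do not overlap,
   so it can keep src values in registers and vectorize freely. */
void scale(float *restrict dst, const float *restrict src,
           float k, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = k * src[i];
}
```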

As someone who writes C code that needs maximum performance (my problem is CPU-bound), the two biggest performance gains for me were moving to icc (Intel's compiler) and writing my own thread-safe memory pool allocator to avoid malloc.
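A pool allocator of that kind can be sketched in a few lines. This minimal single-threaded version (block size and count are illustrative; a thread-safe variant would guard the free list with a mutex or use a lock-free list) hands out fixed-size blocks without ever touching malloc in the hot path:

```c
#include <stddef.h>

#define BLOCK_SIZE  64
#define BLOCK_COUNT 128

/* Each block doubles as a free-list node while unallocated. */
typedef union block {
    union block *next;
    char payload[BLOCK_SIZE];
} block;

static block pool[BLOCK_COUNT];
static block *free_list;

static void pool_init(void)
{
    for (size_t i = 0; i < BLOCK_COUNT - 1; i++)
        pool[i].next = &pool[i + 1];
    pool[BLOCK_COUNT - 1].next = NULL;
    free_list = pool;
}

/* O(1) allocation: pop the head of the free list (NULL if exhausted). */
static void *pool_alloc(void)
{
    block *b = free_list;
    if (b)
        free_list = b->next;
    return b;
}

/* O(1) free: push the block back onto the free list. */
static void pool_free(void *p)
{
    block *b = p;
    b->next = free_list;
    free_list = b;
}
```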

What are your use cases that require this kind of performance? High performance order matching or other trading/financial applications?

DNA sequencing data processing [1]. It is a hard problem that needs a lot of processing power thrown at it to solve.

1. https://www.nucleics.com/peaktrace/peaktrace-basecaller-over...

In Myrddin[1], the code is amenable to loop versioning -- All references are bounded, one way or the other. Either you have a raw pointer, which is bounded by the size of the type, or you have a slice, which carries with it a length. This allows you (or the compiler) to trivially write an 'aliases(a, b)' predicate, which allows the compiler to write something like:

    if aliases(a, b) { /* conservative loop */ }
    else             { /* vectorized loop */ }
Considering that this language seems to also keep track of bounds on the pointers, you could probably do something similar.
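The versioning described above can be written by hand in C as well. This is an illustrative sketch (the aliases() predicate here is an assumption, not Myrddin's actual implementation): test for overlap at runtime, then dispatch to a conservative loop or one whose restrict-qualified pointers the compiler is free to vectorize.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if the n-element ranges starting at a and b overlap. */
static bool aliases(const float *a, const float *b, size_t n)
{
    return a < b + n && b < a + n;
}

void add_into(float *a, const float *b, size_t n)
{
    if (aliases(a, b, n)) {
        /* Safe scalar path: stores to a may feed later loads of b. */
        for (size_t i = 0; i < n; i++)
            a[i] += b[i];
    } else {
        /* Alias-free path: restrict lets the compiler vectorize. */
        float *restrict ra = a;
        const float *restrict rb = b;
        for (size_t i = 0; i < n; i++)
            ra[i] += rb[i];
    }
}
```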


Jonathan Blow is working on at least part of this (sorry, the only links I could find were to his videos about it)


As a C-enthusiast, I came in assuming that I wouldn't like this. It actually looks very interesting though.

Modules and imports look to make things much smoother, function pointers and structs are friendlier, and more explicit data types are something many C projects already adhere to.

I'm curious to see how/if the compiler handles blocks and C11 threading/atomics, with the proper libraries (blocksruntime/musl).

Have to echo the sentiment (if only to give the authors some encouragement, in case they are reading this). I saw the title and thought, "not another attempt to supersede C..." but it sounds rather interesting so far. I love high-level hacking as much as the next guy, but I can see how this could be easier for C lovers not to hate, since it explicitly doesn't try to be more high-level.

I am not sure it is really worthwhile to produce a C-equivalent language in the year 2013 for more productivity.

I think that C still has some uses, but for bigger projects it is IMHO not feasible to use C or an equivalent for the whole project.

Most usage I see is in generated code or small code samples that have to run rather fast. My questions:

* for a faster C, is a new language really needed? (can't we just get better compilers?)

* C2 should provide better tooling, but at first that is a drawback, since old tools don't work or are limited. How does that fit together? Wouldn't it be better to just use C for generated code (where the usability aspects are also negligible)?

Of course, every programmer wants to create a new language, but most of them don't last long ... and C2 did not convince me from the front page.

There is value in a C replacement, but it would need mindshare even more than features.

I think Go has a good combination of mindshare and features for system software development. Rust may as well, though I know less about it.

I'd think that having a GC and runtime associated with it would prevent Go from even being considered for a Systems language.

I see Go as more of a java replacement than as a C replacement (inasmuch as there are lots of tools that can be written in both C and java, it is a C replacement though). Rust is obviously going after C++.

> For array types, C2 introduces a new operator, namely elemsof(). This returns the number of elements in an array


PHP - count

Python - len

C++ - .size (vector)

C - macro or stored

C# - .Length

Yes, "elemsof" is not only yet another new name; in my opinion, it is also a name that does not explain itself easily.

Good naming in programming is an art.

It seems to have been chosen so there is some parallelism between sizeof and elemsof.

It also makes the fact that you are referring to a size/length/count of elements, and possibly not bytes, very explicit.

To me it does not disclose that easily. The parallelism between "sizeof" and "elemsof" seems very contrived to me; "size" and "elements" are not similar words.

To be logical at all, the name should be "numelemsof" or "countelemsof" ... but I guess those names are already too long for C purists ...

That's true all too often. Meanwhile, though, I prefer long and expressive names. You can easily find functions like getNumberOfEnemiesOnSamePlatform() in my C and C++ code. I don't think it's very hard to understand what that function does. With a few special exceptions, I haven't written any comments for almost two years and nobody ever complained.

Then I would not call you a "C purist". The "C purists" I knew objected to too much typing. It was also the reason they rejected other languages, for example those with "tiresomely long" words like "begin" and "end".

I for my part (despite using C myself) think that typing code is not the problem, but typing the right code. And -- at least when you are not programming alone -- every line of code is read 10 or more times more often than it is written. So being clear and understandable is much more valuable to me than getting away with fewer keystrokes ...

I've been plagued by supposed 'purists' all my career. What they are, really, is bad at typing. They won't refactor a crap method because it would take them too long to type it all.

The best investment I ever made in my career was a typing class in high school. It has separated me from the crowd at every job. It's more important than so many other irrelevant 'purist' ideals.

I never had a formal typing class and I guess I am still not that fast a typist, but over the years I learned my keyboard (in both the US and the German variant) and in my opinion I type fast enough now, since I still have to think sometimes between the strokes.

But when I look back at my career, I never felt I was too slow at typing, even for Modula-2 -- a language that would never ever get the approval of the "C purists".

When refactoring, I use a simple trick myself -- I do a lot of copy and paste ;) ... and delete (!) the copied stuff afterwards.

I think that being lazy is a good property in a programmer -- but you should use it for code reuse rather than for avoiding typing.

Oh, I forgot: I use vim, a typing-avoidance editor. You can avoid a lot of keystrokes and mouse moves with that editor. But it cannot avoid real content.

> I for my part (despite using C myself) think that typing code is not the problem, but typing the right code.

What do you think of the auto keyword in C++? I've personally never found typing out the types to be a problem.

Hi, I must admit that I am not totally up to date when it comes to C++, since I don't hop on the newest standards as soon as they are out. And I am not actively programming in C++ currently.

But I don't think "auto" was specified to reduce typing. I see it more as a way to follow the "DRY" principle (don't repeat yourself).

There might be cases where, for example, the return type of a function changes and you don't want to change every caller.

Templates are of course another example, because the return types of templates often depend on the input types. You can of course use that template return-type declaration stuff, but it is very ugly and often clumsy.

I also found a nice example where it really can help to make things easier to read, here:


I think the real use cases are rare; that is also the reason it was specified rather late.

Why not numberOfEnemiesOnSamePlatform()?

I couldn't get used to that form yet. getNumberOfEnemiesOnSamePlatform() is imperative, while numberOfEnemiesOnSamePlatform() is not. I think it's because I'm not a native English speaker, but the get version sounds like 'give me that value' to me, which would make sense. Usually, my function names are imperative (often with the exception of event handling functions) and that feels more natural to me. I'll do some research on the topic, though. There's a presentation where someone explains it, but I couldn't follow last time.

MATLAB uses numel()

Go's len() sounds nicer to me.

This is why C is sort of unassuming in its "sizeof arr / sizeof *arr" pattern.

Whenever you write a new type, just write a sizing function for it using sizeof. I like that Go is taking a similar route, except that one just implements the signature.
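The pattern in question, wrapped in the conventional macro (the names here are the usual idiom, not from any particular codebase); note it only works on true arrays, since applied to a pointer it silently gives the wrong answer:

```c
/* Element count of a true array: total size over element size. */
#define COUNTOF(arr) (sizeof (arr) / sizeof *(arr))

static const int primes[] = { 2, 3, 5, 7, 11 };
```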

C# is even more annoying because there is .Length .Count .Count() and .LongCount() and different types implement different ones.

I see a few questionable decisions in the type system:

* Why is char equivalent to int8? If C2 is keeping 8-bit chars, why not use uint8?

* Why isn't there a primitive size type a la size_t or rust (i|u)size to increase portability?

Beyond this, when I first saw this link, I expected to be disappointed with what they'd done. I was pleasantly surprised to find I liked it.

From someone who mostly deals with embedded systems, I like that. All of the modern system languages essentially rely on having a virtual memory model and force you to use GC. Both these prerequisites essentially rule out something like Go for a microcontroller (it can work, just not be useful beyond a novelty).

There are a few things I'd personally want to see, like lambdas and nested functions, some more intelligent error handling, fixed-point math, etc. But it's a good step forward.

I should ask the same question that people asked me when I introduced my language:

Do you have interesting code written in it?

I'm not a C programmer by any means - and the languages I've dealt with don't have signed/unsigned. So I don't fully understand the merits and drawbacks of them.

However I know enough to question this:

>The default int and float types have been removed, as have type modifier like short, long, signed, unsigned.

If the goal is this:

>C2 aims to be used for problems where currently C would be used. So low-level programs, like bootloaders, kernels, drivers and system-level tooling.

It is my understanding that signed/unsigned is exactly the kind of concern you have when dealing with low-level, embedded, or binary code, e.g. bit masking, bit shifting, raw memory.

So I'm a bit confused. The removal of them seems a bit contrary to the domain goal of the language.

They removed the default types. Meaning no int or unsigned int. They do have int8, int16, int32, uint8, uint16, and so on. You are forced to be explicit.


That's what I get for glancing (asking silly questions already answered in a relevant section)! :) Thanks. That makes a lot more sense.

They've removed the default "int" and "float" types and the modifiers for size and signedness. Instead, they just have specific types with fixed signedness and size characteristics. So this becomes explicit rather than implicit.

They have int8, int16, int32, int64 and uint8, uint16, uint32, uint64, and float32, float64.

They don't have short, int, unsigned int, signed short int, long, unsigned long, long long, unsigned long long int, float, double, long double, etc.

A few features from Go that can make things better and help prevent certain classes of bugs:

* no parentheses for if/switch/for, and mandatory {}, no semicolons

* no implicit fallthrough in switch (an explicit fallthrough keyword instead)

Also some sweet features like

* replace while with a generalized for

* struct methods

* multiple return values (this might be a stretch)

* if with variable definition (if x := y; x<z { ... } )

* not sure if it already has it, but type inference

> int32 instead of int. In C2 you always specify the size

Have fun porting that; this language is already dead out of the gate.

It also seems to ignore low-level details of how compilers optimize programs. Rust actually invested in this on the language side and isn't just some syntactic difference from C.

I've encountered this prejudice against specific-width types before. It confused me then as now.

It seems to me that apps relying on default types like int, whose size and behavior change between platforms, have about 1000x more chance of compatibility issues when you try to get the same code running on new platforms. Sure, there could be a performance hit on, e.g., a system optimized for 32 bits when you specify 16 bits; that's what types like int_fast16_t are for in C/C++. But porting the code is just easier when the types can't change out from under you.

Early versions of C couldn't even decide whether char was signed or unsigned; how is specifying uint8 or sint8 not strictly better than a char that you couldn't be sure whether it was signed or not? How is specifying uint16/32/64 or sint16/32/64 not strictly better than saying "[unsigned] int"?
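A small sketch of the trade-off: fixed-width types in a struct pin layout and wrap behavior on every port, while the int_fastN_t family from <stdint.h> keeps the "fastest type at least this wide" escape hatch for hot loops:

```c
#include <stdint.h>

/* Exact widths: identical layout and wraparound on every platform. */
typedef struct {
    uint32_t id;
    uint16_t flags;
} record;

/* int_fast32_t: at least 32 bits, whatever is fastest locally. */
uint64_t sum_ids(const record *r, int_fast32_t n)
{
    uint64_t s = 0;
    for (int_fast32_t i = 0; i < n; i++)
        s += r[i].id;
    return s;
}
```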

Well, let's say I'm implementing something like STL's vector. I've got a size (number of elements). What type should it be? Clearly something int-like, but what exactly?

If I make it an int, then on a tiny machine, it might only be 16 bits. That means that the vector can only hold 64K elements. But that's probably OK, because if I'm playing with a chip where ints are 16 bits, it probably doesn't have enough memory, address space, or need to contain more elements than that.

But if I'm on a supercomputer, int might be 64 bits, and vector might need to hold that much. That is, the size of int tends to generally scale in the same neighborhood as the other capabilities of the machine.

If I have to decide how big to make the size of a vector, what size do I pick? int64? That works for the supercomputer people, but it means that an 8051 has to do 64 bit operations to use my vector package. That's... less than ideal.

If you've got generics (like templates) or macros, you make your int be different for different vector scales.

In fact, though, I would think that an 8051 vector would need to be profoundly simpler than a supercomputer vector; not just differently sized. When you're dealing with 1-16k of RAM and not much more ROM, a full "vector package" isn't what you need, but rather a really, really limited vector package that only does exactly what you need and no more.

In fact, you probably want a version with an 8-bit "length" value -- or even a 7-bit length with some other relevant flag stored in the final top bit, for space optimization reasons.

Source: I was the lead developer on an original Game Boy game, which ran on a CPU that, IIRC, was roughly an 8053 (it had instructions somewhat related to an 8080 or Z-80, lacking all the 16-bit instructions, but adding an 8-bit fast-memory operator reminiscent of the 6502). Back in those days you didn't have "packages." You had code snippets (in assembly language, of course) that you shaved bytes off of to make them fit. And then you shaved more bytes off.

[edit: clarity, and note that it's the Game Boy I was talking about; originally said Game Boy Advance, which was an ARM CPU. Also did a game on that device, but it was the Game Boy I meant to describe.]

Which is why a good language should, in addition to fixed-width types, have a size type, which is guaranteed to be the same size as a pointer; but it shouldn't have some random other types with vaguely defined widths and use cases. For example, Rust has u8, u16, etc., and usize, but no "short", "int", or "long"; Go has uint8, uint16, etc., and uintptr, except it also has uint, which is arguably a bad idea. C now has size_t, but none of its traditional unspecified-width types satisfied that criterion, since long is too short to fit a pointer on platforms such as Windows 64 (and long long's too long on 32-bit platforms, but that came in the same standard revision as size_t).
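A minimal illustration, assuming a typical flat-memory platform (the C standard does not strictly guarantee that size_t matches the pointer width, which is part of the point being made):

```c
#include <stddef.h>
#include <stdint.h>

/* size_t spans the whole address space, so it bounds any container:
   the most doubles a single allocation could conceivably hold. */
size_t vector_capacity_max(void)
{
    return SIZE_MAX / sizeof(double);
}
```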

You can always be explicit:

    #if UINTPTR_MAX == 0xFFFF
        typedef uint16_t vector_size_t;
    #elif UINTPTR_MAX == 0xFFFFFFFFFFFFFFFF
        typedef uint64_t vector_size_t;
    #else
        #error "Unsupported platform."
    #endif
That's better than your code which inserts 100,000 things into a vector mysteriously breaking on certain platforms (on which you may or may not have tested your code).

When people bring up arguments for parameterizing numeric bit width I kind of shrug now. You don't get debugged code for free when you change the declaration. You have to walk through all your arithmetic regardless - any sufficiently large program will eventually depend on something like overflow behavior or the existence of an extreme range of values. In all likelihood, you're safer off copy-pasting and editing when you switch up primitives like that.
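To make that concrete, widening a declared type is not a behavior-preserving edit when the code depends on wraparound:

```c
#include <stdint.h>

/* Same expression, different meaning: the 16-bit version wraps
   modulo 2^16, the 32-bit version modulo 2^32. */
uint16_t sum16(uint16_t a, uint16_t b) { return a + b; }
uint32_t sum32(uint32_t a, uint32_t b) { return a + b; }
```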

How is this any different from Rust (aside from the [i|u]size type)?

You mean that int32 is some sort of premature specialisation? But if the machine you run that program on has 15-bit ints, won't you have to rewrite part of the logic to deal with the new limit?

Java (the most popular programming language in the world by many metrics, if you didn't know) works exactly like this.

I understand that C and Java have (sometimes) different use cases, still I feel reasonably confident that 32-bit arithmetic is going to be efficient enough on most platforms of interest (some embedded platforms might be a problem).

It's true that something like the C99 int_fast16_t (etc.) types might be useful, though.
