C – Preliminary C2x Charter (open-std.org)
94 points by EddieRingle on March 17, 2016 | 171 comments



This is cool!

I'm a pretty hardcore C programmer (been using it as my absolute main and favorite language for over 20 years), yet I find it hard to come up with some wishes for what I'd like to see.

Not sure how to interpret that, perhaps it's just indicative of a certain ... conservatism common in old people? :) Scary thought!

To be brutally concrete, I sometimes wish I could write something like this:

    do {
      const bool flag = do_something();
      ...
      ...
    } while(!flag);
That'd be very useful, but of course in current standards it's not possible since 'flag' isn't valid outside the scope it's declared in. So now we have to pre-declare 'flag' before the loop, breaking const-cleanliness too of course. This is obviously not a show-stopper, it's a minute thing. But it would be so nice! :)

Some support for automatic type inference would be nice, since most right-hand-side expressions have a type anyway it would be cool to be able to write

    auto x = 3;
and have the type of 'x' be 'int' since that is the type of the literal '3'. Of course re-using "auto" like this would break their guiding principle, but it would look great (and, I think, somewhat mimic what C++ has done to that keyword).

Does anyone have some better ideas?


> Does anyone have some better ideas?

gcc's C extensions have some good ideas [1], particularly:

1. typeof(e) operator: great for use in macros

2. labels as values: used in just about every language interpreter (see the sketch after this list)

3. statement expressions: great for use in macros

4. fixed point types: who wants to hack their own fixed precision arithmetic?

5. vector extensions

6. arithmetic functions with overflow checking
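For item 2, a minimal sketch of what labels-as-values ("computed goto") dispatch looks like with the GCC extension; the opcodes and handlers here are made up purely for illustration:

    #include <stdio.h>

    static int run(const unsigned char *code) {
        /* GCC extension: && takes the address of a label, and goto * jumps to it. */
        static void *dispatch[] = { &&op_inc, &&op_double, &&op_halt };
        int acc = 0;

        goto *dispatch[*code++];       /* jump straight to the first opcode's handler */
    op_inc:
        acc += 1;
        goto *dispatch[*code++];
    op_double:
        acc *= 2;
        goto *dispatch[*code++];
    op_halt:
        return acc;
    }

    int main(void) {
        const unsigned char prog[] = { 0, 0, 1, 2 };   /* inc, inc, double, halt */
        printf("%d\n", run(prog));                     /* prints 4 */
        return 0;
    }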

I think all of the above would make good additions to the C standard.

[1] https://gcc.gnu.org/onlinedocs/gcc/C-Extensions.html


Well said.

I'd like to see gcc's "..." operator for range-based switch cases, and for ranges in designated initializers, become standardized.
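For reference, a small sketch of both GNU forms (the names are just for illustration):

    int classify(int c)
    {
        switch (c) {
        case '0' ... '9':              /* GNU case range */
            return 1;
        case 'a' ... 'z':
        case 'A' ... 'Z':
            return 2;
        default:
            return 0;
        }
    }

    /* The same range form in a designated initializer (also a GNU extension): */
    static const signed char sign_of[10] = { [0 ... 4] = -1, [5 ... 9] = +1 };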


TR 18037¹ defines fixed point arithmetic, but it does not seem well supported.

¹ http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1169.pdf


As far as I can tell, a programmer can't specify the number of fractional bits. The type "unsigned short _Fract", for example, has a number of fraction bits determined by the implementation within limits imposed by the standard. There's no way to ask for a type with, for example, 12 integral bits and 4 fractional bits.

Do hardware constraints make, for example, an 8.8 bit format easier to handle than a 12.4 format?

Compare Ada's fixed-point types, which let you specify the range and delta:

    type Fixed is delta 0.0625 range 0.0 .. 4095.9375;


> Do hardware constraints make, for example, an 8.8 bit format easier to handle than a 12.4 format?

Sometimes they do. For example, the ARMv7 DSP extensions support signed Q0.31 and signed Q0.15 arithmetic much faster than any other fixed-point format.


I'm not liking the fixed-point types.

I'd rather see the ability to place an attribute on a floating-point variable that would inform the compiler that fixed-point might be suitable. You'd specify the requirements that need to be met, and the compiler would choose the fastest format for the current architecture. That format could be fixed-point, decimal float, IEEE float, or something weirder.

For example, your requirements could include: minimum magnitude, maximum magnitude, minimum value, maximum value, maximum additive error, maximum multiplicative error, signed zero needed or not, NaN signed/unsupported/unsigned, infinity distinct from NaN and/or the largest number, etc.

You say what you need for correctness, and the compiler makes it happen.


Fixed-point and floating-point types have very different behavior. I can't think of a case where I'd want to let the compiler decide which one to use.


No, normally what people desire of a fixed-point representation is a subset of a floating-point representation. Most fixed-point stuff is prototyped as floating-point, then painfully converted into a mess of nasty macros. Ideally there would be assertions to protect against failure due to the limited functionality of the fixed-point type, but probably you don't bother.

It's not maintainable.

We should be able to take that prototype, mark it up with hints for the compiler, and then build with fixed-point enabled.


As long as we keep __builtin_apply out of the standard, GCC is loaded with good ideas.


Yeah, after reading that I was really wondering about the use case for those extensions at the time they were added. I honestly can't think of any now.


In these cases, I usually use while(true) with break, like

  while (true) {
    const bool flag = do_something();
    if (flag) break;
  }


I'm curious if your version is exactly equivalent to the OP's, or is there something not covered here?


It does what the parent wanted. Just not as concise. do-while isn't all that common anyway.


My would-like-to-have in C is some equivalent of D's "Scope Guard" statement (https://dlang.org/spec/statement.html#ScopeGuardStatement) -- this will make error handling so much cleaner.


It seems the consensus is that new features like "auto" are not wanted:

Principle 13: Unlike for C99, the consensus at the London meeting was that there should be no invention, without exception. Only those features that have a history and are in common use by a commercial implementation should be considered


Do you really consider it an invention? I think they mean inventions done by the committee (like <locale.h>). ISO C++ has this feature. And so does GNU C via the __auto_type keyword (GCC 4.9 and clang 3.8). History is there but I doubt it's commonly used (in C yet).


Well, does implementation in C++ count? The wording is ambiguous.


This means that you should convince a compiler maker to add your innovation as a C extension. When it has been tested that way for a while, it may be added to the standard.


Bummer. Oh well, at least typeof and compound expressions can be blessed?


If you're using GNU C:

    #define AUTO(var, def) __typeof__(def) var = (def)

    AUTO(x, 3);


Or, as of GCC 4.9 and Clang 3.8, just use __auto_type. It avoids duplicating the source expression, which duplicates compiler errors if the expression is erroneous and can actually repeat side effects in the rare cases involving variable-length arrays.
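For comparison, a sketch of the same macro written with __auto_type (GNU C only):

    #define AUTO(var, def) __auto_type var = (def)

    AUTO(x, 3);   /* x is int; (def) appears, and is evaluated, exactly once */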


Things I'd like to see

static lifetime compound literals. I don't see any reason not to be able to get the same effect as string literals for other types like:

    return (static int const []){1, 2, 3};

Disallowing branch-predicted values for relaxed atomic operations. I think C++ did this and it seems the sane way to go.

Binary literals and binary printf format specifier. It would just occasionally be nice compared to hex.

countof macro for arrays. I don't even care if it checked for non-array type errors, it would just be nice for it to be standard.
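The usual hand-rolled version, for reference; note it silently does the wrong thing if you pass a pointer rather than an array:

    #define countof(a) (sizeof(a) / sizeof((a)[0]))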

Statement expressions and function literals would be nice, but I don't feel strongly about it.


Auto's a great thought. It would really kill a class of overflow bugs that come from mixing numeric types. On the other hand, I once had to come to terms with a code base that used auto most generously. The class hierarchy was maybe 20 levels deep with a fair amount of branching. At any point I had no idea what particular object I was dealing with, and always had to follow a labyrinthine chain of predecessor function calls to determine the type actually assigned to an object.


On auto. Couldn't they just define auto as

  _Auto
and then add a header like <stdauto.h> which does

  #define auto _Auto
like they did with _Bool in C99?


auto is already a keyword in C11, a type of storage specifier (like static or register).


From <http://www.seebs.net/faqs/c-iaq.html#section-1>:

1.8: What's the auto keyword good for?

Declaring vehicles.


This is inherited from early Lisp implementations, where it was common to define a

  struct car { ... };
which would get cumbersome to type.


Yes, that's exactly why they'd use _Auto and a header to redefine the keyword. Existing code that does auto x = 3; (implicit int type) wouldn't break, but #include <stdauto.h> auto x = 4.2f; would be the new meaning.


Existing code that does "auto x = 3;" is already broken; the "implicit int" rule was removed in C99.

But "auto int x = 3;" is valid (and the "auto" keyword is useless in that context).


Super against auto. The type is implicit documentation, and auto hides that. It probably also increases compile times.


It doesn't increase compile times. The type information is already there for type checking.


For your first case, why not:

  do {
  } while(!do_something());
Not trying to be snarky, I'd just like to understand :)


Because it's often either like this:

    do {
      const bool flag = do_something();
      do_something_else();
    } while(!flag);
Or like this:

    do {
      int whatever = get_some_data();
      const bool flag = do_something(whatever);
    } while(!flag);


Someone mentioned this instead:

  while (true) {
    const bool flag = do_something();
    if (flag) break;
  }
Is it not what grandparent wanted?


Yes, that would do it. It would just be nice to have it be better supported in the language rather than working around it like this. Any time you introduce break or continue into a loop you're making it more difficult to follow what's going on, since suddenly the exit condition can be anywhere. Not a big deal, especially if there's only one and it's right at the end, but mildly annoying that it can't be expressed more naturally.


I've only somewhat recently become a hardcore C programmer, but I actually quite like it. Wrote my own incremental generational garbage collector recently—the things you can do in C are fun. Here are some things I'd really like:

Not language, rather standard library, but I would like to see a new allocation API with explicit control over alignment, separation of reserving and committing addresses, and generally updated for the fact that memory systems are different now than they were in 1979.

aligned_alloc is alright, except there's no aligned_realloc. Linux always reserves addresses and doesn't commit until it needs to, but that can't be relied on for portable code.

In short: we're in a 64-bit, virtual memory world. I don't want an allocator design from 1979.

It's about time glibc's custom streams[1] get standardized. Real talk, it's 2016 and we can't define custom FILE * types. This doesn't even need to expose the libc's FILE implementation; it can remain opaque just fine.
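For anyone unfamiliar with the glibc API in question, a minimal sketch using fopencookie (the buffer type and sizes here are made up for illustration):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>

    struct membuf { char data[1024]; size_t len; };

    static ssize_t membuf_write(void *cookie, const char *buf, size_t n)
    {
        struct membuf *m = cookie;
        if (n > sizeof m->data - m->len)
            n = sizeof m->data - m->len;
        memcpy(m->data + m->len, buf, n);
        m->len += n;
        return (ssize_t)n;
    }

    int main(void)
    {
        struct membuf m = { .len = 0 };
        cookie_io_functions_t io = { .write = membuf_write };
        FILE *f = fopencookie(&m, "w", io);

        fprintf(f, "hello %d\n", 42);   /* stdio writes end up in m.data ...   */
        fclose(f);                      /* ... once the stream is flushed/closed */

        fwrite(m.data, 1, m.len, stdout);
        return 0;
    }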

Language-wise, I'd like to see a pragma to covert an array of structs to a struct of arrays; i.e. declare

  #pragma struct SOA
  struct entity {
    uint32_t id;
    ...
  };

  struct entity all_entities[1000];
and have the compiler rewrite it to

  struct {
    uint32_t id[1000];
    ...[1000];
  } all_entities;  // where typeof(all_entities[N]) == typeof(struct entity)
à la Jai[2]. This isn't a particularly hard transformation to do by hand, but it's kind of tedious.

Standardizing M4 as a preprocessor would be a nice-to-have; i.e. you should be able to compile a .h.m4 or .c.m4 file directly. I don't know why GCC can't do this yet. It's a bit of a niche thing, but some of us are M4 addicts.

C is in a pretty good place otherwise. I like to think there are 3 known local optima in terms of programming languages: C, Lisp, and Forth. All of them are a little hard at first, but quickly become very easy to think in.

We'll probably never get any of this, which is a little depressing. I get the feeling the C committee doesn't really care that much about evolving the language any more. Modern, fast, safe(-ish), readable code is a GCC extension.

1. http://www.gnu.org/software/libc/manual/html_node/Custom-Str... 2. https://github.com/BSVino/JaiPrimer/blob/master/JaiPrimer.md...


For aligned_realloc, you may want to look at using jemalloc for your code, instead of the vendor malloc. It's got an "experimental" function rallocm that does allow you to choose the alignment. Regarding m4 as a preprocessor, I'd rather wishlist a proper macro system for Cx+n that could do the job of the preprocessor and any other uses of precompiling.

The big problem with extending the C language is that C's in a ton of places. Some of the things you describe would mean increasing the size of the C runtime, which would be unacceptable in embedded programming and the like. Perhaps we need a middle ground language here -- more daring than C, more conservative than C++ -- where ideas can be tuned better and/or be specified for system types. C's universal assembly role really is a double edged sword as far as getting new stuff in the language.


I implemented my own, but I'll definitely check out jemalloc. I happen to have a use case for allocating large chunks aligned on large boundaries (high-performance garbage collector). jemalloc is common enough that I suppose it's not a particularly bad dependency.

Adding Lisp-style AST macros to C is never gonna happen[1], though it would be cool. I'm of the opinion that text macros and ‘real’ macros complement each other anyway.

One of these days I'll get around to implementing a less-portable-more-cool-C. There's Jai[2], but I don't really agree with all of its design (understandable, since I'm not a game developer).

--

1. Just kidding, people have done it already; e.g. https://github.com/eudoxia0/cmacro but it'll probably never be part of the standard.

2. https://github.com/BSVino/JaiPrimer/blob/master/JaiPrimer.md


I think your do-while suggestion is kind of cool, but I think that this program may do different things under a current C compiler and one which includes the feature you suggested.

    int x = something();
    do {
        const int x = somethingElse();
        ...
    } while (x != 42);


While some might claim this goes against the spirit of C, I think minor support for generics (enough to implement a type-safe dynamic array, e.g. a vector) would be really useful. And if this did monomorphisation, then performance would still be great.
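For context, a sketch of the macro-based workaround in use today; each expansion effectively monomorphises a vector for one element type (names are illustrative, error handling omitted):

    #include <stdlib.h>

    #define DEFINE_VEC(T)                                                  \
        typedef struct { T *data; size_t len, cap; } vec_##T;              \
        static void vec_##T##_push(vec_##T *v, T x)                        \
        {                                                                   \
            if (v->len == v->cap) {                                         \
                v->cap = v->cap ? v->cap * 2 : 8;                           \
                v->data = realloc(v->data, v->cap * sizeof *v->data);       \
            }                                                               \
            v->data[v->len++] = x;                                          \
        }

    DEFINE_VEC(int)   /* defines vec_int and vec_int_push */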


We already have C++, so is this required? If you want C with a dynamic vector, you could write C in C++ using std::vector.


While it's important to not overload C with features, not adding features to it because C++ already has it isn't very practical.


How about just type parameters that let you implicitly get sizeof parameters and pass function pointers without having to unportably cast them.


All I want is Friendly C. Eliminate unnecessary sources of undefined behavior, stop exploiting undefined behavior in ways that breaks sensible code, and give me sane semantics like two's complement signed integers by default.

http://blog.regehr.org/archives/1180


C needs to cut down on the instances of undefined behavior. Mostly these should become implementation-defined with some restrictions that the implementation should do one of the expected things.

For instance, if you overflow an integer, the compiler should not be allowed to assume that code branch is unreachable, or to conjure nasal demons. Instead the result should be an implementation-defined integer or a jump to an implementation-defined exception handling routine.

C should be predictable!


> C needs to cut down on the instances of undefined behavior

As someone who's worked on compilers a good bit, I disagree (and this is my favorite meme to argue against lately). You need that undefined behavior for performance.

> For instance, if you overflow an integer, the compiler should not be allowed to assume that code branch is unreachable, or to conjure nasal demons.

You need to be careful to not destroy loop trip count detection.

> Instead the result should be an implementation-defined integer

Not good enough. Loop trip count detection destroyed.

> jump to an implementation-defined exception handling routine.

Unacceptable for performance. Probably a 3x loss.


I think that the reason people want this stuff is that they don't understand how much effort it takes to prove that underlying UB is not triggered in a branch and optimize out checks. Code size would explode, too, decimating the usefulness of embedded platforms and eviscerating instruction caches. In that way, it would also be an ecological disaster.


I'm curious to learn more about your position -- have you written it down in more detail elsewhere? Rust doesn't have this kind of undefined behavior AFAIK? Are you arguing that C and C++ should have it, but Rust doesn't need to even though it's competing perf-wise with C and C++ in this space?


> Are you arguing that C and C++ should have it, but Rust doesn't need to even though it's competing perf-wise with C and C++ in this space?

Yes. Other languages like Swift generally don't need it either.

The big wins from exploiting undefined behavior in C come from mistakes that C's designers made in 1978: signed int being the easiest thing to use for loops over array indices, the any-pointer-can-alias-anything-else style, the uselessness of const for optimization, and null pointers. Rust (and e.g. Swift, for the most part) made none of these choices, and so they can reap the benefits of aggressive compiler optimizations without the undefined behavior.

(Digression: I can't blame C's designers for these mistakes, of course: it was 1978! But I think that in 2016 we should just accept that there were things we didn't do right in 1978, because we didn't know as much then as we do now. In most other engineering industries the idea that we made mistakes 35 years ago and that modern designs are better is uncontroversial, but for some reason in computing we have rose-colored glasses for the early days of Unix. Everyone wants to blame the compiler authors because they don't want to admit C has flaws!)

These issues don't necessarily apply to other languages. I believe that if you design a language optimally you don't need to fall back on undefined behavior to get good performance. But I don't believe C is that language, and I think attempts to dial back undefined behavior without changing C are missing the point.


Wow, if what you say is true, that's even more of a benefit for a language like Rust. Memory safety is awesome, but keeping the same performance while removing undefined behavior is huge.

Another question: how do "unsafe Rust" (ie. Rust inside unsafe blocks) and C compare in this sense? If satisfying the borrow checker for some bit of code is too hard, is writing in unsafe Rust better/safer than C?


Currently, the aliasing rules for unsafe code are "any pointers can alias"—i.e. no type based alias analysis (TBAA), a.k.a. strict-aliasing. (This is because LLVM has no concept of TBAA at the IR level; it's something the frontend supplies explicitly.) There is an open question as to how much this will affect our ability to perform alias-based optimizations, given that a lot of our primitives are implemented with unsafe code under the hood. I'm optimistic that it can be solved (although some others on the team, like Niko, are less optimistic).


https://doc.rust-lang.org/reference.html#behavior-not-consid...

>• Integer overflow

> ◦ Overflow is considered "unexpected" behavior and is always user-error, unless the wrapping primitives are used. In non-optimized builds, the compiler will insert debug checks that panic on overflow, but in optimized builds overflow instead results in wrapped values. See RFC 560 for the rationale and more details.

(also important to note that when they say "RFC 560", they are referring to Rust RFCs and not IETF RFCs)


Meanwhile, Java runs pretty fast.


Java doesn't suffer from a bunch of these limitations, though, especially around aliasing. That makes a big difference. (Though signed indexing is still an issue in Java, unfortunately…)


I think this is the right spirit.

C makes a lot of things undefined that could be implementation defined even on weird hardware. That would considerably reduce the gap between customary C and standard C for "normal" hardware while letting the oddball machines do what they need to do and document it.

I don't know if the compiler implementors would stand for it, though.


> I don't know if the compiler implementors would stand for it, though.

I wouldn't without solid proof that performance won't regress.


How much? On which applications?


The applications that the compilers' customers are using.

The thing that's often forgotten in these discussions is that compiler authors listen to their customers, and what they do reflects what their customers want. Compiler developers don't implement optimizations because they're scheming language lawyers looking for ways to break programs. They implement optimizations because their customers file bugs saying that a competing compiler optimized their program a certain way and wondering why all compilers don't do the same.

It's instructive to go back and look at mailing list and message board discussions at the height of the GCC/LLVM/MSVC/ICC competition. The performance wars were intense.


I actually prefer undefined behavior to defined but not necessarily desirable behavior, such as values defaulting to zero/NULL or integer rollover. That's IMHO only hiding the symptoms (e.g. crashes) of the problems (e.g. logic bugs).

Undefined behavior allows the compiler to handle those cases as errors (e.g. print stack trace and crash). Or to perform static analysis. Or, to do something predictable.


> Undefined behavior allows the compiler to handle those cases as errors (e.g. print stack trace and crash). Or to perform static analysis. Or, to do something predictable.

In practice, no existing compiler that I'm aware of issues a stack trace on undefined behavior at runtime.

Instead, the compiler reasons from the perspective that if a section of code would have undefined behaviour under some set of circumstances, it will assume that those circumstances will never occur, and optimize accordingly (with sometimes surprising results).


gcc/clang with -fsanitize=undefined


Interesting, thanks!


> Undefined behavior allows the compiler to handle those cases as errors (e.g. print stack trace and crash). Or to perform static analysis. Or, to do something predictable.

The term you're looking for here is "implementation defined".


Not really. "Undefined behavior" is behavior that is not defined by the language standard. A given implementation is free to define the behavior as it likes.


C does have an all-behavior-defined mode.

It's called valgrind.

Try running a serious production program in it and then tell us about how trivial those UB-based optimizations are.


Valgrind is great. But its slowness does not arise from disabling UB optimizations.


This too...


Agreed. Integer overflow, in particular, should be implementation-defined, and implementations should have a means to indicate whether they wrap around in two's- or ones'-complement, clamp, signal, or do something else.


It's too much furniture to do that properly. You may want to overflow an integer. Granted, rings of cardinality 2^64 are unusual at best but I've used 8 & 16 bit size rings forever.

UB is just the nature of the beast.


He was talking about signed integer overflow. Unsigned integer overflow is useful and defined.
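The distinction being drawn, in two lines:

    #include <limits.h>

    unsigned u = UINT_MAX;   /* u + 1u is well-defined: it wraps to 0       */
    int      i = INT_MAX;    /* i + 1 is undefined behavior in standard C   */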


He sort of didn't say, so you're probably right. I know that's a PITA. Good thing there are 64 and 128 bit ints.


Pointer arithmetic overflow frightens and confuses me. Does anyone know why it's treated differently than unsigned arithmetic?


I assume it's because the standard doesn't guarantee a linear address space.


The standard doesn't even guarantee that pointer arithmetic works at all, outside the bounds of an object (or at least did in C89, which is the only standard I know well).

If you have an object which is ten bytes long, like:

    uint8_t o[10];
...then it's legal to construct pointers to o[0] through o[10] --- not just o[9]; you're allowed a pointer to the byte immediately after the object --- and nowhere else. It's not even legal to calculate a pointer outside that range, let alone dereference it.

I used this to make a prototype compiler from C to Javascript/Perl/Lua, where each C pointer was represented as a tuple of (array, offset). Pointer arithmetic worked inside objects; pointer arithmetic between objects wasn't supported. Worked nicely.


Accessing beyond bounds is undefined, but leaving pointer arithmetic outside of objects undefined would preclude a lot. For example, building an OS page table or DMA, or RDMA, or MMIO, etc...


No, even out-of-bounds arithmetic is undefined. 6.5.6 Additive operators paragraph 8 (cribbed from Stack Overflow [1]):

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. [...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

[1]: http://stackoverflow.com/questions/18186987/decrementing-a-p...


This doesn't say anything about pointer arithmetic on pointers to raw memory though. For example, using an mmap file, there isn't any object, and there aren't any bounds.


That's correct --- the standard doesn't say anything about raw memory. Only pointers to objects defined by C are defined to work; everything else is an implementation-specific extension (and so, covered under 'undefined behaviour').

I believe that C99 added the ability to losslessly cast from a pointer to a uintptr_t and back again, but, IIRC, the compiler didn't have to support this in C89.


These cases do work if you are operating on (uintptr_t)&object instead. The C machine model is more restrictive than the von Neumann machine model supported by the likes of x86 and ARM.


Non-linear address spaces shouldn't affect pointer arithmetic though, unless I'm misunderstanding something. Otherwise, I guess the implication is that there are systems or implementations that rely on negative pointers; in which case I would think it should be up to the implementation.


Consider a system where an address consists of a segment identifier combined with a byte offset within the segment. The relationship between different segments is unspecified. Pointer equality has to consider both parts of the address, but "<", "<=", ">", and ">=" can ignore the segment identifier and compare only the offsets.

Given two distinct objects, x and y, (&x == &y) is meaningful, but (&x < &y) isn't particularly. (Except that sometimes it would be convenient to have a consistent total ordering on addresses, something that C doesn't define.)


I can see that a non-linear address space doesn't imply a strict weak ordering, but this still seems to be an implementation defined detail. It doesn't imply anything about pointer arithmetic overflow.

For example, consider x86 segments. Is there a reason why you would use negative offsets? Given a segment address, the number of representable values is identical whether the offset is strictly positive, or negative with a shifted segment address (assuming two's complement).


My #1 desired feature for a future C standard is statement expressions, like the GNU extension. They make it much easier to avoid issues like multiple evaluation when writing macros. Typeof would be nice too - combined with statement expressions, it adds some capability for type generic macros without having to explicitly list all the types ala _Generic (which was standardized in C11 but which nobody uses).
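The classic motivating example, assuming GNU C: with a statement expression plus typeof, a macro can evaluate its arguments exactly once.

    #define MAX(a, b)               \
        ({ __typeof__(a) _a = (a);  \
           __typeof__(b) _b = (b);  \
           _a > _b ? _a : _b; })

    int main(void)
    {
        int x = 1, y = 7;
        int m = MAX(x++, y);   /* x is incremented exactly once; m == 7 */
        return m;
    }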

More broadly, while I'm not sure if it really fits in C, it would be cool to see some kind of lambda function support, along the lines of Apple's "blocks" extension.


Agreed on both statement expressions and typeof; those seem like they could and should be standardized exactly as currently implemented.

I'd like to see blocks/lambdas standardized, but I'd want to see a second implementation first. I also hope for better properties than those of the Apple blocks extension; I realize C does manual resource management, but the lifetime issues in blocks seem particularly worse even by C standards.


How about continuations? Or is that too far...


__attribute__((cleanup)) please. It's used by systemd now and is supported by gcc and clang. It makes a real difference to code, avoiding masses of cleanup along error paths.


I mentioned it elsewhere, but some equivalent D's Scope Guard statement would be even better: https://dlang.org/spec/statement.html#ScopeGuardStatement


When you start needing/wanting stuff like that, wouldn't it make sense to just use a C++ compiler and a simple RAII type or two?


Really it doesn't. If you add C++ to the mix, you lose the clear relationship with code as written to what runs (is this operator overloaded?), and you have an open invitation for people to attempt to use more and more C++ features, many of which are not well thought through.

Instead of moving to C++, I'd rather allow mixed objects in a more sensible language (eg. linking with objects written in OCaml).


I'm not convinced by this argument. It's perfectly possible to stick to a well defined subset of C++ if you're scared by certain features. Just write a coding standards document and enforce it.

Operator overloading is very important for generating the fastest code possible, too. It would be hard to get the benefits of a library like Eigen which can automatically use SSE to optimize maths expressions in C.


No. You don't want the vast majority of C++ semantics; especially classes, templates, and different casting rules. Additionally C++ compilers are just slower than C compilers.


Agreed. __attribute__((cleanup)) provides a way to do block-scoped cleanup of each variable and eliminate the "tear down in reverse order in each error path" or "tear down in reverse order with goto labels" pattern.


I have never seen this before but was just reading about it. It appears to run unconditionally when the scope is exited though: how do you prevent it from tearing down even in the success case?


Tearing down in the success case is a feature, not a bug! If you don't want cleanup along the success path (eg if you're returning an allocated string), don't use cleanup on the returned variable.

However that does point to the one problem with attribute((cleanup)). If you consider code like this (using the systemd macros):

    int f (void)
    {
      _cleanup_free char *s1;
      _cleanup_free char *s2;
      s1 = strdup ("foo");
      if (!s1) return -1; /* crash */
      s2 = strdup ("bar");
      if (!s2) return -1;
      // ...
      return 0;
    }
If strdup fails, it'll crash at the place I marked because it will try to free the uninitialized pointer s2.

To get around that you have to initialize everything to NULL (since free(NULL) is legal).

The problem then becomes that you end up "over-initializing" and "over-freeing". It would be nice to have a version of attribute((cleanup)) that would eliminate the call to the cleanup function if the pointer is NULL (or uninitialized?). But then you're relying on the compiler to do some kind of dataflow analysis, which is difficult in a standard (excludes simple compiler implementations), and may not even be possible because of the halting problem.

This is basically the reason why you wouldn't want to use cleanups in kernel code, because performance is paramount there and unnecessary cleanup calls wouldn't be welcome.


You could potentially write the "free" function like this:

    static inline void free(void *ptr)
    {
        if (!ptr)
            return;
        _free(ptr);
    }
Then the compiler can easily inline this, omit the NULL check if it knows the pointer can't be NULL, or omit the whole thing if it knows the pointer is NULL.


GCC 5 already does that for you (and more), since malloc/free are C standard library functions:

  #include <stdlib.h>

  int main()
  {
      free(NULL);
      char *foo = malloc(42);
      free(foo);
  }
gcc -O1 compiles it to:

  00000000004004f6 <main>:
    4004f6:	b8 00 00 00 00       	mov    $0x0,%eax
    4004fb:	c3                   	retq   
    4004fc:	0f 1f 40 00          	nopl   0x0(%rax)


free(NULL) is already perfectly legal (specced to be a no-op) as noted by the parent. The problem is that `char * s;` is probably _not_ NULL, so you'd try to free some random address on the stack. Of course it is easily fixed with `_cleanup_free char * s = NULL;`, but it's a somewhat difficult error to spot sometimes.

EDIT: I super misread that, my bad.


The point is that this informs the compiler of that behavior of free so that it can optimize it away. Basically it gives you some of the advantages you'd get if free were a compiler intrinsic (which it might be, idk).


I'm not familiar with using attributes for cleanup, but isn't the problem caused by the predeclaration of the variables? That is, wouldn't this work (and be shorter and simpler)?

    int f (void) {
      _cleanup_free char *s1 = strdup ("foo");
      if (!s1) return -1;
      _cleanup_free char *s2 = strdup ("bar");
      if (!s2) return -1;
      // ...
      return 0;
    }


OK, I tested it. Yes, this approach worked fine for me in GCC, Clang, and ICC.

  #include <stdlib.h>
  #include <stdio.h>
  #include <string.h>

  #define autofree __attribute((cleanup(autofree_func)))
  #define PASS 0
  #define FAIL 1

  void autofree_func(void *ptr_ptr) {
    void *ptr = * (void **) ptr_ptr;
    printf("%s(%p)\n", __func__, ptr);
    free(ptr);
  }

  int test(int fail1, int fail2) {
    printf("\n%s(%s, %s):\n", __func__,
           fail1 ? "FAIL" : "PASS",
           fail2 ? "FAIL" : "PASS");
    autofree char *s1 = strdup("foo");
    if (fail1) return -1;
    printf("s1: '%s' (%p)\n", s1, s1);
    autofree char *s2 = strdup("bar");
    if (fail2) return -1;
    printf("s2: '%s' (%p)\n", s2, s2);
    return 0;
  }

  int main(/* int argc, char **argv */) {
    test(PASS, PASS);
    test(PASS, FAIL);
    test(FAIL, FAIL);
    return 0;
  }
nate@skylake$ cc -Wall -Wextra -Wconversion -O3 scoping.c -o scoping

nate@skylake$ ./scoping

  test(PASS, PASS):
  s1: 'foo' (0x2056010)
  s2: 'bar' (0x2056030)
  autofree_func(0x2056030)
  autofree_func(0x2056010)

  test(PASS, FAIL):
  s1: 'foo' (0x2056010)
  autofree_func(0x2056030)
  autofree_func(0x2056010)

  test(FAIL, FAIL):
  autofree_func(0x2056010)


> Tearing down in the success case is a feature, not a bug!

I guess it is if you are thinking of this as C++ RAII destructors, which is indeed one very useful case.

It's not useful in some other cases. Like imagine that you are writing the C equivalent of a constructor. If you initialize half your members and then run out of memory, you want to uninitialize only the members you managed to initialize already. attribute((cleanup)) doesn't help with this case.


    out_error:
      dtor(self);


@haberman, you don't use it for constructing an object you want to return.


RAII is the one thing I miss from C++.


It would be nice if they incorporated some suggestions from the Cerberus project to make it clearer what is defined and undefined behavior.

https://www.cl.cam.ac.uk/~pes20/cerberus/


http://www.open-std.org/jtc1/sc22/wg14/www/docs/PreLondon201...

>N2012 2016/03/10 Sewell, Clarifying the C Memory Object Model

>N2013 2016/03/10 Sewell, C Memory Object and Value Semantics: The Space of de facto and ISO Standards

>N2014 2016/03/10 Sewell, What is C in Practice? (Cerberus Survey v2): Analysis of Response

>N2015 2016/03/10 Sewell, What is C in practice? (Cerberus survey v2): Analysis of Responses - with Comments


I don't understand why these committees are so dead set on maintaining backwards compatibility with existing code when most compilers will support flags for compiling with old standards anyway. As long as you don't break binary compatibility I really don't see the issue.


Because your new code wants to use old-code libraries.


But if you maintain binary compatibility can you not just compile the old code with --std=c89 and then link the new code compiled with --std=c2x ?


Unfortunately, because of header files, no.


One minor thing that should hopefully be considered... a %b/%B conversion for printf et al.

It's useful to print binary representations of things (debugging, conversion, etc...). A number of libc implementations support it as an extension. Forcing people to re-implement it outside of printf is a bit obnoxious considering how simple it is, and that there are already hexadecimal and octal conversions.
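For illustration, the sort of helper everyone ends up writing by hand today; a %b conversion would make it unnecessary:

    #include <limits.h>
    #include <stdio.h>

    static void print_binary(unsigned v)
    {
        for (int i = (int)(sizeof v * CHAR_BIT) - 1; i >= 0; i--)
            putchar((v >> i) & 1 ? '1' : '0');
    }

    int main(void)
    {
        print_binary(0xC9u);   /* ...11001001, padded to the full width of unsigned */
        putchar('\n');
        return 0;
    }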


And "0b11001001" constants.


Old-style (non-prototype) function declarations and definitions have been officially "obsolescent" since 1989. Perhaps C2x can finally get rid of them.


This would break the mandate of allowing C90 programs to run "largely unchanged" (see text)


True, but none of the 14 listed principles is absolute. C2x compilers probably would (and could be encouraged to) accept old-style declarations and definitions in some non-conforming mode. They could even be accepted in conforming mode as long as the compiler warns about it (only a diagnostic is required).


I was curious about this statement:

> The Standard is currently written in troff, which is subject to increasing bit rot as the tools for formatting it evolve.

What exactly is bit-rotting? There are multiple maintained troff's out there, and if they are breaking backward compatibility regularly it's news to me.


Using formats that have become less popular and less well supported can be considered bit rot.

For example, having .PCX files in a project today instead of .PNG could be considered a form of bit rot, because fewer and fewer tools support .PCX.


I'd like a standardised function to clear a block of memory that is guaranteed to not be optimised out, and a constant time compare function.
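A sketch of the workarounds in common use today, neither of which the standard actually guarantees: the volatile-qualified pointer discourages the compiler from eliding the clear, and the compare accumulates all differences so its running time doesn't depend on where the buffers first differ.

    #include <stddef.h>

    static void secure_clear(void *p, size_t n)
    {
        volatile unsigned char *vp = p;
        while (n--)
            *vp++ = 0;
    }

    static int ct_equal(const void *a, const void *b, size_t n)
    {
        const unsigned char *pa = a, *pb = b;
        unsigned char diff = 0;
        for (size_t i = 0; i < n; i++)
            diff |= pa[i] ^ pb[i];
        return diff == 0;
    }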


Another minor thing I think belongs in the C standard is bswap. For such a common operation that's necessary for portable code, I personally don't understand why it hasn't been defined in either the C or C++ standard.

Currently, the only compiler I'm aware of that optimizes a bswap correctly is LLVM, which leaves a lot of code in the cold. Aside from requiring people to write their own bswap, which is fairly maligned [1] for good reason, it leaves out a reasonably important optimization or potentially requires excess maintenance (platform libs, intrinsics, etc.) for people who do need portable code (the main reason for bswap).

I think languages are starting to take the place of platform independent interfaces, and my personal opinion is that bswap is fairly low hanging fruit. Given that one of the stated goals of C is portability, I think it fits well within the mandate.
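A typical hand-rolled 32-bit swap, for reference; as noted above, LLVM recognises this shift-and-mask pattern and compiles it to a single byte-swap instruction, while other compilers may not:

    #include <stdint.h>

    static inline uint32_t bswap32(uint32_t x)
    {
        return  (x >> 24)
             | ((x >>  8) & 0x0000ff00u)
             | ((x <<  8) & 0x00ff0000u)
             |  (x << 24);
    }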

[1] http://commandcenter.blogspot.com/2012/04/byte-order-fallacy...


Remove VLAs from the language. They were a mistake.


They were made optional in C11. Compilers for DSPs or whatever where VLAs aren't practical don't have to implement them.

Just because an _optional_ feature is impractical on a few niche architectures doesn't mean it's a mistake.


> because an _optional_ feature is impractical on a few niche architectures

That isn't why it's a mistake. Arrays and the stack don't mesh well together, especially arrays that aren't bounded by anything other than the width of the integral type used for the VLA length.


Yes, but that could have been addressed by making them optional for freestanding implementations but mandatory for hosted implementations.

Are there any hosted systems (systems that can support the standard library) on which VLAs are difficult to implement?


Not as far as I'm aware. POWER, x86(_64), even Itanium all just bump a stack pointer[1]. SPARC has a funky stack, but I don't know enough about it to say how hard VLAs would be. (It's a darn cool architecture though. I wish acquiring a modern SPARC processor wasn't ridiculously expensive.)

The thing that makes VLAs hard to implement is that on some embedded and special purpose processors the stack is actually a physical, fixed-size location on the chip. I recall, but can't seem to find, some architecture that didn't use an explicit stack pointer.

1. Itanium has 2, but for purposes of VLAs it may as well have a conventional stack.


VLAs provide a very useful way to avoid making temporary allocations using malloc()/free(), allowing (generally) better speed & avoiding leaks.

Why do you believe they are a mistake?


Because malloc() can signal failure to allocate by returning a null pointer.

VLAs make it impossible to handle a failure to allocate: there is no interface to indicate what the program should do when it happens. You can only follow the declaration of a VLA with code that assumes that the allocation succeeded.


Exactly the same thing applies to fixed-size arrays. If you define a local array of N elements, there's no defined way to signal an allocation failure whether N is a compile-time constant or not.


Expanding on the above, if you create a VLA with an unchecked size derived from user input, you've got problems. If the user provides a size bigger than you can allocate, your program could crash if you're lucky. (If you're not lucky, it might appear to work but behave unpredictably.)

On the other hand, if you're currently allocating a fixed-size array that's big enough to hold, say, 1024 elements (of which you're only going to use some initial subset), then replacing it with a VLA that's the exact size you need (<= 1024) will be an improvement.

malloc() is supposed to safely tell you whether an allocation succeeded or failed, but in practice it doesn't do so reliably. On Linux, by default, it can allocate a huge chunk of address space, and then fail (with no way to handle the failure) when you try to access it. When that happens, it can invoke the "OOM killer", which can kill other processes.


But you can say the same about function calls: it may trigger stack overflow, but there's no interface to tell what to do in that case.


This is why I avoid allocating large VLAs, and using them in deep call stacks in general. There's a diminishing return for allocating on the stack beyond a certain point anyway (roughly around a page).

Having a mechanism to abstract this might be nice, but these are things people should hopefully be aware of before using VLAs. std::dynarray has been a bit polarizing though.


But malloc() doesn't always signal failure with a null on many modern systems. It will happily return you a region of unpopulated virtual address space and then SIGSEGV you when you try to use it (with overcommit).

I think we can leave VLAs in the "trust the programmer" category. Very handy with bounded sizes; it keeps you from wasting all those warm cache lines.


The OS is working around sloppy programs by allowing overcommit, which is in itself sloppy (IMO). If you can't trust the kernel itself, the language can't really save you.


It turns out that ARMCC generates a call to malloc when you use a VLA. That was a bit of a surprise when I used one in an ISR explicitly to AVOID malloc.

Apparently the lack of a frame pointer on ARM means that tracking dynamically sized stack frames is a problem.


All I really want is C89 with stdint.h and safe(r) string handling routines (strncpy, snprintf, etc.). Hold the VLAs and the stdbool.h and the what-have-you.


(a) https://randomascii.wordpress.com/2013/04/03/stop-using-strn... (b) Can't you just use the parts of C99 that you want?


(a) I'm aware, but thanks.

(b) Use? yes, but only when I'm not using a library that relies on the parts I don't want. Implement? not really, since users expect their compilers to conform to the standards. Rely on for portable code? no, since not all compilers support C99 and many of those that do actually only support some subset.

For now, I'm using C89 with my own safe string routines and a configure script to generate the bits of stdint.h that I need.


Which compilers aren't supporting C99 now? I keep hearing this argument, but I haven't had that issue at all in recent years.


MSVC and a lot of embedded compilers. Not to mention, I still have to support systems that are using rather old compilers.


I've got 25+ years of experience in realtime/embedded work and I have to disagree with you there. All compilers I can think of have been C99 compliant for a number of years now. MSVC is compliant now also.


Hmm, that's all news to me. When did MSVC add full C99 support? I totally missed that. Also, which compilers are you using for your embedded work?

Anyway, like I said, I'm often stuck supporting older compilers, too. So it might still be some time before I can "upgrade" to C99.


What about templates for C? Has this been proposed before? I'd love to write function templates, and templated structs would make implementing generic code so much easier and cleaner.


I'd just like something better than _Generic and (void *) as our only ways to do generics. That's my one feature. Too bad that's not likely to happen, ever.


The part where I see that after 1999 events have finally unfolded such that internationalization is only NOW considered important is... disappointing.


Internationalization was considered important for C89… which yielded regrettable inventions like trigraphs and wchar_t.


Umm... wchar_t was not part of C89. It was added later. I think it was a supplement rolled out half way between C89 & C99, mostly as a consequence of a lot of platforms embracing native unicode interfaces (I seem to recall it was released just as Unicode 2.0 was coming out and making it so fixed width glyphs were no longer a thing).

...and digraphs & trigraphs had to do with different terminals/text editors that didn't have keys/characters for certain language symbols. Last I checked it wasn't really about internationalization, but rather about the insane variance of character sets just for English (like EBCDIC).


Really? I'll have to check my paper copy at home, but Wikipedia seems to agree, with a quote from the standard: https://en.wikipedia.org/wiki/Wide_character#C.2FC.2B.2B Maybe it was in C90 (ISO) but not C89 (ANSI)?

For trigraphs see the Rationale at http://www.lysator.liu.se/c/rat/b.html#2-2-1-1


It's entirely possible I'm misremembering, but I could swear it was in Amendment 1/C95, not in C90. I have this distinct memory about making jokes about how C95 and Windows 95 were both outdated technologies the day they were released. ;-)

I thought C90 was essentially just a reformatting of C89. Interestingly, the ANSI C page agrees with me: https://en.wikipedia.org/wiki/ANSI_C#C95


Ah, got it. C89 had wchar_t, defined in <stddef.h>, but <wchar.h> and <wctype.h> didn't show up until '95, so it wasn't very useful. Hence my vestigial memory of it being a prematurely standardized checkbox feature.


That principle was added in 1994.


Jeez... I totally misread that.


Support binary literals with a 0b prefix. Also include rotate, count leading zeros, count trailing zeros, and popcount as functions.
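What GCC and Clang users typically reach for today; the 0b literal and the builtins are all compiler extensions rather than standard C:

    #include <stdio.h>

    int main(void)
    {
        unsigned x = 0b1011;                 /* binary literal (GNU extension) */
        printf("%d %d %d\n",
               __builtin_clz(x),             /* count leading zeros            */
               __builtin_ctz(x),             /* count trailing zeros           */
               __builtin_popcount(x));       /* population count               */
        return 0;
    }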


I've become a heavy user of a number of clang extensions,

  __attribute__((cleanup))
  __attribute__((overloadable))
… are now indispensable to me.

Cleanup I use for both object lifetime management (when it fits that model), and lock management. Typically a "get access" type of function will take the lock and return the pointer to the object. Then when that pointer goes out of scope the lock can be freed.

overloadable is perfect for keeping names under control. Consider if you have a bunch of structs for encoding various bit arrays. Maybe 32, 64, and 1024 bit long. Each has functions for working on it, so you either prefix all the functions with bit32_, bit1024_ or you pick a letter suffix, e.g. bitset(), bitsetl(), bitseth() (huge?), and try to make the programmer remember them all. With overloadable functions I get to use the same simple name for the operation and which implementation you use underneath is irrelevant.
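A sketch of what that looks like with the Clang extension; the bit-set structs here just stand in for the hypothetical types described above:

    struct bit32   { unsigned w[1];  };
    struct bit1024 { unsigned w[32]; };   /* assuming 32-bit unsigned */

    __attribute__((overloadable)) void bitset(struct bit32 *b, unsigned i)
    {
        b->w[i / 32] |= 1u << (i % 32);
    }

    __attribute__((overloadable)) void bitset(struct bit1024 *b, unsigned i)
    {
        b->w[i / 32] |= 1u << (i % 32);
    }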

I think I'd like to go one step further and sugar up the syntax so…

  variable.func(a,b)  <----> func( variable,a,b)
… but Mr. Stroustrup might call me names. But darn it, sometimes you want to think of your objects in different ways. I'd be using lisp if I wanted everything the same. (Actually, I suppose Kernel, and laughing at the lisp users who think they have uniformity.) It doesn't have to be ".", go ahead and create an "=>" notation if that helps with the ambiguity of structure reference. (No dynamic dispatch here. Just leverage overloadable.)

And here is the big one: Thread Safety Analysis. This has the potential to eliminate huge swathes of runtime problems in multithreaded code and even calling convention problems in single threaded code.

In a nutshell, you can annotate your program to make statements about objects. You can say an operation ACQUIREs an object (takes a lock), and that another operation RELEASEs an object. Then you can declare that other operations REQUIRE that the object be in the ACQUIREd state. There is more, such as declaring that the execution is currently in a state and then restricting its ability to perform operations based on that.

The current implementation is probably bit rotting in gcc, and the clang version seems to be missing some important bits to making it useful. The C++ end of it apparently works for Google with their Android coding conventions, but the C end of it doesn't quite do it for me. I think with perhaps letting a function declare that its return value is ACQUIRE()d I might be able to put it to use. (And fixing the bit where it doesn't play well with __attribute__((cleanup)) ).

But the working group has a '2' in the name. We have 4 years.

And one more huge one: PLEASE add a paragraph to the spec that says:

"undefined" is NOT license for some language lawyer compiler writer to kill your dog, burn your garden, erase your disk and return 42 from the main program.

Undefined should mean "we understand different architectures and compilers might prefer to do this differently, implement something SANE and warn the user that they are on squishy ground".


If they decided to hand out flying ponies…

• blocks are pretty spiffy.

• nested functions, by which I mean I don't need access to the parent's variables, I just want to tuck this little function inside the namespace of my existing function because darn it, it's two lines long and I can give it a tiny name, and I don't have to stick it above my giant function comment header where I can't see it while I'm in the code that is using it. I just want to abstract these two lines that I need to do three times in this function. gcc has these, clang thinks it's hard in their world.
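What the GCC extension looks like today (the comparison function name is just for illustration):

    #include <stdlib.h>

    void sort_ints(int *a, size_t n)
    {
        int cmp(const void *x, const void *y)   /* nested function: GCC extension */
        {
            int xi = *(const int *)x, yi = *(const int *)y;
            return (xi > yi) - (xi < yi);
        }
        qsort(a, n, sizeof *a, cmp);
    }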


gcc nested functions do have access to their parent's variables. Obviously only until the parent exits --- they're not closures.

I would love for this to be in the standard, particularly if they come up with a nested function literal syntax. Having them would make things like mutation callbacks and foreach() so much cleaner.


Nested functions are convenient, but when you take the address of a nested function that refers to surrounding variables, GCC puts a small trampoline function on the stack. To do this it has to disable the no-execute flag on the stack, which makes it significantly easier to exploit stack overflow vulnerabilities in your program.

Since the implementation is somewhat complex and system-specific, depends on memory that is both writeable and executable, and requires disabling security features on modern OSes, I doubt GCC style nested functions will make it into the standard.


  > I think I'd like to go one step further and sugar up the syntax so…
  > variable.func(a,b)  <----> func( variable,a,b)
  > … but Mr. Stroustrup might call me names.
http://open-std.org/JTC1/SC22/WG21/docs/papers/2016/p0251r0.... … but that's C++.


Namespaces please!


(f) Make support for safety and security demonstrable

"Demonstrable" seems to be a bit of weasel-wording there, but for the life of me I don't see how you make C demonstrably safe and secure without ending up with something that is almost, but not quite, entirely unlike C.


Well, "safe" and "secure" can be relative terms, and C definitely could be safer. For example, null pointers could be defined to trap and -fwrapv could be blessed by the Standard as a mode that all compilers must support.

I agree with you of course that C will never be memory-safe either in theory or in practice while still remaining C. (The same goes for C++, despite popular belief.)


Cyclone-style never-NULL pointers and bounded pointers would be interesting, especially if completely optional. They could be incrementally added to sub-trees of a risky codebase (like a parser) but not poison your ultimate ability to do pointer arithmetic.


I'm sorry but open-std is a terrible, terrible name.


If I could change C, I would look closely at Go.

To be more precise, I would

- Quit separating definition and implementation (.h & .c vs. the singular .go)

- Implement multiple returns

- Replace #include with something a bit closer to Go's import (Package name and in-body identifier are decoupled)

- Methods on any user-defined type

- Finally, something namespace-like because

  library_thing_return_t *library_thing_doing_something_else(library_thing_param_t *in)
seems unsustainable to me.


People who want to program in Go are writing code in Go. People who want to program in C++ are writing code in C++. The C standard should improve the language we have, not try to emulate some other existing language.


Note that I didn't say I want to use Go, nor are all my points taken from Go (namely the namespace feature). I did not mention automatic memory management, I did not say the standard library needs to handle XML, and I still did not advocate that the syntax be so strict.


I won't comment much on most of those points (they're not my style), but as for the last point you made about long identifiers and namespacing, I used to agree that that was a problem, but now I've changed my position.

If you choose a long prefix to disambiguate your library functions, then it gets annoying, but you don't have to use long prefixes. "sdl" and "ao" and "X" and "gtk" are great examples of short vendor prefixes for identifiers. In Go, since you brought it up, you have to reference identifiers from foreign packages with a prefix anyway. Go does let you change the prefix if it would otherwise conflict with your code, but because of syntactical concerns, it's more likely to conflict in the first place.

Which brings me to the interesting thing about C's lack of namespacing: it means that global identifiers look the same everywhere. This enables you to safely use grep and sed on global identifiers. It's not 100% foolproof because you can shadow a global with a local, and the identifier could appear in a string or comment, but if you're disciplined with the naming conventions of your globals, it should be very unlikely that a local accidentally shadows a global, and if a global identifier's name appears in a string, or especially in a comment, it's probably actually referring to that identifier.

This means that a language design decision turns simple refactoring features from a somewhat complex tool that needs compiler integration into "just use sed".


.h and .c separation is important for shared libraries (you know, that thing that you can't really do in Go). I would like it if headers had some stronger requirements (since you can put any code in a header, which is quite bad).


> .h and .c separation is important for shared libraries (you know, that thing that you can't really do in Go).

Why are header files necessary for dynamic linking?

To name one of dozens of examples, Java has dynamic linking and has no header files.


> .h and .c separation is important for shared libraries

No it isn't. That's a completely orthogonal issue. The compiler is perfectly capable of generating the equivalent of a public header if the implementation has a way to specify visibility.


It's a bit more than that. You'd have to compile a lot of fluff into the ELF. We'd be talking something closer to Windows COM, which allows extracting IDL from a type library.


Saying that a 40-year-old language that is one of the most widely used "seems unsustainable to me" seems weird.





