Deep C (2011) (pvv.org)
166 points by Tomte 11 months ago | 59 comments

All the talk about `printf("%d\n", a);`: it is indeed invalid, due to a specific part of the language. From the list of undefined behaviours (J.2):

> In a context requiring two function types to be compatible, they do not have compatible return types, or their parameters disagree in use of the ellipsis terminator or the number and type of parameters (after default argument promotion, when there is no parameter type list or when one type is specified by a function definition with an identifier list)

> disagree in use of the ellipsis terminator

printf is called with the following implicit type signature (due to not including stdio.h)

    int (char *, int);
However, the actual printf signature is as follows.

    int (const char *, ...);
While char * can be used where const char * is expected, an int parameter is not a valid match for the ellipsis terminator. As such, the correct answer is that it's undefined behaviour.

(of course, most implementations will allow this code, but strictly speaking as a language lawyer it is undefined behaviour)

Interestingly, with the K&R version of C we would not have such discussions at all: printf, like many other functions, did not require a declaration. So, as long as we could assume that nobody would call printf with the format string as, say, the second argument, everything would work without anyone giving it much thought. This is but one example of how simple tools reduce unnecessary cognitive load.

I know most of this trivia except where it crossed to C++, which I don't pretend to understand. And instead of a candidate having a deep knowledge of this, I would rather have the compiler refuse to compile most of the examples. Computers are good at repetitively checking things. Why should I have to?

For C++ especially, most of these conventions and techniques co-evolved with the language, so requiring the compiler to produce an error for failing to follow them would be a breaking change. You might be able to configure your particular environment to produce more errors than the standard requires, which is what some of the mentioned warning flags are about. GCC and LLVM both support syntax like this to flag specific warnings as errors: -Werror=reorder (this specific one will error out if member variables aren't declared and initialized in the same order).

I yearn for compilers that can break backwards compatibility and strictly enforce best practices. This should be achievable with a relatively simple command-line flag to the compiler (instead of hunting for the right combination of dozens of -Werror=* flags, which still lets a lot of unclean code through).

It won't see widespread usage outside of learning, because you could no longer trust a compiler upgrade not to require a rewrite of your code, as the compiler arbitrarily decides what the latest best practice is.

What might be useful, though, is something like -Wall printing all the flags representing current best practices, so you can take a snapshot of them when writing new code (the problem with the current -Wall is, again, potentially arbitrary changes on compiler updates as new warnings get added, if it actually did warn on everything).

Then you can choose when you want to update to whatever modern best practices

That would be fine! Although, I also don't think "best practices" would be something that changes every 6 months (maybe every 5 years), and being forced by the compiler to rewrite code to keep up with best practices is absolutely a good thing!

I've been the lead on a fairly large Fortran project, and we make a point to run our test suite on as many compilers as possible, with the strictest flags to avoid any kind of "undefined" behavior (which is rarer in Fortran than in C). I believe it's the only way to ensure long term maintainability.

clang-tidy goes some way towards this goal.

There are some places where only the programmer can know that the undefined case doesn't happen (like integer overflow). But yeah, what purpose does it serve that the order of side effects in "a() + b()" is unspecified? I assume that's for historical reasons.

A fixed evaluation order may be (very slightly) less efficient than if the compiler is allowed to reorder. Consider evaluating "x + f()" when x is global and the compiler can't prove that x cannot change during execution of f. If the order of evaluation is fixed from left to right, then it must do the following:

1. load x into a register

2. spill the register holding x to the stack

3. call f

4. reload the "local" version of x from the stack

5. add

On the other hand, if the order of evaluation can be chosen by the compiler, it can do this:

1. call f

2. load x into a register

3. add

(Details depend on whether the caller or the callee saves x, but you get the idea.)

In practice, all competitive optimizing compilers are SSA-based, and thus they fix an order of evaluation early in the compilation process. I always wondered how much performance could be gained by actually delaying decisions about evaluation order.

Not sure if you're trying to suggest that SSA-based compilers cannot do this reordering. For my example above, GCC reorders as I sketched, while Clang keeps the program order: https://godbolt.org/g/8KDvFo

I think this is done early, during lowering to a "mid-end" intermediate representation (whether SSA or not). This is the best time to do it, since the legality of this reordering is a property of the source language, which you want to forget about in the mid-end. On the other hand, in this particular example (function call vs. global variable access), I don't think there is any reason to delay this; doing the call first should never be worse than loading the global first.

As for deciding other things later, instruction scheduling and global code motion can reorder a lot of things, though calls and memory accesses block many movements.

The knowledge presented in the slides will advance your programming about as much as knowing which wood a chess board is made from will improve your Elo rating. Although learning about wood may be a more worthwhile goal than learning about the topics in these slides.

What is this hoping to accomplish? To convince candidates that they can't print a number? Maybe instead of intimidating people they should be inspiring them?

The knowledge presented in the slides isn't deep; it's essential. I can't imagine a C programmer not knowing that parts of an expression can be executed in arbitrary order. Especially since this is aimed at embedded C programmers. It's more like playing chess without knowing what "en passant" is.

If the stuff in this slide deck was all "essential", barely anyone would get hired for C dev positions imo...

Knowledge of standards before C99 is not "essential" unless you're on a legacy codebase. Which, to be fair, embedded, is very plausible. But then you're looking for an expert candidate anyways.

There are actively-used platforms that do not have a fully C99-compliant compiler (unpleasant, I know, but such is life :-( ). Also, many products in the embedded field have very long lifetimes (10-15 years), during which they at least need to be maintained. There are a lot of actively-used platforms that did not have a C99-compliant compiler ten years ago, and they are not exactly legacy codebases.

Edit: oh -- and (this I find truly revolting) there are companies that have not updated their coding standards and mandate that all written code should target an older standard (usually C89).

IIRC Visual Studio still doesn't fully support C99. In particular variable length arrays and the _Complex type are unsupported.

Both of which are now optional features in C11.

I agree completely. And honestly, it looks like the presenter is trying to brag about their knowledge of obscure C and C++ quirks through the voice of candidate #2.

There is no doubt these slides reek of humblebragging and maybe even a bit of shaming, which isn't a good look. Obviously, there are important properties in programmers other than knowing obscure stuff.

That being said... It's still solid and I think the slides have many, many important points, both for developer attitudes and things to know about C and C++. A lot of these obscure quirks can hit you hard if you're not paying enough attention. The order of initializer lists is extremely painful and hits me every so often even though I would probably catch it in an interview.

If anyone takes anything away from this w.r.t. C and C++, I'd say the best thing would just be to always code with `-Werror -Weffc++ -Wall ...`

Hold off on the -Weffc++; it is known for false positives. It is based on a set of guidelines for C++ that appear in Scott Meyers' book (first edition!), but these guidelines are routinely violated in well-written code. For example, it warns for any base class having a non-virtual destructor, warns if any member is missing from an initialization list, etc. Some of the advice is inappropriate for modern (C++11) code.

I feel similarly. On the other hand, a good intuition about memory representation is important: the difference between global and automatic variables, linker visibility, etc. The unknowledgeable programmer from the slides has no sufficient intuition, in my opinion. You code C precisely because you want control over these things.

Personally I don't care about the tiniest details too much. I pick them up as I go and typically at least remember that there was something that I can look up again later. Also it's really easy to navigate around most pitfalls. As the first programmer says, "I would never write code like that".

Eh, this kind of knowledge can absolutely be beneficial when debugging. It won’t help you structure a program though.

There is nothing especially deep about C. Its design is intentionally extremely primitive, almost to the point of being a thin layer of syntactic sugar on top of assembly. In fact, the creation of C was a reaction to a real deep-sea monstrosity that had been gaining widespread popularity at the time, PL/I: a devilish mixture of COBOL, Fortran, and even assembly, all dressed in a horrible ad-hoc syntax, yet intended for use in both systems and application programming. Basically, the C++ of its time. PL/I was selected as the implementation language of the Multics operating system which, in turn, precipitated the creation of UNIX.

PL/I is probably the most unpleasant language I have ever had to deal with. It makes C, without warnings, look as safe as a nuclear bunker.

The best quote that resonates with me, has to come from an unattributed UNIX fortune file:

> Speaking as someone who has delved into the intricacies of PL/I, I am sure that only Real Men could have written such a machine-hogging, cycle-grabbing, all-encompassing monster. Allocate an array and free the middle third? Sure! Why not? Multiply a character string times a bit string and assign the result to a float decimal? Go ahead! Free a controlled variable procedure parameter and reallocate it before passing it back? Overlay three different types of variable on the same memory location? Anything you say! Write a recursive macro? Well, no, but Real Men use rescan. How would a language so obviously designed and written by Real Men not be intended for Real Man use?

C, with its almost-assembly and fairly predictable semantics, was an absolute blessing.

As a consequence we have secure IBM i (PL/I), z/OS (PL/S), and Unisys ClearPath (NEWP) with OS features not yet present in modern OSes, while C coders are the job security of exploit writers.

Personally, I find that I don't ever use any of the more OOP features of C++, especially copy constructors and assignment operators.

If I'm in a situation where I need to make a deep copy of an object, I need to ask myself why am I making an exact copy of a complex object with lots of internal pointers and state, an expensive operation. If I'm going to modify the result of the copy, perhaps what I should do is write a function that generates a new, different object based on the original one, in which case I would be passing in a const reference to the old object and maybe some other data about the new object I want to create. And if I'm not modifying the object, again why not just pass a const reference? Also, the exact behavior of all these hidden functions can be really hard to figure out. Who wants to try and figure out how many times the copy and assignment operator is called if you do a=f(g(a)); and pass by value? No thanks.

I got up to slide 80; it was really fun, although I'm not sure the 'knowledge' given in these slides is of really high importance (for example, it explains that static variables are automatically set to 0 upon declaration; I'd rather my team always made sure variables are explicitly initialized after declaration, unless there are performance reasons not to).

It's useful knowledge to have in the sense that it implies these variables are stored in the BSS section. Unless there are highly specific reasons to do so, though, I certainly agree that relying on this property is not a good idea.

It can be useful, though. Many years ago, we used it to trim the twenty bytes or so that prevented an updated version of our firmware from fitting into the tiny flash space of a device that had been on the market for quite some time.

It is at the very least useful to know it when debugging code written by programmers who thought this feature was nice and relied on it throughout the code (either because they thought it was a good optimization to make, or because it further obfuscated their code and thus contributed to the security of their jobs).

I seem to recall that one of the previous times this set of slides was presented here ( https://news.ycombinator.com/item?id=3093323 , https://news.ycombinator.com/item?id=6596855) , some people commented that they would hire the less knowledgeable candidate rather than the more knowledgeable candidate. I genuinely can't remember what the justification was. A hope that by not knowing the details of the language and how it was implemented on typical hardware, he was a better programmer, I think.

It feels like some people do programming for the sake of programming (like linguists). Others create poems and exquisite novels with it (authors). I'd argue you don't need to be a linguist to be a Nobel (or Pulitzer) prize-winning author. Why? It'd be a distraction. You'd start focusing on the wrong kinds of things.

Linguists are still highly respectable people. Society needs them. It's just that being a great author does not require you to be one.

On slide 24 they claim that your main should be declared int main(void): https://de.slideshare.net/olvemaudal/deep-c/24-What_will_hap...

Is there any good reason for writing "void" in empty parameter lists? I have never seen one and, unless there is one, that declaration is just useless line noise.

int main() is not a prototype. int main(void) is a prototype. int main() declares main with an unknown (at this point) but fixed number of arguments (i.e., non-variadic). Callers must guess the correct number and types of arguments, the compiler does not enforce anything. In contrast, int main(void) declares main with exactly 0 arguments, to be enforced by the compiler.

For main it doesn't matter much since usually you don't call it yourself, but consider:

    int f();    // no arguments, apparently

    int g(int a, int b, int c) {
        return f(a) + f(b, c);  // at least one of these is fishy...
    }

    int f(double d) {   // oh, the caller guessed wrong. twice.
        return (d != 0.0 ? 0 : 1);
    }
GCC and Clang do not complain about this program even with -Wall (they do with -Wstrict-prototypes).

Apparently void foo() will not error when called with arbitrary arguments: the empty parameter list means "unspecified", not "none". This surprised me too, but I confirmed it with gcc, even with -std=c1x and -Wall -Wextra -pedantic.

Learnt something!

I find this interesting, but for another reason than the one the authors probably intended: the guy who is presented as the "dumb" one (after a while the invisible interviewer even makes jokes about him) actually shows what many people would intuitively expect to happen in their C code, so he is a good guideline for a compiler that wants to take that into account.

Either a brand new compiler or an existing compiler like gcc/clang that adds a new flag that performs its regular optimizations as long as they wouldn't break common assumptions about what their C code would do. Of course it would be hard to find these assumptions, but the linked presentation is a good start.

Personally, if I were to do something like this, I'd use a simple rule: what would be the dumbest, most straightforward way to implement a C compiler? What effect would a given expression have in that compiler? Then that is what the "unsurprising" compiler mode should do: perform any optimizations as long as they do not interfere with that effect.

I think it would be a win/win situation for compilers to do that: they'd get to play their performance game and also provide a surprise-free mode without fully abandoning performance.

The real answer for the first example: compiler error due to fancy quotes (“” U+201C and U+201D).

The author discusses intricate details of the language and yet fails to fix the obvious error in most code snippets:

error C3873: '0x201c': this character is not allowed as a first character of an identifier

error C2065: '“': undeclared identifier

I understand why knowing what the standard says is useful, but why should one try to know what happens in a case where the behavior is undefined? It's platform-, compiler-, and optimization-specific, and is literally just trivia.

445 slides. I'm curious how long this takes to present in person.

Glad I checked the comments after a few dozen slides.

I think this presentation is supposed to convince you that it's important to have a deep understanding of both the implementation details and the official specification of your language of choice. However, my takeaway is that C is really complicated and has no respect for the principle of least astonishment.

I think C predates that principle.

This was prepared to be a 3 hour presentation. Source: https://github.com/bmerkle/twodaycourses/blob/master/doc/Dee...

A huge number of the slides are small incremental changes to a base slide. Things like adding each bullet point, callouts, individual newlines and edits to make code changes easier to follow, etc..

There's like 5-20 pages in the pdf for every actual slide. So it's not that bad.

When I opened that page my laptop's fan started audibly spinning at maximum speed. I can't imagine what it's doing to my battery.

Well the site feels the need for analytics from LinkedIn, Bizographics, NewRelic, Google and Score Card Research. Which seems a little excessive.

Weirdly, there are code examples in the C99 and C11 standards that use "int main()".

PDF link, for those of us who loathe Slideshare: http://www.pvv.org/~oma/DeepC_slides_oct2011.pdf

Thanks! We've updated the link from http://de.slideshare.net/olvemaudal/deep-c to this.

After updating the link, I think the PDF file server can't handle the load. I cannot load the PDF at all, I keep getting a "connection timeout".

The GitHub link posted here might be better.

    # make your own (large) PDF from a slideshare url
    # image resolutions available: 320, 728 or 1024 pixels
    # example: $0 http://de.slideshare.net/olvemaudal/deep-c 728 deepc.pdf

    test $# -eq 3 || exec echo "usage: $0 url {320,728,1024} file.pdf"
    browser='curl -s'
    #browser='ftp -4'        # BSD ftp works as an alternative fetcher
    test ${#TMPDIR} -eq 0 || cd "$TMPDIR" || exec echo set TMPDIR
    $browser -o 1.htm "$1"
    echo downloading images... >&2
    tr '\40' '\12' < 1.htm \
    | sed '/-'"$2"'\.jpg/!d;s/\.jpg.*/.jpg/;
    s/.*https:/https:/;s/\".*//' \
    | while read a; do $browser -O "$a"; done  # slow step
    test -f *-1-$2.jpg || exec echo download failed
    pdfimage -o "$3" *-$2.jpg
    exec rm *-$2.jpg 1.htm

    # pdfimage is a sample program included with PDFlib
    # http://www.pdflib.com/

I don't loathe it but the site wasn't usable. Thanks.

Thanks, slideshare is painful, but I got to know olve through it so ..

Thank you

In this short example, you compile, the compiler complains, and then you troubleshoot the reported errors. You don't extrapolate to the variations possible in other dialects.

You don't optimize early, you don't overthink. You can then add platform convention, indentation and other sugar to the base code to fulfill whatever workplace standard or best practice you need to match.

Can we replace "de." with "www." in the URL?

I bet most HN readers don't speak German.

