> Yes, venerable GCC is a very important and respected tool. And this is my bug, not GCC's flaw.
> But that bug in my code ruined one month of my work. I coded without realizing tests don't work at all. OUCH!
Sorry to say, but that too is your fault. The first thing you're supposed to do with tests is make them fail, then make them pass. The only safe assumption is this: if you've never seen your test fail as written, you can't trust it to ever fail.
As an aside, the first line says:
> Sorry for the provocative title, but I'm too emotional these days.
And this really ticks me off for some reason, probably because the entire web and beyond has moved to that form of expression where one exaggerates and then says "yeah, but I have my reasons". I miss adult discourse.
> > Sorry for the provocative title, but I'm too emotional these days.
> And this really ticks me off for some reason, probably because the entire web and beyond has moved to that form of expression where one exaggerates and then says "yeah, but I have my reasons". I miss adult discourse.
I'm in the same boat as you. Worse, people who write these titles also hold firmly to them, making discourse almost impossible. Trying to read these loaded articles from the title to the last period is exhausting and counterproductive, too.
This is why I also don't publish anything unless I can write a neutral title and keep the post's tone neutral throughout.
> Sorry to say, but that too is your fault. The first thing you're supposed to do with tests is make them fail, then make them pass
They did not say they didn't do that, you're assuming.
If I were the person here, I might write a simple test with 'val1="foo"; val2="bar";' to make it fail. That would make it fail because the first byte is different, while I would think it failed because the entire value is different.
You might say that I should have also tried "bar" and "baz" (which would reveal the issue), but it's infeasible to try all possible inputs.
Most code is not written with exactly an excess of time to follow perfect practices every time, and so we must make tradeoffs.
> If I were the person here, I might write a simple test with 'val1="foo"; val2="bar";' to make it fail. That would make it fail because the first byte is different, while I would think it failed because the entire value is different.
This is absolutely the wrong way to do testing and the crux of what I was getting at: you should not change a single line of code within the test itself when making it pass or fail.
If you change the test itself, then it's not the same test, you have not ensured your test works, only that another test works.
I'm sorry if this sounds harsh or if my phrasing is off, but it's important to remember this. If "testing your test" implies "making modifications to the test", then you're not doing it right.
> > Sorry for the provocative title, but I'm too emotional these days.
> And this really ticks me off for some reason, probably because the entire web and beyond has moved to that form of expression where one exaggerates and then says "yeah, but I have my reasons". I miss adult discourse.
I completely agree. I've seen a lot of people try to excuse being rude to people because they're extra emotional at the moment. If you're extra emotional at the moment, *don't write it*. It's more reasonable to use as a post-facto reason. (Example: "Sorry. Yesterday I wasn't in my right mind, and I said some mean things to you that I regret saying. It won't happen again.") But it's not at all okay to try to excuse what you're about to write. That's not an apology at all. That's like a person with anger management problems trying to justify hitting people whenever they want.
We all have code that doesn’t get tested “the way it’s supposed to be” and then waste a bunch of time trying to figure out wtf happened.
And clang is really great at catching more issues than other compilers would, and then reporting them better than the others. So the title isn’t far off.
I firmly agree with your second paragraph and the general point the link made about error messages being vastly superior in clang.
I don't strictly disagree with your first point, but I disagree with the attitude toward tests from many devs that leads to it becoming as common as it is in this article (and I'm not piling on the guy; it's very, very common).
It's like people have forgotten what tests are for and thus what you want from them: a test that passes is essentially useless; it completes the green check mark and that's it; it doesn't matter. It's the test that fails that provides you information, that matters to you.
One might be tempted to disagree and tell me that passing tests also provide information, by ensuring this and that, and I will insist that they take a second and realize that no, they don't. Ultimately, the only value they have is the fact that they can fail; it's always the failure that matters.
There are indeed tests that fail to fail for other, unplanned reasons, but that's not really what I'm getting at; I'm aiming specifically at the attitude of tests for the sake of tests.
What you're saying isn't all that precise, but I think I disagree with it.
> It's like people have forgotten what tests are for and thus what you want from them: a test that passes is essentially useless; it completes the green check mark and that's it; it doesn't matter
This is, in my opinion, not true in some cases. Sometimes, I'll write unit tests that never fail, and are purely to allow me to verify one of my assumptions about the code. Basically, the unit test is a tool to force me to think through each line of the code, and that's it.
Typically, when I do the above, I also use a debugger and/or line coverage tool to ensure the test actually reaches part of the code I'm trying to reason about, and with the debugger, this sort of "test-writing as a tool to think about code" is in my opinion much more powerful than just reading code. I do this a lot in code review when I don't understand someone's code, since it's easier to run sample values through it and set breakpoints in a test, than it is to do all that in my head looking at the diff view.
Now, you might "no true Scotsman" argue that those are not tests (and indeed, I often don't bother to check them in), but I think they are.
I also think integration and blackbox tests that never fail are fine, and expected. If I add a website uptime monitor (a "foo.com returns 200" type test), should I immediately take my site offline for a while to make sure it fails?
> This is, in my opinion, not true in some cases. Sometimes, I'll write unit tests that never fail, and are purely to allow me to verify one of my assumptions about the code. Basically, the unit test is a tool to force me to think through each line of the code, and that's it.
That is fine and has value, but that is usually called an assertion (and most languages have them under some sort of assert() function), which is different from a unit test.
> If I add a website uptime monitor (a "foo.com returns 200" type test), should I immediately take my site offline for a while to make sure it fails?
How do you know it works otherwise? In fact, for that specific example, the most naive version of that test actually misses many failure cases, as we've seen again and again in the last decade with status pages that fail to detect any errors.
Assertions validate that certain things do not happen. They do not let me trace a sample value through code with a debugger, step through lines with that value, and add further breakpoints.
Unit tests let me pick sample values, and I do what I'm describing using unit test libraries, not assertion libraries. Perhaps you misunderstood me.
> many failure cases, as we've seen again and again in the last decade with status pages that fail to detect any errors
That is a political problem, not a testing problem. The majority of status pages that fail to detect errors do so because the company decided to have a human update the page (because the automated test failed too often), or didn't differentiate "error" from "warn", or didn't provide a human-readable description.
Status page uptime is often a political matter for a company, and no unit testing practices will fix a political problem.
But there are many subtle reasons why a test might not get written. Sometimes it’s deadlines. Sometimes it’s because a component is hard to test in isolation. Sometimes it’s just an honest mistake.
So, while I would encourage everyone to write tests I also won’t judge them for not doing it.
But we are all a bit guilty of this, I am sure. It is great to share such stories: realizing we all fall into the same traps, even when we're too ashamed to admit it, is how we can fix and improve the technology.
You're not wrong, but the tone of your reply is off-putting. The reality is that many developers can use knowing things you teach them, but teaching people requires a bit of compassion for their relative ignorance.
Sorry if my message had the wrong tone; it was not meant that way. I am not a native speaker and rewrote it twice because I thought it sounded too aggressive; apparently I still didn't fix that entirely.
I suggest the author of that code try a denser language, one that requires more focus on individual characters, and figure out a more methodical approach to debugging. I saw the error at first glance, but that may just be because the code is in an unusual style (https://news.ycombinator.com/item?id=33768662), which suggests that a second pair of eyes would also help.
> I mentioned this as an afterthought in a nested comment, but it bears repeating.
I agree whole-heartedly.
There is no situation in which `memcmp` should be used with structs. In code review this would have been flagged as an error, no matter how much the author cried "but but but ... there's no padding!".
Even if there is no padding right now (compiler flags, or struct has perfectly aligned fields), at some point in the future this code will be reused in a spot where there will be padding (compiler upgrade, someone adds a new field at the beginning of the struct, etc).
It's the same with serialising structs, you have to do it field by field.
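A sketch of that field-by-field approach for serialization (struct and field names are invented; fixed little-endian byte order is an arbitrary choice for the example):

```c
#include <stdint.h>

/* Invented example struct. */
struct point { int32_t x, y, z; };

/* Emit one field at a time in a fixed byte order, so neither padding
   nor host endianness can leak into the wire format. */
static void write_i32_le(uint8_t *out, int32_t v) {
    uint32_t u = (uint32_t)v;
    out[0] = (uint8_t)u;
    out[1] = (uint8_t)(u >> 8);
    out[2] = (uint8_t)(u >> 16);
    out[3] = (uint8_t)(u >> 24);
}

static void serialize_point(uint8_t out[12], const struct point *p) {
    write_i32_le(out + 0, p->x);
    write_i32_le(out + 4, p->y);
    write_i32_le(out + 8, p->z);
}
```

Deserialization mirrors this, reading each field back explicitly instead of memcpy-ing the whole struct.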
Serializing structs directly is probably fine if you fully control the platform on both sides and you memset to 0 ahead of time (the latter probably only important if compressing or if you are worried about leaking data, which in a closed system you would typically not be...).
Will any compilers cause trouble if the struct is always memset to 0 on initialization? I suppose there's probably nothing preventing the compiler from randomly setting padding bytes to other values but I doubt that'd ever happen?
I think it can. Say you have `char a; int b;` in a struct, so there is padding in between, and you then do `s.a = x; s.b = y;`. If x and y happen to have the same memory layout, the compiler can replace the two assignments with one big transfer and copy random bytes into the padding.
Interesting! I guess most realistic scenarios would have x and y on the stack though in which case I bet they'd get optimized out completely under normal circumstances.
True, it is a case of severe bad luck. But these are even worse than normal bugs: Some minor change in an unrelated part of the code triggers a cascade of different inlining decisions, and now a comparison is silently returning wrong answers in a very specific edge case.
Another variant is having x as an int. It is supposed to be truncated on assignment, but instead the compiler simply dumps the high bits into the padding.
As the saying goes: Working code > crashing code > silent corruption.
It's interesting that there was a post just today about initializing padding and other stack variables to 0 decreasing bugs in C/C++ by 10% without significant change in performance, this sounds like a great example.
That is indeed a nice error message from clang. I presume it is noticing that a `bool` is being implicitly converted to a `size_t`, which is suspicious.
But still, it's good having both clang and GCC. The competition makes both better. If there was only one, it would stagnate.
Many, but not as many as gcc. Gcc being the only game in town for about thirty years gave it a massive head start, and gcc existed during an explosion in different isas and architectures. Every obscure processor in the world seems to have a gcc backend.
Clang much less so. But clang supports all the modern “important” isas, so as a practical matter, it doesn’t matter that much for everyday use.
My impression is that the GCC development team is more open to mainlining support for obscure architectures, whereas Clang/LLVM requires stronger evidence of demand and a maintenance commitment.
Of course, 99.9% of people don't use any obscure architecture, so that's irrelevant to them.
Due to Clang's license, some vendors like to release their own proprietary version of Clang for an architecture (though in the cases I'm aware of, parallel unofficial open source support exists as well; I don't know if this is always true).
Or at least stop jamming so much stuff into one line, and use whitespace around operators!
To be fair to C, problems like this exist in a lot of languages, even more-strongly-typed ones. At some point, a linter comes in handy. It's nice if the compiler has linting built in, but at some point you need to draw a line between "compiler warnings/notes" and "lint to be found by an external tool".
No one linter or static analysis tool will pick up every possible error.
cppcheck picks this up (a bit cryptically):
clang-gcc.c:17:72: error: Invalid memcmp() argument nr 3. A non-boolean value is required. [invalidFunctionArgBool]
if (memcmp(m_result_original, m_result_my_version, sizeof(struct tmp)!=0))
PVS-Studio picks it up too (more precisely):
<source>:17:1: error: V526 The 'memcmp' function returns 0 if corresponding buffers are equal. Consider examining the condition for mistakes.
flawfinder doesn't catch anything either. That doesn't mean "cppcheck is better than flawfinder".
The error here is in the programmer relying on one tool as source of truth.
The competition between gcc and clang has made both compilers better. At this point, it is easy to construct examples that will make either of the two compilers look a lot better than the other. I think it is best for users if they continue to compete and there continues to be no clear winner.
The `&` must be dropped in the call to memcmp (which the author did in the current version).
Ironically, GCC 12.2 warns about this error while clang doesn't :)
The warning message produced by GCC is "'memcmp(...)' specified bound 12 exceeds source size 8 [-Wstringop-overread]", and it is printed even without -Wall, but only when the third argument is `sizeof(struct tmp)` rather than `sizeof(struct tmp)!=0`.
But as @Stratoscope pointed out, memcmp() is still problematic if the struct has padding.
When I see "!=0" in C code, it's immediately suspicious, along the same lines as "== true" or "== false" (or worse, "!= true" and "!= false", both of which I've seen in codebases before.) Just write if(memcmp( ... )) or if(!memcmp(...)).
I would agree on this for some cases (and I suspect more so for != NULL than != 0), but memcmp() != 0 isn't one of them. There are multiple reasons why, some more generic than others, but two of them are very specific to memcmp():
1. There are similar functions (like std::lexicographical_compare() in C++) returning bool instead of an int.
2. It's not crazy to read memcmp() as comparing for equality rather than ordering. i.e. I think when you see 'if (memcmp(a, b, n))', it's quite natural for one's brain to read it as "if a and b are equal..."
For someone who uses memcmp() on a daily basis, you probably don't encounter these misconceptions. But as someone who writes C++ and merely deals with C once in a while in the process, both of these possibilities have tripped me up multiple times when seeing if (memcmp(...)). Writing != 0 has always cleared that confusion for me, since it immediately signals the result (a) isn't meant to be read as a Boolean, and (b) might be negative, thus putting my brain on alert.
When I used to write a lot of C code, my solution for this was to avoid calling memcmp() directly at all when I only cared about a yes/no on the match. Instead I used a wrapper I called memeq():
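The wrapper's body isn't quoted here, but a minimal sketch of what such a memeq() could look like:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Yes/no wrapper around memcmp(): callers get a genuine boolean, so the
   three-way comparison result can never be misread. */
static bool memeq(const void *a, const void *b, size_t n) {
    return memcmp(a, b, n) == 0;
}
```

With it, `if (memeq(&x, &y, sizeof x))` reads unambiguously as an equality test.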
That said, this code, even with my updates, was wrong all along. Using memcmp() on two structs, even with a wrapper, is a bad idea. What if there is padding?!
This particular struct will not have padding, but in the more general case you shouldn't count on that.
What you really need to do is have a function that explicitly compares the x, y, and z fields of these two structs.
Then you won't have to worry about uninitialized padding, you will never get the length wrong, and everything will always just work.
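A sketch of that explicit comparison, assuming the x, y, and z fields are plain ints:

```c
#include <stdbool.h>

/* Field names follow the comment above; the int type is an assumption. */
struct tmp { int x, y, z; };

/* Compare field by field: padding never participates, the length can't
   be wrong, and a new field forces this function to be revisited. */
static bool tmp_equal(const struct tmp *a, const struct tmp *b) {
    return a->x == b->x && a->y == b->y && a->z == b->z;
}
```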
Reading this makes me appreciate how nice it is in C++ to just be able to write `operator==() = default;` and let the compiler generate the appropriate comparator automatically.
I disagree. I avoid implicit casting, including truthiness. "== true" is indeed suspicious because you aren't avoiding anything, but to me "!= 0" or "!= nullptr" are crisper.
I don't know what you mean by "crisper", but in general I've noticed that extraneous code only increases the chance of bugs (and this is a good example of that). if(foo) and if(!foo) are very common idioms and should be committed to subconscious memory.
0 is the false value in C, no implicit casting or "truthiness" involved. The C standard explicitly defines if statements in terms of zero: "the first substatement is executed if the expression compares unequal to 0... [else] is executed if the expression compares equal to 0".
C does not have a boolean type. (The `true` and `false` defined in stdbool.h are macros expanding to the integer literals 1 and 0, respectively.)
They aren’t testing for ‘false’ in this case but for success.
Functions which return 0 on success I always compare with 0, since “if (foo()) error();” easily gets confusing. If I see “if (foo() != 0) error();” then I can easily tell what’s going on, because C is funny like that.
Conversely, if I’m testing for truthfulness or NULL I’ll use the boolean operators.
Yes, although they still evaluate to 0 and 1 and conditional logic is still defined in terms of 0 and 1. Maybe that will change at some point in the future as well!
This is a stylistic choice. In my personal opinion, I agree with you. But I've seen styles that swear by the other way, the one you and I don't like. And I respect those who hold those opinions in good faith. I think someone once told me MISRA C requires != 0.
GCC is still the king of the hill though - the amount of support for various ISAs, new features like C++20 stuff - it's far from lagging clang in hugely significant ways. (GCC user since 2.95 on Solaris/SPARC!)
I know a company that used to build all code with both GCC and clang in parallel on every build, and use binary output from GCC but error/warning messages from clang.
Honestly if nothing else that sounds amazing from the perspective of maintaining portability. Allowing you to exploit the strengths of each compiler is also nice:)
It is wise, because this dichotomy won't last forever. GCC will slowly die (corporations seem to hate the GPL, and LLVM is their child) and everyone will need to migrate to LLVM sooner or later.
Ironically, while LLVM lives on, Clang is slowly dying and is failing to compete with GCC. Clang's C++20 support lags behind GCC because both of its major corporate sponsors, Google and Apple, have seemingly moved on (after failing to get the ABI changed, Google went over to working on Carbon, and Apple doesn't care as long as the C++ support is good enough to compile LLVM and the C++14 subset required for Metal's shading language). All the other vendors who take advantage of Clang's permissive licensing seem unwilling to upstream improved C++ support.
For x86 in certain cases, I think gcc produces better code still. But x86_64 / aarch64 looks different, and there I would assume the gap between gcc/clang is not big.
You shouldn't be: things that happen rarely happen all the time at scale. If you know of one person who hit an edge case, chances are that many more will in the future (and others have before but didn't bother to tell anyone about it). Because of this, it is always worth handling these edge cases; a couple of hours implementing the lint will pay for itself several times over by saving a single person a lot of time, or a lot of people a little bit each. Of course, who pays for that time is a different matter, so it is easy to say "sucks to be you" and do nothing about it.
You don't have to _build_ anything, you are simply using the Clang compiler with the GCC standard library
The C++ compiler and the C++ standard library exist as separate components. You're free to mix and match them. You just need to configure the relevant compiler and linker flags (which may be harder on GCC than Clang, for instance)
And if you mean to just #include <ranges>, that's not right at all
For instance, installing LLVM/Clang on Fedora Rawhide uses Redhat GCC 12.2 as the default stdlib right now, and including <ranges> works perfectly fine:
> The C++ compiler and the C++ standard library exist as separate components
This is not really true. Aside from the fact that the standard library need not be a library at all and could just be built in, in practice the standard library provided by a compiler is intimately tied to that (specific version of the) compiler. libstdc++ uses plenty of GCC-specific non-standard extensions and relies on implementation-specific and often undocumented behaviour.
Clang tries to implement all the same extensions, and GCC developers try not to break Clang support too often (sometimes even providing Clang-specific workarounds or gating off non-implemented features), but as you have experienced, breakage does happen from time to time.
In this case the breakage is not caused by extensions, though. It's caused by a bug in the C++20 support in clang, at least according to the GNU people.
Both compilers are good enough.
It's more about style.
1. No need to save on spaces; your fingers won't fall off from pressing the keys.
2. Pairs of brackets are not so much for the compiler (which can nest them freely) as for people.
3. Inverse notation in conditions has saved embedded programmers from such errors, still saves them, and will keep saving them.
I kinda disagree, I still find GCC's code generation better a lot of times. I have seen Clang generate some pretty crazy code.
While this is different from warning you about errors, it is still an important aspect of a compiler.
Having used both compilers quite a bit, I'd say they each have their own strengths. I will add that clang's code generation has gotten a lot better over the years, and it doesn't generate crazy bad machine code as often anymore, but I still see it happen.
> Yes, this is a problem of weak static typing of pure C. Pure C is a great tool for some jobs, but here be dragons, as they say.
> The problem is that (sizeof(struct ...)!=0)==1, so size=1 always for memcmp(). Instead of comparing two tmp, my code compared only 1 byte of each structs.
This isn't just due to the type coercion of the third argument to memcmp, though, is it? It's at least as much due to the fact that the _return value_ of memcmp is a valid `if` condition. I know that "zero is false, everything else is true" for integers is just as much a "feature" of C as the coercion of the comparison result to a size_t, but I feel it's less obviously a useful feature, given how easy it is to just manually write `!= 0` if you happen to want the existing behavior. Are there any compiler warnings or hints for being able to disallow this specifically?
C didn't even have a separate `bool` type until C99 (https://en.cppreference.com/w/c/types/boolean), and even then `true` is simply a macro that expands to 1. The widespread use of int to represent booleans in legitimate code makes it impractical to warn about this practice (at least in existing code).
Most of the linters I'm aware of do the opposite. You could probably create rules for one that require the condition of an if statement to always contain a comparison, but it's worth remembering that the comparison operators themselves return 0 and 1.
> You could probably create rules for one that require the condition of an if statement to always contain a comparison, but it's worth remembering that the comparison operators themselves return 0 and 1.
Yep! I'm thinking that it might be possible to approach it from the side of how the expression is constructed rather than what it actually yields when evaluated though. I'm not positive, but my instinct is that it should at least be technically possible to statically detect whether or not a condition was constructed from a set of recursive rules like this:
1. `true` or `false` from stdbool.h
2. a variable or struct field of type `bool` from stdbool.h
3. a comparison expression using either equality or inequality operators
4. some composition of instances of these rules with boolean operators
5. a function call where all returned values are instances of these rules (applied transitively across nested function calls)
There might be a couple other cases where known booleans could potentially be constructed. Off the top of my head, macros could probably be supported by doing this check after the preprocessor rather than before, but I imagine it might not be possible to handle things like invocation of a function pointer. My point is more that I think it would be possible to enumerate almost all of the possible cases. I imagine few existing C codebases would be able to pass that lint fully today, but it would probably be fairly easy to enforce it on a new codebase from day one without too many issues (and if there are occasional outliers like function pointers they could potentially just be individually suppressed rather than allowing violations globally across the project).
The real issue here is that memcmp has a horrible interface that only C programmers would find acceptable. The author seems to understand this to some extent.
Sure, compared to GCC, Clang would often come out on top. But it's not about code quality and better checking: it's the spirit of openness.
Obviously, an important tool like clangd should be run by everyone, but saying that Clang is better is a false claim; Clang is still slower than GCC.
Indeed, clang has better error messaging than g++/gcc. Just my 2 cents: if you have a choice, do your C or C++ bits in Rust (which, like Clang, is LLVM-based). I've been writing C/C++ for a very long time now; it's time to move away from it slowly... to a better tool like Rust.
He spent a full day* writing unit tests to cover half of the important cases, then a month debugging why the code still doesn't work, but you think the initial day was the waste of time?
* He doesn't say how much time he actually spent; maybe he spent a whole week writing his (poor) unit tests. Still, was that part the waste of time?
You can do the same with Meson. I just added the warning, debugging, and sanitizer options in-line for clarity. The benefit (I'm not sure if Bazel has this too) is you only have to run the commands once to set up the build directories. After that, you can just use the ultra-fast ninja and almost ignore Meson completely.