I don't know anything about the author's code quality, but in my experience the people who complain the most about static checking and other guard rails are the ones who need them the most: in their hubris of believing they won't make those kinds of errors, they make the most egregious forms of those errors. They're smart enough to take pot shots at the possible downsides of the process, but not smart enough (or self-aware enough) to see their own fallibility.
I remember an argument with an engineer about static code analysis years ago. Without a trace of irony, he honestly said "I know it's high quality code. I wrote it. What could a program tell me about it that I don't already know?" You can guess how much of a nightmare his code was to work with.
This isn't a comment on the original article or the author at all. Just my experience with a certain type of engineer.
There must be a corollary for programmers who have their preferred set of static analysis tools, but bristle in the same manner at new static analysis tools, or other people's static analysis tools.
I'll give a legitimate example I've had with Python and Mypy. The codebase I work on uses static typing extensively, and largely it's great. However, Mypy... well, it fails open badly in some obviously incorrect cases. The problem is that the teams are by now socialized to trust Mypy and get surprised when very obvious things fail in production because they thought the tool would catch them.
This is the real risk of static analysis IMO, putting undue trust in the tool(s).
hahaha, my inclination is almost always the exact opposite. Half the things I wrote a year ago are shit and need to be rewritten and I know it and so do all of my users
In my experience, the people who develop static analysis tools and add warnings to compilers are aware of the problems. They know that turning on all warnings is the path of insanity.
It’s a knob that you turn. You get useful information as you start to raise the warning level, but as you increase it higher and higher, you start to spend more and more time managing and silencing warnings at specific locations in your code, or dithering about how to handle some particular warning that does not (or even can not) manifest as actual incorrect program behavior.
I’m the first person to enable -Wall, -Wextra and a few others, and I even turn on -Werror religiously during development. But there are some super noisy warnings around. Static analysis tools are wonderful but they usually contain rules that are unreasonably picky, for most people most of the time. Usually these rules are off by default.
I read this article not as complaining about static checking, but as a warning that a rule that compilation must not produce any warnings can be counterproductive if many of the reported potential problems are red herrings.
As an extreme example for C, if you had a compiler that issued a warning for every potential int overflow, I think every programmer would ignore those warnings because it would report a warning on such statements as
int foo = bar + 1;
and preventing that requires one to write
int foo;
if(bar != INT_MAX) {
    foo = bar + 1;
} else {
    ???
}
Even ignoring that figuring out what to write for ??? can take a lot of time, that blows up the code. Macros can make the source smaller, but your binary will still be larger and slower.
In better languages, the compiler will write that code for you, and eliminate it whenever it can prove that the condition will never be hit.
This is one of the philosophical differences between C and other languages that makes everyone consider C unsafe. A legacy language that should no longer be used, because all of its users are perpetually incentivised to take shortcuts precisely like the one you've just described.
Most programming languages are not expressive enough to reliably eliminate redundant overflow checks (or overflows); afaik doing so in a general way requires something in the ballpark of refinement/dependent types.
In which case the compiler will emit the overflow or bounds checks, which is fine.
If the developer is certain that a specific add operation should use unchecked add for performance, then they can override the behaviour as a spot micro-optimisation. That then becomes a "self documenting" thing, easily found by a simple keyword search.
To give you an idea of just how rarely this is actually needed or done, there are less than a thousand uses of "unchecked_add()" in all of the Rust code on GitHub!
If you don't know what ??? should be, your code's going to misbehave in production - at best crashing, at worst causing data corruption or an exploitable vulnerability.
If "???" is just "foo = INT_MIN;" clang will optimise it all down to a single addition (see https://godbolt.org/z/K1ee8h3TM). If it's always "foo = INT_MIN;" across all of your code you can just use "-fwrapv" to override the C standard and define signed addition as two's complement wrapping.
GCC will ignore the possibility of overflow in signed addition if you don't do anything special (e.g. https://godbolt.org/z/Pc76c5E9z).
You can use "-ftrapv" (or "-fsanitize-undefined-trap-on-error -fsanitize=signed-integer-overflow") to make bigger, slower, safer code that fails fast on unexpected overflows across the board.
You can do the rather unergonomic "__builtin_add_overflow(bar, 1, &foo);" to tell the (GCC or clang, at least) compiler "???" is two's complement wrapping. You can do the cheeky "int foo = bar + 1L;", which is non-portable but predictable on the same compiler and platform, to do the same thing. Combine these with "-ftrapv" and you'll get code that short-circuit blows up for unchecked signed overflow, with the opportunity to produce faster code where you either are damn sure it can't overflow or where you want two's complement wrapping on overflow.
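To make that concrete, here's a minimal sketch of the builtin-based version (my example, not anything from the article; the abort() is just one possible answer for "???"):

#include <stdio.h>
#include <stdlib.h>

/* __builtin_add_overflow (GCC and clang) performs the two's complement
   wrapping add and returns true if the mathematical result didn't fit. */
int increment_or_die(int bar)
{
    int foo;
    if (__builtin_add_overflow(bar, 1, &foo)) {
        fprintf(stderr, "overflow in increment_or_die\n");
        abort();   /* one possible answer for "???": fail fast */
    }
    return foo;
}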
If it's not in a tight inner loop, avoiding a check is premature optimisation - if you're consuming I/O and doing lightweight processing on it, an overflow flag test on signed additions will not make a measurable performance difference, nor will it blow out your binary size unreasonably.
IMO it's not a red herring to have signed integer overflow flagged as a potential error. The error is the failure to have an answer for "???".
> If you don't know what ??? should be, your code's going to misbehave in production
Not necessarily. The programmer may know the ??? code will be dead, but it’s not realistic to expect the compiler to deduce that.
For example, the programmer may know that, in this program, baz has been checked to be in the range [0,999] ten layers up the call stack, and bar was computed from it five layers up by calling quux, and thus will be in the range [31,98355], which is fine, given that this program tests that int can hold that range at startup.
Not all functions in a program have to be free of undefined behavior in isolation to make a program free of it.
That's a great example of knowing exactly what ??? means: the program is misbehaving and should terminate.
It's dead in the current call stack, but over time programs change and that is how errors creep in. If it's a precondition that _baz_ is in a particular range for the function to behave, why not check that in an assert and save yourself or a future programmer a ton of stress trying to work out why some unrelated part of the code blew up when they made what seemed like a simple change nine layers up the call stack?
There's no runtime cost to an assert in production code compiled with NDEBUG, and it makes your intent clear to both the compiler and other humans.
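A minimal sketch of what I mean, reusing the hypothetical range from the comment above (the function name is made up):

#include <assert.h>

/* bar is documented to be in [31, 98355], so bar + 1 cannot overflow
   an int on any platform this program supports. */
int next_value(int bar)
{
    assert(bar >= 31 && bar <= 98355);  /* precondition, checked in debug builds */
    return bar + 1;
}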
This is actually my annoyance with monadic error handling. When I'm writing a little command line utility, I don't really want to explicitly handle every edge case. Just let the damn program crash and shame on me.
What level of static checks are you comfortable with? Because I don’t exactly think you’re wrong, but writing deviation reports for nearly every cast is a bummer, and I think many who have worked with high levels of static checks are familiar with the situation where there is no possible way to write a line of code that does not cause at least one message. I’ve spent days writing justifications for disabling particular messages (justifications such as “analysis tool authors misunderstood misra-c, here is evidence” or “reason for message is based on single paper from 1978, does not align with programming practice of the past 30 years”).
I’m net positive on static analysis, but I will forgive a lot of grumbling. It’s a lot of effort and does not catch many issues.
Furthermore, point 4 in the article (fixing unimportant warnings can cause bugs elsewhere) is real. I mean, it indicates issues with test coverage, but who doesn’t have issues with test coverage? Oh, 100% unit test coverage? Is that branch coverage or MC/DC coverage?
But anyway, on one team, when making warning fixes we would compare the final binary to show that with our compiler (the one that matters) our change produced identical output to the previous commit. If there actually was a bug that the warning found, well that was a whole different procedure.
OTOH: if the warnings couldn't have false positives, they would be errors.
There's a lot to say about 'warning hygiene'; it's part of the daily life of C/C++ coders, since it's very common for an existing 'warning clean' code base to suddenly spam new warnings after upgrading to a new compiler version. (That's also why it's good advice not to lambast an unfamiliar code base for having warnings before investigating why: it may simply have been developed with a slightly older compiler version than yours.)
I don't agree with the author though: warnings should be turned up to 11, then investigated, and locally suppressed if they turn out to be false positives (which happens very frequently if you turn the warning level to max, also because some of the 'fringe warnings' are merely coding style suggestions).
Also: static analyser tools almost always depend on additional hints outside the language spec (usually provided through asserts); without those hints, all static analysers produce a ton of false positives, because C and C++ alone don't provide the built-in language features to express enough 'intent' to the analyser.
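A generic sketch of the kind of hint I mean (not tied to any particular analyser; many of them treat an assert as a constraint on the values that can reach the code below it):

#include <assert.h>
#include <stddef.h>

/* The assert documents, to humans and to the analyser, the bounds the
   caller promised: buf is non-null and len is at most 16. */
static int sum_prefix(const int *buf, size_t len)
{
    assert(buf != NULL && len <= 16);
    int sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}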
Don’t 100% agree with this, but it reminds me of a former coworker who got annoyed by all the “noise” of warnings (in a C++ codebase) so he set the flag warn=off. When I turned them back on, there were over 3000 warnings. Including one about a format string error that literally said “format string does not match arguments, this code will crash if executed.”
Warnings should be checked by CI. Adding a new warning should fail the build requiring you to either deal with it, or add a disable comment to the section.
He's good to very good when talking about performance and SIMD tricks, but I don't know if he's particularly sage when it comes to general programming practice.
In this case I think he's probably wrong about compiler warnings but basically right about a lot of static analysers.
I'm not sure of your familiarity, but I've worked with Daniel on several projects, and think you might be guessing wrong about his attitude toward warnings. He's not the sort who ignores warnings. His tendency is to change his code so they don't happen, even if he doesn't think a particular warning is helpful or necessary.
I'm pretty sure his point isn't that the end user should get in the habit of ignoring warnings, but that those who decide which warnings are produced at which level need to be conscious that adding too many false positives can be counterproductive. His target audience here isn't application programmers, but those who decide which warnings are made default.
Hmm...when the memory checker came out for Xcode, I ran it on my code-base, parts of it decades old.
I still haven't adopted ARC.
I expected mayhem.
Turns out it "found" two false positives, i.e. leaks that are mandated by the low-level Objective-C APIs. I put the required incantations around them to shut up the checker and then I was good to go.
Keeping it simple, I guess...
¯\_(ツ)_/¯
Oh, and back to my thesis about "safyeness"[1], I certainly felt much better about my code once those checks had passed, even though it hadn't changed.
There's an overwhelming bias in the profession towards writing code that creates some kind of result first, and then figuring out what it does later.
I don't think it's new or specific to tooling practices, it just reflects seasonal fashions based on whatever the last problem someone had was: "the old stuff was too restrictive" vs "the old stuff blew up too often". And there is always difficulty at the beginning of the specification in that feature design tends to change partway through, so guardrails in implementation can hinder the path towards seeing the result.
However, I do think it's countered in some sense by designing towards intentional wrongness: In my code it's always been habitual to first write skeleton code that is a stub that always "returns true", prints out string literals that are what I want, or other such things. Where something a little more complex like managing the state of a video game scene is called for, I've learned to turn towards first enumerating a large number of states, and then gradually parcelling it out into more specific data and algorithms. The reason why that works is because it means I am starting from a fundamentally static position: I've exactly specified what states the system can be in, and haven't taken on unexpected surprises. So I know that I don't cover everything, but I do understand what I covered.
In contrast I think there's also a mindset of "open-ended permissive, fix as you go" where the system is allowed to do anything at anytime, and features are shipped broken and maybe discovered in test. I believe this is what results in ball of mud designs because, with no central point of enumeration, it allows the specification to creep up from any direction. But - it gets a result. And as the system's ambitions grow, the temptation to throw open the doors of permissiveness also grow. And that leads to the "alternate hard and soft layers" pattern[0]. Where we fail isn't in not using that pattern, it's in not recognizing when it's forming and heaping too many concerns on a specific layer.
I prefer maximum warnings and use of #pragma to disable a warning at a specific point when I know the warning is useless/wrong. That way I can get to a totally clean build with zero warnings output. The last significant piece of C I wrote:
# Turn on absolutely all warnings and turn them into errors
CFLAGS += -g -Wall -Wextra -Wno-unused-parameter -Werror
The reason I do this is if I leave some warnings in I can get complacent "oh, there's always a few warnings" or "that's expected" and then miss something. So, I like all the warnings and warnings treated as errors.
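For reference, the kind of point suppression I mean looks roughly like this with GCC-style pragmas (clang accepts the same form with `clang` in place of `GCC`); the function is just a made-up example:

#include <stdio.h>

/* Locally silence one specific warning, then restore the previous state. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wfloat-equal"
static int is_exactly_zero(double x)
{
    return x == 0.0;   /* deliberate exact comparison */
}
#pragma GCC diagnostic pop

int main(void)
{
    printf("%d\n", is_exactly_zero(0.0));
    return 0;
}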
well, clearly not "maximum warnings" since you are missing a lot of them.
-pedantic -Wunreachable-code -Wcast-qual -Wswitch-default -Wconversion -Wshadow -Wstrict-aliasing -Winit-self -Wcast-align -Wpointer-arith -Wmissing-declarations -Wmissing-include-dirs -Wno-unused-parameter -Wuninitialized, and that's not even all of the useful ones.
Also, avoid -Werror if you can.
Warnings will change based on the compiler, the compiler version, and countless other factors, which makes working with code that turns them into errors a pain.
I actually do what parent does for work projects but I also control the compiler and can decide when to upgrade. Like everything it's a trade-off and if I wanted wide support for different platforms I would consider your advice.
A good strategy is to use -Werror in development mode and CI builds, but not in 'distribution mode' (or alternatively: use -Werror for your own code, but not for imported libraries).
Man, you should try Clang's `-Weverything`. It's… well, you know what, I'll try it on my stupidly high-quality cryptographic code:
$ make "CC=clang -std=c99" "CFLAGS=-O3 -Weverything"
clang -std=c99 -O3 -Weverything -I src -I src/optional -fPIC -c -o lib/monocypher.o src/monocypher.c
src/monocypher.c:262:6: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
u8 tmp[64];
^
src/monocypher.c:231:9: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
u32 pool[16];
^
src/monocypher.c:337:12: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
const u64 s0 = ctx->h[0] + (u64)s[0]; // s0 <= 1_fffffffe
^
In file included from src/monocypher.c:54:
src/monocypher.h:289:9: warning: padding size of 'crypto_poly1305_ctx' with 4 bytes to alignment boundary [-Wpadded]
typedef struct {
^
src/monocypher.c:410:9: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
size_t nb_blocks = message_size >> 4;
^
src/monocypher.c:437:6: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
u64 c = 5;
^
src/monocypher.c:498:6: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
u64 v0 = ctx->hash[0]; u64 v8 = iv[0];
^
src/monocypher.c:621:10: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
size_t nb_words = message_size >> 3;
^
src/monocypher.c:600:9: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
size_t nb_blocks = message_size >> 7;
^
src/monocypher.c:640:9: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
size_t hash_size = MIN(ctx->hash_size, 64);
^
src/monocypher.c:786:6: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
u8 hash_area[1024];
^
src/monocypher.c:874:10: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
u32 next_slice = ((slice + 1) % 4) * segment_size;
^
src/monocypher.c:801:6: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
int constant_time = config.algorithm != CRYPTO_ARGON2_D;
^
src/monocypher.c:1170:6: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
i32 q = (19 * t[9] + (((i32) 1) << 24)) >> 25;
^
src/monocypher.c:1337:5: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
u8 isodd = s[0] & 1;
^
src/monocypher.c:1349:6: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
int isdifferent = crypto_verify32(fs, gs);
^
src/monocypher.c:1433:7: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
i32 *quartic = t1;
^
src/monocypher.c:1500:5: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
fe x2, z2, x3, z3, t0, t1;
^
src/monocypher.c:1650:6: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
u64 carry = 1;
^
src/monocypher.c:1676:6: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
u32 B[8]; load32_le_buf(B, b, 8);
^
src/monocypher.c:1756:6: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
int is_square = invsqrt(h->X, h->X);
^
src/monocypher.c:1956:8: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
int lsb = v & (~v + 1); // smallest bit of v
^
src/monocypher.c:2015:7: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
int h_digit = slide_step(&h_slide, P_W_WIDTH, i, h);
^
src/monocypher.c:1994:12: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
ge_cached lutA[P_W_SIZE];
^
src/monocypher.c:2185:5: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
fe tmp_a, tmp_b; // temporaries for addition
^
src/monocypher.c:2540:5: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
fe t1, t2;
^
src/monocypher.c:2641:6: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
int is_square = invsqrt(t1, t1);
^
src/monocypher.c:2696:6: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
int is_square = invsqrt(t3, t3); // t3 = sqrt(-1 / non_square * u * (u+A))
^
src/monocypher.c:2775:6: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
u32 t[16] = {0};
^
src/monocypher.c:2815:6: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
u32 m_scl[8];
^
src/monocypher.c:2870:22: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
crypto_poly1305_ctx poly_ctx; // auto wiped...
^
src/monocypher.c:2925:6: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
int mismatch = crypto_verify16(mac, real_mac);
^
src/monocypher.c:2953:6: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
int mismatch = crypto_aead_read(&ctx, plain_text, mac, ad, ad_size,
^
33 warnings generated.
clang -std=c99 -O3 -Weverything -I src -I src/optional -fPIC -c -o lib/monocypher-ed25519.o src/optional/monocypher-ed25519.c
src/optional/monocypher-ed25519.c:165:8: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
u64 in = K[i16 + j] + ctx->input[j];
^
src/optional/monocypher-ed25519.c:158:9: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement]
size_t i16 = 0;
^
2 warnings generated.
ar cr lib/libmonocypher.a lib/monocypher.o lib/monocypher-ed25519.o
clang -std=c99 -O3 -Weverything -shared -Wl,-soname,libmonocypher.so.4 -o lib/libmonocypher.so.4 lib/monocypher.o lib/monocypher-ed25519.o
ln -sf `basename lib/libmonocypher.so.4` lib/libmonocypher.so
doc/doc_gen.sh
---
Well…
Yeah.
That is cranking the warnings up to 11. Now, this is C99 we're talking about, so how about removing that obviously useless `-Wdeclaration-after-statement`?
$ make "CC=clang -std=c99" "CFLAGS=-O3 -Weverything -Wno-declaration-after-statement"
clang -std=c99 -O3 -Weverything -Wno-declaration-after-statement -I src -I src/optional -fPIC -c -o lib/monocypher.o src/monocypher.c
In file included from src/monocypher.c:54:
src/monocypher.h:289:9: warning: padding size of 'crypto_poly1305_ctx' with 4 bytes to alignment boundary [-Wpadded]
typedef struct {
^
1 warning generated.
Ah, this one may be real. Let's look at the struct:
typedef struct {
// Do not rely on the size or contents of this type,
// for they may change without notice.
uint8_t c[16]; // chunk of the message
size_t c_idx; // How many bytes are there in the chunk.
uint32_t r [4]; // constant multiplier (from the secret key)
uint32_t pad[4]; // random number added at the end (from the secret key)
uint32_t h [5]; // accumulated hash
} crypto_poly1305_ctx;
Ah, I see: we end with an odd number of 32-bit words, and on the 64-bit architecture I'm compiling this on, there will be 4 bytes of padding. But then I have two problems: first, who cares? Second, I also target 32-bit architectures where there will not be any padding, and adding it manually would be wasteful.
There's the `#pragma` to silence the warning of course, but it's ugly, and I'm not sure it is strictly portable (unless the standard says compilers should ignore a `#pragma` they don't understand, but I confess I haven't checked).
I use -Weverything and turn off two warnings by pragma, which are -Wpadded, the one that generated your last warning, and -Wc++98-compat because I use `alignof()`.
The pragmas may not be portable, but that doesn't matter in this specific case. I surround them with the equivalent of:
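/* Roughly this shape (the exact guard is my sketch, the idea is the same):
   clang-only pragmas, invisible to other compilers. */
#ifdef __clang__
#pragma clang diagnostic ignored "-Wpadded"
#pragma clang diagnostic ignored "-Wc++98-compat"
#endif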
> (unless the standard says compilers should ignore the `#pragma` if they don't understand it, but I confess haven't checked).
It does (Section 15.9 [cpp.pragma]):
> Any pragma that is not recognized by the implementation is ignored.
However, Clang has the -Wunknown-pragmas warning which will fire on pragmas it doesn't recognize, so you'll probably want to do what gavinhoward does and guard non-Clang pragmas with compiler-specific #ifdefs.
Given that your example actually uses "-std=c99", the "incompatible with standards before C99" warning really feels like it should be disabled, and the fact that it isn't feels like a bug to me.
So much dev time goes up in smoke because of pedantic and overzealous linters, formatters, etc. in PR checks and CI pipelines.
Your whole PR gets held up because of a style nit like an unused import or trailing whitespace. That forces you to pull up the PR in Github, dig into the CI failure, check out your branch again, make the fix, commit, push, and wait for CI all over again. If your PR was previously approved, then you have to bug the reviewer to re-approve since the review gets dismissed when you push new commits.
I know this isn't exactly what this article was talking about, but it's such a big frustration to me. I'm all for leaving warnings like this on and cleaning them up over time, but yeah I strongly believe that turning all lints and checks and warnings all the way up and then gating PRs behind them is a recipe for huge developer frustration and loss of velocity.
Sounds like your team may benefit from pre-commit hooks!
Our team has an onboarding script we have every new dev run that (among other things) installs pre-commit. Our main repo includes hooks for style, max file size, etc.
If there's a style nit, git will yell at you and not accept the commit. If possible, it'll actually correct the style issue (e.g. blackify a python file for you) and you can just stage the updated file.
We still have CI/CD, but that catches other errors, not linting issues. This method has worked pretty well for us so far.
These hooks must be really high quality — fast and reliable. Otherwise they will get in the way, slowing down development or even breaking git.
The list of problems includes:
• the hook runs tools directly on the working directory, and not on the stage/index, so it lints against a mixed version of the code, and not what is actually being committed.
• the hook tries to fix the problem above, but doesn't understand submodules and screws them up
• a new version of a linter adds a new warning that fires on existing code, and now nobody can commit anything without disabling the hook.
• the hook expects the tools to be on a PATH that isn't set as expected when git is used via an IDE or some other GUI tool
We have a fairly good middle ground - our linters only alert on changed lines. They run locally with a commit hook, so you can fix them before they hit the build. A clean linter run isn't required to merge, we trust devs to review the warnings and decide if they should fix them. Also you can turn off the github "feature" that dismisses the review when you push a new change. Trust devs to only make trivial changes after the review - or to re-request the review themselves if they make bigger changes.
A lot of this is a symptom of CIs being deliberately dumb to avoid being wrong, and of our workflows being shit. Fuck the CI; the velocity is quite bad just from the programmer's machine.
One small thing that arcanist (random example) does that raw git does not is making it very "automatic" to make a really simple change to some code. Even theoretically very trivial things like guessing which files you meant to add to the commit can be a real time saver.
In fact, contrary to what I just wrote, it's not even entirely about time - a lot of this is really about mental context switching.
(Also it's not universally true that you need re-approval)
Right-thinking people can also disagree on what is a false positive. I remember a dialog between two people about one such thing. The summary is:
1. No improper behavior occurs currently because of the item flagged
2. However, reasonable changes to other parts of the code could cause the problem to occur
3. There is an obvious way of rewriting that avoids #2.
One person argued it was a false positive because of #1, while the other argued it was a true positive because #2 and #3 combined mean you definitely want to change the code...
Our general rule is that if you think the linter/warnings/style preference is wrong, open a PR up with the upstream linter project and convince them they should remove it. Otherwise just change the submission to pass.
Compiler generated warnings are generally pretty high quality. Dynamic analysis tools generate high quality warnings. Static analysis tools that work on a significant amount of the AST seem to generally give good enough warnings.
It's the shallow, pattern-based heuristic tools that seem to generate a lot of noise.
When I have "start from an empty buffer" code I try to crank the warnings up to "ten" at least. It helps a lot.
But I agree that all the way to "eleven" is usually counterproductive. And rarely does one get to start with an empty buffer, especially given you're going to #include something.
- -Wall -Wextra, and -Weverything with some exceptions;
- address, memory, thread, and undefined sanitizers;
- fuzzing (at least some sort of), coverage-guided fuzzing with sanitizers;
- static analysis, starting with clang-tidy;
- other static analyzers, such as Coverity, CodeQL - have the least priority.
An example of how all of the above, everywhere, all at once is applied in public can be seen in the ClickHouse repository: https://github.com/ClickHouse/ClickHouse in the per-commit checks. But keep in mind that the checks are really heavy: they run for multiple hours, and we have to apply kludges to support GitHub Actions with spot instances...
C and C++ developers who have been around long enough understand why clang specifically says not to regularly use -Weverything.
Historically even -Wextra has resulted in spurious warnings for perfectly valid and clean code, as most if not all -Wextra warnings are heuristics. -Weverything is also all heuristics, but even more volatile and debatable. Any worthwhile -Weverything heuristic will eventually move to -Wextra.
So, sure, use -Weverything on occasion to see if there's anything interesting to take note of. But if you include it by default you're inviting fatigue and unnecessary rewrites, which may be just as likely to result in the introduction of bugs as to fix any bugs.
I love -Weverything. I turn off two warnings by pragma in headers (-Wpadded, -Wc++98-compat), but other than that, I make the code obey all of the warnings. This means things like explicitly marking unused arguments, marking pointer casts with a void pointer cast (with an assert to ensure alignment is okay), etc.
But my code will build cleanly everywhere. That's really nice.
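To illustrate the kind of change that involves, here's a rough sketch (not my actual code) of the two idioms mentioned above:

#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Explicitly mark an argument as unused so the unused-parameter warning stays quiet. */
static void on_timeout(void *context)
{
    (void)context;
}

/* Route the cast through void* and assert that the alignment is okay,
   instead of casting silently and hoping for the best. */
static uint32_t *as_u32(unsigned char *p)
{
    assert((uintptr_t)p % alignof(uint32_t) == 0);
    return (uint32_t *)(void *)p;
}

int main(void)
{
    uint32_t raw[2] = {0, 0};
    on_timeout((void *)0);
    return (int)*as_u32((unsigned char *)raw);
}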
In the epistemically Lovecraftian realm of UB insanity that is C and C++, I happily kneel at the altar of following petty rules if it means it can keep my code safer from bugs.
The post author is basically staring in the face of decades of mind-rending CVEs and, when glancing over to something that can preclude most of them, says, "Yeah but this is annoying and distracting. I need to not be annoyed or distracted so that I can focus on writing good code (unlike all the programmers that came before me)."
> when glancing over to something that can preclude most of them
That's where we vehemently disagree. IME some diagnostics promote worse code.
But my argument regarding -Weverything comes down to something more practical--your code may compile today, but it will stop compiling tomorrow when the suite of diagnostics and their implementation change. And in a project maintained by more than one person, attempting to keep abreast of -Weverything is a losing battle, with the only benefit being aesthetic. Again, useful diagnostics will migrate to -Wextra, and the best of them to -Wall.
You're arguing against a strawman, alluding to those who resist -Wextra or even -Wall. But -Weverything is defined as a testing bed for diagnostics. By definition even the compiler maintainers disagree about the value--how much it helps versus incentivizes unnecessary, even dangerous code modifications.
Rather than relying on -Weverything to identify flaky C or C++ code, perhaps one should look into not using those languages for the bulk of their application. That way you avoid the most problematic areas of C and C++. Running around throwing casts everywhere to satisfy -Weverything is running in the wrong direction.
-Weverything to my mind is something that might only seem useful to those copy+pasting code from Stack Exchange or now GPT. I'll stick with my well worn copy of the C and C++ standards, and spending my time proving the correctness of hairy code.
> But my argument regarding -Weverything comes down to something more practical--your code may compile today, but it will stop compiling tomorrow when the suite of diagnostics and their implementation change
Yes, but I always use the latest Clang. And I am a one-man show. Totally not a problem for me.
I think it's generally preferred to enable most warnings but have some annotations to silence false positives, preferably with sufficient comments accompanying those annotations to help the code reviewers.
I've never thought that's a very good example of introducing a bug while trying to fix warnings. The warning was correctly pointing out that the code was broken, and the obviously incorrect "fix" merely made it never work instead of sometimes work, but sometimes working really wasn't good enough either.
Iunno, for this example, isn't this actually UB according to standard C? I don't think it'd be that objectionable to put e.g. a semantic comment saying "yeah no this is actually not UB because we check in ./configure that your compiler is one known to not do bad things," *if* such a thing were standardized across linting tools.
This is not standard C, this is already in the land of compiler extensions and platform specific code. The behavior is defined by the compiler and no compiler should treat this as UB.
It is using compiler intrinsics, I grant, and taking advantage of IB is, yes, well-defined given an implementation (as we have here)… but it seems like it's still C?
Here, I'm not 100% certain this code isn't UB: we type-pun a __m512i and a uint64_t … I'm not certain that's not UB. If I'm forced into writing C, I try to avoid doing that at all costs … I'd rather not encounter nasal demons.
The warning also seems a bit merited, given that each "increment" is over more than a single object on the underlying array. (But, of course, that is the intent. But I'm just not sure the static analyzer can see it.)
Plus, a pointer to __m512i might require different alignment than a pointer to unsigned long.
This code received a pointer to unsigned long from somewhere that might have cared about the data it pointed to, so his code should acknowledge that (with type casts or assertions) or make a __m512i or void* pointer part of this function's signature, making the type change part of the contract with the caller.
Someone can perhaps update me for C11, which I'm not up to date on, but at least as of C99, while the specification was perhaps unclear on the details, the intent was clear that type-punning (of non-character types) without using unions is UB.
Is the code actually compiled with -fno-strict-aliasing (which IME is rare for userspace code, but I don't work with numerical code very much), or are you saying this is because of the types involved in the cast? I didn't see anything in [0] about a change in pointer semantics from using the vector extensions in the compilation unit?
In this case, where you're using the wrong pointer type, the worst thing that can happen is aliasing issues. For example, if I have an int* and a double*, the C compiler is free to assume that the pointers can never alias each other, so it can optimize loads and stores assuming that an update through the int pointer will never affect the value read through the double pointer. In general this can cause problems if the pointers actually do alias each other, but if there's no aliasing occurring then nothing bad will happen.
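A tiny sketch of what that optimisation freedom looks like (my own illustration, not the code under discussion):

/* Because int and double may not legally alias, the compiler can assume
   the store through d does not change *val, and reuse the value it
   already loaded -- effectively returning 2 * (*val). */
int read_after_store(int *val, double *d)
{
    int before = *val;
    *d = 1.0;
    return before + *val;
}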
My understanding was that the UB was defined such that:
int i = 0;
float f = 1.0f;
void* p;
if(rand() & 1)
    p = &i;
else
    p = &f;
int j = *(int*)p;
assert(i == j);
could legally be compiled to a no-op (or a single call to rand() with an ignored result, I suppose), despite there being no "weird" pointer stores, since the ill-typed-at-runtime load "can't happen," so the compiler can reason that (int*)p must equal &i, and hence j = i.
I agree that the GitHub analyses are worse than useless, but I also think it highlights the friction between intrinsics and the program in which they appear. There probably is a better model.
I think this "friction" is really important to understand. For code of this sort, usually you figure out what assembly you want produced, and then you figure out the intrinsics necessary to produce this. In your mind, there is a clear one-to-one correspondence between the what you write and the assembly you want generated, but too frequently the compiler may have other ideas. It constantly feels like your are fighting the compiler's attempts to subvert you. Personally, I think this best current solution is writing blocks or entire functions in assembly, but admittedly the tools for this aren't great.
> However, I fear that it is part of a larger trend where people come to rely more or more on overbearing static analysis to judge code quality. The more warnings, the better, they think.
Don’t get me started on security scanners which basically report all “open” ports as a security issue.
Or, maybe stop doing pointer arithmetic? We have had some pretty good new languages appear in the last 7 years. I know people are averse to change, me included. But you may want to ask yourself if continuing to use a 50 year old language is the right answer.
This is low-level library code written to make full use of AVX-512, which is inherently a processor-native feature that has to be targeted from a language close to assembly, because the processor only understands machine code. There is no way to avoid the assembly or the pointer arithmetic. Someone has to do the pointer arithmetic so the higher-level languages and libraries don't have to.
Let us throw away all the existing code written in languages made in a different age and use languages that are completely binary incompatible just because we can.