Hacker News

This may be an instance of "you should really know the gcc/clang sanitizers and use them to test your code":

  clang test.c -O0 -fsanitize=undefined

  test.c:17:12: runtime error: 9.22337e+18 is outside the range of representable values of type 'long'

Interestingly gcc doesn't throw that warning.

I second this. Most weird behaviors in C++ today can be detected by ASan and UBSan.

There is also a low-cost, random-sampling-based ASan implementation that can be enabled in production: Google uses GWP-ASan for all server-side applications as well as Chrome on Windows/Mac. See https://www.youtube.com/watch?v=RQGWMLkwrKc for details.

According to most surveys, they aren't used as much in real life.

Here 14%, https://www.jetbrains.com/lp/devecosystem-2019/cpp/

Here 40 - 55%, https://www.bfilipek.com/2019/12/cpp-status-2019.html

At CppCon 2015 or thereabouts, when Herb asked during his keynote, about 1% of the audience raised their hands, per his comment on the video.

This looks interesting. I wonder if it's due to poor build tool support. In real life I've found rr + sanitizers to be a strictly better choice than plain gdb.

Just yesterday I upgraded the version of clang used to compile Android's emulator. It got reverted due to a post-submit test suddenly failing. The test case was shifting values from a byte array into one value, but the LHS of a left shift didn't have enough bits to represent the result, which is explicit UB. The statement had multiple shift operations (and other sub-expressions with templated types), so it wasn't immediately clear that was the issue. -fsanitize=undefined found it immediately. "Spot the UB" is seemingly becoming my pastime.

Does that verifier have a strong possibility of false positives? I'm curious why C compilers have such a strong history of making reasonable checks optional and hidden behind a bunch of switches.

No, actually the false positive rate of these flags is practically zero. (I'm not sure if it's 100% zero, but I used those extensively, reported many bugs and every time a developer told me "this is a false positive" they were wrong.)

The reason they aren't enabled by default is that that's not what they're designed for. They have a significant performance impact, you can't enable them all at once, they conflict with other security features and they may introduce security issues.

These are developer features. They aren't there to run your production code, they are there to test during development and bug finding.

I know TSan has zero false-positives, because it only flags a data race when multiple threads access the same memory without synchronization (and at least one of those accesses is a write). Not sure about the other LLVM sanitizers.

Here's one that I encountered that I cannot figure out:

  $ g++ -x c++ -fsanitize=undefined -
  #include <iostream>
  #include <memory>
  int main() {
      int *a = nullptr;
      std::cout << std::addressof(*a) << std::endl;
  }
  $ ./a.out
  <stdin>:5:29: runtime error: reference binding to null pointer of type 'int'

You are not allowed to dereference a null pointer, no matter how you use it, even if you later convert the result back to a pointer.

You are allowed to dereference a null pointer, but you are not allowed to access the result or use it as an initialiser for a reference. You can only immediately discard the result, or immediately take the result's address. Using std::addressof(*a) first binds a reference, then uses that reference to take the address, hence the error. You won't get any UBSAN error with &*a.

> You are allowed to dereference a null pointer

Can you cite me something for that? The very first mention of dereferencing in the C++03 standard - ISO/IEC 14882:2003 1.9 Program execution ¶ 4 (page 5) - would seem to disagree:

> Certain other operations are described in this International Standard as undefined (for example, the effect of dereferencing the null pointer). [Note: this International Standard imposes no requirements on the behavior of programs that contain undefined behavior. ]

EDIT: As does 8.3.2 References ¶ 4 (page 136):

> [...] [Note: in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the “object” obtained by dereferencing a null pointer, which causes undefined behavior. As described in 9.6, a reference cannot be bound directly to a bit-field. ]

The standard doesn't say dereferencing a null pointer is invalid. The fact that it gives it as an example of undefined behaviour is a defect in the standard. In the discussion on core language issue #232, the intent has been explicitly stated:

> Notes from the October 2003 meeting:

> ...

> We agreed that the approach in the standard seems okay: p = 0; *p; is not inherently an error. An lvalue-to-rvalue conversion would give it undefined behavior.


WRT your edit: no, that says dereferencing a null pointer and binding a reference to the result produces undefined behaviour. That agrees with what I was saying. "which" refers to the whole of "to bind it to the "object" obtained by dereferencing a null pointer", not just to "dereferencing a null pointer".

I see! Thanks for the reference.

In C99, however, you are allowed to do &*a, because when the operand of & is the result of the unary * operator,

> neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue

For anyone who wants it, this is in section 6.5.3.2 Address and indirection operators, paragraph 3.

And to elaborate on how the error message relates to this: std::addressof takes a reference which also can’t be null. So when you dereference the pointer you’re actually turning it into a reference (or binding the reference to the pointer, as the error message says).

A pretty common way to do the offset_of thing is to take a null pointer, e.g. &foo->member where foo is null.

sizeof on invalid pointer dereferences is also pretty common.

Doesn't addressof() take a reference?

> every time a developer told me "this is a false positive" they were wrong

This sounds like a good UB learning experience. The parent comment sounds like the author is running into UB more than they really want to.

In my experience both clang ASan and UBSan (and TSan, the thread sanitizer) are very solid tools; IIRC I haven't seen a false positive yet (of course some code may be specifically written to rely on undefined behaviour, but that's a minefield anyway).

On the other hand, the clang static analyzer may produce a shocking number of messages when first run on a large existing code base, and some of those warnings can be considered more "opinions" than warnings. It still makes sense, and is very rewarding, to make a code base "static analyzer clean".

The runtime sanitizers, in comparison, are very precise and have always pointed to actual "sleeper bugs"; it's almost certainly a good idea to use them and take their warnings seriously.

But anyway, clang ASAN, UBSAN, TSAN and the static analyzer are all really excellent and important tools for everybody writing C or C++ code.

PS: the reason those checks are optional is that they increase compilation time (sometimes dramatically, like 10x slower or more), and they add runtime instrumentation code that both increases the executable's size and decreases performance dramatically (also 2x to 10x or more, although the clang sanitizers are really quite fast compared to other solutions).

A lot of developers I've worked with don't want to use these tools because they are working with already-awful legacy codebases that will emit screenfuls of (legit) positives when run, and it feels overwhelming. I remember working on some teams where all compiler warnings were turned off because if you turned on even the default ones, you'd get hundreds of thousands of them during the build, and it looked bad. Since nobody had the budget to spend any time cleaning up warnings because "customers don't care about compiler warnings" they would never get fixed. Forget even sanitizers: You wouldn't even get the splash screen displayed before ASAN would barf on you.

The best projects are the ones that start out from day 1 with all the pedantic warnings turned on, warnings-as-errors, static analysis and sanitizers run as part of the automated build and any peep out of them gets treated just like an error would get treated. When you start the project out that way, there isn't that de-motivating initial hump to get over.

Kind of opposite mindset here.

Current codebase generates around 3000 casting errors. I doubt I'm the only one with this kind of "history" to deal with AND the application is crashing with memory access errors.

What's my cleanup plan for this? Multiple compilers with every warning enabled on multiple platforms. Even a platform we're not targeting. We've mapped out which files and which lines have the biggest code smell. Now we have a giant map on a 65" tv to guide us.

Why all this attention? We're moving this nightmare from 32 to 64 bits. Parts of it were originally 16bit which have already been updated. Cast errors alone are now signposts to other bad code.

> Why all this attention? We're moving this nightmare from 32 to 64 bits

It helps that you then almost certainly have buy-in and are allowed to treat this as a priority because it's tied to work that the business is willing to prioritize. You're already over the hurdle of "customers don't pay to fix compiler warnings".

Can't debug anything with a soup of warnings and other mess in the way. They either clear the rot or start from scratch. Our estimate is two years devel to get back to now. So, no time to restart.

Getting rid of the warnings and more importantly the related other debris means being able to use enhancements we make each week. Well... after qa/test have done their thing.

Been good practice for my own indie dev. Treat every warning as an error and make a habit of running a detailed reporting build each day or week. The trick of using other platforms and compilers I brought from years back.

Been there, done that. One great feature I ran across is that the Intel compiler has a #pragma incantation that enables many checks on a per-file basis. This lets you create new code and selectively fix old code without these obvious defects.

When most people who know C learned C, this technology didn't exist. Currently John Regehr teaches his students that they must use sanitizers.

Note that ruining sanitizers in prod might be insecure. They're for development.

*running, obviously. Too late to edit.

UBSan and ASan have essentially zero false positives. They do point out undefined behavior which happens to work on your platform, but at the very least those are still portability bugs.

This is a runtime check and so has a (small) performance overhead.

That's true. I am using memory sanitizers in my workflow, but I haven't been using the `undefined` sanitizer. This could have saved me a day worth of effort.

or just start moving away from using c. the last five years have brought nice alternatives like zig and rust

So throw away 40 years of progress? Everything in my OS is written in C. Why am I always told on HN not to use the language my computer runs on? It's dangerous, sure, but as a software engineer it seems like something I need to learn rather than run away from.

Progress is exactly what brought us safe languages. Proper engineering dictates eliminating, or reducing as much as possible dangerous/unsafe/etc. behavior. There's a reason C and C++ top the CVE lists with their buffer overflows and undefined behavior.

It's pretty much established that even expert C and C++ programmers, especially for larger code bases, will end up making some sort of mistake that will cause a security vulnerability or undefined behavior.

That's not my point, I don't contest that safe languages are safer. I contest the idea that we should "just use" something else. All my tools are written in C, the entire GNU toolchain is in C, how am I supposed to operate in this world if I "just don't use" C?

That is the disadvantage of open source: it forces you to use the language of someone else's source.

I used Delphi on Windows. Most of the code I wrote in the last 20 years is in Pascal.

And that works really well on Windows. You have a stable API, and it does not matter what language the API is written in. Any language can use the API in the same way.

I tried to run some old projects this month. My 20-year-old Delphi Windows programs run better in WINE on Linux than most programs I wrote 5 years ago using Linux tools, because the libraries have changed, but the API has not.

I think OP was implying not to write new code in C. If possible, we can also use replacement tools written in safer languages if they offer the functionality we need (e.g. `ripgrep` instead of `grep`, or web servers written in safe languages over those written in C/C++).

ripgrep isn't POSIX compliant. It's hardly a proper replacement.

Right, it is intentionally not POSIX compliant. This saves fairly significant development effort and also permits expanding the tool to be more user friendly, such as transparently searching UTF-16 encoded files.

ripgrep cannot be a drop-in replacement. Despite that, it can certainly replace grep in a wide variety of use cases. See: https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md#pos...

Congratulations. You have successfully regurgitated the same sentence I've read here a thousand times.

Not only the last 5 years, but I digress.

Yeah, it's kind of weird how little attention some of C's (historical) competitors in the systems programming space get. You'd expect to see a lot of comparisons to what Ada* or Pascal would do in the same context.

* Probably because the Ada practitioners are all trapped in SCIFs somewhere in the inner mantle of the Earth.

That is why I use Pascal

20 years ago it was advertised as the safe C alternative.

No. I'd rather learn how to use my tools (that I have invested years to be comfortable with) properly than starting over.

I would have trouble convincing myself I'm capable of writing C that doesn't blow up randomly, when everyone else has been failing at that for my entire career and more.

C is not very productive.

Maybe, but it’s super fun and elegant. C and Scheme are at the top of my “most elegant” list. Though, of course, for different reasons.
