
Clang-11.0.0 Miscompiled SQLite - creolabs
https://sqlite.org/forum/forumpost/e7e828bb6f
======
klysm
If I understand correctly, everything is working as intended: a fuzzer caught
a bug in an unreleased version of clang. The title makes it sound like
somebody fucked up pretty badly.

~~~
jfkebwjsbx
What strikes me as odd is why SQLite is committing changes just to make an
unreleased compiler happy...

~~~
Someone
I think it’s a code improvement, too. It now computes the new flags in one
place.

Just looking at these lines, I don’t understand the code, though. Apparently,
_sqlite3VdbeMemRelease_ can change _pMem- >flags_, but that change can be
ignored. I can’t think of a simple explanation for that.

Also, the function name _sqlite3VdbeMemRelease_ seems to indicate memory gets
released, but _pMem- >flags = MEM_Str|MEM_Term…_, to me, indicates it contains
a string.

~~~
dathinab
> but that change can be ignore

No the changes _must_ be ignored. This is where the bug came from.

The code does store the flags, does an operation and then restores some flags
while setting some new ones.

But after clang reorders the operations it will not restore the flags from
_before_ the function calls but instead takes them from _after_ the function
call.

I.e. any change `sqlite3VdbeMemRelease` does to `pMem` mutst be ignored but
after reordering changes to `MEM_AffMask` `MEM_Subtype` are no longer ignored.

~~~
shawnz
I think what they are saying is, why does sqlite3VdbeMemRelease change the
flags in the first place if the change must be ignored?

~~~
thesz
Because these changes may not be ignored in some other place?

~~~
Someone
Probably, but it doesn’t look like good API design to provide a function whose
actions you sometimes have to partially undo.

------
mehrdadn
Question for any low-level optimizing compiler engineers here: I obviously
realize these are all important, but in your judgment, how much of making an
error-free compiler would you say is about having a comprehensive test suite,
vs. having _very_ careful software engineers, vs. having extremely thorough
code reviews, vs. something else? Put another way, if you were to lose one of
these, which ones do you think would have the most/least negative impact (or
be the easiest/hardest to make up for with other things) in terms of the
correctness of the final product?

~~~
g_airborne
Somewhat unrelated but I read somewhere that the Rust team test their releases
by compiling the entirety of crates.io and all Rust GitHub repositories to
find regressions. The tool is called Crater. Source:
[https://www.pietroalbini.org/blog/shipping-a-compiler-
every-...](https://www.pietroalbini.org/blog/shipping-a-compiler-every-six-
weeks/)

~~~
grenoire
No longer the entirety but still a sizeable chunk of it; it helps with
regression detection for them. World's biggest test suite?

~~~
jfkebwjsbx
It is very common for compilers to be tested using client code. It is a
practice that goes back decades.

Nowadays with so many open source projects and so much distributed computer
power, it is even easier.

~~~
pornel
What makes it easy and powerful for Rust is that ~all Rust projects use Cargo
as the build system and test runner, so Crater can really test every Rust
project in existence, not just code samples hand-picked into compiler's test
suite.

~~~
jfkebwjsbx
Only open source projects, which will not be the majority if Rust ends up
taking up speed.

------
ridiculous_fish
These sorts of bugs are more common than you may think! Here’s gcc
miscompiling fish shell: [https://github.com/fish-shell/fish-
shell/issues/6962](https://github.com/fish-shell/fish-shell/issues/6962)

~~~
gnulinux
These sort of things get very tricky. Years ago in college me and my team was
building an operating system and we were 100% sure we found a GCC bug. It
turned out to be a gdb bug (we were disassembling with gdb) instead that made
us _think_ it was a GCC bug, but it was actually a bug in our code. Very
subtle.

------
mister_hn
I don't understand trying to use the trunk version of a under-development
compiler. Would you use the binaries built from an unstable compiler in
production?

Given that here we talk about SQLite, what's the advantage to use clang 11
instead of a previous version?

~~~
viraptor
You're looking at the wrong side of sqlite usage. If you develop sqlite, why
wouldn't you test it on every compiler/version combination you can get your
hands on? Whether someone uses it in production is irrelevant.

~~~
Nitramp
Because you'll spend a lot of time chasing down bugs that are actually issues
in the compiler. Unless you're trying to vet the compiler itself, this won't
improve your software but cost you a lot of effort.

~~~
saagarjha
If you’re SQLite, you generally do care about popular compilers converting
your code into something wrong because you care about your code doing
something wrong in general. If there was a bug in Intel processors that made
SQLite fault randomly during tests, I’m sure they’d be interested in that too.

~~~
Nitramp
Yes.

Two things though:

1\. Hopefully none of your users run an unreleased clang from head, so you're
not using anything a user will run into.

2\. Consider if it is it worth the effort to validate every commit pushed to
head, or whether you'd get more out of your investment if you only tested
releases that received some more vetting (e.g. nightlies).

~~~
progval
> 1\. Hopefully none of your users run an unreleased clang from head, so
> you're not using anything a user will run into.

Yes, but clang eventually gets released. You want to catch bugs in clang and
get them fixed before that release happens.

~~~
jfkebwjsbx
Yes, but Clang is many months away from a release.

It is not the same as playing with a release candidate.

~~~
pas
It's an automated fuzzer, it caught a bug very early, but the report got sent
to SQLite instead of Clang. Probably because OSFuzz doesn't handle this
distinction. (Though it might should. Or maybe usually new compiler versions
catch problems in old code a lot more frequently than it catches compiler
bugs. Dunno.)

------
adr_
Clang 11 hasn't been released yet, right?

~~~
Twirrim
Why is OSSFuzz using such a bleeding edge compiler? That seems a little nuts.

~~~
bradfitz
Wouldn't you rather catch bugs before they're released in a stable version?

~~~
zamalek
Exactly. This is precisely the point of nightly builds, is it not?

~~~
Twirrim
Clang 11 is still in early development stages. Release date is several months
away. Clang 10 was released just a couple of months ago. 11 is expected to be
buggy and not fit for use yet.

The SQLite devs now have to deal with "is it or isn't it a compiler bug"
nonsense, taking their time away from fixing actual problems, working on
features etc, from OSSFuzz deciding to use a compiler that the compiler devs
themselves don't think is fit for use.

How much trust can you have that even fuzz results exposed are actually
legitimate either? False positives, or worse still false negatives?

------
aidenn0
Given that it's SQLite, this is likely a compiler bug. However, the code given
is insufficient to demonstrate a compiler bug. Given:

    
    
          c = pMem->flags;
          sqlite3VdbeMemRelease(pMem);
          pMem->flags = MEM_Str|MEM_Term|(c&(MEM_AffMask|MEM_Subtype));
    

You could have say:

    
    
        sqlite3VdbeMemRelease(struct foo pMem) {
            *(some_other_type *)&pMem->flags = bar;
        }
    

In which case the C aliasing rules would allow the compiler to assume that the
assignment through "some_other_type" does not affect the assignment through
whatever type pMem->flags is.

I have seen this bug happen before, when compilers got better at inlining,
where it was something like

    
    
       void getAddressOfSomething(intptr_t *address);
       ...
       char *p;
       getAddressOfSomething((intptr_t *)&p)
       *p=foo
    

The compiler could reorder the \\*p=foo line to be before the
getAddressOfSomething call for the same reason.

TL;DR: Turn of strict aliasing via compiler flags (-fno-strict-aliasing on
gcc) if you ever type-pun anywhere without using a union.

~~~
chx
> Turn of strict aliasing via compiler flags (-fno-strict-aliasing on gcc) if
> you ever type-pun anywhere without using a union.

That's the topic of Gary Bernhardt's (of JS WAT and Execute Program fame) very
important "A Case Study in Not Being A Jerk in Open Source".
[https://www.destroyallsoftware.com/blog/2018/a-case-study-
in...](https://www.destroyallsoftware.com/blog/2018/a-case-study-in-not-being-
a-jerk-in-open-source)

~~~
schlupa
What an utter bullshit article. If people can't see the technical merit of an
article past the swear words, they should stop working and join their convent
of choice. Weaklings.

~~~
saagarjha
I think that is the entire point of the article.

------
fctorial

          c = pMem->flags;
          sqlite3VdbeMemRelease(pMem);
          pMem->flags = MEM_Str|MEM_Term|(c&(MEM_AffMask|MEM_Subtype));
    

'pMem->flags' is a u16[1]. Shouldn't it be copied to 'c'. How can
'sqlite3VdbeMemRelease' alter the value of 'c'.

[1][https://github.com/smparkes/sqlite/blob/8caf9219240123fbe6cf...](https://github.com/smparkes/sqlite/blob/8caf9219240123fbe6cff67b1e0da778c62d7621/src/vdbeInt.h#L148)

~~~
raverbashing
You're right.

But for some reason, it's thinking it can just reread the flags from the
structure after that function, which has a non-const pointer

void sqlite3VdbeMemRelease(Mem *p);

I'll be waiting for the excuses of the "C standard evangelists" and the
extreme optimization people on how the SQLite developers are wrong on this
case and how it's funny to pull the rug under them.

~~~
account42
> But for some reason, it's thinking it can just reread the flags from the
> structure after that function, which has a non-const pointer

The pointer being const would not give the compiler any more guarantees that
would allow it to reorder the read.

------
eyegor
Holy cow, this happened at -O1. This doesn't seem like the sort of
optimization that should be possible at such a low level. I've run into plenty
of trouble with higher level optimization flags in compilers before, but this
is wild.

------
pwagland
I see several references in this thread to "a bug in Clang that is already
fixed", but I can't see anywhere where anyone references this bug.

Can anyone point to the bug report (and/or fix) in Clang?

~~~
danhor
[https://bugs.llvm.org/show_bug.cgi?id=46195](https://bugs.llvm.org/show_bug.cgi?id=46195),
it's now linked in the sqlite bug

------
nurettin
Why would OSFuzz send the bug report to SQLite instead of Clang?

~~~
zmodem
OSSFuzz is an automated system.

~~~
klysm
More generally, bug reports frequently get filed to a project that uses the
system that actually has the bug as dependency - even with non-automated
systems.

------
hoseja
So... if I'm understanding correctly, you're accessing released memory? How is
this anything but undefined behaviour?

~~~
jabl
If I understand correctly, sqlite3VdbeMemRelease doesn't free pMem nor
pMem->flags, just some other fields in the pMem struct.

~~~
KMag
You're almost certainly reading that correctly, that pMem is a pointer to a
pointer to a block of memory, plus some metadata for that block of memory, and
sqlite3VdbeMemRelease either releases the memory back to the application-level
pool, frees it back to the memory allocator (glibc
malloc/jremalloc/tcmalloc/dmaloc/etc.), or directly brk()/mummap()s it
directly back to the OS.

I have no experience with sqlite internals, but it's pretty clearly doing some
application-level accounting on top of the normal memory management.

