
Rebuild of the Debian archive with clang - yungchin
http://clang.debian.net/
======
lysium
Looks like the most comprehensive list of differences between clang and gcc.
I'm amazed.

Apparently, most differences are either because clang and gcc use different
standards or interpret them differently.

Also, only 9% of the Debian packages have issues, meaning clang is getting
more and more worth considering.

~~~
valisystem
If you put basic standards compliance aside, the remaining errors range from
arguable code to horrifying coding practices.

[http://clang.debian.net/status.php?version=3.0&key=VARIA...](http://clang.debian.net/status.php?version=3.0&key=VARIABLE_LENGTH_ARRAY)

[http://clang.debian.net/status.php?version=3.0&key=NON-P...](http://clang.debian.net/status.php?version=3.0&key=NON-POD)

~~~
aidenn0
Really I never got the argument for disallowing variable-length arrays at the
end of a C structure. I completely agree with disallowing non-PoD variable-
length arrays as well as variable-length arrays in the middle of a structure
though.

~~~
nknight
The entire concept of variable-length C arrays is at best iffy, but including
them in structs is pretty crazy.

Consider these two questions:

1) What is the sizeof a struct containing a variable-length array?

2) How do you create an array of structs containing variable-length arrays?

~~~
ori_b
1) The sum of the sizes of the padding and fixed-size members of the struct. In
the context of C, this is expected and fairly sane. It also matches the C89
idiom of ending a struct with a single-element array when you want a
variable-length array. If C allowed zero-element arrays, they would be used
instead.

    
    
        /* c89 */
        struct MyVLAStruct {
            int len;
            int vla[1]; /* really len elements long */
        };
    

2) Carefully.
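
For readers following along, here is a minimal sketch of point 1 using the C99 spelling of the same idiom, a flexible array member. The names `int_vec` and `int_vec_new` are made up for illustration and do not come from any package in the rebuild. It assumes a C99 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* C99 flexible array member: the array must be the last member. */
struct int_vec {
    int len;
    int data[];   /* really len elements long */
};

/* Hypothetical helper: allocate the fixed part plus room for n trailing ints. */
struct int_vec *int_vec_new(int n)
{
    assert(n >= 0);
    /* sizeof *v covers only the fixed members and padding, not data[]. */
    struct int_vec *v = malloc(sizeof *v + (size_t)n * sizeof v->data[0]);
    if (v != NULL) {
        v->len = n;
        for (int i = 0; i < n; i++)
            v->data[i] = 0;
    }
    return v;
}
```

Because `sizeof(struct int_vec)` ignores the trailing elements, objects like this normally live on the heap and are released with a single `free()`.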

~~~
nknight
> _2) Carefully._

Can you provide code demonstrating how you will "carefully" create a C array
of structs with a variable-sized member? It will be very educational for me,
at least, and I think others, as well.

~~~
ori_b
Well, in the normal case, you wouldn't do it. These variable length structures
need to be created on the heap to be able to be used in a variable length way,
for the most part, so you'd just put a pointer to them into an array.

However, I can think of a couple of methods, such as packing into an array,
and using a second one to index it, like so:

    
    
        a   = [aaaa,bb,ccc,dd]
        idx = [0,4,6,9]
    

To get to the i'th element of a, accesses would go through idx like so:
a[idx[i]]. In general, of course, there's no way to allow O(1) access and
updates without occasional repacking.
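
A rough C sketch of the packed-array-plus-index layout described above, for read-only access. The names `packed`, `idx` and `record_at` are hypothetical, and the extra final offset (11) is an assumption added here so that each record's length can be recovered; it is not part of the example as posted.

```c
#include <assert.h>
#include <stddef.h>

/* Variable-length records packed back to back: aaaa, bb, ccc, dd. */
static const char packed[] = "aaaabbcccdd";

/* idx[i] is the offset where record i starts.
 * Assumption: one extra trailing entry marks the end of the last record. */
static const size_t idx[] = { 0, 4, 6, 9, 11 };

#define RECORD_COUNT (sizeof idx / sizeof idx[0] - 1)

/* Return a pointer to record i and store its length in *len. */
const char *record_at(size_t i, size_t *len)
{
    assert(i < RECORD_COUNT);
    *len = idx[i + 1] - idx[i];
    return &packed[idx[i]];
}
```

Lookups stay O(1), but growing any record other than the last means shifting everything after it and rewriting the offsets, which is the repacking cost mentioned above.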

~~~
nknight
This is exactly the problem I'm getting at: you're not actually working within
the confines of C here. You're creating funny workarounds for the fact that
certain C features don't work how you want them to, and you're violating the
type system in the process.

The concept of VLAs does not fit the language well; they're inherently
something of an anomaly. Allowing them in structs would simply multiply the
anomalies.

I'm frankly shocked that they were codified in C99 at all, rather than
codifying something akin to alloca() with implementation-defined behavior, but
I'm infinitely grateful the committee did not elect to make them anything more
than they are -- which is a semi-portable mechanism for allocating arbitrary
amounts of automatic memory.
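
As an illustration of that last point, a small sketch of a C99 VLA used purely as automatic scratch memory. The function `sum_of_squares` and the 1024-element bound are assumptions made for the example, not anything taken from the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sums i*i for i in [0, n) via a VLA scratch buffer. */
long sum_of_squares(size_t n)
{
    /* A VLA lives on the stack and has no failure path, so bound n. */
    assert(n > 0 && n <= 1024);

    long scratch[n];  /* C99 VLA: length chosen at run time */

    /* Unlike other operands, sizeof on a VLA is evaluated at run time. */
    assert(sizeof scratch == n * sizeof scratch[0]);

    for (size_t i = 0; i < n; i++)
        scratch[i] = (long)(i * i);

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += scratch[i];
    return total;  /* scratch is released automatically on return */
}
```

The run-time `sizeof` is one of the anomalies in question: everywhere else in C, `sizeof` yields a compile-time constant.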

------
rbanffy
This is seriously amazing and will certainly improve all the codebases
involved. Sylvestre and the others involved deserve a lot of good karma for
this.

Going a step further, wouldn't it be great if all packages had automated tests
that could easily be run on the 91% of the packages that were successfully
built?

~~~
sylvestre
Thanks. I appreciate it! :)

~~~
rbanffy
Thank _you_.

------
viraptor
If there was ever an easy way to get involved in open source development in
general, this is it. It's pretty much a list of trivial 1-line bugs to fix!

~~~
udp
And patches to generate, mailing lists to find and emails to write. If only
every open source project were on GitHub.

~~~
viraptor
That's also something I meant by getting involved in open source. I'm sure for
some projects you'll have to figure out their local rules, and sometimes maybe
even explain to the developers what clang is, why the change is required to
support it and why they should care. It's the whole experience, not only the
code change ;)

------
unwind
Amazing list.

I was frustrated that I didn't manage to figure out how to locate the results
for a given package, if there were any.

Without that, how should I (as a package upstream owner) know if I need to fix
my code, or at least analyze the results with respect to my particular
package?

~~~
maxerickson
The build logs are all together:

<http://clang.debian.net/logs/2012-01-12/>

(I guess the ones with a 'b' appended are for clang)

~~~
unwind
Fantastic, just what I was after. Thanks a lot!

And phew, my package was not listed. :)

------
tomdeakin
Would be really interested to see what caused those segfaults! But the link in
the table just goes to the table.

~~~
nitrogen
Here's one from one of the logs
([http://clang.debian.net/logs/2012-01-12/libgtkada2_2.14.2-5_...](http://clang.debian.net/logs/2012-01-12/libgtkada2_2.14.2-5_lsid64b.buildlog)):

    
    
      Building libgtkada.so.2.14.2
      cd obj-shared; x86_64-linux-gnu-gcc -shared -fPIC -Wl,--as-needed \
      	  -o libgtkada.so.2.14.2 -Wl,-soname,libgtkada.so.2.14.2 glib*.o gdk*.o \
      	  gtk*.o pango*.o misc.o misc_broken.o -lgtk-x11-2.0 -lgdk-x11-2.0 \
      	  -latk-1.0 -lgio-2.0 -lpangoft2-1.0 -lpangocairo-1.0 \
      	  -lgdk_pixbuf-2.0 -lcairo -lpango-1.0 -lfreetype \
      	  -lfontconfig -lgobject-2.0 -lgmodule-2.0 \
      	  -lgthread-2.0 -lrt -lglib-2.0   -lgnat -lX11
      Segmentation fault
    

In this case it looks like the compiler is crashing when invoking the linker
(I'm assuming that x86_64-linux-gnu-gcc has been aliased to clang).

The others:
[http://clang.debian.net/status.php?version=3.0&key=SEG_F...](http://clang.debian.net/status.php?version=3.0&key=SEG_FAULT)

------
frownie
However, this doesn't prove that clang produces error-free executables, only
that the compilation goes through.

~~~
stingraycharles
No one can ever prove such a thing. Neither can it be proven that gcc, or any
compiler for that matter, produces error-free executables. What this does,
however, is provide information and analysis about the quality of the clang
compilation process as compared to gcc.

~~~
ars
No one can prove it, but at least the GCC versions are actually run by people;
these versions are created, but never actually used.

He should do some fuzz testing of these programs, with the exact same fuzz
sent to the GCC versions, and then report any differences. But collecting the
"results" would be hard - it's not always visible in output, you'd have to
track system calls and IO.

~~~
rbanffy
The ideal scenario would be if upstream projects provided embedded test code
with standardized hooks so that tests could be built, executed and results
collected in an automated way.

Even if we started with a small test set for some projects, it would be a huge
win in the long run just to have this scaffolding in place.

Any ideas on how to make a distro-agnostic testing hook?

~~~
johnpaulett
It does not necessarily have to be distro-agnostic.

Debian packages can already hook the upstream's test suite (e.g. via
dh_auto_test).

From my (extremely limited and mostly dynamic language) Debian packaging
experience, it seems that more often than not, packages do not use this
existing hook. Not sure why that is though.

------
lhnn
Why is everyone itching to get off of GCC? Or is it all just posturing, to get
GCC to work harder now that it has competition?

~~~
getsat
GCC is becoming larger and more unmaintainable with each release. Competition
in this space has been pretty sorely lacking for quite some time. The OpenBSD
folks are also pushing PCC as a GCC alternative.

It'll be nice to have a BSD-licenced GCC equivalent for distribution with
FreeBSD/OpenBSD, too.

Clang also has REALLY great error messages compared to GCC. It will tell you
exactly where you forgot a comma, semicolon, quote, etc. instead of randomly
pointing to some line around the error in question.

~~~
cliffbean
It is useful to keep in mind a common confusion.

Apple and various BSDs stopped updating GCC around version 4.2, the last
GPLv2 version, which is approximately 4 years old now. For people who only
develop on those platforms, this is what "GCC" means. So when someone from
Apple or someone with an obvious BSD bias talks about how much better clang is
than "GCC", they're usually talking about how much better clang is than an old
unmaintained version of GCC.

clang does have strengths, but when we make comparisons, we should be clear
about what it is we're comparing.

~~~
nknight
Unless there's been a concerted effort in the last ~year or so to improve
GCC's error messages and compilation speed, clang's primary technical
strengths remain unchanged relative to GCC.

Saying we're comparing old versions isn't really useful unless the new
versions have actually addressed the relevant issues.

~~~
cliffbean
Recent versions of GCC do, in fact, have improved error messages. For example,
GCC 4.6 fixed a problem that the post I was replying to mentioned, where a
missing semicolon after a struct definition would cause GCC to emit an error
message pointing to some other nearby line.

If you're comparing clang to just GCC 4.2, then say that. If you've compared
it with recent versions of GCC too, then say that.

~~~
adbge

        $ gcc -v
        ...
        gcc version 4.6.2 20120120 (prerelease) (GCC)
    

When it comes to diagnostics, GCC isn't even competitive.

