
Fair enough.

I had some trouble using the Ubuntu 13 packages for gdc, so I downloaded the latest binaries available from the gdc project, as recommended by the readme.

Using that to compile warp with gdc, with the flags it suggests (-release is not recognized by gdc; -O3 is), I get a warp that works.

For including every file in /usr/include/boost/*.hpp in one .cc file (which produces roughly 16 megabytes of C++ code), we get:

[dannyb@mainserver 12:40:56] ~ :) $ time gcc -E e.cc >f

  In file included from e.cc:101:0:
  /usr/include/boost/spirit.hpp:18:4: warning: #warning "This header is deprecated. Please use: boost/spirit/include/classic.hpp" [-Wcpp]
   #  warning "This header is deprecated. Please use: boost/spirit/include/classic.hpp"
    ^
gcc -E e.cc > f 3.18s user 0.25s system 97% cpu 3.528 total

[dannyb@mainserver 12:40:51] ~ :) $ time clang -E e.cc >f

  In file included from e.cc:101:
  /usr/include/boost/spirit.hpp:18:4: warning: "This header is deprecated. Please use: boost/spirit/include/classic.hpp" [-W#warnings]
  #  warning "This header is deprecated. Please use: boost/spirit/include/classic.hpp"
     ^
  1 warning generated.
clang -E e.cc > f 1.42s user 0.14s system 93% cpu 1.657 total

[dannyb@mainserver 12:40:33] ~ :( $ time ./warp/fwarpdrive_gcc4_8_1 -I/usr/include -I/usr/include/c++/4.8 -I/usr/include/x86_64-linux-gnu/c++/4.8 -I/usr/include/x86_64-linux-gnu -I/usr/lib/gcc/x86_64-linux-gnu/4.8/include/ -I/usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed/ e.cc >f

  /usr/include/boost/spirit.hpp(18) : warning: "This header is deprecated. Please use: boost/spirit/include/classic.hpp"
./warp/fwarpdrive_gcc4_8_1 -I/usr/include -I/usr/include/c++/4.8 e.cc 2.88s user 0.06s system 95% cpu 3.080 total

I've repeated these timings 10 times, and they are within 0.5% of these numbers each time.

I've also tried this on a large C++ project I have that generates about 200 MB of preprocessed source (which I can't share, sadly) and got similar relative timings. I also tried it on some smaller projects. Based on the data I have so far, clang beats warp by a factor of 2 in most cases I've tried.

The above tests include stdout IO, but the relative numbers are the same without it:

[dannyb@mainserver 12:48:24] ~ :( $ time gcc -E e.cc -o f

  In file included from e.cc:101:0:
  /usr/include/boost/spirit.hpp:18:4: warning: #warning "This header is deprecated. Please use: boost/spirit/include/classic.hpp" [-Wcpp]
  #  warning "This header is deprecated. Please use: boost/spirit/include/classic.hpp"
    ^
gcc -E e.cc -o f 3.14s user 0.27s system 99% cpu 3.418 total

[dannyb@mainserver 12:48:33] ~ :) $ time clang -E e.cc -o f

  In file included from e.cc:101:
  /usr/include/boost/spirit.hpp:18:4: warning: "This header is deprecated. Please use: boost/spirit/include/classic.hpp" [-W#warnings]
  #  warning "This header is deprecated. Please use: boost/spirit/include/classic.hpp"
     ^
  1 warning generated.
clang -E e.cc -o f 1.41s user 0.13s system 94% cpu 1.631 total

[dannyb@mainserver 12:48:40] ~ :) $

(I reordered this one so the timings appear in the same order as before.)

[dannyb@mainserver 12:47:38] ~ :( $ time ./warp/fwarpdrive_gcc4_8_1 -o f -I/usr/include -I/usr/include/c++/4.8 -I/usr/include/x86_64-linux-gnu/c++/4.8 -I/usr/include/x86_64-linux-gnu -I/usr/lib/gcc/x86_64-linux-gnu/4.8/include/ -I/usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed/ e.cc

  /usr/include/boost/spirit.hpp(18) : warning: "This header is deprecated. Please use: boost/spirit/include/classic.hpp"
./warp/fwarpdrive_gcc4_8_1 -o f -I/usr/include -I/usr/include/c++/4.8 e.c 2.93s user 0.02s system 99% cpu 2.953 total

Warp is definitely faster than GCC, though.




Just to make your results readable:

  gcc:   3.14s user 0.27s system 99% cpu 3.418 total
  clang: 1.41s user 0.13s system 94% cpu 1.631 total
  warp:  2.93s user 0.02s system 99% cpu 2.953 total
         2.31s (with recommended build settings)


Because of different #defines, Warp may take a very different path through header files than other preprocessors do. In fact, Warp doesn't have any predefined macros other than the ones required by the Standard. Hence, to compare it against cpp or clang's preprocessor, it needs to be driven with a command that -D defines each of their predefined macros.

There's a command (I forget at the moment what it is) that will tell cpp to list all its predefined macros. It's quite a few. You'll need to do that for clang to get an equivalent list, then drive Warp with that.

You'll be able to tell if it is taking the same path or not by using a diff on the outputs that ignores whitespace differences.
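As a rough illustration of that check, here is a minimal C++ sketch of a whitespace-insensitive comparison of two preprocessor outputs. The function names are hypothetical and the approach (dropping every whitespace character before comparing) is an assumption made for brevity, similar in spirit to `diff -w`. It can hide a real difference where whitespace separates tokens, so treat a match as a sanity check rather than proof.

```cpp
#include <cctype>
#include <string>

// Returns a copy of `text` with every whitespace character removed.
// (Hypothetical helper, not part of Warp or any other tool in this thread.)
std::string strip_whitespace(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (unsigned char c : text) {
        if (!std::isspace(c)) {
            out.push_back(static_cast<char>(c));
        }
    }
    return out;
}

// True when the two outputs are identical once all whitespace is ignored.
bool same_ignoring_whitespace(const std::string& a, const std::string& b) {
    return strip_whitespace(a) == strip_whitespace(b);
}
```

In practice you would read the two output files into strings and pass them to `same_ignoring_whitespace`; a line-oriented diff is more useful when you need to see where the paths diverge.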

The reason Warp doesn't predefine all that stuff is because every install of gcc has a different list, and it's completely impractical to try and keep up with all that.


I did, in fact, use warpdrive, which uses those predefines, as you can see in the commands.

I'm also familiar with the inner workings of LLVM and GCC (having hacked a lot on both), and generated the list of include paths I used with warpdrive (emulating GCC 4.8.1) to be exactly the same as the ones GCC 4.8.1 uses on my system.

I also verified the preprocessed output is "sane" in each case, as per diff.


>Using that to compile warp with gdc with the flags it suggests (-release is not recognized by gdc, -O3 is),

Andrei wrote on Reddit:

"We build warp at Facebook using gdc with -fno-bound-checks -frelease -O4."

http://www.reddit.com/r/programming/comments/21m0bz/warp_a_f...


This is not the set of flags you have in the makefile on github though. I would expect people to use those :)

In any case, building warp with these flags brings its time down to 2.31 seconds, so clang still takes about 40% less time (1.41 vs 2.31).
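Since "faster" can be read two ways, here is the arithmetic behind the 1.41 vs 2.31 comparison as a small C++ sketch. The function names are made up for illustration; the only inputs are the user times quoted in this thread.

```cpp
#include <cassert>

// Fraction of the slower tool's run time that the faster tool saves.
// For 1.41s vs 2.31s this is 1 - 1.41/2.31, roughly 0.39.
double time_saved_fraction(double fast_seconds, double slow_seconds) {
    assert(slow_seconds > 0.0);
    return 1.0 - fast_seconds / slow_seconds;
}

// How many times longer the slower tool runs than the faster one.
// For 1.41s vs 2.31s this is 2.31/1.41, roughly 1.64.
double speedup_ratio(double fast_seconds, double slow_seconds) {
    assert(fast_seconds > 0.0);
    return slow_seconds / fast_seconds;
}
```

So on these numbers clang takes about 39% less time than warp, which is the same thing as running about 1.64 times as fast.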

In any case, at least on my side, I don't have time to explore further. I'd love to see cases where warp is faster, but I haven't found them.

(There is also a certain irony in saying you don't post numbers because you get accused, then saying I used the wrong flags, but ...)


Thanks for doing this. I have read that clang uses some SIMD instructions to speed this up, and I don't know how much that contributes. Warp doesn't use any inline assembler.

And, as your numbers show, suggesting the change in compiler flags was entirely justified.


The SIMD usage is mainly for two things.

In the lexer, it is used for block comment skipping. It will find the end of block comments 16 characters at a time (on both PPC and x86).

During line number computation, it will also find newlines 16 characters at a time.

This could actually (nowadays) be done 32 characters at a time on newer processors, but isn't.
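To make the "16 characters at a time" idea concrete, here is a minimal C++ sketch of newline counting with SSE2. It is not clang's actual code: the function name is hypothetical, `__builtin_popcount` assumes GCC or clang, and the SIMD path is only compiled when `__SSE2__` is defined, with a plain byte loop covering the tail and any non-SSE2 build.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Counts '\n' bytes in [data, data + len). Hypothetical helper for illustration.
std::size_t count_newlines(const char* data, std::size_t len) {
    assert(data != nullptr || len == 0);
    std::size_t count = 0;
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128i newline = _mm_set1_epi8('\n');
    for (; i + 16 <= len; i += 16) {
        // Unaligned load, so `data` needs no particular alignment.
        const __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        // Matching bytes become 0xFF; movemask packs their top bits into 16 bits.
        const unsigned mask = static_cast<unsigned>(
            _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, newline)));
        count += static_cast<std::size_t>(__builtin_popcount(mask));
    }
#endif
    // Scalar loop for the remaining bytes (or everything, without SSE2).
    for (; i < len; ++i) {
        count += (data[i] == '\n') ? 1 : 0;
    }
    return count;
}
```

Moving to 32 bytes per step would mean swapping in the AVX2 equivalents of the same three operations; whether that pays off depends on the input.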


This is flat-out fascinating. Thanks.


64 characters with AVX-512? =)


I didn't check to see if the instructions exist, but possibly :)

You do start to hit two issues, though, as you increase the size of the skipping:

1. Alignment
2. If the average block comment/line is < 64 characters, you may lose more time performing the instruction and then counting the trailing zeros in the result to find the place it ended.

I have no numbers to back up whether this matters, of course :)


AVX-512 does not seem to have PMOVMSKB, which is how I assume it is being done with SSE2. There are other ways to skin that cat, but it's unclear whether they have any advantage over using AVX2 with VPMOVMSKB.
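For reference, the SSE2 pattern being assumed here (compare, PMOVMSKB, then count trailing zeros) looks roughly like the following C++ sketch. It is an illustration, not clang's code: the function name and signature are hypothetical, `__builtin_ctz` assumes GCC or clang, and a scalar loop handles the tail and non-SSE2 builds. It scans for '/' and checks that the previous byte is '*'.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// `start` is the index of the first byte after the opening "/*".
// Returns the index just past the closing "*/", or npos if there is none.
const std::size_t npos = static_cast<std::size_t>(-1);

std::size_t find_block_comment_end(const char* data, std::size_t len,
                                   std::size_t start) {
    assert(data != nullptr || len == 0);
    if (start >= len) return npos;
    // A closing '/' must have a '*' before it that is part of the body,
    // so the earliest candidate position is start + 1.
    std::size_t i = start + 1;
#if defined(__SSE2__)
    const __m128i slash = _mm_set1_epi8('/');
    for (; i + 16 <= len; i += 16) {
        const __m128i chunk =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(data + i));
        // PMOVMSKB: one bit per byte, set where the byte equals '/'.
        unsigned mask = static_cast<unsigned>(
            _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, slash)));
        while (mask != 0) {
            // Trailing-zero count gives the offset of the next '/'.
            const unsigned bit = static_cast<unsigned>(__builtin_ctz(mask));
            if (data[i + bit - 1] == '*') return i + bit + 1;
            mask &= mask - 1;  // clear the lowest set bit and keep looking
        }
    }
#endif
    for (; i < len; ++i) {
        if (data[i] == '/' && data[i - 1] == '*') return i + 1;
    }
    return npos;
}
```

The inner `while` loop is where short comments hurt: each candidate costs a trailing-zero count plus a byte check, which is the overhead mentioned above for inputs where matches come sooner than the vector width.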


I posted a patch here: https://gist.github.com/dberlin/9867614

It adds AVX2 and SSE4.2 instruction support. It makes no discernible difference performance-wise that I can find :)


Heh, awesome! I'll try it out today.


I'm curious how much of an effect this has.


I forgot to ask: are you compiling Warp as a 64-bit executable? Use -m64 if you're not sure.



