Include-what-you-use: Clang tool to analyze includes in C and C++ source files (include-what-you-use.org)
151 points by ingve on Jan 23, 2016 | 40 comments



I would love if imperfect #includes (what I understand this tool is supposed to point out) simply became a compile warning and part of the toolchain. I would love such a feature and would use it a lot.

However wrt this project I just looked at the "Instructions for Users" and it tells me how to build it and that I need to observe this and that and a lot of other things for it not to go wrong. Sorry, I just don't have the time for that. Why not just provide binaries for the most important platforms?



If you're using Ubuntu, it's this package:

sudo apt-get install iwyu

I agree the name isn't clear; I also wasted time downloading and compiling stuff until I realized it was in apt after all.

On my VS 2015, in C#, the IDE grays out unused includes, so Microsoft has at least ~something. I'm hoping these tools are just a first step towards mostly automatic #include management.


I'm trying to get into refactoring (and even semantic analysis and code generation) for large C++ codebases using libclang, libtooling, and their ilk.

Any guides to get started, and/or best practices?

Does it increase productivity by an order of magnitude? (I'd like to think so. I think it's madness not to use any AST tool when working on large codebases.)

Also I'm assuming "white-box" C/C++ tools like libclang/libtooling are much more powerful than black-box ones (like eclipse CDT), because of extensibility/programmability. Any comments, experiences?

What I'm trying to say is that a non-IDE but programmable editor (vim/emacs) combined with programmable libclang/libtooling would trump a prepackaged IDE (like Visual Studio or Eclipse+CDT) by a wide margin. (Of course, there would be quite a bit of programming involved.)


LibreOffice has a pile of clang plugins for analysing their huge and somewhat smelly C++ codebase. They've caught and repaired some horrible code with these. Pretty sure they ran iwyu over it as well.

howto: https://wiki.documentfoundation.org/Development/Clang_plugin...

code: http://cgit.freedesktop.org/libreoffice/core/tree/compilerpl...


I think you'll enjoy this talk: CppCon 2015: Atila Neves "Emacs as a C++ IDE" - https://m.youtube.com/watch?v=5FQwQ0QWBTU


Great talk.

So he created cmake-ide for emacs. I'm wondering if there is an equivalent for make projects.

EDIT 1: The github page [1] says "It automatically detects Ninja and Make builds and sets the compile command accordingly". I'm wondering if that means make is fully supported.

[1] https://github.com/atilaneves/cmake-ide


The main feature that I miss from CLion is the ability to refactor. In CLion you can rename a class, and all of its usages, declarations, etc. will be renamed.

The most useful thing CLion can do is change the signature of a function, and its corresponding implementation and header declarations will be changed. So, you can rename a parameter and its declaration and implementation will be renamed for you, or you can delete a parameter, or change the type of a parameter, and CLion will change all its usages/definitions/declarations for you. All of this takes effect across all files in your project, so if you're in a .cpp file and you change the function signature, its declaration in the corresponding .h file will change as well.

(I believe you can do all of this in Eclipse as well)

Sadly I have yet to see a vim/emacs configuration that gets anywhere near the level of Eclipse/CLion.


>and all of its usages, declarations, etc. will be renamed.

Along with header include statements in any file that referenced the class. Say I refactored class Foo in include/mylib/foo.hpp, and it's included in src/mylib/foo.cpp as <mylib/foo.hpp>: CLion happily "refactors" that to #include "foo.hpp". I spent the same amount of time fixing this stuff as I would have spent manually renaming the usages...


As a new IDE user, I was very excited about the language-aware "rename" refactoring ability. But I realized that the variables I rename are mostly badly chosen names in IDE code-generated boilerplate, i.e. names of widgets inserted into forms. When I'm writing new code in vim I just choose good names to begin with, so it's a bit of a wash for me.

I hope there is room in my brain to stay good at vim despite using VS at work.


Make yourself familiar with the sanitizer tools from gcc and clang (asan, ubsan, tsan, msan). They can help you find many types of bugs.


I've been using deheader (http://www.catb.org/esr/deheader/) for this.

It detects unnecessary inclusions and also warns about missing headers required for cross-platform compatibility.


I wrote a tool to do this in 10 minutes just several days ago. It iterates over a file, and removes one #include at a time (while keeping all the other headers in place), and each time builds a file. For any successful build, it reports that header.

I used this to prune unnecessary includes throughout the TXR project.

There are some obvious false positives, like includes wrapped in #if/#ifdef and also certain system headers which look unnecessary on one system, but are actually required on others (due to the mess that is called POSIX).

After I applied some of the header removals, I pushed the changes to git and went through all the supported platforms to validate the changes. Almost every platform had some problem which resulted in some header having to be put back: MinGW, Cygwin, Solaris, Mac OS X. For instance, Mac OS is a stickler for needing <signal.h> to declare the kill() function. On other platforms, it also shows up elsewhere, perhaps unistd. On glibc, you get va_list if you include <stdio.h>. Some platforms guard against this; they use some internal typedef name instead of va_list for vfprintf, so you must include <stdarg.h> for va_list. Various problems of this type. So just because a tool tells you, "hey this compiles without <stdarg.h> just fine", that of course means "well, on this system".

   #!/usr/local/bin/txr
   @(next :args)
   @file.c
   @(next `@file.c`)
   @(collect)
   @  (some)
   @lines
   @  (and)
   @    (line linenum)
   #include @header
   @  (end)
   @(end)
   @(require (boundp 'linenum))
   @(do (rename-path `@file.c` `@file.c.bak`))
   @(try)
   @  (do (each ((rem-line linenum)
                 (rem-hdr header))
            (with-stream (s (open-file `@file.c` "w"))
              (tprint (partition* lines (pred rem-line)) s))
            (ignerr (remove-path `opt/@file.o`))
            (when (zerop (sh `make opt/@file.o > /dev/null 2>&1`))
              (put-line `@file.c:@{rem-line}: can remove @{rem-hdr}`))))
   @(finally)
   @  (do (rename-path `@file.c.bak` `@file.c`))
   @(end)
The output is in a form that looks like compiler diagnostics. I can stick it in errors.err and run "vim -q".

This simple tool works in conjunction with the project rule that headers don't include other headers. The .c files include everything they need in the right order. But the inclusions are added by hand by copy and paste, which can pull in something that isn't actually needed.

You have to iterate the tool. If you remove some header which isn't needed, a second pass can then determine that yet another header isn't needed, because it was only needed by that which was just removed. The iteration isn't worth building into the code.
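For readers who don't know TXR, here is a rough Python sketch of the same idea with the iteration folded in. It is not a translation of the script above: it applies each successful removal immediately and rescans, whereas the script only reports candidates. The `builds` callback and the function name are made up for illustration; in practice `builds` would write the candidate lines to disk and run something like the `make` command in the script.

```python
import re
from typing import Callable, List

# Matches both #include <x.h> and #include "x.h" and captures the header name.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]')


def removable_includes(lines: List[str],
                       builds: Callable[[List[str]], bool]) -> List[str]:
    """Return headers whose #include line can be dropped without breaking the build.

    `builds` is a hypothetical callback: it receives candidate source lines
    and returns True if they still compile on the current system.
    """
    current = list(lines)
    removed: List[str] = []
    changed = True
    while changed:                      # repeat passes until nothing more can go
        changed = False
        for i, line in enumerate(current):
            match = INCLUDE_RE.match(line)
            if not match:
                continue
            candidate = current[:i] + current[i + 1:]
            if builds(candidate):       # still compiles without this one header
                removed.append(match.group(1))
                current = candidate
                changed = True
                break                   # restart the scan on the shortened file
    return removed
```

The same caveat applies as above: a header this reports as removable is only removable "on this system", so the result still has to be checked on every supported platform.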


Personally I don't like this. Often enough there are includes in headers that are an implementation detail of the functionality in the originally included header. IWYU will just include those directly, increasing coupling between components.


Looks like you can use flags on the fix_includes.py script to avoid that, so something like:

    fix_includes.py --ignore_re '^boost' < /tmp/out


You can use the IWYU "export" pragma to inform IWYU that this is what you're doing. https://github.com/include-what-you-use/include-what-you-use...


Not a particularly good solution imo, because that means modifying random libraries I use. Besides adding verbiage.

I can see value in removing unneeded includes, and doing that supervised every once in a while. The way it wants to add includes? Not so much.


I don't think the program does that, if I understand your post. A more accurate name might be "exclude what you don't use." It doesn't add in chained includes, it just lets you know about superfluous includes.


I was pretty sure it does that, since people in projects I work on use it. To make sure that I'm correct, I just ran it on some code I work on, and indeed, it wants to add (...c should add these lines ... ) implementation details to a lot of source files.


I guess this can be a problem for cross platform code, yes?


Or even compatibility with different versions of the same compiler, runtime or libraries.


I don't get it, can you give an example?


Big libraries like Boost and OpenCV usually have meta-headers for features and components that just include all the other headers for that particular part you want to use.

(e.g. https://github.com/Itseez/opencv/blob/master/include/opencv2...)

A tool like this would see right through that and walk the tree all the way to the final leaf headers that actually implement the parts you use. This not only means one single include will balloon to many smaller ones, but when the library moves things around internally your code will break.

(Unclear if this is what this tool in particular will do, but I've certainly seen this behavior with CLion, for example, when it recommends what header to include.)


The obvious one would be #include <windows.h>. I have never checked to see how big that file actually is, but it's nowhere near what you would expect based on the functions MSDN says you need to #include it for. The truth is, it #includes several other files, because Microsoft, in its infinite wisdom, decided that, for the most part, programmers don't have to know what file a declaration is actually in; they just need to remember #include <windows.h>.


I have made a similar tool that is a lot more hacky, but does not require compilation or preprocessing of the source (and is thus not reliant on any single compiler): https://github.com/orlp/iwyu.

It also does nothing smart, it is user trained. Every symbol prefixed with a namespace you are interested in will be presented to you, and you must tell it where to find the symbol. Any other symbols will be ignored.
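To illustrate the "user trained" idea, here is a minimal Python sketch. It is not taken from the linked tool; the function name and the shape of the `known` table are assumptions made for the example. It scans the source text for symbols qualified with a namespace of interest, looks each one up in a table the user has filled in, and collects the ones it would still have to ask about.

```python
import re
from typing import Dict, Iterable, Set, Tuple


def classify_symbols(source: str, namespaces: Iterable[str],
                     known: Dict[str, str]) -> Tuple[Set[str], Set[str]]:
    """Split namespace-qualified symbols into resolved headers and unknowns.

    `known` is the hypothetical "training" table, e.g.
    {"std::vector": "<vector>"}. Symbols outside `namespaces` are ignored.
    Only one level of qualification is matched, so "std::chrono::seconds"
    is seen as "std::chrono"; that is a limitation of this sketch.
    """
    ns_alternatives = "|".join(re.escape(ns) for ns in namespaces)
    symbol_re = re.compile(r'\b(?:' + ns_alternatives + r')::\w+')
    headers: Set[str] = set()
    unknown: Set[str] = set()
    for symbol in symbol_re.findall(source):
        if symbol in known:
            headers.add(known[symbol])      # already taught: we know the header
        else:
            unknown.add(symbol)             # would be presented to the user
    return headers, unknown
```

Because this is plain text matching, it needs no compiler, but it will also pick up symbols that appear in comments or string literals.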



Actually it would be nice to see that functionality as a compiler warning.


With CMake it is possible to run the tool as part of the normal build process [1]. Not as nice as first class compiler support, but still quite helpful as part of day to day development, I find.

[1] http://stackoverflow.com/questions/30951492/how-to-use-the-t...


I find that a strict coding style can make this easier to implement, and more likely to restrict warnings to the code that you directly control.

For example, I used a module prefix consistently in my code so that any occurrence of an "Xyz_" prefix in a file would pretty much guarantee that "Xyz.h" is required to compile. I also put all local #include references in the same part of the file under a predictable comment header. That way, I didn't need a C++ parser; I just needed to know where to start searching for the list of headers, infer prefixes from their names, and complain about prefixes that were not found in the rest of the file. This has worked surprisingly well, and I can tune the script to skip modules for any reason.

This is also important for the reverse case. I don't really need an IDE or ctags to figure out what file is needed for a #include in most cases because I have the name of the function/type/constant/etc. to guide me. Similarly, I know exactly what file to open in my editor without using magic tricks. (Although, code completion and jump-to-definition are still quite helpful, especially for things like OS calls that may follow no particular convention.)
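A check along these lines can be sketched in a few lines of Python. This is only an illustration of the convention described above, not the commenter's script: it assumes everything declared in "Xyz.h" carries an "Xyz_" prefix, and for simplicity it scans every local #include in the file rather than looking for a particular comment header. The function name is made up.

```python
import re
from typing import List

# Local includes only: #include "Something.h" (angle-bracket includes are skipped).
LOCAL_INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)\.h"', re.MULTILINE)


def unused_local_includes(source: str) -> List[str]:
    """List locally included headers whose module prefix never appears in the file.

    Assumes the naming convention: symbols from "Xyz.h" start with "Xyz_".
    """
    unused = []
    for match in LOCAL_INCLUDE_RE.finditer(source):
        module = match.group(1).split("/")[-1]      # "ui/Xyz" -> "Xyz"
        prefix_re = re.compile(r'\b' + re.escape(module) + r'_\w')
        if not prefix_re.search(source):            # no "Xyz_..." anywhere
            unused.append(module + ".h")
    return unused
```

The same prefix table works in the other direction: seeing "Xyz_" in a file with no matching #include "Xyz.h" points at a missing include.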


Hmm, I happen to know a colorful-logo company that has an internal unrecommended tool that has the same name and does the same thing :) Are they making this public?


I'm pretty sure we open sourced it in 2011 actually ....

(I don't think this version has been kept up to date with the internal one, but I could be wrong)



The first sentence reads: "'Include what you use' means this: for every symbol (type, function variable, or macro) that you use in foo.cc, either foo.cc or foo.h should #include a .h file that exports the declaration of that symbol."

For clarity and correctness, I think it should be changed to: "'Include what you use' means this: for every symbol (type, function variable, or macro) that you use in foo.cc that is not declared in foo.cc, either foo.cc or foo.h should #include a .h file that exports the declaration of that symbol."

Not having that phrase tripped me up for a few minutes. I guess everyone else thinks it's implied, but for some reason that wasn't clear to me until I read through more of their documentation.


There is also: "don't include anything except for one header, and let a tool manage what is in it based on analyzing your code":

Makeheaders:

http://www.hwaci.com/sw/mkhdr/


I've used this for years. It takes a bit of work to get it working well on a codebase, but I find it worthwhile.

Refactoring often involves moving functionality between header files, which can mean fixing up the includes in a large number of other files. IWYU makes this a lot easier. IWYU does require the code to build before it works, but this can usually be easily gotten after a refactor by making one header file temporarily include the other.

I have, a couple times, modified templated code so it wouldn't confuse IWYU, as that is easier than maintaining pragmas correcting IWYU in the files that call the templated code.



How about a tool to "exclude what you don't use"?

I have several personal hacks for doing this I have written over the years but I've yet to find anyone else who tried to automate it.

For example, one task is to determine the functions in a library that are not actually used in your program and exclude those from linking, instead of blindly linking libraries full of unused functions (that sometimes cause name conflicts).


The LibreOffice project has been using this a lot to refactor a large, ancient codebase. They've also developed a lot of Clang plugins, which can be found here: http://cgit.freedesktop.org/libreoffice/core/tree/compilerpl...


This seems like an awesome tool. I was thinking about going through the code base of a hobby project a while ago, and this tool could make that a much easier task. I haven't tried following the instructions yet, but they seem to be correct. Good work!




