
Facebook Open Sources flint, a C++ linter written in D - andralex
https://code.facebook.com/posts/729709347050548/under-the-hood-building-and-open-sourcing-flint/
======
haberman
It amazes me how far out of their way many people will go to avoid using
parsing tools.

He avoided using a lexer generator because "I'd figured using a lexer
generator was more trouble than it was worth". To him it was more convenient
to write a lexer manually (using macros) in C++, then later completely rewrite
it in a different language (D) as a one-off trie matcher code generator.

I am amazed that this could be thought of as less work. How can writing two
lexers from scratch manually be less trouble than writing a bunch of regexes
in a Flex or Ragel file? Especially since the result was likely slower than
using Flex or Ragel would have been?

To me the interesting question is: how much of this allergy to external tools
is:

1\. the trouble of learning/maintaining something new/different (inherent tool
overhead)

2\. design of the tool isn't as user-friendly as it could be (incidental tool
overhead)

3\. irrational belief that the tool will be more work/trouble than it actually
is (non-optimal decision making)

If part of the answer is (2), then improved tools will lead to greater
adoption. And hopefully more convenient tools will lead to them being better
known and more widely adopted, which should lessen (1) also.

Everyone uses regexes; using a lexer generator should (in my opinion) be as
easy as using a one-off regex. I think the key is to make the lexer generator
an embeddable library like regex libraries are.

~~~
WalterBright
Using third party lexer generators means your project is now dependent on that
tool. This means:

1\. the tool must exist on every platform your project will ever be ported to

2\. the same version of that tool must exist on every such platform

3\. if the hosted compiler changes, the tool must be updated in sync with that
compiler - out of sync means your project won't build

4\. if there are blocker bugs in the tool, you're not in a good position to
fix it. You're stuck with developing workarounds.

5\. Everyone building your project has to get the tool successfully installed.

The current build of D's runtime library has a dependency on libcurl.
Supposedly, since libcurl is ubiquitous, that shouldn't be any problem. But it
has been an ongoing sore spot, relentlessly sucking up multiple people's time
with exactly the issues I outlined above.

~~~
e12e
> Using third party lexer generators means your project is now dependent on
> that tool.

I would submit that porting D is significantly harder than porting flex and/or
ragel. Not to mention that the number of platforms that support C++ and D but
not flex should be smaller than the number of platforms that support C++ and
flex but not D?

~~~
WalterBright
It's easier to just write the lexer by hand than deal with those issues.

------
staunch
Once you get used to tools like `go fmt` or Perl::Tidy, it feels barbaric not
to be able to cleanly format all your code instantly. Read-only linters are
nice, but something that can reformat code is much nicer.

~~~
Scaevolus
Clang-format aims to provide this for C++.

[http://clang.llvm.org/docs/ClangFormat.html](http://clang.llvm.org/docs/ClangFormat.html)
[http://www.irill.org/videos/euro-llvm-2013/jasper-hires.webm](http://www.irill.org/videos/euro-llvm-2013/jasper-hires.webm)

~~~
jbergstroem
While reading the article, I also wondered why they didn't at least mention it
(and why they abandoned the idea).

It's powerful, fast, and will most likely be the base for a lot of {h,l}inting
in editors/IDEs moving forward.

edit: Somehow missed that they mentioned Clang not being mature enough when
starting this project.

------
doe88
Years ago I remember learning template metaprogramming in C++ from an amazing
book [1] by the same author as this tool. Even if, looking back, I now think
template metaprogramming can be a bit over the top when overused, I
nevertheless have no doubt, judging by its author, that this new tool must
have some very good qualities.

[1]
[http://en.wikipedia.org/wiki/Modern_C%2B%2B_Design](http://en.wikipedia.org/wiki/Modern_C%2B%2B_Design)

~~~
eco
If you enjoyed that you should pick up The D Programming Language by Andrei.
It's a good read. Metaprogramming with D is a dream compared to C++ so some of
the techniques in Modern C++ Design aren't even necessary.

[http://www.amazon.com/D-Programming-Language-Andrei-Alexandrescu/dp/0321635361/ref=sr_1_1?s=books&ie=UTF8&qid=1393281290&sr=1-1](http://www.amazon.com/D-Programming-Language-Andrei-Alexandrescu/dp/0321635361/ref=sr_1_1?s=books&ie=UTF8&qid=1393281290&sr=1-1)

------
jlarocco
I'm a little skeptical that it doesn't use a standard parser like clang. My
gut instinct, from experience with crappy C++ "parsers", would be that it
works great 95% of the time, but fails the 5% of the time when it'd be most
useful.

On the other hand, given the author, and the fact he explicitly mentions
wanting to support C++11, I'm not sure my skepticism is warranted.

I wonder if anybody's tried it on a big C++ code base outside of Facebook?

~~~
apetresc
It doesn't seem that flint parses at all – it only tokenizes, and then you
write "rules" on "streams of tokens." It kinda sounds like a set of fancy
regexes, but on tokens instead of characters.

~~~
evincarofautumn
That’s exactly right. And tokenising C++ is infinitely easier than parsing it,
though there is still the odd context-sensitive edge case, like “> >” versus
“>>”.

------
foobarian
I chuckled at this passage: "flint is written in the D language, making it the
first D codebase open-sourced by Facebook. In fact, our initial version of
flint was written in C++; the D rewrite started as an experiment. From the
measurements and anecdotes we gathered, the translation turned out to be a win
on all fronts: the D version of the linter turned out smaller, dramatically
faster to build, significantly faster to run, and easier to contribute to."

Seems like that's a stronger argument for rewriting the code being linted than
for using the linter itself ;-)

~~~
eru
D is actually a pretty nice language. (The caveats are around the libraries,
especially the standard libraries to choose from.) I can see D being a better
C++, in that it better solves the problems that C++ is supposed to be good
at.

~~~
MaxBarraclough
> The caveats are around the libraries, especially the standard libraries to
> choose from.

The old Tango/Phobos situation?

With the advent of D2, that's no longer a thing. Tango is now all but dead.

~~~
eru
Thanks for the update! I haven't toyed around with D in a while, but it was a
mostly pleasant experience the last time I did.

------
agwa
GitHub repo:
[https://github.com/facebook/flint](https://github.com/facebook/flint)

------
shin_lao
"Even now, clang cannot compile some of our C++ codebase."

Andrei, I'm a bit surprised by this statement. Could you give an example?

Otherwise great work.

~~~
Gownta
Here's a gem that clang produces when compiling folly:

"clang-3.4: error: unable to execute command: Segmentation fault"

~~~
shin_lao
So we're not the only ones. We had to go back to Clang 3.3 because Clang 3.4
had issues with our meta-programming craziness.

But still, I submit a clang-based tool is a better bet in the long run.

------
dman
I am surprised how short the tokenizer is. The other thing is that D looks
surprisingly approachable coming from C++.

~~~
deadalnix
D is crazy good at boilerplate generation.

~~~
jeremiep
Exactly. The only things I know to be more powerful are LISP macros.

~~~
eru
Have you tried template Haskell?

------
jbergstroem
To me, this is more a proof-of-concept implementation in D than a tool I'd
consider making part of my workflow. The dependency chain and "exotic"
language choice make just having it available in your ecosystem a higher
hurdle than utilities like linters should require.

~~~
andralex
Clearly that's something to worry about. What won our team over was the
(sometimes spectacular) speed gains compared to the C++ linter.

~~~
vl
Could you comment on why it is faster?

------
frou_dh
This looks really good. Having checks like this integrated and automated is
the big win. Like data backup, if it's not automated then it doesn't really
count.

------
dedosk
> Marking namespace-level data as static inside a header is almost always a
> bad idea.

Can anybody explain this to me with some example?

~~~
adsche
Next sentences:

    Labeling the data as such potentially generates one instance of the
    static data inside each compilation unit, including that header. Fixing
    these issues has led to measurably smaller and faster executables.

When you include that header in two different .cpp files, which compile to two
different .o files, you have that data twice, local to the .o file
(compilation unit).

Now link them together and you have that data twice in the executable. You
probably did not want that. (But -- as with some more of their examples --
maybe you did indeed want that?)
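A hypothetical two-file sketch of that duplication (file and variable names made up):

```cpp
// header.h -- `static` at namespace level gives kTable internal linkage,
// so every translation unit that includes this header gets its own copy.
static const int kTable[4] = {1, 2, 3, 4};

// a.cpp: #include "header.h"  -> one private copy of kTable in a.o
// b.cpp: #include "header.h"  -> another private copy of kTable in b.o
//
// After linking a.o and b.o, the executable carries kTable twice. With many
// including files and larger objects, that duplication is the size cost the
// article describes. Since C++17, a single shared definition can be written
// instead:
//     inline constexpr int kTable[4] = {1, 2, 3, 4};
```

Whether the duplication actually hurts depends on the object's size and how many files include the header, which is presumably why the rule says "almost always".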

------
puppetmaster3
I'm going to the D conference!!! :-)

