
How Intuit Manages 10 Million Lines of Code - aritraghosh007
http://www.drdobbs.com/tools/building-quickbooks-how-intuit-manages-1/240003694
======
patio11
A lot of this is (at least aspirationally) table stakes for modern software
development, but for those of us who have seen the sausage get made, a) a lot
of the Well-Known Best Practices (TM) are not actually observed at a lot of
companies, even with arbitrarily high levels of sophistication and resources
available and b) they really do save metric tonnes of effort at scale. One of
the reasons why seemingly trivial, clearly beneficial changes don't get
adopted at some companies is that, at certain scales, it's like trying to turn
an aircraft carrier. (Any process improvement which takes an engineer one
day's worth of productivity to adjust to costs Intuit _several million dollars_.
Something which a startup could decide on in a week -- "We should switch to
git and then have per-team branches with integration at..." -- is probably an
$X00 million project there and if it failed would be catastrophic. You might
blow the shipping schedule on Quickbooks, and if that misses tax season, "oh
dear.")

It's always interesting how the larger Fortune 500 software companies adopt
practices associated with teeny-tiny little software firms, and vice versa.

~~~
bguthrie
I fundamentally agree with your point: changing processes at large software
teams is really, really hard, risky, and expensive.

But if I may pick a nit about something I think you were describing as a
potential process improvement, per-team branches have been tried at scale many
times before, both with and without Git. Many smart people I know who've had
to clean up subsequent messes believe, counterintuitively, that the larger and
more complex you get, the more branches hurt you, because they encourage you
to delay integration, and integration is painful. Git merge capability, while
excellent, can't save you if two teams independently decide to rename a method
that both rely on; that's still a conflict. I've heard of several other large
shops relying successfully on toggles with a single mainline.

Fowler describes this pain in terms of feature branches rather than team
branches, but it's the same idea:
<http://martinfowler.com/bliki/FeatureBranch.html>

Branching is a poor substitute for genuine modularity.
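The toggle-plus-mainline approach can be sketched minimally; the flag names and registry below are hypothetical, just to show the shape of it:

```python
# Minimal feature-toggle sketch: unfinished work ships to mainline
# behind a flag instead of living on a long-lived branch.
# Flag names and defaults here are made up.

FLAGS = {
    "new_invoice_renderer": False,  # still under development
    "fast_tax_lookup": True,        # finished, enabled everywhere
}

def is_enabled(flag: str) -> bool:
    """Look up a toggle; unknown flags default to off."""
    return FLAGS.get(flag, False)

def render_invoice(invoice: dict) -> str:
    # Both code paths live on mainline and integrate continuously;
    # only the toggle decides which one runs.
    if is_enabled("new_invoice_renderer"):
        return f"[v2] {invoice['id']}: {invoice['total']:.2f}"
    return f"{invoice['id']} total={invoice['total']:.2f}"
```

The point is that the unfinished path still compiles and gets exercised by CI on every commit, so the two-teams-rename-the-same-method conflict surfaces immediately instead of at merge time.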

~~~
patio11
I'll buy that -- I was just picking an example of a process tweak which
sounded like a potential improvement that, absent scale, would be accomplished
in two days with a project roadmap that would fit in a blog post.

------
plinkplonk
I have seen some of this code (worked at Intuit briefly) and it is some of
_the_ most horrible Java (for web products) and C++ (desktop apps) code I have
seen in my life. Admittedly this was a few years ago, but I've never ever seen
such convoluted code anywhere, with (literally) thousands of cut and pasted
fragments and poke-your-eyes-out code. You could look at a fragment of code
and find more than one error every other line. I once asked my boss to select a
random file of code from one of the flagship products and discovered 40 errors
in 5 minutes (he was _not_ happy, long story). It is a wonder these apps work
as well as they do.

At the time they had this weird home brew mix of PSP and "agile" as their
'methodology' and while it generated a lot of meetings and paperwork, it
didn't improve the code one bit. I am glad to know that they are moving
towards better practices, but Intuit is _the_ most technically inept
organization I've seen, so I wouldn't be too hopeful of the end results. OTOH
they _do_ understand their customers (and marketing to them) _really really
well_. I learned a lot from Intuit about these aspects of product dev. Which
goes to prove you can make billions with totally screwed up engineering.

~~~
edwinnathaniel
There's no way 10M LoC can be improved without taking a loss on the balance
sheet.

No Way.

Because in order to do that, you effectively need to stop writing new code and
just put your heads down and improve things. The thing is, if the codebase is
tightly coupled, you've got tons of work to do, since typically you can't
improve one thing without changing another. In the absence of unit testing,
nobody has the guts to refactor the code, given the high risk of breaking the
product.

I haven't even mentioned the cultural (human) issues that need to be
fixed/changed as well.

So people truck forward, like in any software development shops :).

Speaking of which, thanks for sharing your insight from working at Intuit. The
article never mentioned the actual quality of the code, focusing instead on
the things he improved :) (no small feat, but probably not something that
improves the quality of the code).

~~~
anthonyb
The cultural issues are usually the largest issue (qv. Conway's Law). You
don't need to stop writing new code, but you do need to stop adding crap code
(or at least stop adding it faster than you can clean it up).

It's why I'm a big fan of code ownership as a negative reinforcement tool.
Despite it being a really bad idea, it does tend to rein in the cowboys -
because they're too busy fixing the bugs in their code to add new ones.

------
makecheck
Having lots of lines of code isn't a badge of honor, nor is it a particularly
good measurement to begin with. I wish that publications would stop stating
this number like it means anything. I could probably think of 10 ways off the
top of my head to skew the lines of code that a function would require without
even affecting the size of the binary (heck, just change the style of
_braces_...). I could change one line and create a buggy mess, or remove 1000
lines and create a brilliant masterpiece of efficiency.

Something that _might_ mean more is growth in code complexity over time. For
instance, if measured every month for a year without changing the compiler
version or platform, how much bigger or smaller did the compiled binaries
become? How much more or less memory and runtime were consumed? How many
"severe" bugs were found? When overlaid with information on the features that
were added or removed during that time, it should become a lot clearer how
large the code base really is and how maintainable it will be.
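The brace-style point is easy to demonstrate: the two snippets below are the same trivial (made-up) C function, formatted two ways, and a naive line counter scores them an order of magnitude apart:

```python
# Two formattings of the same trivial function: identical behavior,
# wildly different "lines of code". The clamp function is made up.

ALLMAN_STYLE = """\
int clamp(int x, int lo, int hi)
{
    if (x < lo)
    {
        return lo;
    }
    if (x > hi)
    {
        return hi;
    }
    return x;
}
"""

ONE_LINER_STYLE = """\
int clamp(int x, int lo, int hi) { return x < lo ? lo : (x > hi ? hi : x); }
"""

def count_loc(source: str) -> int:
    """Count non-blank lines, the way naive LOC tools do."""
    return sum(1 for line in source.splitlines() if line.strip())
```

Same binary, 12 lines versus 1, which is exactly why the headline number tells you so little on its own.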

~~~
austenallred
"How Company X manages a 15,000 pixel icon."

~~~
Trufa
I don't understand your comment. Could you please explain? Sorry, thanks.

~~~
austenallred
Saying that a company has 10,000 lines of code is just as relevant as saying
they have a design that's 10,000 pixels big. It could hold more data, or it
could be something super simple just made really poorly and very bloated.
"Lines of code" is not a good unit of measurement for complexity.

~~~
GlenAnderson
What would you use instead?

LoC isn't a perfect metric but it is very easy to relate to. Given some extra
context like the type of application, the size of the company or the age of
the codebase one can mentally account for some of the weaknesses of using LoC
as a metric.

If this were a study using LoC as a sole metric with no other context I'd
agree with you but LoC seems perfectly adequate in this case.

~~~
Darmani
Halstead complexity? Cyclomatic complexity? Those are just a couple of famous,
old-school ones.

Software metrics is a major research area, and dates back to the 60s. The
paper "Software Metrics: A Roadmap" has a good summary of the state-of-the-art
as of 2000.
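For a rough sense of what one of these measures, cyclomatic complexity is essentially the number of decision points in a routine plus one. A naive keyword-counting sketch (real tools like pmccabe or lizard build an actual control-flow graph, so treat this as an approximation):

```python
import re

def cyclomatic_estimate(source: str) -> int:
    # Naive estimate: 1 + number of branch points. Keyword counting
    # over- and under-counts in edge cases (comments, strings, macros);
    # real metric tools parse the code properly.
    branches = len(re.findall(r"\b(?:if|for|while|case|catch)\b", source))
    branches += source.count("&&") + source.count("||")
    return branches + 1

SNIPPET = """
int classify(int x) {
    if (x < 0) return -1;
    if (x == 0 || x == 100) return 0;
    for (int i = 0; i < x; i++) { /* ... */ }
    return 1;
}
"""
```

McCabe's original rule of thumb flagged individual functions above 10; there's no single agreed-upon value that characterizes a whole project, which is part of why these never displaced LoC in headlines.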

~~~
Silhouette
Could you, off the top of your head, give a rough idea of what cyclomatic or
Halstead complexity value corresponds to a "large" project? In fact,
given even a very simple code snippet, could you state on a cursory
examination what these complexity values would be? Could most people reading
this discussion?

If not, your alternatives don't serve the required purpose. Everyone gets that
a project with 10 million lines of code is big. Whether it's bigger in any
useful sense than another project of 8 million lines isn't really the point,
and any alternative that doesn't have an immediate intuition for people
reading the discussion isn't helping much.

------
j45
This is some great insight into the world that:

- doesn't have the luxury of breaking backwards compatibility,

- can't throw its code out every 3 years, and

- has code that's been around for more than 5 years and isn't going away.

~~~
maximilianburke
Sounds a fair bit like the code base I work on. We have in the neighborhood of
5 million lines of (mostly C++) video game middleware that my department works
on, all of which must be backwards compatible, all of which can't be
significantly overhauled easily. It also builds on 15 different target
platforms with 2-5 configuration variations per platform.

Similarly, we use Perforce. Similarly we have a CI system that builds our code
on a farm of VMs, though ours does fairly extensive unit testing on every
build. We also have a significant number of version permutations we test on a
fairly regular basis as well.

We regularly, automatically, run Valgrind and it's on my to-do list to hook up
the Microsoft static analyzer and investigate code coverage tools.

One saving grace is that the code base is very modular and has decently quick
iteration times in general. Sometimes the legacy and the degree of cross-
platform support can be frustrating but for the most part the infrastructure
works so well now that it's no big deal.

------
damian2000
"Unit testing isn't a big part of QuickBooks for Windows — the bulk of the
codebase was written before unit testing was acknowledged as a best practice."

If there was ever an application that would benefit greatly from unit testing,
this is it.

~~~
whazzmaster
I'm a developer on QuickBooks for Windows.

Code bases where fundamental aspects (models, database interaction, custom UI
libraries, etc.) were developed and solidified far before unit testing was the
norm or expected are a tough nut to crack, but we are making improvements
every day. Within the last three months I've solved compilation and linker
hurdles and we're now integrated with googletest and googlemock, which are
fantastic libraries!

Also, while our list of C++ unit tests is small but growing now, the use of
NUnit, etc. was championed from Day One on C# projects stretching back to our
usage of .Net 1.1.

It's interesting how much working with Ruby the last 7 years has affected my
C++ development. I'm dual-wielding Avdi Grimm's Objects on Rails and Michael
Feathers' Working Effectively with Legacy Code with great results.
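The core move from the Feathers book is the "characterization test": before touching legacy code, write tests that pin down what it currently does, quirks included, rather than what it should do. A toy sketch (in Python rather than googletest, and the legacy function is a made-up stand-in):

```python
# Characterization-test sketch in the spirit of Feathers' "Working
# Effectively with Legacy Code": lock in current behavior so a
# refactor can't silently change it. legacy_format_amount is made up.

def legacy_format_amount(cents: int) -> str:
    # Imagine decades-old formatting code with quirks: no thousands
    # separators, and negative amounts render strangely.
    return "$" + str(cents // 100) + "." + str(cents % 100).zfill(2)

def test_characterizes_current_behavior():
    assert legacy_format_amount(5) == "$0.05"
    assert legacy_format_amount(123456) == "$1234.56"  # no comma -- quirk kept
    # Negative cents floor-divide in Python; the test preserves the
    # oddity on purpose. Fixing it later is a deliberate, visible change.
    assert legacy_format_amount(-5) == "$-1.95"
```

Once behavior is pinned, you can refactor underneath with the tests acting as a tripwire, which is how a pre-unit-testing codebase gets tests retrofitted incrementally.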

~~~
damian2000
Well done - you are obviously doing a good job with a huge codebase.

------
tfigment
That was almost a blast from my past. They were using a lot of the tools I
used at a previous company. That company's UI code base was 3 MM SLOC or so
and we used perforce and silk though we didn't use any build accelerators as I
had the UI down to under 1 hour from scratch and the backend was under 3 hours
if I recall and this was 8 years ago or so.

That codebase was a beast and there was code from 1979 floating around in some
of the core Fortran routines. The scary part was the app ostensibly had
backward compatibility to files created in the 80s though it was well designed
enough to have anticipated most of cases.

I'm sort of surprised their build takes so long without accelerators, but then
again I'm not: without precompiled headers, I think our build was several
hours longer. If they couldn't structure their projects to take advantage of
PCH files, and if the source is very #include-heavy, I can see it. How
external code references are handled is another reason interpreted languages
rock, and why Java/.NET build systems are superior to C and C++ ones in my
experience.

------
codegeek
I work in investment banking and this article is no surprise. Most of the
banks I have worked at so far have old legacy spaghetti code all around and,
like someone mentioned, it is extremely difficult to get refactoring work
going when it doesn't add any benefit for the _business_ in the short term. I
have had many conversations with managers, and even though everyone knows, no
one wants to be the guy who changes the status quo.

------
tsahyt
How do you manage to write 10 Million Lines of Code for an accounting
software? I haven't had a proper look at it yet, but that seems like overkill.
That's an entire OpenSolaris worth of lines in an accounting software, just to
put this into perspective.

~~~
AndrewDucker
UK tax law is 11,000 pages long. Once you've converted that into code, badly,
it could easily be 100 lines of code per page.

~~~
woobar
Isn't QuickBooks limited to sales tax/VAT? Intuit sells other products to deal
with taxes (Lacerte, TurboTax, QB Payroll).

~~~
whazzmaster
No, payroll code must track fed, state, and local tax rules in order to
calculate employees' paychecks and payroll taxes due.

~~~
woobar
But that's a separate product - QuickBooks Payroll.

Also, instruction for payroll deductions are pretty simple[1]. Usually
everything you need is on one page. Plus a very short list of exemptions
(401(k), HSA, FSA, cafeteria). Nowhere near 11,000 pages of tax code.

[1] <http://www.suburbancomputer.com/tips_state_tax.php>

~~~
whazzmaster
QuickBooks Payroll is an add-on whose code lives side-by-side (and is deployed
with) the QuickBooks Desktop code. I am the developer that maintains the
integration of the TurboTax tax processing code and engines technology with
QuickBooks' accounting and payroll functionality.

"Payroll deductions are pretty simple" is not as true when you require
nationwide support. The Yonkers residency tax, the Indiana counties payroll
tax, etc. are not as simple as getting the employees' states and running
calcs. Add in non-tax deductions (401(k), wage garnishments, worker's comp)
and you've now found yourself in an interesting world.
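Even a toy paycheck calculation picks up ordering rules quickly: pre-tax deductions change the taxable base before any rate applies, and local taxes are extra jurisdiction-specific rules on top. Every rate and rule below is invented, purely to show the shape of the problem:

```python
# Toy paycheck calculation; all rates and rules here are invented.
# Real payroll engines carry fed/state/local rule tables, wage caps,
# garnishment ordering, and more -- per jurisdiction.

def net_pay(gross: float, pretax_401k: float, state_rate: float,
            local_rate: float = 0.0) -> float:
    # Pre-tax deductions reduce the taxable base *before* any rate
    # applies; getting this ordering wrong skews every downstream tax.
    taxable = gross - pretax_401k
    state_tax = taxable * state_rate
    # Local taxes (think Yonkers residency, Indiana county taxes) are
    # additional per-jurisdiction rules layered on top.
    local_tax = taxable * local_rate
    return round(taxable - state_tax - local_tax, 2)
```

And that's before wage bases, caps, mid-year rate changes, or an employee who moves between jurisdictions in a pay period.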

------
jobu
_In the past, Burt says, when a build took 4 hours instead of 45 minutes..._

Wow! And they've apparently spent a tremendous amount of time and resources
optimizing the build process.

It's been 10 years since I worked on a project where a build took more than a
few minutes. Back then, a project I worked on took over 3 hours to compile on
a 4-processor Solaris box. The PC technology improved so drastically that in a
few years we were able to get it cross-compiling to Solaris from a Linux box
in under 20 minutes.

~~~
larsberg
Not terribly surprising. Back when I worked at MSFT, I spent ~3 months (and
had two other developers and a couple of build engineers working with me)
taking our 3 _day_ build and getting it down to ~12 hours. But, that was
several times larger than this system (even then), and it was about 7 years
ago.

And it had to support all sorts of weird things such as "which compiler do you
use to build the compiler; the compiler you just built, the compiler you last
used, or the compiler we last shipped?" for a very large configuration of
compilers, runtime platforms, etc.

~~~
bcbrown
When I worked there a couple years ago, requesting a custom Windows build
still took overnight, if you wanted more exhaustive testing than just xcopying
a couple binaries.

------
johnnygoods
Badly. I've never seen a company ship so many bugs so consistently.

------
melling
"We can build the entire product — 8 to 10 million lines of code — in 45
minutes,"

Hearing this makes me want to learn Go. Maybe changing a language to improve
compilation speed is a worthwhile feature.

------
zbruhnke
I think I could've summed this article up in one word. Before reading it.

POORLY

~~~
josephcooney
How many code bases of this size have you managed?

