
How We Built the Software that Processes Billions in Payments - drewolson
http://www.braintreepayments.com/inside-braintree/how-we-built-the-software-that-processes-billions-in-payments
======
swombat
And _that_ is how you write a great job advert.

More specifically, it gets across:

\- the company culture

\- the development practices/methodologies in place

\- what's exciting about the work/company

\- what technologies they like (and the fact they use a variety of
technologies)

\- the fact that they get that typical job adverts suck

... all while looking like a regular article.

Great work.

------
breck
How do you guys maintain an ever growing code base?

It seems to me that the practices you have--TDD with full code coverage and
regression tests, pair programming everything, many client libraries--would
lead to a massive amount of code that needs to be maintained.

In my experience I've found that TDD often leads to a huge amount of code that
needs to be rewritten if you do a major change to your codebase. If you have a
new pattern you want to implement it may require rewriting dozens or more
tests. Also, when I'm working alone I constantly go back and clean up
functions and what not to make them shorter and clearer. I've found I do that
less while pairing because you want to keep moving forward.

My opinion is probably just biased by the fact that I've done relatively
little pair programming and TDD. I'm sure the more you do the better you get.

But I'm curious if you could talk some about the size of your code base and
how these practices affect that?

~~~
brown9-2
How do you safely make those major changes to the code base _without_ all
those tests?

~~~
Retric
One of my favorite stories from an old and very experienced coder. Someone had
been fired after spending around a year on a project that never quite
worked; it was hack upon hack and had failed several passes through QA with
major issues, etc. So he spent 2 weeks cleaning the thing up, redoing about 1/2
of it in the process, and sent it to QA to see if anything else was missing.

Anyway, a week later he gets a call that some fairly basic functionality was
broken in production, which he fixed. Afterward he asked some people in QA how
it ended up in production so quickly and how they had missed such a basic
issue, and their response was "Oops, we stopped testing your code a few years
ago; this is the first time it bit us."

TDD always struck me as an attempt to answer the question: what do I do if I
don't know if this actually works and nobody is going to QA this crap? But if
you actually consider what happened, TDD was unlikely to help, because the coder
was simply unaware that existing code was broken and needed to do something
else.

~~~
mnutt
Sure, but that just means that QA knew something about how the application
worked that the developer didn't. In an ideal world the developer would be
familiar enough with the product to write good tests. In reality, TDD is
good for checking the really repetitive edge cases, and QA is good for
catching business logic failures that the devs aren't aware of.

------
yellowredblack
_Testing: testing is at the forefront of our development philosophy. We never
need to check our code coverage to know that it's at 100%: with disciplined
TDD, no line of code will be written without a test._

Bravo. In my experience that can be overkill, but with finance, I agree: why
risk it. TDD everything.

 _We don't have a QA team._

WTF?

 _That might be terrifying when you consider the type of software that we're
building, but we're confident that our automated testing is thorough and will
catch any regression bugs._

Are regression bugs the only kind of bugs?

 _We use continuous integration to test every version of every client library
against our gateway._

What happens when someone uses your client library in a way you didn't
anticipate?

What happens when johnny-botnet hits your API directly without using the
client library?

I spent several years developing games, with QA teams that outnumbered the
developers. The QA team did not just play the levels through and say "it
works!". Sure, they did that for the first hour. Then they'd start doing all
those things that they thought someone might try (e.g. in a fit of boredom, or
for a laugh). After that they'd just try breaking stuff. What a lot of bugs
they would find!

As I write I find it hard to believe that I, a game developer, am having to
explain the importance of QA to a _financial_ company.

~~~
dan_manges
Dan from Braintree here. I think a QA team would be valuable; we just don't
have one right now. To answer your rhetorical question, regression bugs aren't
the only type of bug, but they're one of the most dangerous in a payments
system. If there's a bug with an unanticipated use of a library, it will show
up in our sandbox environment before merchants hit it in production. But if
functionality that works in production breaks because of a change, that's a
really serious problem.

~~~
yellowredblack
How are regression bugs more dangerous than other kinds of bug?

Here's how I'd rate the "dangerousness" of a bug:

    
    
      1. How easy is it to detect?
      2. How easy is it to reproduce?
      3. What are the consequences of it occurring?
      4. How likely is it to happen?
    

Look, bravo for all the TDD. TDD eliminates a huge chunk of bugs. But by
definition, the bugs that you find with CI are easy to detect, easy to
reproduce and 100% likely to happen. Sure, _without_ a TDD/CI system, these
bugs may not have been detected, may not have been easy to repro. But the
reverse does not hold: a TDD/CI system doesn't make _all_ bugs easy to detect
and easy to repro.

So all the other bugs that your system has right now, are the ones that are
left: hard to detect, hard to reproduce, and don't always happen. Now turn on
a thousand users. How many users are you hoping to have btw?

Your worst kind of bug:

    
    
      * Is not detected for months.
      * Unable to reproduce.
      * Company killer. (Reputation, lawsuits, whatever).
      * Happens once every 40,000,000 sessions.
    

Not detectable using TDD and CI. Company still dead.
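
To put a number on "happens once every 40,000,000 sessions", here is a rough back-of-the-envelope sketch. It assumes each session independently triggers the bug with probability 1/40,000,000, which is a simplification; the function name and the session counts are made up for illustration.

```python
import math


def p_at_least_one(rate_per_session: float, sessions: int) -> float:
    """Probability that the bug fires at least once in `sessions` sessions.

    Computes 1 - (1 - p)^n using log1p/expm1 so that a tiny p does not
    get lost to floating-point rounding.
    """
    return -math.expm1(sessions * math.log1p(-rate_per_session))


# Assumed rate, taken from the "once every 40,000,000 sessions" figure above.
RATE = 1 / 40_000_000

if __name__ == "__main__":
    # Hypothetical traffic levels, chosen only to show the trend.
    for sessions in (100_000, 40_000_000, 400_000_000):
        print(f"{sessions:>11,} sessions: {p_at_least_one(RATE, sessions):.4f}")
```

Under that assumption, a test suite or QA pass that exercises a hundred thousand sessions almost never sees the bug, while production traffic of 40,000,000 sessions hits it at least once with probability of about 63% (that is, 1 - 1/e).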

~~~
hello_moto
And QA would be able to detect these worst kinds of bugs?

I'm not suggesting that QA is useless, I think QA should guide developers in
terms of testing, as in QA should help writing the test-cases including the
corner-cases in spec and let the developers write more tests around those
things.

I also think that QA should help performing benchmark tests, load tests, and
probably write end-to-end automation-tests (what do they call it these days?
Acceptance tests?)

Last but not least, QA should redefine the software processes if bugs happen
regularly in a particular area. Consider QA to be a manager that is responsible
for the productivity of your software team: if a software process doesn't work
(let's say one day you find out that TDD doesn't work well), QA should detect
that and figure out a better way.

Unfortunately, QA people these days are still old-school button clickers and
test-case fanatics (i.e., prepare 1000 test cases and ask the director for a
week to run them all).

But at the end of the day, bugs exist. No amount of people or practices would
cover those exotic bugs.

------
sawyer
Love the insight into BT's process.

I'd love to hear some more about pair programming, has anyone here done it
extensively enough to shed some light on the pros and cons? My gut is that it
would be less productive than a simple code review procedure, but does it
reduce bugginess to a level that offsets that productivity loss?

~~~
wisty
I've heard it's bad for creative stuff, but really good if you know your
requirements.

For payment processing, that's a good thing.

As I understand it, the productivity isn't too bad, as the programmers egg
each-other on. Sort of like having an obnoxious micro-manager over your
shoulder, without the obnoxious bit.

------
thadeus_venture
Different strokes for different folks, I guess. I know a lot of people will be
fans of this, but

>we pair program to write all of our software. We work on Mac Pros with two
keyboards and two monitors. We work in an open team room; no cubicles or
private offices.

No thanks, if I'm the developer. And if I wouldn't do it, why would I make my
employees?

~~~
getsat
I've worked at Pivotal Labs' office in San Francisco and done pair programming
at a few companies now.

When you're pair programming, you're not wearing headphones and "getting into
the zone". You're openly collaborating, sharing thoughts, bouncing ideas,
prototyping things on a whiteboard, and so on. Just like at a party, you
subconsciously filter out the surrounding noise when you're talking to your
pairing partner. An open office is perfectly fine for this. I didn't think it
would be a good situation at first, either. :)

However, I personally do not enjoy pair programming. There's a few reasons
why, but the big one is that it's mentally exhausting. Eight hours of engaging
in conversation completely wipes me out even if it results in amazing code. I
couldn't deal with it any more. To a lesser degree, I do not get the same
sense of accomplishment from completing tasks when pair programming that I do
from completing tasks by myself.

That said, if I ever run a company with a handful of programmers or more, I'm
going to hire engineers who like pair programming.

------
goo
Ironically, their website is not handling the load from HN (and anywhere else
they're presently linked from), from what I can tell.

I don't mean to complain - when you're so focused on other parts of your
business, it's easy to let things like preparing your front-end for heavy
bursts slip by.

~~~
phinze
Yeah, you're spot on. We've been so focused on scaling our Gateway and
services that we didn't prioritize the infrastructure serving our marketing
site. It could really use some love, which we'll be giving it in the very near
future. There should be enough caching in place now to handle the HN bump, and
we'll be keeping an eye on it.

------
3am
I wouldn't advertise you don't have a QA team.

~~~
elbenshira
Why shouldn't they? With TDD and pairing, I'd say that their developers _are_
the QA team.

~~~
3am
Well.. I disagree, but I wouldn't have downvoted you because it's a fair
point.

It's a bad idea for developers to QA their own code for a number of reasons.
1) Developers have cognitive blinders, like everyone else. They might not test
for something that they think is unlikely. 2) Some errors can be impractical to
catch outside of top level integration testing (getting into unexpected states
in state machines or race conditions). 3) There is a conflict of interest
between deadlines and meeting requirements.

I have seen companies reach 100+ developers using this approach, conclude it's
unsustainable, and be forced to make exceptional efforts to build a QA team. I
believe reason 3 is the biggest risk.

------
sigil
_...we are able to perform all our maintenance without downtime. We can deploy
new versions of our software, make database schema changes, or even rotate our
primary database server, all without failing to respond to a single request.
We can accomplish this because we gave ourselves the ability to suspend our
traffic. To make this happen, we built a custom HTTP server and application
dispatching infrastructure around Python's Tornado and Redis._

Why is it necessary to suspend traffic to make these kinds of changes? Just
curious.

~~~
ary
Probably because the data transformations and storage required to complete a
transaction need to be handled by a coherent version of their code. Processing
payments with a half-updated stack sounds painful and error-prone.
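
For anyone wondering what "suspending traffic" could look like in practice, here is a minimal sketch of the general idea: requests are parked at a gate during maintenance and released afterwards, so none of them fail and none of them run against a half-updated backend. This is not Braintree's implementation (the article only says theirs is built around Tornado and Redis); `TrafficGate`, `backend`, and the version strings are hypothetical names, and plain `asyncio` is swapped in to keep the example self-contained.

```python
import asyncio


class TrafficGate:
    """Holds incoming requests while suspended and releases them on resume."""

    def __init__(self) -> None:
        self._open = asyncio.Event()
        self._open.set()  # start out accepting traffic

    def suspend(self) -> None:
        self._open.clear()

    def resume(self) -> None:
        self._open.set()

    async def handle(self, request: str, backend) -> str:
        await self._open.wait()  # requests park here during maintenance
        return await backend(request)


async def demo() -> list[str]:
    gate = TrafficGate()
    version = "v1"  # stands in for the deployed code or schema version

    async def backend(request: str) -> str:
        return f"{request} handled by {version}"

    first = await gate.handle("req-1", backend)

    gate.suspend()
    pending = asyncio.create_task(gate.handle("req-2", backend))
    await asyncio.sleep(0)     # let req-2 reach the gate and park
    assert not pending.done()  # it is waiting, not failing

    version = "v2"             # the "deploy" happens while traffic is held
    gate.resume()
    second = await pending
    return [first, second]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the sketch, the request that arrives mid-maintenance just sees a slightly longer response time and is served entirely by the new version. A real setup would also need to let in-flight requests drain before the change and put a cap on how long requests may be held.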

------
becomevocal
Awesome post. I deal with payment gateways daily, and you guys seem to be a
cut above the rest. Keep it up!

------
evertonfuller
Too bad you're US only.

That's one thing you can't seem to do.

