How Balanced Automates Testing and Continuously Deploys (balancedpayments.com)
51 points by mahmoudimus on March 20, 2013 | 35 comments



If these kinds of problems interest you and you’re looking for a real challenge, contact us! We’re always looking for sharp and talented individuals who can make an impact.

NmQ2ZjYzMmU3Mzc0NmU2NTZkNzk2MTcwNjQ2NTYzNmU2MTZjNjE2MjQwNjU2MzZlNjU3MjY1NjY2NjY5NjQ2MTY1NmI2MTZkNmY3NDc0NmU2MTc3Njk=


No offence (and I'm also a user of your excellent service), but:

My statistical background has always made me wonder: are these guys who post rebuses on their job pages actually searching for rebus solvers, or do they need their job done?

Or did they ever conduct a practical coding experiment with applicants who love rebuses and those who don't?


It took me exactly 20 seconds to get the email address behind this ... and I'm not a professional rebus solver. I just happen to know what a base64- and hex-encoded string looks like.


I thought it was a new way of securing email addresses. I didn't even understand it was a puzzle!


Dammit - I thought I had decoded gibberish for 5 minutes there! Could not work out what I had done wrong.

When you say 20 seconds, did you reach for a bash command? Interested in knowing which.


Here is some bash kinda:

curl -s http://blog.balancedpayments.com/balanced-payments-operation... | html2text | grep -P '[^.*]=$' | head -n 1 | base64 -d | sed 's/../0x& /g' | xxd -r -c 100 | rev
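
Or roughly the same thing in Python (a quick sketch; the encoded string is the one from the post, and the steps are just base64, then hex, then reverse):

    import base64
    import binascii

    # the encoded string from the end of the blog post
    encoded = "NmQ2ZjYzMmU3Mzc0NmU2NTZkNzk2MTcwNjQ2NTYzNmU2MTZjNjE2MjQwNjU2MzZlNjU3MjY1NjY2NjY5NjQ2MTY1NmI2MTZkNmY3NDc0NmU2MTc3Njk="

    hex_digits = base64.b64decode(encoded).decode("ascii")           # peel off the base64 layer
    reversed_email = binascii.unhexlify(hex_digits).decode("ascii")  # then the hex layer
    print(reversed_email[::-1])                                      # and flip it back around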


You should apply. I'm looking for people like you :)


iwanttomakeadifference@balancedpayments.com

I really wouldn't hire someone who wants to use grep to parse out html[1] :P

Then again, I would not like to work for a place where engineers don't understand this ;)

[1] http://stackoverflow.com/a/1732454/366152


Nah I just googled for an online base64 decoder, then for a hex decoder.


hah!

so much for import base64!!


You gave it away :P


!gnihtyreve ton ,llew


Oh no! Not another puzzle! ;)


I think there's probably a pretty high correlation between people who can solve them, and people who can get the job done (or quickly learn...).

I wonder if the results of that study would show whether it makes any difference. :)


Actually, there have been studies showing no correlation between the ability to solve puzzles and actual productivity. See "The Invisible Gorilla" by Simons and Chabris. Very rewarding reading.


(I work for Balanced) It's just a simple thing to keep off bots and Nigerian princes promising giant riches in exchange for your name/address/SSN/bank account/horoscope. Also, we have explicit emails from people who decoded it and found it fun. So, yes, this was a data-driven decision.


Decode this - 25599cbb64c13f5385d1a4b3acb946f27f350b9

I'll give you a billion dollars for it.


it's also great for screening out recruiters :)



Nice post! Someone on my team just posted a similar one:

http://blog.factlink.com/post/45861768695/yolo-spend-less-ti...


Any light on how to pull back once a change has been deployed? If it's version controlled you can check out a previous version, but do you automate that?


Hi there. I'm the author of this post.

We use git, so it's easy to revert commits, or push an update with '-f' that resets to an earlier version. Once we push this new commit (that simply puts us back to an earlier state) to our release branch, it's picked up by the testing system just like any other commit and pushed out.
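
Concretely, the flow is something like this (an illustrative sketch with made-up refs, written as a Fabric task just because that's convenient; it's not our actual fabfile):

    # sketch: undo a bad commit on the release branch and let the testing
    # system pick up the push like any other (task name and refs are made up)
    from fabric.api import local, task

    @task
    def rollback(bad_ref="HEAD"):
        # option 1: add a new commit that reverses the bad one
        local("git revert --no-edit %s" % bad_ref)
        local("git push origin release")
        # option 2: reset the branch to a known-good commit and force-push
        # local("git reset --hard <good-ref> && git push -f origin release")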


That seems like a slow process if you're "oh shit, rollback, rollback, rollback!"


If we're already at the "Oh shit" stage, then we might be happy sacrificing test coverage to go back to known-good code, in which case we would run our fabric `deploy` task and have our rollback completed in ~30 seconds. The idea behind this process is that we never have "oh shit" moments like that :)
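
That manual path looks roughly like this (a sketch, not the real fabfile; hosts, paths, and the service name are placeholders):

    # minimal Fabric 1.x sketch of a deploy task that can be run by hand,
    # in parallel across every host; all names here are placeholders
    from fabric.api import cd, env, parallel, run, sudo, task

    env.hosts = ["app1.example.com", "app2.example.com"]

    @task
    @parallel
    def deploy(ref="origin/release"):
        # put each host on the requested ref and bounce the app
        with cd("/srv/app"):
            run("git fetch origin")
            run("git checkout --force %s" % ref)
        sudo("service app restart")

Run as `fab deploy` for the normal case, or `fab deploy:ref=<known-good sha>` to jump straight back to older code.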

(Please see my other response to a similar question, as well)


We used a similar continuous deployment process for Canvas, minus the staging server (now that we're doing iPad software, things have changed). A nice next step is to deploy to a single server at first and only roll out to the rest of the servers once that one has been audited - if that server starts failing, divert traffic from it until it gets back to a healthy state.


This is a great idea. Can you share any insight into how to monitor this via an automated process as part of a deploy and how to finish scheduling the remaining deploy once it has been verified?

I'm guessing we could use the Jenkins Join plugin for this and have a job that just waits for x minutes, but time does not necessarily correlate to a feature being used.


I'm still doing that step manually as I figure out how I want to automate it. An easy way would be to emulate your load balancer's health check (or interface with your load balancer, if you can). In our case this is AWS's ELB, which just hits a /ping endpoint on the instance; the request gets handled reasonably deeply in our stack, and the instance is taken out of service after enough consecutive failures.
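
Once automated, that audit step could look something like this (a sketch: /ping is the real endpoint mentioned above, but the host, window, and thresholds are made up):

    # poll the freshly-deployed instance the way the ELB would, and bail out
    # after enough consecutive /ping failures (Python 2-era, like Fabric 1.x)
    import time
    import urllib2

    HOST = "10.0.0.1"        # the one instance that just got the new code
    CHECKS = 60              # audit window: 60 checks * 5s = 5 minutes
    FAILURE_THRESHOLD = 3    # consecutive failures before giving up
    INTERVAL = 5             # seconds between checks

    consecutive_failures = 0
    healthy = True
    for _ in range(CHECKS):
        try:
            urllib2.urlopen("http://%s/ping" % HOST, timeout=2)
            consecutive_failures = 0
        except Exception:
            consecutive_failures += 1
            if consecutive_failures >= FAILURE_THRESHOLD:
                healthy = False
                break
        time.sleep(INTERVAL)

    if healthy:
        print("canary looks good; roll out to the remaining hosts")
    else:
        print("divert traffic from %s and stop the rollout" % HOST)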

It helps to look at what your past failures have been when going live in production to know what to audit during incremental rollout. In our case, /ping has stopped responding before, but some other common cases were increased 500 response rates or severely decreased performance across average or peak response times. These can be used as metrics when we automate. It doesn't help much for infrequently-used features, but I think the main idea is to prevent all of your instances from going down/going haywire at once together, which is unlikely to be a problem caused by such features even if you rolled out to all at once.

I'd be interested to hear if you get that working with Jenkins to handle incremental rollout; I haven't tried it yet. I'm not even sure how this would be done in Fabric, but that might be another option (have Jenkins call Fabric and block on its completion).


wahnfrieden,

Can you reach out via support @ balancedpayments.com? Come hang out on our IRC - irc.freenode.net #balanced.

Would love to talk to you more!


The article mentions it takes 10 minutes to do a release, so if you have to revert to an earlier version you could be looking at significant (for a payments processor) downtime. I presume you have a strategy for rolling back directly on the servers?


The whole idea behind this testing infrastructure is so that we're fairly certain that any deploy isn't going to leave us scrambling to revert to an earlier version. Since we've implemented it, we haven't had to do that yet -- it's caught all manner of issues large and small that we're glad never made it into production.

The 10 minutes is to run the full suite of tests. In the case of an emergency, if we needed to get code out and wanted to bypass tests (e.g. reverting to an earlier version), we could run our deploy task manually, in parallel on all machines, and be done in about 30 seconds. This is obviously a nuclear option that we'd hope to never have to use, but if we were reverting to known-good code after a botched deploy, it might be our best bet.

This process is a safety net. If we want to work without a net, of course we have more options -- this is the tradeoff that we've made.

EDIT: small clarification


950 unit tests? The number of unit tests sounds very small. I was expecting 5000+. Why so small? What makes your team decide to write a unit test?


Hi there. Why would you expect 5000+ tests, or anything more than 100? It seems like the number of tests that is the "right" number would be very dependent on the size of the codebase, which wasn't mentioned in the article.

We write tests when new code is added, or a bug is fixed and we want to make sure it doesn't reappear. Hopefully we add new tests without prompting, because that's the right thing to do, but our coverage enforcement will goad us into writing tests if we forget. Ideally, new code should be testable by only a few tests -- if it requires more, then it's probably too complicated and should be refactored.
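
For example, a bug-fix test of that kind might look like this (entirely made up, just to show the shape; the helper and the bug it pins are not from our codebase):

    # toy regression test: the function and the bug are invented
    import unittest

    def dollars_to_cents(amount):
        """Convert a dollar string like '10.50' (or just '10') to integer cents."""
        dollars, _, cents = amount.partition(".")
        return int(dollars) * 100 + int(cents.ljust(2, "0"))

    class DollarsToCentsTest(unittest.TestCase):
        def test_amount_without_decimal_point(self):
            # pins the fix for a (hypothetical) bug where '10' raised ValueError
            self.assertEqual(dollars_to_cents("10"), 1000)

    if __name__ == "__main__":
        unittest.main()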


> Hi there. Why would you expect 5000+ tests, or anything more than 100?

It might be a factor of testing methodology and style: many people consider that each assert is a test (or that each test should only assert a single thing), so the number of tests grows large. I was also surprised at a "mere" 950 unit tests (our web frontend has maybe 10% test coverage and reports almost 700 tests, but that's because the test runner counts each assertion as a test, not each test case, of which we have 200).
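
To illustrate the counting difference (made-up example): the snippet below is one test case containing three assertions, so a runner that counts cases reports 1 test while an assertion-counting runner reports 3.

    # one test *case*, three assertions (everything here is invented)
    import unittest

    def slugify(text):
        return "-".join(text.lower().split())

    class SlugifyTest(unittest.TestCase):
        def test_slugify(self):
            self.assertEqual(slugify("Hello World"), "hello-world")
            self.assertEqual(slugify("  padded  "), "padded")
            self.assertEqual(slugify("UPPER"), "upper")

    if __name__ == "__main__":
        unittest.main()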


Ahh, I see. Right, 950+ is the number of unit tests we have, each of which contains a number of assertions. If we count those, it's well into the thousands, so I guess the original question wasn't that far off the mark.


Are you kidding me? The number of tests has nothing to do with the quality of "testing" being done. You could have 5 billion tests and still miss obvious bugs. You're one of those who use LoC as a measure of productivity.



