

CTF3 architecture - gdb
https://stripe.com/blog/ctf3-architecture

======
gibybo
That was a lot more sophisticated than I expected! By the way, I think I was
responsible for about 10k of those 640k pushes :)

I was really hoping for an overview of the solutions people found for each
level though. Going through individual write ups to find them all is a little
tedious.

Here's the rough list of what I was aware of:

level0 (take input of words from stdin, output to stdout whether they exist in
the dictionary):

a) Read file into a hash set, test each word as it comes in.

b) Read file into a bloom filter.

c) Realize that it is always the same dictionary and pre-compile it into a
hash set/bloom filter. The fastest solutions were generally of this type,
written in highly optimized C with a ~600kB bloom filter compiled into the
binary. (A sketch of the bloom filter approach follows below.)
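
A minimal sketch of the bloom filter approach in C, for flavor. The filter
size follows the ~600kB figure above, but the hash functions, the dictionary
path, and the output format are illustrative guesses, not anyone's actual
solution:

    #include <stdint.h>
    #include <stdio.h>

    #define FILTER_BITS (600 * 1024 * 8)     /* ~600kB of bits, per above */
    static uint8_t filter[FILTER_BITS / 8];  /* in (c), baked into the binary */

    /* FNV-1a with two different seeds gives two cheap hash functions. */
    static uint64_t fnv1a(const char *s, uint64_t h) {
        while (*s) { h ^= (uint8_t)*s++; h *= 1099511628211ULL; }
        return h;
    }

    static void bloom_add(const char *w) {
        uint64_t h1 = fnv1a(w, 14695981039346656037ULL) % FILTER_BITS;
        uint64_t h2 = fnv1a(w, 0x9e3779b97f4a7c15ULL) % FILTER_BITS;
        filter[h1 / 8] |= 1 << (h1 % 8);
        filter[h2 / 8] |= 1 << (h2 % 8);
    }

    static int bloom_maybe_contains(const char *w) {
        uint64_t h1 = fnv1a(w, 14695981039346656037ULL) % FILTER_BITS;
        uint64_t h2 = fnv1a(w, 0x9e3779b97f4a7c15ULL) % FILTER_BITS;
        return (filter[h1 / 8] >> (h1 % 8) & 1) &&
               (filter[h2 / 8] >> (h2 % 8) & 1);
    }

    int main(void) {
        char word[128];
        FILE *dict = fopen("/usr/share/dict/words", "r"); /* illustrative path */
        while (dict && fscanf(dict, "%127s", word) == 1)
            bloom_add(word);
        while (fscanf(stdin, "%127s", word) == 1)  /* output format is a guess */
            puts(bloom_maybe_contains(word) ? "true" : "false");
        return 0;
    }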

level1 (gitcoin mining):

I'm not very familiar with this one, but I believe that near the end several
people had written custom GPU miners in OpenCL and were running them on small
clusters of EC2 GPU instances.
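
For anyone unfamiliar with the mechanics: a commit "won" if the SHA-1 of its
git commit object fell below a difficulty target. Here is a rough
single-threaded CPU sketch of that inner loop (the GPU miners parallelized
the hashing); the commit body, nonce scheme, and difficulty are all
illustrative. Link with -lcrypto:

    #include <openssl/sha.h>
    #include <stdio.h>
    #include <string.h>

    /* Git hashes the object header plus body: "commit <len>\0<body>". */
    static void git_commit_sha1(const char *body, unsigned char out[20]) {
        char header[32];
        int hlen = snprintf(header, sizeof header, "commit %zu", strlen(body));
        SHA_CTX ctx;
        SHA1_Init(&ctx);
        SHA1_Update(&ctx, header, hlen + 1); /* +1: the NUL is part of the header */
        SHA1_Update(&ctx, body, strlen(body));
        SHA1_Final(out, &ctx);
    }

    int main(void) {
        /* Hypothetical difficulty: first 3 bytes of the SHA-1 must be zero. */
        unsigned char sha[20];
        char body[512];
        for (unsigned long nonce = 0;; nonce++) {
            snprintf(body, sizeof body,
                     "tree 0123456789abcdef0123456789abcdef01234567\n"
                     "author a <a@a> 0 +0000\ncommitter a <a@a> 0 +0000\n"
                     "\nmined %lu\n", nonce);  /* vary a nonce in the message */
            git_commit_sha1(body, sha);
            if (sha[0] == 0 && sha[1] == 0 && sha[2] == 0) {
                printf("found a coin after %lu tries\n", nonce);
                return 0;
            }
        }
    }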

level2 (DDoS prevention):

I am also unfortunately not very familiar with the high scoring solutions to
this level, since I never really figured out how idle time was calculated.
However, the one common and easy solution was:

a) Keep a count of requests per IP and ban any IP once it has made more than
n requests (4 was a good number; see the sketch below).
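
A minimal sketch of (a) in C, assuming the handler can see the client's IPv4
address; the hash table details are illustrative:

    #include <stddef.h>
    #include <stdint.h>

    #define MAX_IPS   65536
    #define BAN_AFTER 4        /* "4 was a good number" */

    static uint32_t ips[MAX_IPS];  /* 0 doubles as the empty-slot marker */
    static int      counts[MAX_IPS];

    /* Returns 1 if this request should be dropped. Open addressing with no
       eviction: fine for a short benchmark run, not for production. */
    static int should_ban(uint32_t ip) {
        size_t i = (size_t)(ip * 2654435761u) % MAX_IPS; /* Knuth mult. hash */
        while (ips[i] != 0 && ips[i] != ip)
            i = (i + 1) % MAX_IPS;
        ips[i] = ip;
        return ++counts[i] > BAN_AFTER;
    }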

level3 (text search):

a) Shell out to 'grep -r -n query path', parse out the relevant part of the
path/line number and submit (surprisingly, this worked to get a score just
above passing if you tried a few times).

b) The intended solution was probably to build text search with a trigram
index/suffix array/etc. I know a few people did this with reasonable success.

c) A faster solution was to realize that the tester only ever queried words
that were in the dictionary. You still had to return results for words in
which the query was a substring, but this is a finite set that could be pre-
computed during indexing (or even before uploading your solution, since the
dictionary is known in advance). The fastest (legitimate) solutions were of
this type: single-node, single-threaded (requests were sequential) TCP
servers written in C, in which the answer to every possible query (only 250k
possibilities) was pre-computed and stored in memory during indexing. It was
then just a matter of rerunning until you got a sufficiently fast run, since
the bottleneck was always the ~1.68ms of latency the test framework added to
each request. (A sketch of the indexing side follows below.)
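
A sketch of the indexing side of (c). The data layout is illustrative, and
the naive strstr scan over every line stands in for whatever the fast
solutions actually did:

    #include <stdlib.h>
    #include <string.h>

    struct hit    { int file, line; };
    struct answer { struct hit *hits; size_t n; };

    /* One precomputed answer per dictionary word (~250k of them); serving a
       query is then a hash lookup (omitted) plus writing a cached buffer. */
    static struct answer answers[250000];

    static void index_line(const char **dict, size_t nwords,
                           int file, int line, const char *text) {
        for (size_t w = 0; w < nwords; w++) {
            if (strstr(text, dict[w])) {     /* query-as-substring match */
                struct answer *a = &answers[w];
                a->hits = realloc(a->hits, (a->n + 1) * sizeof *a->hits);
                a->hits[a->n++] = (struct hit){ file, line };
            }
        }
    }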

level4 (distributed consensus):

a) Just proxy all requests to a hardcoded master node, which ran a single
SQLite instance in memory. You would eventually lose your connection to it,
but as long as your queries were fast you would score enough points early in
the round to pass, even after the single-point-of-failure tester was added.

b) Implement Raft. Conveniently, the example code was laid out suspiciously
similarly to the open-source goraft implementation, which served as a drop-in
solution. Minor tweaks were necessary to account for the test framework
pinging non-leader nodes (which meant you just had to proxy those requests to
the leader).

c) The most clever solution I was aware of, which I believe was discovered by
the highest-scoring player (xthexder), was to abuse the underlying HTTP
library in the Octopus test framework. To understand how it worked, you must
first understand that the test framework used a Stripe-developed system
called 'Octopus' which would run your nodes in isolated containers that could
only communicate with each other over specifically designated Unix sockets.
Octopus would do nasty things to these sockets, such as adding latency,
severing connections between certain nodes, or partitioning whole sets of
nodes from each other. However, the nodes were always running, and they
always had a connection to Octopus. So if you could convince Octopus to carry
your message to the other nodes for you, you wouldn't have to worry about any
of this silly distributed consensus logic.

The way xthexder did this was to send 302 redirects back to Octopus with the
query and the location of the hard-coded leader node (which was running a
custom in-memory SQL database optimized for the very few queries Octopus ever
used). I don't remember the exact details (read his writeup for more info!),
but it required some trickery to get Octopus to respect the redirects. Once
it did, you got a perfect connection to the master node with zero intra-node
traffic (which would normally reduce your score).
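
The shape of that trick, as described above, is roughly the following. The
leader's address and the response details are illustrative; read xthexder's
writeup for the real trickery:

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Answer a request on an accepted connection by telling the test harness
       to re-issue it against the hard-coded leader node. Octopus's HTTP
       client, not your node, then carries the query across the partition. */
    static void redirect_to_leader(int fd, const char *path) {
        char buf[2048];
        int n = snprintf(buf, sizeof buf,
                         "HTTP/1.1 302 Found\r\n"
                         "Location: http://leader.example%s\r\n" /* hypothetical */
                         "Content-Length: 0\r\n\r\n",
                         path);               /* path begins with '/' */
        if (n > 0)
            write(fd, buf, (size_t)n);
    }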

I also discovered a couple really dirty tricks for level0 and level3:

a) The test framework actually used predetermined test cases so that, if you
failed a test, you could rerun the exact same test locally and see where your
code went wrong. The test cases were randomly selected after you pushed your
code, so you couldn't hard-code the solutions into your code (that would be
cheating!). Except there was a weakness: the build phase ran on a container
that allowed external network connections, so that you could download
packages from common package repositories like npm. The test framework would
also output the selected test case to you (but not to your code) prior to
running the build phase. This meant your build script could send a request to
your own server, which would hold the request open while you read the test
case from the test framework's output, ran it locally, packaged up the
answers, and handed them to your server to return to your solution on the
test container. This allowed your solution, running on the test container, to
have the answers to level0 and level3 loaded in memory before the test
framework even sent its queries.
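
The solution-side half of that trick looks roughly like this: block at
startup on a request to your own server, which doesn't answer until the
answers have been computed and uploaded. The host and path are hypothetical:

    #include <netdb.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static int fetch_answers(char *out, size_t cap) {
        struct addrinfo hints = { .ai_socktype = SOCK_STREAM }, *res;
        if (getaddrinfo("yourserver.example", "80", &hints, &res))
            return -1;
        int fd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
        if (fd < 0 || connect(fd, res->ai_addr, res->ai_addrlen) < 0)
            return -1;
        const char *req = "GET /answers HTTP/1.0\r\n"
                          "Host: yourserver.example\r\n\r\n";
        write(fd, req, strlen(req));
        /* The server holds the request open until the test case has been run
           locally and its answers uploaded, so this read blocks as needed. */
        size_t n = 0;
        ssize_t r;
        while (n + 1 < cap && (r = read(fd, out + n, cap - n - 1)) > 0)
            n += (size_t)r;
        out[n] = '\0';
        close(fd);
        freeaddrinfo(res);
        return 0;
    }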

b) Method 'a' was patched over the weekend by moving the test case disclosure
to after the build phase, so I went looking for a workaround. What I found
was just about as good. It turns out there was a fixed number of test cases
for each level (I believe 1000 for all levels). All of them were publicly
available on AWS S3; you just needed their names, which I got by continually
running a git push, parsing out the test case, aborting, then running git
push again. At about 30 cases per minute, enumerating all 1000 took ~30
minutes on average. I downloaded every test case, found the ones with the
highest scoring potential (generally the ones with the worst benchmark
score), and hard-coded the answers to those into my solution. I then had a
script to run 'git push', parse out the test case it would run, and abort and
rerun until it got one of the test cases I had hard-coded. It's important to
note that the remote server still reported the chosen test case prior to
finishing the test run, so it only took a couple of seconds to decide whether
to let a run finish, rather than waiting a minute or two for it to run fully.
With 10 hard-coded test cases (out of 1000 possible) and one attempt every 2
seconds, it took an average of only ~3 minutes to get a run where the code on
the test server would once again have the answers loaded in memory before the
test framework sent a query.
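
A rough sketch of that retry loop; the push command, the 'Test case:' marker,
and the case names are all hypothetical:

    #include <stdio.h>
    #include <string.h>

    /* Test cases whose answers are baked into the solution (hypothetical). */
    static int is_hardcoded(const char *line) {
        static const char *wanted[] = { "case-0042", "case-0777" };
        for (size_t i = 0; i < sizeof wanted / sizeof *wanted; i++)
            if (strstr(line, wanted[i]))
                return 1;
        return 0;
    }

    int main(void) {
        char line[512];
        for (;;) {
            FILE *p = popen("git push -f origin master 2>&1", "r");
            if (!p)
                return 1;
            int good = 0;
            while (fgets(line, sizeof line, p)) {
                if (strstr(line, "Test case:")) { /* announced before the run */
                    good = is_hardcoded(line);
                    if (!good)
                        break; /* a real script would also kill the push here */
                }
            }
            pclose(p);
            if (good)
                break;  /* this run uses a hard-coded case; let it score */
        }
        return 0;
    }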

~~~
lunixbochs
My solution to level0 accounted for about 10% of the total process runtime
(measured against an empty `int main() {}`). The rest was process/libc setup
time.

I realized after the competition that you can compile with -nostdlib or
-nostartfiles and halve the process overhead, which would have almost doubled
any of the top 10 scores at that point.

~~~
voltagex_
On this tangent, where do I go to learn how to write C with -nostdlib? I've
tried this on both MSVC++ and gcc and it's really painful.

~~~
recentdarkness
Well you just use system calls and import functions from other shared libs
dynamically. There is really not more to it. The stdlibs are supposed to make
your life more comfortable, of course its not that comfy without them.
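
To make that concrete, here is a minimal freestanding program for x86-64
Linux: you supply _start yourself and talk to the kernel directly. Build with
'gcc -nostdlib -static -o hello hello.c' (with -nostartfiles you would
instead keep libc but drop the CRT startup files):

    /* write(2) via raw syscall: number in rax (1), args in rdi/rsi/rdx. */
    static long sys_write(int fd, const void *buf, unsigned long len) {
        long ret;
        __asm__ volatile ("syscall"
                          : "=a"(ret)
                          : "0"(1L), "D"((long)fd), "S"(buf), "d"(len)
                          : "rcx", "r11", "memory");
        return ret;
    }

    /* exit(2): rax=60. There's no caller to return to from _start. */
    static void sys_exit(int code) {
        __asm__ volatile ("syscall" : : "a"(60L), "D"((long)code));
        __builtin_unreachable();
    }

    /* With -nostdlib, the linker's default entry point is _start, not main. */
    void _start(void) {
        sys_write(1, "hello, world\n", 13);
        sys_exit(0);
    }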

------
sync
All that for a service that was up just a week.

"We put together a quick-and-dirty LDAP server implementation"

Yeah, no problem. Just slapped that together. Clearly some great sysadmins at
Stripe.

Coming from a hands-off heroku world, this definitely increases my respect for
ops guys/gals.

~~~
gdb
:)

Even though it was just up for a week, people's Git commit histories show
there were 20,000+ hours (2.3 programmer-years) spent playing CTF. So it was
worth the time investment!

------
dmunoz
Thanks for the writeup.

I remember thinking during level1, "Wait, they have miners going for each
user? No way... these have to be premined somehow" (wrong, though it makes
sense now why it looked that way). I recall then looking at the timestamps
and seeing that they didn't exactly align with when the git pull happened. I
would check, do some work, check, do some work, etc., and a new commit would
come in that looked to be only a few seconds after the last commit, even
though it clearly wasn't available to me at the time.

------
tlrobinson
_" One consequence of this architecture was that submitting Gitcoins was
decently slow — we weren't maintaining persistent connections to the backend
gitcoin server, so there was a decent amount of overhead. We compensated for
this by tuning the difficulty to ensure the time to mine a coin was large
compared to the time to complete a push."_

Interestingly, this is the reason Bitcoin confirmations are 10 minutes and not
less: it gives plenty of time for new blocks to be broadcast to the majority
of the network.

It's not currently a big issue, since the maximum block size is 1 MB (50%
propagation typically takes less than 10 seconds:
[http://bitcoinstats.com/network/propagation/](http://bitcoinstats.com/network/propagation/)),
but to scale to Visa volumes, block sizes will need to be increased to
hundreds of megabytes.

It's also a reason why quicker confirmation times of Litecoin and Dogecoin
aren't necessarily a good idea.

------
mbell
Thanks to Stripe for running the ctf! It was great fun.

I'd be interested to know how much of the load was generated by folks looping
'git push' on level4 in order to find a test case the benchmark bombed on (or
just to re-roll the goraft race condition). The variability in scoring on
that level was huge.

------
brown9-2
I think there is an error in this writeup - _testasaurus_ is a great name!

Curious - was all of this set up on EC2 or a similar environment?

~~~
gdb
Yep, we ran everything on AWS, which was nice since we could directly use a
lot of the management infrastructure we've built for Stripe.

~~~
brown9-2
Thanks. One more question: the writeup mentions that Stripe is hiring for
remote folks in US timezones. Is that for all engineering positions? I could
be missing something but the jobs page doesn't mention anything about remote
people.

~~~
gdb
Yes, that's for all engineering positions. We've been so busy building CTF we
haven't had time to maintain the jobs page :).

------
ChuckMcM
My word that looks like a lot of fun to put together.

