

Stripe CTF3: Distributed Systems is live - jazzychad
https://stripe-ctf.com

======
antonyme
This is awesome! Except - I haven't finished the Matasano CTF yet! Too many
flags, not enough time. :/

~~~
tptacek
Ours will be up for a while. Stripe's is time-limited. If you're interested in
what Stripe's doing here, I'd do theirs first.

~~~
aparadja
Part of me wishes that you too would have a tight time limit. Your level 19
still keeps me up at night. It disturbs my work. It makes me slightly worse as
a person.

~~~
tptacek
If you'd like us to lock you out, we can do that for you. :)

~~~
voltagex_
Apart from not being able to run the challenges on Firefox, the Matasano CTF
has been great. Now, if only I could figure out level 2...

------
gleenn
Level0 was fun, Level1 threw me immediately into "This looks like it's gonna
suck". I know Git fairly well, but I don't know it _that_ well. I poked at the
shell script a bit, but I just ain't seeing it. It's also a lot harder to work
on without the support of the easy benchmarking ability that was provided in
Level0. How do I know if I'm actually improving?

Props in general, I just feel like Level1 was kind of a brick wall for me.

~~~
ch0wn
I thought the shell script was actually a really great way of introducing you
to the subject. You can basically look at the script, make out the "hot" part,
optimize that part and you're done.

~~~
nayefc
How would you optimise internal git commands or a loop that will still need to
brute-force either way?

~~~
emillon
Rot13 for spoilers:

Jura vaibxvat tvg va n furyy ybbc, lbh'er jnfgvat lbhe gvzr sbexvat. Qbvat
guvf va n fvatyr cebprff jvyy fcrrq vg hc n ybg.

~~~
nayefc
That's what I suspected. So essentially, mimic that git command in bash
without calling the git command (including the header stuff)?

~~~
emillon
It's not calling the git command that's problematic, it's calling an external
command at all. So you pretty much have to recode the hot part in another
language, where you just manipulate buffers instead of piping stuff around.
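
A rough sketch of that idea in Python, under stated assumptions: the names
`git_commit_sha1` and `mine`, the `nonce` line appended to the commit body, and
the rule that a commit "wins" when its hex SHA-1 sorts below a difficulty
string are illustrative guesses, not the level's actual code. The only point is
that each attempt is a buffer concatenation plus one `hashlib` call, with no
process forked per attempt.

```python
import hashlib
from itertools import count
from typing import Tuple


def git_commit_sha1(body: bytes) -> str:
    """Hash `body` the way Git hashes a commit object: header, NUL, body."""
    header = b"commit " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def mine(base_body: bytes, difficulty: str) -> Tuple[bytes, str]:
    """Try nonces in-process until the hash sorts below `difficulty`.

    Assumed rule (hypothetical): success means the 40-character hex digest is
    lexicographically smaller than the difficulty string, e.g. "001".
    """
    for nonce in count():
        body = base_body + b"nonce " + str(nonce).encode("ascii") + b"\n"
        sha1 = git_commit_sha1(body)
        if sha1 < difficulty:
            return body, sha1


if __name__ == "__main__":
    # Made-up commit body; a real one carries tree/parent/author lines.
    body, sha1 = mine(b"Give me a Gitcoin\n\n", "001")
    print(sha1)
```

The difficulty string has to end in a non-zero digit for this comparison to be
satisfiable: no 40-character digest sorts below "000", but one starting with
"000" does sort below "001".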

~~~
nayefc
I did exactly that but my SHA-1 (in Ruby) is ending up different than Git's.
Even weirder, when I take the header + body, and use an online SHA-1, I get a
third and different value! (Not equal to git-hash-object nor to mine in Ruby).
Is the body supposed to be modified to add LEDGER content?

~~~
fragmede
do you have 'commit $body.length\0'?

~~~
apgwoz
This was the part that I stumbled on for a few hours late at night and that
kept me up. Next morning, when investigating what git hash-object did, it was
obvious. Should have just gone to sleep.

------
lasryaric
Getting the following output when pushing my code:

    remote: > Your submission has been placed in the queue.
    remote: > Kicking off a build for your submission (running `./build.sh`).
    remote: > `./build.sh` succeeded
    remote: > Kicking off 3 trials. Here goes...
    remote: > Started running Trial 0
    remote: > Started running Trial 1
    remote: > ERROR: There was an internal error scoring Trial 0. If this error persists, please let us know at ctf@stripe.com and include the following error token: err_3MRLt7sxWqqD4C
    To lvl0-xkzdabbt@stripe-ctf.com:level0
       7fcfd3d..17d0e35  master -> master

My code looks to be working because the test ./test/harness says that tests
pass.

~~~
grey-area
I was getting the same on lvl2, seems to be fixed now, so perhaps try again.
The performance after it works does seem to be extremely variable though.

------
codygman
I hope Haskell is installed, I just wrote a quick solution which worked with
my local dictionary. However, I need to add argument parsing. _Yawn_. I guess
I'll see tomorrow!

~~~
tel
GHC 7.6.2 is installed (I checked the container) and you can use cabal as
though you had a sandbox due to the containment.

------
dclara
Are we allowed to discuss solutions here, or does everybody have to work out
the results separately?

I don't know Ruby, but I can guess what it is doing basically:

1\. Read a list of words from a file at an input path or a default path

2\. Take the runtime input from the user: any number of words before a
carriage return (I guess the words are separated by spaces)

3\. Loop over the input words, comparing each one in lower case against the
words in the list loaded from the file. If there is a match, print out the
original word; otherwise print out the word wrapped in some marker.

It's an O(N*M) algorithm. If the data set is small, we won't see too much
difference. Otherwise, we need a better algorithm to improve the speed.

1\. Sort the list from the file (removing duplicates)

2\. Binary-search the sorted list, checking only one half at each step to
reduce the number of comparisons

3\. Sort the input list (keeping a count of duplicates)

4\. Print the duplicates directly without comparing again, etc.

This way the runtime can be improved to O(N*lg M). This may not be the best
algorithm. I'd like to hear from you about better solutions.
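
The sort-then-binary-search idea can be sketched in Python roughly as follows.
This is only an illustration: the function names and the `<word>` marker for
unknown words are assumptions, and steps 3 and 4 (sorting the input and
skipping duplicates) are left out to keep it short.

```python
from bisect import bisect_left


def build_sorted_dictionary(words):
    """Lowercase, de-duplicate, and sort the dictionary once up front."""
    return sorted({w.lower() for w in words})


def contains(sorted_words, word):
    """Binary search: O(lg M) comparisons per lookup."""
    key = word.lower()
    i = bisect_left(sorted_words, key)
    return i < len(sorted_words) and sorted_words[i] == key


def mark_unknown(text, sorted_words):
    """Wrap words missing from the dictionary in <...> (assumed format)."""
    return " ".join(
        w if contains(sorted_words, w) else f"<{w}>" for w in text.split()
    )


if __name__ == "__main__":
    dictionary = build_sorted_dictionary(["the", "quick", "fox", "The"])
    print(mark_unknown("The quikc fox", dictionary))  # The <quikc> fox
```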

~~~
thezoid
Or just populate a hash with the input and see if the entries exist in the
hash. If they do, print them; otherwise print the non-matchy one.
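
In Python the same approach might look like the sketch below, using a set in
place of a hash. The function names and the `<word>` marker are assumptions
made for illustration, not the level's actual code.

```python
def load_dictionary(words):
    """Build the set once; each membership test is then O(1) on average."""
    return {w.lower() for w in words}


def mark_unknown(text, dictionary):
    """Keep known words as-is and wrap unknown ones in <...> (assumed format)."""
    out = []
    for word in text.split():
        out.append(word if word.lower() in dictionary else f"<{word}>")
    return " ".join(out)


if __name__ == "__main__":
    dictionary = load_dictionary(["the", "quick", "fox"])
    print(mark_unknown("The quikc fox", dictionary))  # The <quikc> fox
```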

~~~
jwcrux
Exactly. For membership checks, hashes and sets are much more efficient than
arrays.

~~~
dclara
Of course. If it were in Java, I'd definitely do that. We use HashMap
everywhere. Since the original code looks like this, I thought Ruby did not
support hashes.

We usually implement code at the application level, so we don't really care
about the underlying algorithms. But if we have to implement at the system
level, we do care about how to manage memory and the number of executions,
like Big O.

~~~
goodwink
The original code is like this because it's an exercise. Of course Ruby has
hashes.

~~~
dclara
The CTF does not give a description of the problem, only the code. And it
nudges new users not to change the algorithm too much, just to save a few
steps, which implies making only minor changes to the code.

See my other comment: a hash may not always be better than a list.

~~~
zwily
Sure a hash isn't always better than a list. In this case, it most definitely
is. :)

~~~
dclara
Yes, agreed. I was sort of misled by what was implied. It'd be better if the
CTF gave out the problem description followed by the sample code, or didn't
give any sample code at all.

For me, it makes more sense to receive the detailed information first,
especially the system conditions and restrictions for troubleshooting and
problem solving, followed by brainstorming, instead of going directly to
fixing code. We are used to making minimal code changes, especially in
production. If the code should be completely changed, then we need to know the
requirements and re-implement it. Sounds like we have different conventions
though.

~~~
grey-area
The change required is a tiny one.

~~~
dclara
Unfortunately, I cannot agree with you completely.

If there is no language constraint and no system resource constraint, then for
the problem as we understand it so far, using Java without a hashmap will be
the fastest and easiest way.

Load the complete file as a string (depending on the size of the data set, up
to 2^31 - 1), then use the string.indexOf() function to get the best result.

The underlying algorithm for indexOf() is implemented by the JVM in C code,
which is much faster than any other implementation.
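
One detail any substring-based lookup has to handle is word boundaries, since a
plain search also matches inside longer words. A small Python sketch, using the
`in` operator as a stand-in for `indexOf() != -1`; the newline-delimited layout
and the name `contains_word` are assumptions made for illustration:

```python
def contains_word(haystack: str, word: str) -> bool:
    """Whole-entry match inside a newline-delimited dictionary string."""
    return f"\n{word}\n" in haystack


if __name__ == "__main__":
    entries = ["concatenate", "dog"]
    haystack = "\n" + "\n".join(entries) + "\n"
    print("cat" in haystack)               # True: matches inside "concatenate"
    print(contains_word(haystack, "cat"))  # False
    print(contains_word(haystack, "dog"))  # True
```

Each such lookup still scans the string, so its cost grows with the size of
the dictionary rather than staying constant.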

My gut told me that it's weird to use a hashmap to do string lookup. Everybody
knows a hashmap is used to look up key-value pairs. The real reasons for not
using a hashmap here are:

1\. A hashmap's lookup cost is O(1), but its build cost is not. If the data set
is huge, it takes a long time to build the hashmap, since adding elements
beyond the initialCapacity triggers a rehash

2\. the underlying implementation of indexOf() will use some sort of algorithm
called "automata" or something else to do a fast search within a string.

So there are lots of alternative solutions. Don't always think there is only
one. I'm not in this field, and I'm not interested in getting into it too
much. But I don't think the best answer is that tiny change.

This is why I suggested considering whether you are doing application-level
optimization or changing a system-level algorithm. Building software is a lot
more than code manipulation. Understanding requirements is the first step in
the SDLC (Software Development Life Cycle).

~~~
grey-area
Sure, you're welcome to try it in another language, they almost encourage you
to. No one said there's only one way, but for many simple problems a hash is a
good solution.

The best answer is one that passes the test, and using Ruby it is pretty easy
to do so, but you could do it any way you like.

~~~
dclara
I'm not sure what you mean by "many simple problems". Is this one of the
examples of that? I'm trying to compare the two alternative solutions for this
problem:

1) Hashmap 2) string index

Which is better? I think about what I would do if I were using it in my web
application.

First, the answer still depends on the size of the data set. If the dictionary
is huge, say up to 4 GB, I'd use a string index over a hashmap, because memory
space is expensive. And how to break it into multiple substrings is another
performance tuning issue.

If the problem is simple enough, with a data set that is not too large, a hash
will work.

When I mentioned "one way", I meant the "best way". So now you are talking
about "The best answer is one that passes the test". So do you mean that all
the answers which can pass the test are the best answers, or there is only one
best answer which can pass the test? I don't bring personal preference into
problem solving. I'm always looking for the best solution for a particular
problem under certain conditions and constraints. Once we figure out the
answer, which language the implementation is coded in does not matter that
much, unless Ruby does not support the same algorithm as Java's
string.indexOf().

Hope this discussion helps.

~~~
grey-area
_So do you mean that all the answers which can pass the test are the best
answers_

Yes, what I was trying to say was that the performance required is just that
to pass the test, not more, and the dataset is a few MB here, not 4GB. This is
actually really similar to a lot of problems in real life; you can spend ages
trying to find a platonic solution when a simple solution works fine given the
dataset and requirements (return an answer within 200ms for example).
Sometimes simpler is better, and even if you can improve the solution, it
won't really matter to whoever pays the bills.

There are lots of solutions though, I tried a few just out of curiosity and
you can of course improve on a hashmap - the possible solutions to a problem
this small are pretty similar whatever language you choose, and sometimes when
a dataset is this small other more complex solutions are slower (unless you
preindex).

~~~
dclara
I have no intention of choosing complicated solutions for a simple issue.
First, I'm always concerned about the size of the data set. As I said in my
first comment, if the data set is not big enough, you cannot tell much
difference. Second, I didn't insist on any solution; so far I have proposed
three of them and would like to discuss with you guys the pros and cons under
different conditions. Some people in other posts told me that the first
solution, using sorting + binary search, is actually way faster than a hashmap
and earns more than 2000 points. I tried to verify whether that's true, but
nobody answered. Some people helped by providing a Big O comparison of the
hash lookup algorithm and a linked list, which is also constructive. But did
you learn from it that building a hashmap costs more than O(1)? Lastly, I
proposed the Java solution with a string index; did you ever hear about that?

I don't need to spend too much time finding the solutions; they are off the
top of my head. Depending on the conditions I'll use different solutions. 4GB
is the upper bound of string indexing; if it's being used for web indexing,
that's still not enough. If in this case it is used for enterprise-level
document indexing with a few MB, any of them is fine. But the difference is
the score you get.

I guess eventually you will pick the hashmap solution because you have the
indexes built into the lookup, which makes more sense than the other
solutions, but you (or they) didn't give the conditions out in the first
place. How can we discuss based on that? Looks like I pretty much wasted my
time on this issue and was taught that making things simple is better. Thank
you.

------
lukasm
Cannot push

    It looks like you have another in-progress Git operation, and we limit you to one concurrent operation. Waiting 2s (attempt 1/3)
    It looks like you have another in-progress Git operation, and we limit you to one concurrent operation. Waiting 2s (attempt 2/3)
    It looks like you have another in-progress Git operation, and we limit you to one concurrent operation. Waiting 2s (attempt 3/3)
    ERROR: Timed out waiting for your other Git operation to complete. Try again once it has finished.
    fatal: Could not read from remote repository.

    Please make sure you have the correct access rights and the repository exists.

~~~
gdb
Yep, that happens if you try pushing from multiple clients at once (in order
to prevent people from spamming us with solutions). If you're still getting
that even after a few minutes, ping us at ctf@stripe.com and we'll see what's
going on.

~~~
lukasm
I tried once, but it stopped. I waited a couple of minutes and then ctrl+C'd
it.

first push:

    git push
    warning: push.default is unset; its implicit value is changing in Git
    2.0 from 'matching' to 'simple'. To squelch this message and maintain the
    current behavior after the default changes, use:

      git config --global push.default matching

    To squelch this message and adopt the new behavior now, use:

      git config --global push.default simple

    See 'git help config' and search for 'push.default' for further information.
    (the 'simple' mode was introduced in Git 1.7.11. Use the similar mode
    'current' instead of 'simple' if you sometimes use older versions of Git)

    Counting objects: 37, done.
    Delta compression using up to 4 threads.
    Compressing objects: 100% (33/33), done.
    Writing objects: 100% (36/36), 897.66 KiB | 0 bytes/s, done.
    Total 36 (delta 4), reused 14 (delta 0)

------
danielweber
"An unexpected error occurred"

When I make an account.

 _EDIT_ I probably already had one. Maybe I failed level -1 for that.

~~~
jazzychad
should be fixed now

------
dkhenry
Well, my first thought was to use C++11 to do the same thing to give me a
better basis for moving forward, and I can't seem to use anything outside the
C++ standard libraries. That's cool because we can always statically link
everything and be done with it and

    
    
        version `GLIBCXX_3.4.18' not found (required by ./level0)
    

:-(

~~~
gdb
As a note, if you send us requests for Ubuntu packages, we'll happily install
them — just send an email to ctf@stripe.com.

~~~
dkhenry
The missing package that started all this was the Boost regex library. I sent
an email asking for all of libboost-dev to get installed; that should be more
than enough for any problem that you could throw at a C++ developer.

~~~
gdb
One libboost-dev coming right up :).

------
goodwink
Any chance this will be open sourced later so that we can run the docker stuff
locally for our own team's usage for training etc.?

------
CWilliams1013
Looks like the git repositories used to submit solutions are over SSH?
Unfortunately this won't work in an environment that requires outbound traffic
to go over an authenticated proxy. Any chance of getting git-over-HTTPS
support?

~~~
pbiggar
Try Corkscrew - it lets you tunnel SSH over http proxies:
[http://www.agroman.net/corkscrew/](http://www.agroman.net/corkscrew/)

------
mbrameld
Anybody else timing out when trying to push the solution?

~~~
gdb
We're on it! Will be fixed shortly.

~~~
gdb
(Has now been fixed for a few minutes.)

------
drharris
Man, no way this is going to run on any sort of Windows environment. Guess I
won't get to sneak in CTF time at work this time around.

~~~
mjallday
Try vagrant ([http://www.vagrantup.com/](http://www.vagrantup.com/)) and spin
up a VM!

~~~
jamesgeck0
Note that `vagrant up` NEEDS to be run from an admin console for Level 2. The
sample application is written in node, and npm wants to make symlinks on the
mounted Windows filesystem, which is apparently a privileged operation.

It's a bit of a shame the test harness isn't Windows compatible; it's rather
small, and it doesn't look like the fixes required would have been very
difficult.

------
grey-area
Is anyone else having trouble compiling level3.jar on OS X? I get a lot of
this:

[error] import java.nio.file._

~~~
abbeyj
I'm not on OS X but I could only get it working after installing OpenJDK 7. I
was trying with OpenJDK 6 and getting errors similar to what you're seeing.

------
jwcrux
I'm pulling an empty repo for level1.

Isn't it supposed to start with a TXT file and a sample?

~~~
gdb
Sorry about that -- should be fixed.

~~~
jwcrux
Can confirm it's fixed - thanks for the update.

~~~
danielweber
Is level 1 kind of a test of the computational power we can bring?

~~~
gdb
There's a lot more to it than that, actually. (We'll talk about it more once
the contest is over.)

~~~
danielweber
It feels like there is someone constantly pushing Gitcoins (with the "nonce"
messages) so fast that by the time my miner finds a coin my old head is out of
date, meaning I have to pull and start over.

Which I find alternately very frustrating and very hilarious.

Going to understand git internals a lot more before I'm done with this.

~~~
grey-area
Yes, it seems like it is changing every minute or so (or perhaps it's just my
imagination), which makes it pretty difficult to test all the hashes even if
trying millions. I might have to switch to a faster language :(

~~~
fragmede
> it is changing every minute or so

It is, just like in Bitcoin - there are bot miners you are racing against
(otherwise the provided miner.sh would eventually work)

> I might have to switch to a faster language :(

Probably not, I did it in python.

There's a systems problem in how the miner is architected that you can tweak
to give you a better chance of winning the race.

~~~
grey-area
Thanks, got it now.

------
blktiger
It's only open for 1 week? :(

~~~
gdb
Yep — it's a lot of work for us to keep these running. (See
[https://news.ycombinator.com/item?id=7074064](https://news.ycombinator.com/item?id=7074064)
for a blow-by-blow account of previous CTFs.)

~~~
gsastry
What are the options for us after the week is up if we want to do it?

~~~
matthuggins
I'm gonna go with: "not doing it". (Alternative answer: "wait till next
year".)

~~~
gdb
Not sure yet! We'd love to package it up nicely for people to run, but I can't
make any promises about when we'll have a chance to do so. Note that it should
be pretty easy for us to just distribute the source code for each level, which
would give you most of the functionality of these levels.

------
lpgauth
remote: > We're temporarily out of available test cases. This probably means
we're under pretty heavy load, but don't worry -- we're creating more right
now. Please try again in a few minutes.

meh.

~~~
gdb
Sorry, should be fixed now! Turns out lots of people want to score their
solutions :).

~~~
lpgauth
Also, looks like you can get very different scores depending on which VM you
get... Pushed a second time and jumped to 148 from 119.

------
drux
No t-shirts this time?

~~~
pc
Hm? From the post: "If you complete all the levels, we'll send you a special-
edition Stripe CTF3 T-shirt."

~~~
drux
Oops, my bad

------
lukasm
Can I use python?

~~~
eieio
As far as I know just about any language works. I know python works: I
submitted a python solution for level 0.

Just make sure that the level0 file is an executable file and change the
crunchbang to run python instead of ruby.

