
How I Hacked Hacker News (with arc security advisory) - dfranke
[Condensed version of this narrative: the news.yc code, prior to the
release of arc3, contains a remotely-exploitable vulnerability
permitting account theft. Anyone running a news installation who has
not yet upgraded to arc3 should do so.]<p>Hacker News login cookies are random eight-character strings, stored
server-side in a hash table mapping them to user names.  I discovered
a few weeks ago that these strings were rather less random than they
were meant to be, and, through a delightful combination of exploits,
could be predicted, enabling an attacker to steal accounts.<p>Here's the rand-string function from arc.arc, version 2.  It gets
called with n=8 to generate login cookies, and n=10 for the "fnids"
that get used all over the site as hash keys identifying closures.<p><pre><code>  (def rand-string (n)
    (with (cap (fn () (+ 65 (rand 26)))
           sm  (fn () (+ 97 (rand 26)))
           dig (fn () (+ 48 (rand 10))))
      (coerce (map [coerce _ 'char]
                   (cons (rand-choice (cap) (sm))
                         (n-of (- n 1) (rand-choice (cap) (sm) (dig)))))
              'string)))
</code></pre>
The first thing you might notice about this function is that not all
characters are equally probable.  Each digit has a 1/30 chance of
occurring, while each letter has a 1/78 chance.  This alone is no big
deal: this distribution means that each character carries 5.826 bits
of entropy, versus the 5.954 that a uniform distribution would
provide.  So for an eight-character string, this bug reduces the
effective keyspace by just over a factor of two -- not enough to have
any practical implications.<p>The 'rand' function is an arc primitive, bound directly to mzscheme's
'random':<p><pre><code>  ; need to use a better seed
  (xdef 'rand random)
</code></pre>
The comment seen here is prescient, as we'll see.<p>This is the C function which implements mzscheme's 'random' function:<p><pre><code>  static long sch_int_rand(long n, Scheme_Random_State *rs)
  {
    double  x, q, qn, xq;

    /* generate result in {0..n-1} using the rejection method */
    q  = (double)( (unsigned long)(m1 / (double)n) );
    qn = q * n;
    do {
      x = mrg32k3a(rs);
    } while (x &#62;= qn);
    xq = x / q;

    /* return result */
    return (long)xq;
  }
</code></pre>
Where mrg32k3a() is:<p><pre><code>  static double mrg32k3a(Scheme_Random_State *s) { /*(double), in {0..m1-1}*/
    double x10, x20, y;
    long   k10, k20;

    /* component 1 */
    x10  = a12*(s-&#62;x11) - a13n*(s-&#62;x12);
    k10  = (long)(x10 / m1);
    x10 -= k10 * m1;
    if (x10 &#60; 0.0)
      x10 += m1;
    s-&#62;x12 = s-&#62;x11;
    s-&#62;x11 = s-&#62;x10;
    s-&#62;x10 = x10;

    /* component 2 */
    x20  = a21*(s-&#62;x20) - a23n*(s-&#62;x22);
    k20  = (long)(x20 / m2);
    x20 -= k20 * m2;
    if (x20 &#60; 0.0)
      x20 += m2;
    s-&#62;x22 = s-&#62;x21;
    s-&#62;x21 = s-&#62;x20;
    s-&#62;x20 = x20;

    /* combination of component */
    y = x10 - x20;
    if (y &#60; 0.0)
      y += m1;
    return y;
  }
</code></pre>
This, obviously, is not a cryptographically strong PRNG.  Is it possible
that we could break it, computing its internal state by seeing a few
consecutively-generated rand-strings?  Probably: it looks as though it
could be represented as the solution to a manageable system of Diophantine
equations.  That, though, was more math than I felt like doing, so I went
looking for an easier approach.<p>Where does the RNG seed come from?  Ah ha:<p><pre><code>  rs = scheme_make_random_state(scheme_get_milliseconds());
</code></pre>
Where scheme_get_milliseconds is defined, after eliding some
preprocessor cruft, as:<p><pre><code>  long scheme_get_milliseconds(void)
  {
    struct timeb now;
    ftime(&#38;now);
    return now.time * 1000 + now.millitm;
  }
</code></pre>
In other words, the random seed is merely the number of milliseconds
since epoch at the time the seed function was called.<p>The part of mzscheme that calls the seed function is a bit daunting:
it appears that in some cases, the PRNG state can be thread-local and
be initialized when the thread is spawned.  However, instrumenting
sch_int_rand() with some debug output showed that in arc, the same
state vector gets used everywhere, and is initialized when the
mzscheme runtime starts up.<p>The millisecond at which news.yc last started is not an immediately
simple thing to determine, though it was at least easy to verify the
sanity of the system clock, thanks to an open NTP server:<p><pre><code>  dfranke@feanor:~$ sudo ntpdate -q news.ycombinator.com
  server 174.132.225.106, stratum 2, offset 0.370866, delay 0.08228
  17 May 01:45:13 ntpdate[27901]: adjust time server 174.132.225.106 offset 0.370866 sec
</code></pre>
So for a start, I thought, perhaps I could determine the server's
start time to within a few seconds or minutes.  A boring way to go
about this would be simply to monitor the server for downtime, and
record when it became accessible again.  But impatience is one of the
three great programmer's virtues, and the best way to predict the future
is to create it, and so forth, so I decided on a more proactive
approach: crash it!<p>A couple months ago, PG left this comment after news.yc recovered from
some downtime:<p><pre><code>  HN was down today for around 2 hours. Sorry about that.

  The News server currently crashes a couple times a day when it runs
  out of memory. All the comments and stories no longer fit in the 2 GB
  we can get on a 32 bit machine. We'd been planning to upgrade to a new
  64 bit server. In the meantime it was arguably a rather slow form of
  GC.

  Unfortunately the process somehow got wedged in the middle of
  segfaulting. We're not sure why and will probably never know. But that
  meant the process that usually notices when News is wedged and
  restarts it was unable to kill it.
</code></pre>
(The server had since been upgraded, so these crashes are/were no longer
happening.)<p>I figured that the watchdog works by requesting a page and checking to
make sure it gets a response, and that if it doesn't get one, then it
assumes the server is wedged and restarts it.<p>Here's arc2's top-level request handler:<p><pre><code>  (= srvthreads* nil threadlimit* 50 threadlife* 30)

  ; Could auto-throttle ips, e.g. if one has more than x% of recent requests.
  (= requests* 0 requests/ip* (table) throttle-ips* (table) throttle-time* 30)

  (def handle-request (s (o life threadlife*))
    (if (len&#60; (pull dead srvthreads*) threadlimit*)
        (let (i o ip) (socket-accept s)
          (++ requests*)
          (= (requests/ip* ip) (+ 1 (or (requests/ip* ip) 0)))
          (let th (thread 
                    (if (throttle-ips* ip) (sleep (rand throttle-time*)))
                    (handle-request-thread i o ip))
            (push th srvthreads*)
            (thread (sleep life)
                    (unless (dead th) (prn "srv thread took too long"))
                    (break-thread th)
                    (close i o))))
        (sleep .2)))
</code></pre>
So, there's a limit of 50 concurrent threads, and threads are killed
after 30 seconds if they haven't already terminated.  So if I were to
hold open 50 concurrent connections, and the watchdog were to run during
the following 30 seconds, then the server ought to restart.<p>The watchdog code has not been released, so rather than soil my hat
color by DoSing the production server, I decided to continue hacking
on my local install on the assumption that I had the ability to
determine the server's start time to within one minute.<p>So, a one-minute interval is 60,000 possible PRNG seeds.  If I kept
polling to see when the server came back up after the watchdog killed
it, then let's very conservatively assume that I could be among the
first 50 people to issue an HTTP request.  Each page that comes back
from the server typically contains 2-3 fnids, so the reply I got would
contain some from among the first few hundred to be generated, and thus
from among the first few thousand iterations of the PRNG.<p>This leaves determination of the PRNG seed comfortably within the
reach of brute force: run the PRNG for 10,000 iterations for each of
the 60,000 possible seeds, and see which one produces the fnids I saw
in response to my request.  I wrote a program that does just this:<p><pre><code>  http://dfranke.us/hacknews.c
</code></pre>
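The search itself is short. What follows is a minimal sketch of the idea in C, not the contents of hacknews.c: it swaps in a toy 64-bit LCG for mzscheme's MRG32k3a and seeds it directly with the millisecond count, so toy_rng, toy_seed, toy_rand, toy_rand_string and find_seed are all hypothetical names. It also assumes that rand-choice makes one draw to pick a branch and then evaluates only that branch. The part that carries over is the shape of the loop: every seed in the window, every starting offset in the stream.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in generator: a 64-bit LCG, NOT mzscheme's MRG32k3a (assumption). */
typedef struct { uint64_t s; } toy_rng;

static void toy_seed(toy_rng *r, int64_t ms) { r->s = (uint64_t)ms; }

/* One draw in {0..n-1}; every draw advances the state by exactly one step. */
static long toy_rand(toy_rng *r, long n) {
    r->s = r->s * 6364136223846793005ULL + 1442695040888963407ULL;
    return (long)((r->s >> 33) % (uint64_t)n);
}

/* Same shape as arc2's rand-string: the first character is a letter,
   the rest are upper-case, lower-case or digit with equal odds. */
static void toy_rand_string(toy_rng *r, int n, char *out) {
    for (int i = 0; i < n; i++) {
        long kind = toy_rand(r, i == 0 ? 2 : 3);
        if (kind == 0)      out[i] = (char)('A' + toy_rand(r, 26));
        else if (kind == 1) out[i] = (char)('a' + toy_rand(r, 26));
        else                out[i] = (char)('0' + toy_rand(r, 10));
    }
    out[n] = '\0';
}

/* Try every seed in [lo, hi] and every starting offset below max_iters.
   Returns the seed that reproduces `observed`, or -1 if none does. */
static int64_t find_seed(const char *observed, int64_t lo, int64_t hi,
                         long max_iters) {
    int n = (int)strlen(observed);
    char buf[64];
    if (n <= 0 || n >= (int)sizeof buf) return -1;
    for (int64_t seed = lo; seed <= hi; seed++) {
        toy_rng base;
        toy_seed(&base, seed);
        for (long k = 0; k < max_iters; k++) {
            toy_rng probe = base;        /* copy the state at offset k */
            toy_rand_string(&probe, n, buf);
            if (strcmp(buf, observed) == 0) return seed;
            toy_rand(&base, 2);          /* slide the base stream by one draw */
        }
    }
    return -1;
}
```

Called as find_seed(fnid, t0, t0 + 59999, 10000) with an fnid scraped from a page, this covers the one-minute window and the 10,000 iterations discussed above; a real attack would replace the toy generator with mzscheme's own seeding and sch_int_rand().<p>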
So now I was able to determine PRNG seeds, but I couldn't conclude my
adventure quite yet.  Since logging into news.yc is an uncommon
operation compared to simply browsing around, only a tiny fraction of
rand-strings that the server generates correspond to login cookies.
Furthermore, since fnids and login cookies have different lengths, and
since the PRNG gets called for a few other purposes at unpredictable
times, every individual PRNG iteration begins a candidate login
cookie.  That's 40 or more false candidates produced for every page
view.<p>Nonetheless, online brute force would still be manageable.  If each
page view produces an average of 50 candidates, and one in every
thousand page views is a login (this might be slightly optimistic),
that's 50,000 attempts necessary in order to find a working login.  HN
gets about 500,000 hits on a busy day, so this could be done in a day
or two while likely staying under the radar.<p>A marginally more efficient approach would be a bit of social engineering:<p>1. Request a page.  Find a generated fnid from the page source and
look it up in our candidate list.  Call this A.<p>2. 
  ERC&#62; /join #startups   
        &#60;dfranke&#62; Hey guys, I haven't been able to log in to news.yc
                  since the server restarted a little while ago.  Anyone 
                  else having problems?
        &#60;jrandomsucker&#62; dfranke: Works for me.
        &#60;dfranke&#62; Hmm, weird.  I'll just try again later I guess.<p>3. Request another page, note the fnid, find it in the candidate
list.  Call this B.<p>4. Test the cookies that fall between A and B.<p>If this conversation takes one minute, then this reduces the search to
about 17,500 attempts (500,000 hits a day is roughly 350 page views a minute, at 50 candidates each) -- less than a day's worth at a modest rate of
querying -- and possibly picks up multiple accounts in the process.<p>Epilogue:<p>I sent PG a draft of this post.  RTM and I wrote a better implementation
of rand-string which reads from /dev/urandom and obeys a proper uniform
distribution.  This new version appears in arc3:<p><pre><code>  (def rand-string (n)
    (let c "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
      (with (nc 62 s (newstring n) i 0)
        (w/infile str "/dev/urandom"
          (while (&#60; i n)
            (let x (readb str)
               (unless (&#62; x 247)
                 (= (s i) (c (mod x nc)))
                 (++ i)))))
        s)))
</code></pre>
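The cutoff of 247 is what makes the new version uniform: 248 is 4 x 62, so bytes 0 through 247 hit each of the 62 characters exactly four times, and the eight leftover values are thrown away rather than folded in. A rough C rendering of the same idea, for readers who don't speak arc (byte_to_char and urandom_string are names made up for this sketch, not anything in arc3):

```c
#include <stdio.h>

static const char ALPHABET[] =
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

/* 248 = 4 * 62 is the largest multiple of 62 that fits in one byte. */
enum { NC = 62, LIMIT = 248 };

/* Map one byte to a character, or return -1 if the byte must be rejected. */
static int byte_to_char(int x) {
    if (x < 0 || x >= LIMIT) return -1;
    return ALPHABET[x % NC];
}

/* Fill out[0..n-1] with uniformly chosen characters and NUL-terminate.
   `out` needs room for n + 1 bytes.  Returns 0 on success, -1 on I/O error. */
static int urandom_string(char *out, int n) {
    FILE *f = fopen("/dev/urandom", "rb");
    if (!f) return -1;
    int i = 0;
    while (i < n) {
        int x = fgetc(f);
        if (x == EOF) { fclose(f); return -1; }
        int c = byte_to_char(x);
        if (c >= 0) out[i++] = (char)c;   /* rejected bytes are simply skipped */
    }
    out[n] = '\0';
    fclose(f);
    return 0;
}
```

With a char buf[9], urandom_string(buf, 8) yields a cookie-length string. Only 8 of every 256 byte values (about 3%) are rejected, so the loop rarely needs more than one read per character.<p>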
PG removed the 50-thread concurrency limit and replaced it with a
per-IP rate limiter, so the DoS attack described here should no longer
work.
======
edw519
Fellow hackers, take note. _This_ is how you solve a problem! dfranke is
Pandora, a rat in a maze, Sherlock Holmes, General Sherman, William Randolph
Hearst, and your father all wrapped in one.

Like Pandora, he is so curious, he _has_ to check this out.

Like a rat in a maze, he keeps going looking for the clear path.

Like Sherlock Holmes, he applies logic to determine the next step.

Like General Sherman, he keeps marching, building tools along the way as he
needs them.

Like William Randolph Hearst, he defines the landscape. ("You provide the
pictures, I'll provide the war.") "so I decided on a more proactive approach:
crash it!" (hilarious)

And like any parent, he didn't quit until his baby walked.

Thank you, Daniel. I sure hope you've found a way to channel that talent in
your day job.

~~~
froo
_Like Pandora, he is so curious, he has to check this out._

Slight error: Pandora was female in Greek mythology (well, the first female,
to be more precise).

~~~
theblackbox
Most people don't know this and in the interest of knowledge I thought I'd
point it out:

Pandora was the wife of Prometheus (the Titan of Knowledge) who made all the
animals and humanity (last of all and out of the only material he had left -
clay). In order to give them life, he stole fire from the gods and gave it to
his creations, for which he was bitterly punished. In one telling he was tied
to a mountain and his heart/liver was eaten by an eagle/vulture and grew back
every day. HOWEVER, in another telling, his punishment was PANDORA and her
famous box.... I just always found that interesting, vivisection every day for
eternity, or a curious wife... fine lines =P

~~~
froo
And in the other telling, the box wasn't actually a box, but was a jar. It was
incorrectly translated by someone (I forget their name, I should go look it
up) and has stuck ever since.

EDIT - just looked it up, it was incorrectly translated by Erasmus of
Rotterdam when he first translated Hesiod's tale into Latin from Greek.

------
chime
Thanks dfranke. All these years, whenever I thought of the true hacker, this
is what I pictured at the back of my mind - material complex enough for me to
take out my Stats and Linear Algebra books. Every other web hack attempt over
the past decade has been XSS, bad passwords, and stupid form submission
issues. Frankly, I had given up on the existence of true whitehat hackers till
this post. Hats off to you Sir.

------
pg
Thanks to dfranke for giving us time to release a fix, and in fact writing
part of it.

~~~
dschobel
I hope you scoop him up for one of your YC teams.

That hack was damned impressive for breadth and depth of knowledge, pretty
rare in my experience.

~~~
SwellJoe
I imagine Daniel wouldn't have trouble getting a job with a YC company, if he
were so inclined. If we were hiring (couple more months...), we'd certainly
talk to him.

------
jgrahamc
Very nicely done. This is precisely the attack that I described in an HN post a
month ago. Glad to hear that it would have worked if I'd had time:
<http://news.ycombinator.com/item?id=596126>

~~~
dfranke
Wow. This post has been sitting on my hard drive for a little under a month
waiting for arc3's release, and the hack itself was a few evenings' work, so
this was simultaneous discovery damn near to the day. I'm not sure who was
first; I never saw that comment at the time.

~~~
jgrahamc
I'm not trying to be first. All the kudos goes to you.

~~~
dfranke
I'm not quibbling over credit; I just think this kind of simultaneity is a
fascinating phenomenon. No doubt both of our thought processes got tweaked by
the cluster of other security posts that were showing up on HN at the time.

~~~
mixmax
It's actually not that uncommon. I spoke with my patent attorney some time ago
and he could attest that very often the exact same inventions are sent to the
patent office only days apart. Sometimes even on the same day. Famously Elisha
Gray sent in his patent application for the telephone at the same time as
Alexander Graham Bell. A letter was sent out to the two parties asking them to
defend their application in order to find out who should be awarded the
patent. Gray's company didn't find it worth the trouble to respond, so the
patent went to Bell.

The point is that people and the ideas they have are influenced by external
factors, many of them unconscious. But we will pick up on the same clues,
think about the same problems, and maybe come up with the same solutions based
on external factors. Nothing can stop an idea whose time has come.

~~~
redcap
There's also the Calculus and the primacy dispute between Leibnitz and Newton.
The dispute itself isn't that interesting; what is interesting is that something so
complicated was created in the background of natural philosophy by two different people.

(I've recently read The Baroque Cycle again, does it show?)

------
noonespecial
I've always been a dfranke fan. Ever since reading his "Code free or die()"
essay. He used _"beg the question"_ correctly. I got misty eyed. That's good
peeps.

And, oh look, he's also a fantastic hacker.

Thanks for taking the time to write this out. This is the kind of hack I enjoy
most. Watching the combination of obscure facts and astute observations come
together into a coherent and powerful whole is a pleasure.

------
dschobel
Very impressive. Are you a professional security researcher or just a
programmer who tilts that way?

~~~
reduxredacted
The qualities that make a great developer and a great professional security
researcher are about the same.

Both involve taking hardware, a programming language, an API, an application
or something else that someone in the computing industry had one purpose for,
and bending it in a way to produce a rather unexpected, very unique result.

A great developer builds Web 2.0 with something as novel and limited as
JavaScript and HTML.

A great security researcher made your toaster catch fire from a different
continent (didn't they always say our toasters would be internet connected?).

I, personally, think the latter is more fun provided it doesn't cause any
actual damage (and I think that was demonstrated ... a whitehat makes his own
toaster catch fire and sometimes tells everyone who has that model of toaster
how to fix it).

Nicely done.

~~~
bobappleyard
Developers think of ways to make things.

Security guys think of ways to break things.

That's the main difference.

~~~
dannyj
Security guys know how things are made thus how to break them. Thinking as a
security guy can help developers to make things that are difficult to break.

------
soegaard
> This, obviously, is not a cryptographically strong PRNG.

Why is this obvious?

The generator used is L'Ecuyer's MRG32k3a :
[http://www.iro.umontreal.ca/~lecuyer/myftp/papers/streams00s...](http://www.iro.umontreal.ca/~lecuyer/myftp/papers/streams00s.pdf)

Is there a known weakness?

~~~
dfranke
It's not really a question of known weaknesses because it's not designed to be
secure in the first place; it only performs a single round. Its purpose is to
be fast and to be "random enough" for scientific applications.

------
michael_dorfman
Nicely done, and excellent explanation. Kudos.

------
biohacker42
This reminds me of the exploits I used to read in phrack back when I was still
in college. There should be a book that collects great hacks like this one.

------
alain94040
I love smart people.

(plus having the ethics to actually do no evil - great combination)

~~~
thetrumanshow
I wonder if ethics is just economic thinking with a low tolerance for certain
types of risk, or whether it's a real conviction/prime directive.

------
notdarkyet
Off topic, yes, but I cruised through his profile to see what else he was involved
with or does and found this small piece he wrote that I found to be quite good. I
would recommend checking it out.

<http://dfranke.us/cfod.html>

~~~
abefortas
I enjoyed that essay, although he never cleared up my confusion as to what he
meant by libertarian. I generally use it as a synonym for anarchist, but he
sounds (and I could be wrong) more like a right libertarian. Is this right?

~~~
barry-cotter
You Spanish? In US English at the very least libertarian maps to anarcho-
capitalist/classical relatively closely. Anarchist is almost always a synonym
for anarcho-syndicalist.

~~~
abefortas
American. Are you saying Daniel Franke is an anarcho-capitalist?

~~~
dfranke
I am.

------
psifertex
Great work and writeup. I think my favorite part of the whole thing is that
you clearly started with the goal (or somewhere early on decided that was the
goal) of finding a bug and kept going until you'd landed it.

That's the difference between a true hacker at heart and someone who just
stumbles across something. Tenacity.

------
icey
This is pretty interesting stuff, but man it would have been nice if you were
able to post it somewhere with actual HTML formatting. The gray on gray is
tough after a full screen or two.

~~~
ralph
Gray on gray is awful. Does only one person like it? :-) Here's a fix for
Firefox that doesn't need the overhead of Greasemonkey.

<http://news.ycombinator.com/item?id=563492>

~~~
kirubakaran
Thanks. I now use the following and my HN headaches are gone:

    
    
      @-moz-document url-prefix(http://news.ycombinator.com/) {
         td { color: #000000 !important; }
         td.title a:visited { text-decoration: line-through !important; }
         td.subtext { padding-bottom:2em !important; }
      }

------
joshhart
Wow. I just realized you were the same person I sat next to in CIS 3020 at UF
- what a terrible waste of a class. You switched my keyboard to Dvorak :(

Nice to see you're doing well, I recently joined a startup myself.

------
mariorz
_> "Nonetheless, online brute force would still be manageable. If each page
view produces an average of 50 candidates, and one in every thousand page
views is a login (this might be slightly optimistic), that's 50,000 attempts
necessary in order to find a working login. HN gets about 500,000 hits on a
busy day, so this could be done in a day or two while likely staying under the
radar."_

You would have a fnid that is in the cookie hash table, yet you still would
not know which username it is mapped to, correct?

~~~
dfranke
I was looking for login cookies, not fnids. Fnids are worthless since there's
already code that checks that the user who called the closure is the same as
the user for whom it was created.

~~~
mariorz
My mistake. So you can get a valid login, but you can't know whom you'll be
logging in as, that is, without doing some social engineering like with the IRC
example. Impressive hack.

~~~
xenophanes
I think if you get a valid login cookie, and use it, it will tell you what
account you have in the top right.

~~~
mariorz
Well obviously, but you couldn't do a brute force attack like this to a
specific account.

~~~
e1ven
I believe the implication is that you'd have a sessionid. Effectively, the
username and password rolled into one unique number, stored in the cookie.

~~~
dfranke
I think by 'specific account' he means 'chosen account', in which case he'd be
correct without more targeted social engineering.

------
qhoxie
Nice and thorough explanation but most of all you seemed to handle this
gracefully. Well done.

------
akamaka
Aside from being a good hack, I really enjoyed reading your write-up!

------
khandekars
Neat. This reminds me of the attack on the SSL stack in an early version of
Netscape.

------
chanux
Who says open source code is less secure?

~~~
cturner
Isn't it cool that with free software your users can have the rush of the hack
and demonstrate the fix without having to screw your production server?

------
pyman
I can't believe what I'm reading: You are superman. You are my hero. I want to
be your friend. I WANT TO HIRE YOU!

Let me tell you something. If this poor guy gets hired by one of you, he'll
probably end up working with a retarded person and reading cookies for the
rest of his life.

To the hacker: Keep having fun mate, but don't allow your time to be cluttered
up by things that don't make a difference, like a cookie.

PyMan

------
dlnovell
This is the most detailed and well written write-up of a complex hack I've
ever read. It was an absolute joy to read.

Thanks for sharing it with everyone!

------
wastedbrains
Wow, I must say I am impressed. Kudos, fine sir!

------
dbul
If only "What is the story with the most points?" had been asked a few weeks
later than it was:

<http://news.ycombinator.com/item?id=604323>

Previous record holder with 530 points:

<http://news.ycombinator.com/item?id=459289>

------
lexx12
awesome! :)

~~~
Derrek
Really? Downmodded because of the comment "awesome! :)"? Pretty harsh.

------
ShellDragon
Brilliant sir! I'm your slave.

~~~
dfranke
sudo make me a sandwich.

------
csbartus
wooooow:D

what a job, what a feedback, what a community!!!! no awesomeness but more warm
warm warm!

~~~
clarkin
Impressive!

