Hacker News new | past | comments | ask | show | jobs | submit login
How I Hacked Hacker News (with arc security advisory)
928 points by dfranke on June 3, 2009 | hide | past | favorite | 78 comments
[Condensed version of this narrative: the news.yc code, prior to the the release of arc3, contains a remotely-exploitable vulnerability permitting account theft. Anyone running a news installation who has not yet upgraded to arc3 should do so.]

Hacker News login cookies are random eight-character strings, stored server-side in a hash table mapping them to user names. I discovered a few weeks ago that these strings were rather less random than they were meant to be, and, through a delightful combination of exploits, could be predicted, enabling an attacker to steal accounts.

Here's the rand-string function from arc.arc, version 2. It gets called with n=8 to generate login cookies, and n=10 for the "fnids" that get used all over the site as hash keys identifying closures.

  (def rand-string (n)
    (with (cap (fn () (+ 65 (rand 26)))
           sm  (fn () (+ 97 (rand 26)))
           dig (fn () (+ 48 (rand 10))))
      (coerce (map [coerce _ 'char]
                   (cons (rand-choice (cap) (sm))
                         (n-of (- n 1) (rand-choice (cap) (sm) (dig)))))
The first thing you might notice about this function is that not all characters are equally probable. Each digit has a 1/30 chance of occuring, while each letter has a 1/78 chance. This alone is no big deal: this distribution means that each character carries 5.826 bits of entropy, versus the 5.954 that a uniform distribution would provide. So for an eight-character string, this bug reduces the effective keyspace by just over a factor of two -- not enough to have any practical implications.

The 'rand' function is an arc primitive, bound directly to mzscheme's 'random':

  ; need to use a better seed
  (xdef 'rand random)
The comment seen here is prescient, as we'll see.

This is the C function which implements mzscheme's 'random' function:

  static long sch_int_rand(long n, Scheme_Random_State *rs)
    double  x, q, qn, xq;

    /* generate result in {0..n-1} using the rejection method */
    q  = (double)( (unsigned long)(m1 / (double)n) );
    qn = q * n;
    do {
      x = mrg32k3a(rs);
    } while (x >= qn);
    xq = x / q;

    /* return result */
    return (long)xq;
Where mrg32k3a() is:

  static double mrg32k3a(Scheme_Random_State *s) { /*(double), in {0..m1-1}*/
    double x10, x20, y;
    long   k10, k20;

    /* component 1 */
    x10  = a12*(s->x11) - a13n*(s->x12);
    k10  = (long)(x10 / m1);
    x10 -= k10 * m1;
    if (x10 < 0.0)
      x10 += m1;
    s->x12 = s->x11;
    s->x11 = s->x10;
    s->x10 = x10;

    /* component 2 */
    x20  = a21*(s->x20) - a23n*(s->x22);
    k20  = (long)(x20 / m2);
    x20 -= k20 * m2;
    if (x20 < 0.0)
      x20 += m2;
    s->x22 = s->x21;
    s->x21 = s->x20;
    s->x20 = x20;

    /* combination of component */
    y = x10 - x20;
    if (y < 0.0)
      y += m1;
    return y;
This, obviously, is not a cryptographically strong PRNG. Is it possible that we could break it, computing its internal state by seeing a few consecutively-generated rand-strings? Probably: it looks as though it could be represented as the solution to a manageable system of diophantine equations. That, though, was more math than I felt like doing, so I went looking for an easier approach.

Where does the RNG seed come from? Ah ha:

  rs = scheme_make_random_state(scheme_get_milliseconds());
Where scheme_get_milliseconds is defined, after eliding some preprocessor cruft, as:

  long scheme_get_milliseconds(void)
    struct timeb now;
    return now.time * 1000 + now.millitm;
In other words, the random seed is merely the number of milliseconds since epoch at the time the seed function was called.

The part of mzscheme that calls the seed function is a bit daunting: it appears that in some cases, the PRNG state can be thread-local and be initialized when the thread is spawned. However, instrumenting sch_int_rand() with some debug output showed that in arc, the same state vector gets used everywhere, and is initialized when the mzscheme runtime starts up.

The millisecond at which news.yc last started is not an immediately simple thing to determine, though it was at least easy to verify the sanity of the system clock, thanks to an open NTP serevr:

  dfranke@feanor:~$ sudo ntpdate -q news.ycombinator.com
  server, stratum 2, offset 0.370866, delay 0.08228
  17 May 01:45:13 ntpdate[27901]: adjust time server offset 0.370866 sec
So for a start, I thought, perhaps I could determine the server's start time to within a few seconds or minutes. A boring way to go about this would be simply to monitor the server for downtime, and record when it became accessible again. But impatience is one of the three great programmer's virtues, and the best way to predict the future is to create it, and so forth, so I decided on a more proactive approach: crash it!

A couple months ago, PG left this comment after news.yc recovered from some downtime:

  HN was down today for around 2 hours. Sorry about that.

  The News server currently crashes a couple times a day when it runs
  out of memory. All the comments and stories no longer fit in the 2 GB
  we can get on a 32 bit machine. We'd been planning to upgrade to a new
  64 bit server. In the meantime it was arguably a rather slow form of

  Unfortunately the process somehow got wedged in the middle of
  segfaulting. We're not sure why and will probably never know. But that
  meant the process that usually notices when News is wedged and
  restarts it was unable to kill it.
(The server had since been upgraded, so these crashes are/were no longer happening.)

I figured that the watchdog works by requesting a page and checking to make sure it gets a response, and that if it doesn't get one, then it assumes the server is wedged and restarts it.

Here's arc2's top-level request handler:

  (= srvthreads* nil threadlimit* 50 threadlife* 30)

  ; Could auto-throttle ips, e.g. if one has more than x% of recent requests.
  (= requests* 0 requests/ip* (table) throttle-ips* (table) throttle-time* 30)

  (def handle-request (s (o life threadlife*))
    (if (len< (pull dead srvthreads*) threadlimit*)
        (let (i o ip) (socket-accept s)
          (++ requests*)
          (= (requests/ip* ip) (+ 1 (or (requests/ip* ip) 0)))
          (let th (thread 
                    (if (throttle-ips* ip) (sleep (rand throttle-time*)))
                    (handle-request-thread i o ip))
            (push th srvthreads*)
            (thread (sleep life)
                    (unless (dead th) (prn "srv thread took too long"))
                    (break-thread th)
                    (close i o))))
        (sleep .2)))
So, there's a limit of 50 concurrent threads, and threads are killed after 30 seconds if they haven't already terminated. So if I were to hold open 50 concurrent connections, and the watchdog were to run during the following 30 seconds, then the server ought to restart.

The watchdog code has not been released, so rather than soil my hat color by DoSing the production server, I decided to continue hacking on my local install on the assumption that I had the ability to determine the server's start time to within one minute.

So, a one-minute interval is 60,000 possible PRNG seeds. If I kept polling to see when the server came back up after the watchdog killed it, then let's very conservatively assume that I could be among the first 50 people to issue an HTTP request. Each page that comes back from the server typically contains 2-3 fnids, so the reply I got would contain some from among first few hundred to be generated, and thus from among the first few thousand iterations of of the PRNG.

This leaves determination of the PRNG seed comfortably within the reach of brute force: run the PRNG for 10,000 iterations for each of the 60,000 possible seeds, and see which one produces the fnids I saw in response to my request. I wrote a program that does just this:

So now I was able to determine PRNG seeds, but I couldn't conclude my adventure quite yet. Since logging into news.yc is an uncommon operation compared to simply browsing around, only a tiny fraction of rand-strings that the server generates correspond to login cookies. Furthermore, since fnids and login cookies have different lengths, and since the PRNG gets called for a few other purposes at unpredictable times, every individual PRNG iteration begins a candidate login cookie. That's 40 or more false candidates produced for every page view.

Nonetheless, online brute force would still be manageable. If each page view produces an average of 50 candidates, and one in every thousand page views is a login (this might be slightly optimistic), that's 50,000 attempts necessary in order to find a working login. HN gets about 500,000 hits on a busy day, so this could be done in a day or two while likely staying under the radar.

A marginally more efficient approach would be a bit of social engineering:

1. Request a page. Find a generated fnid from the page source and look it up in our candidate list. Call this A.

2. ERC> /join #startups <dfranke> Hey guys, I haven't been able to log in to news.yc since the server restarted a little while ago. Anyone else having problems? <jrandomsucker> dfranke: Works for me. <dfranke> Hmm, weird. I'll just try again later I guess.

3. Request another page, note the fnid, find it in the candidate list. Call this B.

Step 4: Test the cookies that fall between A and B.

If this conversation takes one minute, then this reduces the search to about 17,500 attempts -- less than a day's worth at a modest rate of querying -- and possibly picks up multiple accounts in the process.


I sent PG a draft of this post. RTM and I wrote a better implementation of rand-string which reads from /dev/urandom and obeys a proper uniform distribution. This new version appears in arc3:

  (def rand-string (n)
    (let c "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
      (with (nc 62 s (newstring n) i 0)
        (w/infile str "/dev/urandom"
          (while (< i n)
            (let x (readb str)
               (unless (> x 247)
                 (= (s i) (c (mod x nc)))
                 (++ i)))))
PG removed the 50-thread concurrency limit and replaced it with a per-IP rate limiter, so the DoS attack described here should no longer work.

Fellow hackers, take note. This is how you solve a problem! dfranke is Pandora, a rat in a maze, Sherlock Holmes, General Sherman, William Randolph Hearst, and your father all wrapped in one.

Like Pandora, he is so curious, he has to check this out.

Like a rat in a maze, he keeps going looking for the clear path.

Like Sherlock Holmes, he applies logic to determine the next step.

Like General Sherman, he keeps marching, building tools along the way as he needs them.

Like William Randolph Hearst, he defines the landscape. ("You provide the pictures, I'll provide the war.") "so I decided on a more proactive approach: crash it!" (hilarious)

And like any parent, he didn't quit until his baby walked.

Thank you, Daniel. I sure hope you've found a way to channel that talent in your day job.

Thanks, I think that's the most elaborate compliment I've ever gotten.

Also, like Brad Pitt, you have beautiful, silky, manageable hair.

What algorithm did you use to extract that conclusion from this data? I probably have a few customers for you.

You're just making it harder for me to decide which #1 crypto post for today is my favorite.

Your welcome :)

I knew from previous discussions with you that you did work at this level, but this is the first time I had actually seen for myself. It read almost like a mystery and I found myself guessing what you'd do next. So I thought I'd pay a quick tribute with what came into my mind first.

AFAIC, you don't need a proposal or resume any more. Just email any prospect or prospective employer this link. If they don't see what you can do from this, you probably don't want to work for them anyway.

Like Pandora, he is so curious, he has to check this out.

Slight error, Pandora was a female in greek mythology (well, the first female to be more precise)

most people don't know this and in the interest of knowledge I thought I'd point it out:

Pandora was the wife of Prometheus (the Titan of Knowledge) who made all the animals and humanity (last of all and out of the only material he had left - clay). In order to give them life, he stole fire from the gods and gave it to his creations, for which he was bitterly punised. In one telling he was tied to a mountain and his heart/liver was eaten by an eagle/vulture and grew back every day. HOWEVER, in another telling, his punishment was PANDORA and her famous box.... I just always found that interesting, vivisection every day for eternity, or a curious wife... fine lines =P

And in the other telling, the box wasn't actually a box, but was a jar. It was incorrectly translated by someone (I forget their name, I should go look it up) and has stuck ever since.

EDIT - just looked it up, it was incorrectly translated by Erasmus of Rotterdam when he first translated Hesiod's tale into Latin from Greek.

I think the "he" is dfranke in this context.

Yeah, but I looked through the rest of the list and the rest of the examples could have referred to the male context, so I simply was just making sure.

The sentence was grammatically unambiguous as written.

I believe those 'he's refer to dranke, not Pandora.

Thanks dfranke. All these years, whenever I thought of the true hacker, this is what I pictured at the back of my mind - material complex enough for me to take out my Stats and Liner Algebra books. Every other web hack attempt over the past decade has been XSS, bad passwords, and stupid form submission issues. Frankly, I had given up on the existence of true whitehat hackers till this post. Hats off to you Sir.

Thanks to dfranke for giving us time to release a fix, and in fact writing part of it.

I hope you scoop him up for one of your YC teams.

That hack was damned impressive for breadth and depth of knowledge, pretty rare in my experience.

I imagine Daniel wouldn't have trouble getting a job with a YC company, if he were so inclined. If we were hiring (couple more months...), we'd certainly talk to him.

That makes me somewhat sad. I am a good coder, but... still so much to learn... so much...

It's amazing how it's never the guys with the blogs full of bombastic prose that really blow you away, it's the guys who just engineer something fantastic.

This should be exhibit A, B, and C in the death of the idea of the "rockstar programmer".

Very nicely done. This is precisely the attack that I described in a HN post a month ago. Glad to hear that it would have worked if I'd had time: http://news.ycombinator.com/item?id=596126

Wow. This post has been sitting on my hard drive for a little under a month waiting for arc3's release, and the hack itself was a few evenings' work, so this was simultaneous discovery damn near to the day. I'm not sure who was first; I never saw that comment at the time.

I'm not trying to be first. All the kudos goes to you.

I'm not quibbling over credit; I just think this kind of simultaneity is a fascinating phenomenon. No doubt both of our thought processes got tweaked by the cluster of other security posts that were showing up on HN at the time.

it's actually not that uncommon. I spoke with my patent attorney some time ago and he could attest that very often the exact same inventions are sent to the patent office only days apart. Sometimes even on the same day. Famously Elisha Gray sent in his patent application for the telephone at the same time as Alexander Graham Bell. A letter was sent out to the two parties asking them to defend their application in order to find out who should be awarded the patent. Gray's company didn't find it worth the trouble to respond, so the patent went to Bell.

The point is that people and the ideas they have are influenced by external factors, many of them unconscious. But we will pick up on the same clues, think about the same problems, and maybe come up with the same solutions based on external factors. Nothing can stop an idea whise time has come.

There's also the Calculus and the primacy dispute between Leibnitz and Newton. The dispute itself isn't that interesting, but that something so complicated was created in the background of natural philosophy by two different people.

(I've recently read The Baroque Cycle again, does it show?)

Another popular theory is that Bell flat-out stole the idea from Gray.

(see The Telephone Gambit: Chasing Alexander Graham Bell's Secret, http://www.amazon.com/gp/product/0393062066?ie=UTF8&tag=... )

Yeah. That is interesting. Pity I can't reconstruct my thought processes. I know that I'd been poking around inside the news.arc because I was looking for vulnerabilities because of my recent posts about 37 Signals and password hashing.

I wonder what set you off.

I do like the fact that here on HN there were at least two of us who thought of this way of hacking the site, one actually made it happen. It's a high quality community.

It's worth noting that reading /dev/urandom isn't exactly cheap.

Probably fine for generating session IDs in most circumstances, but if you're using it in another situation (e.g. shuffling something), you might find the performance is terrible.

But it is still better then reading from /dev/random

I've always been a dfranke fan. Ever since reading his "Code free or die()" essay. He used "beg the question" correctly. I got misty eyed. That's good peeps.

And, oh look, he's also a fantastic hacker.

Thanks for taking the time to write this out. This is the kind of hack I enjoy most. Watching the combination of obscure facts and astute observations come together into a coherent and powerful whole is a pleasure.

Very impressive. Are a you a professional security researcher or just a programmer who tilts that way?

If he's not a professional security researcher, it's obviously because he doesn't want to be one. =)

The qualities that make a great developer and a great professional security researcher are about the same.

Both involve taking hardware, a programming language, an API, an application or something else that someone in the computing industry had one purpose for, and bending it in a way to produce a rather unexpected, very unique result.

A great developer builds Web 2.0 with something as novel and limited as JavaScript and HTML.

A great security researcher made your toaster catch fire from a different continent (didn't they always say our toasters would be internet connected?).

I, personally, think the latter is more fun provided it doesn't cause any actual damage (and I think that was demonstrated ... a whitehat makes his own toaster catch fire and sometimes tells everyone who has that model of toaster how to fix it).

Nicely done.

Developers think of ways to make things.

Security guys think of ways to break things.

That's the main difference.

Security guys know how things are made thus how to break them. Thinking as a security guy can help developers to make things that are difficult to break.

> This, obviously, is not a cryptographically strong PRNG.

Why is this obvious?

The generator used is L'Ecuyer's MRG32k3a : http://www.iro.umontreal.ca/~lecuyer/myftp/papers/streams00s...

Is there a known weakness?

It's not really a question of known weaknesses because it's not designed to be secure in the first place; it only performs a single round. Its purpose is to be fast and to be "random enough" for scientific applications.

Nicely done, and excellent explanation. Kudos.

This reminds me of the exploits I used to read in phrack back when I was still in college. There should be a book that collects great hacks like this one.

I love smart people.

(plus having the ethics to actually do no evil - great combination)

I wonder if ethics is just economic thinking with a low tolerance for certain types of risk, or whether it's a real conviction/prime directive.

Off topic yes, I cruised through his profile to see what else he was involved with or does and found this small piece he wrote that I found to quite good. I would recommend checking it out.


I enjoyed that essay, although he never cleared up my confusion as to what he meant by libertarian. I generally use it as a synonym for anarchist, but he sounds (and I could be wrong) more like a right libertarian. Is this right?

You Spanish? In US English at the very least libertarian maps to anarcho-capitalist/classical relatively closely. Anarchist is almost always a synonym for anarcho-syndicalist.

American. Are you saying Daniel Franke is an anarcho-capitalist?

I am.

Great work and writeup. I think my favorite part of the whole thing is that you clearly started with the goal (or somewhere early on decided that was the goal) of finding a bug and kept going until you'd landed it.

That's the difference between a true hacker at heart and someone who just stumbles across something. Tenacity.

This is pretty interesting stuff, but man it would have been nice if you were able to post it somewhere with actual HTML formatting. The gray on gray is tough after a full screen or two.

Thanks for posting that. I'll keep it around.

he's DoSing our retinas.

Gray on gray is awful. Does only one person like it? :-) Here's a fix for Firefox that doesn't need the overhead of Greasemonkey.


Thanks. I now use the following and my HN headaches are gone:

  @-moz-document url-prefix(http://news.ycombinator.com/) {
     td { color: #000000 !important; }
     td.title a:visited { text-decoration: line-through !important; }
     td.subtext { padding-bottom:2em !important; }

Ctrl-A, Ctrl-C, Alt-Tab, Ctrl-V

Yeah, that's what I do right now. I'm just saying after a certain point, a post becomes a blog posting.

Wow. I just realized you were the same person I sat next to in CIS 3020 at UF - what a terrible waste of a class. You switched my keyboard to Dvorak :(

Nice to see you're doing well, I recently joined a startup myself.

>"Nonetheless, online brute force would still be manageable. If each page view produces an average of 50 candidates, and one in every thousand page views is a login (this might be slightly optimistic), that's 50,000 attempts necessary in order to find a working login. HN gets about 500,000 hits on a busy day, so this could be done in a day or two while likely staying under the radar."

You would have a fnid that is in the cookie hash table, yet you still would not know to which username it is mapped to, correct?

I was looking for login cookies, not fnids. Fnids are worthless since there's already code that checks that the user who called the closure is the same as the user for whom it was created.

My mistake. So you can get a valid login, but you can't know whom you'll be login in as, that is without doing some social engineering like with the irc example. Impressive hack.

i think if you get a valid login cookie, and use it, it will tell you what account you have in the top right.

Well obviously, but you couldn't do a brute force attack like this to a specific account.

I believe the implication is that you'd have a sessionid. Effectively, the username and password rolled into one unique number, stored in the cookie.

I think by 'specific account' he means 'chosen account', in which case he'd be correct without more targeted social engineering.

Nice and thorough explanation but most of all you seemed to handle this gracefully. Well done.

Aside from being a good hack, I really enjoyed reading your write-up!

Neat. This reminds of the attack on the SSL stack in an early version of Netscape.

Who says open source code is less secure?

Isn't it cool that with free software your users can have the rush of the hack and demonstrate the fix without having to screw your production server?

I can't believe what I'm reading: You are superman. You are my hero. I want to be your friend. I WANT TO HIRE YOU!

Let me tell you something. If this poor guy gets hired by one of you, he'll probably end up working with a retarded person and reading cookies for the rest of his life.

To the hacker: Keep having fun mate, but don't allow your time to be cluttered up by things that don't make a difference, like a cookie.


This is the most detailed and well written write-up of a complex hack I've ever read. I was an absolute joy to read.

Thanks for sharing it with everyone!

Wow I must say I am impressed Kudos fine sir!

If only "What is the story with the most points?" had been asked a few weeks later than it was:


Previous record holder with 530 points:


awesome! :)

Really? Downmoded because of the comment "awesome! :)"? Pretty harsh.

Brilliant sir! I'm your slave.

sudo make me a sandwich.


what a job, what a feedback, what a community!!!! no awesomenes but more warm warm warm!


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact