
Race conditions on Facebook, DigitalOcean and others (fixed) - franjkovic
http://josipfranjkovic.blogspot.com/2015/04/race-conditions-on-facebook.html
======
ejcx
I actually fixed the issue that was reported to LastPass.

I could be mistaken, but I believe he reported the security issue through our
regular support channel (instead of our security channel), which is why it
took three days to see. From the time I saw it, I fixed it, with the patch
going live within an hour or two.

When I DID see it and tried it myself, with a quick shell script that curled
and backgrounded the same request a bunch of times, I just kind of chuckled.
It was a good bug. Josip is top notch.

~~~
franjkovic
Thanks! I reported the bug to security@ email, and one of your team's members
replied on the same day (January 6th). Either way, good job on fixing this
really fast. I wish more teams were as responsive as yours.

~~~
ejcx
Oh okay I was mistaken then.

I believe the race condition is on the rise in terms of severity and
importance. Developers are aware of common OWASP bugs, but this type of race
condition is often overlooked. Developers are going to NEED to be just as
aware of it. Way to go.

------
MichaelGG
We should see lots more of these if people embrace eventual consistency
instead of "slow" ACID transactions. And interestingly, the larger a system's
scale, the more likely it is that globally consistent operations are too
expensive to enable in general, and developers will overlook cases where they
must implement some locking or double checking.

~~~
pyvpx
when did eventual consistency equate to race conditions, or even increased
susceptibility to race conditions? I don't follow. could you explain your
reasoning further?

~~~
MichaelGG
It's probably just an ease-of-use question. The more guarantees your database
can deliver, the easier it is to reason about things and make sure you aren't
being caught on a gotcha.

It's not necessarily different than using a normal RDBMS, right - you could do
a check in SQL outside a transaction and end up writing multiple times. But
with an RDBMS, you can easily solve the situation by turning on a transaction
and leaving no question about things.
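
For illustration, here is roughly what "turning on a transaction" buys you, sketched with SQLite in Python (the promo table and redeem function are invented for this example; a real multi-user RDBMS would typically use SELECT ... FOR UPDATE or similar):

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; we manage txns
conn.execute("CREATE TABLE promo (code TEXT PRIMARY KEY, uses_left INTEGER)")
conn.execute("INSERT INTO promo VALUES ('SAVE10', 1)")

def redeem(code):
    conn.execute("BEGIN IMMEDIATE")  # take the write lock BEFORE the check
    row = conn.execute(
        "SELECT uses_left FROM promo WHERE code = ?", (code,)).fetchone()
    if row and row[0] > 0:
        conn.execute(
            "UPDATE promo SET uses_left = uses_left - 1 WHERE code = ?", (code,))
        conn.execute("COMMIT")
        return True
    conn.execute("ROLLBACK")
    return False

print(redeem("SAVE10"))  # True: first redemption succeeds
print(redeem("SAVE10"))  # False: the counter is already exhausted
```

Because the lock is taken before the check, a concurrent redeem has to wait and then sees the decremented counter, so the check-then-use pair behaves atomically.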

This is why things like VoltDB ("NewSQL") are pushing to keep SQL and ACID,
and figure out a way to scale, instead of throwing it all aside and making the
developer deal with consistency issues.

It's not that you can't end up with the same functionality using eventual
consistency, just that it's harder. Just look at Amazon's "apology based
computing" (I think that was the name) and how they structure their systems to
be resilient by being able to resolve multiple conflicting commands in a
proper way (deciding, without communication, which command wins, figuring out
rollbacks, etc.) It's fantastic, and perhaps it's the only feasible way to
operate at their scale. But it's also a hell of a lot more complicated than
"UseTransaction = true".

(So my predictions/guesses: If developers that'd otherwise use a traditional
ACID RDBMS switch to non-ACID (BASE?) systems, they'll end up introducing bugs
due to the shifted responsibility of handling data consistency. And seeing how
big servers are, and even how far sharding can take you with normal RDBMS, the
scale at which people "need" to drop ACID is probably far higher than the
point at which people are dropping it.)

~~~
potatosareok
I've always wondered (but apparently not enough to figure it out by reading
the Spanner whitepaper): how do these systems typically handle it?

I guess if you were using an append-only log that recorded the exact timestamp
of the transaction, your datastore would eventually reconcile that, for
example, promo code 1 was applied twice. But what do you do then? Roll back
the 2nd application of the promo code and deduct the credit from the user
account?

Where would the logic for that be programmed?

------
janoelze
appreciating the joke (?) in the comments.
[https://i.imgur.com/zWE5ABQ.png](https://i.imgur.com/zWE5ABQ.png)

------
unclesaamm
Wow, it seems like there is room here for a 3rd party vendor to implement
promo code handling as a service, and to do it right once and for all.

------
d_luaz
No bounty for bug report? Should at least have a nominal fee of $100 (else no
one would bother to report it).

~~~
reagan83
The economics of bug bounty programs could lead to misaligned incentives.
Because the overhead cost of validating and communicating around bug reports
isn't zero, and submitting is free, the share of non-bug reports can grow out
of proportion.

In most systems the reward is zero, so you can infer that if a person has
taken the time to submit a bug report, it is because he/she is invested in
seeing it fixed.

Context: I work at a decent sized company in SV on this type of problem.

~~~
nmjohn
So when I find a bug in, say, PayPal which allows complete account takeover,
and could either sell it to an organized hacker group for say $100,000 or
report it to PayPal "because I'm invested in seeing it fixed" and receive
nothing - that is an easy decision only for the whitest of white hat hackers.

Properly designed bug bounty programs are a cornerstone for any company that
remotely cares about the security of its product, period.

The idea that misaligned incentives arise because poor bug reports are free
to submit is ignorant - and, worse, toxic, because it sounds so true to an
executive who has no actual understanding of the issue.

A quality bug report should take no more than 1 minute for a reviewer to look
at and know if it's really a bug or not. If it can't be, it should be
rejected with a request for clearer details. For example, a DOM-based XSS
attack could be reported with just a target URL, and it is quite clear what
the problem is. That would take 10 seconds to analyze.

Additionally, most bugs reported to most decent-sized companies are reported
by someone who has previously reported a bug to the company before. If
someone is constantly reporting good bugs, or the opposite, it's quite easy
to prioritize which of those individuals gets their emails read first.

~~~
dsacco
Hi, I'm a security engineer. I work at one of the largest private infosec
firms and I've done research on bug bounties that Google, Facebook, Twitter,
CERT and 50+ other companies participated in.

Now that I've sufficiently named my experience, allow me to give my side:

1\. You will never receive $100,000 for selling a vulnerability in PayPal. You
probably couldn't even find a buyer for it on the "black market." I have
explained why repeatedly on Hacker News before, so I'm just going to link
this: [http://breakingbits.net/2015/04/01/understanding-vulnerability-half-life/](http://breakingbits.net/2015/04/01/understanding-vulnerability-half-life/)

2\. Bug bounties are not always a net positive for an organization. They are
also not a cornerstone of good security posture. A foundational focus on
robust software security would start with various other things, at least
until the financials are worked out and there is someone knowledgeable to
read incoming reports.

Only 7% of reports submitted to companies through a responsible disclosure
program are valid. This is especially true for paid programs, where the
validity percentage often drops to 3% or 4%. Loads of people who know nothing
about software security try to find bugs, desperate for the gold rush of
bounties they see headlining places like HN. They submit spurious reports and
as a result the signal to noise ratio of responsible disclosure is
fantastically bad. What this means practically is that the average
organization spends between 50 and 300 hours a month investigating incoming
security reports.

You can quickly see how the cost adds up here. I'm not an executive or manager
trying to cut costs - I've managed bug bounties for plenty of startups and
Fortune 500 companies. I've also reported bugs that loads of people tell me I
could have sold for "millions" of dollars - and received nothing for it.

I love bug bounties. I run them, I participate in them. But they can be a
frivolous waste of time for development teams without a solid enough grasp of
security to review incoming reports, and a waste of money in the worst case.

3\. I'm sorry, but you lose credibility by claiming most security reports can
be qualified in a minute or less. You can certainly _throw out_ many in a
similar time frame, perhaps five minutes, but real vulnerabilities? No.

If I report a server-side request forgery in your API that requires a very
specific set of actions to occur in an obscure, undocumented application
function, you will not qualify this quickly. Unless you are literally
verifying a CSRF issue, it is completely unrealistic to assume this.

A race condition will not be qualified in a minute. A buffer overflow will not
be qualified in a minute. Budget an hour per report, and be happy when you
come across the reports that take you a few minutes. XSS and CSRF are
comparatively simple to verify with a _good report_ , yes, but most other
classes aren't.

Let's add to this the folks who can find great vulns but write bad reports. No
exploit code, but he found something real? Good luck verifying. I spoke to a
fellow infosec engineer the other day and he told me he spent an entire
morning out of his work day verifying a report that came in. Not patching or
even triaging, mind you - verifying. Most security teams do not have the
Olympic-level efficiency and skillset diversity that Google's and Facebook's
do - it is unreasonable to assume a report can or even should be verified
quickly.

This is all to say that I believe your outlook is not consistent with reality,
with all due respect. Bug bounties are not a simple decision to make. I've
seen development teams swamped, overwhelmed and jaded from the reports they
receive.

~~~
nmjohn
> 3\. I'm sorry, but you lose credibility by claiming most security reports
> can be qualified in a minute or less. You can certainly throw out many in a
> similar time frame, perhaps five minutes, but real vulnerabilities? No.

That was exactly what I was trying to convey: real vulnerabilities can easily
be separated from non-issues quite quickly, because the latter mostly entail
things which can be checked in a matter of minutes.

> You will never receive $100,000 for selling a vulnerability in PayPal. You
> probably couldn't even find a buyer for it on the "black market."

$100,000 may have been slightly inflated, but with a bit of creativity it
isn't that hard to believe. Orchestrated correctly one could walk away with a
few million dollars from exploiting such a vulnerability.

> This is all to say that I believe your outlook is not consistent with
> reality, with all due respect. Bug bounties are not a simple decision to
> make. I've seen development teams swamped, overwhelmed and jaded from the
> reports they receive.

Or perhaps you and I simply have experienced different realities - as I have
not seen development teams swamped by them, and have seen major security
improvements come about as a direct result of a bug bounty program.

Of the perhaps 15-20 companies (albeit all < $1b market cap) I've
spoken/worked with in regards to bug bounty programs or security in general -
none of them were receiving more than a handful of reports a week which took
up perhaps 2 hours of an engineer's time.

~~~
MichaelGG
>That was exactly what I was trying to convey: real vulnerabilities can easily
be separated from non-issues quite quickly because the latter mostly entail
things which can be checked in a matter of minutes.

What about the non-issues that are reported with complicated conditions but
don't actually work? Just because you can throw out the obviously bad items
doesn't mean the rest are real.

>Orchestrated correctly one could walk away with a few million dollars from
exploiting such a vulnerability.

Exploiting it is rather different from selling it, though, right? And since a
vuln in a website can literally be closed immediately, and PayPal's got whole
divisions dedicated to preventing and undoing the damage you can do even with
"account takeover", it'd be rather much a risk to pay someone cash for a
vulnerability. At the first slip, the value drops to $0. Plus all the issues
of verifying the bug and establishing trust for both parties. Seems rather
difficult.

~~~
nmjohn
> What about the non-issues that are reported with complicated conditions but
> don't actually work? Just because you can throw out the obviously bad items
> doesn't mean the rest are real.

Yes. There will be some which don't fit into the overly simplistic categories
I provided.

However, in my experience, reports with complicated conditions which turn out
not to be actual bugs are rare enough that they aren't relevant to the
discussion.

> Exploiting it is rather different from selling it, though, right? And since
> a vuln in a website can literally be closed immediately, and PayPal's got
> whole divisions dedicated to preventing and undoing the damage you can do
> even with "account takeover", it'd be rather much a risk to pay someone cash
> for a vulnerability. At the first slip, the value drops to $0. Plus all the
> issues of verifying the bug and establishing trust for both parties. Seems
> rather difficult.

You are hung up on what was an arbitrary example.

My point simply is that if the reward for serious vulnerabilities is orders
of magnitude higher when the researcher chooses the black hat instead of the
white one, the overall result is a huge net negative for the world.

~~~
MichaelGG
The arbitrary example is a good one though, because it nicely illustrates why
a bug in a website just isn't worth a whole lot to sell. No matter what the
issue, from a PayPal account issue to a Facebook privacy bypass, the ops teams
are monitoring for this kinda thing and will shut it down quick.

Do you have first hand knowledge of selling such an exploit?

------
Kiro
I'm a novice but would like to know how these issues can arise. What kind of
backend setup is needed for it to be a problem? What is happening when a race
condition occurs in these examples?

~~~
spdy
Its actually quite simple as example for the promo code the code looks like
this:

1\. Code sent.

2\. Check if valid.

3\. Redeem code.

4\. Mark the code invalid.

Now if I send 10 requests at the same time with the same code, maybe 4-6 will
hit the code path after step 2.

And your window of opportunity is the time it takes to go from step 3 to
step 4. Sometimes certain tasks are put inside an async queue, you have a
slight delay to your database server, or you need to wait for DB replication
to kick in.

This works because normally there is no code path that rechecks how often
this code was used.
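
The flow above can be sketched in Python, using threads and an artificial delay to stand in for the DB/queue latency (all names here are invented for illustration):

```python
import threading
import time

redeemed = set()        # codes already used -- note: no locking (that is the bug)
credits = {"user": 0}

def redeem(code):
    if code not in redeemed:       # step 2: check if the code is valid
        time.sleep(0.05)           # simulated latency between check and use
        credits["user"] += 10      # step 3: redeem the code
        redeemed.add(code)         # step 4: only now is the code marked invalid

threads = [threading.Thread(target=redeem, args=("PROMO10",)) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(credits["user"])  # intended: 10; with the race, typically far more
```

Every request that passes the check before the first one reaches step 4 redeems the code again; the wider the check-to-use window, the more requests slip through.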

~~~
codenut
Can this issue be prevented if we use the promo code as the table primary key
or document ID?

~~~
jaysh
That won't be enough because the promo codes are shared amongst many users. If
the promo code became the primary key, then only one user would be able to
redeem it.

If you introduced some combination of a user ID and promo code, then it won't
prevent a race of one user firing many queries with different promo codes and
stacking them up. It would, however, fix the original problem.

~~~
some1else
A simple Discount domain model with validations:

    
    
      class Discount
        belongs_to :promo_code
        belongs_to :customer
        belongs_to :order
    
        validates_presence_of :promo_code, :customer, :order
        validates_associated :promo_code
        validates_uniqueness_of :promo_code_id, :scope => [:customer_id, :order_id]
      end
    

Limiting to a single promo code per order:

    
    
      class Discount
        # ...
        validates_uniqueness_of :order_id, :scope => :customer_id
      end

~~~
gavingmiller
This right here is the heart of race condition bugs, and it is NOT
race-condition safe. "validates_uniqueness_of" is an application-level check:
when running multiple web servers, without a matching unique constraint on
your database, multiple requests hitting different web servers can each claim
a discount for the same user. The problem only grows as your number of web
servers grows!

------
emmab
It would be cool if there was a browser addon that let you submit a form N
times in parallel.

~~~
ejcx
I do a lot of App Sec related things and I actually use mostly Chrome dev
tools and command line instead of burp and other tools. The way I reproduced
the bug when it was reported was by using the "Copy to curl" feature in
Chrome, and then using it as follows

    
    
        for i in `seq 1 16`; do
            curl ... &    # curl command copied from Chrome dev tools; & to background it
        done

~~~
bburky
Also, curl gained a --next command line option somewhat recently. It lets you
send off multiple requests in the same curl invocation. These requests will
all be pipelined in the same HTTP connection, which might trigger slightly
different behavior in the website.

I have considered writing a program that would let me send off a bunch of
HTTP requests at once, but wait to close all the connections at the exact
same time. That would probably be the most effective way to trigger race
conditions.
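
The "release everything at the same instant" part of that idea can be sketched with a thread barrier; the actual request setup and teardown are left as placeholder comments, so this only shows the synchronization:

```python
import threading
import time

N = 10
barrier = threading.Barrier(N)   # releases only once all N threads arrive
fire_times = []

def worker():
    # ...open the connection and send all but the final bytes here...
    barrier.wait()                       # block until every thread is ready
    fire_times.append(time.monotonic())  # all threads pass this point together
    # ...send the final bytes / close the connection here...

threads = [threading.Thread(target=worker) for _ in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(max(fire_times) - min(fire_times))  # the spread is typically tiny
```

The slow parts (DNS, TCP/TLS setup, writing the request body) happen before the barrier, so only the final trigger is racing, which maximizes the chance of landing inside the check-to-use window.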

------
andersonmvd
More interesting than the bounty itself is understanding which defense works
best at scale and the nitty-gritty details of those kinds of attacks.
Intuitively I think we just need to avoid inconsistencies between the Time of
Check (TOC) and Time of Use (TOU), so verifying the existence of a discount
coupon while inserting it, in one query (INSERT INTO coupons (...) VALUES
(...) WHERE NOT EXISTS (SELECT 1 FROM coupons WHERE (...))), should do the
trick, instead of increasing the time between the TOC and TOU with, e.g., one
query to check if the coupon exists and a second one to insert it. Besides
that, I am wondering if I am missing something: is this really a problem
limited to the application layer, or are databases unable to prevent such
attacks? I think I am right regarding the app protection, but let's see what
people have to say :)

~~~
pilif
In many databases, your suggested "where not exists" subquery might not
actually protect you but just make the window to hit the race much smaller.
What can happen is that your database evaluates the subquery and the rest of
the WHERE, another transaction commits in between, and then it finally runs
the INSERT part of your query.

There are no guarantees in the SQL standard that queries with subqueries
should be atomic.

The only truly safe way to protect yourself is to fix the schema in a way that
you can make use of unique indexes. Those are guaranteed to be unique no
matter what.

~~~
MichaelGG
>only truly safe way

Or put the whole thing in a transaction, right?

~~~
pilif
Not if you don't have a unique index, unless you put the transaction in a
different mode than the default, which often is "READ COMMITTED".

You could put the transaction in SERIALIZABLE mode, but that would mean that
your database has a lot of additional locking to do which you might or might
not want to pay the price for:

Your two-part query now blocks all other transactions from writing to the
table(!) and conversely also has to wait until everybody else has finished
their write operations.

Doing an opportunistic attempt with READ COMMITTED and reacting to the unique
index violation (official SQLSTATE 23505) is probably the better option.

Resist the temptation of READ UNCOMMITTED in this case, because that might
lead to false positives, as competing transactions might yet be aborted in
the future.
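
That opportunistic pattern can be sketched with SQLite in Python for illustration (SQLite surfaces the unique-index violation as IntegrityError rather than SQLSTATE 23505; the table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE discounts (
        promo_code_id INTEGER,
        customer_id   INTEGER,
        order_id      INTEGER,
        UNIQUE (promo_code_id, customer_id, order_id)  -- enforced by the DB, not the app
    )
""")

def claim_discount(promo_code_id, customer_id, order_id):
    """Attempt the INSERT and react to a unique-index violation."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO discounts VALUES (?, ?, ?)",
                (promo_code_id, customer_id, order_id))
        return True
    except sqlite3.IntegrityError:  # a concurrent request already claimed it
        return False

print(claim_discount(1, 42, 7))  # True: first claim wins
print(claim_discount(1, 42, 7))  # False: duplicate rejected by the index
```

No matter how many requests race, the index guarantees at most one row; every loser gets a clean error to react to instead of silently double-applying the discount.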

------
inportb
So the review bug was a security issue but the username bug wasn't? I wonder
what else the review bug affected.

~~~
franjkovic
I think they did not reward me because you cannot really hurt anyone by having
multiple usernames.

~~~
joshschreuder
What about squatting on valuable ones? But probably not a big deal unless it
relates to Pages.

------
georgerobinson
Can anyone comment on how the author flooded HTTP requests to the endpoint
URLs? Did he use developer tools in his browser and execute his own
JavaScript, or use cURL in a tight loop with the cookie and CSRF token from
his browser session?

~~~
gislifb
Without knowing exactly how he did it I assume this is possible by doing a
POST with cURL inside a loop or with parallel.

You can then get the exact request by using Chrome developer-tools. (Find the
POST-request in the network-tab, right-click and select copy as cURL)

------
Rafert
I reported the same issue to DigitalOcean (security) in November 2014, and
they told me I was using the wrong address and that they had forwarded it to
the proper team. I triggered it by accident, using the same GitHub code
twice, and I (or the DO staffer) didn't realize it was a race condition. I
never heard back, but they let me keep the balance :)

------
numair
I would be really interested to know how various forms of this bug are
resolved. This is a problem that, on its surface, looks easy to fix, but
isn't - especially if you've designed your architecture for real-time-ness
and global redundancy. Google's servers with atomic clocks come to mind...

~~~
ekimekim
cynical answer: I've seen a lot of races get "fixed" by adding a sleep() or
similar

less cynical answer: Commonly you already have some kind of means to handle
races - locking, transactions, some other variety of extra check - and the fix
for newly discovered races is "oh, I didn't realise that could happen. _add
lock_ "

~~~
hobarrera
If you get three requests in at the same time and sleep all three for N (say,
400) milliseconds, they'll all still run concurrently.

Adding a random sleep time might work, but some requests would run noticeably
slower.

~~~
MichaelGG
Unless the code is doing read-write-read. If you're using a system that
doesn't reflect writes immediately (like Elasticsearch), waiting after the
writes can give the system time to flush and make the other writes visible,
and then you can execute rollback logic.

It'd be much better to make sure you're updating the same unique key and/or
use the DB's conflict resolution system.

------
jbkkd
Now that race condition bugs have been widely exposed, I have a feeling we'll
start seeing more of these "attacks" in the near future. They are relatively
easy to execute and don't raise much suspicion.

------
tomcam
Now please fix race conditions everywhere else, like Baltimore.

------
yesmade
$3k for the facebook review bug. that's a little bit too much

\- update

thanks for the downvotes guys. keep up the good work

~~~
franjkovic
The bounty actually surprised me, too. I expected between $1000-$2000. That
is one of the reasons I like reporting bugs to Facebook - they pay really
well, and critical bugs are fixed really fast (<1 day).

One time they paid me $5000 for a bug I never could have found, but they did
internally based on my low severity report.
([http://josipfranjkovic.blogspot.com/2013/11/facebook-bug-bounty-secondary-damage.html](http://josipfranjkovic.blogspot.com/2013/11/facebook-bug-bounty-secondary-damage.html))

~~~
mwsherman
It’s impressive that they are able to fix them so quickly – one has to
imagine they get a non-trivial number of reports, and that a majority of them
are junk. They must have a good triage + repro + escalation system.

~~~
franjkovic
Facebook puts out stats from their bug bounty program once a year. Most of
the reports are invalid - in 2013 they had 14,763 reports, with only 687
being valid.

([https://www.fb.com/818902394790655](https://www.fb.com/818902394790655))

They probably have a couple of people working exclusively on bug bounty
reports. I also have to say they did a great job changing communication
channels from emails to tickets which show up in /support/; it is way easier
now. The downside is that you must have a Facebook account - I'm not sure if
that was needed before the change.

