
Google Bug Bounty – The $5k Error Page - Artemis2
https://slashcrypto.org/2017/05/17/5k_Error_Page/
======
ChuckMcM
Nice catch. A long time ago, the backend services could be killed by a
special URL, and someone found it; it wasn't filtered by the front end. Of
course they tried to use it, but the request never returns (since it kills
the service), and their client retried ... there was a lot of "what the heck
is happening" going on until SRE figured it out, immediately patched the
front end, and the anomalies stopped. It's too bad the person who caused it
didn't file for a bug bounty like this person did; they'd probably have had
something to show for their efforts besides "hey, look at this funny thing
you can do - oh wait, it doesn't do it any more."

~~~
carvalho
On Quora someone asked what the longest search query time was. I was able to
craft a query that took multiple seconds to complete. It used wildcards and
undocumented iteration, allowing one to stuff thousands of queries into a
single query. Turns out it is someone's job to measure result response times,
and he/she came into the thread to kindly ask us to stop messing up their
statistics.

~~~
logicallee
It's hard to believe, but years ago, back when Google had what were called
"stop words" (words like 'the' that it ordinarily ignored), I was able to
make Google perform a search that took over 30 seconds.

The reason stop words take such a long time is that millions of sites have
words like "the" on them, so doing a join on all those simply takes a long
time.

My method for finding a long string consisting entirely of stop words was to
download a Project Gutenberg copy of the complete works of Shakespeare, find
the longest string of consecutive stop words in it, and then search for that
string as a literal quote.

The longest one I found was: "From what it is to a".

Let me see how long Google takes to do it now :)

2.04 seconds! Nice :) -
[http://i.imgur.com/IhPTpr6.png](http://i.imgur.com/IhPTpr6.png)

That's the same query that took 30+ seconds 'back in the day'.
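
For the curious, a minimal sketch of the method: scan the text for the
longest run of consecutive stop words, using a small hypothetical stop-word
list (Google never published its actual one):

```python
import re

# Hypothetical stop-word list; real engines used lists along these lines.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it",
              "what", "from", "for", "on", "as", "at", "by", "be", "or"}

def longest_stopword_run(text):
    """Return the longest run of consecutive stop words in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    best, current = [], []
    for word in words:
        if word in STOP_WORDS:
            current.append(word)
            if len(current) > len(best):
                best = current[:]
        else:
            current = []
    return " ".join(best)

# Feed it e.g. the Project Gutenberg complete works of Shakespeare:
# text = open("shakespeare.txt").read()
text = "From what it is to a thick-lipped slave"
print(longest_stopword_run(text))  # from what it is to a
```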

~~~
sova
Is it really technically correct to say that Google was performing web-wide
joins on data? Isn't it all about clever indexing?

~~~
logicallee
There's nothing to index. How could it have found my Shakespeare quote via an
index? The query consisted entirely of the words 'from what it is to a', yet
it produced only the Shakespeare quote. I don't see how it could have indexed
anything... it
must have done a join. (Which makes sense given the 30+ seconds I had to sit
and wait before it returned its answer, while also reporting the time it took
to produce it. What else could it have been doing?)

By the way I believe I wanted to know whether it would return the Shakespeare
quote at all.

If you mean that it might have cached the results of the query, I doubt anyone
else queried that exact phrase, other than me.

~~~
andrewstuart2
Google does index pages, in the database sense. An index in the database sense
is nothing more than reorganizing data (or subsets of data) into structures
optimized for searching and seeking, rather than full scans.

I'm guessing you're most familiar with btree indexes, as present and default
in many SQL solutions, which are good for quickly answering exact and
greater/less-than matches. There are dozens of data structures useful for
indexing, some of which are built to index full-text documents. For an
example, check out the GIN and GiST indexes in Postgres [1].

It's my understanding that database indexing and index compression were a
primary differentiator Google excelled at from the beginning. They could beat
others at fractions of the typical cost because they didn't need as much
hardware to store and query huge quantities of documents.

Seriously, there's no way even Google could intersect the sets of all crawled
web documents containing those individual words in 30 seconds, much less two
seconds.

[1] [https://www.postgresql.org/docs/current/static/textsearch-indexes.html](https://www.postgresql.org/docs/current/static/textsearch-indexes.html)

~~~
logicallee
>Seriously, there's no way even Google could intersect the sets of all crawled
web documents containing those individual words in 30 seconds, much less two
seconds.

I believe you're mistaken. What I've heard is that for every word, Google has
a list of every web site that contains that word - they've flipped the
database. So, if you search (without quotes) for neanderthal violet narwhal
obsequious tandem - I just did this query, which took 0.56 seconds, but
Google decided to drop some of the words so it could get me results. When I
forced all of them with plus signs, making my query +neanderthal +violet
+narwhal +obsequious +tandem, it took 0.7 seconds to determine that in the
entirety of the Internet there is not a single document that has those 5
words on it.

How do you think it determines in 700 ms that none of the sites it has
indexed, anywhere on the Internet, contains those 5 words?

The answer is that it has a rather short list of sites that contain the word
narwhal, which it then intersects with the somewhat larger list of sites that
contain obsequious, and so on. 700 ms is plenty fast when you take that
approach.

So this explains why joining stop words (whose lists contain billions of
pages each) takes so _very_ long.

Using stop words, it is easy to make queries that take one or two seconds
each.
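
To make the cost model concrete, here is a toy sketch of an inverted index
with sorted posting lists (not Google's actual implementation; the corpus,
names, and merge strategy are illustrative only). Intersection cost grows
with posting-list length, which is why rare words are cheap and stop words
are expensive:

```python
# Toy inverted index: maps each word to a sorted list of document ids.
docs = {
    1: "the narwhal is an obsequious whale",
    2: "a tandem bicycle built for two",
    3: "the violet is a flower",
}

index = {}
for doc_id, text in docs.items():
    for word in set(text.split()):
        index.setdefault(word, []).append(doc_id)

def intersect(a, b):
    """Merge-intersect two sorted posting lists in O(len(a) + len(b))."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def query(*words):
    # Intersect the shortest lists first to keep intermediate results small.
    lists = sorted((index.get(w, []) for w in words), key=len)
    result = lists[0]
    for postings in lists[1:]:
        result = intersect(result, postings)
    return result

print(query("narwhal", "obsequious"))  # [1] - short lists, little work
print(query("the", "a"))               # [3] - stop words mean long lists,
                                       # so each merge step does more work
```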

------
idonotknowwhy
I found a bug in Wickr where I could re-read "deleted" messages. I submitted
it and answered their team's questions about reproducing it. A couple of
weeks later, they said they couldn't fix it and didn't pay me :(

I got all my Wickr contacts to switch to Signal, which is much less buggy...

~~~
developer2
That bug is extremely common, and the source is always the use of soft-deletes
in the database. When you view the _list of items_ (ex: inbox), the database
query includes a "WHERE deleted = false" to exclude rows which have been soft-
deleted. When viewing a _single item_ (ex: message) the URL contains a unique
identifier, whether an auto-increment integer, UID, etc. The query used to
load one item is "WHERE id = :id" instead of the correct "WHERE id = :id AND
deleted = false".

Managing soft-deletes on a database table requires an attention to detail, on
_every single query_ that ever touches that table, that many developers lack
the discipline to sustain. Discipline aside, it's difficult for every
developer on a team to remember which tables use soft-delete and when
checking that flag is or is not necessary. Finally, ORM abstractions often
automate soft-delete in a way that makes it exhausting for developers to
validate every query. I've seen this bug over and over again at every company
I've worked for. It happens so often it's impossible to keep count.
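
A minimal sketch of the failure mode described above, using SQLite and a
hypothetical messages table (names are illustrative, not Wickr's schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages ("
             "id INTEGER PRIMARY KEY, body TEXT, deleted INTEGER DEFAULT 0)")
conn.execute("INSERT INTO messages (body) VALUES ('hello'), ('secret')")
conn.execute("UPDATE messages SET deleted = 1 WHERE id = 2")  # soft-delete

# The listing query remembers the flag, so the deleted message is hidden...
print(conn.execute(
    "SELECT id, body FROM messages WHERE deleted = 0").fetchall())
# [(1, 'hello')]

# ...but the single-item query forgets it, so a direct id still works.
print(conn.execute(
    "SELECT body FROM messages WHERE id = ?", (2,)).fetchone())
# ('secret',) - the "deleted" message is re-readable
```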

~~~
elmigranto
> Managing soft-deletes on a database table requires an attention to detail.

> Discipline aside, it's difficult for every developer on a team to remember
> which tables use soft-delete, and when checking that flag is or is not
> necessary.

That's the case where, instead of "try harder not to make mistakes", you
design the system so it is not possible to make them. One way would be to
rename the original table `raw_messages` and `create view messages as select
* from raw_messages where not deleted`.
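
A standalone sketch of that view-based fix, with the same hypothetical
schema; once the filter lives in the view, no query can forget it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_messages ("
             "id INTEGER PRIMARY KEY, body TEXT, deleted INTEGER DEFAULT 0)")
conn.execute("INSERT INTO raw_messages (body) VALUES ('hello'), ('secret')")
conn.execute("UPDATE raw_messages SET deleted = 1 WHERE id = 2")

# The view bakes the soft-delete filter in; application code only ever
# queries `messages`, never the raw table.
conn.execute(
    "CREATE VIEW messages AS SELECT * FROM raw_messages WHERE NOT deleted")

print(conn.execute(
    "SELECT body FROM messages WHERE id = ?", (2,)).fetchone())
# None - the soft-deleted row is invisible even to single-item lookups
```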

~~~
endgame
One of the problems with ORMs is that because they let people forget about
the annoying details of their databases, they also make them forget the
useful details of their databases.

~~~
developer2
Damn you! I wrote a bloody _essay_ in a reply[1] to explain, in superfluous
detail, what you summarized in one sentence. Anyone with basic knowledge of
the topic would know what you mean. I need to figure out this magic people
like you possess. I'm tired of rambling when nobody will read it. Thank you
for the incentive to improve.

[1]
[https://news.ycombinator.com/item?id=14374031](https://news.ycombinator.com/item?id=14374031)

~~~
endgame
That's the kindest spontaneous compliment I've received in a while. Thank you.
But: while the pithy comment might farm more imaginary internet points, the
essay may actually teach a lesson to the person who doesn't yet get it.

As for writing: it's not magic, but for me it's not consciously applied
processes either. If I had to guess how my earlier comment came about, I'd
suggest something like this as a generative process:

1. Find two effects with a common cause (provided upthread).

2. State each effect, sharing words and rhythm to bring out contrast.

3. Omit needless words. (Thanks, Strunk/White!)

HTH.

------
ArlenBales
> _10/02/2017 – Google already fixed the issue but forgot to tell me … I
> contacted them asking for an update_

> _19/02/2017 – Got a response, they implemented a short-term fix and forgot
> to send my report to the VRP panel …_

I hope Google forgetting to follow up on bug bounties and needing to be
reminded isn't a common occurrence.

~~~
jcims
Having worked on a large bounty program myself, and having at least one thing
blow up because I dropped the ball on a response, I'll just say that the
front-end aspect of it can be extremely chaotic. This guy seems like he's
pretty polite and patient, which you generally try to reward with a rapid
response and high touch, but sometimes you can get overwhelmed with a burst of
reports, distracted by problematic reporters and bogged down by working the
bug through the pipeline.

There are systems and processes to help with all of this of course, but at the
end of the day it's still a pretty tricky job to get perfect all the time.

~~~
daddyo
What's the general signal to noise ratio for bug reports?

~~~
dsacco
About 10:1 noise:signal.

This comes from a variety of experiences: I used to manage a bug bounty for a
mid-size company on Bugcrowd; in 2014 I surveyed people managing a bunch of
programs across different sizes; I've participated in bug bounty programs for
companies of different sizes.

The more you offer for rewards and the more recognizable your company name,
the more you will be spammed by people submitting reports like (I kid you
not): "You have the OPTIONS method allowed on your site this is really
serious." The last time I looked at the numbers, Google had over 80,000 bug
bounty reports per year, with about 10% of them being valid and maybe another
order of magnitude being high severity (I'm fuzzy on the last bit). It's
probably over 100,000 per year at this point. It's not uncommon for
recognizable but smaller companies to receive one or more per day.

I'm aware of full-time security engineers at Facebook and Google who do almost
nothing but respond to bug bounty reports. It's a lot like resumes - people
who have essentially no qualifications, experience or (most importantly) a
real vulnerability finding will nevertheless spam boilerplate bug reports to
as many companies as they can. Take a look at the list of exclusions on a
given program - you'll see that many of them explicitly call out common
invalid findings that are so ridiculous it's Kafkaesque.

HackerOne and Bugcrowd provide a lot of technical sophistication to prime
companies for success, but there is an organizational component that is very
difficult. If your program is very active, it requires dedication to tune it
so you're not flushing engineer-hours away responding to nonsense. This is not
to say they're bad - quite the opposite, I think they're fantastic. But I
generally recommend smaller companies set up a vulnerability disclosure
program through a solid third party, and do so without a monetary reward until
they can commit to dealing with the inevitable deluge of reports.

~~~
thaumasiotes
My favorite bug bounty report so far read, in its entirety, "try it ASAP".

~~~
arkadiyt
I've received reports for things like "source code disclosure" where they link
to our jQuery.

~~~
illumin8
LOL - I'd like to report that I was able to download the entire source code of
your website by right-clicking and selecting "View Page Source..."

~~~
thaumasiotes
If only that were true... modern web pages frequently have basically nothing
of any value in the page source; it's all dynamically loaded.

------
Macuyiko
So I was thinking recently... with Google (amongst others, of course) pushing
towards AI applications, it seems to me that many of these less-advanced*
bounty hunts might be automatable with a fuzzer+scraper+AI based approach.
The fact that bug bounties are still being awarded suggests this is not
trivial, but it might still be fun to explore. I.e., can one train an agent
that goes off and tries this sort of thing autonomously? It might be fun to
translate the HTTP intrusion domain into a deep learning architecture.

Similar things are already being applied on the "defensive" side (e.g.
Iranian, Turkish, and Chinese firewall systems using machine learning to
identify and block new patterns), so why not apply them on the offensive
side?

*: Not to demean the author in any way; I understand that putting the time in to explore these things is easier said than done in hindsight.

~~~
komali2
I'm similarly surprised we haven't heard of an AI-augmented fuzzer that's
been unleashed on random domains to just "try shit out." Seems like a good
way to find weird little bugs. Then again, the scope of the "problem" is so
massive, and the "rewards" (shit to flag as "yeah, check this out more") so
vague, I don't even know how you'd begin.

~~~
em3rgent0rdr
If the good people don't do it soon, the bad people will...

~~~
TeMPOraL
Or the curious ones. Just make a point&click version of such a vulnerability
scanner and post it on Reddit; you'll have half of the Internet scanned in no
time.

------
rprime
I discovered the same error/bug a few weeks ago when a co-worker linked "this
weird page" to me. I just looked around, thought it was pretty cool to see
that part of Google, didn't think too much of it, closed the tab and went
back to my terminal. :)

~~~
terminalcommand
I am a bit jealous :).

I also did a subdomain search on Google a few weeks ago. I stumbled upon a
lot of login sites.

A subdomain search turned up 95 subdomains under corp.google.com.

There is some strange JavaScript on those pages, including a function called
riskMi.

I don't want to get sucked into it, so I'm also closing the tab and going
back to my terminal :).

~~~
throw9912
How did you subdomain search? Was it a brute force / dict search?

~~~
schwag09
Shameless self-plug: You can use Fierce! A DNS reconnaissance tool -
[https://github.com/mschwager/fierce](https://github.com/mschwager/fierce)
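
For the brute-force/dictionary approach asked about above, the core loop is
just DNS resolution over a wordlist. A minimal sketch with a hypothetical
mini-wordlist (real tools like Fierce ship much larger lists and also detect
wildcard DNS to filter false positives):

```python
import socket

# Hypothetical mini-wordlist of common subdomain labels.
WORDLIST = ["www", "mail", "vpn", "dev", "staging", "corp", "login"]

def enumerate_subdomains(domain, words):
    """Try to resolve each candidate label; yield the ones that exist."""
    for label in words:
        host = f"{label}.{domain}"
        try:
            yield host, socket.gethostbyname(host)
        except socket.gaierror:
            pass  # no DNS record for this candidate

for host, addr in enumerate_subdomains("example.com", WORDLIST):
    print(f"{host} -> {addr}")
```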

------
arnioxux
At at least two other companies I've worked for, we also used query params to
enable debug information on live production sites. At one of those companies
the only requirement was that you be on a corporate IP address, but it
actually still works if you're on the guest wifi.
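
The pattern (and its pitfall) might look roughly like this - a hypothetical
Flask sketch, assuming the guest wifi egresses through the same range as the
corporate network:

```python
from ipaddress import ip_address, ip_network
from flask import Flask, request

app = Flask(__name__)

# Hypothetical corporate egress range. The pitfall: if guest wifi NATs
# through the same range, guests pass this check too.
CORP_NETS = [ip_network("203.0.113.0/24")]

def from_corp_ip(remote_addr):
    ip = ip_address(remote_addr)
    return any(ip in net for net in CORP_NETS)

@app.route("/")
def page():
    body = "normal page"
    if request.args.get("debug") == "1" and from_corp_ip(request.remote_addr):
        # An IP check proves network location, not identity; debug output
        # on production should really be gated by authentication as well.
        body += "\n[debug] backend=..., latency=..."
    return body
```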

------
carvalho
Good catch! I also studied Google's 404 pages. It seems they have unified all
but a few of them. One I found was vulnerable to an old UTF-7 injection (it
specified a customizable page title before the character encoding
declaration) and another was vulnerable to XSS. I got a bounty for the XSS
one; the UTF-7 one only affected browsers too old to be in scope for the
program (I do wonder how many IE6 users Google sees).

~~~
joatmon-snoo
It's a very low percentage, somewhere in the low single digits (if not an even
lower order of magnitude). Still high enough for it to be worth paying
engineers to maintain the backwards compatibility :)

------
komali2
Offtopic: What's with the hyper narrow width on this page? Looks like this on
a 1440p monitor (ubuntu, chrome)
[http://i.imgur.com/m9YWcNj.png](http://i.imgur.com/m9YWcNj.png)

~~~
microcolonel
I prefer narrow columns for reading personally. Snap the article window to the
side and scale it down for maximum enjoyment.

~~~
lmm
You can make any site have a narrow column like that - the site should give
those of us who prefer wider text a way to get that too.

------
netheril96
Such a refreshing story, after countless security researchers have been
threatened or sued for reporting security vulnerabilities to companies that
should have thanked them instead.

~~~
lucb1e
Haven't read one of those in years. Did one come by on HN recently?

~~~
frenchie14
This one comes to mind:
[https://news.ycombinator.com/item?id=14166966](https://news.ycombinator.com/item?id=14166966)

> Raneri questioned my motivation and I said that I want to give the vendor
> ample time to resolve the issue and then I want to publish academically. He
> was very threatened by this and made thinly veiled threats that the FBI or
> other institutions would "protect him". Then he continued with statements
> including "we want to hire you but you must sign this NDA first." He also
> recommended that I only make disclosure through FINRA, SDI, NCTFA and other
> private fraud threat sharing organizations for financial institutions.

------
nickcw
I found a bug in Go which turned into this CVE

[https://www.cvedetails.com/cve/CVE-2015-8618/](https://www.cvedetails.com/cve/CVE-2015-8618/)

I applied for a bug bounty, but alas was turned down as Go isn't a Google
service and it wasn't in scope for the Patch Reward Program.

I did get into the hall of fame though!

------
louprado
Here's a link to the Google Security Rewards Program

[https://www.google.com/about/appsecurity/programs-home/](https://www.google.com/about/appsecurity/programs-home/)

------
GauntletWizard
As a Xoogler who misses the debug stack, this was some fun nostalgia. Good
catch!

~~~
Artemis2
Can you tell us what SFFE means? GFE is the Google Front End, but I can't
find much about SFFE.

~~~
slashcrypto
Would be very interesting to know what SFFE means ;)

~~~
CiPHPerCoder
Static File Front-End?

~~~
puzzle
The hostname is static.corp and the service is static-on-bigtable, so...

------
roemerb
Am I the only person who had to enlarge the page to read the article? Nice
catch though.

------
dmead
thefacebook.com redirected to facebook.com/intern/vod up until last week.

------
ncal
"forgot"

~~~
Buge
What do you think the motivation would be to intentionally "forget"? To save
money? If that's the motivation, why not pay the reporter $500 instead of
$5000 for a small info leak? It's illogical that they would intentionally
"forget" and then pay such a high bounty.

------
ensiferum
I'm surprised that anyone at the big corp actually bothered to reply to this
guy reporting the bug, much less actually give him a bounty!

~~~
nodesocket
Where is this resentment and skepticism coming from? The facts say otherwise:
Google is known to be receptive to bounty reports and to pay out.

~~~
kristianp
This story from yesterday has raised some resentment about Google's support:

A startup’s Firebase bill suddenly increased from $25 to $1750 per month
[https://news.ycombinator.com/item?id=14356409](https://news.ycombinator.com/item?id=14356409)

~~~
joatmon-snoo
Their customer/enterprise support is notoriously sketchy, at least from the
outside (a la HN) looking in.

That's not the same at all as their bug bounty program, which is generally one
of the best out there.

