
How Search Works - vijaydev
http://www.google.com/insidesearch/howsearchworks/thestory/
======
philsnow
I missed most of the content on this ... page ? Exhibit ? Installation ?
whatever it's called, because it told me to scroll, I did, and I scrolled
through a bunch of what looks like empty space and arrived at the end ("and
that's how search works"). The user is apparently supposed to stop and watch
some animation at certain places, but it's not clear where to stop scrolling.

Perfect example, near the top there's some text about "It's made up of
over[........] 30 TRILLION[.........] INDIVIDUAL PAGES[........] and it's
constantly growing." But there's nothing to indicate that I should stop
somewhere and wait for some more text to show up.

Maybe they should limit how far down you can scroll by setting the height of
some element, and only increase it when the animation is finished.

Edit: the key problem here isn't the "scrolling makes things happen" gimmick
that's popular lately. The problem is that it starts certain animations or
fade-ins some time after I've already skipped past an apparently blank space.

~~~
will_brown
I gave you a +1 because I had the same "scrolling issue".

After your comment I noticed a lot of comments on the same issue, so I decided
to try it again. The second time I noticed that a blue arrow flashes at the
bottom of the screen after all the content has populated, almost prompting you
to scroll down. I suppose most everyone, including me, initially scrolled too
fast to even see the first "arrow/prompt". Despite the discovery of the prompt
feature, some of the issues remain. For example, wondering how far to scroll down
before stopping (maybe pgdn?), and wondering which parts of the "page ?
Exhibit ? Installation ? whatever it's called" are interactive.

>page ? Exhibit ? Installation ? whatever it's called

I too was unsure what to call it, but if they listen to the feedback, I think
"whatever it's called" is really awesome and could be a legitimate substitute
to the ppt platform. At least I would be interested in making a few
presentations with it.

~~~
devcpp
>I too was unsure what to call it, but if they listen to the feedback, I think
"whatever it's called" is really awesome and could be a legitimate substitute
to the ppt platform. At least I would be interested in making a few
presentations with it.

Agreed, I like it. It looks like the stuff that goes on in futuristic movies,
with a lot of things happening on the screen and no one really looking at
anything in particular, but it's impressive. This has potential. Who's up to
make this WIC software?

------
dangrossman
The most interesting thing there is the live view of the most recently deleted
webspam. I wonder what blackhat SEO firms can learn from that to better avoid
the filters.

~~~
gokhan
Exactly. And someone somewhere is writing a script to hammer those screenshots
and collect as many as he can.

------
area51mafia
It's nice overall, but the timing for making items appear is a little slow. I
was past most headers by the time they appeared, and I don't think I scroll
too incredibly fast.

~~~
crynix
It might be your machine/browser/os. The same thing happened on my laptop
(OSX/Chromium), but it worked fine on my desktop (Arch/Chromium).

~~~
ScottWhigham
If you read the top voted comment for this post (as I write this), it
describes the same experience though.

------
franze
thx matt and the google search team for doing this. it's nothing new for
technically inclined people, but every little bit helps. helps for what?
teaching people to worry about the right aspects of search and the impact on
their business, instead of worrying about bullshit phrases that were planted in
their head by an SEO agency key account or a blogpost from 2008. so well yes,
thx for doing this. i will send it to my clients (and tell them to click on
the bubbles, even though they don't look clickable)

now an anecdote (because i feel like telling one): this week started for me
with an interview that finally got published
[http://werbeplanung.at/news/marketing/2013/02/interview-mit-...](http://werbeplanung.at/news/marketing/2013/02/interview-mit-franz-enzenhofer)
(it's german) in that interview i claimed that

* 80% of everything written about SEO and Google is bullshit

* that all the rumors, tips and trends are actually hurting business

* that we should treat SEO as a numbers based craft of constant optimizations

* instead of the esoteric bullshit art it is currently

* and, if search traffic is important for the success of a business, they must rid themselves of external (agency) dependencies and develop internal structures

nothing too far-fetched i think. everybody knows the SEO vertical is full of
bullshit, i just took some time to estimate a number (based on a random sample
of collected blogposts (that at least one person tweeted about))

yeah, i got a lot of angry emails, skype messages, linkedin messages, xing
messages after the interview was published.

most of them mentioned at least one of these words

      * pagerank
      * whitehat
      * blackhat
      * grayhat
      * linkjuice
      * panda
      * penguin ...

so yeah, thx google for educating people about search. keep up the good work.

~~~
Julianhearn
Your 80% is based on what exactly? A tiny sample size. Please, if you don't
have solid data, don't quote percentages. It just encourages people to spread
the number like it's a fact, which it isn't.

If you read the right sources a majority of seo advice is correct.

www.seomoz.org

[http://static.googleusercontent.com/external_content/untrust...](http://static.googleusercontent.com/external_content/untrusted_dlcp/www.google.co.uk/en/uk/webmasters/docs/search-engine-optimization-starter-guide.pdf)

www.inbound.org (homepage stuff that has been voted up.)

~~~
Inufu
>> If you read the right sources a majority of seo advice is correct.

That's a contradiction. If you have to read the right sources, then by
definition the majority of advice is not correct.

~~~
Julianhearn
Why does it mean the majority of advice is not correct? That is a myth.

~~~
nialo
Because the majority of advice is coming from the _wrong_ sources

~~~
Julianhearn
You say majority like it's fact. It's not.

Here is a list of the top 100 SEO blogs, find the BS in there.
www.branded3.com/seo-blogs/

------
tmoertel
Has anyone deciphered the fat-mustache diagram in the "Query Understanding"
circle? It's in the Algorithms section.

At first I thought it was supposed to represent a Gaussian-like probability
distribution. But when I clicked on it, the resulting animation showed a
series of such distributions getting flattened by some kind of distribution-
flattening hydraulic press. The accompanying caption: "Gets to the deeper
meaning of the words you type."

If I was confused before, now I was completely lost.

How is deeper meaning represented by distribution flattening? I'd think it
would be just the opposite, raising probability mass around the likely
meanings, not spreading it out into a uniform distribution over all meanings.
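
To put a number on that intuition, here is a minimal sketch using Shannon entropy. The two four-meaning distributions are made up for illustration and nothing here comes from the Google page: a flattened (uniform) distribution carries the most uncertainty about the meaning, while a peaked one carries less.

```python
import math


def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical probabilities over four candidate meanings of one query.
flattened = [0.25, 0.25, 0.25, 0.25]  # every meaning equally likely
peaked = [0.85, 0.05, 0.05, 0.05]     # mass concentrated on one meaning

print("flattened:", entropy_bits(flattened))
print("peaked:   ", entropy_bits(peaked))
```

Under this reading, "understanding" a query should lower the entropy, which is the opposite of what a flattening press would do.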

Baffling.

If anyone has figured it out, please do share.

(Maybe I'm taking the diagrams too seriously.)

EDITED TO ADD: New option: If you don't have any clue what it means either,
come up with an entertaining _yet plausible_ story that fits the hydraulic-
press-vs-mustaches animation and share that story instead.

EDITED TO ADD: Example: At Google’s new eco-friendly data centers, NLP
computations are performed by genetically enhanced inchworms. Difficult
queries, however, can cause the inchworms to get cricks in their backs. In
such cases, Google’s innovative back-massager descends and restores the
inchworms to their preferred position (prone), from which they can return to
their computations with renewed vigor.

~~~
MattSayar
You're taking diagrams too seriously.

But the way I interpreted it was, before, the query was short, scrunched up,
and slightly ambiguous. The algorithm then lengthened it, representing
expanding it to find the deeper meaning.

------
dylangs1030
I don't know what to take from this.

That search is very complex (I knew that, but not with this technical detail).

 _Or_...that Google is trying very hard to maintain user interest with
gimmicky shows of why it's cool and cutting edge and necessary.

Not that Google isn't those things...this just seems like an unnecessary
expenditure of time. We know it's complex, Google. Improve some other features
and stop shutting others down instead of making these web 2.0 animations.

~~~
bambax
I too find this completely void of any useful information.

There are many things that are still broken in search; I talk about one
specific experience here:

[http://urgeous.com/p87t3aaa40g-for-some-queries-all-first-10...](http://urgeous.com/p87t3aaa40g-for-some-queries-all-first-10-results-on-google-are-spam)

("For some queries, all first 10 results on Google are spam").

------
jojopotato
Interesting that they show the approximate number of searches / second at the
bottom. Is that an otherwise publicly available number?

------
eykanal
I was halfway through before I realized that some of the content was
clickable.

Very nice page, though.

------
JDDunn9
Their characterization of their spam procedures is grossly misleading. They do
not send emails to most people that have been penalized, nor do they give
clear instructions on how people can fix their sites.

Thousands of small sites were killed by Panda for no good reason, and have
little hope of getting their traffic/incomes back. Google's spam policy is
skewed heavily in favor of large sites and their own properties.

~~~
rossjudson
Didn't read that way to me. Doesn't it say that the webmaster tools page is
the primary way to get notifications?

Crap factor = %advertising on page.

------
ywyrd
I keep checking every so often, but searching for "this phrase" or +absolute
+requirement is still broken. Even "Verbatim", isn't. If they can't even get
simple search right, who would trust them with anything more?

~~~
moultano
Do you have some example queries to debug?

~~~
ywyrd
I wish I had saved the results every time over the last few years that Google
showed me a page it claimed had what I was looking for, when neither searching
the visible text nor even the source code of the page produced any such string.
I am sure that it's happened to me hundreds of times by now, if not thousands.
For a long time, it was surprising and ranged from annoying to infuriating.
Now I just sigh and accept it as the cost of Googling.

------
aviswanathan
Scrolling is really becoming the new thing in UX design. It's an interesting
contrast to the 'movie-like' flash animations of a few years ago that required
no interaction on the part of the user.

~~~
largesse
_Scrolling is really becoming the new thing in UX design._

Am I the only one who finds it irritating as hell to scroll when it renders
slowly? I don't think this is the end game. There has to be something better.

~~~
lflux
It's not only you, I found this irritating as well.

~~~
xuhu
Since forever, Chrome has been scrolling at a lower fps than Firefox, where I
can read comfortably while scrolling.

------
prezjordan
They left out the part where they index your emails and choose items you agree
with over items you don't :)

~~~
hurstdog
I think you're joking, but just for those that don't think that, we don't
actually do that.

~~~
anoncow
I don't remember if hotmail used to run ads.

~~~
snowwrestler
It did and they were display ads. Incredibly distracting.

At one point the "homepage" of Hotmail was a huge ad space, stories from MSN,
and a tiny link to "Inbox."

The new Outlook is so much better. If Hotmail had evolved that way earlier, I
would not have switched to Gmail.

~~~
anoncow
So it is a bit hypocritical of MS to talk about ads in gmail. But again, were
those ads contextual?

~~~
snowwrestler
The "Scroogled" campaign has nothing to do with products or customers; the
point is to broaden the PR base for Microsoft's ongoing campaign to convince
the feds to initiate anti-trust proceedings against Google. That is why they
hired a political PR executive to create the campaign.

------
Xorlev
38,800 requests/second according to their estimation.

~~~
lucb1e
Seems I'm not the only one who found that interesting :)

------
johnmurch
Is this just PR for Google? Would rather see a more technical approach -
although great for forwarding to clients when asked :)

~~~
ChuckMcM
Apparently, perhaps the 'scroogled' campaign is having an effect.

However it does give a better insight into the challenges of building a search
product. It is a series of really challenging problems. So many people take
search for granted these days.

~~~
johnmurch
Yes and no - 'scroogled' is bringing up stuff like - selling ads based on context
- but are you paying $20/year for outlook.com email? Gmail is free and pretty
awesome (haven't gotten spam in years).

------
cryowaffle
Whoa... really, 100 MILLION gigabytes to store "The Index"? Wow. That's big.

~~~
aiiane
aka 95+ petabytes.

~~~
ithkuil
100 million gigabytes = 100 petabytes ~= 88.8 pebibytes

100 million gibibytes ~= 95 pebibytes
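
For anyone who wants to check those conversions, a small sketch follows. It assumes only the standard definitions (decimal prefixes are powers of 10, binary prefixes are powers of 2); the helper name `convert` is made up for this example.

```python
# Unit sizes in bytes: decimal (SI) prefixes vs. binary (IEC) prefixes.
GB, PB = 10**9, 10**15     # gigabyte, petabyte
GiB, PiB = 2**30, 2**50    # gibibyte, pebibyte


def convert(count, from_unit, to_unit):
    """Convert `count` of `from_unit` into `to_unit` (both given in bytes)."""
    return count * from_unit / to_unit


n = 100_000_000  # "100 million"

print(convert(n, GB, PB))              # 100 million GB in PB
print(round(convert(n, GB, PiB), 1))   # 100 million GB in PiB
print(round(convert(n, GiB, PiB), 1))  # 100 million GiB in PiB
```

So the "95+" figure only holds if the page's "gigabytes" really means gibibytes.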

~~~
runlevel1
I see the value of this distinction, but I can't shake the feeling that a word
used for years has been co-opted by marketing and replaced with something that
sounds silly when spoken out loud.

------
sytelus
There are some good facts and numbers hidden in a rather toy explanation:

1. Spam detection is automatic

2. There are 6 types of spam

-Unnatural outbound links (link selling)

-Content copy/manufacturing

-Keyword stuffing

-Forums/user generated spam

-Parked domains

-Sites hosted on spammy DNS

-Different content for humans and bots

-Hacked sites

3. Google is removing as many as 50K spam sites per month, they get 8K
reconsideration requests

4. Google's machine learned relevance model may be using about 200 features

------
manojlds
> By the way, in the 47 seconds you've been on this page, approximately
> 1,813,260 searches were performed.

Aren't these just some random numbers that they pull out of the air?

~~~
alok-g
That would be about 38K searches per second. Does this include Google instant
searches?

Google search results show a time value for each search. E.g.: About
2,210,000,000 results (0.12 seconds). Is this time the machine time per search?
This number is often around 30 ms, give or take a factor of two. If so, each
machine can handle about 30 searches per second. If so, 38K searches per
second need about 1000 machines. Sounds a bit too low... so my interpretation
must be wrong at least somewhere.
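
The back-of-the-envelope arithmetic above can be written out as a short sketch. The 30 ms figure and the idea that one "machine" serves one search at a time are the commenter's assumptions, not published numbers, and the function name `machines_needed` is invented here.

```python
def machines_needed(searches_per_second, seconds_per_search):
    """Average number of machines busy at once, if each search occupies
    exactly one machine for `seconds_per_search` (an assumption)."""
    return searches_per_second * seconds_per_search


# Figures quoted from the page's counter.
searches = 1_813_260
seconds_on_page = 47
rate = searches / seconds_on_page      # searches per second

per_machine = 1 / 0.030                # searches/second at 30 ms each
machines = machines_needed(rate, 0.030)

print(f"{rate:.0f} searches/s")
print(f"{per_machine:.1f} searches/s per machine")
print(f"{machines:.0f} machines busy on average")
```

The result is only as good as the one-machine-per-search assumption. If a query fans out to many machines in parallel, the displayed time is wall-clock latency rather than total machine time, and the real count would be far higher.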

~~~
Geee
You didn't define 'machine'. If the 'machine' is Google's supercomputer grid
cloud cluster, then yes, each search takes 30 ms of machine time.

~~~
alok-g
Is there any publicly known information about what the 30 ms number means (or
alternatively what the machine is)? Given 30 ms number and the number of
searches per second, the number 1000 means something; I just don't know what.

------
aeon10
A beautifully designed page more than anything else

------
lysium
Nice scroll-UI! Took some time to see the clickable items. Interesting bits
about spam pages.

------
moeedm
An awful way to learn anything.

------
state
The better people understand their tools, the more effectively they can use
them.

------
wfunction
"We write programs & formulas to deliver the best results possible."

No kidding.

------
denysonique
Some of the live listed 'spam' pages appear to be genuine to me.

------
joshhart
Answer: It uses a bunch of skip lists.

Source: I do hacking on top of lucene.
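
For readers wondering what skip lists buy a search index, here is a minimal sketch of the textbook skip-pointer trick for intersecting two sorted posting lists (lists of document ids). It is simulated on plain Python lists with a square-root skip distance, and it assumes each list is strictly increasing. It is an illustration only, not Lucene's or Google's actual implementation, and the name `intersect_with_skips` is made up.

```python
import math


def intersect_with_skips(a, b):
    """Intersect two strictly increasing lists of doc ids.

    A jump of about sqrt(len) entries is taken whenever it does not
    overshoot the value we are looking for in the other list.
    """
    skip_a = max(1, math.isqrt(len(a)))
    skip_b = max(1, math.isqrt(len(b)))
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Jump ahead in `a` only if the landing value is still <= b[j].
            if i + skip_a < len(a) and a[i + skip_a] <= b[j]:
                i += skip_a
            else:
                i += 1
        else:
            # Symmetric case: advance in `b`.
            if j + skip_b < len(b) and b[j + skip_b] <= a[i]:
                j += skip_b
            else:
                j += 1
    return out


# Hypothetical posting lists for two query terms.
term_one = [2, 4, 8, 16, 19, 23, 28, 43]
term_two = [1, 2, 3, 5, 8, 41, 43, 51, 60, 71]
print(intersect_with_skips(term_one, term_two))  # [2, 8, 43]
```

The payoff is that long runs of non-matching ids can be stepped over instead of compared one by one, which matters when one term is much rarer than the other.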

------
yarou
vijay: very interesting link. thought it was interesting, despite the obvious
slant.

------
moha24
This is not how search works!!

------
asawant
This is brilliant !!!

------
OGinparadise
"We write programs & formulas to deliver the best results possible"

There's a slight oversight, it should be: "We write programs & formulas to
deliver the most profitable results possible for this quarter"

~~~
moultano
This is completely false. The effect on revenue is not used to make launch
decisions for ranking changes.

~~~
JDDunn9
So when Google's Panda update killed tons of user-generated-content sites like
Mahalo, eHow, HubPages, etc., and greatly improved YouTube's (which is 99%
garbage) rankings, that was pure coincidence?

What about when Google rolled out universal search only after buying YouTube?

~~~
rossjudson
I don't miss any of the crap content factory sites. Do you?

~~~
JDDunn9
Nope, and I wouldn't miss YouTube's crap factory site from the SERPS either.

