
Is Bing Censoring Questions About Microsoft? - chanux
http://www.chicagostyleseo.com/2009/06/is-bing-censoring-questions-about-microsoft/
======
KirinDave
Man, people are really eager to ascribe malice to Microsoft. The author's
claim is probably not the case. Let's go over 2 things the external viewer
should understand about Bing:

 _1._ It represents a major technology refresh. As such, there are bound to be
rough spots as the search engineers fine-tune their new index and ranking.
Same goes for the sub-sections of the site (like the reference search, video
search, image search), most of them have not rolled out all their features
just yet, and are still improving results.

 _2._ One area where Google retains an advantage is with very long query
strings. Some sub-components of Bing (e.g., reference) can handle long query
strings with aplomb, but in general keyword search engines suck with long
queries. Google has been working on this steadily for years, and Bing still
has a ways to go before matching Google's performance there.

But then again, few engines answer this query the way Google does, so I
wonder if maybe Google has some tuning for this specific kind of query in
their ranker.

Check out the yahoo results:
[http://idisk.me.com/dfayram/Public/Pictures/Skitch/yahoo-res...](http://idisk.me.com/dfayram/Public/Pictures/Skitch/yahoo-results-20090808-081257.jpg)

Check out the Altavista (yeah, they're still up!) results:
[http://idisk.me.com/dfayram/Public/Pictures/Skitch/altavista...](http://idisk.me.com/dfayram/Public/Pictures/Skitch/altavista-20090808-081425.jpg)

Check out the Ask.com results:
[http://idisk.me.com/dfayram/Public/Pictures/Skitch/ask-20090...](http://idisk.me.com/dfayram/Public/Pictures/Skitch/ask-20090808-081533.jpg)

Only Ask.com actually pulls back that page. And Ask.com's early business model
was devoted to English-ish question queries.

People need to understand that search engine ranking is a harder problem than
they realize. Unless you're working on it _right now_ , you probably have
no clue about the huge amount of subtle thought, heuristics, and outright fudge
factors that go into making a good, modern ranking function. Things which were
reasonable 2 years ago are now insufficient. This is not a solved problem, and
no one really shares algorithms to improve the performance of all their
competitors.

~~~
jacquesm
The basics of search engine ranking are so well understood that if your engine
does not pull up results comparable to or better than your competition's, you
have a serious algorithmic problem. The results, after all, are a given. The
method you use to arrive at them creates some variation but does not magically
wipe out certain results completely and replace them with spam pages (unless
of course your algorithm is seriously broken).

I dare to say the above because I built a proof-of-concept engine about 3
years ago, it took several months and was eventually abandoned because I
foresaw that the amount of funding needed to do all the crawling (bandwidth
costs) and to scale up the design to hold billions of pages instead of 100's
of millions of pages (pretty good for an amateur effort) was beyond my ability
to raise.

The quality of the results had little to do with it, the differences when
comparing with google were mostly in how the results were ranked, with google
outperforming my little toy considerably. But if a page was present and
relevant I invariably found that it was somewhere near the top, but usually
not in the 'right' order.

To miss out on relevant content that had been 'seen' would have taken a
serious effort.

Keep in mind that ranking is only a factor _AFTER_ you have the relevant
results for a query; it is a sorting issue. That does not mean that Google or
any other engine actually does the sorting when it displays your search
results page, but effectively that is what is happening; it is just that, for
efficiency reasons, the process behind the scenes is set up completely
differently. Stuff does not magically disappear because of ranking (unless it
gets pushed beyond page 100 in Google's case, but that means there are more
than 1000 links 'relevant' for a given query). You'd have to really mess up
for one engine to rank a page near the top while the other ranks the same page
for the same query 1,000 slots lower.

I hope this is all clear, these are difficult concepts, the best way to learn
about this stuff is to implement a toy search engine/crawler combo yourself.
80legs.com makes it a lot easier nowadays than it was in the past.
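
For readers who want the retrieval-versus-ranking distinction in concrete terms, here is a minimal sketch of the "toy engine" idea. It assumes a plain in-memory inverted index and a made-up three-document corpus; the function names (`build_index`, `retrieve`, `rank`) are illustrative and not taken from any real engine. The point it shows is that swapping the scoring function reorders the candidates but never removes one.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of doc ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def retrieve(index, query):
    """Return ids of docs containing every query term (the candidate set)."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


def rank(candidates, score):
    """Order candidates by score, highest first; no candidate is dropped."""
    return sorted(candidates, key=lambda doc_id: (-score(doc_id), doc_id))


# Toy corpus, invented for illustration only.
docs = {
    "a": "stac electronics lawsuit microsoft",
    "b": "microsoft lawsuit news",
    "c": "disk partitioning guide",
}
index = build_index(docs)
candidates = retrieve(index, "microsoft lawsuit")

# Two deliberately different (and silly) scoring functions.
by_length = rank(candidates, lambda d: len(docs[d]))
by_brevity = rank(candidates, lambda d: -len(docs[d]))
print(by_length, by_brevity)
```

Both orderings contain the same documents; only the order differs. Real engines interleave retrieval and scoring for efficiency, so treat this as a conceptual model rather than a description of how Google or Bing work.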

~~~
KirinDave
Hi. I'm an employee of Microsoft, which is why I have to be so vague about
them. I was an employee of Powerset.com, which implemented a natural language
search engine of non-trivial size. We were recently purchased by Microsoft.
I've worked closely on Powerset's search engine infrastructure and with their
linguistic packages for just about 2 years now, and I've started to understand
some of what Bing is doing in the last 6 months.

The basics of search engine ranking are well understood, but compare those
basics to the results that Google and Bing pull up and you start to see some
serious discrepancies from naive rankers. This is because query type strongly
influences ranking decisions, and there is implicit knowledge that STRONGLY
affects ranking. A trivial example is the geoip data of a querier, which can
be used to aid ranking for queries like "presidential scandals". But trending
news headlines might also be an input to your ranking algorithm. A very good
ranker like what Google and Bing employ is a complex beast with special inputs,
heuristics, and secret tricks which make them as good as they are.

I confess, I am predisposed to be very irritated at your post. You made a toy
engine (and no, holding 100's of millions of pages in a keyword index really
isn't a huge deal these days), a toy ranker, and now you're an expert on the
state of the art? But before I can get too angry I have to admit that when I
left mog.com to go to powerset.com, totally ignorant about all but the basics
of search, I too had a similar opinion. I figured we'd just use some variant
of LSA for relevance and be done with it. Boy, was I wrong. So I can't get too
angry at you about it.

~~~
jacquesm
> So I can't get too angry at you about it.

Thank you for that :)

Ok, so I'll take your word for it then, you are obviously the expert in the
field.

I did not mean to step on your toes, I hope that I'm as well informed as you
can be about the subject as an 'outsider', if there is anything that you can
reveal about the real reasons for these discrepancies then consider me all
ears.

I'm not above wanting to learn about this stuff (that's why I'm on HN in the
first place), my perspective to date (based on my own effort and whatever I
could read up on that is publicly accessible) was that the results are not the
hard thing, the spam is where the real problems lie.

Fudge factors do not sound encouraging by the way, possibly you are proving
the original poster's point here in some unintended way ;)

EDIT: it would be nice if you could state clearly that you are not aware of
any direct effort on the part of microsoft that influences the search results
in a way that either promotes microsoft and their products and/or changes the
results when they are critical of microsoft, including the blacklisting of
critical pages. I think that would go a very long way to laying these rumours
to rest. It's Microsoft's trust image that comes to the surface here, and it
seems that that is not very high. Chinese walls between search and the rest of
the company would have been the way to go here.

~~~
KirinDave
Sure.

I am not aware of any direct effort on the part of Microsoft to influence the
search results in a way that deliberately attempts to obscure negative press
about Microsoft's products. I would _not_ be surprised if some security-
related things were in fact concealed, but I can see how that might be
considered unreasonable in some circles. In any event, I'm certainly not aware
of any generic policy in this regard outside of age-related filtering.

Now, I am a low man on the totem pole working in a satellite office. So I
wouldn't necessarily know. If I did know of such a scenario, it'd be a firing-
level violation of my NDA to talk about it here, but it'd also be a violation
of the ethics guidelines to lie about it publicly, so I probably wouldn't be
posting here at all about this subject if I knew anything like that.

If I did discover Microsoft doing this, I would probably resign. But I don't
believe they're doing it. Microsoft is very serious about this Bing project.
I've met and talked with a lot of the people who manage the product, and
they're serious, talented people who clearly understand (and have directly
said to us) that a search engine is about results and people trusting those
results. It would be incredibly risky to deliberately filter things in Bing
and risk discovery at the formative stage of Bing-as-a-brand's reputation.

PS. "Fudge factors" are just some things like saying, "Wikipedia and c2.com
are awesome, give them a nice boost." It's pretty clear that Google loves
Wikipedia even more than what we know about its ranking algorithm's major
features would suggest. Once upon a time all wikis scored very highly in
google because of the way page rank and link text worked, but they've since
reduced that effect, it seems. Wikipedia's ranking never really went down.
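
A "fudge factor" of the kind described in the PS can be pictured as a hand-tuned per-site multiplier applied on top of whatever the base ranker produces. The sketch below is purely illustrative: the boost values, the base scores, and the `boosted_score` helper are all invented, and nothing here reflects how Bing, Powerset, or Google actually implement such adjustments.

```python
from urllib.parse import urlparse

# Hypothetical hand-tuned multipliers; the numbers are made up.
SITE_BOOST = {"en.wikipedia.org": 1.5, "c2.com": 1.3}


def boosted_score(url, base_score, boosts=SITE_BOOST):
    """Multiply the base relevance score by a per-host boost (default 1.0)."""
    host = urlparse(url).hostname or ""
    return base_score * boosts.get(host, 1.0)


# (url, base_score) pairs; the scores are invented for the example.
results = [
    ("http://example.com/stac", 0.80),
    ("http://en.wikipedia.org/wiki/Stac_Electronics", 0.60),
]
ranked = sorted(results, key=lambda r: boosted_score(*r), reverse=True)
for url, base in ranked:
    print(url, base, boosted_score(url, base))
```

With these assumed numbers the Wikipedia page overtakes a page that had a higher base score, which is the sort of effect the PS attributes to a site boost.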

PPS. I really can't talk about specific features of Bing or Powerset's ranking
algorithm. One reason for this is my NDA. The other reason is that they're not
my primary domain of expertise, so I'd feel uncomfortable lecturing about them
instead of their actual architects, who are often unsung heroes of a search
team.

~~~
jacquesm
Thanks, that is really appreciated, nice to see you being such an upstanding
guy!

It's funny how absolutely crucial search is and how we depend on it but how
little we actually know about what goes on inside. I think that is part of
what drives these wild goose chases based on limited querying, if there were
more transparency then this would not take hold. At the same time the spammers
would waste no time or effort trying to exploit such knowledge so out of
necessity it needs to be under wraps.

Or maybe a 'many eyeballs' approach here would help too.

------
eli
Which is more likely?

1) Bing engineers came up with a dictionary + algorithm to determine anti-MS
content (as opposed to pro-MS content) and applied to the results....

2) Or bing just gives kinda crappy results for most queries?

------
ErrantX
The writer is a bit unfair on the "is microsoft evil" one - because those are
news results. So assuming Bing gives equal share to the keywords (which is
fair enough, it has no clue MS is the focus of your question) then the news
results will, surely, depend a lot on when the story was published.

In which case the Google story (with 2 keywords in the title!) seems a fair
one to come top :)

EDIT: incidentally I get this: <http://screencast.com/t/dlYz0HMM> so I'm not
convinced this isn't just waffle.
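
The "equal share to the keywords" argument above can be sketched as follows, assuming a news ranker that counts title matches with every query term weighted the same and breaks ties by publication date. The headlines and dates are invented, and `title_hits` and `rank_news` are hypothetical helpers, not anything Bing is known to use.

```python
from datetime import date


def title_hits(title, query):
    """Count distinct query terms found in the title, each weighted equally."""
    words = set(title.lower().split())
    return sum(1 for term in set(query.lower().split()) if term in words)


def rank_news(stories, query):
    """Sort by number of keyword hits, then by most recent publication date."""
    return sorted(
        stories,
        key=lambda s: (title_hits(s["title"], query), s["published"]),
        reverse=True,
    )


# Invented headlines and dates, for illustration only.
stories = [
    {"title": "Is Google evil", "published": date(2009, 6, 10)},
    {"title": "Microsoft ships update", "published": date(2009, 6, 1)},
]
top = rank_news(stories, "is microsoft evil")[0]
print(top["title"])
```

Under these assumptions a story about Google matches two of the three query terms and wins, with no special treatment of the word "microsoft" required.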

------
jacquesm
The thing that really surprises me about all this is the surprise. Bing was
not set up with Chinese walls in place (the way it should have been done) and
has been produced by a company that sees every communication with others as a
marketing effort (this may be a good thing, I'm not sure).

They'll do everything they can to portray themselves in a good light.

For instance when you search on bing for microsoft vs stacker the wikipedia
page that specifically addresses the lawsuit is not even in the top 10 search
results, whereas it is arguably the best page on the subject on the web.
Coincidence ? Possible. Probable ? I don't think so.

Bing has its uses, but to get objective information about its owners you'd
have to look elsewhere.

The interesting thing with all this is that people are now so conditioned to
find stuff using search engines that if a search engine does not list a page
it might as well not exist.

~~~
socillion
"Bing" "microsoft vs stacker wikipedia". The first result is a wikipedia page
on disk partitioning. Google the same, and the first two results are
wikipedia pages on Stac Electronics and MS-DOS, and the third result is this
HN page.

As you can see, Google did what you wanted. However, the litigation was
between Stac Electronics and Microsoft, so now try "microsoft vs stac
wikipedia". Bing: returns wikipedia category "microsoft criticisms" and the
third result is "microsoft litigation". Google: returns the wikipedia pages
"stac electronics" and "microsoft litigation".

The conclusion _I_ draw from this is that Bing and Google just have very
different algorithms, relatively speaking. Wikipedia appears to have a much
lower ranking in Bing than Google, at least in cases where there isn't a page
with a very similar name. Here are two searches to compare:
<http://www.bing.com/search?q=microsoft+vs+stac>
<http://www.google.com/#q=microsoft+vs+stac>

While I understand why people bash MS, as well as Google (I agree with most of
it), I think sometimes it's best to take an unbiased look. I cannot see any
attempt by MS to remove useful results; the first one is the original text of
the lawsuit!

~~~
jacquesm
Interesting !

Top relevant results for me on google:

<http://en.wikipedia.org/wiki/Stac_Electronics>
<http://vaxxine.com/lawyers/articles/stac.html>

(query: <http://www.google.com/search?q=microsoft+vs+stacker> )

Bing:

<http://www.vaxxine.com/lawyers/articles/stac.html>

query:

[http://www.bing.com/search?q=microsoft+vs+stacker&go=...](http://www.bing.com/search?q=microsoft+vs+stacker&go=&form=QBLH&filt=all)

I did not enter 'wikipedia' as a criterion.

------
axod
Who cares? :/ who seriously uses bing anyway.

I doubt this is malice, it's just a ridiculously bad search engine which
returns bad results.

------
wmeredith
No. Or at least if they are, this is most certainly not proof. This article is
anecdotal BS.

------
jrockway
Quality blog comments:

 _If I were you, I would (Being that youre a SEO person!!!)

Not make it difficult to understand your writing (First Line!!!!)

Before you complain about BING!!!!!!!!!!!!!!

And get a copy of word from someone !!!!!!!!!!!!!!!_

What?

------
eli
Oh, please. Not buying it. Nothing to see here.

------
smithjchris
I think he's moaning because he's in the SEO business. Basically, his entire
working knowledge is shattered by a viable Google alternative. Now he's got to
understand two ways of working (PageRank vs whatever Bing uses). I think that
has caused some anti-Bing bias.

Try searching for "visual studio crap" and you'll see how unbiased it is.

