
Google should open source what actually matters: their search ranking algorithm - ivankirigin
http://cdixon.org/2009/12/22/google-should-open-source-what-actually-matters-their-search-ranking-algorithm/
======
showerst
The notion that being 'committed' to open source means that a company should
give away everything of value is ludicrous. Google came up with great software
that makes millions of people's lives easier, and they have every right to
keep it secret and profit off of it.

The zealous idea that open source and trade secrets cannot coexist is
detrimental to the open source movement as a whole, in my opinion.

~~~
mattmanser
I think you're missing the point, the article is pretty much a direct response
to:

<http://googleblog.blogspot.com/2009/12/meaning-of-open.html> (submitted here:
<http://news.ycombinator.com/item?id=1008908>)

In it a Google open source zealot rants wildly about openness being ultimate,
apart from search algorithms which should be closed. It also fails to mention
all the times open has failed (communism being one of the more spectacular
failures of open).

A cynic might surmise what they actually mean is content should be open so
they can profit from their closed search algorithm.

~~~
RyanMcGreal
> communism being one of the more spectacular failures of open

Wait, what? Communism is a spectacular failure of centralized planning over
the entire economy. It has nothing to do with open or closed in principle, and
in every implementation it has been aggressively _closed_ in terms of
information flows and decision making.

~~~
jacquesm
> Communism is a spectacular failure of centralized planning over the entire
> economy.

Wow, and here I was thinking it was a form of government.

Communism in principle has absolutely nothing to do with the flow of
information and/or decision making, and its implementations have only very
little to do with its basic ideas.

A _true_ communistic government has never even been tried, we've only seen a
'new boss same as the old boss' approach to communism with one batch of fat
cats replacing another.

~~~
RyanMcGreal
>Wow, and here I was thinking it was a form of government.

Like most ideologies, the term "communism" is gooshy enough to mean _whatever
you want it to mean_ - but at heart, communism is concerned with the matter
of who owns and controls the means of production. It's a form of government
insofar as the decision on who gets to control the means of production is
ultimately a political decision.

Communists believe the _people_ should own and control the means of production
- but every real-world implementation has used _government_ as a proxy for
_people_.

I suppose it's possible to run a communist society via direct democracy on a
very small, local scale; but good luck trying to scale that up to the national
or even regional level. At some level you need to adopt some kind of more or
less representative government. In the case of communism, the track record is
definitely skewed toward the "less" end of the representation spectrum.

In other words, your claim:

>A _true_ communistic government has never even been tried...

is a straightforward _No True Scotsman_ fallacy. It's no different from "real"
conservatives distancing themselves from the Bush administration because the
implementation of their political system ended up violating most of its
ostensible core values.

If you can't implement an ideology without discarding the very principles that
define the ideology, it's a broken ideology.

~~~
Semiapies
"It's no different from "real" conservatives distancing themselves from the
Bush administration because the implementation of their political system ended
up violating most of its ostensible core values."

To define American-style conservative and liberal philosophy as _what the
Republican and Democratic parties and their politicians do_ is ridiculously
simplistic. The parties are just coalitions of interests and voter
demographics, not philosophies - you might as well talk about what direction a
plate of spaghetti points at.

And as for conservatives' "political system" - _whose_? Do you mean
evangelicals and other theocrats, PNAC neocons, populist teabaggers, or Lew
Rockwell/Ron Paul fans? We're not talking remotely the degree of unity imposed
by the Politburo and Comintern upon the global communist movement when it
comes to either US party. In a bipolar system, both parties must be big tents
or be crushed by the bigger tent. Elected officials are not (and Bush was not)
advancing a single "conservative" or "liberal" ideology; they're trying to
please various power blocs who agree or disagree with them on different
issues.

Bush's failures highlight the failure of particular policies and ideas, but
they don't say much about the ideas of self-described "real conservatives" who
thought Bush was a ridiculously free-spending, anti-federal, protectionist,
power-consolidating, Big Government guy long before Iraq. (No more than
Obama's decisions indict the ideas of supporters who disagree with him on
those decisions, certainly.)

------
swombat
That is a completely ridiculous point to make. Come on, be serious here. What
would be the benefit to anyone except for spammers, if Google open-sourced
their algorithms? It's a lose-lose option. Google loses some money, we lose a
lot of search engine accuracy, and spammers gain new ways to screw around with
search results.

And does anyone gain from it? Not really. At this point, Google's supremacy is
due just as much to their hardware know-how, to their market leader position,
and to their excellent brand, as it is to their search algorithms. It won't
help any competing search engines - just spammers.

~~~
roc
The stated goal isn't to help other search engines; it's to get the community
to help _Google_ improve, for the good of all its users.

The theory goes: if you let more people look at the algorithm, they can
suggest better answers than Google alone, to thwart the spammers who are
gaming the system either way.

But I suspect the real goal has more to do with the deep suspicion that Google
can or does do a significant bit of hand-tweaking. And these people want to
force the algorithm into the open to either end or curtail that.

~~~
jacquesm
I'll believe that when I see open source achieve a conclusive victory against
email spam.

Spam:email = 90% or worse

In search results it is closer to the opposite.

I'm sure once the open source community actually manages to do this and prove
the 'many eyes' doctrine applies to more than just security and bug fixes that
many doors will open.

------
lacker
(I used to work in the Search Quality division at Google.)

Security by obscurity is a bad analogy here. The problem is that Google's
algorithms are not mathematically perfect, in the sense of cryptography.
Instead, they merely statistically tend to find good results. You can't find
bugs just by looking at source code. You have to statistically analyze the
effects of any proposed changes.

------
Mystalic
Google is a business. Their job, legally defined, is to create value (aka
profit) for its shareholders. A more open web has helped Google, but opening
up its search algorithms, a cornerstone to Google's growth, is just ludicrous
and bad business.

Compared to a lot of other companies (Apple, Microsoft, nearly every other
tech giant), they are an open company. It's already great that they realize
there's value in an open-source browser, an open-source OS, and multiple open
platforms.

~~~
tome
_Their job, legally defined, is to create value (aka profit) for its
shareholders._

This is a meme I see a lot, but never with any evidence. That the board has a
fiduciary duty to the shareholders I can understand. That it would have a
legal obligation to create value, well, I think that would be hard to define,
and I have never seen any evidence for the existence of this obligation.

~~~
mbreese
In the context of a publicly traded company this is the definition of
fiduciary. The company has sold stock to investors with the _sole_ purpose of
increasing the value of the stock. Therefore, it is up to the board to make
decisions that reflect this duty.

You can argue about the timeline (short versus long-term growth), but you
can't say that there isn't a legal obligation to _attempt_ to increase value.

~~~
roc
> "The company has sold stock to investors with the sole purpose of increasing
> the value of the stock."

I know they're not popular, or exciting like a stock value swing, but
dividends still exist, right?

~~~
mbreese
You're right... I should have said something like "increasing shareholder
value through the increase in value of the stock or profit distribution via
dividends".

I actually think that more companies should give dividends.

I can see why companies like to have billion dollar war chests, but at some
point you have to wonder when enough is enough. I mean, even Microsoft was
forced to start paying dividends. Then again, their stock price has been
pretty flat for the past few years, so that might be the only reason to hold
it.

------
rpdillon
This post manages to completely miss the point of the Google letter on
openness.

As some have incorrectly asserted, Google's latest post is _not_ about
"everything open", it is an _internal_ memo to product managers that it
occurred to them would be fit for public consumption.

The central thesis of the letter is that "open" is hard to define (and defined
differently by different people) and yet crucially important. Because it is
hard to define, many people misuse it (OpenDNS comes to mind), and no one can
call them on it because it is a vague term.

Internally, Google employees argue about what open means, and the purpose of
this letter was to clarify what open means to Google. They lay it out in
extremely clear terms, and it is well structured.

This response essay by Dixon misses all that completely - he unilaterally
defines "open" to mean "open source", and then asks for Google to give away
their software, claiming that if they don't, they're hypocrites.

Going back to Google's definition of "open" (which is complex), one
significant aspect they emphasize is the notion of "lock-in", i.e. locking in
users, and locking out competitors. This is one area where Google puts their
money where their mouth is; they let users leave. You can get at your email,
export your Google Reader content, download all your Google Docs, archive your
Picasa pictures, and export all your contacts. That means if a competitor
comes along with something better, you can start using it. Try that with
Facebook.

Dixon acting like that is worthless because it isn't the "open-source" form of
"open" is patently absurd. If this is confusing, I highly recommend reading
Simon Phipps's essays on open source to understand what kind of software makes a
good candidate for a business to open source (operating systems, browsers,
compilers...the basics that we all use to compete) and what does not
(grid/cloud computing algorithms, specialized software, and other types of
software that are differentiators). Essentially, the business model is to pool
our collective resources in areas in which we don't want to compete (web
server software, operating systems) and to spend time innovating in areas we
do wish to compete (search algorithms, voice recognition, image analysis,
etc.) Over time, as those specialized areas become more mainstream, they
become better candidates for open-sourcing.

------
mustpax
The analogy to "security through obscurity" is flawed. Open security works
when there's an obvious way a secure system should behave, say, make it
infeasible to open a lock/decrypt data without the correct key.

In Google’s case, the correct relevancy ordering of the web is a huge point of
contention. So they cannot simply open it up hoping to close out all the holes
in the open. It is also hard to distinguish acceptable behavior (linking) from
unacceptable behavior (selling links) except in the aggregate through
heuristics that assume ignorance on the part of the attacker.

In this sense, Google’s system is a lot closer to an intrusion detection
system than a lock or a cipher suite. You want to establish some ground rules
but there’s no added security in declaring what you’re monitoring on your
servers and how.

------
jlees
An interesting thing to think about even though the basic reaction is "wait,
no, that's their industry secret/revenue basis/whatever". Google is _so_
dominant right now that maybe it's going to get to the point where, in the
interests of the open web, they get pressured and lobbied to release their
secret sauce; after all, as swombat points out, it's not just the algorithm by
a long chalk.

It's especially interesting if you consider the origins of PageRank in
academia, freely published and peer-reviewed, and still not that well
understood by laypeople; would the open-sourced magic of Google even be
understandable by that many folk? Those that do understand it could make a
fortune as consultants; at the moment, shady SEO snake-oil salesmen are so
prevalent and 99% of their advice is outdated or plain wrong, something that
definitively calls them out on it could be helpful.

------
btilly
I upvoted the story not because I agree with it, but because it explains a
very successful open sourcing strategy that I wish more people understood.

In general people are willing to spend a certain amount on a _solution_.
Therefore, if you can make the _complement_ to what you make money on
cheaper, that increases how much you can charge for your piece of the
solution. Open sourcing is a way to make those pieces cheaper.

Google does this. They give away tools and knowledge about how to make
websites. They give away analytics and an A/B testing platform and educate
people about how that makes it easier to produce good websites. This makes
building good websites cheaper, and therefore increases how much people are
willing to spend to _advertise_ those websites.

This isn't a new strategy. In the late 90s when Oracle and IBM got into open
source, this was their strategy. Oracle wanted to get people to stop paying
for Sun hardware so Oracle could negotiate higher prices. (I know more than a
few people who were surprised to find that licenses for Oracle on Linux are
higher than they are on Sun.) IBM was looking to reduce the price for
complements to IBM WebSphere, and was looking for opportunities to avoid
having customers pay the Microsoft tax.

If you run a company and wish to strategically open source something, you
should think in the same terms. What is the complement to what you are
charging for, and how can you make that cheaper?

------
amix
I would love if they open-sourced their infrastructure tools such as GFS,
BigTable and Map-Reduce tools. Most of these tools are built using open-source
technology such as Linux.

~~~
stanleydrew
Apache's Hadoop (<http://hadoop.apache.org/>) has open-sourced implementations
of those tools.

~~~
iamelgringo
Yahoo was the company that open-sourced Hadoop, not Google.

~~~
stanleydrew
Never said it was Google, just pointing out that open-source implementations
exist.

------
oujheush
I am disturbed by the lack of faith in open source among the readers here. I
mean, I understand some skepticism but this, to me, is pretty extreme
IP-fetishism.

It's perfectly understandable. Lots of money is made in the existing paradigm
and so many of you have incentive to believe it's the most rational system.
Besides, hard to imagine Google's doing something wrong, right?

But the argument that "the only people this will help is spammers"? Really?
Are the only people helped by open cryptographic protocols those who look to
decipher traffic?

"They have every right to keep it secret and profit off of it." Let's allow
this for a moment. We've still got a false dichotomy. Making it public will
not deprive them of their profit. First, incredible marketing coup. Does
anyone else remember how much of Google's initial strength came from its rabid
fans? This would do a lot to re-energize that base.

Second, are there going to be new startups which can effectively compete with
Google, even given its algorithm? How are they going to provide a better
experience, or serve the volume of traffic, or index a larger number of pages
with the same algorithm with fewer resources?

Third, who says they have to authorize competitors to use it? This is I think
one of the more interesting points. They could release the algorithm but
license it only for non-commercial use for anyone not themselves. Ridiculous?
Because Microsoft would violate such a deal anyhow? Because the algorithm
could simply be reimplemented? Perhaps. But it would be another roadblock.

Fourth, why do we continue to believe that Google's strength comes from
proprietary code? Why do we not recognize that its strength comes from
mindshare, user experience, and quality of execution? None of which would be
negatively affected by releasing the source.

Perhaps Google would face heightened competition as a result of releasing its
algorithm. Perhaps this would outweigh the benefits of doing so. But there
would be significant good which would result, including public feedback which
would make it significantly harder for spammers to be successful.

But I'm just a foaming-at-the-mouth open source radical. And I'm sure history
is on your collective side. If there's anything we've learnt from a hundred
years or so of Computer Science (overestimate if you only count actual
programming, underestimate if you allow Babbage as you should), it's surely
that secrecy is what drives innovation, right?

------
selven
And what is wrong with commoditizing the complement? Google's working hard to
make operating systems free and we'll probably see Microsoft working hard to
open up search engines. End result: both operating systems and search engines
will be open and the consumer benefits.

------
vicaya
I'm an open source advocate (even for search algorithms in general), but
ranking algo/model is inherently adversarial and gameable until true search AI
(always correctly identify good and original content without false positives)
is feasible.

I think it would be a good revenue strategy, if they charge $dollars to
display top 10 positive features and top 10 negative features of any given
query-document pair without any real weights/scores (maybe re-squashed with
some logistic function). The $dollars amount is computed dynamically to offset
the negative impact of the partial reveal of the ranking algo/model. This can
be a tremendously useful tool for troubleshooting search results by site
owners.
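
A rough sketch of what such a partial reveal could look like, purely as an illustration. Everything here is an assumption: the feature names and contribution values are made up, `partial_reveal` and `squash` are hypothetical helpers, and the dynamic pricing is not modelled at all. It only shows the "top positive and top negative features, scores re-squashed with a logistic function" part.

```python
import math


def squash(x):
    """Logistic function: maps any real-valued contribution into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def partial_reveal(contributions, k=10):
    """Return the k most positive and k most negative features.

    `contributions` maps a feature name to its (hypothetical) signed
    contribution for one query-document pair. Only squashed, rounded
    scores are returned, never the raw values.
    """
    ordered = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    positive = [(name, round(squash(v), 2)) for name, v in ordered if v > 0][:k]
    negative = [(name, round(squash(v), 2)) for name, v in reversed(ordered) if v < 0][:k]
    return positive, negative


# Made-up contributions for a single query-document pair.
example = {
    "title_match": 2.1,
    "inbound_links": 0.7,
    "page_age": 0.0,
    "hidden_text": -0.4,
    "keyword_stuffing": -1.8,
}
top_pos, top_neg = partial_reveal(example, k=2)
```

Note that a logistic squash is monotonic, so it preserves the ordering of features; rounding is the only thing here that actually hides information, and a real system would presumably need something stronger.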

------
russell
A more reasonable proposal would be for someone to pull a netflix; $1 million
or even a $10 million prize for a significant advance in search relevancy. $10
million would be cheap for MS, Yahoo, ASK, a VC, or even DARPA.

------
stuff4ben
I don't get it, is the author talking about PageRank?
<http://en.wikipedia.org/wiki/PageRank> What more does he want?

~~~
minsight
The algorithm used by Google today is light-years away from that described in
the PageRank paper. It uses a form of PageRank, but it also uses many other
parameters.

~~~
spf
If it's secret, then how can you be so sure?

My impression is that PageRank, while initially extremely successful at
filtering signal-from-noise, is very susceptible to manipulation. For example,
link farms make unimportant pages seem important. As a result of such
manipulation, Google started modifying their crawling, in order to reduce the
amount of garbage going into PageRank (less garbage in, less garbage out). For
example, they might look at the age and ownership in DNS records, and exclude
domains that look fishy based on some heuristics. The spammers fought back
with more tricks, and the resulting arms race has come to have an acronym:
SEO. But the underlying idea of using an iterative approximation of the
likelihood of encountering a page via a random walk of the web is preserved.
(Some details here:
<http://www.miislita.com/information-retrieval-tutorial/matrix-tutorial-3-eigenvalues-eigenvectors.html>)

If that's right (and I could well be wrong... would like to be convinced...),
then I don't see it as a "light-years away" kind of issue, but rather a series
of hacks to stay one step ahead of the spammers. I don't see any particular
value in open sourcing those.
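
For anyone who wants to see the "iterative approximation of a random walk" idea concretely, here is a minimal sketch of textbook PageRank by power iteration. It assumes the usual damping factor of 0.85 and a toy four-page web; the `pagerank` function and `toy_web` graph are illustrative names, and this says nothing about what Google actually runs today.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Pages with no outlinks hand their rank to every page equally.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Random-jump term plus the redistributed dangling rank.
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page splits its damped rank evenly among its outlinks.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy web: "c" is linked to by everyone else, "d" by no one.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
```

On this toy graph "c" ends up with the highest rank and "d" with the lowest, which is also why link farms work against the plain algorithm: adding more pages like "d" that point at a target raises the target's rank.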

------
xcombinator
Well, I can't understand this article; there is no "secret sauce". The Google
PageRank algorithm is open for all to see and understand. I myself made my own
simple implementation.

Everybody could do it. This man is not asking for code; he is asking for DATA
only Google has. It's like an atheist telling a Catholic what he must do
because he believes. When this man gives something of value to the world, like
Google did (I remember the Altavista-Ad-flickering in your eyes days), then he
can demand that others give something too.

~~~
eli
The documented PageRank algorithm doesn't have very much to do with how
results are actually returned from Google today.

------
ez77
As a matter of fact, I think Google represents the ideal business model: one
which should work even in a patents-free, copyright-free world.

~~~
ez77
Let me elaborate. Not even Stallman would call for this... What kind of
totalitarian regime is desired, where businesses (even individuals?) are
forced to publish all of their code? Wouldn't it be good enough to do away
with patents and copyright? As long as I manage to keep my stuff private, let
me (and Google) face the advantages and disadvantages.

Some balance in the force, for Yoda's sake.

------
mlLK
<http://www.google.com/support/webmasters/> easily brings down this bogus
claim. I honestly don't think much would change even if they did open-source
their PageRank algorithm because the people willing and able enough to exploit
it already do, and regardless of how much their PageRank increases, Google will
send their domain deep into an internet abyss and out of its index.

------
allenp
I thought open source was about software, not about logic (business or
otherwise) - am I missing something here?

------
buster
Opening this algorithm would lead to A LOT of new spammers and SEO-crap.

Google, please keep it a secret, PLEASE!

------
bbsabelli
tl;dr? Author confuses source code with revenue. Welcome to the less-than-free
world.

------
eli
Yeah, and where's my pony?

