Google's PageRank patent has expired

bborud · on June 1, 2019

Myth: PageRank was the secret to Google's success

It wasn't really that simple. For a brief while, perhaps it made a difference, but within perhaps a span of 6 months, every decent search engine implemented page rank in one form or another. It is a cute story for the muggles to focus on. In reality search was already then about balancing a large number of signals into a decent ranking formula. It was much, much harder than just applying some magic algorithm and I think the people who built Google search back in those days deserve a lot more credit. But that wasn't really a sexy story, I guess.

To a much greater degree than any algorithm or formula: Google's ability to execute, and to do so in cultural sympathy with the web, was more important. Much has been said about Google's Not Invented Here, but this made all the difference in the early days: you had to get things to where you could iterate and innovate fast.

And I say that admiringly as someone who worked for one of their competitors at the time. I used to be jealous of Google because they were managed by people who were part of the Internet. Our management was alien to the Internet (and our business model was to power search for portals mainly run by horrible, stupid people in suits).

Google was the only search engine that was properly in tune with its audience: focusing on the user.

(Disclosure: I worked for FAST, then Yahoo, then Google until I quit Google in 2009)

retreatguru · on June 1, 2019

What is the best way to learn how to execute at a very high level as Google does?

We are a 25 person tech startup and while our culture is amazing we struggle with execution. This is an ongoing struggle for us; it’s so easy for a young startup to misunderstand the importance of execution.

Is reading books enough, if so, which? Should we hire a COO from a company with a history of excellent execution (how to tell?). Are there courses to take? Or is it just about prioritizing excellent execution with continuous learning?

Some resources that have helped so far: Scaling Up (book) and First Round Review blog.

revvx · on June 1, 2019

IME the thing that matters the most is focus, both on the micro and macro level. It's way too easy to get caught up in things that won't make an iota of difference in your future. Focus is the first thing that goes out the window as soon as a company reaches just a bit of momentum.

I've seen several startups suffering from not being able to decide what they were. Engineering teams fractured because the company wanted half of them working on their bottom line and the other half working in some offshoot product.

Even gigantic companies need that: notice how people criticize Google for creating and killing way too many products, and at the same time praising their minimalist webpage (since 1998), early GMail, etc. Same with Apple when Jobs returned to Apple and streamlined their product line, etc.

mikekchar · on June 1, 2019

I agree with this but want to add prioritisation is key. As per Fred Brooks, there is no silver bullet. What he meant by that is that you can't get an order of magnitude more work done using a tool or process because in programming, 25% of your time is spent doing analysis and there is no way to reduce that (To me, that's only if you are executing well!). This generalises to a lot of other pursuits.

Prioritisation is about deciding what not to do. Forget the BS story about putting rocks and sand in a jar, where the secret is to put the big rocks in first so that everything fits. That's not how you need to do prioritisation, because order does not appreciably change the amount of time something takes. The secret is to put only the big rocks in. Period. You go 6 times faster because you have 6 things you could do, but you only do one of them.

Now the real kicker is that the only way to determine which one thing to do is to do analysis on all 6. An HN post is not really a reasonable way to describe this. However, consider a requirements discovery to be like a tree. Requirements are discovered at a particular rate, as you work on something. Discovery of one requirement leads to discovery of new requirements. It's a feedback loop. Pruning the tree as early as you can leads to significant gains later on. So while you can't actually get the 6x development time by avoiding 5/6ths of the requirements, you can pretty easily get 2-3x gains.

BTW, for anyone interested in a more rigorous approach, consider taking something like Littlewood's model of defect discovery and assuming that requirements discovery has a similar curve. Littlewood's model is very naive, but I've found that it still has a lot of value. Again, sorry for cryptic hints here, but I don't have time to write a book on it (which it would certainly take...)

retreatguru · on June 2, 2019

The parent comment on focus and this comment on extreme prioritization are so helpful.

Recently we are focused on just one objective and key result - it took years to get that far - we used to have so many. But even now with just one objective we still picked 25 initiatives to attempt to reach our goal. In retrospect it was an obvious fail because we only executed on a few well. We did some initial analysis but considering your comment I think we could have done much more analysis and cut much deeper and picked just a few or even just one. This is radical thinking!

Thank you for getting deeper into this. Do you have any other hints on where I could learn more about the approach you are describing?

baccheion · on June 1, 2019

Google's success was due to marketing and the opportunity to do so at massive scale (wide reach of and frequent media coverage). Even PageRank's mention was marketing, though the world at large had no idea what it meant.

Most latched on to Google due to the marketing surrounding their IPO. Many heard and understood: money, billions, billionaire, slides and bicycles, etc.

After, continued marketing kept it all going. Powerful illusion. Google is the embodiment of candy coated BS. Even now, they mainly continue due to continually marketing themselves as greatness..

..and paying for Chrome, Android, and first placement on iPhones.

They even had internal studies showing that while they are pervasive (most use at least one of their products), there isn't any stickiness. That is, if some other search engine had first placement on iPhones, the majority would be using that one. It's like the site that previously appeared when searching for a definition (dictionary.com?): while many used it and did so frequently, they often didn't even realize where they were.

The world (and internet, even) at large is very different from the handful who think themselves aligned with the masses (ie, source of revenue). It's even funnier, as most thinking anyone cared about PageRank don't buy anything. No spending = you don't exist. Anything else is a coincidental nod, stupidity, or coincidence.

stanfordkid · on June 1, 2019

Disagree. Google just had way better results than every other search engine. Secondly, it had a much more streamlined and simple interface (they had a simple search box with a "Search" Button -- no page directory, no email, no banner ads -- during that time period search engines had giant directories of pages by category when you first opened them up). People would often refer novice users to Google because there were fewer points of confusion. By the time the company IPO'd they were already on an exponential tear.

baccheion · on June 1, 2019

I agree they were organic'ish (ie, word of mouth and ZDTV) in the beginning (1998-2000 or so). As for search quality, you're seeing their growth from the point of view of a higher-than-norm-IQ-intuitor-type-rational rather than from the median/typical/pervasive point of view. The majority of internet/Google users aren't the "early adopter" types. They mainly use Google because that's what's there on their phone.

squarefoot · on June 1, 2019

Look, I'm certainly not a Google fan as of today, but you are so wrong, probably for being very very young or very very misinformed; probably both. I tried Google for the first time when it was a university project that didn't even have its own domain (that is, it was a Stanford subdomain), and I was literally blown away by the huge difference compared to all other search engines. At that time my favorite was Altavista, though back then doing multiple searches on different search engines was normal as all of them had their very different crawlers and algorithms, so I usually went at least also through Yahoo and Lycos after Altavista. But when Google came out it brought a huge improvement in search responses, and I mean orders of magnitude faster; nobody had done anything even comparable to that before, and soon it became clear that all of us would end up using just one search engine - guess which one. They developed that from scratch with no funding at all, and of course they got money after that, but it came because of the great product they had developed, not the other way around.

Google started as the project any hacker would dream to be part of, even for free. What it became after all that money changed it is a different story.

baccheion · on June 1, 2019

Are you reading the same thing I'm writing? Is that why you think Google is used by the masses, rather than marketing and control of first placement (on Chrome, Android, and iPhones)?

Most people just use what's there and that happened to be Google's search engine. It's especially the case after massive growth of internet users due to mobile (ie, there were only ~300 million internet users in 1998 versus the billions online today).

kinkrtyavimoodh · on June 1, 2019

Have you seen Google's growth chart? It was meteoric. All these things you mention came in or after 2007. Google was already a behemoth by the early 2000s and that was because it was just way better than anything else.

baccheion · on June 1, 2019

All the mass of marketing (ie, going mainstream) happened at around their IPO. 2003'ish.

Also, their growth after 2004 was linear, not meteoric/exponential, and they are now in decline: https://trends.google.com/trends/explore?date=all&geo=US&q=%....

bborud · on June 9, 2019

Do you know the parable of the blind men and the elephant?

If you are going to use a metric you have to show that it is relevant. It also helps if one agrees on what we are measuring.

remote_phone · on June 1, 2019

Yeah, this is completely wrong. I watched Google grow up having worked in the area and they didn’t have any marketing at the beginning. They first became the best search engine many years before they started adding ads and well before their IPO.

Twirrim · on June 1, 2019

They did it by focusing on an absolutely minimal experience.

The main search engines of the time, like Altavista, gave you a busy, cluttered, experience as they pursued monetisation.

http://web.archive.org/web/20000308224033/http://www.altavis...

Google gave you a search box: http://web.archive.org/web/20000815052943/http://www.google.... Over time they stripped even that down.

Just as significantly, the search results were also uncluttered, and "good enough".

This approach was absolutely critical when they started out, when people would most often be using 56k modems, where every byte had a real impact on the end user experience.

baccheion · on June 1, 2019

Yes, every byte mattered. I was there. Only "early adopter" types were "in it for the speed." Everyone else used what was there. Everyone that was online, as most weren't.

They got larger maybe because they also kept going (didn't have a choice; tried to sell early on for pennies, though they spin the story to distract from what happened). The other engines were bought or sold out.

The death of many companies during the bust also made room for them and they were likely the face of a group effort to "keep it all moving." Facebook was a similar face.

It's funny how knowing makes it seem you don't know. You sound naive and brainwashed. But all it does is once again show how powerful an illusion can really be. I'd (and did) say similar things if I didn't dig deeper.

baccheion · on June 1, 2019

Being the best search engine was irrelevant to their rise. It's just a thing that happened to also (supposedly) be there.

Supposedly, as its ranking algorithm was so heavily gamed by 2007 (already risen; popularity incentivized effort to game) it was a complete joke. Powerful illusion again, as they just covered it up and moved on. Also, internal tests from around 2009-2010 showed Bing was seen by users as producing better-quality results.

Not only did they not have much marketing at the beginning, they also didn't have many users.

bborud · on June 9, 2019

No, you are entirely wrong. There's no part of your "analysis" that has any relation to reality. I don't understand why you keep insisting.

(I worked for two of Google's competitors in the years where they grew from a student project to a huge business. I had the opportunity to take a peek at the code of two other competitors. Many of my former colleagues helped build Bing. I also worked at Google for a few years. I was there, so I would know something about it.)

remote_phone · on June 3, 2019

You literally have no idea what you’re talking about and spreading lies. Please stop this.

bborud · on June 2, 2019

No, it had nothing to do with marketing. It had everything to do with focus on building the best search experience.

I don't understand why you are raving on about Chrome and iPhones. The iPhone was almost a decade away when Google started getting traction, and building a browser wasn't even at the idea stage.

utopcell · on June 1, 2019

I did a single search on Google back in the '90s and never left. The quality gap was enormous, and it has stayed like this ever since. What world do you live in ?

dang · on June 2, 2019

> What world do you live in

Please edit such acerbic swipes out of your comments here. They break the site guidelines and lower the signal/noise ratio.

https://news.ycombinator.com/newsguidelines.html

baccheion · on June 1, 2019

You're lumping in your reason for using their search with the reason it's used by the majority.

utopcell · on June 1, 2019

..as is evident by the responses to your comment ?

baccheion · on June 1, 2019

It is naive to think the belief in this skewed, biased, self-referencing, curated group is representative.

The general internet user is different, much less concerned with the underlying tech or anything else, and about other things. The general internet user thinks Facebook is the internet, doesn't realize they are online, and only uses online services to text and take/post pictures.

Android phones have a camera icon. Ever thought about the camera app associated with it? Not really if an Android user. You just use it or use it as a backup. If it were another app, you'd be using that one. Wouldn't notice.

Most people don't spend time digging, unless it's something they are really into. Most also don't read reviews or research, though that's been changing over the years. They just go with whatever, unless important (to them).

askafriend · on June 1, 2019

Nope. Wrong.

utopcell · on June 1, 2019

Not sure what you think you understood while at Yahoo, but Yahoo never used PageRank. Inlink-based signals, obviously, but never PageRank. Also, to state the obvious: Google's own execution of PageRank is orders of magnitude more complicated than what's been published. I fully agree with the greater point you are making however. Google is still at heart an engineering excellence company.

bborud · on June 2, 2019

Yahoo had three search engines when I was there. Inktomi, FAST and Altavista. I came to Yahoo through the acquisition of FAST's web search business, and we had a variant of page rank which was developed some time in the summer/fall of 1999. I shared an office with the guy who wrote it. After the initial implementation it went through lots of evolution. As did everything in search.

What gets people confused is that they tend to think this was the only mechanism in use and that it was a solution that never evolved. Things also get confused by the fact that not everybody knew everything. (For instance, hardly anything in the "official" origin story of FAST is true, and you'll get conflicting stories depending on who you ask).

I don't think the Altavista engine was ever used. I think the people from Altavista ended up on Panama, Vespa and possibly some on the Inktomi-based engine. The FAST and Inktomi engines were both in use for a short while for web search, and then the effort was split so the FAST engine was used in what became Vespa (where pagerank isn't as good a mechanism as for the web). Vespa grew out of work at FAST that started around 2002-2003 to separate out the more infrastructural bits of the search engine into more reusable infrastructure components.

Eventually Inktomi was used for web search at Yahoo. Simply because of geography (well, politics). Since Inktomi didn't really have anywhere to go that pretty much sealed the fate of Yahoo as maker of a web search. You might be thinking of Inktomi.

throwawaymath · on June 1, 2019

If you literally mean Yahoo never used the exact specification of the PageRank algorithm, that's probably correct. But if you mean PageRank conceptually, which includes any general search algorithm based on a discrete time Markov process, that's incorrect.

In 2004 Yahoo acquired several smaller companies which were working on algorithmic search and page linking around the same time that Google was. They released their own ranking algorithm called WebRank which was substantially based on the methods of PageRank.

utopcell · on June 1, 2019

Yahoo's core search engine was based on Inktomi's, which was acquired in 2003. There was no PageRank in there, and there was no infrastructure to execute anything similar either. (I worked for a significant amount of time on link-level features during Marissa's rebirth of Yahoo Search.) PageRank is an algorithm that is trivial to understand and prototype, but hard to scale efficiently to 100's of billions of pages.
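
To illustrate the "trivial to understand and prototype" half of that claim, here is a minimal sketch of the textbook power-iteration formulation of PageRank on a toy graph. It is not Google's, Yahoo's or FAST's implementation. The function name `pagerank`, the damping factor of 0.85, the fixed iteration count and the handling of pages without outlinks are all assumptions made for this example, and nothing in it addresses the hard part, which is scaling to billions of pages.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy PageRank by power iteration.

    links: dict mapping a page to the list of pages it links to.
    Returns a dict mapping every page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outlinks spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    toy_web = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
    for page, score in sorted(pagerank(toy_web).items()):
        print(page, round(score, 4))
```

On a graph this small the whole thing fits in a dictionary; the scaling problem described above starts when neither the link graph nor the rank vector fits on one machine.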

bborud · on June 2, 2019

That's largely true. But not quite, as Yahoo also acquired the FAST engine. However, since the team developing this were located in Trondheim, Norway, the FAST-based web search was eventually discontinued (and morphed into the VESPA project).

FAST's web search used Page Rank from late 1999.

utopcell · on June 2, 2019

Fascinating! This explains why I never saw anything like PageRank in YST: I presume it was present in what became Vespa (which, to be fair, probably didn't scale to YST's corpora sizes.) Pity we can't continue this conversation offline..

bborud · on June 9, 2019

Of course we can continue the conversation offline. I'm not hard to find :-).

Yeah, VESPA wouldn't have scaled back then, but the search engine we used was far more scalable than Inktomi since it was the same search engine we used for web search. We did hold the record for largest index for a while (to distract people from the fact that our ranking was lagging behind Google's :-)).

But the search engine itself wasn't really the point for VESPA. Also page rank wasn't really as relevant for the use-cases VESPA was for. In fact, ranking in small, special corpora is very different from ranking in web search. And in the case of small document corpora: surprisingly hard, so one depended on tools to specialize both search, ranking and result processing.

I wrote the first implementation of the VESPA QRS with a couple of other guys, which I think was the second component in VESPA (if you count the fsearch/fdispatch as the first). I think this was the first step towards making easily customizable search. The big initial barrier was to convince people Java would be fast enough for this. (I was prepared for a 30% loss of performance in exchange for ease of extension. What we got was a 200% performance boost over the C++ implementation before even starting to optimize. But it was a bit of work to make it play nice with GC in Java and I remember David Jeske at Google refusing to believe me when I outlined how we'd done it :-))

An interesting question is what would have happened if Yahoo had chosen FAST web search instead of Inktomi. According to Jeff Dean, our search engine was the only competitor he was worried about (mentioned over lunch in May 2005 after I accepted a position at Google). Possibly because he didn't understand why it performed well. We made some fundamentally different design bets than Google (they bet RAM would become cheap fast, we bet that it wouldn't. They were right).

Inktomi was a technological dead end. That was a stupid choice by Yahoo top management based solely on geography and reflecting the ineptitude of top management when it came to technology.

To be quite frank, I think Yahoo would have flubbed web search either way. The only reason VESPA managed to survive at all was because it was being developed in Trondheim, Norway - far away from Sunnyvale, where we could get away with ... well, bullshitting leaders and pretending to obey them while doing our own thing. Not that we weren't in deep doo-doo initially (we were in over our heads), but we had some really great people that were able to orchestrate the mess that was VESPA into something that worked, and then something that worked well.

Without mentioning any names, Yahoo had a problem with technologically inept leaders as well as too many useless middle managers. At the time just before we were acquired by Yahoo, it was quite clear that separating out important bits of the search engine into infrastructure components was key. Google had understood this early and done a few very important things (GFS, Protobuffers, MapReduce, Borg etc).

The funny thing was: our first two versions of our search engine (in 1998 and 1999) essentially used MR for crawling and processing, but we did so with shell scripts and duct tape (it was a mess). Anything that could be turned into "sort and scan", as we thought of it, could be done fast. Including page rank and deduplication - and deduplication was a much, much, much harder problem than page rank.

And when I say shell scripts and duct tape: we used unix sort, pipes, shell scripts and various small programs to do "mapping" and "reduction" :-) (strictly speaking, we used our own version of UNIX sort to have the same sorting order on all platforms, but essentially unix sort). Management were only focused on short term sales to portals, so we just reimplemented the same primitives over and over and over in every piece of technology we made. Wasting a ton of time and effort.
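
For readers who have not seen the pattern, the "sort and scan" idea can be sketched in a few lines of Python. This is only an illustration of the general shape (map each record to key/value pairs, sort by key, then scan runs of equal keys), not a reconstruction of the FAST pipeline. The names `sort_and_scan`, `inlink_counts` and `exact_duplicates` are made up for this example, and the dedup step only catches byte-identical text, which is the easy case and not the hard problem mentioned above.

```python
import hashlib
from itertools import groupby
from operator import itemgetter


def sort_and_scan(records, mapper, reducer):
    """Map records to (key, value) pairs, sort by key, scan each run of equal keys."""
    mapped = [pair for record in records for pair in mapper(record)]
    mapped.sort(key=itemgetter(0))  # the "sort" step (unix sort in the original setup)
    return [
        reducer(key, [value for _, value in group])  # the "scan" step
        for key, group in groupby(mapped, key=itemgetter(0))
    ]


def inlink_counts(edges):
    """edges: iterable of (source, target). Returns [(target, inlink_count), ...]."""
    return sort_and_scan(
        edges,
        mapper=lambda edge: [(edge[1], 1)],
        reducer=lambda target, ones: (target, sum(ones)),
    )


def exact_duplicates(docs):
    """docs: iterable of (url, text). Returns one canonical URL per distinct text."""
    return sort_and_scan(
        docs,
        mapper=lambda doc: [(hashlib.sha1(doc[1].encode("utf-8")).hexdigest(), doc[0])],
        reducer=lambda digest, urls: min(urls),
    )


if __name__ == "__main__":
    print(inlink_counts([("a", "c"), ("b", "c"), ("a", "b")]))
    print(exact_duplicates([("u1", "hello"), ("u2", "hello"), ("u3", "world")]))
```

Because the sort brings equal keys next to each other, the scan only ever needs to hold one group in memory, which is why this shape works with plain pipes and an external sort.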

I was working on a storage system at the time that was sort of a combination of GFS, MR and Borg (the design came from before the papers about GFS etc were published). The idea was to have a distributed storage on which you could execute code in a sandboxed environment on each node. Meaning that you send the code to where the data lives and process it locally in a parallel manner and stream output to other nodes in the system. After certain executives felt a need to get involved and dictate technology choices I figured the project was doomed and abandoned it. (It was, for a while, known as "the storage system that can't store stuff").

Today I think that my approach would have been too complex to be sufficiently easy to develop. There were certain things about GFS I really didn't like (too trusting of clients), but slicing the problem into distinct domains was the right thing to do. Also, Google had Chubby and we didn't.

enriquto · on June 1, 2019

Every now and then I remember that you can actually patent algorithms. Terror ensues, and a general feeling of frustration and helplessness. Then I try to forget this fact slowly, just to be able to continue living in this crazy world.

As a mathematician, I see algorithms as examples of theorems, and the idea to "patent" a theorem, a mathematical truth, is so foreign!

pron · on June 1, 2019

> I see algorithms as examples of theorems, and the idea to "patent" a theorem, a mathematical truth, is so foreign!

I'm not defending the patenting of algorithms [1], but what is protected by algorithm patents is not their mathematical truth -- quite the opposite, in fact, as I'll explain -- just as what is protected by mechanical patents is not some physical truth.

You are free to publish a patented algorithm (provided you don't copy your text verbatim from a copyrighted source), teach it, study it, etc. to spread that truth and expand it. What you're not allowed to do without permission from the patent owner is to implement it and run it on a computer; i.e. what is protected is not a truth but a certain human action. This is the same for mechanical inventions, which could be equally said to be "physical truths": a mechanism built in this way would, according to the laws of physics, behave in that particular way etc. Similarly, you are allowed to publish and study that physical truth -- what you're not allowed to do is to build it.

Again, I'm not saying whether this is right or wrong, only pointing out that it is not truth that's protected by patents, but application. In fact, one of the original motivations for patent protection is precisely to encourage people to not keep discovered truths secret by promising them that profitable applications would be reserved to them for some period of time. So patents were designed to help spread truth in exchange for protecting applications. That this is what patents are intended to do is a fact.

It's fine to object to patents -- there are good arguments both in favor and against -- but completely misunderstanding what patents are and what it is that they protect is not one of them.

[1]: I'm not in principle against that, either, except that in practice few patented algorithms rise to the level of inventiveness that patents are intended to protect.

vanderZwan · on June 1, 2019

> This is the same for mechanical inventions

And just like with software patents, people come up with weird workarounds:

> Sun-and-planet motion. The spur-gear to the right, called the planet-gear, is tied to the center of the other, or sun-gear, by an arm which preserves a constant distance between their centers. This was used as a substitute for the crank in a steam engine by James Watt, after the use of the crank had been patented by another party. Each revolution of the planet-gear, which is rigidly attached to the connecting-rod, gives two to the sun-gear, which is keyed to the fly-wheel shaft.

http://507movements.com/mm_039.html

If you look at the animation at the link, you can see it's just a crank with two extra gears attached

im3w1l · on June 1, 2019

> Note that the axle of the planet gear is tied to the axle of the sun gear by a link that freely rotates around the axis of the sun gear and keeps the planet gear engaged with the sun gear but does not contribute to the drive torque. This link appears, at first sight, to be similar to a crank but the drive is not transmitted through it. Thus, it did not contravene the crank patent.

Pretty interesting stuff. Although visually similar it is actually a different principle. And we can readily tell that the arm doesn't contribute to the drive as the sun is rotating faster than the arm would drive it.
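
The two-to-one ratio quoted above falls out of the standard relative-velocity relation for two externally meshed gears on a rotating arm. The sketch below is an idealized check, assuming the planet gear does not rotate at all because it is fixed to the connecting rod (in a real engine the rod rocks slightly, with no net rotation per cycle). The function name and parameters are made up for this example.

```python
def sun_turns_per_arm_turn(planet_teeth, sun_teeth, planet_turns_per_arm_turn=0.0):
    """Turns of the sun gear for each turn of the arm joining the two gear centers.

    Seen from the rotating arm, two externally meshed gears turn in opposite
    directions with speeds in inverse proportion to their tooth counts:
        (w_sun - w_arm) = -(planet_teeth / sun_teeth) * (w_planet - w_arm)
    Setting w_arm = 1 and solving for w_sun gives the expression below.
    """
    ratio = planet_teeth / sun_teeth
    return 1.0 + ratio * (1.0 - planet_turns_per_arm_turn)


if __name__ == "__main__":
    # Planet held from rotating, equal tooth counts: the sun-and-planet case.
    print(sun_turns_per_arm_turn(30, 30))
    # Planet turning rigidly with the arm: behaves like a plain crank.
    print(sun_turns_per_arm_turn(30, 30, planet_turns_per_arm_turn=1.0))
```

With equal tooth counts and a non-rotating planet the sun turns twice per revolution of the arm, while a planet locked to the arm reduces the mechanism to an ordinary one-to-one crank, which is the distinction the quoted explanation draws.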

kauffj · on June 1, 2019

This seems like a distinction without a difference.

Being granted exclusive access to put a fact to productive use (or any use) seems roughly equivalent to "owning the truth" to me.

pron · on June 1, 2019

But it is not equivalent. The fear was that without patents people won't be able to teach and spread truths because they wouldn't know them. Patents exchange the ability to study and spread truths for protecting their profitable applications. You may be against that compromise because you think it does not achieve its goals or object to such compromises on principle, but it is a very real compromise between very real alternatives -- at least alternatives envisioned by the creators of the patent systems. Denying the alternatives that people have actually been choosing between for centuries [1], and so ignoring the distinction between keeping some truth as a trade secret and making it public but obtaining a time-limited protection on applications -- both are "owning the truth" (if that's what you want to call it) but in different ways -- is completely missing the entire issue, namely which form of ownership is preferable to the other.

One could argue about effectiveness, but it is a common historical interpretation that the choice between different kinds of IP has had a real impact[2], so the distinction is very much one with a difference.

[1]: https://en.wikipedia.org/wiki/History_of_patent_law

[2]: https://www.repository.law.indiana.edu/cgi/viewcontent.cgi?h...

hyperpallium · on June 1, 2019

> “It has long been accepted that 'intellectual information', a mathematical algorithm, mere working directions and a scheme without effect are not patentable.”

http://manuals.ipaustralia.gov.au/patents/national/patentabl...

oefrha · on June 1, 2019

(Also as a mathematician) Many constructive proofs involve what one could loosely categorize as algorithms. (Not to mention in some computational fields, some proofs can be strictly categorized as algorithms.) Now that I think about it, these can be patented too. Then one can "teach it, study it", but one's not allowed to prove without permission other results using constructions in the same spirit. Geez, thanks for letting people study it, I guess.

pron · on June 1, 2019

> but one's not allowed to prove without permission other results using constructions in the same spirit.

No, that would be allowed. Please, if you want to form an opinion about patents, you should first learn what they are and what they protect.

hyperpallium · on June 1, 2019

To be clear on your application point, you could use the same algorithm (theorem) for a different purpose, and it wouldn't infringe the patent.

Schoolmeister · on June 1, 2019

Does this imply that the PageRank algorithm I wrote for a uni assignment is actually illegal? Or does this fall under "studying it"? Note that I'm from Europe.

zild3d · on June 1, 2019

Only if you made or used it commercially and Google had not given you permission to use it

("What rights does a patent provide? A patent owner has the right to decide who may – or may not – use the patented invention for the period in which the invention is protected. In other words, patent protection means that the invention cannot be commercially made, used, distributed, imported, or sold by others without the patent owner's consent.")

And being out of the country, the US patent doesn't apply to you

("Is a patent valid in every country? Patents are territorial rights. In general, the exclusive rights are only applicable in the country or region in which a patent has been filed and granted, in accordance with the law of that country or region.")

vanderZwan · on June 1, 2019

Europe doesn't have software patents

JohnStrangeII · on June 1, 2019

Same here (though not a mathematician). Patenting algorithms is so horribly wrong on every level, it's still beyond my comprehension how this could ever be allowed. Some of the patents are ridiculously absurd and luckily rarely enforced. I remember the guy on sci.crypt who had a patent on "cascading ciphers", which covered almost any cipher built out of smaller cryptographic building blocks. It basically concerned almost any stream cipher mode and any attempt at combining more than one building block with XOR somewhere. Luckily, the guy never had the money to enforce it, although he might have earned some royalties from doing nothing.

Heck, Siemens used to have a German patent on the Internet, except that it wasn't called Internet but something like "making available of structured textual data representations via long-range data transmission", etc.

jMyles · on June 1, 2019

> As a mathematician

You don't need to be a mathematician to see that the current system of IP is completely contrary to the nature of the universe. It will be over soon.

mikro2nd · on June 1, 2019

I'm curious to know what gives you confidence that it will be over soon?

mrighele · on June 1, 2019

Partially in jest, once/if Chinese companies start to hold more patents than American companies (as in "value", not numbers) you will see the US attitude versus patents to be different.

mikro2nd · on June 1, 2019

You may be correct regarding American response. I'd suggest that alongside that, what we're likely to see is the Chinese government significantly tightening up on IP-rights enforcement because it would then be to their advantage to do so. So a nett result of greater IP-rights enforcement globally rather than the original optimistic/hopeful "it will be over soon" assessment.

Krasnol · on June 1, 2019

That's quite optimistic.

I'd say that if they steal from the Chinese, they won't talk about it much (or say it was the other way around) and continue just the way it was in case they come up with something patentable again.

taffer · on June 1, 2019

Did the British attitude to patents change when German companies started owning more patents than British companies?

justincormack · on June 1, 2019

Yes. For example the British arranged to cancel all German patents after ww2. This led to eg the development of the Japanese camera industry among many other things.

taffer · on June 1, 2019

I fail to see the parallels between Germany after World War II and China in 2019. China's patent situation is much closer to Germany in the late 19th century and the British didn't throw away their patent system in the 19th century just because Germany patented more.

oblio · on June 1, 2019

Yes, when those patents started pointing guns at British interests :)

dredmorbius · on June 1, 2019

Not a patent example, but there's a reason "aspirin" is a generic term in the US.

pron · on June 1, 2019

There is nothing in the nature of the universe that's contrary to trade agreements among people, of which patents are one example. You could say that some agreements are unenforceable under current conditions or perhaps undesirable or even silly, but unless people agree to violate the laws of physics, the universe is agnostic to human contracts. As I wrote in another comment, patents do not protect some truth -- actually, their entire purpose was to help spread truths -- just applications (and human ones, not natural ones).

bryanrasmussen · on June 1, 2019

contrary to the nature of the universe? That seems a few levels lower than merely unworkable, I mean a penny falling slower than a bowling ball is contrary to the nature of the universe, I don't think patents contravene anything as primary as gravity.

neiman · on June 1, 2019

You can't patent math theorems because they are not considered inventions, but rather discoveries of logical laws of the universe. I treat algorithms in a similar manner.

mrzool · on June 1, 2019

Hyperbole | /hʌɪˈpəːbəli/ | noun

Exaggerated statements or claims not meant to be taken literally.

— The Oxford Dictionary

londons_explore · on June 1, 2019

The current IP system is pretty decent.... Just all time limits should be adjusted to 1 year.

If I invent something, a 1 year headstart in the market should be plenty of reward.

hannasanarion · on June 1, 2019

Seems short, especially for low-yield ip like music. 10-20 is probably good

enriquto · on June 1, 2019

They are talking about patents here. It has nothing to do with copyright which concerns music and which is perfectly alright.

It certainly does not help that they use the ambiguous term IP that does not really mean anything.

londons_explore · on June 1, 2019

If copyright only lasted a year, you can bet the industry would move to much shorter release cycles and still capture most of the value.

gomox · on June 1, 2019

Actually research would move in the direction of things whose value can be captured within a year. The patent system shapes the IP industry as a whole.

daver00 · on June 1, 2019

> It will be over soon

The universe or the patent system?

jlawson · on June 1, 2019

I am not super pro-patent, but I don't think it's totally wild to patent an algorithm.

Algorithms can be extremely hard to discover and extremely valuable. Together these facts are why the patent system can usefully apply, same as traditional patent classes.

And lots of traditional types of patents are "algorithms" too, really.

It's not that different from, say, patenting a drug. The algorithm to manufacture [insert expensive drug here] probably isn't that hard to follow, but it was very hard to discover, and we want to encourage more people to discover more algorithms like this.

The real issue is that the people in the court system can't understand the difference between trivial, straightforward, somewhat advanced, and truly sophisticated algorithms. Also, since you don't need to risk health and life to test them, computer algorithms are typically easier to test than drugs, so it's likely that few-to-none of them satisfy the "difficult to invent" standard that drug molecules do. It costs like a billion dollars to develop a drug; PageRank was made by a few guys in a room in a matter of months.

As another interesting point of comparison, a lot of old weird gun designs from the late 19th/early 20th century were made to get around patents. It's dumb to make a "blow-forward" pistol but there was a time when the straightforward "blow-back" system was patented. Whole countries used quite weird and suboptimal weapons for a long time simply to avoid patent entanglements. I'm sure to them it seemed similar - how can you patent putting these 2 pieces of metal together in this shape? These systems were not very complicated.

enriquto · on June 1, 2019

I totally cannot bend my mind around this. It is irrelevant if patents on algorithms are useful and good for society in some cases. They are so much self-evidently wrong that the point is moot. The fact that "trivial" algorithms can be patented is a different question that I am not addressing. I am talking only about the non-trivial algorithms that require a stroke of genius.

Imagine that Euclid's algorithm was patented? Or Pythagoras theorem? Or the definition of exterior derivative? Or the expression of a function as Fourier series? All of these are valuable ideas that are non-trivial to come by. The same argument for software patents applies to each of these cases, and in all these cases it is evident (to me) that these mathematical constructions cannot be patented, and that society cannot grant monopoly of their usage to their discoverers.

Scea91 · on June 1, 2019

> Imagine that Euclid's algorithm was patented? Or Pythagoras theorem? Or the definition of exterior derivative? Or the expression of a function as Fourier series? All of these are valuable ideas that are non-trivial to come by. The same argument for software patents applies to each of these cases, and in all these cases it is evident (to me) that these mathematical constructions cannot be patented, and that society cannot grant monopoly of their usage to their discoverers.

You are forgetting that patents are only granted for limited time. If we applied the current system, those things would be patented only for 20 years after their discovery. Would it still be unimaginable?

Maybe your opinion is still the same but the situation is not as clear-cut as it seems. For example, I would guess that if patents were cancelled the industrial budgets for research would somewhat decrease, because there would be smaller advantage in figuring things out first.

nabla9 · on June 1, 2019

Patents should be understood as an alternative to trade secret.

The original reason for patents is enabling more information to become public. When the patent system did not exist, the only way to protect inventions was to keep them secret. A patent is a way to separate information from the economic use of the information. Without patents you must protect the information if you want exclusivity.

Imagine that some extremely important and new algorithm is not patented but kept a secret instead. Today it's possible to provide software as a service and keep the algorithm within protected servers.

> cannot grant monopoly of their usage to their discoverers.

Current patent system is too restrictive and should be reformed. Instead of monopoly, it would be better to have mandatory licensing for algorithms for short period. For example 5 years with some fancy auctioning system to discover the correct price.

catherd · on June 1, 2019

Patents on algorithms are bad because any sane person would feel revulsion at the thought of them? Is that the content of your argument?

If Euclid's algorithm was patented in a system similar to our modern one, he would have... cornered the market in ancient Greek cryptographic protocols for 20 years or so before other ancient mathematicians were allowed to join in?

jlawson · on June 1, 2019

>Imagine that Euclid's algorithm was patented? Or Pythagoras theorem? Or the definition of exterior derivative? Or the expression of a function as Fourier series?

This is an interesting point and I'm not sure it cuts the way you want it to.

The world had to wait thousands of years for each of these algorithms to be developed. There are broad reasons for this, but one of them was probably the fact that nobody had any incentive to develop them. Mankind literally had to wait around for some random nerd to come up with these in his spare time.

Now imagine that they were patentable and exploitable for money (for 20 years). Perhaps they would have been developed centuries or millennia earlier. You would have teams of mathematicians from 1000BC onward, working to try to make money inventing theorems and machines and medicines.

20 years under patent seems like a small cost to pay to have something invented possibly thousands of years earlier (or, not kept under trade secret, as e.g. Damascus steel was).

akarma · on June 1, 2019

While I understand your point of view (though I believe patents that last for a couple decades wouldn’t have ruined mathematics), the strongest point, at least to me, of the argument you’re responding to, was drugs.

Drugs, according to the comment, are like an algorithm, with a simple set of inputs— hard to discover these inputs, but then easy to replicate. Yet we seem to feel differently about patenting drugs?

Do you have a response to that part?

boothby · on June 1, 2019

> Imagine that Euclid's algorithm was patented...

As a mathematician with a patented algorithm / open source advocate... I'm extremely squeamish about this and not super comfortable with it. On the other hand... Cardano's formula was once a trade secret. He's dead and we know the trick now.

As others have pointed out, patents expire in 20 years or less. Nothing is stopping you from building on somebody else's patent (popular among trolls: a patent on applying your patent in a novel area, or a more optimized version, a dual version, etc, of your patent). In that situation you can't use your own patent without paying them... and neither can they use yours. Implementation might be a problem, but you can still prove your theorems.

IfOnlyYouKnew · on June 1, 2019

Euclid and Pythagoras are not, and cannot be, patented. So it seems somewhat illogical to argue against patents with the hypothetical of those algorithms being patented.

vbezhenar · on June 1, 2019

I wonder if some real algorithms were ever patented? I mean, just read Knuth, there are brilliant algorithms there that I would never invent myself. Various sorting algorithms, graph algorithms and so on. They are a foundation for our computing. But I never heard that quick sort was ever patented. Why is that?

cesarb · on June 1, 2019

> But I never heard that quick sort was ever patented. Why is that?

Quick sort is from a time before the USA decided that algorithms could be patented. In fact, most foundational algorithms of computer science are from that time. IMO, the development of computer science would have been hampered for a couple of decades had it been possible to patent algorithms back then; we're very lucky that this wasn't the case.

tchalla · on June 1, 2019

Latent Semantic Indexing (LSI) - used in Information Retrieval systems - is one example of a patented algorithm.

https://en.wikipedia.org/wiki/Latent_semantic_analysis

http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=H...

justincormack · on June 1, 2019

There are a lot of algorithms that are patented that stop people doing stuff, eg in cryptography, mp3, erasure codes etc.

bboreham · on June 1, 2019

Example: the Lempel-Ziv-Welch algorithm, used in gif files, was patented by Unisys.

salty_biscuits · on June 1, 2019

"Algorithms can be extremely hard to discover"

Can be, most aren't. Most are specializations of a well known thing, applied to a new area. Page rank is (maybe) a good example of this. Not quite old enough to transfer myself back to the time, but diffusion on graphs and power methods have been around for a while.
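
To make the "power method" point concrete, here is a minimal sketch of PageRank as plain power iteration. The toy graph, the function name `pagerank`, and the 0.85 damping factor (the commonly quoted value) are illustrative assumptions, not anything taken from the patent or from Google's production system.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping each page to the pages it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank sitting on dangling pages (no outlinks) is spread evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        # Every page passes its damped rank on in equal shares to the pages it links to.
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked from everywhere, "d" from nowhere.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_graph)
```

On this toy graph the heavily linked page "c" ends up on top and the unlinked page "d" at the bottom, which is about all the algorithm itself does; the hard part was everything around it.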

nickjj · on June 1, 2019

I run into this scenario a lot as a freelance developer.

Companies often request that any code you write that's a part of their contract work must not leave that project and they own all IP rights to every line of code you deliver.

Which isn't reasonable most of the time because technically if you imported a library from Python's standard library you can't import that library again in another project without violating the contract. Or if you pasted in a snippet of code from a library's documentation, since you did it on their time, you can't use it again on another project.

I always do my best to get things reworded to differentiate generic code from business / trade secrets code, and then for the generic code they either license the code from me as MIT or I license the code from them as MIT.

bartimus · on June 1, 2019

Would you, as a mathematician, prefer a world where innovative theorems become covered in secrecy if it means such things couldn't be patented anymore?

enriquto · on June 1, 2019

I do not understand what you are talking about.

In our world, you cannot patent theorems but people publish them anyway. Why? For the honor of human spirit, or to impress chicks, whatever.

menzoic · on June 1, 2019

> In our world, you cannot patent theorems but people publish them anyway. Why?

You'll never have enough information to know what theorems went unpublished. You can only speak to the probability based on incentives. Public institutions and Universities may be incentivized to publish theorems, but that same incentive doesn't exist in corporations. A valuable money making theorem that could give competitors an advantage would never be published unless a patent could be formed using the technique that circumvents the limitations by including other processes.

Even with the existing patent system, trade secrets still exist: WD-40, Coca-Cola, Twinkies...etc. When it comes to money, the default is to preserve.

vikramkr · on June 1, 2019

I wouldn't take it for granted that people publish. Many mathematicians work on wall street, and they would never publish something they came up with if it wasn't 10000% certain the theorem isn't a commercial secret. Money wins all, especially in mathematics, half of which was invented to make counting money easier.

enriquto · on June 1, 2019

Yes, of course. Not all known theorems are published immediately. What is the problem with that?

Hiding a truth you have discovered is OK.

"Owning" a truth that you have told everyone is absurd.

vikramkr · on June 1, 2019

Owning a truth for a limited period of time in return for telling everyone instead of keeping it a secret is not absurd. Society would rather you tell people so people can build on it (at least in noncommercial ways, or via license) while you own it than keep it secret, so we created the patent system. The alternative could easily be trade secrets until you die and never reveal it, and as a society we decided to incentivize you to not do that.

taffer · on June 1, 2019

I think what he's referring to is that you have to publish your invention to get it patented. The idea of patents was basically to get a temporary monopoly for publishing the secret sauce so that anyone can use your invention after the protection period is over.

jandrewrogers · on June 1, 2019

Companies have mostly stopped patenting computer algorithms outside of narrow areas where there are strong interoperability requirements e.g. video codecs. Few algorithms are practically enforceable as patents.

I think you seriously overestimate the incentives to publish. In several computer science domains, and certainly the ones I work in, the academically published algorithms are often a decade or two behind the state-of-the-art that is never published. Valuable algorithm advances are often explicitly treated as trade secrets. As an equally common case, the inventor(s) simply have zero incentive, either personal or financial, to spend their time publishing it -- they did it to solve a problem, not to publish, so they prioritize spending time with their family etc. This disadvantages open source software, and academia spends much of its time reinventing algorithms already known in the industry. In my experience, surprisingly little hardcore algorithm R&D happens in academia, so any model of information dissemination that makes this assumption is going to be suboptimal.

As an example that is unrelated to my current work, I developed a set of novel algorithms that massively improved the efficient parallelization of graph traversals in 2009 -- a true step function in both scalability and throughput per CPU (I was working on supercomputers at the time). Fast forward a decade to 2019 and these algorithms still haven't shown up in academic literature even though people built systems based on them and they are superior to what is currently in the literature. In this case, the algorithms are not even secret. I've also learned some brilliant and as yet unpublished algorithms of unknown origin via these same oral traditions over the years. As a social dynamic, it feels inappropriate to publish an algorithm that you learned this way.

This is a challenging problem to solve. Companies spend serious money on algorithm R&D hoping to obtain a commercial advantage. Outside of that, publishing is often an unattractive use of one's personal time if you are not an academic. This reality disservices the software community at large and I'd like to arrive at a better solution, even though the current reality benefits me greatly as an insider who sees loads of amazing, unpublished computer science.

JohnStrangeII · on June 1, 2019

I think the only sane answer would be a clearcut Yes, but the question is also highly misleading by insinuating that software patents actually cover innovative theorems. The vast majority of them don't.

Austin Meyer, the maker of the X-Plane flight simulator, was sued for millions of dollars because his app used an in-app purchase option made available by an existing 3rd-party SDK.

There is a whole industry of inventing bogus software patents that take utterly trivial everyday business processes like giving someone money for a product, dress them up in fancy lawyer talk and general descriptions of non-existent "systems" or "methods" without any example or prototype, and then sue people over it. In-app purchases? Better pay up! You have a fax machine or scanner? Pay! Storing business data on some electronic device for later retrieval? Better pay an arbitrarily high patent license fee!

It's absurd.

jandrewrogers · on June 1, 2019

That is a business method patent, not an algorithm patent. They are completely unrelated.

JohnStrangeII · on June 2, 2019

I also gave an example of an equally silly software patent. Besides, I would not call different sorts of patents "completely unrelated". All of them have trolls, have been granted for silly and trivial things, and the system is completely broken. It's bizarre enough that there are judges and lawyers who apparently think that "business logic" and "algorithms", as well as "mathematics" and "logic", are different entities that can be distinguished from each other...

tty2300 · on June 1, 2019

The majority of patents are for things that are essentially obvious and would have been replicated if not publicly released. If you put a bit of hardware or software out then people will reverse engineer it and know how it works pretty soon. If you don't put it out then it will be difficult to get any benefit from it.

dredmorbius · on June 1, 2019

False dilemma.

There are multiple alternatives.

derefr · on June 1, 2019

Ah, but you’re not patenting the theorem!

Remember what happens when drug companies re-discover a useful molecule that turns out to have “prior art” as a molecule that people have been consuming in some every-day herb or some-such. They can’t patent that molecule (because they didn’t invent it), and there’s no profit motive to go through FDA compliance for a non-patented drug.

But, there’s nothing stopping them from finding the “theorem” behind the “algorithm”—figuring out what it is about the molecule’s structure that makes it have the effect it has—and then discovering another molecule in the same class (another “algorithm” constructively proving the same “theorem”), and then patenting that.

Same is true for actual algorithms: if PageRank is patented, I can just look at the theorem behind it—efficient eigenvalue derivation—and then come up with a different constructive proof of it. It’s easy once you know it’s possible. And, because there are so many known isomorphisms between algorithms (e.g. between algorithms on pointer machines vs. RAM word machines) there are often “obvious” transformations of the algorithm that aren’t considered the same algorithm from the patent office’s POV. (And, I mean, technically they aren’t; they might have a very slightly higher time-complexity, by a factor of the inverse Ackermann function or something. But these are things that don’t matter in practice, just like the random extra bits that the drug companies tack onto their molecules don’t matter in practice.)

VistaBrokeMyPC · on June 1, 2019

See: Johnson & Johnson/Janssen Pharmaceutica esketamine patent. [0]

The FDA just approved the use of the drug, [1] which is basically a ketamine molecule cut in half (ketamine is a racemic molecule), [2] and J&J is selling it for HUNDREDS of dollars per treatment, while regular Ketalar brand ketamine is damn-near as cheap as saline.

[0] https://patents.justia.com/patent/20140093592

[1] https://www.npr.org/sections/health-shots/2019/03/05/7005099...

[2] Arketamine (R(-) - isomer) https://upload.wikimedia.org/wikipedia/commons/thumb/d/df/Ar... Esketamine (S(+) - isomer) https://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/Es...

jacinabox · on June 1, 2019

Whether or not an invention is really covered by a current patent is not the issue; the issue is that patent holders can keep anyone who tries to exploit an invention in perpetual legal jeopardy, by continually filing patents on _related_ inventions.

jgtrosh · on June 1, 2019

I have helped patent my work for my company against bonuses and it's a source of personal shame even though most people I tell this seem to think it's very honorable.

jandrewrogers · on June 1, 2019

It makes a bit more sense if, like the patent office, you view it through the lens of "subject matter directly reducible to an electronic circuit design". Algorithms that meet this criterion are, by necessity, patentable in every country that allows patenting of electronic circuit designs. Most theorems in mathematics are not like this.

(This is a deep rabbit hole. All mechanical patents have the same theoretical equivalence to computer algorithms, though less obvious. The lines that define patentable subject matter do not have a rigorous basis, it is an arbitrary convention.)

This is also why so-called business method patents (i.e. doing X, but on the Internet), which are often improperly conflated with algorithm patents under the rubric of "software patents", are generally not patentable.

LudwigNagasena · on June 1, 2019

Every patent and even every technology is basically an algorithm. And they are valuable because they are true (not sure how mathematical truth is different from any other type of truth).

rawoke083600 · on June 1, 2019

Agreed! Also there is a lot of hard work and engineering in offering the solution as well... Just knowing the algorithm won't help.

Waterluvian · on June 1, 2019

I think I get what you're saying. It's like patenting the theory of gravity. You're just describing some intrinsic logical fact about this universe.

whymauri · on June 1, 2019

There's a lot of discussion I see about PageRank for CS purposes, so I'd like to give a slightly different perspective most people haven't heard about.

A colleague of mine at Caltech recently used the PageRank algorithm to model the evolution of in-vivo neural networks [0, page 8]. The concept is pretty good for modeling extensions of simple Hebbian learning and explaining some of the ensemble dynamics (at least in the hippocampus). If you're interested in further reading, there's more work on attractor states in biological learning and associative memory (some of which is cited in the paper).

Brief overview: there's debate on whether brain network dynamics are stable or unstable. In networks related to learning and over the timespan of weeks, this experiment observed that ensemble-level dynamics change during learning but some neurons are remarkably stable in how/when they fire. You can rank stability with methods like PageRank, suggesting that connectivity implies importance (and perhaps stability).

[0] https://www.biorxiv.org/content/biorxiv/early/2019/03/07/559...

sitkack · on June 1, 2019

Are you sure it is PageRank specifically? These fof maps were used and investigated as far back as the early 1900s.

Search for "the page rank book" for a deeper analysis.

whymauri · on June 2, 2019

He used a customized version of the PageRank algorithm that comes with MATLAB. The paper is pretty dense with other interesting observations. I think one of the novel contributions is experimental evidence that centrality is important to learning distinct sequences of behaviors.

Consider three behaviors: moving right on a linear track, turning around, and moving left. A reinforcement signal could be sugar water. If you place sugar water at the right end of the track the mouse learns the sequence of moving right, drinking, turning around, and moving left to leave. Now, in the hippocampus different ensembles of time and place neurons are correlated with each distinct activity. The inter-ensemble connectivity has to be learned, as the sequence of actions becomes correlated not just in behavior but in the brain as well. The neurons that most strongly inhabit inter-ensemble connectivity tend to be those 'stable' and 'important' anchor neurons.

Yes, it has been studied before that some neurons are more important than others. But the critical extension here is to long-term stability and learning on an unprecedented time-scale and quantity of simultaneously recorded neurons. The actual microscope itself is a significant technological advancement in how it stays robust to long- and short-term motion artifacts and was custom designed and built by the first author.

There's also the experiment done on dynamics after traumatic damage to the neural circuits encoding a learned behavior (this is one of the specialties of the Lois group). But that starts deviating from the observation I wanted to make relating to the thread.

The conclusion: "Overall, our findings suggest a model where the patterns of activity of individual neurons gradually change over time while the activity of groups of synchronously active neurons ensures the persistence of representations."

Seems fairly innocuous, but I'm almost certain it will be controversial to some parties.

Irishsteve · on June 1, 2019

For extra context: back then you would manually submit your site to yahoo and dmoz to end up in their results. They saw themselves as directories.

Google was all about crawling and building up the biggest dataset going.

Both approaches were victim to keyword stuffing (lots of keywords at the bottom of the page and if you were lucky it was in a marquee tag).

PageRank was a pretty decent extra value with a relevance score to promote trustworthy sites. However, there were similar techniques like hubs and authorities from Kleinberg.
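
For comparison, a rough sketch of the hubs-and-authorities idea: a page is a good authority if good hubs link to it, and a good hub if it links to good authorities. The function name `hits`, the toy graph, and the fixed iteration count are assumptions made for illustration. As I understand it, the method is normally run on a query-specific subgraph rather than the whole web, which this sketch ignores.

```python
import math


def hits(links, iterations=50):
    """Iterate hub and authority scores over a dict of page -> outgoing links."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of the pages linking in.
        auth = {node: 0.0 for node in nodes}
        for source, targets in links.items():
            for target in targets:
                auth[target] += hub[source]
        # Hub score: sum of the authority scores of the pages linked to.
        hub = {node: sum(auth[t] for t in links.get(node, [])) for node in nodes}
        # Normalise both vectors so the scores do not grow without bound.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for node in scores:
                scores[node] /= norm
    return hub, auth


# Hypothetical mini web: two link collections pointing at content pages.
toy_web = {
    "portal": ["news", "blog"],
    "directory": ["news", "blog", "shop"],
    "blog": ["news"],
}
hub_scores, auth_scores = hits(toy_web)
```

Here "news" comes out as the strongest authority and "directory" as the strongest hub, so unlike PageRank you get two scores per page instead of one.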

On a side note his old research students / postdocs ended up leading key initiatives at FB newsfeed and Pinterest discovery.

leoc · on June 1, 2019

Not quite: manually assembled search directories like the old Yahoo! were already passé before Google came out. The increasing size of the Web and the arrival of AltaVista had already made automatically-indexed search engines an established thing. The problem was simply that the AltaVista results were overwhelmed with spam. Early on Google search results were very well-ranked, but quite narrow since the engine only crawled and indexed a relatively small proportion of the Web. AltaVista's coverage was much better, iirc, and Google's limited scope was often remarked on in places like Slashdot.

jacquesm · on June 1, 2019

Altavista and others beside were already crawling the web.

Yahoo and Dmoz were curated but Google definitely wasn't the first crawler.

As for the approaches being victim to keyword stuffing: that was because the algorithms used were exclusively 'on page' without assigning a value to links.

calgoo · on June 1, 2019

Yahoo's content was mostly US based if I remember correctly. The reason I switched to Google from Altavista was because of the reduced ads and clean look on the page. The results were about the same back then.

mercer · on June 1, 2019

IIRC speed played a huge role too in part because of the cleaner page, but also in part because of whatever voodoo they used to deliver results.

sitkack · on June 1, 2019

The use of 'near' queries on Altavista removed the spam and had they done some basic query rewriting it would have cleaned much of the spam up. Spam never affected me on Altavista.

randomdata · on June 1, 2019

Yahoo's directory and dmoz seem orthogonal to Google. Not even in 1994, long before Google and when aptly-named WebCrawler was all the rage, would one turn to Yahoo (dmoz did not exist at the time) to find what they were looking for if they had something specific to look for. Yahoo served as a place to go when you weren't looking for anything in particular and wanted to simply explore the web.

There was a time when Yahoo also tried to get into the indexed search space, but never seemed to be a viable competitor against other players in the market. Once Google established their dominance, all bets were completely off.

bhl · on June 2, 2019

> his old research students

Who’s the subject of this sentence? I didn’t know PageRank had academic descendants.

What’s the current research on PageRank like? I looked at Sergey Brin’s academic page a couple months back and was surprised that people still work on nearest neighbors now and back then.

Irishsteve · on June 3, 2019

Give Networks, Crowds, and Markets a glance. It's a fun book. https://www.cs.cornell.edu/home/kleinber/networks-book/netwo...

But that's from Kleinberg - the person who I was referring to :-)

After that there is personalised page rank / Salsa which are probably the more widely known approaches to identifying trust worthy nodes in a graph.

kumarvvr · on June 1, 2019

To my knowledge, PR has long been succeeded by more sophisticated models. At one time, perhaps even now, Google had a team of mathematicians constantly tuning its ranking algorithm.

PR itself might be the foundation, but it definitely wouldn't be enough to build another Google scale system.

luckylion · on June 1, 2019

Links are still the single most important factor for ranking, though. I mean, there's a lot of other stuff going on, and the content/information extraction has advanced massively since the early days, but PR (or a similar concept) seems to still be the biggest part of determining how relevant a page is.

kumarvvr · on June 1, 2019

At least the future is not links. With the move towards reactive JS-based clients and RESTful API-based services, links hardly have the requisite data for indexing. Even page content is extremely fickle to index.

Google must have a JS engine as part of its web indexing process.

luckylion · on June 1, 2019

They do, and have been using it for quite some time. Not only will they execute JS, they will also do AJAX requests and index the content that is returned.

Since the overwhelming majority of the web is still static, I believe links will be fine for the foreseeable future.

bborud · on June 1, 2019

How do you know this?

luckylion · on June 1, 2019

It's not a secret, they've never said that it changed.

Also, I do a lot of work for SEO/Affiliates. The exact same tech with pretty much the same content ranks top 2 for highly competitive keywords when hosted on a very high PR domain, and top 100-1000 when hosted on a normal, on-topic domain. This is consistent over multiple content areas, and it's why the large media companies have begun to sell/rent folders/subdomains on their site. Instant ranking (until somebody else buys access to a larger media company), no risk, as Google doesn't consider "paid publishing" as against their guidelines.

bborud · on June 2, 2019

So in other words: you don't know.

skunkworker · on June 1, 2019

It's a unique milestone to see something so integral to the development of the internet as we now interact with it become expired. It's also telling that we might need to rethink the duration of the patent system as a whole because so much can change within 20 years.

pishpash · on June 1, 2019

It just makes me feel old, like the internet is last generation's stuff now.

phkahler · on June 1, 2019

It is. Aside from looking up information and ordering things, the net is almost completely useless to me now. Nothing is discoverable any more.

bredren · on June 1, 2019

The Internet does not feel much like a wild place where you might encounter anything anymore.

On the other hand, reality itself is much faster paced because of the Internet.

You can go into the world see, try, learn, experience, be hurt, recover and do it all again at a rate never before possible.

So real discovery, discovery of what it is to be alive is more accessible than ever.

tanilama · on June 1, 2019

But it is no longer that useful anyway. It is probably of more value to Google as a legend for recruiting purposes than as an important technological asset or advantage.

gigatexal · on June 1, 2019

Cue all the whiteboard interviewers: “implement the page rank algorithm, please.”

colmvp · on June 1, 2019

Followed by “what is the time and space complexity?”

shoo · on June 1, 2019

tangent: if anyone wants to read a good non-fiction book about the history of steam, invention, the patent system, etc -- i can recommend William Rosen's book "The Most Powerful Idea in the World: A Story of Steam, Industry, and Invention"

ykevinator · on June 1, 2019

It's too broad to say all algorithms should not be patentable. Amazon's one click buy and page rank are obvious to "those skilled in the arts" but there are plenty of algorithms that should be patentable. The problem is that the uspto doesn't properly get an expert consensus on obviousness.

savrajsingh · on June 1, 2019

I bet the purpose of this patent was to check the investor box of “yes we have a patent,” they’ve never had to enforce it, and competitors were not really discouraged by it. Thoughts?

utopcell · on June 1, 2019

Patents can be defensive. Someone else could have patented it and used it against them. Also (subjective opinion), they were kids: they thought patenting it was important, like they thought asking Yahoo for a mere $1M to buy it was a great deal.

spaceman_2020 · on June 1, 2019

Slightly off topic, but has anyone seen a drop in traffic even with the same rankings? Too many of the keywords I used to rank for now have featured snippets. These snippets basically mean that any result after #2 gets little to no traffic.

It used to be that beyond page 1 was a waste, but I reckon now if you're not #1 or the featured snippet, you don't really stand to gain much in terms of traffic.

Cyclone_ · on June 1, 2019

Would be a difficult patent to enforce against a competitor since it would be difficult to tell if a backend is actually implementing this but I suppose it's in their interest to still file for it.

umanwizard · on June 1, 2019

Pretty amazing that you can patent an idea as simple as multiplying a matrix by itself N times.

gilgoomesh · on June 1, 2019

I realize that not everyone understands how patents work but this is ridiculous. The patent claims don't mention matrices. Any implementation (like matrices) is merely an embodiment (you can implement the patent without matrices).

And even if that weren't true, the foundation of the patent system is applying existing techniques to new applications. The background section of the patent clearly details how this technique has been used in other applications.

I don't dispute that a lot of patents are trash but this is possibly the most important patent of the last 30 years. That doesn't mean it invented computers, mathematics and the internet, it just put some already good ideas together. That is what invention is.

Certhas · on June 1, 2019

I realize that not everyone understands how matrices work, but this is ridiculous. The abstract of the patent:

"... the rank of a document is calculated from a constant representing the probability that a browser through the database will randomly jump to the document."

And the specific claim is:

"Looked at another way, the importance of a page is directly related to the steady-state probability that a random web surfer ends up at the page after following a large number of links."

This _explicitly_ describes a Markov Chain, which is naturally represented by a matrix. A variety of versions of the linear equation are explicitly given in the patent.

To claim you can implement the patent without matrices is, for all intents and purposes, wrong. You can implement the same equations in a variety of ways, but they are still matrix equations.

They patented the idea to apply random walks to ranking webpages. That's arguably reasonably novel, though Wikipedia lists a number of predecessors. But it was also an inevitable invention, because there is a large number of people familiar with Random Walks/Markov Processes, they are routinely taught to undergrads, and are used to model and analyse a vast number of processes [1].

[1] https://en.wikipedia.org/wiki/Markov_chain#Applications

gilgoomesh · on June 1, 2019

What you quoted is not what the patent “claims”. The claims are the numbered points in the section “Claims”. They are the only legally enforced section of a patent and they are written in a very specific language. Everything else is background or embodiment and has little to no legal value.

roenxi · on June 1, 2019

> but this is possibly the most important patent of the last 30 years

It is an interesting struggle to figure out what my objection to that point is. I think it is that we know exactly how hard it is to apply linear algebra to a problem - not everyone's cup of tea, but easily 10% of software engineers would be able to do it.

The truly groundbreaking part of Google was never the indexing - that was a problem that was going to get solved one way or another. The groundbreaking part is figuring out that search + low latency + advertising is a money printing machine and that tech favours the winner.

The mechanism to achieve search + low latency + advertising is important but to some degree unimpressive. If the other search engines at the time had realised the payoffs and how important latency was they'd have gone short text-only ads too and put more engineering time into the problem - maybe someone other than Google would be the search engine of the day.

And even if PageRank was the difference between Google and a hypothetical runner up, the difference of a better algorithm would be marginal. Decisive, but ultimately marginal.

radicalbyte · on June 1, 2019

Back then google loaded in a few seconds whilst the competition could take over a minute for the 99% of users who were on a 28.8k dial-up.

It's almost as if the people behind Lycos/Excite/Altavista were all using the internet via a T1 connection from their unis..

spiderfarmer · on June 1, 2019

I think a big factor is that Google didn’t have to make any money and could keep their homepage as simple as possible.

NicoJuicy · on June 1, 2019

Altavista was actually just a tech demo of their servers :)

acct1771 · on June 1, 2019

And Google had connections to which large organizations from their start that gave them competitive advantage in multiple areas?

beagle3 · on June 1, 2019

Stationary distributions (what pagerank is) were used for relevancy of scientific references at least 20 years before the pagerank patent - I was sitting in a lecture in 95 or so about the Perron Frobenius theorem, and this was given as an (old but not very old) example of an application at the time.

bjourne · on June 1, 2019

Sure, but the novelty of the patent is The Random Surfer Model. That is, applying that math to the ranking of web pages. The novelty is looking at the problem from the right perspective. After you have read the paper, and seen it demonstrated to work, the "invention" is very obvious. But before that, it really isn't.

beagle3 · on June 1, 2019

That lecture I was in described "The random waterfall model", in which you find a scientific paper, randomly pick one of the references, go to it, and continue -- and IIRC, at a small percentage, jump to a random other paper. The professor was not describing his own work, but one that was published a decade or three before.

As far as I can tell (and could tell the day I heard PageRank described a few years later), there was no difference between that and PageRank, although there is a huge practical difference in that scientific papers can only ever refer to those that were published before them (or at least were in preparation at the time), whereas web pages are edited and can point to any other web page.

The "reference rank" application is not a DAG because of the "in preparation" links, although it is not very far - so the "jump to random paper" is much more important to produce a useful stationary distribution than in PageRank - but it is otherwise the same.

Page and Brin did a lot of interesting things, many of which weren't trivial, and were hugely rewarded for that by society. But PageRank was an application of an old idea to new medium, not a new idea - in a way that (on its face) should not deserve patent protection.

I remember Google's first days - the main selling point for the majority of people I knew was not "it finds what I want when other search engines don't" - people had learned to direct AltaVista properly more or less. The selling point was "It gives an answer in milliseconds instead of tens of seconds". In fact, I remember complaints because it lacked the "and/or/not/near" and other features that AltaVista and Lycos already had.

sitkack · on June 1, 2019

You have summarized everything superbly.

umanwizard · on June 1, 2019

> The patent claims don’t mention matrices

What? That’s the entire idea of the patent: using repeated matrix multiplication to compute the relative “importance” of various nodes in a directed graph.

> you can implement the patent without matrices

How?

Matumio · on June 1, 2019

The idea sure was brilliant at the time. But I really doubt that we (as a society) have gained anything by allowing this to be protected. It might have prevented some healthy competition. Certainly anyone thinking about search engines (once this became a thing) would have thought of this. And I doubt that nobody would have bothered to create a search engine just because it couldn't be patented.

enitihas · on June 1, 2019

Well it seems much better when you compare it to patenting of rounded corners.

tshaddox · on June 1, 2019

Or how about performing an action on the World Wide Web by clicking a button?

https://en.wikipedia.org/wiki/1-Click

Fezzik · on June 1, 2019

To be fair, double-clicking was always the default for making a selection in a window... /s

mojuba · on June 1, 2019

It's how the US patent system works. Basically it's almost like copyright: you can patent virtually anything that hasn't been patented yet until challenged in court as either trivial or not original (i.e. prior art exists). Essentially the US patent system achieves two goals: shift the burden of verifications and validations to competitors and courts; and secondly, provide the US some advantage on the international level via the Patent Cooperation Treaty. Any crappy patent may turn out to be important, so just let them register everything and see which one "sticks".

IfOnlyYouKnew · on June 1, 2019

Considering there were search engines before Google, the idea apparently wasn’t quite as obvious.

umanwizard · on June 1, 2019

Simple does not necessarily imply obvious. The chain rule is another simple application of matrix multiplication that most people couldn’t have come up with independently.

ekianjo · on June 1, 2019

Well some companies patented one click to order. Hard to beat.

sideshowb · on June 1, 2019

It's the application to search that was patented. There were plenty prior publications on eigenvalue centrality.

bhl · on June 2, 2019

If we’re talking about finding the steady / convergence state of a matrix, there are better ways than simple repeated matrix multiplication. One would be to solve a linear system of equations. The one PageRank uses if I recall correctly is the power method for eigenvalues.
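To make the comparison concrete, here is a minimal pure-Python sketch of both routes on a toy graph: iterating until the scores stop changing (the power method), and solving the equivalent linear system directly. The function names, the 0.85 damping factor, and the uniform treatment of pages without out-links are illustrative assumptions, not details taken from the patent or from Google's implementation.

```python
def pagerank_power(links, damping=0.85, tol=1e-12, max_iter=1000):
    """Power method on a dict {node: [nodes it links to]}."""
    nodes = sorted(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Assumption: score held by pages with no out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in links[v]:
                new[w] += damping * rank[v] / len(links[v])
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def pagerank_linear(links, damping=0.85):
    """Solve (I - damping * M) r = (1 - damping) / n directly."""
    nodes = sorted(links)
    n = len(nodes)
    idx = {v: i for i, v in enumerate(nodes)}
    # a starts as the identity; subtract damping * M column by column.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for v in nodes:
        targets = links[v] or nodes  # no out-links: jump anywhere
        for w in targets:
            a[idx[w]][idx[v]] -= damping / len(targets)
    b = [(1.0 - damping) / n] * n
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (b[r] - tail) / a[r][r]
    return dict(zip(nodes, x))


if __name__ == "__main__":
    toy = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    print(pagerank_power(toy))
    print(pagerank_linear(toy))
```

On a graph this small the two agree to within floating-point noise. The direct solve costs roughly cubic time in the number of pages, which is why something iterative is the practical choice once the graph gets large.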

shoo · on June 1, 2019

applied exponentiation

make3 · on June 1, 2019

person, the question actually is which matrix that is

vietvu · on June 3, 2019

PageRank, MapReduce... those famous tools that were once game-changing. But now, I think there are many better ones, just less well known.

sinzone · on June 1, 2019

A white page that would load fast when everyone was on 56k was key.

samcodes · on June 1, 2019

They should have put some round ears on it...

baxtr · on June 1, 2019

Maybe they should revert back to PR to yield better Search results

thrwayxyz · on June 1, 2019

The Google results started going bad for me around when project Hummingbird came along. We need a distributed open data alternative that can be tweaked for your preferences transparently.

And we also need to start paying for the internet. The malvertising model we have now is unsustainable and crippling the network in favour of Facebook and Google.

kumarvvr · on June 1, 2019

> We need a distributed open data alternative that can be tweaked for your preferences transparently

I wonder if some organization like the ACLU could take up such a project and release a paid version for power users.

xhgdvjky · on June 1, 2019

I would pay for a personalized search engine. I think if the results are ad free I'd pay 10-15 per year (not much since Google is decent).

bborud · on June 1, 2019

What's stopping you from doing it?

ChrisCinelli · on June 1, 2019

So I am not the only one who gets pretty bad search results compared to the ones I used to get. What caused this shift?

ClassyJacket · on June 1, 2019

The problem with this is that your definition of 'good results' and Google's definition of 'good results' are somewhat in conflict since you have different goals.

anticensor · on June 1, 2019

The current algorithm includes PageRank as its component.

antisemiotic · on June 1, 2019

Turning off the reCAPTCHA harassment of people trying to search repeatedly with refined terms would be a good start. But hey, now you can type a query while piss drunk and Google will understand, progress!

sdan · on June 1, 2019

So what now? Someone copies PageRank?

manigandham · on June 1, 2019

Nothing happens. Similar ranking systems are already in use and there's much more to a modern search engine anyway. It's just a small quirk of this very particular approach no longer being fully protected.

kbumsik · on June 1, 2019

Nothing. Everybody has freely copied PageRank already. It has become one of the basic teaching materials in graph theory.

For example, NetworkX, a popular graph library in Python, implements PageRank. [1]

[1] https://networkx.github.io/documentation/networkx-1.10/refer...

kreetx · on June 1, 2019

How does that work in the real world - does Google simply not care, or is it "fair use" somehow?

rocqua · on June 1, 2019

The patent is only for web search, so other applications don't infringe the patent.

bbrb · on June 1, 2019

First sentence of the abstract of the patent:

"A method assigns importance ranks to nodes in a linked database, such as any database of documents containing citations, the world wide web or any other hypermedia database."

acct1771 · on June 1, 2019

They'd only care if money gained - PR lost == profit.

aitchnyu · on June 1, 2019

Years back, graph DBs bragged about implementing PR in a few lines, and ecologists were using PR on species.

https://neo4j.com/docs/graph-algorithms/current/algorithms/p...

https://phys.org/news/2009-09-web-page-algorithm-critical-sp...

spiderfarmer · on June 1, 2019

We already have Open Pagerank based on the Common Crawl.

knbknb · on June 1, 2019

Maybe Stanford University no longer gets any license fees from Google for this patent. Not sure.

pmlnr · on June 1, 2019

Hopefully everyone deprecates everything similar and we return to content based search, where Pinterest and the rest freeze in the depth of hell.

jfk13 · on June 1, 2019

> return to content based search

It'll need to be a pretty sophisticated version of "content based search", or it'll just be overwhelmed by keyword stuffing and garbage auto-generated content.

dang · on June 1, 2019

"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

Edit: bad call. Detached from https://news.ycombinator.com/item?id=20067782 and marked off-topic.

umanwizard · on June 1, 2019

Sorry dang, I respect your moderation a lot but I have to disagree here that my comment is a shallow dismissal that can’t teach anyone anything.

PageRank really is a very simple idea based on elementary linear algebra, a fact many people might not have known. Thus my comment could inspire a curious person to go read more about how PageRank works instead of fearing that there is a Ph.D worth of prerequisites.

Furthermore, it is a relevant comment on the US patent system.

By the way, I think PageRank is an incredibly important development in the history of technology, and took a fair bit of ingenuity to think up. I’m not dismissive of it at all. But it is also very simple, which doesn’t contradict any of the above. And I don’t think it should be patentable, just like I don’t think other simple and intuitive applications of matrix multiplication (like the chain rule from calculus, for example) should be patentable either.

dang · on June 1, 2019

I'm persuaded. Sorry!

gingabriska · on June 1, 2019

Is there any PageRank library launched on GitHub? Can ex-Googlers make such libraries and raise themselves to open source fame?

visarga · on June 1, 2019

It's been done many times, and it's easy to implement for small graphs.

Example: https://networkx.github.io/documentation/networkx-1.10/refer...

bhl · on June 2, 2019

As mentioned, the difficulty with PageRank isn’t the implementation, but scaling it to billions of webpages. Doing so would require either extensive knowledge of systems or approximation methods.

wodenokoto · on June 1, 2019

There are plenty of implementations and how-tos out there already.

If I remember correctly it is basically a way to calculate a random walk through a network without having to do the walk.
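For contrast, this is roughly what "doing the walk" looks like: a small seeded simulation of a random surfer that counts how often each page is visited. It is only a sketch; `simulate_surfer`, the 0.85 damping factor, the step count, and the toy graph are all made up for illustration. The visit frequencies approximate the stationary distribution that the ranking equations give directly, without sampling noise.

```python
import random


def simulate_surfer(links, steps=200_000, damping=0.85, seed=0):
    """Estimate visit frequencies by actually walking the graph."""
    rng = random.Random(seed)
    nodes = sorted(links)
    visits = {v: 0 for v in nodes}
    current = rng.choice(nodes)
    for _ in range(steps):
        visits[current] += 1
        out = links[current]
        if out and rng.random() < damping:
            # Follow one of the links on the current page.
            current = rng.choice(out)
        else:
            # Get bored (or hit a dead end) and jump to a random page.
            current = rng.choice(nodes)
    return {v: count / steps for v, count in visits.items()}


if __name__ == "__main__":
    toy = {"a": ["c"], "b": ["c"], "c": ["a"]}
    freq = simulate_surfer(toy)
    # "b" has no inbound links, so its exact score is (1 - 0.85) / 3 = 0.05;
    # the simulated value should land close to that.
    print(freq)
```

The estimate only gets better with more steps, which is the appeal of computing the stationary distribution analytically instead.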

dvirsky · on June 1, 2019

Apache Spark has had an implementation of it for a few years as part of its batteries included. https://spark.apache.org/docs/1.6.1/api/java/org/apache/spar...

burtonator · on June 1, 2019

Here's the raw truth that no one in the valley will admit.

It's about two things:

1. Execution.

2. Luck.

... but it's mostly execution.

If you have a BRILLIANT idea but can't execute you're dead.

There are tons of companies in the history books as examples.

BUT... if you can EXECUTE, you can take a shitty idea, abandon it, pivot, measure, and focus BACK on a good idea. Then when you have a good idea you can keep executing it forward.

The rest is luck, but this can be worked on, as luck is often timing + being prepared.

aerophilic · on June 1, 2019

I don’t think people refuse to admit it; it is just that too often people think “they are special”.

Admittedly, that is a critical part of being a founder/entrepreneur, otherwise knowing just how “stacked against you” the real world is would make you go running for the hills.

That said, I absolutely agree it is always about execution, with luck.

But to echo someone else’s comment: luck really is about opportunity + preparedness/ability to recognize a true opportunity.

For an analogy on the luck part: in poker, everyone is dealt the same hands in the long run. It is about being able to read both the cards and opponents effectively to routinely “win”.

On the execution part, it is all about focus, learning to know what is important and what really isn’t. Time spent on anything that doesn’t really matter slows you down. Of course the hard part is knowing what matters and what doesn’t. My simple advice here is to ask yourself two questions: What am I not getting done by doing this? What is more important, this or what I am giving up?

One word of caution on the above: understand what is urgent, and what is important, and don’t let things that are important but not urgent slide.

rockinghigh · on June 1, 2019

I wonder where you got the idea that no one will admit this. Execution is the number one priority of founders of early stage companies: finding people who can quickly turn their idea into a working product they can iterate on.

mgkimsal · on June 1, 2019

I'm suspecting it's the 'luck' part the OP thinks people won't/don't admit (or acknowledge enough, perhaps).

ldng · on June 1, 2019

Yes. And even in this thread people are minimizing it.

MegaButts · on June 1, 2019

I strongly disagree that execution matters more than luck. I ask myself would I rather be good or lucky, and I will always choose luck. Because no matter how good you are, if luck isn't on your side you're going to lose. You might be the better programmer, but you still didn't get the job. You might have the better proposal, but you still lost the bid. You might be the better candidate, but you still lost the election. You might have a better product, but your company still failed. You might eat right and exercise, but you still got cancer.

I see this regularly play out when founders who found success on their first startups attempt to repeat it, only to fail miserably. Granted, there are a handful who manage to duplicate their success, but I know that some of these are simply due to who you know from their first success (you could argue that's better execution, but the number of unicorns I know that wouldn't exist if it weren't for a lucky break from an investor previously befriended makes me think otherwise). I do not mean to belittle the hard work and brilliance of many successful founders, only to emphasize that luck was absolutely critical in nearly every success story.

Obviously it's better to have both, but pretending that you can outmaneuver the universe is an act of hubris. I think people pontificating that execution matters more than luck are too arrogant to realize how lucky they are, or want to believe it matters more because it's reassuring to believe that we are in control of our destinies.

There are things you can do to increase your exposure to luck, but it's ultimately something beyond your control. The world isn't meritocratic.