Google's PageRank patent has expired (patents.google.com)
507 points by dfabulich 47 days ago | 220 comments



Myth: PageRank was the secret to Google's success

It wasn't really that simple. For a brief while perhaps it made a difference, but within perhaps six months every decent search engine had implemented PageRank in one form or another. It is a cute story for the muggles to focus on. In reality, search was already then about balancing a large number of signals into a decent ranking formula. It was much, much harder than just applying some magic algorithm, and I think the people who built Google search back in those days deserve a lot more credit. But that wasn't really a sexy story, I guess.

To a much greater degree than any algorithm or formula, what mattered was Google's ability to execute, and to do so in cultural sympathy with the web. Much has been said about Google's Not Invented Here syndrome, but it made all the difference in the early days: you had to get things to where you could iterate and innovate fast.

And I say that admiringly as someone who worked for one of their competitors at the time. I used to be jealous of Google because they were managed by people who were part of the Internet. Our management was alien to the Internet (and our business model was to power search for portals mainly run by horrible, stupid people in suits).

Google was the only search engine that was properly in tune with its audience: focusing on the user.

(Disclosure: I worked for FAST, then Yahoo, then Google until I quit Google in 2009)


What is the best way to learn how to execute at a very high level as Google does?

We are a 25 person tech startup and while our culture is amazing we struggle with execution. This is an ongoing struggle for us; it’s so easy for a young startup to misunderstand the importance of execution.

Is reading books enough, if so, which? Should we hire a COO from a company with a history of excellent execution (how to tell?). Are there courses to take? Or is it just about prioritizing excellent execution with continuous learning?

Some resources that have helped so far: Scaling Up (book) and First Round Review blog.


IME the thing that matters the most is focus, both on the micro and macro level. It's way too easy to get caught up in things that won't make an iota of difference to your future. Focus is the first thing that goes out the window as soon as a company reaches just a bit of momentum.

I've seen several startups suffering from not being able to decide what they were. Engineering teams fractured because the company wanted half of them working on their bottom line and the other half working in some offshoot product.

Even gigantic companies need that: notice how people criticize Google for creating and killing way too many products while at the same time praising its minimalist webpage (since 1998), early Gmail, etc. Same with Apple when Jobs returned and streamlined its product line.


I agree with this but want to add prioritisation is key. As per Fred Brooks, there is no silver bullet. What he meant by that is that you can't get an order of magnitude more work done using a tool or process because in programming, 25% of your time is spent doing analysis and there is no way to reduce that (To me, that's only if you are executing well!). This generalises to a lot of other pursuits.
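A quick way to see the arithmetic behind this claim is Amdahl's law, which is swapped in here as a stand-in rather than Brooks's own argument: if some fraction of the work (the 25% analysis figure above, taken as an assumption) cannot be sped up, no tool can make the whole job faster than 1 divided by that fraction. A minimal Python sketch:

```python
def max_speedup(fixed_fraction, speedup_of_rest):
    """Amdahl-style overall speedup when only part of the work gets faster.

    fixed_fraction: share of total time that cannot be reduced (e.g. analysis).
    speedup_of_rest: factor by which everything else is accelerated.
    """
    return 1.0 / (fixed_fraction + (1.0 - fixed_fraction) / speedup_of_rest)


# A tool that makes everything except analysis 10x faster: about 3.08x overall.
print(max_speedup(0.25, 10))
# Even an infinitely good tool is capped at 1 / 0.25 = 4x, short of 10x.
print(max_speedup(0.25, float("inf")))
```

Under that assumption, an order-of-magnitude gain is out of reach no matter how good the tooling gets, which is why cutting the amount of work matters more.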

Prioritisation is about deciding what not to do. Forget the BS story about putting rocks and sand in a jar, where the secret is to put the big rocks in first so that everything fits. That's not how you need to do prioritisation, because order does not appreciably change the amount of time something takes. The secret is to put only the big rocks in. Period. You go 6 times faster because you have 6 things you could do, but you only do one of them.

Now the real kicker is that the only way to determine which one thing to do is to do analysis on all 6. An HN post is not really a reasonable way to describe this. However, consider requirements discovery to be like a tree. Requirements are discovered at a particular rate as you work on something. Discovery of one requirement leads to discovery of new requirements. It's a feedback loop. Pruning the tree as early as you can leads to significant gains later on. So while you can't actually get the full 6x reduction in development time by avoiding 5/6ths of the requirements (you still have to analyze all 6 options), you can pretty easily get 2-3x gains.
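To make the pruning argument concrete, here is a toy model that is entirely hypothetical (the branching factor, depths and function names are illustrative assumptions, not the commenter's): each of 6 candidate projects is a complete requirements tree, you analyze every candidate down to some depth in order to choose, and then you fully build only the winner. The sketch compares that against building all 6.

```python
def tree_size(branching, depth):
    """Number of requirements in a complete tree down to `depth` (root = depth 0)."""
    return sum(branching ** level for level in range(depth + 1))


def pruning_speedup(candidates, branching, full_depth, analysis_depth):
    """Work of building every candidate, divided by the work of analyzing every
    candidate down to `analysis_depth` and then finishing only the chosen one."""
    full = tree_size(branching, full_depth)
    explored = tree_size(branching, analysis_depth)
    build_everything = candidates * full
    analyze_then_prune = candidates * explored + (full - explored)
    return build_everything / analyze_then_prune


# Hypothetical numbers: 6 candidates, each requirement spawns 2 more, 4 levels deep.
for depth in range(5):
    print(depth, round(pruning_speedup(6, 2, 4, depth), 2))
```

This prints roughly 5.17, 4.04, 2.82, 1.75 and 1.0: in this model, pruning after shallow analysis keeps most of the 6x, while pruning late keeps almost none of it.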

BTW, for anyone interested in a more rigorous approach, consider taking something like Littlewood's model of defect discovery and assuming that requirements discovery has a similar curve. Littlewood's model is very naive, but I've found that it still has a lot of value. Again, sorry for cryptic hints here, but I don't have time to write a book on it (which it would certainly take...)
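One possible reading of that hint, offered as an assumption rather than the commenter's method: in the form of Littlewood's model assumed here, each of N items (defects there, requirements here) has its own discovery rate drawn from a Gamma(alpha, beta) distribution, which makes the expected number found by time t equal to N(1 - (beta/(beta + t))^alpha). The parameter values below are made up for illustration.

```python
def expected_discovered(t, n_total, alpha, beta):
    """Expected number of items found by time t, assuming each of n_total items
    has an exponential discovery time whose rate is Gamma-distributed
    (shape alpha, rate beta), independently of the other items.

    P(an item is still undiscovered at t) = E[exp(-rate * t)] = (beta / (beta + t)) ** alpha
    """
    return n_total * (1.0 - (beta / (beta + t)) ** alpha)


# Hypothetical parameters: 100 requirements, alpha = 1, beta = 10 (weeks).
for week in (0, 5, 10, 20, 40, 80):
    print(week, round(expected_discovered(week, 100, 1.0, 10.0), 1))
```

Under these assumptions the curve rises fast and then flattens (about 33 found by week 5, 50 by week 10, 89 by week 80), so most of the discovery effort, and most of the chance to prune, sits early on.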


The parent comment on focus and this comment on extreme prioritization are so helpful.

Recently we have focused on just one objective and key result - it took years to get that far; we used to have so many. But even now, with just one objective, we still picked 25 initiatives to attempt to reach our goal. In retrospect it was an obvious failure, because we only executed a few of them well. We did some initial analysis, but considering your comment I think we could have done much more analysis, cut much deeper, and picked just a few or even just one. This is radical thinking!

Thank you for getting deeper into this. Do you have any other hints on where I could learn more about the approach you are describing?


Google's success was due to marketing and the opportunity to do it at massive scale (wide reach and frequent media coverage). Even the mention of PageRank was marketing, though the world at large had no idea what it meant.

Most latched on to Google due to the marketing surrounding their IPO. Many heard and understood: money, billions, billionaire, slides and bicycles, etc.

After, continued marketing kept it all going. Powerful illusion. Google is the embodiment of candy coated BS. Even now, they mainly continue due to continually marketing themselves as greatness..

..and paying for Chrome, Android, and first placement on iPhones.

They even had internal studies showing that while they are pervasive (most use at least one of their products), there isn't any stickiness. That is, if some other search engine had first placement on iPhones, the majority would be using that one. It's like the site that previously appeared when searching for a definition (dictionary.com?): while many used it and did so frequently, they often didn't even realize where they were.

The world (and internet, even) at large is very different from the handful who think themselves aligned with the masses (ie, source of revenue). It's even funnier, as most thinking anyone cared about PageRank don't buy anything. No spending = you don't exist. Anything else is a coincidental nod, stupidity, or coincidence.


Disagree. Google just had way better results than every other search engine. Second, it had a much more streamlined and simple interface (a simple search box with a "Search" button -- no page directory, no email, no banner ads -- at a time when other search engines greeted you with giant directories of pages by category). People would often refer novice users to Google because there were fewer points of confusion. By the time the company IPO'd they were already on an exponential tear.


I agree they were organic'ish (ie, word of mouth and ZDTV) in the beginning (1998-2000 or so). As for search quality, you're seeing their growth from the point of view of a higher-than-norm-IQ-intuitor-type-rational rather than from the median/typical/pervasive point of view. The majority of internet/Google users aren't the "early adopter" types. They mainly use Google because that's what's there on their phone.


Look, I'm certainly not a Google fan as of today, but you are so wrong, probably from being very very young or very very misinformed; probably both. I first tried Google when it was a university project that didn't even have its own domain (that is, it was a Stanford subdomain), and I was literally blown away by the huge difference compared to all other search engines. At that time my favorite was Altavista, though back then doing multiple searches on different search engines was normal, as they all had very different crawlers and algorithms, so I usually also went through Yahoo and Lycos after Altavista. But when Google came out it brought a huge improvement in search responses, and I mean orders of magnitude faster; nobody had done anything even comparable before, and soon it became clear that all of us would end up using just one search engine - guess which one. They developed that from scratch with no funding at all, and of course they got money after that, but it came because of the great product they had developed, not the other way around.

Google started as the project any hacker would dream to be part of, even for free. What it became after all that money changed it is a different story.


Are you reading the same thing I'm writing? Is that why you think Google is used by the masses, rather than marketing and control of first placement (on Chrome, Android, and iPhones)?

Most people just use what's there and that happened to be Google's search engine. It's especially the case after massive growth of internet users due to mobile (ie, there were only ~300 million internet users in 1998 versus the billions online today).


Have you seen Google's growth chart? It was meteoric. All these things you mention came in or after 2007. Google was already a behemoth by the early 2000s and that was because it was just way better than anything else.


All the mass of marketing (ie, going mainstream) happened at around their IPO. 2003'ish.

Also, their growth after 2004 was linear, not meteoric/exponential, and they are now in decline: https://trends.google.com/trends/explore?date=all&geo=US&q=%....


Do you know the parable of the blind men and the elephant?

If you are going to use a metric you have to show that it is relevant. It also helps if one agrees on what we are measuring.


Yeah, this is completely wrong. I watched Google grow up having worked in the area and they didn’t have any marketing at the beginning. They first became the best search engine many years before they started adding ads and well before their IPO.


They did it by focusing on an absolutely minimal experience.

The main search engines of the time, like Altavista, gave you a busy, cluttered, experience as they pursued monetisation.

http://web.archive.org/web/20000308224033/http://www.altavis...

Google gave you a search box: http://web.archive.org/web/20000815052943/http://www.google.... Over time they stripped even that down.

Just as significantly, the search results were also uncluttered, and "good enough".

This approach was absolutely critical when they started out, when people would most often be using 56k modems, where every byte had a real impact on the end user experience.
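Some rough arithmetic shows why page weight mattered so much. This sketch assumes the nominal 56,000 bit/s line rate (real connections were usually slower) and ignores latency and protocol overhead; the page sizes are hypothetical round numbers, not measurements of the archived pages linked above.

```python
def download_seconds(page_bytes, line_bits_per_second=56_000):
    """Best-case transfer time: 8 bits per byte at the nominal line rate,
    ignoring latency, protocol overhead and modem compression."""
    return page_bytes * 8 / line_bits_per_second


# Hypothetical page weights: a bare search box versus a busy portal page.
for size in (15_000, 70_000):
    print(f"{size:>6} bytes: {download_seconds(size):.1f} s")
```

Even in this best case, a 70 KB page takes about 10 seconds versus about 2 seconds for a 15 KB one, and that gap is paid again on every results page.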


Yes, every byte mattered. I was there. Only "early adopter" types were "in it for the speed." Everyone else used what was there. Everyone that was online, as most weren't.

They got larger maybe because they also kept going (didn't have a choice; tried to sell early on for pennies, though they spin the story to distract from what happened). The other engines were bought or sold out.

The death of many companies during the bust also made room for them and they were likely the face of a group effort to "keep it all moving." Facebook was a similar face.

It's funny how knowing makes it seem you don't know. You sound naive and brainwashed. But all it does is once again show how powerful an illusion can really be. I'd (and did) say similar things if I didn't dig deeper.


Being the best search engine was irrelevant to their rise. It's just a thing that happened to also (supposedly) be there.

Supposedly, as its ranking algorithm was so heavily gamed by 2007 (already risen; popularity incentivized effort to game) it was a complete joke. Powerful illusion again, as they just covered it up and moved on. Also, internal tests from around 2009-2010 showed Bing was seen by users as producing better-quality results.

Not only did they not have much marketing at the beginning, they also didn't have many users.


No, you are entirely wrong. There's no part of your "analysis" that has any relation to reality. I don't understand why you keep insisting.

(I worked for two of Google's competitors in the years when they grew from a student project to a huge business. I had the opportunity to take a peek at the code of two other competitors. Many of my former colleagues helped build Bing. I also worked at Google for a few years. I was there, so I would know something about it.)


You literally have no idea what you’re talking about and spreading lies. Please stop this.


No, it had nothing to do with marketing. It had everything to do with focus on building the best search experience.

I don't understand why you are raving on about Chrome and iPhones. The iPhone was almost a decade away when Google started getting traction, and building a browser wasn't even at the idea stage.

utopcell 46 days ago [flagged]

I did a single search on Google back in the '90s and never left. The quality gap was enormous, and it has stayed like this ever since. What world do you live in?


> What world do you live in

Please edit such acerbic swipes out of your comments here. They break the site guidelines and lower the signal/noise ratio.

https://news.ycombinator.com/newsguidelines.html


You're lumping in your reason for using their search with the reason it's used by the majority.


..as is evident from the responses to your comment?


It is naive to think the belief in this skewed, biased, self-referencing, curated group is representative.

The general internet user is different: much less concerned with the underlying tech or anything like it, and focused on other things. The general internet user thinks Facebook is the internet, doesn't realize they are online, and only uses online services to text and take/post pictures.

Android phones have a camera icon. Ever thought about the camera app associated with it? Not really if an Android user. You just use it or use it as a backup. If it were another app, you'd be using that one. Wouldn't notice.

Most people don't spend time digging, unless it's something they are really into. Most also don't read reviews or research, though that's been changing over the years. They just go with whatever, unless important (to them).


Nope. Wrong.


Not sure what you think you understood while at Yahoo, but Yahoo never used PageRank. Inlink-based signals, obviously, but never PageRank. Also, to state the obvious: Google's own execution of PageRank is orders of magnitude more complicated than what's been published. I fully agree with the greater point you are making however. Google is still at heart an engineering excellence company.


Yahoo had three search engines when I was there. Inktomi, FAST and Altavista. I came to Yahoo through the acquisition of FAST's web search business, and we had a variant of page rank which was developed some time in the summer/fall of 1999. I shared an office with the guy who wrote it. After the initial implementation it went through lots of evolution. As did everything in search.

What gets people confused is that they tend to think this was the only mechanism in use and that it was a solution that never evolved. Things also get confused by the fact that not everybody knew everything. (For instance, hardly anything in the "official" origin story of FAST is true, and you'll get conflicting stories depending on who you ask).

I don't think the Altavista engine was ever used. I think the people from Altavista ended up on Panama, Vespa and possibly some on the Inktomi-based engine. The FAST and Inktomi engines were both in use for a short while for web search, and then the effort was split so the FAST engine was used in what became Vespa (where pagerank isn't as good a mechanism as for the web). Vespa grew out of work at FAST that started around 2002-2003 to separate out the more infrastructural bits of the search engine into reusable infrastructure components.

Eventually Inktomi was used for web search at Yahoo. Simply because of geography (well, politics). Since Inktomi didn't really have anywhere to go that pretty much sealed the fate of Yahoo as maker of a web search. You might be thinking of Inktomi.


If you literally mean Yahoo never used the exact specification of the PageRank algorithm, that's probably correct. But if you mean PageRank conceptually, which includes any general search algorithm based on a discrete time Markov process, that's incorrect.
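To illustrate the Markov-process view: PageRank can be computed as the stationary distribution of a "random surfer" who follows a random out-link with probability `damping` and otherwise jumps to a uniformly random page. The following is a minimal power-iteration sketch on a toy in-memory graph. The 0.85 damping factor is the value suggested in the original PageRank paper, spreading the rank of pages without out-links uniformly is one common convention among several, and none of this is Google's or Yahoo's production implementation.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration on the random-surfer Markov chain.

    links: dict mapping each page to the list of pages it links to.
    Pages with no out-links ("dangling") spread their rank uniformly.
    Returns a dict page -> score; the scores sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by dangling pages is redistributed over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page splits its damped rank evenly among its out-links.
        for src, targets in links.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new[dst] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:  # stop once the distribution has converged
            break
    return rank


toy_web = {"a": ["b"], "b": ["c"], "c": ["a", "b"]}
for page, score in sorted(pagerank(toy_web).items()):
    print(page, round(score, 3))
```

In the toy graph, "b" collects links from both "a" and "c", so it ends up ranked above "a"; the same update is what any "discrete time Markov process" ranking variant iterates, whatever its exact name.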

In 2004 Yahoo acquired several smaller companies which were working on algorithmic search and page linking around the same time that Google was. They released their own ranking algorithm called WebRank which was substantially based on the methods of PageRank.


Yahoo's core search engine was based on Inktomi's, which was acquired in 2003. There was no PageRank in there, and there was no infrastructure to execute anything similar either. (I worked for a significant amount of time on link-level features during Marissa's rebirth of Yahoo Search.) PageRank is an algorithm that is trivial to understand and prototype, but hard to scale efficiently to hundreds of billions of pages.
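A back-of-the-envelope estimate shows why scaling, not the math, is the hard part. This sketch assumes 4-byte float scores, 8-byte page IDs, two rank vectors held at once (old and new) and one stored destination ID per link; the page count and average out-degree are illustrative assumptions, not figures from the thread.

```python
def pagerank_memory_bytes(num_pages, avg_out_links, bytes_per_score=4, bytes_per_page_id=8):
    """Rough lower bound on the data touched by one power-iteration pass:
    two rank vectors (old and new) plus one destination ID per link."""
    rank_vectors = 2 * num_pages * bytes_per_score
    edges = num_pages * avg_out_links * bytes_per_page_id
    return rank_vectors + edges


# Hypothetical web: 100 billion pages, 10 out-links each on average.
total = pagerank_memory_bytes(100_000_000_000, 10)
print(f"{total / 1e12:.1f} TB per iteration")
```

Under these assumptions that is 0.8 TB of rank vectors plus 8 TB of links, about 8.8 TB streamed on every iteration before any compression or partitioning, which is why a naive prototype does not carry over.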


That's largely true. But not quite, as Yahoo also acquired the FAST engine. However, since the team developing this were located in Trondheim, Norway, the FAST-based web search was eventually discontinued (and morphed into the VESPA project).

FAST's web search used Page Rank from late 1999.


Fascinating! This explains why I never saw anything like PageRank in YST: I presume it was present in what became Vespa (which, to be fair, probably didn't scale to YST's corpora sizes.) Pity we can't continue this conversation offline..


Of course we can continue the conversation offline. I'm not hard to find :-).

Yeah, VESPA wouldn't have scaled back then, but the search engine we used was far more scalable than Inktomi since it was the same search engine we used for web search. We did hold the record for largest index for a while (to distract people from the fact that our ranking was lagging behind Google's :-)).

But the search engine itself wasn't really the point for VESPA. Also page rank wasn't really as relevant for the use-cases VESPA was for. In fact, ranking in small, special corpora is very different from ranking in web search. And in the case of small document corpora: surprisingly hard, so one depended on tools to specialize both search, ranking and result processing.

I wrote the first implementation of the VESPA QRS with a couple of other guys, which I think was the second component in VESPA (if you count the fsearch/fdispatch as the first). I think this was the first step towards making easily customizable search. The big initial barrier was to convince people Java would be fast enough for this. (I was prepared for a 30% loss of performance in exchange for ease of extension. What we got was a 200% performance boost over the C++ implementation before even starting to optimize. But it was a bit of work to make it play nice with GC in Java and I remember David Jeske at Google refusing to believe me when I outlined how we'd done it :-))

An interesting question is what would have happened if Yahoo had chosen FAST web search instead of Inktomi. According to Jeff Dean, our search engine was the only competitor he was worried about (mentioned over lunch in May 2005 after I accepted a position at Google). Possibly because he didn't understand why it performed well. We made some fundamentally different design bets than Google (they bet RAM would become cheap fast, we bet that it wouldn't. They were right).

Inktomi was a technological dead end. That was a stupid choice by Yahoo top management based solely on geography and reflecting the ineptitude of top management when it came to technology.

To be quite frank, I think Yahoo would have flubbed web search either way. The only reason VESPA managed to survive at all was because it was being developed in Trondheim Norway - far away from Sunnyvale where we could get away ... well, bullshitting leaders and pretending to obey them while doing our own thing. Not that we weren't in deep doo-doo initially (we were in over our heads), but we had some really great people that were able to orchestrate the mess that was VESPA into something that worked, and then something that worked well.

Without mentioning any names, Yahoo had a problem with technologically inept leaders as well as too many useless middle managers. At the time just before we were acquired by Yahoo, it was quite clear that separating out important bits of the search engine into infrastructure components was key. Google had understood this early and done a few very important things (GFS, Protobuffers, MapReduce, Borg etc).

The funny thing was: our first two versions of our search engine (in 1998 and 1999) essentially used MR for crawling and processing, but we did so with shell scripts and duct tape (it was a mess). Anything that could be turned into "sort and scan", as we thought of it, could be done fast. Including page rank and deduplication - and deduplication was a much, much, much harder problem than page rank.
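Here is what "sort and scan" looks like for the easy half of deduplication. This Python sketch (function name and input shape are hypothetical) fingerprints each document, sorts by fingerprint so identical pages land next to each other, and scans for runs. It only catches byte-identical pages; near-duplicate detection, where most of the difficulty lies, is not covered.

```python
import hashlib


def exact_duplicates(docs):
    """Group byte-identical documents using sort and scan.

    docs: iterable of (url, text) pairs.
    Returns a list of URL groups (each with 2+ URLs) sharing identical text.
    """
    # Map: emit one (fingerprint, url) record per document.
    records = [(hashlib.sha1(text.encode("utf-8")).hexdigest(), url) for url, text in docs]
    # Sort: identical fingerprints become adjacent.
    records.sort()
    # Scan: collect runs of equal fingerprints in a single pass.
    groups, run = [], []
    for fp, url in records:
        if run and run[-1][0] != fp:
            if len(run) > 1:
                groups.append([u for _, u in run])
            run = []
        run.append((fp, url))
    if len(run) > 1:
        groups.append([u for _, u in run])
    return groups


pages = [("http://a", "hello"), ("http://b", "world"), ("http://c", "hello")]
print(exact_duplicates(pages))
```

Because the only global step is a sort, the same shape works when the records live in files and the sort is an external one.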

And when I say shell scripts and duct tape: we used unix sort, pipes, shell scripts and various small programs to do "mapping" and "reduction" :-) (strictly speaking, we used our own version of UNIX sort to have the same sorting order on all platforms, but essentially unix sort). Management were only focused on short term sales to portals, so we just reimplemented the same primitives over and over and over in every piece of technology we made. Wasting a ton of time and effort.
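In the same spirit, one PageRank iteration can be written as "map, sort, reduce" over flat records, mirroring a `map | sort | reduce` pipeline. This is a Python sketch rather than the shell tools described, with a hypothetical function name; for brevity it assumes every page has at least one out-link and that every link target appears in `ranks`.

```python
from itertools import groupby
from operator import itemgetter


def pagerank_step(ranks, links, damping=0.85):
    """One PageRank iteration as map -> sort -> reduce.

    ranks: dict page -> current score; links: dict page -> list of out-links.
    Assumes no dangling pages and that all link targets are keys of `ranks`.
    """
    n = len(ranks)
    # "Map": each page emits (destination, contribution) records.
    records = []
    for src, targets in links.items():
        for dst in targets:
            records.append((dst, ranks[src] / len(targets)))
    # "Sort": bring all records for the same destination together.
    records.sort(key=itemgetter(0))
    # "Reduce": scan each run of equal keys and sum its contributions.
    new = {page: (1 - damping) / n for page in ranks}
    for dst, group in groupby(records, key=itemgetter(0)):
        new[dst] += damping * sum(value for _, value in group)
    return new


links = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = {page: 1 / 3 for page in links}
for _ in range(50):
    ranks = pagerank_step(ranks, links)
print({page: round(score, 3) for page, score in ranks.items()})
```

The only step that needs all records at once is the sort, which is why a reusable, consistent sort was such a useful primitive to build everything else on.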

I was working on a storage system at the time that was sort of a combination of GFS, MR and Borg (the design came from before the papers about GFS etc were published). The idea was to have a distributed storage on which you could execute code in a sandboxed environment on each node. Meaning that you send the code to where the data lives and process it locally in a parallel manner and stream output to other nodes in the system. After certain executives felt a need to get involved and dictate technology choices I figured the project was doomed and abandoned it. (It was, for a while, known as "the storage system that can't store stuff").

Today I think that my approach would have been too complex to be sufficiently easy to develop. There were certain things about GFS I really didn't like (too trusting of clients), but slicing the problem into distinct domains was the right thing to do. Also, Google had Chubby and we didn't.


Every once in a while I remember that you can actually patent algorithms. Terror ensues, and a general feeling of frustration and helplessness. Then I slowly try to forget this fact, just to be able to continue living in this crazy world.

As a mathematician, i see algorithms as examples of theorems, and the idea to "patent" a theorem, a mathematical truth, is so foreign!


> i see algorithms as examples of theorems, and the idea to "patent" a theorem, a mathematical truth, is so foreign!

I'm not defending the patenting of algorithms [1], but what is protected by algorithm patents is not their mathematical truth -- quite the opposite, in fact, as I'll explain -- just as what is protected by mechanical patents is not some physical truth.

You are free to publish a patented algorithm (provided you don't copy your text verbatim from a copyrighted source), teach it, study it, etc. to spread that truth and expand it. What you're not allowed to do without permission from the patent owner is to implement it and run it on a computer; i.e. what is protected is not a truth but a certain human action. This is the same for mechanical inventions, which could be equally said to be "physical truths": a mechanism built in this way would, according to the laws of physics, behave in that particular way etc. Similarly, you are allowed to publish and study that physical truth -- what you're not allowed to do is to build it.

Again, I'm not saying whether this is right or wrong, only pointing out that it is not truth that's protected by patents, but application. In fact, one of the original motivations for patent protection is precisely to encourage people to not keep discovered truths secret by promising them that profitable applications would be reserved to them for some period of time. So patents were designed to help spread truth in exchange for protecting applications. That this is what patents are intended to do is a fact.

It's fine to object to patents -- there are good arguments both in favor and against -- but completely misunderstanding what patents are and what it is that they protect is not one of them.

[1]: I'm not in principle against that, either, except that in practice few patented algorithms rise to the level of inventiveness that patents are intended to protect.


> This is the same for mechanical inventions

And just like with software patents, people come up with weird workarounds:

> Sun-and-planet motion. The spur-gear to the right, called the planet-gear, is tied to the center of the other, or sun-gear, by an arm which preserves a constant distance between their centers. This was used as a substitute for the crank in a steam engine by James Watt, after the use of the crank had been patented by another party. Each revolution of the planet-gear, which is rigidly attached to the connecting-rod, gives two to the sun-gear, which is keyed to the fly-wheel shaft.

http://507movements.com/mm_039.html

If you look at the animation at the link, you can see it's just a crank with two extra gears attached


> Note that the axle of the planet gear is tied to the axle of the sun gear by a link that freely rotates around the axis of the sun gear and keeps the planet gear engaged with the sun gear but does not contribute to the drive torque. This link appears, at first sight, to be similar to a crank but the drive is not transmitted through it. Thus, it did not contravene the crank patent.

Pretty interesting stuff. Although visually similar it is actually a different principle. And we can readily tell that the arm doesn't contribute to the drive as the sun is rotating faster than the arm would drive it.
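The speed relationship can be checked with the standard planetary-gear (Willis) relation. Write omega_a for the arm, omega_s for the sun gear, omega_p for the planet gear, and R_s, R_p for their pitch radii, and treat the planet as non-rotating because it is fixed to the connecting rod (an approximation: the rod rocks back and forth but makes no net turn per cycle):

```latex
\frac{\omega_s - \omega_a}{\omega_p - \omega_a} = -\frac{R_p}{R_s},
\qquad \omega_p \approx 0
\quad\Longrightarrow\quad
\omega_s = \omega_a \left( 1 + \frac{R_p}{R_s} \right)
```

With equal gears, R_p = R_s gives omega_s = 2 omega_a, matching the "two revolutions" in the quoted description, and since omega_s > omega_a the sun gear always runs ahead of the arm, so the arm cannot be what drives it.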


This seems like a distinction without a difference.

Being granted exclusive access to put a fact to productive use (or any use) seems roughly equivalent to "owning the truth" to me.


But it is not equivalent. The fear was that without patents people wouldn't be able to teach and spread truths because they would never learn them: inventors would keep them secret. Patents exchange the ability to study and spread truths for protecting their profitable applications. You may be against that compromise because you think it does not achieve its goals, or object to such compromises on principle, but it is a very real compromise between very real alternatives -- at least alternatives envisioned by the creators of the patent systems. Denying the alternatives that people have actually been choosing between for centuries [1], and so ignoring the distinction between keeping some truth as a trade secret and making it public but obtaining a time-limited protection on applications -- both are "owning the truth" (if that's what you want to call it), but in different ways -- is completely missing the entire issue, namely which form of ownership is preferable to the other.

One could argue about effectiveness, but it is a common historical interpretation that the choice between different kinds of IP has had a real impact [2], so the distinction is very much one with a difference.

[1]: https://en.wikipedia.org/wiki/History_of_patent_law

[2]: https://www.repository.law.indiana.edu/cgi/viewcontent.cgi?h...


> “It has long been accepted that 'intellectual information', a mathematical algorithm, mere working directions and a scheme without effect are not patentable.”

http://manuals.ipaustralia.gov.au/patents/national/patentabl...


(Also as a mathematician) Many constructive proofs involve what one could loosely categorize as algorithms. (Not to mention in some computational fields, some proofs can be strictly categorized as algorithms.) Now that I think about it, these can be patented too. Then one can "teach it, study it", but one's not allowed to prove without permission other results using constructions in the same spirit. Geez, thanks for letting people study it, I guess.


> but one's not allowed to prove without permission other results using constructions in the same spirit.

No, that would be allowed. Please, if you want to form an opinion about patents, you should first learn what they are and what they protect.


To be clear on your application point, you could use the same algorithm (theorem) for a different purpose, and it wouldn't infringe the patent.


Does this imply that the PageRank algorithm I wrote for a uni assignment is actually illegal? Or does this fall under "studying it"? Note that I'm from Europe.


Only if you made/used it commercially and Google had not given you permission to use it.

("What rights does a patent provide? A patent owner has the right to decide who may – or may not – use the patented invention for the period in which the invention is protected. In other words, patent protection means that the invention cannot be commercially made, used, distributed, imported, or sold by others without the patent owner's consent.")

And being out of the country, the US patent doesn't apply to you

("Is a patent valid in every country? Patents are territorial rights. In general, the exclusive rights are only applicable in the country or region in which a patent has been filed and granted, in accordance with the law of that country or region.")


Europe doesn't have software patents


Same here (though not a mathematician). Patenting algorithms is so horribly wrong on every level; it's still beyond my comprehension how this could ever be allowed. Some of the patents are ridiculously absurd and luckily rarely enforced. I remember the guy on sci.crypt who had a patent on "cascading ciphers", which covered almost any cipher built out of smaller cryptographic building blocks. It basically covered almost any stream cipher mode and any attempt at combining more than one building block with XOR somewhere. Luckily, the guy never had the money to enforce it, although he might have earned some royalties from doing nothing.

Heck, Siemens used to have a German patent on the Internet, except that it wasn't called Internet but something like "making available of structured textual data representations via long-range data transmission", etc.


> As a mathematician

You don't need to be a mathematician to see that the current system of IP is completely contrary to the nature of the universe. It will be over soon.


I'm curious to know what gives you confidence that it will be over soon?


Partially in jest: once/if Chinese companies start to hold more patents than American companies (as in "value", not numbers), you will see the US attitude toward patents change.


You may be correct regarding American response. I'd suggest that alongside that, what we're likely to see is the Chinese government significantly tightening up on IP-rights enforcement because it would then be to their advantage to do so. So a nett result of greater IP-rights enforcement globally rather than the original optimistic/hopeful "it will be over soon" assessment.


That's quite optimistic.

I'd say that if they steal from the Chinese, they won't talk about it much (or say it was the other way around) and continue just the way it was in case they come up with something patentable again.


Did the British attitude to patents change when German companies started owning more patents than British companies?


Yes. For example, the British arranged to cancel all German patents after WW2. This led to, e.g., the development of the Japanese camera industry, among many other things.


I fail to see the parallels between Germany after World War II and China in 2019. China's patent situation is much closer to Germany in the late 19th century and the British didn't throw away their patent system in the 19th century just because Germany patented more.


Yes, when those patents started pointing guns at British interests :)


Not a patent example, but there's a reason "aspirin" is a generic term in the US.


There is nothing in the nature of the universe that's contrary to trade agreements among people, of which patents are one example. You could say that some agreements are unenforceable under current conditions or perhaps undesirable or even silly, but unless people agree to violate the laws of physics, the universe is agnostic to human contracts. As I wrote in another comment, patents do not protect some truth -- actually, their entire purpose was to help spread truths -- just applications (and human ones, not natural ones).


Contrary to the nature of the universe? That seems a few levels deeper than merely unworkable. I mean, a penny falling slower than a bowling ball would be contrary to the nature of the universe; I don't think patents contravene anything as primary as gravity.


You can't patent math theorems because they are not considered inventions but rather discoveries of logical laws of the universe. I treat algorithms in a similar manner.


Hyperbole | /hʌɪˈpəːbəli/ | noun

Exaggerated statements or claims not meant to be taken literally.

— The Oxford Dictionary


The current IP system is pretty decent... All time limits should just be adjusted to 1 year.

If I invent something, a 1 year headstart in the market should be plenty of reward.


Seems short, especially for low-yield IP like music. 10-20 years is probably good.


They are talking about patents here. It has nothing to do with copyright which concerns music and which is perfectly alright.

It certainly does not help that they use the ambiguous term IP that does not really mean anything.


If copyright only lasted a year, you can bet the industry would move to much shorter release cycles and still capture most of the value.


Actually research would move in the direction of things whose value can be captured within a year. The patent system shapes the IP industry as whole.


> It will be over soon

The universe or the patent system?


I am not super pro-patent, but I don't think it's totally wild to patent an algorithm.

Algorithms can be extremely hard to discover and extremely valuable. Together these facts are why the patent system can usefully apply, same as traditional patent classes.

And lots of traditional types of patents are "algorithms" too, really.

It's not that different from, say, patenting a drug. The algorithm to manufacture [insert expensive drug here] probably isn't that hard to follow, but it was very hard to discover, and we want to encourage more people to discover more algorithms like this.

The real issue is that the people in the court system can't understand the difference between trivial, straightforward, somewhat advanced, and truly sophisticated algorithms. Also, since you don't need to risk health and life to test them, computer algorithms are typically easier to test than drugs, so it's likely that few-to-none of them satisfy the "difficult to invent" standard that drug molecules do. It costs like a billion dollars to develop a drug; PageRank was made by a few guys in a room in a matter of months.

As another interesting point of comparison, a lot of old weird gun designs from the late 19th/early 20th century were made to get around patents. It's dumb to make a "blow-forward" pistol but there was a time when the straightforward "blow-back" system was patented. Whole countries used quite weird and suboptimal weapons for a long time simply to avoid patent entanglements. I'm sure to them it seemed similar - how can you patent putting these 2 pieces of metal together in this shape? These systems were not very complicated.


I totally cannot bend my mind around this. It is irrelevant if patents on algorithms are useful and good for society in some cases. They are so much self-evidently wrong that the point is moot. The fact that "trivial" algorithms can be patented is a different question that I am not addressing. I am talking only about the non-trivial algorithms that require a stroke of genius.

Imagine that Euclid's algorithm was patented? Or Pythagoras theorem? Or the definition of exterior derivative? Or the expression of a function as Fourier series? All of these are valuable ideas that are non-trivial to come by. The same argument for software patents applies to each of these cases, and in all these cases it is evident (to me) that these mathematical constructions cannot be patented, and that society cannot grant monopoly of their usage to their discoverers.


> Imagine that Euclid's algorithm was patented? Or Pythagoras theorem? Or the definition of exterior derivative? Or the expression of a function as Fourier series? All of these are valuable ideas that are non-trivial to come by. The same argument for software patents applies to each of these cases, and in all these cases it is evident (to me) that these mathematical constructions cannot be patented, and that society cannot grant monopoly of their usage to their discoverers.

You are forgetting that patents are only granted for limited time. If we applied the current system, those things would be patented only for 20 years after their discovery. Would it still be unimaginable?

Maybe your opinion is still the same but the situation is not as clear-cut as it seems. For example, I would guess that if patents were cancelled the industrial budgets for research would somewhat decrease, because there would be smaller advantage in figuring things out first.


Patents should be understood as an alternative to trade secret.

The original reason for patents is enabling more information to become public. When the patent system did not exist, the only way to protect inventions was to keep them secret. A patent is a way to separate information from the economic use of the information. Without patents you must protect the information if you want exclusivity.

Imagine that some extremely important and new algorithm is not patented but kept a secret instead. Today it's possible to provide software as a service and keep the algorithm within protected servers.

> cannot grant monopoly of their usage to their discoverers.

The current patent system is too restrictive and should be reformed. Instead of a monopoly, it would be better to have mandatory licensing for algorithms for a short period, for example 5 years, with some fancy auctioning system to discover the correct price.


Patents on algorithms are bad because any sane person would feel revulsion at the thought of them? Is that the content of your argument?

If Euclid's algorithm was patented in a system similar to our modern one, he would have... cornered the market in ancient Greek cryptographic protocols for 20 years or so before other ancient mathematicians were allowed to join in?


>Imagine that Euclid's algorithm was patented? Or Pythagoras theorem? Or the definition of exterior derivative? Or the expression of a function as Fourier series?

This is an interesting point and I'm not sure it cuts the way you want it to.

The world had to wait thousands of years for each of these algorithms to be developed. There are broad reasons for this, but one of them was probably the fact that nobody had any incentive to develop them. Mankind literally had to wait around for some random nerd to come up with these in his spare time.

Now imagine that they were patentable and exploitable for money (for 20 years). Perhaps they would have been developed centuries or millennia earlier. You would have teams of mathematicians from 1000BC onward, working to try to make money inventing theorems and machines and medicines.

20 years under patent seems like a small cost to pay to have something invented possibly thousands of years earlier (or, not kept under trade secret, as e.g. Damascus steel was).


While I understand your point of view (though I believe patents that last for a couple decades wouldn’t have ruined mathematics), the strongest point, at least to me, of the argument you’re responding to, was drugs.

Drugs, according to the comment, are like an algorithm, with a simple set of inputs— hard to discover these inputs, but then easy to replicate. Yet we seem to feel differently about patenting drugs?

Do you have a response to that part?


> Imagine that Euclid's algorithm was patented...

As a mathematician with a patented algorithm / open source advocate... I'm extremely squeamish about this and not super comfortable with it. On the other hand... Cardano's formula was once a trade secret. He's dead and we know the trick now.

As others have pointed out, patents expire in 20 years or less. Nothing is stopping you from building on somebody else's patent (popular among trolls: a patent on applying your patent in a novel area, or a more optimized version, a dual version, etc, of your patent). In that situation you can't use your own patent without paying them... and neither can they use yours. Implementation might be a problem, but you can still prove your theorems.


Euclid and Pythagoras are not, and cannot be, patented. So it seems somewhat illogical to argue against patents with the hypothetical of those algorithms being patented.


I wonder if some real algorithms were ever patented? I mean, just read Knuth, there are brilliant algorithms there that I would never invent myself. Various sorting algorithms, graph algorithms and so on. They are a foundation for our computing. But I never heard that quick sort was ever patented. Why is that?


> But I never heard that quick sort was ever patented. Why is that?

Quick sort is from a time before the USA decided that algorithms could be patented. In fact, most foundational algorithms of computer science are from that time. IMO, the development of computer science would have been hampered for a couple of decades had it been possible to patent algorithms back then; we're very lucky that this wasn't the case.


Latent Semantic Indexing (LSI), used in Information Retrieval systems, is one example of a patented algorithm.

https://en.wikipedia.org/wiki/Latent_semantic_analysis

http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=H...


There are a lot of algorithms that are patented that stop people doing stuff, e.g. in cryptography, MP3, erasure codes, etc.


Example: the Lempel-Ziv-Welch algorithm, used in GIF files, was patented by Unisys.
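For readers who haven't seen it, LZW itself is short. Below is a minimal textbook sketch, assuming a byte-oriented starting dictionary (codes 0-255) and unbounded integer output codes. It is not the GIF variant, which adds variable code widths, clear codes and a dictionary size limit, and the function names are just illustrative.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Textbook LZW: emit integer codes, growing the dictionary as we go."""
    # Start with one entry per possible byte value (codes 0-255).
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            # Keep extending the longest match seen so far.
            current = candidate
        else:
            # Emit the code for the longest known prefix, then learn the new string.
            codes.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the same dictionary on the fly from the code stream."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Special case: the code refers to the entry that is being built right now.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        output.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)
```

On the classic input b"TOBEORNOTTOBEORTOBEORNOT" this emits 16 codes for 24 input bytes, and the decoder never needs the dictionary to be transmitted.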


"Algorithms can be extremely hard to discover"

Can be, most aren't. Most are specializations of a well known thing, applied to a new area. Page rank is (maybe) a good example of this. Not quite old enough to transfer myself back to the time, but diffusion on graphs and power methods have been around for a while.
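To make the "power methods" point concrete, here is a minimal PageRank sketch by power iteration over an adjacency list. It assumes the commonly quoted damping factor of 0.85 and spreads the rank held by dangling pages (no out-links) evenly over all pages; the function name and parameters are illustrative, not taken from the patent. Each pass touches every node and edge once, so one iteration costs O(V + E) time and O(V) extra space.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 1000) -> dict[str, float]:
    """Power-iteration PageRank over an adjacency list (page -> pages it links to)."""
    # Collect every page, including ones that only appear as link targets.
    all_pages = set(links)
    for targets in links.values():
        all_pages.update(targets)
    nodes = sorted(all_pages)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Rank held by dangling pages is redistributed evenly over all pages.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each page splits its current rank equally among its out-links.
        for v in nodes:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
        # Stop once the L1 change between iterations is negligible.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank
```

For example, with {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}, page "c" ends up with the highest score, while "d", which nothing links to, keeps only the random-jump share of 0.15 / 4.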


I run into this scenario a lot as a freelance developer.

Companies often request that any code you write that's a part of their contract work must not leave that project and they own all IP rights to every line of code you deliver.

Which isn't reasonable most of the time because technically if you imported a library from Python's standard library you can't import that library again in another project without violating the contract. Or if you pasted in a snippet of code from a library's documentation, since you did it on their time, you can't use it again on another project.

I always do my best to get things reworded to differentiate generic code from business / trade secrets code, and then for the generic code they either license the code from me as MIT or I license the code from them as MIT.


Would you, as a mathematician, prefer a world where innovative theorems become covered in secrecy if it means such things couldn't be patented anymore?


I do not understand what you are talking about.

In our world, you cannot patent theorems but people publish them anyway. Why? for the honor of human spirit, or to impress chicks, whatever.


> In our world, you cannot patent theorems but people publish them anyway. Why?

You'll never have enough information to know what theorems went unpublished. You can only speak to the probability based on incentives. Public institutions and universities may be incentivized to publish theorems, but that same incentive doesn't exist in corporations. A valuable money-making theorem that could give competitors an advantage would never be published unless a patent could be formed using the technique that circumvents the limitations by including other processes.

Even with the existing patent system, trade secrets still exist: WD-40, Coca-Cola, Twinkies...etc. When it comes to money, the default is to preserve.


I wouldn't take it for granted that people publish. Many mathematicians work on Wall Street, and they would never publish something they came up with unless they were 10000% certain the theorem isn't a commercial secret. Money wins all, especially in mathematics, half of which was invented to make counting money easier.


Yes, of course. Not all known theorems are published immediately. What is the problem with that?

Hiding a truth you have discovered is OK.

"Owning" a truth that you have told everyone is absurd.


Owning a truth for a limited period of time in return for telling everyone instead of keeping it a secret is not absurd. Society would rather you tell people so people can build on it (at least in noncommercial ways, or via license) while you own it than keep it secret, so we created the patent system. The alternative could easily be trade secrets until you die and never reveal it, and as a society we decided to incentivize you to not do that.


I think what he's referring to is that you have to publish your invention to get it patented. The idea of patents was basically to get a temporary monopoly for publishing the secret sauce so that anyone can use your invention after the protection period is over.


Companies have mostly stopped patenting computer algorithms outside of narrow areas where there are strong interoperability requirements, e.g. video codecs. Few algorithms are practically enforceable as patents.

I think you seriously overestimate the incentives to publish. In several computer science domains, and certainly the ones I work in, the academically published algorithms are often a decade or two behind the state-of-the-art that is never published. Valuable algorithm advances are often explicitly treated as trade secrets. As an equally common case, the inventor(s) simply have zero incentive, either personal or financial, to spend their time publishing it -- they did it to solve a problem, not to publish, so they prioritize spending time with their family etc. This disadvantages open source software, and academia spends much of its time reinventing algorithms already known in the industry. In my experience, surprisingly little hardcore algorithm R&D happens in academia, so any model of information dissemination that makes this assumption is going to be suboptimal.

As an example that is unrelated to my current work, I developed a set of novel algorithms that massively improved the efficient parallelization of graph traversals in 2009 -- a true step function in both scalability and throughput per CPU (I was working on supercomputers at the time). Fast forward a decade to 2019 and these algorithms still haven't shown up in academic literature even though people built systems based on them and they are superior to what is currently in the literature. In this case, the algorithms are not even secret. I've also learned some brilliant and as yet unpublished algorithms of unknown origin via these same oral traditions over the years. As a social dynamic, it feels inappropriate to publish an algorithm that you learned this way.

This is a challenging problem to solve. Companies spend serious money on algorithm R&D hoping to obtain a commercial advantage. Outside of that, publishing is often an unattractive use of one's personal time if you are not an academic. This reality disservices the software community at large and I'd like to arrive at a better solution, even though the current reality benefits me greatly as an insider who sees loads of amazing, unpublished computer science.


I think the only sane answer would be a clearcut Yes, but the question is also highly misleading by insinuating that software patents actually cover innovative theorems. The vast majority of them don't.

Austin Meyer, the maker of the X-Plane flight simulator, was sued for millions of dollars because his app used an in-app purchase option made available by an existing third-party SDK.

There is a whole industry of inventing bogus software patents that take utterly trivial everyday business processes like giving someone money for a product, dress them up in fancy lawyer talk and general descriptions of non-existent "systems" or "methods" without any example or prototype, and then sue people over it. In-app purchases? Better pay up! You have a fax machine or scanner? Pay! Storing business data on some electronic device for later retrieval? Better pay an arbitrarily high patent license fee!

It's absurd.


That is a business method patent, not an algorithm patent. They are completely unrelated.


I also gave an example of an equally silly software patent. Besides, I would not call different sorts of patents "completely unrelated". All of them have trolls, have been granted for silly and trivial things, and the system is completely broken. It's bizarre enough that there are judges and lawyers who apparently think that "business logic" and "algorithms", as well as "mathematics" and "logic", are different entities that can be distinguished from each other...


The majority of patents are for things that are essentially obvious and would have been replicated if not publicly released. If you put a bit of hardware or software out then people will reverse engineer it and know how it works pretty soon. If you don't put it out then it will be difficult to get any benefit from it.


False dilemma.

There are multiple alternatives.


Ah, but you’re not patenting the theorem!

Remember what happens when drug companies re-discover a useful molecule that turns out to have “prior art” as a molecule that people have been consuming in some every-day herb or some-such. They can’t patent that molecule (because they didn’t invent it), and there’s no profit motive to go through FDA compliance for a non-patented drug.

But, there’s nothing stopping them from finding the “theorem” behind the “algorithm”—figuring out what it is about the molecule’s structure that makes it have the effect it has—and then discovering another molecule in the same class (another “algorithm” constructively proving the same “theorem”), and then patenting that.

Same is true for actual algorithms: if PageRank is patented, I can just look at the theorem behind it—efficient eigenvalue derivation—and then come up with a different constructive proof of it. It’s easy once you know it’s possible. And, because there are so many known isomorphisms between algorithms (e.g. between algorithms on pointer machines vs. RAM word machines) there are often “obvious” transformations of the algorithm that aren’t considered the same algorithm from the patent office’s POV. (And, I mean, technically they aren’t; they might have a very slightly higher time-complexity, by a factor of the inverse Ackermann function or something. But these are things that don’t matter in practice, just like the random extra bits that the drug companies tack onto their molecules don’t matter in practice.)
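As an illustration of "same theorem, different algorithm": the PageRank vector r is also the unique solution of the linear system (I - d*M) r = ((1 - d) / n) * 1, where M is the column-stochastic link matrix and d the damping factor, so it can be computed by plain Gaussian elimination with no iteration at all. A sketch, assuming d = 0.85 and treating dangling pages as linking to every page; the function name is hypothetical, and this says nothing about what any patent does or doesn't cover.

```python
def pagerank_linear_solve(links: dict[str, list[str]], damping: float = 0.85) -> dict[str, float]:
    """PageRank as the solution of (I - d*M) r = (1 - d)/n * 1, via Gaussian elimination."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    index = {v: i for i, v in enumerate(nodes)}
    # Column-stochastic link matrix: m[i][j] = chance of following a link from page j to page i.
    m = [[0.0] * n for _ in range(n)]
    for v in nodes:
        j = index[v]
        targets = links.get(v, [])
        if targets:
            for t in targets:
                m[index[t]][j] += 1.0 / len(targets)
        else:
            # Dangling page: treat it as linking to every page (including itself).
            for i in range(n):
                m[i][j] = 1.0 / n
    # Augmented matrix [I - d*M | (1 - d)/n].
    a = [[(1.0 if i == j else 0.0) - damping * m[i][j] for j in range(n)] + [(1.0 - damping) / n]
         for i in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for c in range(col, n + 1):
                a[row][c] -= factor * a[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (a[i][n] - sum(a[i][j] * x[j] for j in range(i + 1, n))) / a[i][i]
    return dict(zip(nodes, x))
```

Dense elimination is O(n^3), so this only makes sense for small graphs; the point is that a structurally different procedure lands on the same vector that power iteration converges to.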


See: Johnson & Johnson/Janssen Pharmaceutica esketamine patent. [0]

The FDA just approved the use of the drug, [1] which is basically just one of the two mirror-image halves of ketamine (regular ketamine is a racemic mixture of the R(-) and S(+) isomers; esketamine is only the S(+) one), [2] and J&J is selling it for HUNDREDS of dollars per treatment, while regular Ketalar brand ketamine is damn-near as cheap as saline.

[0] https://patents.justia.com/patent/20140093592

[1] https://www.npr.org/sections/health-shots/2019/03/05/7005099...

[2] Arketamine (R(-) - isomer) https://upload.wikimedia.org/wikipedia/commons/thumb/d/df/Ar... Esketamine (S(+) - isomer) https://upload.wikimedia.org/wikipedia/commons/thumb/a/a5/Es...


Whether or not an invention is really covered by a current patent is not the issue; the issue is that patent holders can keep anyone who tries to exploit an invention in perpetual legal jeopardy, by continually filing patents on _related_ inventions.


I have helped patent my work for my company in exchange for bonuses, and it's a source of personal shame even though most people I tell this to seem to think it's very honorable.


It makes a bit more sense if, like the patent office, you view it through the lens of "subject matter directly reducible to an electronic circuit design". Algorithms that meet this criterion are, by necessity, patentable in every country that allows patenting of electronic circuit designs. Most theorems in mathematics are not like this.

(This is a deep rabbit hole. All mechanical patents have the same theoretical equivalence to computer algorithms, though less obvious. The lines that define patentable subject matter do not have a rigorous basis, it is an arbitrary convention.)

This is also why so-called business method patents (i.e. doing X, but on the Internet), which are often improperly conflated with algorithm patents under the rubric of "software patents", are generally not patentable.


Every patent and even every technology is basically an algorithm. And they are valuable because they are true (not sure how mathematical truth is different from any other type of truth).


Agreed! Also, there is a lot of hard work and engineering in offering the solution as well... Just knowing the algorithm won't help.


I think I get what you're saying. It's like patenting the theory of gravity. You're just describing some intrinsic logical fact about this universe.


There's a lot of discussion I see about PageRank for CS purposes, so I'd like to give a slightly different perspective most people haven't heard about.

A colleague of mine at Caltech recently used the PageRank algorithm to model the evolution of in-vivo neural networks [0, page 8]. The concept is pretty good for modeling extensions of simple Hebbian learning and explaining some of the ensemble dynamics (at least in the hippocampus). If you're interested in further reading, there's more work on attractor states in biological learning and associative memory (some of which is cited in the paper).

Brief overview: there's debate on whether brain network dynamics are stable or unstable. In networks related to learning and over the timespan of weeks, this experiment observed that ensemble-level dynamics change during learning but some neurons are remarkably stable in how/when they fire. You can rank stability with methods like PageRank, suggesting that connectivity implies importance (and perhaps stability).

[0] https://www.biorxiv.org/content/biorxiv/early/2019/03/07/559...


Are you sure it is PageRank specifically? These fof maps were used and investigated as far back as the early 1900s.

Search for "the page rank book" for a deeper analysis.


He used a customized version of the PageRank algorithm that comes with MATLAB. The paper is pretty dense with other interesting observations. I think one of the novel contributions is experimental evidence that centrality is important to learning distinct sequences of behaviors.

Consider three behaviors: moving right on a linear track, turning around, and moving left. A reinforcement signal could be sugar water. If you place sugar water at the right end of the track, the mouse learns the sequence of moving right, drinking, turning around, and moving left to leave. Now, in the hippocampus different ensembles of time and place neurons are correlated with each distinct activity. The inter-ensemble connectivity has to be learned, as the sequence of actions becomes correlated not just in behavior but in the brain as well. The neurons that most strongly exhibit inter-ensemble connectivity tend to be those 'stable' and 'important' anchor neurons.

Yes, it has been studied before that some neurons are more important than others. But the critical extension here is to long-term stability and learning on an unprecedented time-scale and quantity of simultaneously recorded neurons. The actual microscope itself is a significant technological advancement in how it stays robust to long- and short-term motion artifacts, and was custom designed and built by the first author.

There's also the experiment done on dynamics after traumatic damage to the neural circuits encoding a learned behavior (this is one of the specialties of the Lois group). But that starts deviating from the observation I wanted to make relating to the thread.

The conclusion: "Overall, our findings suggest a model where the patterns of activity of individual neurons gradually change over time while the activity of groups of synchronously active neurons ensures the persistence of representations."

Seems fairly innocuous, but I'm almost certain it will be controversial to some parties.


For extra context: back then you would manually submit your site to Yahoo and DMOZ to end up in their results. They saw themselves as directories.

Google was all about crawling and building up the biggest dataset going.

Both approaches were victim to keyword stuffing (lots of keywords at the bottom of the page and if you were lucky it was in a marquee tag).

PageRank was a pretty decent extra signal alongside the relevance score, used to promote trustworthy sites. However, there were similar techniques, like hubs and authorities (HITS) from Kleinberg.
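For comparison, Kleinberg's hubs and authorities can be sketched in a few lines: a page's authority score is the sum of the hub scores of the pages linking to it, its hub score is the sum of the authority scores of the pages it links to, and both vectors are renormalized after every round. This is a simplified sketch assuming a fixed iteration count and Euclidean normalization; the original algorithm first builds a query-specific subgraph, which is omitted here, and the function name is illustrative.

```python
import math


def hits(links: dict[str, list[str]], iterations: int = 100) -> tuple[dict[str, float], dict[str, float]]:
    """Hubs and authorities via alternating power iteration; returns (hub, authority)."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # A good authority is pointed to by good hubs.
        auth = {v: 0.0 for v in nodes}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # A good hub points to good authorities.
        hub = {v: sum(auth[t] for t in links.get(v, [])) for v in nodes}
        # Normalize both vectors (Euclidean norm) so the scores don't grow without bound.
        for scores in (auth, hub):
            norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
            for v in scores:
                scores[v] /= norm
    return hub, auth
```

With {"portal1": ["news", "wiki"], "portal2": ["news", "wiki", "blog"], "blog": ["news"]}, "news" gets the top authority score and "portal2" the top hub score. Unlike PageRank, each page gets two scores, which is part of why the two approaches behave differently.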

On a side note his old research students / postdocs ended up leading key initiatives at FB newsfeed and Pinterest discovery.


Not quite: manually assembled search directories like the old Yahoo! were already passé before Google came out. The increasing size of the Web and the arrival of AltaVista had already made automatically-indexed search engines an established thing. The problem was simply that the AltaVista results were overwhelmed with spam. Early on Google search results were very well-ranked, but quite narrow since the engine only crawled and indexed a relatively small proportion of the Web. AltaVista's coverage was much better, iirc, and Google's limited scope was often remarked on in places like Slashdot.


AltaVista and others besides were already crawling the web.

Yahoo and Dmoz were curated but Google definitely wasn't the first crawler.

As for the approaches being victim to keyword stuffing: that was because the algorithms used were exclusively 'on page' without assigning a value to links.


Yahoo's content was mostly US-based, if I remember correctly. The reason I switched to Google from AltaVista was the reduced ads and clean look of the page. The results were about the same back then.


IIRC speed played a huge role too in part because of the cleaner page, but also in part because of whatever voodoo they used to deliver results.


The use of 'near' queries on AltaVista removed the spam, and had they done some basic query rewriting it would have cleaned much of the spam up. Spam never affected me on AltaVista.


Yahoo's directory and dmoz seem orthogonal to Google. Not even in 1994, long before Google and when aptly-named WebCrawler was all the rage, would one turn to Yahoo (dmoz did not exist at the time) to find what they were looking for if they had something specific to look for. Yahoo served as a place to go when you weren't looking for anything in particular and wanted to simply explore the web.

There was a time when Yahoo also tried to get into the indexed search space, but never seemed to be a viable competitor against other players in the market. Once Google established their dominance, all bets were completely off.


> his old research students

Who’s the subject of this sentence? I didn’t know PageRank had academic descendants.

What’s the current research on PageRank like? I looked at Sergey Brin’s academic page a couple months back and was surprised that people still work on nearest neighbors now and back then.


Give Networks, Crowds, and Markets a glance. It's a fun book. https://www.cs.cornell.edu/home/kleinber/networks-book/netwo...

But that's from Kleinberg, the person I was referring to :-)

After that there is personalised PageRank / SALSA, which are probably the more widely known approaches to identifying trustworthy nodes in a graph.
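A minimal sketch of personalised PageRank: it is ordinary PageRank except that the random jump (and, in this sketch, the rank stuck on dangling pages) goes only to a chosen set of trusted seed pages, so pages unreachable from the seeds score exactly zero. The seed set, the 0.85 damping factor and the fixed iteration count are assumptions for illustration, the function name is hypothetical, and SALSA is not shown.

```python
def personalized_pagerank(links: dict[str, list[str]], seeds: set[str],
                          damping: float = 0.85, iterations: int = 200) -> dict[str, float]:
    """PageRank whose random jump lands only on the (non-empty) set of seed pages."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    # Teleport distribution: uniform over the trusted seeds instead of over all pages.
    teleport = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        # Rank held by dangling pages is also sent back to the seeds (one common convention).
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1.0 - damping + damping * dangling) * teleport[v] for v in nodes}
        # Follow links exactly as in ordinary PageRank.
        for v in nodes:
            targets = links.get(v, [])
            for t in targets:
                new_rank[t] += damping * rank[v] / len(targets)
        rank = new_rank
    return rank
```

For example, with seeds={"seed"} and a pair of spam pages that only link to each other and to a legitimate page, the spam pages stay at 0.0 no matter how many links they exchange, which is the intuition behind using it for trust.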


To my knowledge, PR has long been succeeded by more sophisticated models. At one time, perhaps even now, Google had a team of mathematicians constantly tuning its ranking algorithm.

PR itself might be the foundation, but it definitely wouldn't be enough to build another Google scale system.


Links are still the single most important factor for ranking, though. I mean, there's a lot of other stuff going on, and the content/information extraction has advanced massively since the early days, but PR (or a similar concept) seems to still be the biggest part of determining how relevant a page is.


At least the future is not links. With the move towards reactive JS-based clients and RESTful API-based services, links hardly carry the requisite data for indexing. Even page content is extremely fickle to index.

Google must have a JS engine as part of its web indexing process.


They do, and have been using it for quite some time. Not only will they execute JS, they will also do AJAX requests and index the content that is returned.

Since the overwhelming majority of the web is still static, I believe links will be fine for the foreseeable future.


How do you know this?


It's not a secret, they've never said that it changed.

Also, I do a lot of work for SEO/Affiliates. The exact same tech with pretty much the same content ranks top 2 for highly competitive keywords when hosted on a very high PR domain, and top 100-1000 when hosted on a normal, on-topic domain. This is consistent over multiple content areas, and it's why the large media companies have begun to sell/rent folders/subdomains on their site. Instant ranking (until somebody else buys access to a larger media company), no risk, as Google doesn't consider "paid publishing" as against their guidelines.


So in other words: you don't know.


It's a unique milestone to see something so integral to the development of the internet as we now interact with it become expired. It's also telling that we might need to rethink the duration of the patent system as a whole because so much can change within 20 years.


It just makes me feel old, like the internet is last generation's stuff now.


It is. Aside from looking up information and ordering things, the net is almost completely useless to me now. Nothing is discoverable any more.


The Internet does not feel much like a wild place where you might encounter anything anymore.

On the other hand, reality itself is much faster paced because of the Internet.

You can go into the world see, try, learn, experience, be hurt, recover and do it all again at a rate never before possible.

So real discovery, discovery of what it is to be alive is more accessible than ever.


But it is no longer that useful anyway. It is probably of more value to Google as a legend for recruiting purposes than as an important technological asset or advantage.


Cue all the whiteboard interviewers: “implement the page rank algorithm, please.”


Followed by “what is the time and space complexity?”


tangent: if anyone wants to read a good non-fiction book about the history of steam, invention, the patent system, etc -- i can recommend William Rosen's book "The Most Powerful Idea in the World: A Story of Steam, Industry, and Invention"


It's too broad to say all algorithms should not be patentable. Amazon's one-click buy and PageRank are obvious to "those skilled in the art", but there are plenty of algorithms that should be patentable. The problem is that the USPTO doesn't properly get an expert consensus on obviousness.


I bet the purpose of this patent was to check the investor box of “yes, we have a patent”; they’ve never had to enforce it, and competitors were not really discouraged by it. Thoughts?


Patents can be defensive. Someone else could have patented it and used it against them. Also (subjective opinion), they were kids: they thought patenting it was important, just as they thought asking Yahoo for a mere $1M to buy it was a great deal.


Slightly off topic, but has anyone seen a drop in traffic even with the same rankings? Too many of the keywords I used to rank for now have featured snippets. These snippets basically mean that any result after #2 gets little to no traffic.

It used to be that beyond page 1 was a waste, but I reckon now if you're not #1 or the featured snippet, you don't really stand to gain much in terms of traffic


Would be a difficult patent to enforce against a competitor since it would be difficult to tell if a backend is actually implementing this but I suppose it's in their interest to still file for it.


PageRank, MapReduce... those famous tools that were once game-changing. But now I think there are many better ones, just less well known.


Pretty amazing that you can patent an idea as simple as multiplying a matrix by itself N times.


I realize that not everyone understands how patents work but this is ridiculous. The patent claims don't mention matrices. Any implementation (like matrices) is merely an embodiment (you can implement the patent without matrices).

And even if that weren't true, the foundation of the patent system is applying existing techniques to new applications. The background section of the patent clearly details how this technique has been used in other applications.

I don't dispute that a lot of patents are trash but this is possibly the most important patent of the last 30 years. That doesn't mean it invented computers, mathematics and the internet, it just put some already good ideas together. That is what invention is.


I realize that not everyone understands how matrices work, but this is ridiculous. The abstract of the patent:

"... the rank of a document is calculated from a constant representing the probability that a browser through the database will randomly jump to the document."

And the specific claim is:

"Looked at another way, the importance of a page is directly related to the steady-state probability that a random web surfer ends up at the page after following a large number of links."

This _explicitly_ describes a Markov Chain, which is naturally represented by a matrix. A variety of versions of the linear equation are explicitly given in the patent.

To claim you can implement the patent without matrices is, for all intents and purposes, wrong. You can implement the same equations in a variety of ways, but they are still matrix equations.

They patented the idea to apply random walks to ranking webpages. That's arguably reasonably novel, though Wikipedia lists a number of predecessors. But it was also an inevitable invention, because there is a large number of people familiar with Random Walks/Markov Processes, they are routinely taught to undergrads, and are used to model and analyse a vast number of processes [1].
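To make the Markov-chain reading concrete, here is a minimal sketch (not the patent's method, just the textbook random-surfer model). It builds the dense transition matrix for a made-up four-page graph and repeatedly multiplies a probability vector by it (power iteration). The damping factor d = 0.85 is a conventional choice, the value reported in Brin and Page's original paper, and not something taken from the patent claims. Sending dangling pages (pages with no outlinks) to a uniformly random page is one common convention among several.

```python
# Sketch: PageRank as the stationary distribution of a random-surfer Markov chain.
# Assumptions (not from the patent): damping d = 0.85, uniform jump from dangling pages.

def google_matrix(links, n, d=0.85):
    """Column-stochastic matrix: G[i][j] = probability of moving from page j to page i."""
    G = [[(1 - d) / n for _ in range(n)] for _ in range(n)]  # random jump to any page
    for j in range(n):
        outs = links.get(j, [])
        if outs:
            for i in outs:
                G[i][j] += d / len(outs)  # follow one of j's outlinks uniformly
        else:
            for i in range(n):
                G[i][j] += d / n  # dangling page: jump uniformly instead
    return G


def power_iteration(G, iters=100):
    """Repeatedly multiply a probability vector by G; it converges to the steady state."""
    n = len(G)
    r = [1.0 / n] * n
    for _ in range(iters):
        r = [sum(G[i][j] * r[j] for j in range(n)) for i in range(n)]
    return r


# Hypothetical web graph: page -> list of pages it links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
ranks = power_iteration(google_matrix(links, 4))
print([round(x, 3) for x in ranks])
```

Page 2, which three pages link to, ends up with the highest rank; page 3, which nothing links to, keeps only the (1 - d)/n random-jump probability.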

[1] https://en.wikipedia.org/wiki/Markov_chain#Applications


What you quoted is not what the patent “claims”. The claims are the numbered points in the section “Claims”. They are the only legally enforced section of a patent and they are written in a very specific language. Everything else is background or embodiment and has little to no legal value.


> but this is possibly the most important patent of the last 30 years

It is an interesting struggle to figure out what my objection to that point is. I think it is that we know exactly how hard it is to apply linear algebra to a problem - not everyone's cup of tea, but easily 10% of software engineers would be able to do it.

The truly groundbreaking part of Google was never the indexing - that was a problem that was going to get solved one way or another. The groundbreaking part is figuring out that search + low latency + advertising is a money printing machine and that tech favours the winner.

The mechanism to achieve search + low latency + advertising is important but to some degree unimpressive. If the other search engines at the time had realised the payoffs and how important latency was they'd have gone short text-only ads too and put more engineering time into the problem - maybe someone other than Google would be the search engine of the day.

And even if PageRank was the difference between Google and a hypothetical runner up, the difference of a better algorithm would be marginal. Decisive, but ultimately marginal.


Back then Google loaded in a few seconds, whilst the competition could take over a minute for the 99% of users who were on 28.8k dial-up.

It's almost as if the people behind Lycos/Excite/Altavista were all using the internet via a T1 connection from their unis..


I think a big factor is that Google didn’t have to make any money and could keep their homepage as simple as possible.


Altavista was actually just a tech demo of their servers :)


And which large organizations was Google connected to from the start that gave it a competitive advantage in multiple areas?


Stationary distributions (which is what PageRank is) were used for ranking the relevance of scientific references at least 20 years before the PageRank patent. I was sitting in a lecture in '95 or so about the Perron-Frobenius theorem, and this was given as an (old but not very old) example of an application at the time.


Sure, but the novelty of the patent is The Random Surfer Model. That is, applying that math to the ranking of web pages. The novelty is looking at the problem from the right perspective. After you have read the paper, and seen it demonstrated to work, the "invention" is very obvious. But before that, it really isn't.


That lecture I was in described "The random waterfall model", in which you find a scientific paper, randomly pick one of the references, go to it, and continue -- and IIRC, at a small percentage, jump to a random other paper. The professor was not describing his own work, but one that was published a decade or three before.

As far as I can tell (and could tell the day I heard PageRank described a few years later), there was no difference between that and PageRank, although there is a huge practical difference in that scientific papers can only ever refer to those that were published before them (or at least were in preparation at the time), whereas web pages are edited and can point to any other web page.

The "reference rank" application is not a DAG because of the "in preparation" links, although it is not far from one, so the "jump to random paper" step is much more important for producing a useful stationary distribution than it is in PageRank. Otherwise it is the same.
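Why the random jump matters on a nearly acyclic citation graph can be illustrated with a toy example. The four-paper graph below is hypothetical, and treating a paper with no references as a dead end where the reader stays put is an assumption chosen so that the no-jump case is visible. This is an illustration of the argument, not a reconstruction of the model from that lecture.

```python
# Sketch: why the random jump matters on a (nearly) acyclic citation graph.
# Hypothetical graph: paper -> papers it cites (older papers only).
cites = {1: [0], 2: [0, 1], 3: [1, 2]}  # paper 0 cites nothing


def reader_distribution(cites, n, d, iters=50):
    """Long-run share of time a reader spends on each paper.

    With probability d the reader follows a random reference; otherwise
    (probability 1 - d) they jump to a uniformly random paper. When a paper
    has no references, the "follow a reference" step keeps the reader where
    they are (an assumption, not part of any published model).
    """
    r = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1 - d) / n] * n  # random-jump share for every paper
        for src in range(n):
            refs = cites.get(src, [])
            if refs:
                for dst in refs:
                    nxt[dst] += d * r[src] / len(refs)
            else:
                nxt[src] += d * r[src]  # dead end: reader stays put
        r = nxt
    return r


no_jump = reader_distribution(cites, 4, d=1.0)     # all mass drains into paper 0
with_jump = reader_distribution(cites, 4, d=0.85)  # every paper keeps a positive score
print([round(x, 3) for x in no_jump])
print([round(x, 3) for x in with_jump])
```

Without the jump every reader eventually ends up at the oldest paper, which says nothing useful about papers 1 to 3. With the jump, rank still flows toward heavily cited papers, but every paper gets a score.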

Page and Brin did a lot of interesting things, many of which weren't trivial, and were hugely rewarded for that by society. But PageRank was an application of an old idea to new medium, not a new idea - in a way that (on its face) should not deserve patent protection.

I remember Google's first days. The main selling point for the majority of people I knew was not "it finds what I want when other search engines don't"; people had more or less learned to steer AltaVista properly. The selling point was "it gives an answer in milliseconds instead of tens of seconds". In fact, I remember complaints because it lacked the "and/or/not/near" operators and other features that AltaVista and Lycos already had.


You have summarized everything superbly.


> The patent claims don’t mention matrices

What? That’s the entire idea of the patent: using repeated matrix multiplication to compute the relative “importance” of various nodes in a directed graph.

> you can implement the patent without matrices

How?
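One way to read "without matrices", offered as an illustration and not as a position on what the patent claims cover, is to write the update as a loop over each page's outlinks, so that no matrix is ever built. Mathematically it is still a sparse matrix-vector product, which is the point of the question. The graph, d = 0.85, and the uniform handling of dangling pages are assumptions.

```python
# Sketch: the PageRank update written as a loop over adjacency lists.
# No matrix is stored; each iteration is implicitly a sparse matrix-vector product.

def pagerank_adjacency(links, n, d=0.85, iters=100):
    r = [1.0 / n] * n
    for _ in range(iters):
        # Random-jump share for every page; (1 - d) / n assumes the ranks sum to 1,
        # which this update preserves.
        nxt = [(1 - d) / n] * n
        dangling_mass = 0.0
        for src in range(n):
            outs = links.get(src, [])
            if outs:
                share = d * r[src] / len(outs)
                for dst in outs:  # push rank along each outlink
                    nxt[dst] += share
            else:
                dangling_mass += d * r[src]
        for i in range(n):  # pages without outlinks: spread their rank evenly
            nxt[i] += dangling_mass / n
        r = nxt
    return r


links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}  # hypothetical web graph
print([round(x, 3) for x in pagerank_adjacency(links, 4)])
```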


The idea sure was brilliant at the time. But I really doubt that we (as a society) have gained anything by allowing this to be protected. It might have prevented some healthy competition. Certainly anyone thinking about search engines (once this became a thing) would have thought of this. And I doubt that nobody would have bothered to create a search engine just because it couldn't be patented.


Well it seems much better when you compare it to patenting of rounded corners.


Or how about performing an action on the World Wide Web by clicking a button?

https://en.wikipedia.org/wiki/1-Click


To be fair, double-clicking was always the default for making a selection in a window... /s


It's how the US patent system works. Basically it's almost like copyright: you can patent virtually anything that hasn't been patented yet until challenged in court as either trivial or not original (i.e. prior art exists). Essentially the US patent system achieves two goals: shift the burden of verifications and validations to competitors and courts; and secondly, provide the US some advantage on the international level via the Patent Cooperation Treaty. Any crappy patent may turn out to be important, so just let them register everything and see which one "sticks".


Considering there were search engines before Google, the idea apparently wasn’t quite as obvious.


Simple does not necessarily imply obvious. The chain rule is another simple application of matrix multiplication that most people couldn’t have come up with independently.


Well some companies patented one click to order. Hard to beat.


It's the application to search that was patented. There were plenty of prior publications on eigenvector centrality.


If we’re talking about finding the steady / convergence state of a matrix, there are better ways than simple repeated matrix multiplication. One would be to solve a linear system of equations. The one PageRank uses if I recall correctly is the power method for eigenvalues.
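The linear-system alternative can be sketched for a toy graph. Let M be the column-stochastic transition matrix. The steady state then satisfies r = (1 - d)/n · 1 + d·M·r, which rearranges to (I - d·M) r = (1 - d)/n · 1, and Gaussian elimination solves that directly. A dense solve like this is only practical for small graphs, which is one reason iterative methods such as the power method are used at scale. The graph, d = 0.85, and treating dangling pages as linking to every page are assumptions.

```python
# Sketch: PageRank by solving (I - d*M) r = (1 - d)/n * 1 directly.
# Assumptions: d = 0.85, dangling pages are treated as linking to every page.

def transition_matrix(links, n):
    """Column-stochastic M: M[i][j] = 1/outdegree(j) if page j links to page i."""
    M = [[0.0] * n for _ in range(n)]
    for j in range(n):
        targets = links.get(j) or list(range(n))  # dangling: link everywhere
        for i in targets:
            M[i][j] += 1.0 / len(targets)
    return M


def gaussian_solve(A, b):
    """Solve A x = b with Gaussian elimination and partial pivoting."""
    n = len(A)
    aug = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(aug[k][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):  # back substitution
        tail = sum(aug[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def pagerank_linear(links, n, d=0.85):
    M = transition_matrix(links, n)
    A = [[(1.0 if i == j else 0.0) - d * M[i][j] for j in range(n)] for i in range(n)]
    b = [(1 - d) / n] * n
    return gaussian_solve(A, b)


links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}  # hypothetical web graph
print([round(x, 4) for x in pagerank_linear(links, 4)])
```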


applied exponentiation


person, the question actually is which matrix that is


Is there any PageRank library released on GitHub? Could ex-Googlers make such libraries and raise themselves to open-source fame?


It's been done many times, and it's easy to implement for small graphs.

Example: https://networkx.github.io/documentation/networkx-1.10/refer...


As mentioned, the difficulty with PageRank isn’t the implementation, but scaling it to billions of webpages. Doing so would require either extensive knowledge of systems or approximation methods.


There are plenty of implementations and how-tos out there already.

If I remember correctly it is basically a way to calculate a random walk through a network without having to do the walk.
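The "random walk without doing the walk" intuition can be checked empirically. The sketch below simulates a random surfer on a toy graph and counts visits. The resulting visit frequencies approach the steady-state ranks that power iteration or a linear solve computes without simulating anything. The graph, d = 0.85, the step count, and the seed are arbitrary choices.

```python
import random

# Sketch: simulate the random surfer and count where it spends its time.
# Assumptions: d = 0.85, dangling pages jump uniformly, fixed seed for repeatability.

def simulate_surfer(links, n, d=0.85, steps=200_000, seed=0):
    rng = random.Random(seed)
    visits = [0] * n
    page = rng.randrange(n)
    for _ in range(steps):
        visits[page] += 1
        outs = links.get(page, [])
        if outs and rng.random() < d:
            page = rng.choice(outs)  # follow a random outlink
        else:
            page = rng.randrange(n)  # random jump (always, on a dangling page)
    return [v / steps for v in visits]


links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}  # hypothetical web graph
print([round(x, 3) for x in simulate_surfer(links, 4)])
```

Solving this chain exactly gives roughly 0.373, 0.196, 0.394, and 0.0375, and the simulated frequencies should land close to those values. The simulation needs many steps to get there, which is why computing the distribution directly is the practical route.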


Apache Spark has had an implementation of it for a few years as part of its batteries included. https://spark.apache.org/docs/1.6.1/api/java/org/apache/spar...


A white page that would load fast when everyone had 56k modems was key.


They should have put some round ears on it...


Maybe they should revert back to PR to yield better Search results


The Google results started going bad for me around when Project Hummingbird came along. We need a distributed open data alternative that can be tweaked for your preferences transparently.

And we also need to start paying for the internet. The malvertising model we have now is unsustainable and crippling the network in favour of Facebook and Google.


> We need a distributed open data alternative that can be tweaked for your preferences transparently

I wonder if an organization like the ACLU could take up such a project and release a paid version for power users.


I would pay for a personalized search engine. I think if the results are ad free I'd pay 10-15 per year (not much since Google is decent).


What's stopping you from doing it?


So I am not the only one that get pretty bad search results compared to the ones I used to get. What caused this shift?


The problem with this is that your definition of 'good results' and Google's definition of 'good results' are somewhat in conflict since you have different goals.


The current algorithm includes PageRank as its component.


Turning off the reCAPTCHA harassment of people trying to search repeatedly with refined terms would be a good start. But hey, now you can type a query while piss drunk and Google will understand, progress!


So what now? Someone copies PageRank?


Nothing happens. Similar ranking systems are already in use and there's much more to a modern search engine anyway. It's just a small quirk of this very particular approach no longer being fully protected.


Nothing. Everybody has freely copied PageRank already. It has become one of the basic teaching materials in graph theory.

For example, NetworkX, a popular graph library in Python, implements PageRank. [1]

[1] https://networkx.github.io/documentation/networkx-1.10/refer...


Years back, graph DBs bragged about implementing PR in a few lines, and ecologists were using PR on species.

https://neo4j.com/docs/graph-algorithms/current/algorithms/p...

https://phys.org/news/2009-09-web-page-algorithm-critical-sp...


How does that work in the real world? Does Google simply not care, or is it "fair use" somehow?


The patent is only for web search, so other applications don't infringe the patent.


First sentence of the abstract of the patent:

"A method assigns importance ranks to nodes in a linked database, such as any database of documents containing citations, the world wide web or any other hypermedia database."


They'd only care if money gained - PR lost == profit.


We already have Open Pagerank based on the Common Crawl.


Maybe Stanford University no longer gets any license fees from Google for this patent. Not sure.


Hopefully everyone deprecates everything similar and we return to content-based search, where Pinterest and the rest freeze in the depths of hell.


> return to content based search

It'll need to be a pretty sophisticated version of "content based search", or it'll just be overwhelmed by keyword stuffing and garbage auto-generated content.


"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html

Edit: bad call. Detached from https://news.ycombinator.com/item?id=20067782 and marked off-topic.


Sorry dang, I respect your moderation a lot but I have to disagree here that my comment is a shallow dismissal that can’t teach anyone anything.

PageRank really is a very simple idea based on elementary linear algebra, a fact many people might not have known. Thus my comment could inspire a curious person to go read more about how PageRank works instead of fearing that there is a Ph.D worth of prerequisites.

Furthermore, it is a relevant comment on the US patent system.

By the way, I think PageRank is an incredibly important development in the history of technology, and took a fair bit of ingenuity to think up. I’m not dismissive of it at all. But it is also very simple, which doesn’t contradict any of the above. And I don’t think it should be patentable, just like I don’t think other simple and intuitive applications of matrix multiplication (like the chain rule from calculus, for example) should be patentable either.


I'm persuaded. Sorry!


Here's the raw truth that no one in the valley will admit.

It's about two things:

1. Execution.

2. Luck.

... but it's mostly execution.

If you have a BRILLIANT idea but can't execute you're dead.

There are tons of companies in the history books as examples.

BUT... if you can EXECUTE, you can take a shitty idea, abandon it, pivot, measure, and focus BACK on a good idea. Then when you have a good idea you can keep executing it forward.

The rest is luck, but this can be worked on, as luck is often timing + being prepared.


I don’t think people don’t admit it, it is just too often people think “they are special”.

Admittedly, that is a critical part of being a founder/entrepreneur, otherwise knowing just how “stacked against you” the real world is would make you go running for the hills.

That said, I absolutely agree it is always about execution, with luck.

But to echo someone else’s comment: luck really is about opportunity + preparedness/ability to recognize a true opportunity.

For an analogy on the luck part: in poker, everyone is dealt the same hands in the long run. It is about being able to read both the cards and opponents effectively to routinely “win”.

On the execution part, it is all about focus: learning to know what is important and what really isn't. Time spent on anything that doesn't really matter slows you down. Of course, the hard part is knowing what matters and what doesn't. My simple advice here is to ask yourself two questions: What am I not getting done by doing this? What is more important, this or what I am giving up?

One word of caution on the above: understand what is urgent, and what is important, and don’t let things that are important but not urgent slide.


I wonder where you got the idea that no one will admit this. Execution is the number one priority of founders of early stage companies: finding people who can quickly turn their idea into a working product they can iterate on.


I'm suspecting it's the 'luck' part the OP thinks people won't/don't admit (or acknowledge enough, perhaps).


Yes. And even in this thread people are minimizing it.


I strongly disagree that execution matters more than luck. I ask myself whether I would rather be good or lucky, and I will always choose luck. Because no matter how good you are, if luck isn't on your side you're going to lose. You might be the better programmer, but you still didn't get the job. You might have the better proposal, but you still lost the bid. You might be the better candidate, but you still lost the election. You might have a better product, but your company still failed. You might eat right and exercise, but you still got cancer.

I see this regularly play out when founders who found success on their first startups attempt to repeat it, only to fail miserably. Granted, there are a handful who manage to duplicate their success, but I know that some of these are simply due to who you know from their first success (you could argue that's better execution, but the number of unicorns I know that wouldn't exist if it weren't for a lucky break from an investor previously befriended makes me think otherwise). I do not mean to belittle the hard work and brilliance of many successful founders, only to emphasize that luck was absolutely critical in nearly every success story.

Obviously it's better to have both, but pretending that you can outmaneuver the universe is an act of hubris. I think people pontificating that execution matters more than luck are too arrogant to realize how lucky they are, or want to believe it matters more because it's reassuring to believe that we are in control of our destinies.

There are things you can do to increase your exposure to luck, but it's ultimately something beyond your control. The world isn't meritocratic.


To get lucky is something different from being a person who "has luck". The latter doesn't exist. Unless you believe in supernatural things.

Everyone has the same chance to get lucky. You can position yourself to optimize your chances. The question remains if you have the necessary abilities to take advantage of it.


Luck is random. Nobody has luck, but the guy who won the lottery got lucky.


"I am a strong believer in luck, as I know that the harder you work, the more luck you will get."


I don’t remember if it was chess or tennis or something else: “The good player is always lucky.”


Agree on timing and execution. From my own founding experience, being early is the same as being wrong.

https://biblia.com/bible/niv/Eccles%209.11


You forgot timing, which is critically important.


How is timing different from luck? If you can predict the correct timing, it's just better execution. If you can't predict the correct timing, it's just luck.


Luck means you landed a big client because they just happened to be in litigation with a competitor, or the NYTimes wrote about you because you happened to run into a reporter at the airport, and that’s what cascaded you forwards.

Timing doesn’t have to be luck — it can come from a deep understanding of trends and knowing when the right time to strike is...


If you believe timing isn't luck, then isn't it just a subset of execution?


Execution is about how you solve the problem. Just because it’s solved well doesn’t mean the market is ready to receive it.


I feel like you're talking about executing on the product, which I actually consider to be significantly smaller than executing on the market. Bringing a product to market is almost always a larger task than creating the product in the first place. I consider both of those to be execution.


To me, execution is how you build the business. Timing would mean deciding not to do it at all for another 10 years, or realizing that you already missed the opportunity and giving up before starting, which seems like something completely different. I’m also far from the only one: google “execution luck timing” and read up on others who can perhaps differentiate better for you.


It's hard for me to say how much we agree and are simply not communicating properly, but my main point is that the people giving advice do not properly acknowledge how lucky they were to succeed.

https://xkcd.com/1827/

I largely consider execution to be what you can control, and luck to be what you can't. You can break those down much further, but I really feel those are the important categories.

Anyway, I guess agree to disagree. Cheers.


Success = Idea * Product * Team * Execution * Luck (where luck is any number between 0-10000)


Yeah, but you could say that about the entire universe. It's all execution and randomness.


Because they dont need it anymore.


Back to keywords in meta and h1 then? /s


That's still table stakes for SEO. I'm not sure being in <h1> matters, but it doesn't do any harm.


For better or worse, PageRank destroyed the value of inbound links, and by extension it killed the web. Now Google, not links, determines how your page is found, and while I'm sure links still have some value, they are only one of a very large number of inputs. I believe it is impossible to come up with a way of ranking web results that does not in the end destroy the metric used for ranking, if that metric initially gives great results. Spammers will figure it out and drown out the signal with noise.


a bit exaggerated, but sadly there's a core of truth to it.



