Hacker News

For extra context: back then you would manually submit your site to Yahoo and DMOZ to end up in their results. They saw themselves as directories.

Google was all about crawling and building up the biggest dataset going.

Both approaches fell victim to keyword stuffing (lots of keywords at the bottom of the page, and if you were lucky, in a marquee tag).

PageRank was a pretty decent extra signal on top of a relevance score, promoting trustworthy sites. However, there were similar techniques, such as hubs and authorities (HITS) from Kleinberg.
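For anyone who hasn't seen the two side by side, here's a minimal pure-Python sketch of PageRank (power iteration, assuming the usual damping factor of 0.85 and letting pages with no outlinks spread their rank evenly) next to Kleinberg's hubs and authorities. The four-page graph, the page names and the iteration counts are all made up for illustration; this is not how any production engine was implemented.

```python
import math

def pagerank(graph, damping=0.85, iters=100):
    """PageRank by power iteration. graph maps each page to its list of outlinks."""
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Every page first gets its share of the "random jump" probability.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # Pages with no outlinks spread their rank over every page.
            targets = graph[v] or nodes
            share = damping * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank

def hits(graph, iters=50):
    """Kleinberg's hubs and authorities: good hubs link to good authorities."""
    nodes = list(graph)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # Authority score: sum of hub scores of the pages linking in.
        auth = {v: 0.0 for v in nodes}
        for v in nodes:
            for w in graph[v]:
                auth[w] += hub[v]
        # Hub score: sum of authority scores of the pages linked to.
        hub = {v: sum(auth[w] for w in graph[v]) for v in nodes}
        # Normalise both vectors to unit length so the scores stay bounded.
        a_norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        h_norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        auth = {v: x / a_norm for v, x in auth.items()}
        hub = {v: x / h_norm for v, x in hub.items()}
    return hub, auth

# Made-up four-page web: "portal" links out a lot, "wiki" is linked to a lot.
web = {
    "portal": ["news", "shop", "wiki"],
    "news": ["wiki"],
    "shop": ["wiki"],
    "wiki": ["news"],
}

pr = pagerank(web)
hub, auth = hits(web)
print(max(pr, key=pr.get))                               # → wiki
print(max(hub, key=hub.get), max(auth, key=auth.get))    # → portal wiki
```

The difference in flavour: PageRank gives each page one query-independent score, while HITS splits it into two (pages that point to good stuff vs. pages that are pointed to) and was originally meant to run on a query-specific subgraph.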

On a side note, his former research students and postdocs ended up leading key initiatives at FB News Feed and Pinterest discovery.




Not quite: manually assembled search directories like the old Yahoo! were already passé before Google came out. The increasing size of the Web and the arrival of AltaVista had already made automatically indexed search engines an established thing. The problem was simply that AltaVista's results were overwhelmed with spam. Early on, Google's search results were very well ranked but quite narrow, since the engine only crawled and indexed a relatively small proportion of the Web. AltaVista's coverage was much better, IIRC, and Google's limited scope was often remarked on in places like Slashdot.


AltaVista and others besides were already crawling the web.

Yahoo and DMOZ were curated, but Google definitely wasn't the first crawler.

As for both approaches falling victim to keyword stuffing: that was because the algorithms used were exclusively "on-page", without assigning any value to links.
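To make the "exclusively on-page" point concrete, here's a deliberately naive scorer, a hypothetical sketch rather than any real engine's formula, that ranks a page by how much of its text matches the query. Nothing in it looks at links, so a page that just repeats the keyword wins easily.

```python
from collections import Counter

def on_page_score(query, page_text):
    """Hypothetical on-page relevance: share of page tokens that are query terms.
    Only the page's own text is used; links play no part."""
    tokens = page_text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[t] for t in query.lower().split()) / len(tokens)

honest = "a short review of hiking boots with notes on fit and price"
stuffed = "buy cheap stuff here " + "boots " * 50   # keyword-stuffed footer

print(on_page_score("boots", honest))    # 1/12, roughly 0.08
print(on_page_score("boots", stuffed))   # 50/54, roughly 0.93
```

Real engines used fancier term weighting, but as long as every input comes from the page itself, the page author controls the score, which is the gap link-based signals like PageRank were meant to close.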


Yahoo's content was mostly US-based, if I remember correctly. The reason I switched to Google from AltaVista was the reduced ads and clean look of the page. The results were about the same back then.


IIRC speed played a huge role too, partly because of the cleaner page, but also partly because of whatever voodoo they used to deliver results.


Using "near" queries on AltaVista removed the spam, and had they done some basic query rewriting, it would have cleaned up much of it. Spam never affected me on AltaVista.
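For those who never used it: a NEAR query only matched pages where the terms appeared close to each other, not just anywhere on the page. Below is a minimal sketch of that kind of proximity check. The 10-word window is an assumption on my part (I believe it was AltaVista's default, but don't quote me), and the function name and example pages are made up.

```python
import string

def near(text, a, b, window=10):
    """True if terms a and b occur within `window` words of each other.
    The default window of 10 is an assumed value, not a documented one."""
    tokens = [t.strip(string.punctuation).lower() for t in text.split()]
    pos_a = [i for i, t in enumerate(tokens) if t == a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == b.lower()]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)

real_page = "Cheap flights to Paris this spring."
keyword_dump = "cheap " + "casino loans pills " * 7 + "paris"

print(near(real_page, "cheap", "paris"))      # True: 3 words apart
print(near(keyword_dump, "cheap", "paris"))   # False: 22 words apart
```

A page that merely contains both words somewhere in a long keyword dump fails the check, which is roughly why proximity queries cut down on that kind of spam.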


Yahoo's directory and DMOZ seem orthogonal to Google. Even in 1994, long before Google, when the aptly named WebCrawler was all the rage, you wouldn't turn to Yahoo (DMOZ did not exist at the time) if you had something specific to look for. Yahoo served as a place to go when you weren't looking for anything in particular and simply wanted to explore the web.

There was a time when Yahoo also tried to get into the indexed search space, but it never seemed to be a viable competitor against the other players in the market. Once Google established its dominance, all bets were off.


> his old research students

Who’s the subject of this sentence? I didn’t know PageRank had academic descendants.

What’s the current research on PageRank like? I looked at Sergey Brin’s academic page a couple of months back and was surprised that people were working on nearest neighbors back then and still are now.


Give Networks, Crowds, and Markets a glance. It's a fun book. https://www.cs.cornell.edu/home/kleinber/networks-book/netwo...

But that's from Kleinberg, the person I was referring to :-)

After that, there are personalised PageRank and SALSA, which are probably the more widely known approaches to identifying trustworthy nodes in a graph.
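The trick in personalised PageRank is small: instead of the random surfer jumping to a uniformly random page, it always jumps back to a hand-picked seed set of trusted pages. Here's a minimal sketch; the graph, page names and seed set are made up, and sending dangling pages' rank back to the seeds is one common choice among several.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iters=100):
    """PageRank where the random jump always lands on a trusted seed page."""
    nodes = list(graph)
    teleport = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    rank = dict(teleport)
    for _ in range(iters):
        # The random-jump mass goes only to the seed pages.
        new = {v: (1.0 - damping) * teleport[v] for v in nodes}
        for v in nodes:
            # Dangling pages send their rank back to the seeds (an assumption).
            targets = graph[v] or list(seeds)
            share = damping * rank[v] / len(targets)
            for w in targets:
                new[w] += share
        rank = new
    return rank

web = {
    "university": ["library", "news"],
    "library": ["university"],
    "news": ["university", "library"],
    # Spam pages link to each other and even to a trusted page...
    "spam1": ["spam2", "university"],
    "spam2": ["spam1"],
}

scores = personalized_pagerank(web, seeds={"university"})
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page:10s} {score:.3f}")
# ...but no trusted page links to them, so spam1 and spam2 stay at 0.0.
```

Linking *to* trusted pages buys a spam cluster nothing; the only way to gain score is for the trusted side of the graph to link to you.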



