The dominating cost is not hiring smart people to work on the problem; it's hiring enough smart people to work on the problem.
Consider. Here at Microsoft, we employ a search relevance staff of -- what, 20% the size of Google's? And Google's engineers are rock solid. MS might be able to make up a small margin, but our search relevance staff -- good though they are -- cannot compete with a brilliant workforce that is 5 times larger than them. If we are going to actually compete with Google, the problem is not hiring smart people, it's hiring 5 times the current number of smart people employed for this task at MS.
Never mind buying such a team. MS can afford that. How do you even find that many people in 2-3 years?
When you consider the rest of the data, the case for building something like Google really begins to look grim. For example, how much does it really cost to build a search engine? We've poured at least tens of millions of dollars into just search relevance (I'm not even counting infrastructure). That's good mileage, considering that this way is littered with the corpses of companies like Cuil, who made investments in this area and failed miserably. Still, while this has been a great deal for us, we're still not there yet, and it's not clear we will be in the very near future. And so it's worth wondering: if MS can't buy something like that, realistically, who can? DDG? lol no.
In the end, this is the true dominating cost of building a search engine: people capital. Other bottlenecks, like engineering debt, politics, etc. pale against the sheer, awe-inspiring investment of Google in people capital.
I also reject this idea that Google's got a lock on every smart person ever. That they don't have any politics or wasted effort. That they're the snowflake they think they are. This kind of thinking is pathetically misguided, and a huge part of their marketing. Google is not perfect. They are staggeringly weak in the face of real competition (witness Facebook vs. G+ and Apple vs. Android).
Your argument is a replay of the prevailing attitude in the mid nineties regarding operating systems (from the same company, surprise!). And then some college student named Torvalds pushed out the most influential operating system ever, and did it for free, with the help of the world.
Nobody and nothing is invincible, and people who say that a situation is unassailable are always, categorically wrong, given enough time.
* MS is not a good example of a strong engineering org. Further, it is not a good example because Windows sucks.
* Google doesn't employ every smart person ever.
* Google can't actually compete. (ed note: in any field, or just against iOS and Fb?)
* My argument is like the argument that no one would supplant Windows.
* Given enough time, everything dies.
Points 2 and 3 are about Google. Let's start there. You're right that Google doesn't employ every smart person ever, but then, who said they did? :) But there are a limited number of search relevance engineers, and finding enough to keep up with Google is a monumental, and maybe insurmountable, feat. If you want to compete with Google, you will need to have a serious advantage that supersedes this. That is just a fact.
Point 3 is about competition. I'll grant you that G+ is no Facebook, but Android is the most widely adopted mobile OS on the planet, and by a huge margin -- iOS is basically not even competitive, except for the top 5% of the market. Further, a billion smartphones will be bought this year, most of which will be Internet-enabled Android devices, and most of which will be bought in developing countries by people coming onto the Internet for the first time. You tell me who's forward thinking there, because when that market comes online, it will be huge. The fact that you mention this as being non-competitive indicates to me that you might not know what you're talking about. :(
Point 1 is about the MS org. What can I say, OP, I work here, so maybe I'm not the best person to have this discussion with. But FWIW I chose this place over some much sexier jobs because the team I work with is arguably the best of its type in the world. There are bad neighborhoods, but the disparity between a good team in MS and a good team at Google is basically negligible. Also I think Windows is one of the great engineering feats of CS, so ... :| (Note that I still use UNIX at home.)
Point 4 is probably the result of confusion. I don't think no one can compete with Google. Bing has 20% market share! Clearly we can. But I do think it will be hard to compete with Google on search quality. I don't see how you can argue that.
And point 5 is obviously true but not relevant.
EDIT: Actually I now see your point 1 as saying "MS couldn't pull this off because they're not a good engineering org, but someone else could". Maybe someone else can build a better search engine, but I think what MS has pulled off with Bing is a monumental feat.
For starters, we built the entire Bing stack from scratch. No OSS. No common platforms like the JVM. Nothing like that. We started from nothing, and invented the server infrastructure, the data pipeline, the runtime that would support the site, the ML tools, everything. The fact that the site runs at all is a small miracle, but the site does not "just" run: the most remarkable thing by far is that the quality of our tooling is quite incredible, generally an order of magnitude better than the OSS equivalents. For example, the largest deployment of an OSS NoSQL datastore seems to be a few thousand nodes. The small NoSQL cluster backing our MapReduce implementation is stably deployed on a cluster an order of magnitude larger than this. This is something you only really see at companies like Amazon, Google, or MS.
I understand that the consumer market is not something MS is strong at, but I am hoping this gives you a taste of the scale and quality of what's happening behind the scenes. Happy to talk more about this if you drop me a line or skype me at `mrclemmer` :)
You lost me here. Not building on OSS seems like setting yourself up for failure from the start, particularly when you are fighting a manpower war, which is where OSS is beating every proprietary entity. OSS already powers Google and Amazon and OSS DBs will scale to billions of nodes, not a few thousand.
Wasn't much of the Bing stack built off Powerset, whose core technology was licensed from Xerox PARC?
About search and relevance: they are updated quite fast, sometimes with big code replacements. This means that even if Powerset tech was used as a start (and I do think it was not; it was possibly adapted and integrated -- I believe the base code was MSN/Live), it has now changed in most of its pieces.
>Windows is one of the great engineering feats of CS
Do I have to point out all the lame exploits and bugs which take so long to get fixed, the terrible design, the "things which should have been there years ago but we still don't have" like a decent file manager, task manager, copy utility? And AFAIK Windows made few major contributions to the theory of OSs (semaphores, threads, paging, scheduling and so on), so there's really nothing to be amazed at.
I'll spare you my opinions on Bing. Reinventing the wheel is not worth describing, no matter how beautiful that wheel is, although I'm happy for you to be a part of the team making that wheel. If only Bing had more ambition than just being a clone of Google Search, some people under 70 would actually consider migrating. But if you're happy with your default-search-engine-bundled-with-IE market share at 20%, good for you (eh, it does bring a lot of ad money). You may not call the shots at MS, but you can at least admit all the shortcomings.
Are you going to argue about the famous general instability of Windows compared to its large competitors though? It seems like a good indicator of bad design in low level implementations.
* Windows actually sucks.
* Reinventing the wheel is not worth talking about no matter what.
* Bing sucks.
* I should admit the shortcomings of MS.
Regarding the last point, I'm kind of shocked that you think I'm a shill. I feel like I've been pretty honest about my feelings.
re: Windows sucks, that's sort of OT, but if you want to have the discussion, drop me a line. email@example.com
re: Bing sucks, I don't see what your point about market share or unoriginality is, I already conceded that we have a lot of work to do with search relevance. I'm sort of annoyed about the negativity of your post, as from my perspective I've been pretty candid about what I think our strengths and weaknesses are. :(
re: reinventing the wheel, I don't think this is going to be something that we agree on. The way in which engineers here pull together and simply build what needs to be built is nothing short of breathtaking. I don't see how building, e.g., Cosmos shouldn't be considered an accomplishment. Should Amazon's Dynamo? What about Yahoo's Hadoop? What about anything in OSS, for that matter? I think you're not being fair here.
why do you believe this? Have you used Bing and given it a fair chance?
The Bing search you see has differentiated itself from Google in many ways with Twitter/LinkedIn/Facebook/Yelp integration.
Bing Image search's format has been copied by Google. Bing search in Windows 8 is a different experience to Google Search on Android.
> Market Share at 20%
Bing search, together with the Yahoo searches that it powers, constitutes over 30% of the US market.
Furthermore, Bing serves as the backbone for much of the ML, NLP and IR that occurs across MSFT.
Change Chrome's default search to Bing, and you will be surprised by the things it does differently; it changed my perspective (http://blog.samirism.com/experiments/bing-experiment.html)
Other companies no one could unseat:
* Apple (a couple of times)
* IBM (a couple times in a couple fields)
* And on and on.
The point is that given a bit of time, Google will mess up, someone will come up with some new tech, and/or Google will implode under its own crushing weight.
Of course, whether we survive is another question entirely! I did not speculate on this, nor would I. Who knows what the future holds, we've just this year basically bet the company on some fairly risky things.
That said, I'm all for hating on large corporations, but the idea that Cisco and Oracle -- literally the market leaders in their respective domains -- are "on their way out" because they don't innovate fast enough is not a very convincing argument. :( Precisely who poses them an existential threat at this point? I see no one at all.
If you have actual questions for me I'm happy to answer them. But what you've asked is not a question. You're just saying that to be mean. :(
Sure, just making an OS doesn't mean it takes over the world. But guess what: you have the same problem with search engines.
If you want search engine competition you have to take Google's Ad business away.
It's weird, I know, but Google's search ads pay $80 - $100 RPMs and the other guys' ads pay $30 RPMs. If Microsoft could use Google's ad network, their search business would be a solid contributor to the bottom line of Microsoft (which is why they can't, of course).
If you could magically peel that Ad network/agency into its own entity and require it to give non-discriminatory terms to everyone, I believe we would have a pretty vibrant search space. My reasoning there is that the money associated with search advertising would fall into buckets that were much more closely aligned with market share, as opposed to today where someone like Microsoft can have a large share of the search 'eyeballs' but only a fraction of the revenue because their Ads don't have the RPM numbers.
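To put rough numbers on that, here is a back-of-the-envelope sketch. The RPM values come from the figures quoted above ($90 is just the midpoint of $80 - $100). The 70/30 query split and the query volumes are made up purely for illustration, and the engine names "A" and "B" are placeholders.

```python
# Rough sketch: how revenue share diverges from query share when RPMs differ.
# RPM = revenue per 1000 searches. The $90 and $30 values follow the figures
# quoted in the thread; the query volumes below are hypothetical.

def revenue(queries: int, rpm: float) -> float:
    """Revenue in dollars for a number of queries at a given RPM."""
    return queries / 1000 * rpm


def revenue_shares(volumes: dict[str, int], rpms: dict[str, float]) -> dict[str, float]:
    """Fraction of total revenue captured by each engine."""
    totals = {name: revenue(q, rpms[name]) for name, q in volumes.items()}
    grand_total = sum(totals.values())
    return {name: amount / grand_total for name, amount in totals.items()}


# Hypothetical split: 70% of queries on engine A, 30% on engine B.
volumes = {"A": 700_000, "B": 300_000}

today = revenue_shares(volumes, {"A": 90.0, "B": 30.0})
shared_network = revenue_shares(volumes, {"A": 90.0, "B": 90.0})

for name in volumes:
    print(f"{name}: {today[name]:.1%} of revenue today, "
          f"{shared_network[name]:.1%} with equal RPMs")
```

With these assumed inputs, the engine holding 30% of the queries ends up with 12.5% of the revenue, and gets back to 30% only when both sides earn the same RPM.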
I have run ad campaigns on Adwords continuously for 7+ years. Throughout that time I have run ad campaigns on Yahoo, now Microsoft Adcenter, on and off. The last I checked their ad platform was about where Adwords was in 2005.
Yahoo had some very reprehensible things on their platform. I had to shut off all of my campaigns because someone at their company was changing what my ads said without my permission. Besides being a legal issue for Yahoo at the time, it put me in a position of unlimited liability. Fortunately, I never witnessed that behavior after Microsoft took over. Yet, Microsoft's platform was just too difficult to get working.
That was probably $5 million + in missed advertising revenue from me, just a tiny advertiser. I can only imagine the billions of dollars of revenue Microsoft and Yahoo lost for failing to take seriously the search advertising marketplace.
The correct way, I believe, is to compete on non-consumption and capture those users: you start small, execute well, and scale up as users grow. You avoid the comparison to Google and can sneak in the back door without the crazy upfront costs. Imagine if Microsoft had focused on the Knowledge Graph before Google did. It would be very interesting times, but every Google competitor just competes head-on. At the end of the day it's about getting users, and they don't care about the slight change in your algo, only about the way they feel when using your product, i.e. frustration, delight, anger, surprise, etc.!
From MS's perspective (again: I work here, but my opinions are my own blah blah) the problem is this. People use computers to access the Internet. MS can't just be the OS and the browser used to access the Internet to maintain its lofty position as a field leader -- if MS's job is to supply the Internet as a service to people on MS devices, then it is mission-critical that it also be the landing page of the Internet. If MS gives up Bing, it might as well give up all consumer investments IMHO.
It is (IMHO) more important that Bing exists and is mostly functional than it is that Bing is equal or better to Google in every way.
Of course it is a huge priority to make Bing a viable threat in its own right, but what I'm saying is that this is not the only consideration.
I think the main problem of MS is that it is copying Google search and not providing anything substantially different or better. In other words, fighting against an incumbent is not about spending more money than the incumbent and doing the same thing. It is about providing service for "niches" where the incumbent is not willing or not able to provide service.
Here is an example:
Google shuts down Google Code search. I'm not happy. But, I was thinking... There is Bing... Maybe they will jump in and provide state-of-the-art code search. But... nothing happened.
Google could simply stay ahead of the curve, but this assumes all of that human capital is being dedicated specifically to building the most effective search engine.
What I see now is an advertising search engine. The friends who don't believe me are the ones who haven't turned Adblock Plus off in the past 3 years.
Companies are collecting more data about online behaviour than ever before. And no-one has the same online reach as Google. From analytics, to apps, to fonts to jquery - there's barely a site that doesn't link in one form or another back to Google in some way. Google's digital fingerprints reach into every corner of the web.
I've said this before, but I was hoping that 2014 would be the year we become more privacy-conscious, but I don't actually think that will be the case. Google get an incredibly easy ride on the subject of privacy and online tracking from the tech community. They're probably salivating at the prospect of capturing even more precise user behaviour through an OS (Chrome) that potentially captures everything you do online. Google aren't capturing this data anonymously either. The tech community's response to this seems largely to be - so what? For anyone who cares about privacy, that's pretty depressing.
The first one is an easily dismissed fallacy, the second is not limited to Google or any other company. I have yet to see a convincing argument that Google is misusing this data or doing anything bad with it.
On the other hand, a service that knows you intimately enough can provide some very cool things that are otherwise impossible. The cards on Google Now, for instance, rely completely on the search history on your account and the location data from your phone. I get up in the morning and my usual route to work is plotted out with an ETA. I search for a nearby restaurant on my computer and the directions appear on the phone complete with ratings. Things like that.
My philosophy is to deal with any abuses if/when they occur (and mitigate the foreseeable ones), instead of walling yourself off from the ever-more connected world. I'm starting to think there's a fundamental shift happening in what "privacy" is, why it's necessary, and what it means nowadays. And as usual, the choices are hop on, get out of the way, or get run over. For better or worse.
If they plan on continuing to make money from tracking users, then they'd better figure out a way to do it very securely and without invading user privacy, otherwise they're going to feel an increasingly bigger pain in terms of public perception of Google over this, which could lead to them losing money in the long term, too.
I would also forget about tracking "everything possible" until then, and encrypt end-to-end stuff like chatting and video-calls (maybe they can do this one less costly through P2P or a hybrid system). I very much doubt they see a ton of money as a return from tracking and data mining people's chats online. And the downside is quite huge, because those online chats can be abused by the governments. So why not secure them properly? A little cost to them, huge privacy benefit to their users.
The goal should be to only track public, and not private information (at least in the short term, and then they should use homomorphic encryption for public information, too, as that will become increasingly more revealing, too, in the future).
I would also pay a lot of attention to what the Dark Mail Alliance is doing with e-mail encryption, and I would at the very least implement their protocol as an option for people who want to talk securely with others, from inside Gmail. They don't necessarily have to make all of Gmail encrypted by default, although that would obviously be very nice, but probably not very practical until they figure out homomorphic encryption.
There are also other things they could do to make e-mails a little more secure against abusive governments. In the US, ECPA allows the government to take the e-mail after 180 days without a warrant. So how about asking people to set a password in Gmail that's stored locally and is used to automatically encrypt e-mails older than 179 days? If you need to access your own 6+ months old e-mails, then you're just asked to enter your password to access them. I don't see this as a huge issue for convenience, since the vast majority of 6+ months old e-mail is never accessed again by most people anyway.
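A minimal sketch of what that 179-day rule could look like, assuming nothing about how Gmail actually stores mail. Every name here (`Message`, `lock_old_mail`, `placeholder_encrypt`) is hypothetical. The key derivation uses PBKDF2 from Python's standard library, but the "cipher" is only a stand-in so the example runs without third-party packages. It is not real encryption, and a real system would plug in an authenticated cipher.

```python
import hashlib
import os
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable

CUTOFF = timedelta(days=179)  # the threshold proposed in the comment above


@dataclass
class Message:
    received: datetime
    body: bytes
    encrypted: bool = False


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte key from the user's password (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 200_000)


def lock_old_mail(mailbox: list[Message], key: bytes,
                  encrypt: Callable[[bytes, bytes], bytes],
                  now: datetime) -> int:
    """Encrypt every still-plaintext message older than CUTOFF; return the count."""
    locked = 0
    for msg in mailbox:
        if not msg.encrypted and now - msg.received > CUTOFF:
            msg.body = encrypt(key, msg.body)
            msg.encrypted = True
            locked += 1
    return locked


def placeholder_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # NOT real encryption: a stand-in so the sketch is runnable.
    # Swap in an authenticated cipher (e.g. AES-GCM) in practice.
    return b"<locked>" + plaintext[::-1]


now = datetime(2014, 1, 6, tzinfo=timezone.utc)
mailbox = [
    Message(received=now - timedelta(days=400), body=b"old"),
    Message(received=now - timedelta(days=10), body=b"recent"),
]
key = derive_key("correct horse battery staple", salt=os.urandom(16))
print(lock_old_mail(mailbox, key, placeholder_encrypt, now))
```

The point of the sketch is only the policy: the password never leaves the client, and the job touches nothing newer than the cutoff, so day-to-day mail stays as convenient as it is now.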
The memories of a privacy-focused Google are certainly getting blurry, but I seem to remember there was a time when Google really cared about user privacy, and didn't have the same mentality as NSA for "collecting it all", storing it forever, and using it forever with data-mining.
Google has a lot of very smart people working for them. I'm sure they can come up with many more and much better solutions than even I proposed here. The problem is they have to want it. If it doesn't come as an objective from the top, then it's not going to happen.
From Wired, last year:
> Lloyd made his pitch, proposing a quantum version of Google’s search engine whereby users could make queries and receive results without Google knowing which questions were asked. The men were intrigued. But after conferring with their business manager the next day, Brin and Page informed Lloyd that his scheme went against their business plan. “They want to know everything about everybody who uses their products and services,” he joked.
Google needs to lose that attitude. Adapt or die, Google. And by adapt, I mean having their business incentives once again aligned with those of their users, or it's not going to end well for them.
Actually, the tech is extremely complex, and PageRank is only a tiny part of that. Try building a search engine sometime. To hackers inspired by this article: here be dragons.
DuckDuckGo took the only sane approach and aggregated results from existing search engine APIs, then gradually mixed in some secret sauce. Even then, is DDG viable competition for Google? Will it ever be?
All the same, I join you in wishing for some Innovator's Dilemma-style disruption here. A "toy" service comes along one day that's a substitute for Google search, but only within a tiny niche. Google doesn't take it seriously enough until it's too late...and we have a real ballgame on our hands again.
Thought experiment: what could that disruptive niche be?
Someone needs to write/buy a search engine and then build a really rich API into the internals of it that lets 3rd parties write customized search engines. How about an auto parts search engine, or a search engine for Spanish-speaking people living in Southern California? What about a Christian search engine, or a meme search engine?
When someone can empower 3rd party developers to make the same kinds of decisions as Google does, but with different tradeoffs, and they really put the full institution behind supporting that, I think:
A) Google will have a hard time competing because they won't be able to give the personal attention to the needs of what is, effectively, a community of modmakers.
B) This new company will capture the long tail of search. Only a sliver of that is covered right now, with a scattering of niche search engines (Google Scholar, etc).
C) The number of users could be VERY large. It could be the cable to Google's broadcast television.
D) Getting started wouldn't require any massive technological achievements. Just find an underserved niche where even a really really stupid search engine would work better than Google. Write it, figure out how to make money off it, grow. Start with something that requires only a small index. Slowly expand into additional niches according to what will help keep the company and the tech moving forward.
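To make the "API into the internals" idea a bit more concrete, here is a toy sketch of an engine whose ranking function is supplied by a third party. Everything in it (`NicheSearch`, the scorer signature, the sample documents) is hypothetical, and a real engine would need crawling, a proper index and much more. The only point is that the same index can serve different niches by swapping the scorer.

```python
from collections import defaultdict
from typing import Callable

# A scorer gets (doc_id, query_terms, document_text) and returns a score.
# This signature is an assumption made for the sketch, not a real API.
Scorer = Callable[[str, list[str], str], float]


class NicheSearch:
    """Tiny in-memory inverted index with a plug-in ranking function."""

    def __init__(self, scorer: Scorer):
        self.scorer = scorer
        self.docs: dict[str, str] = {}
        self.index: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.index[term].add(doc_id)

    def search(self, query: str) -> list[str]:
        terms = query.lower().split()
        # Candidates: any document containing at least one query term.
        candidates: set[str] = set()
        for term in terms:
            candidates |= self.index.get(term, set())
        # Highest score first; ties broken by doc id so output is stable.
        return sorted(candidates,
                      key=lambda d: (-self.scorer(d, terms, self.docs[d]), d))


def term_count(doc_id: str, terms: list[str], text: str) -> float:
    """Generic scorer: how often the query terms occur in the text."""
    words = text.lower().split()
    return float(sum(words.count(t) for t in terms))


def parts_first(doc_id: str, terms: list[str], text: str) -> float:
    """Niche scorer: same signal, but catalogue pages get a large boost."""
    boost = 10.0 if doc_id.startswith("parts/") else 0.0
    return term_count(doc_id, terms, text) + boost


corpus = {
    "blog/brakes": "brake pads brake pads brake pads review",
    "parts/brake-pad-123": "ceramic brake pads for sedan",
}
generic, niche = NicheSearch(term_count), NicheSearch(parts_first)
for doc_id, text in corpus.items():
    generic.add(doc_id, text)
    niche.add(doc_id, text)

print(generic.search("brake pads"))
print(niche.search("brake pads"))
```

The generic scorer puts the keyword-heavy blog post first, while the auto-parts scorer puts the catalogue page first, which is the kind of tradeoff a niche operator would want to control.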
Google's bet is that any information you can glean from someone coming to a niche search engine can be reasonably approximated with contextual information in the query. That's proved true for many queries, but the key is to find the queries where it's really not.
Essentially, you get the Yahoo results, and decorate, filter, reorganise, improve, combine as you wish. So you get the organic results, which you can then innovate upon.
And it has survived the transition from Yahoo-powered search results, to Bing powered search results:
Which means you have API access to the second most-used search engine in the industry. So what better way to voice your discontent with Google than by supporting a competitor?
Running a successful search engine is expensive, it needs continuous investment into R&D. That's why Yahoo took a step back and partnered with Bing instead. The level of investment needed just to hold status quo with the existing market runs into billions of dollars a year, something Yahoo baulked at. Microsoft, however, were still strongly inclined to invest that every year.
But it doesn't seem to allow you to write your own indexing, ranking, and querying algorithms, which is the kind of thing I'm talking about.
I'd say "yes". I've switched to DDG as my primary search engine, and I'd say they return a good result about 80-90% of the time. They're definitely not as good as Google, but you can always add "!g" to redirect to Google results.
I'd say their best selling point is their focus on privacy, but oddly they don't seem to be touting it very loudly or trying to make hay from the recent NSA revelations.
That if is actually Google's biggest strategic advantage though: the resources to keep a near-live and nicely deduplicated copy of all relevant data on the internet. [PS1]
Google's demise will eventually come in the form of the adoption of protocols which allow you to efficiently maintain a live view of a service's public resources.
[PS1] And as per https://news.ycombinator.com/item?id=7011816 click-throughs of course, although I don't see a way to side-step that with Google having implemented [not provided]
For search? Absolutely, it is today.
However, Google figured out a long time ago that they could lock people in by offering a hundred other services too. DDG has no maps, no mail, no image search, no video search, no drive, no docs, no news, no book search, no academic paper search, no patent search, no stock graphs, no language translation, and so on. For some of those, it can link off to other services, none of which are viable competitors to the Google equivalents.
An irritation for me with DDG is that they format result URLs incorrectly, and then ignore any feedback about it. Two examples are them adding spurious www. prefixes to Google Code results, and leaving out slashes when creating Apple developer URLs.
DDG could distinguish themselves by playing bazaar to Google's cathedral, but they don't appear to. A search engine that uses crowd sourcing and feedback would be disruptive IMO.
The examples. https://duckduckgo.com/?q=apsw - note the second link is on google hosting and shows code.google.com/p/apsw/ but clicking on it gives page not found because it goes to WWW.code.google.com/p/apsw/. https://duckduckgo.com/?q=NSString and note infobox at top which goes to Apple developer but clicking gives an error. The link should have a slash between Reference and NSString at the end.
Some kind of bitcoin-esque system where you bought sponsored positions by doing search engine scoring related number crunching could be interesting though. Since, if you ever matter, you'll have people devoting resources to trying to game your search engine results you need some system to deal with that. Of course it's sooo "out of the box" to suggest a bitcoin inspired solution to things right now ...
On the data side I think http://commoncrawl.org/ can help with creating vertical search engines. Their crawl is much smaller than Google or Bing but it is web scale (2 billion pages of 2013 data). Data recency is still a problem but it can help with finding which sites belong to a niche. Some smaller scale crawling of these sites would then be a much more achievable task.
Internal search engines could get a lot better. I don't know how people who make websites make them, what tech they use etc., if it's mostly incompetence that does this, but the fact that I'd rather use plain Google than an internal search most of the time tells me that Google is just too good.
The problem isn't even one that responds to market forces -- the "victims" (sites that should rank highly in organic results) are no one's customers. If you want to flip the script, they're the product. They have no leverage, and having more search engines leaves them with still no leverage. Only the users of the search engine have leverage because they can switch to another search engine -- but they can do that today. The problem is the alternatives are no better.
What we need is a cost-effective accurate way to identify spammers and exclude them from high rankings in search results. Someone who could do that more effectively could challenge Google, because there is a market for spam-free search results. But if that was an easy problem to solve then why hasn't Google solved it? They have the right incentives. It's just not easy, because spammers adapt. Whenever a search engine does something to thwart spammers, the spammers do something different.
Having more search engines doesn't make it easier; it makes it harder. Each search engine has fewer resources to dedicate to the problem and they have to duplicate each other's work. Meanwhile, the cost to sites of legitimate optimization for a larger number of search engines increases, which creates an even larger advantage for major institutions over small-timers who can't afford the higher cost.
It's silly to say "why doesn't Google do X" and list some abstract thing you think would solve it which they've already considered and declined to do. There is probably a reason. Maybe manually curating every website is too expensive. Maybe arbitration proceedings would be overrun with spammers trying to challenge legitimate removals of their spam. And if they're wrong, don't speculate about it on a blog, prove it by building a better search engine. So far no one has been able to do it.
But then, I completely agree that just complaining about the issue is useless. Anybody that (thinks that he) can build a competitor for Google will try it or not based on his odds of success, tolerance to risk, etc, not because somebody is complaining.
I just tried the same query with DDG and Bing and they don't even come close to giving me the link I want.
Needless to say, Bing fails here.
Worked for me.
Edit: Different query from yours, my bad. I mixed both your links up.
That just feels wrong.
But I guess now is a good time to install it again, help scraping the internet, and maybe hack at the code.
How are rules against paid linking scams or procedurally generated content farms considered arbitrary? It's clearly trying to game the system, and the rules are explicitly laid out to tell you NOT to do it.
>2. Google needs these rules, because Google’s rankings are apparently trivial to game.
If this were true, you can make millions executing your plan for any number of websites. It's not.
The whole case with the delisting/penalization and subsequent (and extremely speedy) re-listing of RapGenius is a great example of Google's current arbitrary practices. I covered this in some detail on another thread in a couple of replies.
One man's advertising is another man's paid linking scam. Do you know where to draw the line? Google doesn't - or at least they can't create an algorithm that knows the difference, even after significant investment in the problem.
Google has gotten so rich, entrenched and popular that IMHO no competitor can dislodge it. I say this as a thoroughly-disappointed user who's tried nearly all the alternatives to Google in the various segments that it operates in. I've managed to stop using nearly all Google services except Android (running CM) and Search.
As others have pointed out in the thread, DDG is nowhere near as good, especially if you're not in the US (I'm in India). After having forced myself to use DDG for a month, I've now resigned myself to Google searches with a couple of extra steps:
- all searches performed while logged out of all Google services
- browser plugins to rewrite all Google tracking URLs from search results
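For the curious, the core of what such a plugin does is small. Here is a rough sketch, assuming the tracking links look like `https://www.google.com/url?q=<destination>&...` (or use a `url` parameter instead of `q`). That format is an assumption for illustration, and real plugins handle more variants than this. `strip_redirect` is a made-up name.

```python
from urllib.parse import parse_qs, urlsplit


def strip_redirect(url: str) -> str:
    """Return the destination of a google.com/url?q=... style redirect.

    URLs that do not match the assumed redirect shape are returned unchanged.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    is_google = host == "google.com" or host.endswith(".google.com")
    if is_google and parts.path == "/url":
        params = parse_qs(parts.query)  # also percent-decodes the values
        for name in ("q", "url"):
            if params.get(name):
                return params[name][0]
    return url


tracked = "https://www.google.com/url?q=https%3A%2F%2Fexample.org%2Fpage%3Fa%3D1&sa=U"
print(strip_redirect(tracked))
print(strip_redirect("https://example.org/untouched"))
```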
So sure, we need viable search engine competition. But don't wait up for it either.
Isn't this the definition of a monopoly? And if so, isn't that reason enough to consider search as a public good or a publicly regulated means of accessing information?
Judging from the various anti-trust cases that have been brought against Google around the world it's clear that proving that is nearly impossible too. More so because Google operates in a sector (Internet software) that is theoretically open to infinite competition and zero switching costs.
Google insists that its current search mix is based on a lot more than just PageRank, but it seems that PageRank probably contributes the foundation of their business. I don't see a way of competing with Google's results unless we are allowed to use something like PageRank, which we can't do unless we pay royalties or wait until the patent expires.
I'm not saying that PageRank is the be all end all of search algorithms, and certainly someone somewhere could come up with a different method of ranking superior to Google's. Ranking pages by how many other pages cite them seems like a pretty fundamental insight, and where I would start with any new search engine.
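For readers who haven't seen it spelled out, the "ranked by who cites you" idea fits in a few lines. This is the textbook power-iteration form of PageRank with the commonly cited damping factor of 0.85, run on a made-up four-page graph. It is a sketch of the published idea only, not a claim about what Google or anyone else runs in production.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank by power iteration.

    `links` maps each page to the pages it links to. Pages without outgoing
    links ("dangling" pages) spread their rank evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Base share every page receives: random jump plus dangling rank.
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: a, b and d all cite c; c cites only a.
graph = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Note that page a ends up far above b and d even though it has only one inbound link, because that link comes from the most-cited page. That is the difference between this and simply counting citations.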
What is the most important signal? Click-throughs. This is why any new search engine is at a massive disadvantage.
a) PageRank is just one of several hundred factors used for determining ranking. Search engine ranking is a lot more complicated than even most information-retrieval programmers tend to think.
b) There may have been a window of a few weeks or months in 99/00 where not every major search engine used some form of link-based ranking, but it was, as noted, a very brief period.
It always amazes me that people for almost 15 years now have believed in the myth of PageRank's uniqueness and power. I would like to challenge people to think about two things:
- how useful do you think a pure static ranking of tens of billions of web pages is? Think about what it represents. What does it mean to assign one page higher rank than another?
- do you really believe that other search engines would not have implemented PageRank or something similar? Do you really think that search engine designers do not read papers and apply every trick in the book that they can manage to implement in a scalable manner?
There are lots of hard problems you need to solve if you want to build a web scale search engine. If ranking becomes your biggest problem: that would be a luxury. The biggest hurdle today is money to buy computing power and storage. The time when you could build a competitive web scale search engine for regular startup-money is over. It has been over for close to a decade.
It saddened me greatly when Yahoo threw in the towel because it meant that search in the western world was effectively a two-horse race. And once you get off that horse there is no getting back on it again without some seriously heavy lifting.
I don't know what other search engines have implemented because none of them will let me see their code :-) I do believe if they could implement a link-based ranking system without fear of being sued by Stanford they would. I don't know how different a link-based ranking system would have to be from PageRank to avoid getting sued by Stanford, or whether Stanford litigates this when they suspect unlicensed use. I'm guessing they do sue to defend the patent, because Google's royalties for using the algorithm number in the hundreds of millions, a significant amount towards Stanford's endowment.
So yes, anyone can license PageRank now.
The issue is that PageRank is a general factor which doesn't have much to do with the question "is page A relevant for topic B?" If PageRank causes a popular but irrelevant page to rank above an unpopular but relevant page, it is part of the problem, not the solution.
Also, is PR really the best, only way to do this? Seems like there are all kinds of better (more modern) signals we could use other than links, which were kind of the only game in town 10ish years ago.
Edit: Woops. So none of this is true.
I'm starting by building the search engine I want to use while writing code, which does parallel searches of different parts of the web dynamically based on the query.
I have an initial (ugly and very limited) prototype here: http://gigglebang.com/
I'm looking for co-founders. If this is an interesting problem to you, email me: firstname.lastname@example.org
My setup encourages me to at least use duckduckgo and/or bing when I am using firefox.
What's the best search engine where buyers go? Not google.com, and it never really has been Google; it's amazon.com and ebay.com.
What's the best search engine to find answers on technical questions?
Stackoverflow.com is approaching google very fast.
So the answer would be for sharp minds to develop sharper and better vertical portals/platforms and pick google apart piece by piece.
Aggregating services are successful only where there are millions of equal service providers (like hotels or restaurants) and one needs a one-stop place (aggregator) to search.
When Google (and Bing and others are far secondary) aggregator model couldn't really do much.
So what is it good for? Any subject where you want depth or breadth over surface results. So collectors and researchers can build a portal for themselves. And there are some interesting possibilities for specific needs, like a search list for many places to find a cached URL: http://nuggety.com/u/nuggety/cached-webpages or a search list to search any of Google's 190 international websites: http://nuggety.com/u/nuggety/international-google
Search for "cms" does not return "sitecore" anywhere.
This Google thing is WAY out of control.
Nine results were about the space-partitioning data structure. Also, there is a tree-trimming service on Long Island called K&D Tree Masters.
Surprisingly, pleasantly -- all ten items are useful to me.
"Besides emphasizing Office, Elop would be prepared to sell or shut down major businesses to sharpen the company’s focus, the people said. He would consider ending Microsoft’s costly effort to take on Google with its Bing search engine, and would also consider selling healthy businesses such as the Xbox game console if he determined they weren’t critical to the company’s strategy, the people said."
I haven't used dogpile in forever - actually thought they were defunct, but your comment brought it to mind. Seems like what you're after?
* Blow away my competitor's links on the topic I'm related to
* Always put my links at the top on my topic page
* Sneak my link into other pages not directly related to my topic (viagra spam on comp.lang.*)
Maybe you could require a tiny amount of bitcoin to edit a page to prevent spammers?
I simply do not know how people, and there are a lot of them, come to the conclusion that DDG results are in any way better than Google's, apart from the fact that DDG is not Google. It would not totally surprise me if people were making do with DDG to make a point, simply because it's not Google, and as a result being overgenerous about its utility.
Maybe for Americans DDG does give better results, I can't say, but I'd like to know on what criteria that is based. But for me, a non-American, as I say, DDG is regrettably mostly useless.
And of course:
Way too often, Google will try to be "smart", and search for some "intelligent" interpretation of my search, where no such thing was called for. Especially for technical, precise searches for "strange" strings, this can get really annoying.
Also, those bang-searches are just genius. I regularly search Wikipedia, programming language docs, or maps using those. And yes, sometimes Google as well, mostly for fuzzy "how do I" searches or searches that I don't know precise keywords for.
Wow, so it's not just me then - I was describing my exact same feeling to someone (perhaps less eloquently than you!) just three days ago!
Even when I go and set it to 'Verbatim' search (which is well hidden), it still often gives me useless results for these kinds of technical queries...
We need an open-source, or decentralised (I wish) solution. That is as close as you can get to a pipe-dream at the moment. Doesn't mean we won't get there eventually though.
EDIT: early days open src p2p search: http://yacy.net/en/index.html
That said, the little helper widgets are brilliant. I find them more useful more often than the Google result widgets.
I've already switched back to G on pretty much all my machines.
ctrl-t !w cyclocross
ctrl-t !g site:ohio.gov filetype:pdf economic development
ctrl-t !image sad reaction gif
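For anyone wondering how bang-searches like the ones above work in principle, here is a rough sketch: the leading `!token` picks a destination site and the rest of the query is passed through to it. The `BANGS` table, the URL templates and the `resolve` function are illustrative assumptions, not DDG's actual redirect table or code.

```python
from urllib.parse import quote_plus

# Hypothetical bang table: token -> search URL template for that site.
BANGS = {
    "w": "https://en.wikipedia.org/wiki/Special:Search?search={}",
    "g": "https://www.google.com/search?q={}",
}
# Queries without a known bang go to the default engine.
DEFAULT = "https://duckduckgo.com/?q={}"


def resolve(query):
    """Turn a query such as '!w cyclocross' into a destination URL."""
    words = query.split()
    if words and words[0].startswith("!") and words[0][1:] in BANGS:
        template = BANGS[words[0][1:]]
        terms = " ".join(words[1:])  # drop the bang, keep the rest
    else:
        template = DEFAULT
        terms = query  # unknown bangs are searched as plain text
    return template.format(quote_plus(terms))


wiki_url = resolve("!w cyclocross")
google_url = resolve("!g site:ohio.gov filetype:pdf economic development")
```

Search operators such as `site:` and `filetype:` are simply URL-encoded and handed to the target engine, which is why the second example above still works as a normal Google query.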