Great idea, but I think Google already uses this data.
Google Toolbar and now Chrome report this data back to Google, and most search pros believe "serp bounce back" and "time on site" are key signals Google uses.
PageRank and DwellRank are not either-or choices.
Here's my theory: Google uses PageRank to decide what pages to "try out" for a query (i.e. display a page in the SERP for a sampling of queries). If the page gets clicks AND has good "DwellRank" then it gets progressively better and better rankings. If a new page enters that beats it, it falls.
This approach is very Googly -- they love to test. They love to decide if product features are good or not by giving them a sampling of traffic. It would be insane of them not to extend this approach to search.
So the upshot is, use "PageRank" to decide which pages deserve an audition, and use "DwellRank" to decide the winners.
Since 40% of the clicks go to #1, 10% to #2, 8% to #3, etc., Google can audition pages using DwellRank without affecting the experience of the majority of their users.
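To make the "audition, then pick winners" theory above concrete, here is a minimal sketch. It is only an illustration of the idea as described in this comment, not anything Google has confirmed. The `Page` fields, the function names, and the `min_clicks` threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    pagerank: float       # link-based score (hypothetical value)
    dwell_seconds: float  # mean dwell time observed while auditioned
    clicks: int           # clicks received while auditioned


def audition_candidates(pages, k):
    """Step 1: PageRank decides which k pages deserve an audition."""
    return sorted(pages, key=lambda p: p.pagerank, reverse=True)[:k]


def rank_by_dwell(candidates, min_clicks=1):
    """Step 2: among auditioned pages that got clicks, dwell time decides the winners.

    Pages with fewer than min_clicks clicks keep their relative order
    and are placed after the pages that were actually clicked.
    """
    clicked = [p for p in candidates if p.clicks >= min_clicks]
    unclicked = [p for p in candidates if p.clicks < min_clicks]
    clicked.sort(key=lambda p: p.dwell_seconds, reverse=True)
    return clicked + unclicked
```

A page with a low PageRank never reaches step 2 here, no matter how long people would stay on it, which is the "audition" part of the theory.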
I was a little surprised that they haven't included anything about spam or gaming the system. One core advantage of PageRank is that it's (relatively) hard to get links from high-authority websites. I can't force whitehouse.gov or cnn.com to link to me. If you rely on time-spent-on-page from millions of users and treat everyone equally, how do you stop spammers from faking millions of hours spent reading their content using spoofing or bots?
Yes, that could be a potential issue, especially if you want to keep the data of the submitters as anonymous as possible. We are monitoring this closely, but honestly we don't have a single-hammer solution against spam; as always, it needs many small steps. I will post some thoughts about it on our blog in the coming days.
Disclosure: I am the CTO of archify/Blippex.
Hmm, this is an interesting algorithm, but I'd challenge its major assumption for a lot of searches. I don't have metrics, so of course my own assumptions can be challenged also, feel free to.
I think that a lot of search engine enquiries are essentially questions, with an answer that can be considered correct. Absolutely not all, but I think enough that they should certainly be considered. In that case, a site which immediately and clearly answers the question should be given priority: I want my answer within seconds, not minutes. If you give me the site that answers my question and that users spend the most time on, that's the exact opposite of what I want in this case.
Here's an example: I search "Population of America", and your site's top result is sporcle.com, a quiz site. I bet people spend ages on there guessing the population of various countries etc., but I'd prefer to just get my answer.
That said, it appears such queries are handled outside the main algorithm by your competitors. Both Google and DuckDuckGo will give a card, at the top of the result, answering my query - I don't even have to visit a website.
I guess the tl;dr is that it's awesome that this is ambitious, but I challenge the assumption your algorithm is desirable for the majority of search results. Neither is Google's really though, so maybe this is an overly harsh criticism of something Google probably did very poorly early on too.
I think your guesses are absolutely right. We never intended to compete against other engines in the field of "Answers". I think you will always get a better result for "Population of America" if you search for it at DuckDuckGo or Google. But on the other hand, if you search for example for "NSA" on Blippex (https://www.blippex.org/?q=NSA), we are assuming that you will get those articles about the NSA which are currently the most interesting or the most read.
Disclosure: I am the CTO of archify/Blippex.
That's fair enough, it seems like it would be really useful for article searching, like the NSA one you gave (great example, blew Google out of the water!). I can see it being great for research too, assuming academics spend the most time on a good source, which seems reasonable. I'll definitely be giving this a go for some searches, really nice!
Oh, and a quick suggestion: Have you considered a Firefox search engine addon? I do most searches from the omnibar, and I think more people would switch search engine up there than manually go via blippex.com
This algorithm might heavily impact discoverability. It gives more visibility to already popular websites, making them even more popular, while not very well known websites will never be discovered because very few people spend time on them.
Also I don't like the idea of having to install a plugin on my browser so that the URLs I visit and how much time I spend on them are tracked, even if supposedly my identity is never tracked. Once the plugin is installed, how can I know that a new version of the plugin won't track more parameters?
When I read the title I thought it was referring to a distributed search engine like YaCy or Seeks.
2. Often the best sites are the ones I spend the least amount of time on, because I got an answer quickly. Would hate to not be able to find those sites. Seems like the traditional form of ranking should still be an important part of your solution.
Cool. The converse of this is that perhaps I spend a lot of time on a site because it's difficult to use (and there isn't a better alternative for me). How about I instead give you access to my camera so you can measure my mood (through facial expression). If I look happy or intrigued, must be a good site!
I think this is a great idea! However, I am worried about privacy. I also feel like this algorithm may inflate the importance of certain types of content over others. For example, just because I spend more time on a news or social media website does not mean that it has higher quality content, it just means that the content takes longer to consume. Within content categories, however, I think this could do a good job of weeding out the quality content from the spam.
Thanks, that is a very interesting input. We should think about running additional semantic analysis and relate them to the time spent on sites. We are very sure that our algorithm needs a lot of fine tuning and this could be a very important part of it.
Disclosure: I am the CTO of archify/Blippex.
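One way to picture the "within content categories" point raised above is to compare each page's dwell time only against other pages of the same category. The sketch below is an assumption about how that could be done, not a description of Blippex's algorithm. The category labels and the function name are hypothetical, and it expects one row per URL.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_dwell_by_category(visits):
    """Return {url: z-score of its dwell time within its own category}.

    visits is a list of (url, category, dwell_seconds) tuples,
    one row per URL (an assumption made to keep the sketch short).
    """
    by_category = defaultdict(list)
    for _url, category, dwell in visits:
        by_category[category].append(dwell)

    # Mean and population standard deviation of dwell time per category.
    stats = {c: (mean(v), pstdev(v)) for c, v in by_category.items()}

    scores = {}
    for url, category, dwell in visits:
        mu, sigma = stats[category]
        # A category with no spread carries no ranking signal.
        scores[url] = 0.0 if sigma == 0 else (dwell - mu) / sigma
    return scores
```

With this kind of normalization, a long news article and a short reference page can both score well, as long as each holds readers longer than its peers in the same category.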
I had a similar idea in college which was to take the actual traffic for pages into account for search ranking (this was before Google bought whatever Analytics had been called before, I can't remember.) I had thought of it as a server side app which would benefit the hosts while feeding the search engine traffic data.
After talking with friends we explored the idea of a user side traffic tracking app as a way to feed the search engine, but I couldn't get enough traction and no one wanted to challenge not only Google but also IE/Firefox/Safari etc. because we felt it would be its own browser.
Nowadays I am more concerned about possible privacy issues. I feel for them launching a search engine that actively asks you to be tracked (even if anonymously); it's a hard sell during this current resistance to that entire idea.
At the time FF was still behind IE and IE hadn't really adopted extensions yet, I think. It's a bit fuzzy how we got there, this wasn't like a formal business plan and analysis, this was some college guys in the dorms chewing on an idea for a few weeks.
How do you differentiate between useful dwell and useless dwell? I often need to spend some time on a page before I realize this is not what I'm looking for. How will you tell?
And now that we're talking about search, I had an experience on Google that I found very odd. I was looking for the Richard Marx song "Suspicion" from the album "My Own Best Enemy". I knew the song and the album, but I couldn't remember the name Richard Marx. Problem was, instead of typing "My Own Best Enemy", I was typing "My Own WORST Enemy". Google had no clue. Shouldn't a good search engine be able to tell it's just one word wrong?
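For what it's worth, catching a one-word slip like this is not hard when the engine has a list of known titles to compare against. The sketch below uses Python's standard `difflib` module; the function name, the 0.8 cutoff, and the idea of a known-titles list are assumptions for illustration and say nothing about how Google or Blippex actually handle such queries.

```python
import difflib


def suggest_title(query, known_titles, cutoff=0.8):
    """Return the known title closest to query, or None if nothing is similar enough."""
    # Compare case-insensitively, but return the title in its original casing.
    by_lower = {title.lower(): title for title in known_titles}
    matches = difflib.get_close_matches(
        query.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[matches[0]] if matches else None
```

Given a catalogue that contains "My Own Best Enemy", a query for "My Own WORST Enemy" is similar enough to clear the cutoff, so the engine could at least offer a "did you mean" suggestion.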
Differentiating the quality of a dwell would be nice, but that would mean tracking the search trails of our users, which is too much of a privacy issue. But we are thinking about semantically evaluating the DwellRank. For example, a useful dwell would be for tutorials. But this is just an assumption; we simply need more data about that.
It needs some sort of fallback for search results or it's useless to a specialized user. My Google search history looks like random bits of consciousness spread out across months. Half of those search terms bring 0 results on Blippex, and while I understand that they're early, it's hard to beat something like PageRank when it's already got established experience.
It's a catch-22: the results won't get better unless people use the service, but people aren't going to use it if the results are bad in the first place. If I install the extension but use Google, it's a one-way relationship that only they get data out of. Not very good for me.
Thanks for building this. We need more stuff like this.
We have some rate limits in place at our API, but of course it is not that difficult to change an IP address. More importantly, we wrote some algorithms which check submitted URLs and domains for suspicious or accelerating behaviour. If that happens, we simply suspend that domain for some time. We are also planning to publish those suspensions.
All of these searches were not successful. There was no Facebook link in the first search, no Gmail link in the second one, no news.ycombinator in the 3rd one, and the only Wikipedia link I got in the last search was:
I don't think that those "generic" terms are the main advantage of Blippex. But if you search for example for "NSA" on Blippex (https://www.blippex.org/?q=NSA), we are assuming that you will get those articles about the NSA which are currently the most interesting or the most read.
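As a rough illustration of what a check for "accelerating behaviour" could look like, here is a minimal sketch that flags a domain when its latest hourly submission count jumps far above its recent average. This is a guess at the general idea, not the actual Blippex code; the function name, the hourly buckets, and both thresholds are hypothetical.

```python
def is_accelerating(hourly_counts, factor=3.0, min_count=50):
    """Flag a domain whose latest hourly count far exceeds its recent average.

    hourly_counts lists submissions per hour for one domain, oldest first.
    factor and min_count are illustrative thresholds, not tuned values.
    """
    if len(hourly_counts) < 2:
        return False  # not enough history to compare against
    *history, latest = hourly_counts
    baseline = sum(history) / len(history)
    # Require an absolute minimum so tiny sites are not flagged for going from 1 to 4.
    return latest >= min_count and latest > factor * max(baseline, 1.0)
```

A domain flagged this way could then be suspended for some time, as described above. A check like this works per domain, so it does not depend on the submitter's IP address.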
-Gerald (Disclosure: I am the CTO of archify/Blippex.)
I'm sorry for the offtopic, but on a page that's supposed to get people involved, shouldn't you at least mind the difference between its and it's? In the very first paragraph it goes wrong already. I'm not a native English speaker, but these mistakes always jump out for some reason.
Well that was a quick response, at least that's a positive thing :). I installed the add-on. Testing the search engine, searches seem to take forever. It keeps displaying the spinning icon in the orange square next to my query.
If you want to track the issue down, my IP is 188.8.131.52 or 2001:980:1f44::/48 if you support IPv6. Timestamp around 17:51 UTC+2.
Doesn't work at all without cookies. Meaning, it doesn't work, and doesn't tell you why. If you're targeting people who are looking for an alternative to the major search engines, there's a better-than-average chance that they'll have cookies disabled.
It's some kind of AI, or even some kind of neural network: people are involved in training the search engine, so the more data users contribute to the search server, the more relevant results they will get. Good idea.