Show HN: A search engine that doesn't track you, where users vote for results (github.com)
62 points by brentadamson 80 days ago | 63 comments



The idea is solid, but I think you need to do some domain weighting until you get users. For example, I tried searching "League of Legends", and I got nothing but a whole page of links to various hacking websites and gold sellers (which makes no sense in the context of the game), all of them at zero votes. This is pretty discouraging, and I don't think it makes much sense to make the first 10 or 20 people who search for a topic sort through (and vote on) the entire Internet's worth of keyword-related links until things are in some semblance of order.
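
A rough sketch of what domain weighting could look like while vote counts are still at zero. Everything here is an assumption for illustration: the `DOMAIN_PRIOR` table, its values, the `Result` fields, and the linear vote bonus are hypothetical and not taken from the jivesearch code.

```python
from dataclasses import dataclass

# Hypothetical per-domain priors; the values are invented for illustration.
DOMAIN_PRIOR = {"wikipedia.org": 2.0, "leagueoflegends.com": 1.5}


@dataclass
class Result:
    url: str
    domain: str
    text_score: float  # keyword relevance reported by the index
    votes: int = 0     # net votes (upvotes minus downvotes) for this query


def blended_score(result, vote_weight=0.5):
    """Keyword relevance scaled by a domain prior, plus a linear vote bonus."""
    prior = DOMAIN_PRIOR.get(result.domain, 1.0)  # unknown domains stay neutral
    return result.text_score * prior + vote_weight * result.votes


def rank(results):
    """Order results from highest to lowest blended score."""
    return sorted(results, key=blended_score, reverse=True)
```

With no votes, the prior decides between near-equal keyword matches; once users vote, the vote term can override it.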


I've noticed the same. Probably need to re-weight the fields: https://github.com/jivesearch/jivesearch/blob/master/search/...
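
For reference, the setup notes further down mention Elasticsearch, and its `multi_match` query accepts per-field boosts using the `field^boost` syntax. A minimal sketch of building such a query body; the field names (`title`, `description`, `content`) and the boost values are assumptions, not the fields used in the linked file.

```python
def build_query(terms, boosts=None):
    """Build an Elasticsearch multi_match request body with per-field boosts."""
    # Hypothetical field names and boosts; adjust to the real index mapping.
    boosts = boosts or {"title": 3, "description": 2, "content": 1}
    fields = [f"{name}^{boost}" for name, boost in boosts.items()]
    return {"query": {"multi_match": {"query": terms, "fields": fields}}}
```

Raising the boost on one field makes matches in that field count more toward the relevance score than matches elsewhere.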


This is an excellent idea. I'm assuming HN is getting an early preview, but if not I would definitely put a one liner on the landing page that explains what it does a bit.

I do think in the future you should market it with the two differentiators more separate. "Show HN: A Reddit-style search engine where users upvote and downvote links. It also doesn't track you!"

IMO that's a better product position.


Yeah, that's what I meant...wish I could edit the title.

EDIT: I'm not going to edit the landing page as not all users know what reddit is. The HN community knows reddit but the post title could clearly be better.


OK I took a crack at reworking the title to express both your points. Email us if you want it changed again. Good luck with the project!

(Submitted title was "Show HN: A Reddit-style search engine that doesn't track you".)


thank you!


FWIW, I just spent a minute on the Github page (a normal amount of time for something like this) and had no idea anyone could upvote and downvote links until I read the GP comment. In fact, I just searched the Github page for "reddit" and for "vote", and found nothing.

I don't know what this product is or does. Does it run locally? Is there a hosted version available? How is it different than DuckDuckGo or Startpage? Is there a voting feature, as the GP says?

A voting feature would make it interesting to me with one major caveat: It depends on who is doing the voting. For example, if I could restrict voting to HN users then it could be a very useful way to search for tech or for many intellectual topics but not for housing in Tokyo. A random subset of the general public's votes on tech topics would not be useful to me as a professional. Political topics seem like a minefield.


There's a link at the top of the GH repo: https://www.jivesearch.com/. You can also run it locally.

This is different than DDG/Startpage, etc in that everything is open source. I think I have a good start on the instant answers: "shaquille o'neal height", "weather", "matisyahu discography", etc. To my knowledge neither has Wikidata instant answers.


You can ask moderators to edit the title for you.

hn@ycombinator.com


I think the indexing needs some work. For the query "Kozai", Google, DDG, and Bing all return Wikipedia's page on the Kozai mechanism as either the top result or the second result. The lone result I get from Jive is a link to what appears to be a Japanese porn site. I can't even upvote the Wikipedia page because it doesn't show up.


We need more links indexed. The crawler that I wrote is pretty slow and right now we don't have a lot of documents indexed. I will be replacing the crawler I wrote with a faster one soon. Until then I have been using !bangs.


Have you ever heard of https://commoncrawl.org/? Maybe it helps you to bootstrap.

It's cool to see some new things happening in the search engine world.


I've considered that but didn't really know anything about the organization and how long they'll be around. We're gonna give Colly a try: https://github.com/gocolly/colly.


Not sure how upvoting will make search better. Most searches happen in a specific context, where users are just desperate to get to the desired result and won't come back to vote on whether it was relevant. Plus, even if they do vote, there is no way to know whether the page is still relevant after the upvote was cast.


This looks super interesting.

Somewhat similar to Searx[1], except that Jive even has its own crawler, whereas Searx really just is a metasearch engine.

I haven't tried setting up an instance myself, but it appears to be quite simple.

One thing: I found it a bit unclear what the voting does.

Thanks for sharing this. We need more truly free search engines, and not just non-free ones trying to appear "open source friendly", like DuckDuckGo[2].

[1]: https://asciimoo.github.io/searx/ [2]: https://duck.co/help/open-source/opensource-overview


The voting reorders the search results. https://www.jivesearch.com/.

You will need Elasticsearch, Redis, and PostgreSQL, and you will need to run the Wikipedia & MusicBrainz dumps. Let me know if you run into any issues.


judging by the amount of hateful comments here, I'd say you are really onto something :)

seems like a neat idea I'd love to see proven out, keep it up!


I think the comments have been very helpful!


Neat idea, scary to implement though. As everyone mentions, manipulation is a massive concern.

With that said, perhaps peer-promoted links would solve this issue. If you built a network of peers (my brother, his friend, his friend's friend) and modulated upvotes by their degree of separation from me, it could make for a nice upvoting system.

Anyone know if that style of voting has been implemented by anyone? Does it suffer fatal flaws? Sounds difficult to implement, for starters.
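
To make the peer idea concrete, here is a minimal sketch under stated assumptions: the friend graph is a plain adjacency dict, each vote is +1 or -1, and a vote's weight halves with every hop away from the searcher. The function names and the `decay` parameter are hypothetical, and this says nothing about how such a graph would be collected without tracking users.

```python
from collections import deque


def distances_from(graph, start):
    """Breadth-first search: hop count from start to every reachable peer."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for peer in graph.get(node, ()):
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)
    return dist


def peer_weighted_score(votes, graph, me, decay=0.5):
    """Sum votes (+1/-1), each scaled by decay**hops; unreachable voters count for 0."""
    dist = distances_from(graph, me)
    return sum(v * decay ** dist[voter] for voter, v in votes.items() if voter in dist)
```

A voter with no path to the searcher, such as an unconnected bot account, contributes nothing in this scheme.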


The rate limit applies to everyone equally so you can write a bot to upvote all your links but someone can then write a bot to downvote them. The default scoring system is meant to be overwritten and can help weed out the bot attempts: https://github.com/jivesearch/jivesearch/blob/master/search/....

Reddit has tons of manipulation by users yet relevancy seems to be very good there.
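
For readers unfamiliar with the mechanism, a per-client rate limit can be sketched as a sliding-window counter. This is an illustration only: the class name, the default limits, and the choice of client key are assumptions, and the thread does not say how jivesearch keys or enforces its limit.

```python
import time


class RateLimiter:
    """Allow at most `limit` votes per `window` seconds for each client key."""

    def __init__(self, limit=5, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = {}     # client key -> timestamps of recently accepted votes

    def allow(self, key):
        """Return True and record the vote if the client is under its limit."""
        now = self.clock()
        recent = [t for t in self._hits.get(key, []) if now - t < self.window]
        allowed = len(recent) < self.limit
        if allowed:
            recent.append(now)
        self._hits[key] = recent
        return allowed
```

A limiter like this treats every key equally, but it only bounds votes per key, so it does not by itself limit someone who controls many keys.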


It does? Reddit was manipulated to a massive degree, which, as far as many are concerned, throws away any credibility for the idea that popularity means relevance.


I visit my subscribed subreddits frequently b/c I find posts that are highly relevant to my interests.


So did (and still do) a lot of people following political discussions on Reddit. Then we found out that those discussions were being heavily manipulated by Russia. Those people thought the posts were organically, communally upvoted and vetted. Turns out, they were heavily manipulated.

Any heavily manipulated system could have organic and true content for some people. I think the goal would be to have trust in the platform. Reddit, as of late, has no trust with anything of meaning. It has been shown to be manipulated by companies and governments.

I could hire a firm to push my content in your favorite subreddits, so I struggle to see how Reddit is remotely a good platform for discovering non-manipulated content.

Many people don't like that Google can alter their searches, providing bad incentives. A Reddit-like search now opens up the ability to manipulate your searches to ad agencies, governments, bad actors, etc.


What's the alternative? As others have pointed out the results as they stand aren't that good and, as you point out, other search engines have their own incentives to push stuff to the top.

Since the scoring system can be customized I think we can eventually figure out a way to deal with bots but it will always be done in a way that has to treat all domains equally. I think it would be appropriate to have our scoring system regularly audited (along with the rest of the code) to ensure that we treat all domains equally and that no modifications are being made to the rest of the code in regards to privacy.

If, after all of that, you aren't satisfied then by all means you can run your own instance and order the results as you see fit while blocking others from upvoting/downvoting.

EDIT: BTW, I am not suggesting that the voting system is set in stone. If there is a better way then I'd love to hear it! I do understand the drawbacks to the users voting and I hope I am not coming across as being completely inflexible on this point. I just haven't come up with a better alternative and haven't gotten much in terms of concrete suggestions.


> What's the alternative? As others have pointed out the results as they stand aren't that good and, as you point out, other search engines have their own incentives to push stuff to the top.

That's the million dollar question isn't it? If I had an answer, I'd have a product ;)

That doesn't change the fact though that I don't trust Reddit any further than I can throw it. It's fun for memes, but beyond that it's been proven to have been manipulated multiple times now.

> Since the scoring system can be customized I think we can eventually figure out a way to deal with bots but it will always be done in a way that has to treat all domains equally. I think it would be appropriate to have our scoring system regularly audited (along with the rest of the code) to ensure that we treat all domains equally and that no modifications are being made to the rest of the code in regards to privacy.

Yea, I hope. I'm not trying to knock your product at all btw. If you read my first comment, I even said I love the idea of a voting-based search, but I just can't trust the public to do the voting. If it was based on a peer network (brother's friend, brother's friend's friend, etc), maybe that would be better. Obviously though there are challenges with that sort of method. I'm not saying it's perfect or even good - it was just an idea after 30s of thinking haha.

> EDIT: BTW, I am not suggesting that the voting system is set in stone. If there is a better way then I'd love to hear it! I do understand the drawbacks to the users voting and I hope I am not coming across as being completely inflexible on this point. I just haven't come up with a better alternative and haven't gotten much in terms of concrete suggestions.

You haven't gotten concrete suggestions because I think no one knows the answer. Information manipulation is, in my opinion, the biggest problem facing humanity right now. Literally. So it's no surprise that our ideas to prevent manipulation at scale are lacking.


I really like the idea. However, let's think about abuse patterns: what prevents an SEO agency from creating 1k t2.micro instances, clicking upvote for a link of interest, and downvoting all the competitors?


What prevents their competitor from downvoting them? The rate limit applies equally to everyone.

EDIT: The scoring function can also be customized. https://github.com/jivesearch/jivesearch/blob/master/search/...


The local mom and pop shop isn't going to be able to compete with Walmart. This only works if both entities have similar technical and financial status.


But there's an army of users that will ultimately decide.


There are a lot fewer users than there will be bots.


Use Stack Overflow’s voting system.


What's your reputation, and how does that work with not tracking you?


My reputation? None, really. My background is in finance and I'm a self-taught developer. Just tired of not having anything I like. If you don't trust me you can run it on your own server.


No, I mean..how do you use the StackOverflow system? It's based on a concept of reputation. What's the reputation in the search engine?


I searched for "porn". Got bad results like it's 2004. Neext :D

Cool idea to mash up though. Have you calculated how many people need to actively vote on stuff for the good results to actually get to the top?


Well, if there are no other votes for a given query then an upvoted link will go to the top. Once it's there then that link is now more visible to others who can either agree or disagree.

EDIT: Like I've pointed out in other comments the default scoring system can be customized and that behavior could be changed.
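
On the question of how many votes it takes: one way a customized scoring function could damp single-vote jumps is the Wilson score lower bound, which ranks by a conservative estimate of the upvote fraction instead of the raw count. To be clear, this is a swapped-in technique shown as a sketch, not the project's default scoring function; the function name and the choice of `z` are assumptions.

```python
import math


def wilson_lower_bound(up, down, z=1.96):
    """Lower bound of the Wilson score interval for the upvote fraction.

    z=1.96 corresponds to roughly 95% confidence. Items with no votes score 0.
    """
    n = up + down
    if n == 0:
        return 0.0
    p = up / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt((p * (1 - p) + z * z / (4 * n)) / n)
    return (centre - margin) / denom
```

Under this score a link with a single upvote ranks below a link with 40 upvotes and 10 downvotes, even though its raw upvote fraction is higher.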


The whole design of 51% of users downvoting an entry and sending it to oblivion is NOT a design pattern we should be proliferating in the industry.


A bit useless as a generic web search engine but I could see this being used as the search engine of a community website.


https://www.jivesearch.com/?q=0xbitcoin

No results for 0xbitcoin

Suggestions:

Learn how to spell. Try something else.


We need to run the crawler longer...and we are going to be replacing the crawler with something that should be a lot faster.


The message "learn to spell" isn't the best one for a search engine


Obvious question, how do you intend to make money?


Advertising that doesn't track users. Perhaps a freemium model for API usage with a very generous free tier.


Lordy lord. Now we don't only have to pay for links and SEO, now we also have to hire a shitload of Filipinos to up-vote sites....


the search engine doesn't return anything for "quantum cutting phosphor"


"about us" returns 500


I tried four searches. 'smbc', 'xkcd', 'reddit', and 'google'. None returned what you'd expect. The search for 'google' was especially weird. I'm surprised to see a search engine that can't find the most visited page on the internet.


This is very early stage...


Reddit is known for having an appalling search function.

When I see "Reddit-style search...", I immediately think of search that doesn't work.


I meant that users can upvote/downvote the results.

EDIT: To clarify - I've just referenced Reddit in this post. I don't plan to use it anywhere else.


I'd drop any references to reddit - they are far from the only site with upvotes and downvotes, and while they have a rabid fan base, they also have a rabid detractor base. Better to just state your value props than to deliberately inherit the baggage from a contentious site.


Seconded, my initial impression was 'bad idea' until I read a bit more about how it worked.

Seems like an interesting idea - could be quite useful if you can get a critical mass of user data.


Call it hacker news style instead.


Call it stack overflow style instead.


Sorry, another bad idea - SO's search engine is notoriously bad.


Google is SO’s search engine. He’s making google with SO voting.


That makes a lot more sense. My initial reaction was the same as the grandparent comment.

I would suggest rewording the tagline to "Privacy respecting search engine with reddit style voting".


Dude. Not a good response. Better to say "good point, thanks for the tip" since you're the one who suffers the bottom line - either you're understood or you aren't.


How is this "Reddit-style"?


You upvote/downvote results. Crowdsourcing search ranking, what could go wrong?

Edit: typo


Ideal for those times when the echo chamber is not enough.


If you search you'll find that you can upvote the results, I guess that's what he means by reddit style!


Users upvote/downvote the results.



