
Show HN: A search engine that doesn't track you, where users vote for results - brentadamson
https://github.com/jivesearch/jivesearch
======
dehugger
The idea is solid, but I think you need to do some sort of domain weighting until
you get users. For example, I tried searching "League of Legends", and I got
nothing but a whole page of links to various hacking websites and gold sellers
(which makes no sense in the context of the game), all of them at zero votes.
This is pretty discouraging, and I don't think it makes much sense to make the
first 10 or 20 people to search for a topic sort through (and vote on) the
entire Internet's worth of keyword-related links until things are in some
semblance of order.

~~~
brentadamson
I've noticed the same. Probably need to re-weight the fields:
[https://github.com/jivesearch/jivesearch/blob/master/search/...](https://github.com/jivesearch/jivesearch/blob/master/search/elasticsearch.go#L34-L50)

------
soared
This is an excellent idea. I'm assuming HN is getting an early preview, but if
not I would definitely put a one-liner on the landing page that explains a bit
about what it does.

I do think in the future you should market it with the two differentiators
more separate. "Show HN: A Reddit-style search engine where users upvote and
downvote links. It also doesn't track you!"

IMO that's better product positioning.

~~~
brentadamson
Yeah, that's what I meant...wish I could edit the title.

EDIT: I'm not going to edit the landing page as not all users know what reddit
is. The HN community knows reddit but the post title could clearly be better.

~~~
dang
OK I took a crack at reworking the title to express both your points. Email us
if you want it changed again. Good luck with the project!

(Submitted title was "Show HN: A Reddit-style search engine that doesn't track
you".)

~~~
brentadamson
thank you!

------
antognini
I think the indexing needs some work. For the query "Kozai", Google, DDG, and
Bing all return Wikipedia's page on the Kozai mechanism as either the top
result or the second result. The lone result I get from Jive is a link to what
appears to be a Japanese porn site. I can't even upvote the Wikipedia page
because it doesn't show up.

~~~
brentadamson
We need more links indexed. The crawler that I wrote is pretty slow and right
now we don't have a lot of documents indexed. I will be replacing the crawler
I wrote with a faster one soon. Until then I have been using !bangs.

~~~
ericst
Have you ever heard of [https://commoncrawl.org/](https://commoncrawl.org/)?
Maybe it helps you to bootstrap.

It's cool to see some new things happening in the search engine world.

~~~
brentadamson
I've considered that but didn't really know anything about the organization
and how long they'll be around. We're gonna give Colly a try:
[https://github.com/gocolly/colly](https://github.com/gocolly/colly).

------
palerdot
Not sure how upvoting will make search better. Most of the search happens in a
specific context, where users are just desperate to go to the desired result
and not come back and vote on whether it was relevant. Plus, even if that is
done, there is no way to know if the page is still relevant since it was upvoted.

------
hjek
This looks super interesting.

Somewhat similar to Searx[1], except that Jive even has its own crawler,
whereas Searx really just is a metasearch engine.

I haven't tried setting up an instance myself, but it _appears_ to be quite
simple.

One thing: I found it a bit unclear what the voting does.

Thanks for sharing this. We need more truly free search engines, and not just
non-free ones trying to appear "open source friendly", like DuckDuckGo[2].

[1]: [https://asciimoo.github.io/searx/](https://asciimoo.github.io/searx/)
[2]: [https://duck.co/help/open-source/opensource-overview](https://duck.co/help/open-source/opensource-overview)

~~~
brentadamson
The voting reorders the search results.
[https://www.jivesearch.com/](https://www.jivesearch.com/).

You will need Elasticsearch, Redis, PostgreSQL as well as run the Wikipedia &
MusicBrainz dumps. Let me know if you run into any issues.

------
ulyssesgrant
judging by the amount of hateful comments here, I'd say you are really onto
something :)

seems like a neat idea I'd love to see proven out, keep it up!

~~~
brentadamson
I think the comments have been very helpful!

------
notheguyouthink
Neat idea, scary to implement though. As everyone mentions, manipulation is a
massive concern.

With that said, perhaps peer-promoted links would solve this issue. If you
built a network of peers (my brother, his friend, his friend's friend) and
weighted upvotes by how far removed each voter is from me, it could make for a
nice upvoting system.

Anyone know if that style of voting has been implemented by anyone? Does it
suffer fatal flaws? Sounds difficult to implement, for starters.
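
A minimal sketch of the peer-weighted idea above, assuming a hypothetical
friend graph and a vote weight that halves with every hop away from the
searcher. The names `hops_from`, `peer_score`, and the `decay` value are
illustrative only and are not taken from Jive Search.

```python
from collections import deque


def hops_from(graph, start):
    """Breadth-first search: number of hops from `start` to each reachable peer."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        for friend in graph.get(person, ()):
            if friend not in dist:
                dist[friend] = dist[person] + 1
                queue.append(friend)
    return dist


def peer_score(graph, me, votes, decay=0.5):
    """Sum votes (+1 or -1) weighted by decay**hops; voters outside my network are ignored."""
    dist = hops_from(graph, me)
    return sum(v * decay ** dist[voter] for voter, v in votes.items() if voter in dist)


# Hypothetical network: me -> brother -> brother's friend.
graph = {
    "me": ["brother"],
    "brother": ["me", "friend"],
    "friend": ["brother"],
}
# The bot farm is not connected to me, so its downvote carries no weight.
votes = {"brother": 1, "friend": 1, "bot_farm": -1}
score = peer_score(graph, "me", votes)
```

The obvious cost is that every searcher sees a different ranking, and the
engine has to know who your peers are.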

~~~
brentadamson
The rate limit applies to everyone equally so you can write a bot to upvote
all your links but someone can then write a bot to downvote them. The default
scoring system is meant to be overridden and can help weed out the bot
attempts:
[https://github.com/jivesearch/jivesearch/blob/master/search/...](https://github.com/jivesearch/jivesearch/blob/master/search/vote/postgresql.go#L80-L94).

Reddit has tons of manipulation by users yet relevancy seems to be very good
there.

~~~
notheguyouthink
It does? Reddit was manipulated to a massive degree and, as far as many are
concerned, throws away any credibility on the relevancy of popularity.

~~~
brentadamson
I visit my subscribed subreddits frequently b/c I find posts that are highly
relevant to my interests.

~~~
notheguyouthink
So did (and still do) a lot of political discussions on Reddit. Then we find
out that they're heavily being manipulated by Russia. Those people thought
those posts were organically, communally upvoted and vetted. Turns out, they
were manipulated heavily.

Any heavily manipulated system could have organic and true content for _some_
people. I think the goal would be to have _trust_ in the platform. Reddit, as
of late, has no trust with anything of meaning. It has been shown to be
manipulated by companies and governments.

I could hire a firm to push my content in your favorite subreddits.. so I
struggle to see how Reddit is remotely a good platform for discovering
non-manipulated content.

Many people don't like that Google can alter their searches, providing bad
incentives. A Reddit-like search now opens up the ability to manipulate your
searches to ad agencies, governments, bad actors, etc.

~~~
brentadamson
What's the alternative? As others have pointed out the results as they stand
aren't that good and, as you point out, other search engines have their own
incentives to push stuff to the top.

Since the scoring system can be customized I think we can eventually figure
out a way to deal with bots but it will always be done in a way that has to
treat all domains equally. I think it would be appropriate to have our scoring
system regularly audited (along with the rest of the code) to ensure that we
treat all domains equally and that no modifications are being made to the rest
of the code in regards to privacy.

If, after all of that, you aren't satisfied then by all means you can run your
own instance and order the results as you see fit while blocking others from
upvoting/downvoting.

EDIT: BTW, I am not suggesting that the voting system is set in stone. If
there is a better way then I'd love to hear it! I do understand the drawbacks
to the users voting and I hope I am not coming across as being completely
inflexible on this point. I just haven't come up with a better alternative and
haven't gotten much in terms of concrete suggestions.

~~~
notheguyouthink
> What's the alternative? As others have pointed out the results as they stand
> aren't that good and, as you point out, other search engines have their own
> incentives to push stuff to the top.

That's the million dollar question isn't it? If I had an answer, I'd have a
product ;)

That doesn't change the fact though that I don't trust Reddit any further than
I can throw it. It's fun for memes, but beyond that it's been proven to have
been manipulated multiple times now.

> Since the scoring system can be customized I think we can eventually figure
> out a way to deal with bots but it will always be done in a way that has to
> treat all domains equally. I think it would be appropriate to have our
> scoring system regularly audited (along with the rest of the code) to ensure
> that we treat all domains equally and that no modifications are being made
> to the rest of the code in regards to privacy.

Yea, I hope. I'm not trying to knock your product at all btw. If you read my
first comment, I even said I love the idea of a voting based search, but I
just can't trust the public to do the voting. If it was based on a peer
network (brother's friend, brother's friend's friend, etc), _maybe_ that would
be better. Obviously though there are challenges with that sort of method, I'm
not saying it's perfect or even good - it was just an idea after 30s of
thinking haha.

> EDIT: BTW, I am not suggesting that the voting system is set in stone. If
> there is a better way then I'd love to hear it! I do understand the
> drawbacks to the users voting and I hope I am not coming across as being
> completely inflexible on this point. I just haven't come up with a better
> alternative and haven't gotten much in terms of concrete suggestions.

You haven't gotten concrete suggestions because I think no one knows the
answer. Information manipulation is, in my opinion, the biggest problem facing
humanity right now. Literally. So it's no surprise that our ideas to prevent
manipulation at scale are lacking.

------
sgdread
I really like the idea. However, let's think about abuse patterns: what
prevents an SEO agency from creating 1k t2.micro instances and clicking upvote
for a link of interest and downvote for all the competitors?

~~~
brentadamson
What prevents their competitor from downvoting them? The rate limit applies
equally to everyone.

EDIT: The scoring function can also be customized.
[https://github.com/jivesearch/jivesearch/blob/master/search/...](https://github.com/jivesearch/jivesearch/blob/master/search/vote/postgresql.go#L80-L94)
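
To make the rate-limit point concrete, here is a minimal sketch assuming a
fixed-window limit keyed by a hypothetical client identifier. How Jive Search
actually enforces its limit is not shown in this thread, so the class name,
the quota of 3, and the 60-second window are all assumptions.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` votes per client in each `window`-second window."""

    def __init__(self, limit, window, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.counts = {}  # client id -> (window start, votes used)

    def allow(self, client):
        now = self.clock()
        start, used = self.counts.get(client, (now, 0))
        if now - start >= self.window:
            # The previous window has expired, so the quota resets.
            start, used = now, 0
        if used >= self.limit:
            return False
        self.counts[client] = (start, used + 1)
        return True


# A frozen clock keeps every call inside the same window.
limiter = FixedWindowLimiter(limit=3, window=60, clock=lambda: 0.0)
one_client = sum(limiter.allow("client-a") for _ in range(10))
many_clients = sum(limiter.allow(f"bot-{i}") for i in range(1000))
```

The quota is identical for every client, but it bounds votes per client, not
votes per link.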

~~~
jakear
The local mom and pop shop isn’t going to be able to compete with Walmart.
This only works if both entities have similar technical and financial status.

~~~
brentadamson
But there's an army of users that will ultimately decide.

~~~
hyperpape
There are a lot fewer users than there will be bots.

~~~
username3
Use Stack Overflow’s voting system.

~~~
hyperpape
What's your reputation, and how does that work with not tracking you?

~~~
brentadamson
My reputation? None, really. My background is in finance and I'm a self-taught
developer. Just tired of not having anything I like. If you don't trust me you
can run it on your own server.

~~~
hyperpape
No, I mean..how do you use the StackOverflow system? It's based on a concept
of reputation. What's the reputation in the search engine?

------
kristerv
I searched for "porn". Got bad results like it's 2004. Neext :D

Cool idea to mash up though. Have you calculated how many people need to
actively vote on stuff for the good results to actually get to the top?

~~~
brentadamson
Well, if there are no other votes for a given query then an upvoted link will
go to the top. Once it's there then that link is now more visible to others
who can either agree or disagree.

EDIT: Like I've pointed out in other comments the default scoring system can
be customized and that behavior could be changed.
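
A minimal sketch of that behavior, assuming results are ordered by net votes
for the query first and by the engine's own relevance score second. The
`rerank` function and the example URLs are hypothetical; the real default
scoring lives in the repository and is not reproduced here.

```python
def rerank(results, votes):
    """Order results by net votes for this query, breaking ties by relevance.

    results: list of (url, relevance) pairs from the text index
    votes:   dict mapping url -> upvotes minus downvotes for this query
    """
    return sorted(results, key=lambda r: (votes.get(r[0], 0), r[1]), reverse=True)


results = [("a.example", 0.9), ("b.example", 0.7), ("c.example", 0.4)]

# With no votes at all, the original relevance order is kept.
unvoted = rerank(results, {})

# A single upvote lifts the least relevant link to the top.
ranked = rerank(results, {"c.example": 1})
```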

------
andrewmcwatters
The whole design of 51% of users downvoting an entry and sending it to
oblivion is NOT a design pattern we should be proliferating in the industry.

------
Raphmedia
A bit useless as a generic web search engine but I could see this being used
as the search engine of a community website.

------
snissn
[https://www.jivesearch.com/?q=0xbitcoin](https://www.jivesearch.com/?q=0xbitcoin)

No results for 0xbitcoin

Suggestions:

Learn how to spell. Try something else.

~~~
brentadamson
We need to run the crawler longer...and we are going to be replacing the
crawler with something that should be a lot faster.

~~~
snissn
The message "learn to spell" isn't the best one for a search engine.

------
known
Obvious question, how do you intend to make money?

~~~
brentadamson
Advertising that doesn't track users. Perhaps a freemium model for API usage
with a very generous free tier.

------
jk2323
Lordy lord. Now we don't only have to pay for links and SEO, now we also have
to hire a shitload of Filipinos to up-vote sites....

------
DoctorOetker
the search engine doesn't return anything for "quantum cutting phosphor"

------
lazyant
"about us" returns 500

------
imh
I tried four searches. 'smbc', 'xkcd', 'reddit', and 'google'. None returned
what you'd expect. The search for 'google' was especially weird. I'm surprised
to see a search engine that can't find the most visited page on the internet.

~~~
brentadamson
This is very early stage...

------
jwilk
Reddit is known for having an appalling search function.

When I see "Reddit-style search...", I immediately think of search that
doesn't work.

~~~
brentadamson
I meant that users can upvote/downvote the results.

EDIT: To clarify - I've just referenced Reddit in this post. I don't plan to
use it anywhere else.

~~~
codingdave
I'd drop any references to reddit - they are far from the only site with
upvotes and downvotes, and while they have a rabid fan base, they also have a
rabid detractor base. Better to just state your value props than to
deliberately inherit the baggage from a contentious site.

~~~
c3534l
Call it hacker news style instead.

~~~
username3
Call it stack overflow style instead.

~~~
GordonS
Sorry, another bad idea - SO's search engine is notoriously bad.

~~~
username3
Google is SO’s search engine. He’s making google with SO voting.

------
Johnie
How is this "Reddit-style"?

~~~
groceryheist
You upvote/downvote results. Crowdsourcing search ranking, what could go
wrong?

Edit: typo

~~~
ASalazarMX
Ideal for those times when the echo chamber is not enough.

