
Show HN: Chrome/FF plugin brings the HN and Reddit conversation to you as you surf - spenvo
http://www.metafruit.com/kiwi
======
ejcx
There are some pretty serious privacy issues with this extension.

There should be a way to block domains that you don't want searched; otherwise
secret URLs such as YouTube private links, Google Docs, etc. are exposed. One
scary thing is that you are potentially exposing them to third parties by
searching reddit, HN, and Google.

This is a pretty good example of why people need to be wary of the Chrome
extensions they install. Extensions sit in a privileged position where, by
using the background page, they can get around things like the same-origin
policy and CSP rules (depending on the browser).

~~~
spenvo
That was a major concern of mine. You can set Research Mode to "Off" by
default, check URLs on a case-by-case basis, and still do custom searches.
Admittedly, that takes some of the serendipity out of it, but it addresses your
concern.

I considered (and was planning on) adding a "block URL" feature - but the
issue of how to store those sensitive URLs (to block) came up. Because
localStorage and sync storage in Chrome are not sandboxed or encrypted, the
blocked list would be "in the open" to other extensions. Yes, you could hash
the URLs you want to block, but then there would be no way to read that list
back to the user at a later point, and slight mismatches in URL schemes would
lead to an imperfect system. So simply toggling Research Mode and researching
pages of interest is the best option, IMO.

I don't cache any personal info in localStorage or sync storage (which at
least Chrome does not encrypt :< ). The API results are stored in a local
variable within the scope of the extension, and the "history" is a hashed and
padded blob.

This is also why I released it for both Chrome and FF, since some people
assign different use cases to different browsers. The code is also
public/open-source.

~~~
Schwolop
Maybe I missed it, but as far as I can tell, this comment is the only
documentation of what "research mode" means. You might want to address that
somewhere...

~~~
spenvo
No, you're right. Adding it now (thanks!). (Update: added to the landing page
and pushing out to the Chrome/FF stores.)

------
jagira
I built something similar as a side project in 2011 - [redacted]. It allowed
you to leave notes on a URL, which your friends / followers could see when
they visited that URL.

In the beginning, when I was testing this with my friends and colleagues, I
sent every URL a user visited to the server to check whether any of their
friends had left notes, and then alerted them via notification badges. I
disabled it when I started seeing a lot of private URLs (like Google Docs
links with share access) in the server logs. I then changed the extension to
query the server only when a user clicks the extension button.

This made it a bit safer, but the extension still needed access to all the
sites a user visits. And with Chrome's auto-updating of extensions, one may
never know if the extension author has started sending every URL back to the
server again.

After developing such an extension, I am quite suspicious of such extensions
myself and only install ones from trusted authors (Buffer, Pocket, etc.).

~~~
spenvo
I agree, and I'll say that I'm as pleasantly surprised by the review process
Mozilla has for its add-ons as I am dismayed that Chrome has no equivalent
process. I'm in the queue for the Firefox review (it takes 10 days on average)
and have exchanged emails with their volunteer team on best practices to adopt.

Ultimately it comes down to winning the user's trust, and I'm trying to
address as many questions as I can up front.

In response to another comment, I've also un-minified the Chrome extension
code and will keep it un-minified going forward (it will take up to an hour to
propagate [update: fresh installs are now un-minified, and the current install
base will get the update within 6 hours]).

------
nichochar
How does this make money? What’s the plan?

It won’t. I figure the amazing APIs made available by Algolia, Reddit, Google
News (and hopefully more) are incurring the only ongoing expense, and I’ve
done my best to design the extension to respect their needs. My only need is
that people get something out of the effort. :)

Are you against making money on projects?

I hope to sell some funny stickers at some point! But in all seriousness, I'd
be happy to charge for a product that doesn't depend on others' APIs.
Ambitious projects that aren't concerned with revenue are (more than likely)
destined to fail - though there are certainly terrific exceptions to this...

I really like this idea of not charging for a service heavily based on other
people's APIs. How many Node.js programmers are out there trying to make dirty
money off other people's work + CSS? It's shameful.

~~~
nsgi
I disagree. Unless they're breaking licenses or taking credit for someone
else's work, it's not dirty money. Is it shameful for Apple to make money from
an OS based on FreeBSD?

~~~
bad_user
> _Is it shameful for Apple to make money from an OS based on FreeBSD?_

Yes, thankfully they aren't doing it.

~~~
nr0mx
They were, weren't they?

------
alistproducer2
I had an extension called Deeper History, which I shut down over
security/privacy concerns very similar to yours. The solution I came up with
half worked. I used
[https://github.com/travist/jsencrypt](https://github.com/travist/jsencrypt)
to encrypt the sensitive data before storing it in IndexedDB.

The problem was that I couldn't get it to work with public keys I created
locally, although according to jsencrypt's GitHub page it should be possible.
If you could get that working, you could give security-conscious people a way
to safely cache stuff locally.

Anyway, if it would help to store user info on the client, I just wanted to
say there is a viable way forward on that. I have the code to chunk and
encrypt stuff on the client if you're interested.
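
To give a concrete picture, the encrypt-before-store step looked roughly like
this (a minimal sketch, not the shipped code, assuming the npm jsencrypt
package; the key PEM, database, and store names are placeholders):

    // Sketch: encrypt with jsencrypt's RSA wrapper, then persist to IndexedDB.
    // RSA can only encrypt short strings, hence the chunking mentioned above.
    import { JSEncrypt } from "jsencrypt";

    // Placeholder key; in my case it was generated locally.
    const PUBLIC_KEY_PEM = "-----BEGIN PUBLIC KEY-----\n...\n-----END PUBLIC KEY-----";

    function encryptValue(plaintext: string): string | false {
      const rsa = new JSEncrypt();
      rsa.setPublicKey(PUBLIC_KEY_PEM);
      return rsa.encrypt(plaintext); // false if the key is bad or input too long
    }

    function storeEncrypted(key: string, plaintext: string): void {
      const ciphertext = encryptValue(plaintext);
      if (ciphertext === false) throw new Error("encryption failed");
      const open = indexedDB.open("deeper-history", 1);
      open.onupgradeneeded = () => open.result.createObjectStore("entries");
      open.onsuccess = () => {
        open.result
          .transaction("entries", "readwrite")
          .objectStore("entries")
          .put(ciphertext, key);
      };
    }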

~~~
spenvo
Thanks for sharing your approach. In the end, I decided the API results didn't
need to be persisted in storage -- they get stored in a local variable and
jettisoned by JavaScript's garbage collection. My primary concern was: to what
scope did the data belong? (Check my other comments to see my dissatisfaction
with the Chrome dev docs' language on this topic.) I decided that the user
history did not need to be precisely known, so my strategy was: hash the URL,
cut the hash down to a much shorter string (to increase the likelihood of
collisions), pad the shortened hash with a random number of characters on
either side, and concatenate that string onto a large blob of text in
localStorage. The extension can then perform an indexOf search on that blob to
be reasonably sure the user has been to a given URL. (This keeps the extension
from repeatedly querying commonly visited URLs, but it also doesn't prove you
were at any given URL, since collisions are expected to happen.)
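
In code, the scheme is roughly this (a simplified sketch; the hash function,
truncation length, and storage key are illustrative, not the shipped values):

    // Truncated hash: deliberately collision-prone, so a match proves nothing.
    function shortHash(url: string, len = 6): string {
      let h = 5381;
      for (const c of url) h = ((h * 33) ^ c.charCodeAt(0)) >>> 0; // djb2-style
      return h.toString(36).slice(0, len);
    }

    // Random-length padding so hash boundaries can't be picked out of the blob.
    function randomPad(): string {
      const chars = "abcdefghijklmnopqrstuvwxyz0123456789";
      const n = 1 + Math.floor(Math.random() * 8);
      return Array.from({ length: n }, () =>
        chars[Math.floor(Math.random() * chars.length)]
      ).join("");
    }

    // Record a visit by concatenating the padded short hash onto the blob.
    function recordVisit(url: string): void {
      const blob = localStorage.getItem("historyBlob") ?? "";
      localStorage.setItem(
        "historyBlob",
        blob + randomPad() + shortHash(url) + randomPad()
      );
    }

    // "Reasonably sure" check: false positives are expected and intentional.
    function probablyVisited(url: string): boolean {
      const blob = localStorage.getItem("historyBlob") ?? "";
      return blob.indexOf(shortHash(url)) !== -1;
    }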

~~~
alistproducer2
No problem. Nice work and good luck!

~~~
spenvo
Thanks! BTW, I love the idea behind your extension -- Chrome's history search
is almost always a frustrating letdown. Provided you can solve the privacy
issue, I'd use it in a heartbeat. I'd also be very curious to see your final
implementation.

~~~
alistproducer2
Thanks. I'm currently spending my free time learning ML and strengthening my
math. I'm not sure I'd want to commit to supporting DH again. Maybe one day,
though.

When I shut it down, DH had ~5K users, and supporting it was real work. Not to
mention that no employers really seemed to care about it, which made me wonder
what I was doing it for. I made no money, and I had people hitting me up to
fix this or that.

Be careful what you wish for!

------
lqdc13
I really liked StumbleUpon for exactly this feature, until they ruined the
product during the monetization phase.

Edit: Having comments is really important. Maybe mark them by the color of the
source (blue for reddit and orange for Hacker News) and separate them into
per-submission sections. Also important is preserving the original tree
structure of each submission's comments.

~~~
spenvo
faceyspacey echoed that sentiment below too -- I've added expandable inline
comments to the to-do list (thanks!)

------
fpgaminer
Great idea! I immediately thought "Why didn't I think of that!?"

With regard to the privacy concerns around Research Mode, there may be a
solution. For sites like Reddit, it should be possible to build a Bloom
filter: have the metafruit server actively spider Reddit for new, popular
threads and add them to the filter. The plugin would download the Bloom filter
from the metafruit server at some regular interval. That way, checking whether
any particular URL has an associated conversation is just a local operation.
Plus, it's faster than pinging an API and burns less of the target API's
resources.
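
A minimal sketch of the client side (the endpoint, filter size, and hash count
are all invented for illustration; the real filter bytes would be built
server-side from the spidered threads):

    // Bloom filter membership: false positives possible, no false negatives.
    class BloomFilter {
      constructor(private bits: Uint8Array, private hashCount: number) {}

      // Simple seeded string hash mapped onto the bit array.
      private hash(s: string, seed: number): number {
        let h = seed;
        for (const c of s) h = (Math.imul(h, 31) + c.charCodeAt(0)) >>> 0;
        return h % (this.bits.length * 8);
      }

      mightContain(url: string): boolean {
        for (let i = 0; i < this.hashCount; i++) {
          const bit = this.hash(url, 0x9e3779b9 + i);
          if ((this.bits[bit >> 3] & (1 << (bit & 7))) === 0) return false;
        }
        return true; // "maybe" -- verify against the API only on a hit
      }
    }

    // Hypothetical periodic refresh; every URL check after this stays local.
    async function loadFilter(): Promise<BloomFilter> {
      const res = await fetch("https://metafruit.example/kiwi/reddit.bloom");
      return new BloomFilter(new Uint8Array(await res.arrayBuffer()), 7);
    }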

That would also provide a way to monetize, by giving out the metafruit bloom
filter to subscribers only. Or perhaps the free plugin can update its bloom
filter once a day, but subscribers can update once per hour.

~~~
atotic
I might enjoy using this. But, PRIVACY! Sending back every visited URL has
never been OK for any reason; the first time I saw this idea shot down was
in '93.

But there might be a way out:

I'd be willing to give up privacy of URL hashes. This is how I'd do it:

- you already track a set of URLs that have discussions (I assume). If not,
you need to figure out how to seed these. Volunteers, APIs...

- hash these URLs on the server, using a not-too-unique hash function. You
want to end up with a high collision rate, but not too high.

- now the client can query for conversations without revealing the URL it has
visited: it asks the server whether there are any conversations for a
particular hash. If the server finds any, it returns { pageUrl: '',
conversationUrls: [] }, and the client can decide whether the URL really
matches or it was just a random hash collision.

- I know this is not perfect. A determined, privacy-busting enemy could
generate hashes of a large number of public sites and use statistics to infer
which sites you've visited just from your hashes. But it'd be good enough for
me.
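
Client-side, the lookup would be something like this (a sketch; the endpoint,
response shape, and hash width are made up):

    // Deliberately wide hash bucket: ~16 bits, so many URLs share each value.
    function shortHash(url: string): string {
      let h = 5381;
      for (const c of url) h = ((h * 33) ^ c.charCodeAt(0)) >>> 0;
      return (h % 0x10000).toString(16);
    }

    // The server returns every candidate in the bucket; only the client,
    // which knows the real URL, filters out the random collisions.
    async function findConversations(visitedUrl: string): Promise<string[]> {
      const res = await fetch(
        `https://example.com/conversations?hash=${shortHash(visitedUrl)}`
      );
      const candidates: { pageUrl: string; conversationUrls: string[] }[] =
        await res.json();
      return candidates
        .filter((c) => c.pageUrl === visitedUrl)
        .flatMap((c) => c.conversationUrls);
    }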

Bonus money-making idea: offer your plugin as a paid service to different web
communities. Increases their "community engagement".

I seriously contemplated starting an "annotations" startup in the 90s. Someone
else did, and they folded after a few years.

~~~
fowl2
This is kinda how Google's Safe Browsing[1] works, although with a few extra
layers, such as (IIRC) always requesting some random hashes when confirming
matches.

I read a better explanation on a Mozilla mailing list once, but the key point
is that it tries /really/ hard not to disclose private data.

[1] [https://developers.google.com/safe-browsing/developers_guide...](https://developers.google.com/safe-browsing/developers_guide_v3#Overview)

------
jostmey
Funny, the plugin works on every website that I've tested except for just one
case - the plugin's own homepage ;-)

~~~
spenvo
Ha, took a few minutes, but now it's showing up in the API results. :)

------
Houshalter
I have been using Reddit Check
([https://chrome.google.com/webstore/detail/reddit-check/mllce...](https://chrome.google.com/webstore/detail/reddit-check/mllceaiaedaingchlgolnfiibippgkmj?utm_source=chrome-app-launcher-info-dialog))
for over a year and it's very nice. I love finding discussion on random
websites.

This extension appears to be better made and has more features. It's nice to
see discussion on HN too.

It does not work reliably, though. I clicked on a bunch of links from the HN
front page; Reddit Check found that they had been posted to reddit before, but
Kiwi did not. However, all of those links had only been posted once and had no
discussion. Still, it seems strange that Kiwi would say they had never been
posted before.

It was also unable to find YouTube videos that had been posted before. YouTube
is terrible at unique URLs, so I don't entirely blame it. However, Reddit
Check is able to find all the different places YouTube videos have been
posted.

I found a link that had been posted to reddit hundreds of times; Kiwi only
found 11 results. There was also an option for "fuzzy matches", which included
a few more links to the exact same URL, but also links that had nothing to do
with it. (Reddit Check has a related problem: it only returns the first 25
results.)

Clicking on any of the links closes the menu, so you can't open many links in
new tabs at once. This is also a problem with Reddit Check.

It does not find http versions of https links. Also a problem with Reddit
Check.

Clicking on the "submit to reddit" option opened a submission page, but not
with the URL in it.

I tried to look at the code but it was all squished together. It does not
appear possible to modify it anyway.

Anyway none of these are dealbreakers. I will be using this extension
alongside Reddit Check due to the extra features it has. I am concerned about
sending so many requests to reddit every time I open a new tab though.

~~~
spenvo
Terrific comment, thank you. First, I've un-minified the Chrome extension code
(it only added 2kb in size), and it will remain that way going forward. It
will take up to an hour to propagate to the Chrome Web Store, but the Firefox
extension code is already un-minified:
[https://addons.mozilla.org/en-US/firefox/addon/kiwi-conversa...](https://addons.mozilla.org/en-US/firefox/addon/kiwi-conversations/)
(thanks to their review process, which requires that it not be minified). The
Chrome code is also available on GitHub:
[https://github.com/sdailey/kiwi](https://github.com/sdailey/kiwi)

As for the results vs. Reddit Check -- maybe Reddit Check uses a home-rolled
index that crawls more frequently than Reddit's official API? Could you either
tweet me the specific links or reply to this comment?

~~~
Houshalter
Some links that have been posted to reddit, but it can't see:

[http://www.symmetrymagazine.org/article/july-2015/one-higgs-...](http://www.symmetrymagazine.org/article/july-2015/one-higgs-is-the-loneliest-number)

[https://medium.com/recreating-megaman-2-using-js-webgl](https://medium.com/recreating-megaman-2-using-js-webgl)

[http://calvertjournal.com/features/show/4458/owen-hatherley-...](http://calvertjournal.com/features/show/4458/owen-hatherley-postcards-landscapes-of-communism)

[https://github.com/MuseumofModernArt/collection/](https://github.com/MuseumofModernArt/collection/)

(If you search reddit for any of these URLs, it goes straight to where they
have been posted.)

~~~
spenvo
Thanks. Maybe reddit's search API filters out some results that have zero
comments - I looked for an attribute in their API that would return all
results but couldn't find anything:
[https://www.reddit.com/dev/api#GET_search](https://www.reddit.com/dev/api#GET_search).
Also, I'd like to inspect Reddit Check's code, but I can't find an open-source
repo.

~~~
Houshalter
I looked at Reddit Check's background page and checked the network requests it
was sending. It sends a request like this:

[https://www.reddit.com/api/info.json?url=http%3A%2F%2Fwww.sy...](https://www.reddit.com/api/info.json?url=http%3A%2F%2Fwww.symmetrymagazine.org%2Farticle%2Fjuly-2015%2Fone-higgs-is-the-loneliest-number)

Weirdly, it seems to send the same request several times. It should also pass
&limit=100, so it gets 100 results instead of 25.
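
For example, something like this (a sketch; reddit's info endpoint returns a
standard listing):

    // One info.json call with limit=100 instead of the default 25 results.
    async function redditPostsFor(url: string): Promise<any[]> {
      const endpoint =
        "https://www.reddit.com/api/info.json?limit=100&url=" +
        encodeURIComponent(url);
      const res = await fetch(endpoint);
      const listing = await res.json();
      // Listing shape: { data: { children: [{ data: { ...post } }, ...] } }
      return listing.data.children.map((c: any) => c.data);
    }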

------
spenvo
3 days later: I have shipped an update to the extension that addresses the
privacy concerns raised here. Whitelists have been implemented, the privacy
defaults have changed so that Research Mode starts 'off', and Houshalter's
reported problems have been fixed. Kiwi can now also fetch Reddit posts that
have been hidden by moderators. Full changelog here:
[http://www.metafruit.com/kiwi/changelog/2015/08/06/kiwi-conv...](http://www.metafruit.com/kiwi/changelog/2015/08/06/kiwi-conversations-graduates-to-version-1-0-0-after-user-feedback/)

------
faceyspacey
I'd like to be able to see comments in the widget without having to go to
Hacker News.

------
curiousjorge
Does the searching happen on your machine (scraping Google search results by
crafting a URL query), or does it get routed to a central server that we are
forced to trust? If the latter, there's no way in hell this is going to be
popular around here.

~~~
spenvo
It uses Reddit's API, Algolia's HN API, and the Google News API. Each of these
services can be toggled individually in settings.

[0] Reddit - [https://github.com/reddit/reddit/wiki/API](https://github.com/reddit/reddit/wiki/API)

[1] HN - [https://hn.algolia.com/api](https://hn.algolia.com/api)

[2] Google News - [https://developers.google.com/news-search/v1/devguide#gettin...](https://developers.google.com/news-search/v1/devguide#getting-started)

Also, it can be set to search à la carte by toggling Research Mode.

------
mingus68040
Wow, so now I can read uninformed, ignorant opinions anywhere I go on the web?
This is surely progress.

