Show HN: Chrome/FF plugin brings the HN and Reddit conversation to you as you surf (metafruit.com)
55 points by spenvo on Aug 2, 2015 | 53 comments

There are some pretty serious privacy issues with this extension.

There should be a way to block domains that you don't want searched; otherwise secret URLs such as private YouTube links, Google Docs, etc. are exposed. One scary thing is that you are potentially exposing them to third parties by searching Reddit, HN, and Google.

This is a pretty good example of why people need to be wary of the Chrome extensions they install. They sit in a privileged position where, by using the background page, they can get around things like SOP, CSP rules (depending on the browser), and more.

That was a major concern of mine. You can set the Research mode to "Off" by default and check URLs on a case-by-case basis and still do custom searches. Admittedly, it takes some of the serendipity out of it, but it addresses your concern.

I considered (and was planning on) adding a "block URL" feature, but the issue of how to store those sensitive URLs (to block) came up. Because localStorage and sync storage in Chrome are not sandboxed or encrypted, the blocked list would be "in the open" to other extensions. Yes, you could hash the URLs you'd hope to block, but then there would be no way to read that list back to the user at a later point in time, and slight mismatches in URL schemes would lead to an imperfect system. So simply toggling Research Mode and researching pages of interest is the best option, IMO.
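To make the trade-off with a hashed block list concrete, here is a minimal Python sketch. It is not taken from Kiwi's code: the names `normalize`, `url_digest`, and `is_blocked`, the example URL, and the normalization rules (drop the scheme, query string, and trailing slash) are all assumptions made for illustration.

```python
import hashlib
from urllib.parse import urlsplit

def normalize(url: str) -> str:
    """Hypothetical normalization: keep only lowercase host + path,
    so http/https and a trailing slash do not change the digest."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path.rstrip("/")
    return host + path

def url_digest(url: str) -> str:
    """SHA-256 hex digest of the normalized URL."""
    return hashlib.sha256(normalize(url).encode("utf-8")).hexdigest()

# Only digests are stored, never the URLs themselves.
# The URL below is a made-up example.
blocked = {url_digest("https://docs.google.com/document/d/abc123/")}

def is_blocked(url: str) -> bool:
    """True if the URL's digest is in the stored block list."""
    return url_digest(url) in blocked
```

With these assumptions, `is_blocked("http://docs.google.com/document/d/abc123")` matches despite the different scheme and missing slash, while the same document under `/edit` does not. The stored set holds only digests, so it cannot be shown back to the user as a readable list, which is the limitation described above.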

I don't cache any personal info in localStorage or sync storage (which, in Chrome at least, is not encrypted :< ). The API results are stored in a local variable within the scope of the extension, and the "history" is a hashed and padded blob.

This is also why I released it for both Chrome and FF, since some people assign different use cases to different browsers. The code is also public/open-source.

Maybe I missed it, but as far as I can tell, this comment is the only documentation of what "research mode" means. You might want to address that somewhere...

No, you're right. Adding it now (thanks!) (update added to landing page and pushing out to the Chrome/FF stores)

Nice work. How about a whitelist mode? Basically, the research mode + the ability to add sites (e.g., news sites, blogs, etc.) to the whitelist.

I like the idea of adding a whitelist to complement using the extension with Research Mode turned "off". I'll add that to the todo list! Thanks!

I want to echo that sentiment (maybe to push it up higher on your list!) -- a white list and/or black list would be very helpful.

I would also like to express my appreciation for making a firefox addon as well. I feel like ff is often left by the wayside.

In my extension (thinkcontext) to get around leaking browsing behavior to 3rd party sites I download the complete data set. That way all queries are local which has the added benefit of low latency.

Sounds fantastic for your extension, but it would be too heavy for Kiwi. The entire Reddit, HN, and Google News data sets would (easily) be too large

> Because...sync storage in Chrome is not sandboxed or encrypted, the blocked list would be "in the open" to other extensions.

Unless things have changed from a couple of years ago (highly doubt it), I don't think this is the case. https://groups.google.com/a/chromium.org/d/msg/chromium-exte...

chrome.storage docs: "Confidential user information should not be stored! The storage area isn't encrypted." https://developer.chrome.com/extensions/storage

I was intending to just dispute that 'the blocked list would be "in the open" to other extensions'.

Why couldn't you use chrome.storage.local or the background page's localStorage? Either should protect the data from other extensions and not expose it to third parties.

I wanted explicit language from Chrome's docs about how it scopes storage.

Firefox provided it: "The simple storage module exports an object called storage that is persistent and scoped to your add-on." https://developer.mozilla.org/en-US/Add-ons/SDK/High-Level_A...

But all Chrome's docs offered was the previously mentioned warning about not using it to store confidential user info.

So, to answer your question: caution

Actually, if you go back to my initial commit on GitHub, you can see that I originally used the chrome.storage API. I extricated it from the code, except for the non-sensitive settings.

It doesn't help in this case since they're third-party APIs, but if you controlled the backend it seems like a good application for bloom filters.

Still, the problem inherent to the concept is that it leaks the users' history to third-parties.

Its default usage does. The idea is that it returns enough value in exchange to be worth it. It also has a "Custom Search" feature that is useful without sharing your history.

The problem isn't just private links. Even knowing about the public sites that someone visits can expose information about their politics, religion, sexual orientation and other things that they may want to keep private (especially in countries where they can be discriminated against or even killed based on this information).

One would hope there aren't too many things you genuinely want to keep private where the only protection is an unguessable URL. That's never going to be terribly secure. With Google Docs, you have to go out of your way to create a doc that works that way.

Nearly every website's password reset links are a secret in the URL.

Hopefully single-use!

It's too bad that there's no standard way to flag part/all of a URL as secret as a hint to caching engines and the like.

I built something similar as a side project in 2011: Newzupp. It allowed you to leave notes on a URL, which your friends/followers could see when they visited that URL.

In the beginning, when I was testing this with my friends and colleagues, I sent every URL a user visited to the server to check whether any of their friends had left notes, and then alerted them via notification badges. I disabled that when I started seeing a lot of private URLs (like Google Docs links with share access) in the server logs. I then changed the extension to query the server only when a user clicks the extension button.

This made it a bit safer, but the extension still needed access to all the sites a user visits. And with Chrome's automatic updating of extensions, one may never know whether the extension author has started sending every URL back to the server again.

After developing such an extension, I am quite suspicious of such extensions and only install extensions from trusted authors (Buffer, Pocket, etc).

I agree, and will say that I'm as pleasantly surprised by the review process Mozilla has for its add-ons as I am dismayed that Chrome has no equivalent process. I'm in the queue for the Firefox review (it takes 10 days on average) and have exchanged emails with their volunteer team on best practices to adopt.

Ultimately it comes down to winning the user's trust, and I'm trying to address as many questions as I can up front.

In response to another comment, I've also un-minified the Chrome extension code and will keep it un-minified going forward (will take up to an hour to propagate [update: fresh installs are now un-minified / and the current-install base will get the update within 6 hours])

How does this make money? What’s the plan?

It won’t. I figure the amazing APIs made available by Algolia, Reddit, Google News (and hopefully more) are incurring the only ongoing expense, and I’ve done my best to design the extension to respect their needs. My only need is that people get something out of the effort. :)

Are you against making money on projects?

I hope to sell some funny stickers at some point! But in all seriousness, I’d be happy to charge for a product that doesn’t depend on others’ APIs. Ambitious projects that aren’t concerned with revenue are (more than likely) destined to fail -> certainly there are terrific exceptions to this...

I really like this idea of not charging for a service heavily based on other people's APIs. How many nodejs programmers are out there trying to make dirty money off other people's work + CSS ? It's shameful

I disagree. Unless they're breaking licenses or taking credit for someone else's work, it's not dirty money. Is it shameful for Apple to make money from an OS based on FreeBSD?

> Is it shameful for Apple to make money from an OS based on FreeBSD?

Yes, thankfully they aren't doing it.

They were, weren't they?

What does NodeJS have to do with anything?

I had an extension called Deeper History which I shut down out of security/privacy concerns very similar to yours. The solution I came up with half worked. I used https://github.com/travist/jsencrypt to encrypt the sensitive data before storing it in IndexedDB.

The problem was I couldn't get it to work with public keys I created locally. According to jsencrypt's github it should be possible. If you could get it to work you could give security conscious people a way to safely cache stuff locally.

Anyways, if it would help to store user info on the client, I just wanted to say there is a viable way forward on that. I have the code to chunk and encrypt stuff on the client if you're interested.

Thanks for sharing your approach. In the end, I decided the API results didn't need to be persisted in storage -- they get stored in a local variable and jettisoned with JavaScript's garbage collection. My primary concern was: to what scope did the data belong? (Check my other comments to see my dissatisfaction with the language in Chrome's dev docs on this topic.) I decided that the user history did not need to be precisely known, so my strategy here was to hash the URL, cut the hash to a much shorter string (to increase the likelihood of collisions), pad the shortened hash with a random number of characters on either side, and concatenate that string to a large blob of text. The extension can then perform an indexOf search on that blob to be reasonably sure that the user has been to that URL. This uses localStorage. (This keeps the extension from repeatedly querying commonly visited URLs but also does not prove you were at any given URL, since collisions are expected to happen.)
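The hash, truncate, pad, and search steps described above could look roughly like the Python sketch below. This is an illustration only, not Kiwi's actual code (which is JavaScript and uses localStorage). The choice of SHA-256, the prefix length of 6, the hex padding alphabet, the maximum padding of 8, and the function names are all assumptions.

```python
import hashlib
import secrets

PREFIX_LEN = 6                 # assumption: short prefix so collisions are plausible
PAD_ALPHABET = "0123456789abcdef"

def short_hash(url: str) -> str:
    """Hash the URL and keep only a short prefix of the hex digest."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:PREFIX_LEN]

def random_padding(max_len: int = 8) -> str:
    """Between 0 and max_len random hex characters."""
    length = secrets.randbelow(max_len + 1)
    return "".join(secrets.choice(PAD_ALPHABET) for _ in range(length))

def remember(blob: str, url: str) -> str:
    """Append the padded short hash to the history blob and return the new blob."""
    return blob + random_padding() + short_hash(url) + random_padding()

def probably_seen(blob: str, url: str) -> bool:
    """Substring search, the Python analogue of indexOf(...) !== -1."""
    return short_hash(url) in blob
```

A `True` from `probably_seen` only means "probably": both truncated-hash collisions and the random padding can produce matches for URLs that were never visited, which is the deniability the comment is after.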

No problem. Nice work and good luck!

Thanks! BTW, I love the idea behind your extension -- because Chrome's history search is almost always a frustrating let-down. Provided you can solve the privacy issue, I'd use it in a heartbeat. I'd also be very curious to see your final implementation

Thanks. I'm currently spending my free time learning ML and strengthening up my math. I'm not sure I would want to commit to supporting DH again. Maybe one day though.

When I shut it down, DH had ~5K users and supporting it was work then. Not to mention no employers really seemed to care about it so it kind of made me wonder what I was doing it for. I made no money and I had people hitting me up to fix this or that.

Be careful what you wish for!

I really liked StumbleUpon exactly for this feature until they ruined the product during the monetization phase.

Edit: Having comments is really important. Maybe marked by color of source (blue for reddit and orange for hacker news) and separated into submission sections. Also important is the preservation of the original tree structure of the submission comments.

faceyspacey echoed that sentiment below too -- I've added expandable inline comments to the todo list (thanks!)

Great idea! I immediately thought "Why didn't I think of that!?"

With regard to the privacy concerns of Research mode, there may be a solution. For sites like Reddit, it should be possible to build a bloom filter. Have the metafruit server actively spider Reddit for new, popular threads and add them to a bloom filter. The plugin would download the bloom filter from the metafruit server at some regular interval. That way, checking whether any particular URL has an associated conversation is just a local operation. Plus, it's faster than pinging an API, and burns less of the target API's resources.
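For readers unfamiliar with the data structure, a minimal Bloom filter can be sketched in Python as below. The sizes (8192 bits, 4 hash functions), the double-hashing scheme built on SHA-256, and the class and method names are assumptions chosen for illustration; a real deployment would size the filter for the number of URLs and the false-positive rate it can tolerate.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, tunable false positives."""

    def __init__(self, num_bits: int = 8192, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

In the scheme described above, the server would call `add` for each URL that has a discussion and ship `bits` to clients, and the extension would test `url in bloom` locally. A hit can still be a false positive, so it would only justify a follow-up lookup, not prove that a conversation exists.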

That would also provide a way to monetize, by giving out the metafruit bloom filter to subscribers only. Or perhaps the free plugin can update its bloom filter once a day, but subscribers can update once per hour.

I might enjoy using this. But, PRIVACY! Sending back every visited URL has never been OK for any reason; the first time I saw this idea shot down was in '93.

But there might be a way out:

I'd be willing to give up privacy of URL hashes. This is how I'd do it:

- you already track a set of URLs that have discussions (I assume). If not, you need to figure out how to seed these. Volunteers, APIs....

- hash these URLs on server, and use a not-too-unique hash function. You want to end up with a high collision rate, but not too high.

- now, the client can query for conversations without revealing the URL it has visited:
  - ask server whether there are any conversations for a particular hash.
  - if server finds any, it returns { pageUrl: '', conversationsUrls[]}
  - now client can decide whether the url really matches, or it was just a random hash collision.

- I know this is not perfect. A determined, privacy-busting enemy could generate hashes of a large number of public sites and use statistics to infer what sites you've visited just from your hashes. But it'd be good enough for me.
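The scheme in the list above can be sketched in Python as follows. Everything here is hypothetical: the in-memory `server_index` stands in for a real backend, the 4-character SHA-256 prefix is an arbitrary choice of "not-too-unique" hash, and the function names are invented for the example. The response keys follow the `pageUrl` / `conversationsUrls` shape suggested above.

```python
import hashlib
from collections import defaultdict

PREFIX_LEN = 4  # assumption: short enough that many URLs share a bucket

def bucket(url: str) -> str:
    """Deliberately coarse hash: a short prefix of the SHA-256 hex digest."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:PREFIX_LEN]

# --- "server" side (stand-in for a real backend) ---
server_index = defaultdict(list)

def server_add(page_url: str, conversation_urls: list) -> None:
    """Index a page that has known discussions under its coarse hash."""
    server_index[bucket(page_url)].append(
        {"pageUrl": page_url, "conversationsUrls": conversation_urls}
    )

def server_lookup(prefix: str) -> list:
    """The server only ever sees the prefix, never the visited URL."""
    return server_index.get(prefix, [])

# --- client side ---
def find_conversations(visited_url: str) -> list:
    """Send only the prefix, then compare full URLs locally."""
    for entry in server_lookup(bucket(visited_url)):
        if entry["pageUrl"] == visited_url:
            return entry["conversationsUrls"]
    return []  # no match, or only unrelated URLs in the same bucket
```

The final comparison happens on the client, so a bucket that contains only colliding, unrelated pages yields no result and reveals nothing beyond the prefix itself.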

Bonus money-making idea: offer your plugin as a paid service to different web communities. It increases their "community engagement".

I seriously contemplated starting an "annotations" startup in the 90s. Someone else did, and they folded after a few years.

This is kinda how Google's Safe Browsing[1] works, although with a few extra layers, such as (IIRC) always requesting some random hashes when confirming matches.

I read a better explanation on a mozilla mailing list once, but the key point is that it tries /really/ hard not to disclose private data.

[1] https://developers.google.com/safe-browsing/developers_guide...

Then there has to be a whole backend infrastructure, though. Lots more time/effort/money involved in that solution. Right now there's no recurring cost for the developer.

Funny, the plugin works on every website that I've tested except for just one case: the plugin's own homepage ;-)

Ha, took a few minutes, but now it's showing up in the API results. :)

I have been using Reddit Check (https://chrome.google.com/webstore/detail/reddit-check/mllce...) for over a year and it's very nice. I love finding discussion on random websites I find.

This extension appears to be better made and has more features. It's nice to see discussion on HN too.

It does not work reliably. I clicked on a bunch of links from the HN front page. Reddit check did find that they had been posted to reddit before, but Kiwi did not. However all of those links had only been posted once, and had no discussion. Still it seems strange it would say they had never been posted before.

It was also unable to find youtube videos that had been posted before. Youtube is terrible at unique URLs, and I don't blame it. However Reddit Check is able to find all the different places youtube videos have been posted.

I found a link that had been posted to reddit hundreds of times. It only found 11 results. There was also an option for "fuzzy matches" which included a few more links to the exact same URL, but also links that had nothing to do with it. Reddit Check also has a problem where it only returns the first 25 results.

Clicking on any of the links closes the menu, so you can't open many links in new tabs at once. This is also a problem with Reddit Check.

It does not find http versions of https links. Also a problem with Reddit Check.
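One possible way to handle the http/https mismatch is to query both scheme variants of a URL. The Python sketch below is an assumption about how that could be done, not a description of what Kiwi or Reddit Check does; `scheme_variants` is a made-up helper name.

```python
from urllib.parse import urlsplit, urlunsplit

def scheme_variants(url: str) -> list:
    """Return the https and http forms of a web URL (https first).
    URLs with any other scheme are returned unchanged."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return [url]
    return [urlunsplit(parts._replace(scheme=s)) for s in ("https", "http")]
```

Each variant could then be looked up separately and the results merged, at the cost of doubling the number of API requests per page.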

Clicking on the "submit to reddit" option opened a submission page, but not with the URL in it.

I tried to look at the code but it was all squished together. It does not appear possible to modify it anyway.

Anyway none of these are dealbreakers. I will be using this extension alongside Reddit Check due to the extra features it has. I am concerned about sending so many requests to reddit every time I open a new tab though.

Terrific comment, thank you. First, I've un-minified the Chrome extension code (only added 2kb in size), and it will remain that way going forward. It will take up to an hour for it to propagate to the Chrome Web Store, but the Firefox extension code is un-minified currently : https://addons.mozilla.org/en-US/firefox/addon/kiwi-conversa... (thanks to their review process, which requires it not be minified). The Chrome code is also available on Github : https://github.com/sdailey/kiwi

As for the results vs Reddit check -- maybe Reddit check uses a home-rolled API that crawls more frequently than Reddit's official API? Could you either tweet me the specific links or reply to this comment?

Some links that have been posted to reddit, but it can't see:





(If you search reddit for any of these URLs it goes straight to where they have been posted.)

Thanks. Maybe Reddit's API filters out some results that have zero comments. I looked for an attribute in their API that would ask for all results but couldn't find anything (https://www.reddit.com/dev/api#GET_search). Also, I'd like to inspect Reddit Check's code but can't find an open-source repo.

I looked at Reddit Check's background page and checked the network requests it was sending. It sends a request like this:


Weirdly it seems to send the same request several times. And it should also use an &limit=100, so it gets 100 results instead of 25.

These links turned out to be extremely valuable feedback. Reddit's response is formed differently for them (as opposed to what it gives 95% of the time). I'll have an update very soon. [Update: new version has pushed to Chrome -- Kiwi should be processing all Reddit results now. Firefox update will be pushed tonight.]

Thank you so much for the fast response. I can't edit my comment now, but I will change my review in the chrome store.

3 days later: I have responded with an update to the extension that addresses the privacy concerns here. Whitelists have been implemented, privacy defaults have changed to start with Research Mode 'off', and commenter/Houshalter's problems were fixed. Now Kiwi can fetch Reddit posts that have been hidden by moderators. Full changelog report here: http://www.metafruit.com/kiwi/changelog/2015/08/06/kiwi-conv...

i'd like to be able to see comments in the widget without having to go to hacker news.

does the searching happen on your machine (scrape google search results by crafting a url query) or does it get routed to a central server that we are forced to trust? If the latter, no way in hell this is going to be popular around here.

It uses Reddit's API, Algolia's HN API, and the Google News API. Any of these services can be toggled individually in settings.

[0] - Reddit - https://github.com/reddit/reddit/wiki/API
[1] - HN - https://hn.algolia.com/api
[2] - Google News - https://developers.google.com/news-search/v1/devguide#gettin...

Also, it can be set to search a-la-carte by toggling Research Mode.

Wow, so now I can read uninformed, ignorant opinions anywhere I go on the web? This is surely progress.
