Show HN: Pinbot – An extension to privately search one's browser history with AI (getpinbot.com)
98 points by klavinski 9 months ago | 36 comments
Hello HN, I’m Kamil.

The past months have been filled with news about ChatGPT, Bard, etc. Thankfully, there are some heroic attempts to bring that power to the users.

I wanted to contribute to that effort with my side project, an extension for Chrome: it makes searching the history by meaning – instead of the exact words – possible.

This is only a proof of concept, building on the excellent transformers.js[0], and running entirely in the browser. My goal here is to explore the possibilities unlocked by a client-side AI.

I would love to have your feedback, to know which direction that project should follow!

[0] https://xenova.github.io/transformers.js
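For readers wondering what searching "by meaning" can look like in practice: a common approach is to turn each sentence into an embedding vector and rank stored sentences by cosine similarity to the query's vector. The sketch below assumes that approach; the thread does not confirm it is exactly what Pinbot does, and `cosineSimilarity`, `rankByMeaning`, and the toy two-dimensional vectors are hypothetical (a real model produces much longer vectors).

```typescript
// Hypothetical sketch: ranking stored sentences by cosine similarity to a query embedding.
// The names and data shapes are illustrative assumptions, not Pinbot's actual code.
interface Entry {
    text: string
    embedding: number[]
}

function cosineSimilarity(a: number[], b: number[]): number {
    if (a.length !== b.length) throw new RangeError("vectors must have the same length")
    let dot = 0
    let normA = 0
    let normB = 0
    for (let i = 0; i < a.length; i++) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    // A zero vector has no direction, so treat it as unrelated to everything.
    if (normA === 0 || normB === 0) return 0
    return dot / Math.sqrt(normA * normB)
}

// Returns the `topK` entries most similar to the query, best match first.
function rankByMeaning(query: number[], entries: Entry[], topK: number): { text: string; score: number }[] {
    return entries
        .map(entry => ({ text: entry.text, score: cosineSimilarity(query, entry.embedding) }))
        .sort((x, y) => y.score - x.score)
        .slice(0, topK)
}
```

With real embeddings, the query and the stored sentences do not need to share any words to score highly, which is the point of searching by meaning.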

Any plans for a Firefox version? There are dozens of us.

I targeted Chrome for the proof of concept because of its market share, but to be honest, I use and prefer Firefox as my default browser.

Manifest v3 is a remarkably hostile development environment: Google knows that people block advertisements with extensions and wants to limit their scope. I had to adapt a lot of code, so making it work cross-browser would require more than a few changes. But I understand your point and want Pinbot to run on more browsers, especially Firefox!

Can ChatGPT do it for you?

Update: unfortunately, I noticed my Mac getting hot, so I checked Activity Monitor. The energy impact was several thousand (whatever units Apple uses); Chrome's energy impact is typically 120 or so. I removed the plugin (which also crashed once), and as soon as I did, the energy use dropped back to normal.

Thank you for letting me know; this information is crucial. Because transformers.js currently runs on the CPU only, it might be too slow for text-heavy websites. I will immediately look into a way to throttle it.

However, the Chrome web store review might take some time to allow the update.

I assume a lot of processing is happening in a per-tab context. I wonder how easily a queue would work in the browser? An optional local self-hosted server script might help.

Actually, every tab sends its content to a single offscreen document[0], and because JS is single-threaded, that document effectively works as a queue. So you are right: throttling might be feasible without too much trouble.

[0] https://developer.chrome.com/docs/extensions/reference/offsc...
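One way the throttling idea above could be sketched: keep incoming sentences in a FIFO queue and process only a small batch at a time, pausing between batches so the CPU gets some rest. This is only an illustration under stated assumptions; `BatchQueue`, `runThrottled`, the batch size, and the pause length are all hypothetical and do not come from Pinbot's source.

```typescript
// Hypothetical sketch, not Pinbot's code: a FIFO work queue drained in small batches.
class BatchQueue<T> {
    private items: T[] = []
    private readonly batchSize: number
    private readonly handle: (item: T) => void

    constructor(batchSize: number, handle: (item: T) => void) {
        if (batchSize < 1) throw new RangeError("batchSize must be at least 1")
        this.batchSize = batchSize
        this.handle = handle
    }

    // Called whenever a tab sends new content.
    push(...newItems: T[]): void {
        this.items.push(...newItems)
    }

    // Processes at most `batchSize` items, oldest first, and returns how many remain.
    drain(): number {
        const batch = this.items.splice(0, this.batchSize)
        for (const item of batch) this.handle(item)
        return this.items.length
    }
}

// Drains one batch, then schedules the next one after `pauseMs` of idle time.
// It stops once the queue is empty, so it must be called again after later pushes.
function runThrottled<T>(queue: BatchQueue<T>, pauseMs: number): void {
    if (queue.drain() > 0) setTimeout(() => runThrottled(queue, pauseMs), pauseMs)
}
```

A smaller batch size or a longer pause trades indexing speed for lower CPU load; the right values would have to be measured on real pages.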

I would like something similar to VoidTools Everything that allows a person to search their own computer using AI to generate answers instead. (But it should not send any information to third parties such as OpenAI.)

VoidTools Everything is truly an excellent tool! I have indeed considered making a desktop version. It is more complex than the current proof of concept, but a private one-stop shop for AI search is definitely a great vision!

Sadly not related to the 1990 NES Game (https://en.wikipedia.org/wiki/Pin_Bot_(video_game))

Or the Williams original, which was the first pin I ever played (at the local roller rink, no less).

I think you can only search pages that you visited after you installed the plugin, is that correct?

Currently, yes. Allowing a user to crawl his recent history just after installing the extension is a great idea! I added it on the Discord server.

Does it work for PDFs too? It would be amazing to find any paragraph.

I noticed it doesn't remember tweets directly, just twitter.com as the URL if you check the feed. It's hard to find a tweet again at that location.

It does not work yet for PDFs, but I agree it would be amazing. Full-text search for one's library and documents!

Thank you for reporting the bug regarding Twitter. I will investigate.

A separate application, extendable with plugins to search different media, would probably be a better solution than a browser-based one. Say a user wants to find a song suitable for a certain mood while reading a book on a certain subject; then one day someone could add a plugin that connects to the house IoT network to choose the best lighting and aromas based on the same information used by the above plugins, etc. The point is that the user input might not be a line of text but a combination of text plus other data obtained by sensors: weather, temperature, heart rate, pretty much everything.

Hey, nice idea. I like the concept, as someone who sometimes goes back through old history files to find that site I was on last week or last month and knows how frustrating an experience it can be.

One question I have is about the persistence of this extension when I'm not using it, and its influence on other browsing loads. For example, if I visit a WebGPU-heavy shader site, will having Pinbot installed drop my available framerate?

Otherwise it's a great idea, and I will definitely put it on at least one machine I use. Thanks for putting this out into the world, and good luck with it!

The extension uses an SQLite database in the Origin Private File System[0]. Disabling the extension keeps the database, while removing it deletes the database.

Regarding performance, here is how it works: the extension accumulates page changes (thanks to a Mutation Observer[1], so I do not have to regularly read and compare the page) for some time, then checks if the sentences are in the database. Only unknown sentences are converted to embeddings.

The extension is currently CPU-only (WebGPU support has not been merged into transformers.js yet), so it may be slow. I understand your concern: while this is a proof of concept, I consider good performance vital to a good user experience.

[0] https://developer.chrome.com/blog/sqlite-wasm-in-the-browser...

[1] https://developer.mozilla.org/en-US/docs/Web/API/MutationObs...
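The "only unknown sentences are converted to embeddings" step described above could look roughly like the sketch below. It is an assumption-laden illustration: a `Map` stands in for the SQLite lookup, `embed` stands in for the model call, and the sentence splitter is deliberately naive. None of these names come from Pinbot's source.

```typescript
// Hypothetical sketch: `known` stands in for the database, `embed` for the embedding model.

// Naive splitter: breaks after '.', '!' or '?' followed by whitespace.
function splitSentences(text: string): string[] {
    return text
        .split(/(?<=[.!?])\s+/)
        .map(s => s.trim())
        .filter(s => s.length > 0)
}

// Embeds only the sentences that are not stored yet and returns the newly added ones.
function embedUnknownSentences(
    pageText: string,
    known: Map<string, number[]>,
    embed: (sentence: string) => number[]
): string[] {
    const added: string[] = []
    for (const sentence of splitSentences(pageText)) {
        if (known.has(sentence)) continue // already stored: skip the expensive model call
        known.set(sentence, embed(sentence))
        added.push(sentence)
    }
    return added
}
```

Because the lookup happens before the model call, a page that changes only slightly costs only as many embeddings as it has new sentences.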

Firefox is much better at resurfacing sites that you've been to before. There's even a built-in address bar search shortcut of '^' which searches within just your history.

Chrome is obviously incentivized to push you to making a Google search anytime you're trying to find something, instead of looking within your browser.

This is a great idea, but one thing that I immediately wished for is that it's not tied to a specific browser instance or even a browser. I'd much prefer some kind of central indexing server that maintains the embeddings along with metadata from all the various sources and allows querying, setting retention periods etc, and with extensions like this one transparently feeding data into it.

How is that private?

You are fetching the browsing data, pushing it back to your server to be fed to the AI, then receiving queries, right?

No, that is what I find exciting! The AI model runs entirely on your device, and your data is never sent anywhere. You can inspect the Developer tools of the extension if you are interested: it works offline!


I like this a lot, but I would like it to be easier to prevent it from indexing sites I specify, such as my banking website. I think that should be a top priority, along with having a strong privacy policy on the site (I know you say it's local, etc.). That said, this is pretty great and I am already enjoying it.

Thank you for the idea! I will add "allow a user-defined forbidden list of websites" to the ideas on the Discord server.

Regarding the privacy policy, you have a point: I did not put one on the website, as everything works offline, but people may indeed look for one.
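A user-defined forbidden list could be checked with something as small as the sketch below, run before a page is indexed. This is a hypothetical illustration: `isBlocked` and the example domains are made up, and matching by hostname (including subdomains) is just one possible policy.

```typescript
// Hypothetical sketch: returns true when `url` belongs to a blocked domain or one of its subdomains.
function isBlocked(url: string, blockedDomains: string[]): boolean {
    let hostname: string
    try {
        hostname = new URL(url).hostname.toLowerCase()
    } catch {
        return true // unparsable URL: err on the side of not indexing
    }
    return blockedDomains.some(domain => {
        const d = domain.toLowerCase()
        // Match the domain itself or any subdomain, but not lookalikes such as "notmybank.example".
        return hostname === d || hostname.endsWith("." + d)
    })
}
```

For example, with `["mybank.example"]` on the list, `https://online.mybank.example/login` would be skipped while `https://notmybank.example/` would still be indexed.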

That is very useful. It seems to only work on sites you have visited since installing. It would be nice if it could index your current history at installation time, even though it won't have access to the contents of those pages (probably).

This seems cool, but to aid trust in these sorts of things, perhaps a userscript version with easy-to-read code, rather than a packaged Chrome extension, would be an easier ask? Tampermonkey, Greasemonkey, etc. still work.

I often prototype in Greasemonkey myself, so I understand and agree with your point. However, several requirements made more sense in an extension: the AI model weighs 90 MB; I run it in a sandboxed iframe because it uses `eval` and I want to guarantee it cannot do anything bad; and initialising the iframe and loading the model on every page would be quite cumbersome.

Does it keep history only for 2 weeks?

> The current extension keeps your history for only two weeks. Accounts keep all your history and synchronize it across your devices, while maintaining your privacy. Upcoming in a future version!

Yes, for the proof of concept. It is an arbitrary limit, as I am not completely sure how people would use the extension: it may fill the users' storage too quickly. In the future, I may consider adding a counter instead (removing websites which have not been visited/searched for X days).
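The "remove websites which have not been visited/searched for X days" idea could be sketched as a simple filter over stored records. The `PageRecord` shape, the `lastAccessed` field, and `pruneStale` are hypothetical; a real version would presumably run as a delete query against the database instead.

```typescript
// Hypothetical sketch: drops records that have not been visited or searched recently.
interface PageRecord {
    url: string
    lastAccessed: number // milliseconds since the Unix epoch, updated on visit or search hit
}

const DAY_MS = 24 * 60 * 60 * 1000

// Keeps only the pages accessed within the last `maxIdleDays` days.
function pruneStale(pages: PageRecord[], maxIdleDays: number, now: number = Date.now()): PageRecord[] {
    const cutoff = now - maxIdleDays * DAY_MS
    return pages.filter(page => page.lastAccessed >= cutoff)
}
```

Unlike a fixed two-week window, this keeps pages that the user keeps coming back to, because each visit or search hit refreshes `lastAccessed`.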

After an hour my laptop was about to crash, so I uninstalled it. It was using 8 GB of RAM out of 8.

That information is very important for me, thank you.

Currently, to avoid computing the embedding of a sentence twice, I put them in a JS Map as a cache. I will find a way to empty the cache.
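One possible way to keep such a cache from growing without bound is to cap it and evict the least recently used entry, relying on the fact that a JS `Map` iterates in insertion order. The sketch below is an assumption about how that could be done, not the fix that was actually shipped; `LruCache` and `maxEntries` are hypothetical names.

```typescript
// Hypothetical sketch: a size-capped cache built on Map's insertion-order iteration.
class LruCache<K, V> {
    private readonly entries = new Map<K, V>()
    private readonly maxEntries: number

    constructor(maxEntries: number) {
        if (maxEntries < 1) throw new RangeError("maxEntries must be at least 1")
        this.maxEntries = maxEntries
    }

    get(key: K): V | undefined {
        if (!this.entries.has(key)) return undefined
        const value = this.entries.get(key) as V
        // Re-insert so this key becomes the most recently used one.
        this.entries.delete(key)
        this.entries.set(key, value)
        return value
    }

    set(key: K, value: V): void {
        this.entries.delete(key)
        this.entries.set(key, value)
        if (this.entries.size > this.maxEntries) {
            // The first key in iteration order is the least recently used.
            const oldest = this.entries.keys().next().value as K
            this.entries.delete(oldest)
        }
    }

    get size(): number {
        return this.entries.size
    }
}
```

Since embeddings are also persisted in the database, an evicted entry only costs a lookup or a recomputation later, while memory use stays bounded by `maxEntries`.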

Did anyone else think Pinball?

Would be great if you could create this for bookmarks.

How would you want to do it? Among the fields, have a checkbox "search among the bookmarks"?

