Logical direction, but the real killer feature would be for an LLM to read every page you bookmark and not just categorize it once for you, but add it to a queryable knowledge-base that you can reference at any point in future conversations.
I tried to use it, and it just doesn't work very well. I don't think it's because the implementation is bad; it's that RAG (retrieval-augmented generation) doesn't work well outside of some simple use cases.
I think it works better if the query is a larger chunk of text. For example, if you have an email from a customer and want to compose a response based on some relevant documentation, it should work well.
But for a use case where you want to retrieve something from browsing history, you would mainly use a short search query, just a few words. In that case embeddings are too ambiguous and the relevance of the retrieved content is not great.
That’s not a problem with RAG itself; that’s an issue with your retriever. In the original RAG paper they used two vanilla BERT models and cosine similarity, but there’s no requirement you do that. Use any retriever that gets you high precision. Use BM25 if you want; it’s simple and cheap.
You’re right that there’s not enough semantic meaning in the text of the query. The domain of queries and the domain of documents are very different. That’s why a real retrieval system will train the query encoder and the doc encoder to be closer in their embedding space using click data. This is what Google does.
"train the query encoder and doc encoder to be closer in their embedding space using click data" <- Any papers/resources you know where I can learn more about this process?
Triplet loss takes an anchor, positive, and negative. In this case the anchor is your query, the positive is a similar doc, and the negative is a dissimilar doc. When you train, backpropagate the loss to both the doc and the query encoder.
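In case a concrete formula helps: triplet loss is just a hinge on the two distances. Here's a minimal dependency-free sketch (toy 2-D embeddings, Euclidean distance, margin of 1.0 — real systems would use encoder outputs and backprop through both encoders, which this omits):

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a,p) - d(a,n) + margin).
    Zero when the positive is already closer than the negative by the margin."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

# toy embeddings: a query, a clicked (relevant) doc, an irrelevant doc
q = [1.0, 0.0]
pos = [0.9, 0.1]   # close to the query -> small d(a, p)
neg = [0.0, 1.0]   # far from the query -> large d(a, n)
loss = triplet_loss(q, pos, neg)
```

When the triplet is already well separated the loss is zero and nothing moves; when the negative is closer than the positive, the loss is positive and the gradient pushes the query and clicked-doc embeddings together.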
Apple and Mozilla aren’t Google. Apple runs their own searches on the App Store and iPhone, but Google pays them to be default. There’s no reason Apple couldn’t also index your search history in a more meaningful way other than perhaps space requirements.
I had an issue with my bookmarks where I used Chrome for about a decade and then, one fateful day, allowed Microsoft Edge to try to clone my bookmarks. It corrupted everything. Now I have hundreds of copies of duplicated folders and links. It would be great if this or another tool could help get it restored.
That sounds awful. Can you export your bookmarks as CSV or JSON? If so, you can use DuckDB (or xsv) to clean up the duplicates.
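If you'd rather not reach for DuckDB, the dedup itself is a few lines of Python. A sketch, assuming a flat list of {title, url} records (real Chrome/Edge exports are nested folders, so you'd flatten first):

```python
def dedupe_bookmarks(bookmarks):
    """Keep the first occurrence of each URL, drop the duplicates."""
    seen = set()
    unique = []
    for bm in bookmarks:
        url = bm.get("url")
        if url and url not in seen:
            seen.add(url)
            unique.append(bm)
    return unique

# hypothetical flat export with one duplicate entry
raw = [
    {"title": "HN", "url": "https://news.ycombinator.com"},
    {"title": "HN", "url": "https://news.ycombinator.com"},
    {"title": "Example", "url": "https://example.com"},
]
clean = dedupe_bookmarks(raw)
```

Keying on the URL (rather than the whole record) also collapses copies whose titles drifted apart during the botched import.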
While you're at it, I'd recommend considering uploading the cleaned bookmarks to a dedicated bookmark manager and not letting the browser manage them. For folks who rely on bookmarks, dedicated bookmark managers offer great value (e.g. auto-archiving a copy of every bookmark to combat link rot). I currently use Raindrop, but if I were to start fresh I'd go with Linkwarden instead.
This is just so Edge.
Unrelated to this post: I used to hate Edge so much for popping up and asking to replace Chrome, so I insisted on using Chrome for years. One day I misclicked and found out that Edge had exactly cloned my Chrome, and I was like: fine, I'll give it a try.
Now Edge wins.
I also wanted to do something like this, and I had other ideas for extensions that use an LLM to enhance the browsing experience. Another one I wanted to build was a focus helper that notifies you (or blocks you?) when you're in a tab unrelated to your work. You set a goal for your current session, e.g. "I'm currently working on my backend server", and the extension reacts to tabs unrelated to that (e.g. news websites or cat videos).
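The core of that idea is basically one classification prompt per tab switch. A sketch of just the prompt assembly (the model call and the ON_TASK/OFF_TASK handling are left out; names and wording are made up):

```python
def build_focus_prompt(goal, tab_title, tab_url):
    """Assemble an LLM prompt asking whether a tab matches the session goal.
    Only shows the prompt shape; the actual model call is omitted."""
    return (
        f"My current work session goal is: {goal!r}.\n"
        f"I just opened a tab titled {tab_title!r} at {tab_url}.\n"
        "Is this tab related to my goal? Answer with exactly ON_TASK or OFF_TASK."
    )

prompt = build_focus_prompt(
    "I'm currently working on my backend server",
    "10 cute cat videos",
    "https://example.com/cats",
)
```

Constraining the answer to two fixed tokens makes the response trivial to parse in the extension, and cheap enough to run on every tab change.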
I think we're getting close to that: there are already models running in the browser (with https://webllm.mlc.ai/), and the next step is to make it more efficient.
hey we have similar ideas )
I made a focus helper thing as well: https://grgv.xyz/blog/awf/
webllm looks cool, but I would need to upgrade my laptop for it...
Exactly what I had in mind, you nailed it. With some focus on improving the UI (a list of premade prompts, or a simpler way to make them) it could become widely used. Also, I think relying on the OpenAI backend is a huge blocker. I wonder how a small quantized model running in the browser would perform. The sad thing is that for now WebGPU doesn't run in browsers on Linux. Maybe CPU inference is enough.
Unfortunately only gpt-4 worked well in my experience; smaller models would work only for blocking simple things like a "cat videos" page, but not for anything less trivial.
I have another proof-of-concept where smaller model fails compared to gpt-4: https://grgv.xyz/blog/apc/
This is very helpful, exactly what I was looking for. I am using a custom local extension which submits bookmarks to https://dynalist.io/ and was thinking of integrating chatgpt with that somehow.
I haven't tried it yet, but looking at the screenshots, it would be nice if it supported importing and exporting bookmarks as plain text. Even better, a custom store for bookmarks.
I wonder how far off we are from a time where functionality like this will not require you to announce your daily activities to OpenAI or any other third party.
This looks so useful to me, but at the same time I could never use this, or the hundred other insanely useful things like it that all rely on OpenAI or some other equally dubious party. (Dubious meaning: any party other than me.)