Show HN: Lumos, a local LLM Chrome extension (github.com/andrewnguonly)
146 points by andrewnguonly 5 months ago | 39 comments
Lumos is an LLM co-pilot for browsing the web, powered by local LLMs (Ollama).

- Summarize long threads on issue tracking sites, forums, and social media sites

- Summarize news articles

- Ask questions about reviews on business and product pages

- Ask questions about long, technical documentation

- What else?

I have been looking for something like this! I noticed you suggest setting the Ollama host as a variable (I personally run Ollama on a beefier desktop and query it remotely). Hopefully that will carry over to the options page as well; right now the options page seems to have the Ollama host hardcoded: https://github.com/andrewnguonly/Lumos/pull/34/files#diff-83...

Yes! I'm planning to migrate the Ollama hostname configuration to the Options page as well.

Until then, there's SSH port forwarding.
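
For illustration, a configurable host could be handled with a small helper like the one below. This is only a sketch, not code from Lumos: the function name `ollamaEndpoint` and the idea of passing the host in as an optional setting are assumptions. The default `http://localhost:11434` is Ollama's standard local address, which is also what an SSH-forwarded port would look like from the browser's side.

```typescript
// Hypothetical helper (not part of Lumos): build an Ollama API URL from an
// optional, user-configured host, falling back to the local default.
const DEFAULT_OLLAMA_HOST = "http://localhost:11434";

function ollamaEndpoint(path: string, host?: string): string {
  // Use the configured host only if it is non-empty after trimming.
  const configured = host !== undefined ? host.trim() : "";
  const base = (configured !== "" ? configured : DEFAULT_OLLAMA_HOST)
    // Drop trailing slashes so we never produce "//api/...".
    .replace(/\/+$/, "");
  const suffix = path.startsWith("/") ? path : `/${path}`;
  return `${base}${suffix}`;
}
```

With no host configured, `ollamaEndpoint("/api/generate")` points at the local machine; with a remote desktop configured, the same call site would target that machine instead.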

Most useful would be to browse Discord in the web view and summarize all the noisy chats

Would also be cool to have the extension click into every link of search results and spit out an aggregate of the results

Good idea. I'll try it with WhatsApp through web.whatsapp.com

Someone else mentioned Slack as well. I guess web apps for chat products are actually a big use case.

I hadn't considered Discord, but good idea! I personally use the Discord Mac app.

I'll look into figuring out how to parse the chats.

Would it be a good idea to let people add their own parsers for any app they need, using CSS selectors or a combination of selectors and some JS, in the extension's settings?

Yup, still working on it! Open PR: https://github.com/andrewnguonly/Lumos/pull/34

Would be great to be able to query the LLM about search history. For example, I know that I have seen something but can't remember exactly where or what.

Great idea. I created an issue to keep track of this.


http://rewind.ai is working on something like this as a product.

I have really been looking for something that summarizes all the websites I visit and then pushes the summaries into a vector database, so I can search through the contents of my history quickly and easily.
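
A minimal sketch of the search half of that idea, assuming each visited page has already been summarized and embedded by some model. Everything here is hypothetical: the `PageEntry` shape, the function names, and the in-memory array standing in for a real vector database.

```typescript
// Hypothetical record for one visited page; the embedding would come from
// whatever embedding model is in use.
interface PageEntry {
  url: string;
  summary: string;
  embedding: number[];
}

// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as matching nothing.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Return the k entries most similar to the query embedding, best first.
function searchHistory(query: number[], entries: PageEntry[], k = 3): PageEntry[] {
  return entries
    .map((entry) => ({ entry, score: cosineSimilarity(query, entry.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.entry);
}
```

A brute-force scan like this is probably fine for a few thousand pages; a dedicated vector store would only start to matter at larger scale.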

I too have been looking for a memex such as this, as well as a method for visualizing tabs I wish to save.

Shameless plug but this is precisely what Zenfetch does: https://zenfetch.com

Happy to elaborate more. We are gearing up for a wider HN launch soon

Ahh, the 2024 equivalent of having thousands of bookmarks, 99% of which will never see the light of day again.

This is sick! We've built a similar product for LLM-powered browsing at https://zenfetch.com

Admittedly, we use cloud hosted providers since we found the quality to be much higher, though awesome to see local versions as well.

Very cool! Have you considered local LLMs for a free/lite tier?

I do agree that 3rd party LLM services (e.g. OpenAI) are much more powerful. For basic tasks (e.g. summarization, Q&A), local LLMs seem to perform decently.

We have considered it, though most of our users use Zenfetch for things like

1. Research across an industry

2. Generating new content

3. Connecting ideas / building networked thought

For that reason we were deterred from using local models. Hopefully the Mistral team and similar keep pushing out new developments to get comparable to GPT-4-level quality.

I'm not very technical, but I use Ollama on my Mac and know how to side-load browser extensions. Can you ELI5 what I need to do besides downloading and side-loading the extension, and ensuring I have the same model downloaded and running on my machine?

For installation, there's nothing else you need to do. It's ready to go.

If you want to configure a custom content parser for a specific website (i.e. websites you visit frequently), you'll need to modify `contentConfig.ts` according to the instructions here: https://github.com/andrewnguonly/Lumos?tab=readme-ov-file#cu...

Sorry, I realize these instructions are probably unclear. My approach for parsing content is a bit hacky. I need to create a video tutorial/documentation to show how to inspect the webpage and pick out content to parse. Created this issue to track: https://github.com/andrewnguonly/Lumos/issues/37
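
To give a rough idea of what a per-site parser configuration could look like, here is a sketch. It is not the actual shape of `contentConfig.ts`: the `SiteParserConfig` fields, the example domains and selectors, and the `configForHost` lookup are all assumptions made for illustration.

```typescript
// Hypothetical per-site parser settings (not the real contentConfig.ts).
interface SiteParserConfig {
  selectors: string[]; // CSS selectors whose text is sent to the LLM
  chunkSize: number;   // characters per chunk when splitting the text
}

// Fallback used when no site-specific entry exists.
const DEFAULT_CONFIG: SiteParserConfig = { selectors: ["body"], chunkSize: 500 };

// Example entries; domains and selectors are placeholders.
const SITE_CONFIGS = new Map<string, SiteParserConfig>([
  ["example.com", { selectors: ["article"], chunkSize: 1000 }],
  ["forum.example.org", { selectors: [".post-body"], chunkSize: 300 }],
]);

// Find the most specific config for a hostname by trying the full name
// first, then each parent domain ("www.example.com" -> "example.com").
function configForHost(hostname: string): SiteParserConfig {
  const labels = hostname.toLowerCase().split(".");
  for (let i = 0; i < labels.length; i++) {
    const match = SITE_CONFIGS.get(labels.slice(i).join("."));
    if (match !== undefined) return match;
  }
  return DEFAULT_CONFIG;
}
```

In a content script, the hostname would typically come from `window.location.hostname`, and the selectors would then be passed to `document.querySelectorAll` to collect the text.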

It looks like compilation is required, is that right? I don't have NPM installed, and would love to have a zipped version that I can just unzip and sideload. Please let me know if I'm misunderstanding what I'm supposed to do here.

Got it, you're right. I created a release that includes the packaged code (so you don't need to run `npm`, etc): https://github.com/andrewnguonly/Lumos/releases/tag/1.0.0

Unzip `lumos_1_0_0.zip` and you can load the unpacked extension.

I made an extension (it's on GitHub) to ask questions about selected text via ChatGPT (by simply sending the selected text). Can this do that too, as in asking questions about the highlighted text?

The app doesn't have this functionality, but I think it's a great idea. It's a nice (precision) alternative to the current scraping implementation.

Tracking issue here: https://github.com/andrewnguonly/Lumos/issues/40

Is there a Firefox version?

I didn't build a dedicated Firefox version, but I suspect there's a way to install it without significant code changes. I actually just tried and I ran into an error: `background.service_worker is currently disabled`

I'll figure out a workaround and document the steps to install: https://github.com/andrewnguonly/Lumos/issues/36

I've been working on something in the same vein, though just using manual selections, or selections auto expanding from the visible viewport.

I really like the per-site parsers, which make a lot of sense

Congrats :)


I elaborate a little bit more on my thinking and approach for parsing in this blog post (4 min read): https://medium.com/@andrewnguonly/local-llm-in-the-browser-p...

I had similar ideas on the back burner for per site scripts. MV3 sure makes things annoying at times! I want to build something that's kind of a mix between what you're doing here and what I've been experimenting with.

Did you experiment with the sidePanel API yet? You can open it, but the <input> will not get focus like it does with popups. You'd think this should be possible at least with the commands keybindings, or when reacting to the browser action icon click?!

I asked Lumos some questions and it's hitting the embeddings endpoint many times, and then when I asked a follow up question, it hit the same endpoints again, each taking a few hundred ms. It's not caching them? TBD?

p.s. IIRC there's some way to set the key in the manifest so that you get a static ID, which is useful for OLLAMA_ORIGINS: you then don't need the * used in the examples.

p.p.s. starling-lm is a really great default local model for these types of things

edit: https://github.com/andrewnguonly/Lumos/pull/38

I did not look into the sidePanel API yet. I will look though. I'm still fairly new to Chrome extension development. I just discovered the Options page heh...

Regarding the caching -- the entire RAG workflow needs an update. I'll spend some time on it. Thanks for calling out and thanks for the PR!
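
One possible shape for the caching mentioned above, as a sketch only: wrap whatever function requests an embedding so that identical text is only sent once. The names `Embedder` and `withEmbeddingCache` are hypothetical, and this is not how Lumos implements (or plans to implement) its RAG workflow.

```typescript
// Any function that turns text into an embedding vector.
type Embedder = (text: string) => Promise<number[]>;

// Wrap an embedder so repeated requests for the same text reuse the
// earlier result instead of hitting the embeddings endpoint again.
function withEmbeddingCache(embed: Embedder): Embedder {
  // Cache the promise itself, so concurrent requests for the same text
  // share one in-flight call.
  const cache = new Map<string, Promise<number[]>>();
  return (text: string) => {
    let pending = cache.get(text);
    if (pending === undefined) {
      pending = embed(text);
      cache.set(text, pending);
      // Forget failed requests so they can be retried later.
      pending.catch(() => {
        cache.delete(text);
      });
    }
    return pending;
  };
}
```

With a wrapper like this, a follow-up question on the same page would only trigger embedding calls for the new question text, not for page chunks that were already embedded. The cache here lives in memory and grows without bound, so a real version would likely need eviction or per-page scoping.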

Can you give it YouTube video transcriptions?

I haven't tried it, but in general, it accepts any text.

Or are you asking if it can automatically get the transcript from a YouTube video? At the moment, it can't do the latter. I do plan to support multi-modal LLMs (llava via Ollama), but I'm not sure how the mechanics of the extension will work yet. Open issue: https://github.com/andrewnguonly/Lumos/issues/27

I was thinking specifically about getting it the youtube-provided transcript of a video you're watching, rather than doing the transcription itself. With access to the transcript, I could say to it "find me anything in my browser history related to this subject" (or whatever) and have it understand what I've been watching.

http://YouTubetranscript.com might be of use here.

This is great! I would love to extend it to hide sponsored results, ads, and call-to-action popups.

Would it have any practical applications that you can see beyond uBlock Origin, or is it just for the cool factor?

I imagine you'd use it in combination with uBlock Origin. The latter would prevent ads from being loaded in the first place, but when the site itself is presenting you with sponsored results, email signup overlays, and so on, you'd use a content filter like this to prevent them from rendering.

I know you can accomplish a lot of this kind of direct content filtering with uBlock Origin, but as far as I'm aware you have to configure e.g. CSS selectors per site.

Why do you have to do it with a local LLM? Plenty of extensions do the same thing with more powerful LLMs.
