But the previous two segments are very small ones compared to everywhere computers are being used. Couple that with WebGPU being hardly enabled anywhere yet, and I wouldn't be surprised if only ~1% of people using their own computer have 8GB of VRAM and can use WebGPU.
So yeah, it's a bit weird to say "everyone" given the requirements of 8GB of VRAM plus a WebGPU-enabled browser.
> the number will drastically go up when Chrome 113 is out of beta
Which coincidentally is happening in about 5 hours (though it'll take many days for the update to actually reach everyone) according to https://chromestatus.com/roadmap, exciting :)
Yeah, ease of use. I haven't wanted to bother with running local LLMs until now because I expected it would be complicated and time-consuming (and I'm technically competent). I managed to get this up and running in about five minutes.
I know this comment is a bit of a meme, but in my experience “well actually you can already do this on Linux” is almost universally a calling card of an impending bad opinion.
Because the "do it yourself on Linux" advocate tends to ignore all the reasons a person might prefer an easy-to-use, third-party managed service: not wanting to accept the compromises required to use Linux for personal computing, not having the competence to implement a Linux solution, or not having the motivation or time to maintain their own Linux-based services. So even when the solutions they advocate are perfectly decent, the advice tends to come with a lack of insight into how small a segment people with the same competencies and preferences as them represent.
A reason I like it is that I have an "older" AMD GPU which is no longer supported by ROCm (roughly AMD's version of CUDA). That means running locally I'm either trying to track down older ROCm builds to use my GPU and hitting dependency issues, or falling back to my CPU, which isn't great either. But with WebGPU I'm able to run these models on my GPU, which has been much faster than using the .cpp builds.
It's also fairly easy to route a Flask server to these models with websockets, so I've been able to run Python, pass data to the model running on the GPU, and get the response back in my program. There's probably a better way, but it's cool to have my own personal API for an LLM.
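Roughly, the relay can look something like this. This is just a minimal sketch using flask-sock; the /ws and /generate endpoint names, the queue-based hand-off, and the assumption that the browser tab hosting the WebGPU model keeps a websocket open are all illustrative, not necessarily how the commenter wired it up:

    # Sketch of a local relay: the browser tab running the WebGPU model keeps a
    # websocket open to this server; other Python code POSTs prompts to /generate
    # and gets the model's reply back. Handles one request at a time.
    import queue

    from flask import Flask, jsonify, request
    from flask_sock import Sock

    app = Flask(__name__)
    sock = Sock(app)

    prompts = queue.Queue()   # prompts waiting to go to the in-browser model
    replies = queue.Queue()   # completions coming back from the browser

    @sock.route("/ws")
    def model_bridge(ws):
        # The page hosting the model connects here once and stays connected.
        while True:
            ws.send(prompts.get())     # block until a prompt is queued, then forward it
            replies.put(ws.receive())  # wait for the generated text from the browser

    @app.route("/generate", methods=["POST"])
    def generate():
        # Any local Python program can treat this as its own "personal LLM API".
        prompts.put(request.json["prompt"])
        return jsonify({"completion": replies.get()})

    if __name__ == "__main__":
        app.run(port=5000)  # the Flask dev server is threaded, so both handlers can block

The browser side just listens on the socket, runs the prompt through the model, and sends the generated text back.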
"This opens up a lot of fun opportunities to build AI assistants for everyone"
"We have tested it on windows and mac, you will need a gpu with about 6.4G memory."