But the previous two segments are very small ones compared to everywhere computers are being used. Couple that with WebGPU being hardly enabled anywhere yet, and I wouldn't be surprised if only ~1% of people using their own computer have 8GB of VRAM and can use WebGPU.
So yeah, it's a bit weird to say "everyone" given the requirements of 8GB of VRAM plus a WebGPU-enabled browser.
> the number will drastically go up when Chrome 113 is out of beta
Which coincidentally is happening in about 5 hours (though it'll take many days for the update to actually reach everyone) according to https://chromestatus.com/roadmap, exciting :)
Yeah, ease of use. I haven't wanted to bother with running local LLMs until now because I expected it would be complicated and time-consuming (and I'm technically competent). I managed to get this up and running in about five minutes.
I know this comment is a bit of a meme, but in my experience “well actually you can already do this on Linux” is almost universally a calling card of an impending bad opinion.
Because the "do it yourself on Linux" advocate tends to ignore all the reasons a person might prefer an easy-to-use, third-party managed service: not wanting to accept the compromises required to use Linux for personal computing, not having the competence to implement a Linux solution, or not having the motivation or time to maintain their own Linux-based services. So even when the solutions they advocate are perfectly decent, the advice tends to come with a lack of insight into how small a segment people with the same competencies and preferences as them represent.
A reason I like it is that I have an "older" AMD GPU which is no longer supported by ROCm (roughly AMD's version of CUDA). That means running locally I'm either trying to track down older ROCm builds to use my GPU and hitting dependency issues, or falling back to my CPU, which isn't great either. But with WebGPU I'm able to run these models on my GPU, which has been much faster than using the .cpp builds.
It's also fairly easy to route a Flask server to these models with websockets, so I've been able to run Python, pass data to the model running on the GPU, and get the response back in my program. There's probably a better way, but it's cool to have my own personal API for an LLM.
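Roughly, the relay can look something like this. This is just a minimal sketch using flask-sock; the /ws and /generate endpoint names, the queue-based hand-off, and the assumption that the browser tab hosting the WebGPU model keeps a websocket open are all illustrative, not necessarily how the commenter wired it up:

    # Sketch of a local relay: the browser tab running the WebGPU model keeps a
    # websocket open to this server; other Python code POSTs prompts to /generate
    # and gets the model's reply back. Handles one request at a time.
    import queue

    from flask import Flask, jsonify, request
    from flask_sock import Sock

    app = Flask(__name__)
    sock = Sock(app)

    prompts = queue.Queue()   # prompts waiting to go to the in-browser model
    replies = queue.Queue()   # completions coming back from the browser

    @sock.route("/ws")
    def model_bridge(ws):
        # The page hosting the model connects here once and stays connected.
        while True:
            ws.send(prompts.get())     # block until a prompt is queued, then forward it
            replies.put(ws.receive())  # wait for the generated text from the browser

    @app.route("/generate", methods=["POST"])
    def generate():
        # Any local Python program can treat this as its own "personal LLM API".
        prompts.put(request.json["prompt"])
        return jsonify({"completion": replies.get()})

    if __name__ == "__main__":
        app.run(port=5000)  # the Flask dev server is threaded, so both handlers can block

The browser side just listens on the socket, runs the prompt through the model, and sends the generated text back.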
"This opens up a lot of fun opportunities to build AI assistants for everyone"
"We have tested it on windows and mac, you will need a gpu with about 6.4G memory."