Nice demo! I briefly tried it out and it felt much better than the original WebLLM one!
On a side note, I've been trying to do something similar, for similar reasons (privacy).
Based on my recent experience, I find that running LLMs directly in the browser with decent UX (e.g. sub 1-2 second response time, no lag, no crashes) is still not really feasible given the current state of things. Plus, I think that relying on users' own GPU hardware for UX improvements via WebGPU isn't very practical at scale (though it's still something!), since not everyone has access to GPU hardware.
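One practical consequence of uneven GPU availability is that you really want to feature-detect WebGPU before committing to an in-browser model. Here's a rough sketch of how I'd do that check (the `navigator.gpu` / `requestAdapter()` calls are the standard WebGPU API; the fallback strategy itself is just an assumption, e.g. a WASM backend or a server round-trip):

```javascript
// Feature-detection sketch: check whether the environment exposes WebGPU
// at all before trying to load a model, so we can fall back gracefully
// (e.g. to a WASM backend or a server) instead of failing on unsupported
// hardware.
async function detectWebGPU() {
  // navigator.gpu only exists in WebGPU-capable browsers
  if (typeof navigator === "undefined" || !navigator.gpu) {
    return { supported: false, reason: "no navigator.gpu" };
  }
  try {
    // requestAdapter() can resolve to null when the GPU is absent,
    // blocklisted, or disabled by browser policy
    const adapter = await navigator.gpu.requestAdapter();
    if (!adapter) {
      return { supported: false, reason: "no adapter available" };
    }
    return { supported: true, reason: "ok" };
  } catch (e) {
    return { supported: false, reason: String(e) };
  }
}
```

Even with this check passing, adapter limits vary a lot between machines, so a "supported" result doesn't guarantee a 7B model will actually fit or run fast.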
But yeah, if there's anything to look forward to in this space, I personally hope to see it become more feasible to run LLMs in browsers.