> I'm never gonna have my non-tech friend do any of this Who was making that ass...

littlestymaar · 2024-05-05T07:07:26

> Who was making that assertion? I certainly wasn't.

But you were responding to my comment, and that was the implied part in it (which I later clarified to answer your question).

> In the same way I am never going to tell my non-engineer friends to build their own todo app instead of just using something like Todoist. But if they told me they cared about data privacy/security, I'd walk them through the steps if they cared to hear them.

Fortunately, for most apps there's a middle ground between “use spyware” and “build your own”, and that's exactly why this tool is much needed for LLMs, in my opinion.

spmurrayzzz · 2024-05-05T16:15:15

> Fortunately, for most apps there's a middle ground between “use spyware” and “build your own”, and that's exactly why this tool is much needed for LLMs, in my opinion.

Sure, I think I understand the motivation; the big tradeoff is performance. If your original commentary about people privileging convenience holds true across the end-to-end user experience here, I would say that single-digit tokens-per-second rates probably qualify as inconvenient for many folks and thus cannibalize whatever ease-of-setup value you get at the outset.
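To put rough numbers on that, here is a back-of-the-envelope sketch. The 500-token response length and both throughput figures are illustrative assumptions, not measurements of any particular tool or hardware:

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to stream a response at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Hypothetical response length, roughly a few paragraphs of text.
RESPONSE_TOKENS = 500

# Hypothetical decode rates: a single-digit rate vs. a rate ten times higher.
for label, rate in [("single-digit rate", 5.0), ("10x faster rate", 50.0)]:
    wait = seconds_to_generate(RESPONSE_TOKENS, rate)
    print(f"{label}: {rate:g} tok/s -> {wait:.0f} s for {RESPONSE_TOKENS} tokens")
```

Under those assumed numbers, the same answer takes well over a minute at the slower rate versus about ten seconds at the faster one, which is the kind of gap I mean by "inconvenient".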

There's a reason CUDA/ROCm is needed for acceleration: a ton of work goes into optimization via custom kernels to get the palatable throughput/latency consumers are used to when using frontier model APIs (or GPU-accelerated local stacks).