lneiman's comments

lneiman · 2026-02-01T05:19:20 1769923160

What exactly is this?

vunderba · 2026-02-01T06:22:31 1769926951

It was more an experiment in psychology than a real AI.

It recorded every input/output pair from users over time and eventually reached the point where, instead of generating a random response, it would simply search for the most similar A→B pair and return that output to the user, creating the illusion of intelligence.

If you’re familiar with the magician Derren Brown, there’s a good comparison: the time he appeared to play competitively against nine very strong English chess players simultaneously. It’s the same kind of illusion.

https://en.chessbase.com/post/derren-brown-s-che-trick-once-...
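
To make the "find the most similar A→B pair and return B" idea above concrete, here is a minimal sketch. It is only an illustration, not the original system: the `PairMemory` class is hypothetical, and using `difflib.SequenceMatcher` as the similarity measure is an assumption, since the comment doesn't say how similarity was computed.

```python
from difflib import SequenceMatcher


class PairMemory:
    """Stores (input, output) pairs and replies with the output whose
    recorded input is most similar to the new input."""

    def __init__(self):
        self.pairs = []  # list of (input_text, output_text)

    def record(self, user_input, reply):
        # Remember what was said in response to this input.
        self.pairs.append((user_input, reply))

    def respond(self, user_input, fallback="Tell me more."):
        # With nothing recorded yet, there is nothing to look up.
        if not self.pairs:
            return fallback

        def similarity(pair):
            # Assumed metric: character-level ratio, case-insensitive.
            return SequenceMatcher(None, user_input.lower(), pair[0].lower()).ratio()

        # Return the recorded output (B) of the closest recorded input (A).
        best_input, best_output = max(self.pairs, key=similarity)
        return best_output


memory = PairMemory()
memory.record("hello there", "hi, how are you?")
memory.record("what is your favourite colour", "blue, obviously")
print(memory.respond("Hello there!"))
```

Nothing is generated here: every reply is something recorded earlier, which is where the illusion comes from.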

dapangzi · 2026-02-01T05:24:08 1769923448

Probably not a good comment...

I'm so ancient I mistakenly linked it mentally with SmarterChild.

https://en.wikipedia.org/wiki/SmarterChild

lneiman · 2025-12-02T17:39:25 1764697165

Author here. We were hitting tail latency and low GPU utilization issues serving SLMs via Triton.

I built a scrappy client-side router using Redis and Lua to track real-time GPU load. It boosted utilization by ~40% and improved latencies.

Happy to hear feedback on the implementation or thoughts on better ways to do this!
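
A rough sketch of the routing idea, not the actual implementation: an in-process dict and lock stand in for Redis and the Lua script (whose job would be to make "pick the least-loaded backend and bump its counter" atomic across clients). Treating "load" as the number of in-flight requests is an assumption, and `LeastLoadedRouter` and the backend names are hypothetical.

```python
import threading


class LeastLoadedRouter:
    """Client-side router that sends each request to the backend with the
    fewest in-flight requests. The dict + lock stand in for shared state
    that would live in Redis in a multi-client setup (assumption)."""

    def __init__(self, backends):
        self._inflight = {b: 0 for b in backends}
        self._lock = threading.Lock()

    def acquire(self):
        # Pick-and-increment must be one atomic step, otherwise two
        # clients can both choose the same "idle" backend.
        with self._lock:
            backend = min(self._inflight, key=self._inflight.get)
            self._inflight[backend] += 1
            return backend

    def release(self, backend):
        # Call when the request finishes so the load count stays accurate.
        with self._lock:
            self._inflight[backend] -= 1


router = LeastLoadedRouter(["gpu-0", "gpu-1"])
first = router.acquire()    # both idle, the first backend wins the tie
second = router.acquire()   # gpu-0 is busy, so this goes to gpu-1
router.release(first)       # gpu-0 finishes its request
third = router.acquire()    # gpu-0 is the least loaded again
print(first, second, third)
```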

pbrumm · 2025-12-06T15:28:37 1765034917

Have you tried switching it to a job queue where the GPU instances pull work and try to keep themselves busy? That way you can autoscale the GPUs based on utilization. I find it easier to tune, and you can monitor latency and backlogs more easily. It does require some async mechanisms on the client, but I have found it easier to maintain.
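
A minimal sketch of the pull-based pattern described above, assuming a single in-process queue and threads in place of real GPU instances. The `gpu_worker` function and the "inference" step (doubling a number) are placeholders, not anything from the thread.

```python
import queue
import threading


def gpu_worker(name, jobs, results):
    """Worker loop: pull the next job whenever free, so a worker is
    never idle while there is a backlog."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: no more work for this worker
            jobs.task_done()
            break
        # Placeholder for running inference on the GPU.
        results.put((name, job, job * 2))
        jobs.task_done()


jobs = queue.Queue()
results = queue.Queue()

workers = [
    threading.Thread(target=gpu_worker, args=(f"gpu-{i}", jobs, results))
    for i in range(2)
]
for w in workers:
    w.start()

# The client only enqueues; it does not choose a GPU.
for job in range(6):
    jobs.put(job)

# jobs.qsize() at this point is the backlog one could monitor or scale on.
jobs.join()  # wait until every job has been processed

for _ in workers:
    jobs.put(None)  # one sentinel per worker
for w in workers:
    w.join()

outputs = sorted(results.get()[2] for _ in range(6))
print(outputs)
```

The client has to collect results asynchronously (here via a second queue), which is the extra client-side machinery the comment mentions.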