lneiman's comments

What exactly is this?

It was more an experiment in psychology than a real AI.

It recorded every input/output pair from users over time and eventually reached the point where, instead of generating a random response, it would simply search for the most similar A→B pair and return that output to the user, creating the illusion of intelligence.

If you’re familiar with the magician Derren Brown, there’s a good comparison: the trick where he appeared to play competitively against nine very strong English chess players simultaneously. It’s like that.

https://en.chessbase.com/post/derren-brown-s-che-trick-once-...
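
A minimal sketch of the lookup described above: store every input/output pair, then answer a new input with the output of the most similar stored input. The class name `PairMemory`, the fallback line, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions for illustration; the original system's actual matching method isn't stated here.

```python
from difflib import SequenceMatcher


class PairMemory:
    """Hypothetical store of (input, output) pairs with nearest-input lookup."""

    def __init__(self):
        self.pairs = []  # list of (recorded_input, recorded_output)

    def record(self, user_input, output):
        # Lowercase the input so matching ignores case.
        self.pairs.append((user_input.lower(), output))

    def reply(self, user_input, fallback="Tell me more."):
        # With nothing recorded yet, there is nothing to imitate.
        if not self.pairs:
            return fallback
        query = user_input.lower()
        # Pick the stored pair whose input is most similar to the query
        # (similarity measure is an assumption, not the original's).
        _, best_output = max(
            self.pairs,
            key=lambda pair: SequenceMatcher(None, query, pair[0]).ratio(),
        )
        return best_output


memory = PairMemory()
memory.record("hello there", "Hi! How are you?")
memory.record("what is your name", "People call me Bot.")
print(memory.reply("whats your name?"))
```

Nothing here understands the question; the reply is just whatever a previous conversation already paired with similar wording, which is where the illusion comes from.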


Probably not a good comment...

I'm so ancient I mistakenly linked it mentally with SmarterChild.

https://en.wikipedia.org/wiki/SmarterChild


Author here. We were hitting tail latency and low GPU utilization issues serving SLMs via Triton.

I built a scrappy client-side router using Redis and Lua to track real-time GPU load. It boosted utilization by ~40% and improved latencies.

Happy to hear feedback on the implementation or thoughts on better ways to do this!
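
For readers who want the general shape of load-aware client-side routing, here is a rough sketch, not the actual implementation. It swaps Redis and Lua for an in-process dict guarded by a lock, and assumes "load" means in-flight requests per backend; the names `LeastLoadedRouter`, `lease`, and the `gpu-0`/`gpu-1` labels are hypothetical. In a Redis-backed version, the pick-and-increment step would presumably need to run atomically on the server, which is what a Lua script provides.

```python
import threading
from contextlib import contextmanager


class LeastLoadedRouter:
    """Hypothetical in-process stand-in for a shared load table."""

    def __init__(self, backends):
        self._inflight = {name: 0 for name in backends}
        self._lock = threading.Lock()

    @contextmanager
    def lease(self):
        # Pick and increment under one lock so two clients cannot
        # both choose the same "idle" backend at the same moment.
        with self._lock:
            backend = min(self._inflight, key=self._inflight.get)
            self._inflight[backend] += 1
        try:
            yield backend
        finally:
            # Always release the slot, even if the request failed.
            with self._lock:
                self._inflight[backend] -= 1

    def snapshot(self):
        with self._lock:
            return dict(self._inflight)


router = LeastLoadedRouter(["gpu-0", "gpu-1"])
with router.lease() as first:
    with router.lease() as second:
        print(first, second, router.snapshot())
```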


Have you tried switching it to a job queue where the GPU instances pull work and try to keep themselves busy? That way you can autoscale the GPUs based on utilization. I find it easier to tune, and you can monitor latency and backlog more easily. It does require some async mechanisms on the client, but I have found it easier to maintain.
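
A rough sketch of the pull model, using only the Python standard library: workers take jobs from a shared queue as soon as they are free, the client gets a `Future` back instead of a direct response, and the queue depth serves as the backlog metric. `PullQueue`, `fake_infer`, and the thread-based workers are hypothetical stand-ins; a real deployment would use a networked queue and actual GPU workers.

```python
import queue
import threading
from concurrent.futures import Future


class PullQueue:
    """Hypothetical job queue: workers pull, clients wait on futures."""

    def __init__(self):
        self._jobs = queue.Queue()

    def submit(self, payload):
        # The async part on the client: return a Future immediately.
        future = Future()
        self._jobs.put((payload, future))
        return future

    def backlog(self):
        # Approximate queue depth; a natural autoscaling signal.
        return self._jobs.qsize()

    def worker_loop(self, handler):
        # Each worker keeps itself busy by pulling the next job.
        while True:
            item = self._jobs.get()
            if item is None:  # sentinel: shut this worker down
                break
            payload, future = item
            try:
                future.set_result(handler(payload))
            except Exception as exc:
                future.set_exception(exc)

    def stop(self, worker_count):
        for _ in range(worker_count):
            self._jobs.put(None)


def fake_infer(prompt):
    # Placeholder for a model call.
    return prompt.upper()


jobs = PullQueue()
threads = [
    threading.Thread(target=jobs.worker_loop, args=(fake_infer,))
    for _ in range(2)
]
for t in threads:
    t.start()
futures = [jobs.submit(p) for p in ["a", "b", "c"]]
results = [f.result(timeout=5) for f in futures]
jobs.stop(len(threads))
for t in threads:
    t.join()
print(results)
```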

