It was more an experiment in psychology than a real AI.
It recorded every input/output pair from users over time and eventually reached the point where, instead of generating a random response, it would simply search for the most similar A→B pair and return that output to the user, creating the illusion of intelligence.
If you’re familiar with the magician Derren Brown, there’s a good comparison: when he appeared to play competitively against nine very strong English chess players simultaneously. It’s like that.
Have you tried switching it to a job queue where the GPU instances try to keep themselves busy. That way you can auto scale the gpus based on utilization. I find it easier to tune and you can monitor latency and backlogs easier. It does require some async mechanisms to the client but I have found it easier to maintain
reply