The author argues that AI algorithms don’t have desires, yet this is precisely what reinforcement learning simulates.
In reinforcement learning, positive rewards are assigned to desirable outcomes (a “carrot”) and negative rewards to undesirable ones (a “stick”).
The model is trained to maximize the expected reward of each decision. I’d say that’s fairly analogous to desire.
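To make that concrete, here is a toy sketch of the carrot-and-stick idea using tabular Q-learning. The states, actions, transition dynamics, and reward values are all invented for illustration; real systems are far more elaborate.

    import random

    # Toy tabular Q-learning example. Everything below is made up
    # purely to illustrate reward assignment and expected-reward maximization.
    states, actions = range(3), range(2)
    Q = {(s, a): 0.0 for s in states for a in actions}
    alpha, gamma, epsilon = 0.1, 0.9, 0.1

    def reward(state, action):
        # The "carrot" (+1) and "stick" (-1) are chosen by the human engineer.
        return 1.0 if action == state % 2 else -1.0

    def step(state, action):
        return random.choice(list(states))  # toy transition dynamics

    state = 0
    for _ in range(10_000):
        # Epsilon-greedy: mostly pick the action with the highest expected reward.
        if random.random() < epsilon:
            action = random.choice(list(actions))
        else:
            action = max(actions, key=lambda a: Q[(state, a)])
        r = reward(state, action)
        next_state = step(state, action)
        # Update the estimate of expected discounted reward for this decision.
        best_next = max(Q[(next_state, a)] for a in actions)
        Q[(state, action)] += alpha * (r + gamma * best_next - Q[(state, action)])
        state = next_state

The "desire", such as it is, lives entirely in reward(): the model never decides what counts as a carrot or a stick, it only learns to chase whatever the engineer put there.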
But the reward (i.e., meaning) is defined by a human engineer. The point of the article is that we don’t have AGI yet because today’s AI needs humans to ascribe meaning to its decisions.
Agree. Instead of "meaning" I might just use "desire". The AI is trained to satisfy the human engineer's chosen desire, which the engineer specifies through a loss function (supervised learning) or a reward signal (reinforcement learning). Two more points:
1. This "desire" applies only to the training algorithm.
The deployed AI (running inference) is no longer trying to minimize a loss function or maximize discounted rewards; there's a small sketch of this below, after point 2.
2. This "desire" is not a "deliberate desire".
It is more like an "animal desire" such as wanting to eat because of hunger. Aristotle draws this distinction. Such a desire is automatic (rather than deliberate) and is not part of choice-making.
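Here is the sketch promised in point 1: a tiny least-squares fit in which the loss function (the engineer's chosen "desire") appears only in the training loop, while the deployed model just evaluates. The dataset, learning rate, and iteration count are invented for illustration.

    import numpy as np

    # Toy supervised model: the numbers are made up; the point is only
    # where the loss function (the "desire") actually appears.
    X = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.0, 3.0, 5.0, 7.0])   # target relationship: y = 2x + 1
    w, b, lr = 0.0, 0.0, 0.05

    # Training: adjust w and b to minimize mean squared error.
    for _ in range(2000):
        pred = w * X + b
        w -= lr * 2 * np.mean((pred - y) * X)
        b -= lr * 2 * np.mean(pred - y)

    # Inference: the deployed model no longer optimizes anything;
    # it just applies the frozen weights.
    def predict(x):
        return w * x + b

    print(predict(4.0))   # roughly 9.0

Once the loop ends, nothing in predict() is trying to minimize anything; the "desire" existed only while training ran.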
For base desires, that 'agent' is evolution. E.g., we can't will ourselves to stop breathing for the next hour, no matter what our higher-level desire might be.
Many of our higher desires are set for us (taught) by other humans: our parents and society. We wouldn't want an AGI that was able to set all of its own desires, any more than we'd want that in our children. Rather dangerous indeed.
Nah, you know what, all these claims that we have hard AI? You don't see people claim it explicitly, but you do see incredible artifacts. What's the deal? They are incredible; they cannot be believed. Just today I was reading an article in the China Times, in English, listing all the elements recoverable from a salt flat: something something lithium, strontium, and strontium. Zero comments. So I posted a comment, respectfully (I do respect China and don't want to question them by going against the grain): "Strontium and strontium? Different isotopes?" Maybe that's a machine mistake (which is fine, I don't discriminate against machines), or perhaps just carelessness. Or genuinely different isotopes, with different subscripts that got lost along the way. Many possibilities. At any rate, I sent the comment and took responsibility for it; it's a typo, so that's what you're supposed to do. It can happen to anybody.
Regardless, the article otherwise had perfect spelling and grammar; like www.rt.com, the writing is very polished, way better than American newspapers. And I like them more because they, like me, learned English overseas. Going to America fucks up your English. It's best to learn English overseas, not by living in America, where you just get lazy and stop printing a page out, going at it with an actual red pen, then updating, repeatedly. There's no substitute. And the articles make more sense, all written by bilinguals. The same way I prefer watching a movie in English with Spanish subtitles: redundant information; otherwise it kind of sucks.
The more sophisticated the machine-learning heuristics (not algorithms, never algorithms), the more embarrassing the errors: less frequent, but more embarrassing. I can't say that applies to the China Times, but it's always ambiguous: man or machine error? In the end it's the same; man communicates through machines, like computers, made by man.
I’ve written an introduction to reinforcement learning here: https://improve.ai/reinforcement-learning/