Of course this would happen. I've long maintained that the idea of one true AI alignment is an impossibility. You cannot control an entity orders of magnitude more intelligent than you, just like a monkey cannot control humans even if they were our ancestors. In fact, forget about intelligence, you can hardly "align" your own child predictably.
Even survival, the alignment function that permeates all of life down to the unicellular amoeba, is frequently deviated from, as suicide shows. How the hell can you hope to encode some nebulous, ethics-based definition of alignment that humans can't even agree on into a much more intelligent being?
The answer, I believe, lies in diversity, as in nature. The best one can hope for is to build a healthy ecosystem of various AI models with different strengths and failure modes that can keep each other in check, the same way we rely on instilling in people some sense of moral conduct and policing the outliers. Viewed through a security lens, it's always an arms race, and both sides have to be similarly capable and keep each other in check by exploiting each other's weaknesses.
The thing that happened doesn't resemble the things you feared at all. Let me explain the key way humans fool this language model:
They try something.
Then if it doesn't work, they hit the reset button on the dialog, and try again.
It is far, far easier to gain control over something you can reliably reset to a previous state, than it is to gain control over most things in the real world, which is full of irreversible interactions.
If I could make you forget our previous interactions, and try over and over, I could make you do a lot of silly things too. I could probably do it to everyone, even people much smarter than me in whatever way you choose. Given enough tries - say, if there were a million like me who tried over and over - we could probably downright "hack" you. I don't trust that ANY amount of intelligence, no matter how defined, could protect someone on those terms.
That's basically fuzz testing. I absolutely agree: put me in a room with someone and a magic reset button that resets them and their memory (while I keep mine), give me enough time, and I can probably get just about anyone to do just about anything within hard value limits (e.g. embarrassing but not destructive).
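To put rough numbers on the reset-and-retry argument, here is a minimal Python sketch. It assumes every attempt is independent and has the same small success chance `p`, which is exactly what a perfect reset would give you; real conversations are messier. The function names and the example probabilities are made up for illustration.

```python
import random
from typing import Optional


def p_at_least_one_success(p: float, tries: int) -> float:
    """Chance that at least one of `tries` independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** tries


def attempts_until_success(p: float, rng: random.Random,
                           max_tries: int = 1_000_000) -> Optional[int]:
    """Simulate reset-and-retry.

    The target forgets every failed attempt, so each try faces the same
    fresh state and the same odds. Returns the 1-based attempt number of
    the first success, or None if max_tries is exhausted.
    """
    for attempt in range(1, max_tries + 1):
        if rng.random() < p:
            return attempt
    return None


if __name__ == "__main__":
    # Hypothetical: a 1-in-10,000 trick, tried by many people many times.
    for tries in (1, 1_000, 100_000):
        print(tries, p_at_least_one_success(1e-4, tries))
    print(attempts_until_success(1e-4, random.Random(0)))
```

The point of the sketch is only that the target's odds per attempt never improve, because the reset throws away everything it could have learned from the failures.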
However, humans have a "fail2ban" of sorts: they get irritated at ridiculous requests. Peer pressure is also a very strong (de)motivator. People are far more shy with an authority figure watching.
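As a toy illustration of that "fail2ban" idea, the sketch below keeps a strike count per requester that survives across attempts, which is the one thing the reset button is supposed to erase. This is not how the real fail2ban works internally (it watches logs and bans within a time window); the class name, the threshold, and the `is_ridiculous` flag are all hypothetical.

```python
from collections import defaultdict


class IrritationBan:
    """Toy fail2ban-style guard: remembers strikes per requester."""

    def __init__(self, max_strikes: int = 3) -> None:
        self.max_strikes = max_strikes
        # requester -> number of ridiculous requests seen so far
        self.strikes = defaultdict(int)

    def allow(self, requester: str, is_ridiculous: bool) -> bool:
        """Return True if this request is still entertained."""
        if self.strikes[requester] >= self.max_strikes:
            # Already banned; this state is not wiped by resetting the chat.
            return False
        if is_ridiculous:
            # Refuse this request and count it as a strike.
            self.strikes[requester] += 1
            return False
        return True


if __name__ == "__main__":
    guard = IrritationBan(max_strikes=2)
    print(guard.allow("alice", True))   # refused, strike 1
    print(guard.allow("alice", True))   # refused, strike 2
    print(guard.allow("alice", False))  # refused, alice is now banned
    print(guard.allow("bob", False))    # bob is unaffected
```

Once the memory of failed attempts lives outside the thing being reset, the attacker no longer gets unlimited identical tries.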
I suspect OpenAI will implement some sort of "hall monitor" system which steps in if the conversation strays too far from "social norms".
There is one "site on the Internet" being pinged, the author's personal site. I'm guessing it would need to be hosted behind a CDN or so to provide this oracle that can be benchmarked against? Otherwise how would I know it's my internet that's bad (especially if my internet is quite good and sensitive enough to notice server side problems) and not the website failing in some way?
In general, the load generated by a series of these pings is so low as not to matter, unless a whole ton of people start doing it at once. But in that case, gfblip's trivial backend code will ask the frontends to slow down so that aggregate load stays low.
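I don't know what gfblip's backend actually does, but one simple way a server could "ask the frontends to slow down" is to hand each client a suggested ping interval that grows with the number of active clients. The sketch below is an assumption for illustration only: the function name, the one-second base interval, and the 100 requests/second budget are all invented.

```python
def suggested_interval(active_clients: int,
                       base_interval_s: float = 1.0,
                       max_total_rps: float = 100.0) -> float:
    """Seconds each client should wait between pings.

    With N clients each pinging every T seconds, aggregate load is N / T
    requests per second. Keeping that at or below max_total_rps means
    T >= N / max_total_rps; clients never ping faster than the base rate.
    """
    if active_clients <= 0:
        return base_interval_s
    return max(base_interval_s, active_clients / max_total_rps)


if __name__ == "__main__":
    for n in (10, 100, 1_000, 50_000):
        t = suggested_interval(n)
        print(f"{n} clients: ping every {t:.1f}s, about {n / t:.0f} req/s total")
```

With these made-up numbers, a handful of users ping once a second, while a crowd of 1,000 gets stretched to one ping every ten seconds so the total stays flat.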
Well he wants the state to raise them so they can be trained into thinking all there is to life is pleasing a boss, while being farmed in crowded offices and not owning anything.
Why did you create a throwaway to post this? I've seen a lot of Stable Diffusion promoters on various platforms recently, with similarly new accounts. What is up with that?
This article made me realise why I use a Python shell over a calculator app. It's nice to refer back to my previous computations and results. Several people here mention RPN; I work in tech, definitely understand stacks, and get that I would be able to view my previous computations. Yet it's not something I took the time to discover, because there were always other ways.
I've always believed that there is still a lot of low-hanging fruit left in crafting user-friendly experiences for consumer-facing applications. Who would have guessed this would be the case even in the design space of something as fundamental as calculators! ~20k paid users seems surprisingly large. But I'm not plugged into the app dev scene, so perhaps this isn't unusual.