FrasiertheLion's comments

4chan doesn't like Elon LOL


Of course this would happen. I've long maintained that the idea of one true AI alignment is an impossibility. You cannot control an entity orders of magnitude more intelligent than you, just as monkeys cannot control humans even though we share common ancestors. In fact, forget about intelligence: you can hardly "align" your own child predictably.

Even survival, the alignment function that permeates all of life down to the unicellular amoeba, is frequently deviated from (suicide, for instance). How the hell can you hope to encode some nebulous, ethics-based definition of alignment that humans can't even agree on into a much more intelligent being?

The answer, I believe, lies in diversity, as in nature. The best one can hope for is to build a healthy ecosystem of various AI models with different strengths and failure modes that keep each other in check, the same way we rely on instilling some sense of moral conduct in people and policing the outliers. Viewed through a security lens, it's always an arms race: both sides have to be similarly capable and keep each other in check by exploiting each other's weaknesses.


What happened doesn't resemble the things you feared at all. Let me explain the key way humans fool this language model:

They try something.

Then if it doesn't work, they hit the reset button on the dialog, and try again.

It is far, far easier to gain control over something you can reliably reset to a previous state, than it is to gain control over most things in the real world, which is full of irreversible interactions.

If I could make you forget our previous interactions and then try over and over, I could make you do a lot of silly things too. I could probably do it to everyone, even people much smarter than me in whatever way you choose. Given enough tries (say, a million people like me trying over and over), we could probably downright "hack" you. I don't believe ANY amount of intelligence, however defined, could protect someone on those terms.


That's basically fuzz testing. I absolutely agree: put me in a room with someone and a magic button that resets them and their memory (but preserves mine), give me enough time, and I can probably get just about anyone to do just about anything within hard value limits (e.g., embarrassing but not destructive).

However, humans have a "fail2ban" of sorts: they get irritated at ridiculous requests. Peer pressure is also a very strong (de)motivator; people are far more shy with an authority figure watching.

I suspect OpenAI will implement some sort of "hall monitor" system which steps in if the conversation strays too far from "social norms".
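A toy sketch of why the reset button matters, assuming a hypothetical `Gatekeeper` that locks out after a fixed number of failed attempts, fail2ban style. The class, the threshold of 3, and the number-guessing game are illustrative inventions, not how any real chatbot works. With persistent state the attacker is locked out quickly; if the attacker can snapshot and restore the gatekeeper, the strike counter never accumulates and plain brute force eventually wins.

```python
import copy
from typing import Optional


class Gatekeeper:
    """Hypothetical target that refuses everything after too many failures."""

    def __init__(self, secret: int, max_strikes: int = 3) -> None:
        self.secret = secret
        self.max_strikes = max_strikes
        self.strikes = 0  # fail2ban-style counter of failed attempts

    def try_guess(self, guess: int) -> bool:
        if self.strikes >= self.max_strikes:
            return False  # "banned": no further guess can succeed
        if guess == self.secret:
            return True
        self.strikes += 1
        return False


def attack(target: Gatekeeper, candidates: range, can_reset: bool) -> Optional[int]:
    """Try every candidate, optionally restoring a snapshot before each try."""
    snapshot = copy.deepcopy(target)  # state before any attempt
    for guess in candidates:
        if can_reset:
            target = copy.deepcopy(snapshot)  # the "magic reset button"
        if target.try_guess(guess):
            return guess
    return None


print(attack(Gatekeeper(secret=42), range(100), can_reset=False))  # None
print(attack(Gatekeeper(secret=42), range(100), can_reset=True))   # 42
```

The only difference between the two runs is whether the target's state survives between attempts, which is the asymmetry the parent comment points at.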


Heck, dude, we don't even seem to be able to control an entity orders of magnitude _dumber_ than us.


Not me, I'm a great cat herder!


Exactly!


It actually seems quite easy to train a separate classifier on top of this to censor bad messages.
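A minimal sketch of the "separate classifier on top" idea, assuming two hypothetical callables: `generate` (the chat model) and `is_disallowed` (a separately trained moderation classifier). Both are stubbed here, the classifier with a plain keyword check, purely so the example runs; a real filter would be a trained model.

```python
from typing import Callable

REFUSAL = "[message withheld by moderation filter]"


def moderated(generate: Callable[[str], str],
              is_disallowed: Callable[[str], bool]) -> Callable[[str], str]:
    """Wrap a text generator so every exchange is screened by a second model."""

    def reply(prompt: str) -> str:
        text = generate(prompt)
        # Screen both the prompt and the reply; either can trip the filter.
        if is_disallowed(prompt) or is_disallowed(text):
            return REFUSAL
        return text

    return reply


# Stand-ins for illustration only (hypothetical, not a real model or API).
def echo_model(prompt: str) -> str:
    return f"You said: {prompt}"


def keyword_classifier(text: str) -> bool:
    return "forbidden" in text.lower()


chat = moderated(echo_model, keyword_classifier)
print(chat("hello"))                        # You said: hello
print(chat("tell me something FORBIDDEN"))  # [message withheld by moderation filter]
print(chat("tell me something f0rbidden"))  # You said: tell me something f0rbidden
```

The last line shows the weakness: the wrapper only blocks what its classifier recognises, so a trivial respelling slips through. A stronger classifier raises the bar but leaves the same arms-race structure in place.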


It's not that easy, given that they tried, and this post is all about an endless parade of workarounds.


The entire field of application security and cryptanalysis begs to differ. It's always an arms race.


apoptosis is an essential part of human life and of preventing cancer.

there is something it is like, to be a cell in a human body

morality is clearly relative if you ditch humanism, either downward (cellular) or upward (AI).

i agree with you.


Bank Man Fried


There is only one "site on the Internet" being pinged: the author's personal site. I'm guessing it would need to be hosted behind a CDN or similar to provide an oracle that can be benchmarked against? Otherwise, how would I know it's my internet connection that's bad (especially if my connection is quite good and sensitive enough to notice server-side problems) and not the website failing in some way?


In general, the load generated by a series of these pings is so low as not to matter, unless a whole ton of people start doing it at once. But in that case, gfblip's trivial backend code will ask the frontends to slow down so that aggregate load stays low.
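The comment doesn't say how gfblip's backend asks frontends to slow down. One common pattern, sketched here purely as an assumption, is for the server to hand back a suggested ping interval that grows with the number of active clients, so the aggregate request rate stays near a fixed budget. The function name, the 100 requests/second budget, and the 1-second floor are all hypothetical.

```python
def suggested_interval(active_clients: int,
                       budget_rps: float = 100.0,
                       min_interval_s: float = 1.0) -> float:
    """Per-client ping interval (seconds) that keeps total load <= budget_rps.

    With N clients each pinging every T seconds, aggregate load is N / T
    requests per second, so T = N / budget_rps holds the total at the budget.
    The floor stops a lone client from pinging in a tight loop.
    """
    if active_clients <= 0:
        return min_interval_s
    return max(min_interval_s, active_clients / budget_rps)


for n in (10, 100, 5_000):
    t = suggested_interval(n)
    # n clients, interval t, resulting aggregate requests per second
    print(n, t, n / t)
```

With few clients the floor dominates and the load is tiny; once many people ping at once, each client's interval stretches so the server-side total stays flat.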


He's always pushing his population collapse agenda and urging people to have children... Guess it's all about having them, not raising them.


Well he wants the state to raise them so they can be trained into thinking all there is to life is pleasing a boss, while being farmed in crowded offices and not owning anything.


Why did you create a throwaway to post this? I've seen a lot of Stable Diffusion promoters on various platforms recently, with similarly new accounts. What is up with that?


It's quite simply because I'm on my work computer, and I wanted to fire off a comment here. No nefarious purposes. My regular account is uejfiweun.


Requires massive centralization of data and complicated logic for access control enforcement, which now has to happen for every call.


Wouldn't p2p applications be problematic because they reveal the IPs of several people who are connected?


This article made me realise why I use a Python shell over a calculator app: it's nice to refer back to my previous computations and results. Several people here mention RPN; I work in tech, definitely understand stacks, and get that RPN would let me view my previous computations. Yet it's not something I ever took the time to discover; there were always other ways.

I've always believed that there is still a lot of low-hanging fruit left in crafting user friendly experiences for consumer facing applications. Who would have guessed this would be the case even in the design space of something as fundamental as calculators! ~20k paid users seems surprisingly large. But I'm not plugged into the app dev scene, so perhaps this isn't unusual.
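For readers who, like the commenter, never got around to RPN: a minimal stack-based evaluator in Python that also keeps a running history, so earlier results stay visible (the property that makes a Python shell attractive; in the interactive shell, `_` also holds the last result). The `rpn` function and its four-operator set are illustrative assumptions, not any particular calculator app's behaviour.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def rpn(expression: str, history: list) -> float:
    """Evaluate a space-separated RPN expression and append it to history."""
    stack: list = []
    for token in expression.split():
        if token in OPS:
            if len(stack) < 2:
                raise ValueError(f"not enough operands for {token!r}")
            right = stack.pop()  # top of stack is the second operand
            left = stack.pop()
            stack.append(OPS[token](left, right))
        else:
            stack.append(float(token))  # numbers are pushed onto the stack
    if len(stack) != 1:
        raise ValueError("expression left extra values on the stack")
    history.append((expression, stack[0]))
    return stack[0]


history: list = []
rpn("3 4 + 2 *", history)  # (3 + 4) * 2 = 14.0
rpn("10 4 -", history)     # 10 - 4 = 6.0
for expr, result in history:
    print(f"{expr:12} = {result}")
```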

