Also (since it's been a while): there are over 2000 comments in the current thread. To read them all, you need to click the More links at the bottom of the page.
This is insane. But I'm impressed most of all by the quality of motion. I've quite simply never seen convincing computer-generated motion before. Just look at the way the woolly mammoths connect with the ground, and how their lumbering mass feels real.
Motion-capture works fine because that's real motion, but every time people try to animate humans and animals, even in big-budget CGI movies, it always ultimately looks fake. There are so many subtle things happening in terms of acceleration and deceleration of all the different parts of an organism that no animator ever gets it 100% right. No animation algorithm gets it to the point where it's believable, just where it's "less bad".
But these videos seem to make it entirely believable for both people and animals. Which is wild.
And then of course, not to mention that these are entirely believable 3D spaces, with seemingly full object permanence. As opposed to other efforts I've seen which are basically briefly animating a 2D scene to make it seem vaguely 3D.
I disagree; just look at the legs of the woman in the first video. First she seems to be limping, then the legs rotate. The mammoths are totally uncanny for me, as each one is both running and walking at the same time.
Don't get me wrong, it is impressive. But I think many people will be very uncomfortable with such motion very quickly. Same story as the fingers before.
> I think many people will be very uncomfortable with such motion very quickly
So... I think OP's point stands. (impressive, surpasses human/algorithmic animation thus far).
You're also right. There are "tells." But, a tell isn't a tell until we've seen it a few times.
Jaron Lanier makes a point about novel technology. The first gramophone users thought it sounded identical to a live orchestra. When very early films depicted a train coming toward the camera, people fell out of their chairs... blurry black and white, at a super slow frame rate, projected on a bedsheet.
Early 3D animation was mind-blowing in the 90s. Now it seems like a marionette show. Well... I suppose there was a time when marionette shows were not campy. They probably looked like magic.
It seems we need some experience before we internalize the tells and it starts to look fake. My own eye for CG images seems to be improving faster than the quality. We're all learning to recognize GPT-generated text. I'm sure these motion captures will look more fake to us soon.
That said... the fact that we're having this discussion proves that what we have here is "novel." We're looking at a breakthrough in motion/animation.
Also, I'm not sure "real" is necessary. For games or film what we need is rich and believable, not real.
> You're also right. There are "tells." But, a tell isn't a tell until we've seen it a few times.
Once you have seen a few you can tell instantly. They all move at about 2 keyframes per second, which makes all movements seem alien, and everything in an image moves strangely in sync. The dog moves in slow motion since it needs more keyframes, etc. In that street scene, some people look like they move in slow motion and others don't.
People will quickly learn to notice those issues, they aren't even subtle once you are aware of them, not to mention the disappearing things etc.
And that wouldn't be very easy to fix: they need to train on keyframes because training frame by frame is too expensive.
But that should make this really easy for others to replicate. You just train on keyframes and then train a model to fill in between keyframes, and you get this. It has some limitations as we see with movement keeping the same pace in every video, but there are a lot of cool results from it anyway.
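The two-stage pipeline the comment hypothesizes can be sketched in miniature. This is purely illustrative, under the commenter's assumption about how such models work: a "keyframe model" emits sparse frames (here simply given), and an "in-between model" fills the gaps (here stood in for by plain linear interpolation, which no real video model literally does):

```python
# Hedged sketch of the hypothesized keyframe-then-interpolate pipeline.
# Frames are represented as flat lists of pixel values for simplicity.

def interpolate_frames(keyframes, n_between):
    """Fill n_between frames between each consecutive pair of keyframes.

    Stand-in for a learned interpolation model: here just a linear blend.
    Returns the full frame sequence at the higher frame rate.
    """
    out = []
    for a, b in zip(keyframes, keyframes[1:]):
        out.append(a)
        for i in range(1, n_between + 1):
            t = i / (n_between + 1)  # interpolation weight, 0 < t < 1
            out.append([(1 - t) * pa + t * pb for pa, pb in zip(a, b)])
    out.append(keyframes[-1])
    return out

# Two "keyframes" at 2 fps, upsampled to 8 fps (3 in-betweens per gap).
frames = interpolate_frames([[0.0, 0.0], [1.0, 2.0]], n_between=3)
print(len(frames))   # 5 frames total
print(frames[2])     # midpoint frame: [0.5, 1.0]
```

The "same pace in every video" limitation the comment notes falls out naturally here: the in-between step can only smoothly bridge keyframes, so all motion inherits the keyframe cadence.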
I have a friend who has worked on many generations of video compression over the last 20 years. He would rather watch a movie on film without effects than anything on a TV or in a digital theater. He's trained himself to spot defects, and now even with the latest HEVC H.265 he finds it impossible to enjoy. It's artifacts all the way down and the work never ends. At the Super Bowl he was obsessed with blocking on fast objects, screen-edge artifacts, flat-field colors, and something with the grass.
Luckily, I think he'll retire sooner than later, and maybe it will get better then.
I think a lot of these issues could be "solved" by lowering the resolution, using a low quality compression algorithm, and trimming clips down to under 10 seconds.
And by solved, I mean they'll create convincing clips that'll be hard for people to dismiss unless they're really looking closely. I think it's only a matter of time until fake video clips lead to real life outrage and violence. This tech is going to be militarized before we know it.
Yeah, we are very close to losing video as a source of truth.
I showed these demos to my partner yesterday and she was upset about how real AI has become, how little we will be able to trust what we see in the future. Authoritative sources will be more valuable, but they themselves may struggle to publish only the facts and none of the fiction.
Russian authorities refute his death and have released proof of life footage, which may be doctored or taken before his death. Authoritative source Wikipedia is not much help in establishing truth here, because without proof of death they must default to toeing the official line.
I predict that in the coming months Sokolov (who just yesterday was removed from his post) will re-emerge in the video realm, and go on to have a glorious career. Resurrecting dead heroes is a perfect use of this tech, for states where feeding people lies is preferable to arming them with the truth.
Sokolov may even go on to be the next Russian President.
> Yeah, we are very close to losing video as a source of truth.
I think this way of thinking is distracted. No type of media has ever been a source of truth in itself. Videos have been edited convincingly for a long time, and people can lie about their context or cut them in a way that flips their meaning.
Text is the easiest media to lie on, you can freely just make stuff up as you go, yet we don't say "we cannot trust written text anymore".
Well yeah duh, you can trust no type of media just because it is formatted in a certain way. We arrive at the truth by using multiple sources and judging the sources' track records of the past. AI is not going to change how sourcing works. It might be easier to fool people who have no media literacy, but those people have always been a problem for society.
Text was never looked at as a source of truth the way video was. If you messaged someone something, they wouldn't necessarily believe it. But if you sent them a video of that something, they would feel they had no choice but to believe it.
> Well yeah duh, you can trust no type of media just because it is formatted in a certain way
Maybe you wouldn't, but the layperson probably would.
> We arrive at the truth by using multiple sources and judging the sources' track records of the past
Again, this is something the ideal person would do, not the average layperson. Almost nobody would go through all that to decide whether or not to believe something. Presenting them a video of this something would've been a surefire way to force them to believe it, though, at least before Sora.
> people have always been a problem for society
Unrelated, but I think this attitude is by far the bigger "problem for society". It encourages us to look down on some people even when we do not know their circumstances or reasons, all for an extremely trivial matter. It encourages gatekeeping and hostility, and I think that kind of attitude is at least as detrimental to society as people with no media literacy.
During a significant part of history, text was definitely considered a source of truth, at least to the extent a lot of people see video now. A fancy recommendation letter from a noble would get you far. It makes sense, because if you forged it, that meant you had invested a significant amount of effort and therefore had planned the deception. It's a different kind of behavior than just lying on a whim.
But even then, as nowadays, people didn't trust the medium absolutely. The possibility of forgery was real, as it has been with the video, even before generative AI.
To back up this claim, when fictional novels first became a literary format in the Western world, there was immense consternation about the fact that un-true things were being said in text. It actually took a while for authors to start writing in anything besides formats that mimicked non-fictional writing (letters, diary entries, etc.).
As has been pointed out ad nauseam by now, no one's suggesting that AI unlocks the ability to doctor images; they're suggesting that it makes it trivially easy for anyone, no matter how unskilled, to do so.
I really find this constant back and forth exhausting. It's always the same conversation: '(gen)AI makes it easy to create lots of fake news and disinformation etc.' --> 'but we've always been able to do that. have you guys not heard of photoshop?' --> 'yes, but not on this scale this quickly. can you not see the difference?'
Anyway, my original point was simply to say that a lot of people have (rightly or wrongly) indeed taken photographic evidence seriously, even in the age of photographic manipulation (which as you point out, pretty much coincides with the age of photography itself).
> Yeah, we are very close to losing video as a source of truth.
Why have you been trusting videos? The only difference is that the cost will decrease.
Haven't you seen Hollywood movies? CGI has been convincing enough for a decade. Just add some compression and a shaky mobile cam and it would be impossible to tell the difference on anything.
> Yeah, we are very close to losing video as a source of truth.
We've been living in a post-truth society for a while now. Thanks to "the algorithm" interacting with basic human behavior, you can find something somewhere that will tell you anything is true. You'll even find a community of people who'll be more than happy to feed your personal echo chamber -- downvoting & blocking any objections and upvoting and encouraging anything that feeds the beast.
And this doesn't just apply to "dumb people" or "the others", it applies to the very people reading this forum right now. You and me and everybody here lives in their safe, sound truth bubble. Don't like what people tell you? Just find somebody or something that will assure you that whatever it is you think, you are thinking the truth. No, everybody is the asshole who is wrong. Fuck those pond scum spreaders of "misinformation".
It could be a blog, it could be some AI generated video, it could even be "esteemed" newspapers like the New York Times or NPR. Everybody thinks their truth is the correct one and thanks to the selective power of the internet, we can all believe whatever truth we want. And honestly, at this point, I am suspecting there might not be any kind of ground truth. It's bullshit all the way down.
so where do we go from here? the moon landing was faked, we're ruled by lizard people, and there are microchips in the vaccine. at some level, you can believe what you want to believe, and if the checkout clerk thinks the moon is made of cheese, it makes no difference to me, I still get my groceries. but for things like nuclear fusion, are we actually making progress on it or is it also a delusion. where the rubber meets the road is how money gets spent on building big projects. is JWST bullshit? is the LHC? ITER? GPS?
we need ground truths for these things to actually function. how else can things work together?
I've always found that take quite ridiculous. Fake videos have existed for a long time. This technology reduces the effort required but if we're talking about state actors that was never an issue to begin with.
People already know that video cannot be taken at face value. Lord of the Rings didn't make anyone believe orcs really exist.
Lord of the Rings had a budget in the high millions and took years to make with a massive advertising campaign.
Riots happen due to out of context video clips. Violence happens due to people seeing grainy phone videos and acting on it immediately. We're reaching a point where these videos can be automatically generated instantly by anyone. If you can't see the difference between anyone with a grudge generating a video that looks realistic enough, and something that requires hundreds of millions of dollars and hundreds of employees to attain similar quality, then you're simply lying.
A key difference in the current trajectory is that it's becoming feasible to generate highly targeted content, down to an individual level. This can also be achieved without state-actor-level resources or the time delays traditional methods require, regardless of budget. The fact that it could also be automated is mildly terrifying.
Coordinated campaigns of hate through the mass media - like kicking up war fever before any major war you care to name - are far more concerning and have already been with us for about a century. Look at WWII and what Hitler was doing for the clearest example; propaganda was the name of the game. The techniques haven't gone anywhere.
If anything, making it cheap enough that people have to dismiss video footage might soften the impact. It is interesting how the internet is making it much harder for the mass media to peddle unchallenged lies or slanted perspectives. This tech might counter-intuitively make it harder again.
I have no doubt trust levels will adjust, eventually. The challenge is that takes a non-trivial amount of time.
It's still an issue with traditional mass media. See basically any political environment where the Murdoch media empire is active. The long tail of (I hate myself for this terminology, but hey, it's HN) 'legacy humans' still vote and have a very real effect on society.
It's funny you mention LotR, because the vast, vast, vast majority of the character effects were practical (at least in the original trilogy). They were, in fact, entirely real, even if they were not true to life.
You can still be enraged by things you know are not real. You can reason about your emotional response, but it's much harder to prevent an emotional response from happening in the first place.
The issue is not even so much generating fake videos as creating plausible deniability. Now everything can be questioned for the pure reason of seeming AI-generated.
Yeah, it looks good at first glance. Also, the fingers are still weird. And I suppose for every somewhat working vid, there were dozens of garbage ones. At least that was my experience with image generation.
I don't believe movie makers will be out of business any time soon. They will have to incorporate it, though. So far this can make convincing background scenery.
> I don't believe movie makers will be out of business any time soon
My son was learning how to play keyboard, and he started practicing with a metronome. At some point I was thinking, why is he learning it at all? We can program which key is to be pressed at what point in time, and then the software can play it by itself! Why bother?
Then it hit me! Musicians have been able to automate all the instruments with incredible accuracy for a long time. But they never do that. For some reason, they still want a person behind the piano / guitar / drums.
Isn't it obvious? Life is about experiences and enjoyment. All of this tech is fun and novel and interesting, but realistically it's exciting for tech people because it's going to be used to make more computer games, social media posts, and advertisements; essentially, it's exciting because it's going to "make money".
Outside of that, people just want to know what it feels like to be able to play their favorite song on guitar and to go skiing etc.
Being perfect at everything would be honestly boring as shit.
I completely agree. There is more to a product than the final result. People who don't play an instrument see music in terms of money. (Hint: there's no money in music.) But those who play know that the pleasure is in the playing, and jamming with your mates. Recording and selling are work, not pleasure.
This is true for literally every hobby people do for fun. I am learning ceramics. Everything I've ever made could be bought in a shop for a 100th of the cost, and would be 100 times "better". But I enjoy making the pot, and it's worth more to me than some factory item.
Sora will allow a new hobby, and lots of people will have fun with it. Pros will still need to do Pro things. Not everything has to be viewed through the lens of money.
I think it's not. If musicians, and only musicians, wanted themselves behind instruments for the sake of it, there would be a market for autogenerated self-playing music machines for their former patrons, who wouldn't care. And that's not the case; the market for ambient sound machines is small. It takes equal or greater insanity to have one at home than, say, a military armored car in the garage.
On the other hand, you've probably heard of the iPod, which I think I could describe as a device dedicated to giving a false sense of an ever-present musician, so to speak.
So, "they" in "they still want a person behind the piano" is not just limited to hobbyists and enthusiasts. People want people behind an instrument, for some reason. People pay for others' suffering, not for a thing's peculiarity.
I don't think this is entirely accurate. There are entire genres of music where the audience does not want a person behind the piano/guitar/drums. Plenty of electronic artists have tried the live band gimmick and while it goes down well with a certain segment of the audience, it turns off another segment that doesn't want to hear "humanized" cover versions of the material. But the point is that both of those audiences exist, and they both have lots of opportunity to hear the music they want to hear. The same will be true of visual art created by computers. Some people will prefer a stronger machine element, other people will prefer a stronger human element, and there is room for us all.
> I don't think this is entirely accurate. There are entire genres of music where the audience does not want a person behind the piano/guitar/drums.
Hilariously, nearly every electronic artist I can think of stands in front of a crowd and plays "live" by twisting dials etc., so I think it's fairly accurate.
Carl Cox, Tycho, Aphex Twin, Chemical Brothers, Underworld, to name a few.
DJ performances far outnumber "live" performances in the electronic scene. Perhaps you can cherry-pick certain DJs and make a point that they are creating a new musical composition by live-remixing the tracks they play, but even then a significant number of clubbers don't care, they just want to dance to the music. There are venues where a bunch of the audience can't even see the DJ and they still dance because they are enjoying the music on its own merits.
I stand by my original point. There are plenty of people who really do not care if there is a human somewhere "performing" the music or not. And that's totally fine.
Your reasoning is circular. Humans who go to performances of other humans playing instruments enjoy seeing other humans playing instruments. That should not be surprising. The question is whether humans as a whole intrinsically prefer seeing other humans playing instruments over hearing a "perfect" machine reproduction. And the answer to that question is no. There are plenty of humans who really do prefer the machine reproduction.
If you're still talking about whether people want to hear live covers, or recordings, I think it's an apples to oranges comparison therefore I don't see the point in it.
Mainly to pick songs that fit the mood of the audience. At the moment, humans seem to do a better job "reading" the emotions of other humans in this kind of group setting than computers do, and people are willing to pay for experts who have that skill.
An ML model could probably do a good job at selecting tunes of a particular genre that fit into a pre-defined "journey" that the promoter is trying to construct, so I could see a role for "AI DJs" in the future, especially for low budget parties during unpopular timeslots like first day of a festival while people are still arriving and the crew is still setting up. Some of that is already done by just chucking a smart playlist on shuffle. But then you also have up-and-comer or hobbyist DJs who will play for free in those slots, so maybe there's not really a need for a smarter computer to take over the job.
This whole thread started from the question of why a human should do something when a machine can do it better. And the answer is simple: because humans like to do stuff. It is not because humans doing stuff adds some kind of hand-wavey X factor that other humans intrinsically prefer.
Just to be clear, I was talking about the original sound produced by a person (vs. a machine). Of course it was recorded and played back a _lot_ more than folks listening live.
But I take it back; maybe I'm not so familiar with world music. I was talking more about Indian music. While the music is recorded and mixed across several tracks electronically, I think most of it is originally played (or sung) by a person.
In the US at least, there's the occasional acoustic song that becomes a hit, but rock music is obviously on its way to slowly reaching jazz status. It and country are really the last genres where traditional instruments are common during live performances.
Pop, Hip Hop, and EDM are basically all put together to be nearly computer-perfect.
All the great producers can play instruments, and that's often times the best way to get a section out initially. But what you hear on Spotify is more and more meticulously put together note by note on a computer after the fact.
Live instruments on stage are now often for spectacle or, worse, a gimmick, and it's not the song people came to love. I think the future will have people like Lionclad[1] in it pushing what it means to perform live, but I expect them to become fewer and fewer as music just gets more complex to produce overall.
Thankfully, art is not about the least common denominator and I'm confident that there will continue to be music played live as long as humanity exists.
Music has a lot of people who believe that not only is their favorite genre the best but that they must tear down people who don't appreciate it.
You aren't better because you prefer live music, you just have a preference. Music wasn't better some arbitrary number of years ago, you just have a preference.
Nobody said one form is objectively better, just that there is a form that is becoming more popular.
But to state my opinion, I can't imagine anything more boring than thinking the best music, performance, TV, or media in general was created in the past.
It's not that I think my tastes in music are objectively better, it's that I strongly feel that music is a very personal matter for many people and there will be enough people who will seek out different forms of music than what is "popular". Rock, jazz, even classical music, are still alive and well.
> But to state my opinion, I can't imagine something more boring than thinking the best of music, performance, TV, or media in general was done best and created in the past.
And to state my opinion, art isn't about "the best" or any sort of progress, it's about the way we humans experience the world, something I consider to be a timeless preoccupation, which is why a song from 2024 can be equally touching as a ballad from the 14th century.
When I was studying music technology and using state of the art software synthesizers and sequencers, I got more and more into playing my acoustic guitar. There's a deep and direct connection and a pleasure that comes with it that computers (and now/eventually AI) will never be able to match.
(That being said, a realtime AI-based bandmate could be interesting...)
My son is an interesting example of this. I can play all the best guitar music on earth via the speakers, but when I physically get the guitar out and strum it, he sits up like he has just seen god, and is in total awe of the sound of it, the feel of the guitar, and the sight of it. It's like nothing else can compare. Even if he is hysterically crying, the physical instrument and the sound of it just make him calm right down.
I wonder if something is lost in the recording process that just cannot be replicated? A live instrument is something that you can actually feel the sound of IMO, I've never felt the same with recorded music even though I of course enjoy it.
I wonder if when we get older we just get kind of "bored" (sadly) and it doesn't mean as much to us as it probably should.
I'm speculating that one would have more mirror neuron activation watching a person perform live, compared to listening to a recording or watching a video. Thus the missing component that makes live performance special.
For me the guitar is like the keyboard I am writing on right now. It will never be replaced, because that is how I input music into the world. I could not program that; I was doing tracker music as a teenager, and all of the songs sounded weird because the timing and so on was not right. And now when I transcribe demos and put them into a DAW, the timing seems to be milliseconds off in ways that are not quite right. I still play the piano parts live, because we don't have the technology right now to make it sound better than a human, and even if we had, it would not be my music, but what an AI performed.
I really briefly looked at AI in music; lots of wild things are being made. It is hard to explain: one tool was generating a bunch of sliders after mimicking a sample from sine waves (quite accurately).
> Musicians could automate all the instruments with incredible accuracy since a long time. But they never do that. For some reason, they still want a person behind the piano / guitar / drums.
This actually happened on a recent hit, too -- Dua Lipa's Break My Heart. They originally had a drum machine, but then brought in Chad Smith to actually play the drums for it.
Edit: I'm not claiming this was new or unusual, just providing a recent example.
This goes way back. Nine Inch Nails was a synth-first band, with the music written by Trent in a studio on a DAW. That worked, but what really made the band was live shows, so they found ways, even using 2 drummers, to translate the synths and machines into human-played instruments.
Also, way before that, back in the early '80s, Depeche Mode displayed the recorded drum reel onstage so everyone knew what it was, but when they got big enough they also transitioned into an epic live show with guitars and live drums, as well as synth-hooked drum devices they could bang on in addition to keyboards.
We are human. We want humans. Same reason I want a hipster barista to pour my coffee when a machine could do it just as well.
> Same reason I want a hipster barista to pour my coffee when a machine could do it just as well.
I've wondered about this for a long time too: why on earth is anyone still able to be a barista? It turns out people actually like the community around cafes, and often that means interacting with the staff on a personal level.
Some of my best friends have been baristas I've gone to over several years.
It's more than that: doing it well is still beyond sophisticated automation. There are many variables that need to be constantly adjusted for. Humans are still much better at it than machines, regardless of the social element.
If true, probably not for long. Still, my point is that people are the customers. It's more fun to think about what won't change. I think we will still have baristas.
A good live performance is intentionally not 100% the same as in the studio; there can and should be variations: a refrain repeated another time, some improvisation here, playing with the tempo there. It takes a good band, who know each other intimately, to make that work, though. (A good DJ can also do this with electronic music.)
A recorded studio version, I can also listen to at home. But a full band performing in this very moment is a different experience to me.
There are subtle and deliberate deviations in timing and elements like vibrato when a human plays the same song on an instrument twice, which is partly why (aside from recording tech) people prefer live or human musicians.
Think about how precise and exacting a computer can be. It can play the same notes in a MIDI editor with exact timing, always playing note B after 18 seconds of playing note A. Human musicians can't always be that precise in timing, but we seem to prefer how human musicians sound with all of the variations they make. We seem to dislike the precise mechanical repetition of music playback on a computer comparatively.
I think the same point generalises into a general dislike on the part of humans of sensory repetition. We want variety. (Compare the first and second grass pictures at [0] and you will probably find that the second which has more "dirt" and variety looks better.) "Semantic satiation" seems to be a specific case of the same tendency.
I'm not saying that's something a computer can't achieve eventually but it's something that will need to be done before machines can replace musicians.
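For what it's worth, the crude first step is already common: take mechanically exact note data and add small random deviations in timing and velocity, the "humanize" feature found in many DAWs. The function name and jitter magnitudes below are illustrative assumptions, not any particular DAW's implementation:

```python
import random

def humanize(onsets_ms, velocities, timing_sd=8.0, vel_sd=6.0, seed=42):
    """Add human-like jitter to exact MIDI note onsets and velocities.

    timing_sd: std-dev of timing jitter in milliseconds (assumed value).
    vel_sd: std-dev of velocity jitter (MIDI velocity range is 1-127).
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    jittered_onsets = [t + rng.gauss(0, timing_sd) for t in onsets_ms]
    jittered_vels = [max(1, min(127, round(v + rng.gauss(0, vel_sd))))
                     for v in velocities]
    return jittered_onsets, jittered_vels

# A mechanically exact quarter-note pulse at 120 BPM (500 ms apart)...
onsets = [0, 500, 1000, 1500]
vels = [100, 100, 100, 100]
h_onsets, h_vels = humanize(onsets, vels)
# ...comes back slightly early/late and louder/softer on each hit.
```

This captures the *random* part of human variation, but as the surrounding comments note, skilled players' deviations are deliberate and musical, not just noise, which is why naive humanization still sounds different from a real performance.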
Yes. I tried that with some software-based synthesisers (like the SWAM violin and Reason's Friktion) which are designed for human-playing (humans controlling the VST through a device that emits MIDI CC control messages) but my understanding is that the modulation that skilled human players perform with tends to be better/more desirable than what software modulators can currently achieve.
The real dilemma is with composition/song-writing.
Ability to create live experiences can still be a motivating factor for musicians (aside from the love of learning). Yet, when AI does the song-writing far more effectively, then will the musician ignore this?
It's like Brave New World. Musicians who don't use these AI tools for song-writing will be like a tribe outside modern world. That's a tough future to prepare for. We won't know whether a song was actually the experience and emotions of a person or not.
Even if we assume that people want fully automated music, the process of learning to play educates the musician. Similarly, you'd still need a director/auteur, editors, writers and other roles I have no appreciation or knowledge of to create a film from AI models.
Steam shovels and modern excavators didn't remove our need for shovels or more importantly, the know-how to properly apply these tools. Naturally, most people use a shovel before they operate an excavator.
It's interesting though; the question really becomes: if 10 people used to shovel manually to feed their families, and now it takes 1 person and an excavator, what in good faith do you tell those other 9... "don't worry, you can always be a hobby shovelist"?
They can apply their labor wherever it is valued. Perhaps they will become more productive excavator operators. By creating value in a specialized field their income would increase. Technology does not decrease the need for labor. Rather it increases the productivity of the laborer.
Human ingenuity always finds a need for value creation. Greater abundance creates new opportunities.
Take the inverse position. Should we go back to reading by candlelight to increase employment in candle making?
No, electric lighting allowed people to become productive during night hours. A market was created for electricity producers, which allowed additional products which consume electricity to be marketed. Technological increases in productivity cascade into all areas of life, increasing our living standards.
A more interesting, if not controversial line of inquiry might start with: If technology is constantly advancing human productivity, why do modern economies consistently experience price inflation?
You miss the important point, which is the productivity gain means the average living standard of society as a whole increases. A chunk of what is now regarded as 'toil' work disappears, and the time freed up is able to be deployed more productively in other areas.
Of course, this change is dislocating for the particular people whose toil disappeared. They need support to retrain to new occupations.
The alternative is to cling to a past where everyone - on average - is poorer, less healthy, and works in more dangerous jobs.
The ‘augmented singer’ is very popular, though. https://en.wikipedia.org/wiki/Auto-Tune: “Auto-Tune has been widely criticized as indicative of an inability to sing on key.”
Musicians have been able to automate all the instruments with incredible accuracy for a long time. But they never do that. For some reason, they still want a person behind the piano / guitar / drums.
You've never been to a rave, huh? For that matter, there's a lot of pop artists that use sequencers and dispense with the traditional band on stage.
I can see this being used extensively for short commercials, as the uncanny aspect of a lot of the figures will help to capture people's attention. I don't necessarily believe it will be less expensive than hiring a director and film crew, however.
> I love these hot takes based on profoundly incredible tech that literally just launched. Acting like 2030 isn't around the corner.
It seems bizarre to think the gee whiz factor in a new commercial creative product makes critiquing its output out-of-bounds. This isn't a university research team: they're charging money for this. Most people have to determine if something is useful before they pay for it.
Yeah. I think people nowadays are in a kind of AI-euphoria, and they take every advancement in AI for more than it really is. The realization of the limitations will set in once people have been working long enough with the stuff. The capabilities of the newfangled AIs are impressive. But even more impressive are their mimicry capabilities.
Not dismissing, but being realistic. From what I've observed, AI tools usually amaze most people initially by showing capabilities never seen before. Then people realize their limitations, i.e. what capabilities are still missing. And they're like: "oh, this is no genie in a bottle capable of satisfying every wish. We'll still have to work to obtain our vision..." So the magic fades away, and the world returns to normal, but now with an additional tool that's very useful in some situations :)
I agree. Skepticism usually serves people well as a lot of new tech turns out to be just hype. Except when it is not and I think this is one of those few cases.
AI won't make artistic decisions that wow an audience.
AI won't teach you something about the human condition.
AI will only enable higher quarterly profits from layoffs until GPU costs catch up.
What the fuck is the point of AI automating away jobs when the only people who benefit are the already enormously wealthy? AI won't be providing time to relax for the average worker, it will induce starvation. Anything to prevent will be stopped via lobbying to ensure taxes don't rise.
Seriously, what is the point? What is the point? What the fuck is there to live for when art and humanities is undermined by the MBA class and all you fucking have is 3 gig jobs to prevent starvation?
I believe AI and full automation are critical for a Star Trek society.
We are not very good at providing anything reasonable today because capitalism is still way too strong and manual labor still way too necessary.
Nonetheless, look at my country, Germany: we are a social state. Plenty of people get 'free' money and it works.
The other good thing: there are plenty of people who know what good is (good art etc.) but are not able to draw. They can also express themselves. AI as a tool.
If we as a society discover that there will be no really new music or art happening, I don't know what we will do.
Plenty of people are well entertained with crap anyway.
I would not. Five (six, seven?) years ago, we had style transfer with video and everyone was also super euphoric about that. If I compare to those videos, there is clearly progress but it is not like we started from zero 2 years ago.
It means "extremely happy", but it's usually used to refer to a particular moment in time (rather than a general sentiment), and so the word sounds a bit out of place here, to me.
"The camera follows behind a white vintage SUV with a black roof": The letters clearly wobble inconsistently.
"A drone camera circles around a beautiful historic church built on a rocky outcropping along the Amalfi Coast": The woman in the white dress in the bottom left suddenly splits into multiple people like she was a single cell microbe multiplying.
On the other hand, this is the first convincing use of a “diffusion transformer” [1]. My understanding is that videos and images are tokenized into patches, through a process that compresses the video/images into abstracted concepts in latent space. Those patches (image/video concepts in latent space) can then be used with transformers (because patches are the tokens). The point is that there is plenty of room for optimization following the first demonstration of a new architecture.
Edit: sorry, it’s not the first diffusion transformer. That would be [2]
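The patch idea mentioned above can be sketched in a few lines. This is a rough illustration in plain NumPy with made-up patch sizes, not OpenAI's actual pipeline: a latent video tensor is cut into spacetime blocks, and each block is flattened into one token vector that a transformer can attend over.

```python
import numpy as np

def patchify(video, pt=2, ph=4, pw=4):
    """Split a latent video (T, H, W, C) into spacetime patches,
    each flattened into a single token vector.
    pt/ph/pw are hypothetical patch sizes in time, height, width."""
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    # Carve the tensor into a grid of (pt x ph x pw) blocks...
    v = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # ...bring the block-grid axes to the front, block contents to the back...
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)
    # ...and flatten each block into one token.
    return v.reshape(-1, pt * ph * pw * C)

# Example: 16 latent frames of 32x32 with 4 channels
video = np.zeros((16, 32, 32, 4))
tokens = patchify(video)
print(tokens.shape)  # (512, 128): 8*8*8 patches, each 2*4*4*4 values
```

The payoff is exactly what the comment describes: once video is a sequence of tokens, the standard transformer machinery (and its scaling behavior) applies unchanged.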
Really makes me think of The Matrix scene with the woman in the red dress.
Can't tell if they did this on purpose to freak us all out?
Are we all just prompts?
If you watch the background, you'll see one guy has his pants change color. And also, some of the guys are absolute giants compared to the people around them.
Yep. If you look at the detail you can find obvious things wrong, and these are limited to 60s in length with zero audio, so I doubt full motion pictures are going to be replaced anytime soon. B-roll background video or AI-generated backgrounds for a green screen, sure.
I would expect any subscription to use this service when it comes out to be very expensive. At some point I have to imagine the GPU/CPU horsepower needed will outweigh the monetary costs that could be recovered. Storage costs too. It's much easier to tinker with generating text or static images in that regard.
Of note: NVDA's quarterly results come out next week.
This is weird to me considering how much better this is than the SOTA still images 2 years ago. Even though there's weirdo artefacts in several of their example videos (indeed including migrating fingers), that stuff will be super easy to clean up, just as it is now for stills. And it's not going to stop improving.
When others create text-to-video systems (e.g. Lumiere from Google), they publish the research (e.g. https://arxiv.org/pdf/2401.12945.pdf). OpenAI is all about commercialization. I don't like their attitude.
Google is hardly a good actor here. They just announced Gemini 1.5 along with a "technical report" [1] whose entire description of the model architecture is: "Gemini 1.5 Pro is a sparse mixture-of-expert (MoE) Transformer-based model". Followed by a list of papers that it "builds on", followed by a definition of MoE. I suppose that's more than OpenAI gave in their GPT-4 technical report. But not by much!
The report and the previous one for 1.0 definitely contain much more information than the GPT-4 whitepaper. And Google regularly publishes technical details on other models, like Lumiere, things that OpenAI stopped doing after their InstructGPT paper.
Maybe because GPT-3.5 is closer to what Gemini 1.0 was... GPT-4 and Gemini 1.5 are similarly sparse in their "how we did it and what we used" when it comes to papers.
Not to be overly cute, but if the cutting edge research you do is maybe changing the world fundamentally, forever, guarding that tech should be really, really, really far up your list of priorities and everyone else should be really happy about your priorities.
And that should probably take precedence over the semantics of your moniker, every single time (even if hn continues to be super sour about it)
Nukes aren’t even close to being commodities, cannot be targeted at a class of people (or a single person), and have a minutely small number of users. (Don’t argue semantics with “class of people” when you know what I mean, btw)
On the other hand, tech like this can easily become as common as photoshop, can cause harm to a class of people, and be deployed on a whim by an untrained army of malevolent individuals or groups.
So if someone discovered a weapon of mass destruction (say some kind of supervirus) that could be produced and bought cheaply and could be programmed to only kill a certain class of people, then you'd want the recipe to be freely available?
This poses no direct threat to human life though. (Unlike, say, guns - which are totally fine for everyone in the US!)
The direct threat to society is actually this kind of secrecy.
If ordinary people don't have access to the technology they don't really know what it can do, so they can't develop a good sense of what could now be fake that only a couple of years ago must have been real.
Imagine if image editing technology (Photoshop etc) had been restricted to nation states and large powerful corporations. The general public would be so easy to fool with mere photographs - and of course more openly nefarious groups would have found ways to use it anyway. Instead everybody now knows how easily we can edit an image and if we see a shot of Mr Trump apparently sharing a loving embrace with Mr Putin we can make the correct judgement regarding a probable origin.
The bottleneck for bioterrorism isn't AI telling you how to do something, it's producing the final result. You wanna curtail bioweapons, monitor the BSL labs, biowarfare labs, bioreactors, and organic 3D printers. ChatGPT telling me how to shoot someone isn't gonna help me if I can't get a gun.
This isn't related to my comment. I wasn't asking what if an AI invents a supervirus. I was asking what if someone invents a supervirus. AI isn't involved in this hypothetical in any way.
I was replying to a comment saying that nukes aren't commodities and can't target specific classes of people, and I don't understand why those properties in particular mean access to nukes should be kept secret and controlled.
I understand your perspective regarding the potential risks associated with freely available research, particularly when it comes to illegal weapons and dangerous viruses. However, it's worth considering that by making research available to the world, we enable a collaborative effort in finding solutions and antidotes to such threats. In the case of Covid, the open sharing of information led to the development of vaccines in record time.
It's important to weigh the benefits of diversity and open competition against the risks of bad actors misusing the tools. Ultimately, finding a balance between accessibility and responsible use is key.
What guarantee do we have that OpenAI won't become an evil actor like Skynet?
I'm not advocating for or against secrecy. I'm just not understanding the parent comment I replied to. They said nukes are different than AI because they aren't commodities and can't target specific classes of people, and presumably that's why nukes should be kept secret and AI should be open. Why? That makes no sense to me. If nukes had those qualities, I'd definitely want them kept secret and controlled.
An AI video generator can't kill billions of people, for one. I'd prefer it if access wasn't limited to a single corporation that's accountable to no one and is incentivized to use it for their benefits only.
What do you mean? Are you being dramatic, or do you actually believe that the US government won't, or can't, shut OpenAI down if they feel it's required to guarantee state order?
For the US government to step in, they'd have to do something extremely dangerous (and refuse to share with the government). If we're talking about video generation, the benefits they have are financial, and the lack of accountability is in that they can do things no one else can. I'm not saying they'll be allowed to break the law, there's plenty of space between the two extremes. Though, given how things were going, I can also see OpenAI teaming up with the US government and receiving exclusive privileges to run certain technologies for the sake of "safety". It's what Altman has already been pushing for.
> The right sequence of videos sent to the right people could definitely set something catastrophic off.
...after amazing public world wide demos that show how real the AI generated videos can be? How long has Hollywood had similar "fictional videos" powers?
I think that's great. Billy will feed his flat earther friends for a few weeks or months and pretty soon the entire world will wise up and be highly skeptical of any new such videos. The more of this that gets out there, the quicker people will learn. If it's 1 or 2 videos to spin an election... People might not get wise to it.
Video can convince people to kill each other now because it is assumed to show real things. Show people a Jew killing a Palestinian, and that will rile up the Muslims, or vice versa.
When a significant fraction of video is generated content spat out by a bored teenager on 4chan, then people will stop trusting it, and hence it will no longer have the power to convince people to kill.
You don't need to generate fake videos for that example. The State of Israel has been killing Palestinians en masse for a long time and intensified the effort over the last 4 months. The death toll is 29,000+ and counting. Two thirds are children and women.
The Israeli media machinery parades photographs of damage to houses that could only have been caused by heavy artillery or tank shells, blaming it on rebels carrying infantry rifles.
But I agree: as if the current tools were not enough to sway people, they will have even more means to sway public opinion.
Hamas has similarly been shooting rockets into Israel for a long time. Eventually people get tired and stop caring about long-lasting conflicts, just like we don't care about concentration camps in North Korea and China, or various deadly civil wars in Sub-Saharan Africa, some of which have killed way more civilians than all wars in Palestinian history. One can already see support towards Ukraine fading as well, even though there Western countries would have a real geopolitical interest.
> Especially considering that the biggest killer app for AI could very well be smart weapons like we've never seen before.
A homing missile that chases you across continents and shows you disturbing deepfakes of yourself until you lose your mind and ask it to kill you. At that point it switches to encourage mode, rebuilds your ego, and becomes your lifelong friend.
I don't think it's really that hard to design a nuclear weapon, honestly. Just because you have the plans for one doesn't mean you have the uranium/plutonium to make one. Weapons-grade uranium doesn't fall into your lap.
The ideas of critical mass, prompt fission, and uranium purification, along with the design of the simplest nuclear weapon possible has been out in the public domain for a long time.
While it's probably too idealistic to be possible, I'd rather try and focus on getting people/society/the world to a state where it doesn't matter if everyone has access (i.e. getting to a place where it doesn't matter if everyone has access to nuclear weapons, guns, chemical weapons, etc., because no-one would have the slightest desire to use them).
As things are at the moment, while supression of a technology has benefits, it seems like a risky long-term solution. All it takes is for a single world-altering technology to slip through the cracks, and a bad actor could then forever change the world with it.
As long as destroying things remains at least two magnitudes easier than building things and defending against attacks, this take (as a blanket statement) will continue to be indefensible and irresponsible.
ML models of this complexity are just as accessible as nuclear weapons. How many nations possess a GPT-4? The only reason nuclear weapons are not more common is because their proliferation is strictly controlled by conventions and covert action.
The basic designs for workable (although inefficient) nuclear weapons have been published in open sources for decades. The hard part is obtaining enough uranium and then refining it.
If you have two pieces of plutonium and put them too close together you have accidentally created a nuclear weapon… so yeah nukes are open source, plutonium breeding isn’t.
I love it when people make this “nuke” argument because it tells you a lot more about them than it does about anything else. There are so many low information people out there, it’s a bit sad the state of education even in developed countries. There’s people trotting around the word “chemical” at things that are scary without understanding what exactly the word means, how it differs from the word mixture or anything like that. I don’t expect most people to understand the difference between a proton and a quark but at least a general understanding of physics and chemistry would save a lot of people from falling into the “world is magic and information is hidden away inside geniuses” mentality.
New technology will always produce new giants to see farther from, but open source really is a nice ladder up to the shoulders of those giants. So many benefits come from sharing the tech.
This is meaningless until you've defined "world changing". It's possible that open sourcing AIs will be world-changing in a good way and developing closed source AIs will be world-changing in a bad way.
If I engineered the tech I would be much more fearful of the possibility of malice in the future leadership of the organization I'm under if they continue to keep it closed, than I would be fearful of the whole world getting the capability if they decide to open source.
I feel that, like with Yellow Journalism of the 1920s, much of the misinformation problem with generative AI will only be mitigated during widespread proliferation, wherein people become immune to new tactics and gain a new skepticism of the media. I've always thought it strange when news outlets discuss new deepfakes but refuse to show it, even with a watermark indicating it is fake. Misinformation research shows that people become more skeptical once they learn about the technological measures (e.g. buying karma-farmed Reddit accounts, or in the 1920s, taking advantage of dramatically lower newspaper printing costs to print sensationalism) through which misinformation is manufactured.
It will be kind of like most of history where the only trustworthy method of communication is with face to face communication or with a letter or book (perhaps cryptographically) verified from a person you personally know or trust. Sounds good to me
How convenient for all the OpenAI employees trying to make millions of dollars by commercializing their technology. Surely this technology won’t be well-understood and easily replicable in a few years as FOSS
From what I remember reading, Open was never supposed to be like open source with the internals freely available, but Open as in available for the public to use, as opposed to a technology only for the company to wield and create content with.
When the time for making money comes, if you don’t think OpenAI will sell every drop of information they have on you, then you are incredibly naive. Why would they leave money on the table when everyone else has been doing it for forever without any adverse effects?
They are currently hiring people with Adtech experience.
The simplest version would be an ad-supported ChatGPT experience. Anyone thinking that an internet consumer company with 100m weekly active users (I'm citing from their job ad) is not going to sell ads is lacking imagination.
If Google Workspace was selling my or any customers information, at all or "forever", it would not be called Google Workspace, it would be called Google We-died-in-the-most-expensive-lawsuit-of-all-time.
There's a difference. Open AI essentially has 2 products. The chat bot $20 a month thing for Joe shmoe which they admit to training on your prompts, and the API for businesses. Workspace is like the latter. The former is closer to Google search.
Sure, but there is no ambiguity about that, is there? You know that, because they tell you (and, sure, maybe they only tell you, because they have to, by law – but they do and you know)
How we get from there to "just assume every company in the world will sell your data in wildly and obviously illegal ways", I don't know.
You'd be too late. You're just waiting for someone to imbue a model with agency. We have agency due to evolution. Robots need it programmed into them, and honestly, that is easy to do compared with instilling reasoning. Primitive animals have agency. No animal can reason on the level of GPT. That will get us to HAL 9000. If you stick it in a robot, you have the Terminator.
AI doesn't exist. Neither in practice nor theoretically. Artificial intelligence is an oxymoron. Intelligence is a complex system. Artificial systems are logic systems. You live in a complex universe that you cannot perceive, i.e. we perceive it as noise/randomness only. All you can see are the logical systems expressed at the surface (Mandelbrot set) of the noise. Everything you see and know is strictly logical; all known laws of the universe are derived from those logical systems. Hence, we can only build logical systems. Not complex systems. There is a limit to what we can build here on the surface (Church-Turing). We never have and never will build a complex system.
> Motion-capture works fine because that's real motion
Except in games where they mo-cap at a frame rate less than what it will be rendered at and just interpolate between mo-cap samples, which makes snappy movements turn into smooth movements and motions end up in the uncanny valley.
It's especially noticeable when a character is talking and makes a "P" sound. In a "P", your lips basically "pop" open. But if the motion is smoothed out, it gives the lips the look of making an "mm" sound. The lips of someone saying "post" looks like "most".
At 30 fps, it's unnoticeable. At 144 fps, it's jarring once you see it and can't unsee it.
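The smoothing effect described above is easy to see in a toy sketch. This is an illustrative example only (made-up "mouth openness" values and simple linear interpolation, not any particular engine's code): a mouth captured at 30 Hz that "pops" open in one sample becomes a gradual ramp when rendered at 90 Hz by interpolating between samples.

```python
def lerp(a, b, t):
    """Linear interpolation between two mocap samples."""
    return a + (b - a) * t

# Mouth "openness" captured at 30 Hz: closed, closed, then a sudden "P" pop
mocap_30hz = [0.0, 0.0, 1.0]

# Rendered at 90 Hz: two interpolated frames inserted between each sample
rendered_90hz = []
for a, b in zip(mocap_30hz, mocap_30hz[1:]):
    for step in range(3):
        rendered_90hz.append(lerp(a, b, step / 3))
rendered_90hz.append(mocap_30hz[-1])

# The instantaneous 0 -> 1 pop is now a gradual 0 -> 1/3 -> 2/3 -> 1 ramp,
# which is exactly the "P" turning into an "mm".
print(rendered_90hz)
```

At 30 fps the ramp spans a single displayed frame, so it reads as instantaneous; at 90+ fps the intermediate frames are visible and the pop looks smeared.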
You might just be subject to confirmation bias here. Perhaps there were scenes and entities you didn't realize were CGI due to high quality animation, and thus didn't account for them in your assessment.
Regarding CGI, I think it has become so good that you don't know it's CGI. Look at the dog in Guardians of the Galaxy 3. There's a whole series on YouTube called "no CGI is really just invisible CGI" that I recommend watching.
And as with CGI, models like Sora will get better until you can't tell them apart from reality. It's not there yet, but it's an immense, astonishing breakthrough.
Serious: Can one just pipe an SRT (subtitle file) and then tell it to compare its version to the mp4 and then be able to command it to zoom, enhance, edit, and basically use it to remould content. I think this sounds great!
It's possible that through sheer volume of training, the neural network essentially has a 3D engine going on, or at least picked up enough of the rules of light and shape and physics to look the same as unreal or unity
I'm not sure I feel the same way about the mammoths. As someone who grew up in a snowy area, the billowing snow makes no sense; if the snow were powder, maybe, but that's not what's depicted on the ground.
Main Pixar characters are all computer animated by humans. Physics effects like water, hair, clothing, smoke and background crowds use computer physics simulation but there are handles allowing an animator to direct the motion as per the directors wishes.
> I've quite simply never seen convincing computer-generated motion before
I’m fairly sure you have seen it many times, it was just so convincing that you didn’t realize it was CGI. It’s a fundamentally biased way to sample it, as you won’t see examples of well executed stuff.
Nah, this still has the problem with connecting surfaces that never seems to look right in any CGI. It's actually interesting that it doesn't look right here either, considering they are completely different techniques.
Just set up a family password last week... Now it seems every member of the family will have to become their own certificate authority and carry an MFA device.
Don't think of them as "computer-generated" any more than your phone's heavily processed pictures are "computer-generated", or JWST's false color, IR-to-visible pictures are "computer-generated".
I think the implications go much further than just the image/video considerations.
This model shows a very good (albeit not perfect) understanding of the physics of objects and relationships between them. The announcement mentions this several times.
The OpenAI blog post lists "Archeologists discover a generic plastic chair in the desert, excavating and dusting it with great care." as one of the "failed" cases.
But this (and "Reflections in the window of a train traveling through the Tokyo suburbs.") seem to me to be 2 of the most important examples.
- In the Tokyo one, the model is smart enough to figure out that on a train, the reflection would be of a passenger, and the passenger has Asian traits since this is Tokyo.
- In the chair one, OpenAI says the model failed to model the physics of the object (which hints that it did try to, which is not how the early diffusion models worked ; they just tried to generate "plausible" images). And we can see one of the archeologists basically chasing the chair down to grab it, which does correctly model the interaction with a floating object.
I think we can't overstate how crucial that is to the building of a general model that has a strong model of the world. Not just a "theory of mind", but a literal understanding of "what will happen next", independently of "what would a human say happens next" (which is what the usual text-based models seem to do).
This is going to be much more important, IMO, than the video aspect.
Wouldn't having a good understanding of physics mean you know that a woman doesn't slide down the road when she walks? Wouldn't it know that a woolly mammoth doesn't emit profuse amounts of steam when walking on frozen snow? Wouldn't the model know that legs are solid objects through which other objects cannot pass?
Maybe I'm missing the big picture here, but the above and all the weird spatial errors, like the miniaturization of people, make me think you're wrong.
Clearly the model is an achievement and doing something interesting to produce these videos, and they are pretty cool, but understanding physics seems like quite a stretch?
I also don't really get the excitement about the girl on the train in Tokyo:
> In the Tokyo one, the model is smart enough to figure out that on a train, the reflection would be of a passenger, and the passenger has Asian traits since this is Tokyo
I don't know a lot about how this model works personally, but I'm guessing that in the training data, the vast majority of videos of people riding trains in Tokyo featured Asian people. Assuming this model works on statistics like all of the other models I've seen recently from OpenAI, why is it interesting that the girl in the reflection was Asian? Did you not expect that?
> Wouldn't having a good understanding of physics mean you know that a women doesn't slide down the road when she walks? Wouldn't it know that a woolly mammoth doesn't emit profuse amounts steam when walking on frozen snow? Wouldn't the model know that legs are solid objects in which other object cannot pass through?
This just hit me, but humans do not have a good understanding of physics; or maybe most humans have no understanding of physics. We just observe and recognize whether something is familiar or not.
AI will need to be, that being the case, way more powerful than a human mind. Maybe orders of magnitude more "neural networks" than a human brain has.
Well we feel the world, it's pretty wild when you think about how much data the body must be receiving and processing constantly.
I was watching my child in the bath the other day. They were having the most incredible time splashing, feeling the water, throwing balls up and down, and yes, they have absolutely no knowledge of "physics", yet they navigate and interact with it as if it was the best thing they've ever done. Not even 12 months old yet.
It was all just happening on feel and yeah, I doubt they could describe how to generate a movie.
Operating a human body takes an incredible intuition for physics; just because you can't write or explain the math doesn't mean your mind doesn't understand it. Further to that, we are able to apply our patterns of physics to novel external situations on the fly, sometimes within milliseconds of encountering them.
You only need to see a ball bounce once and your brain has done a rough approximation of its properties, and will calculate both where it's going and how to get your gangly menagerie of pivots, levers, meat servos and sockets to intercept it at just the right time.
Think also about how well people can come to understand the physics of cars and bikes in motorsport and the like. The internal model of a cars suspension in operation is non-trivial but people can put it in their head.
Humans have an intuitive understanding of physics, not a mathy science one.
I know I can't put my hand through solid objects. I know that if I drop my laptop from chest height it will likely break it, the display will crack or shatter, the case will get a dent. If it hits my foot it will hurt. Depending on the angle it may break a bone. It may even draw blood. All of that is from my intuitive knowledge of physics. No book smarts needed.
I agree; to me the clearest example is how the rocks in the sea vanish/transform after the wave: the generated frames are hyperreal for sure, but the represented space is about as consistent as a dream.
> very good... understanding of the physics of objects and relationships between them
I am always torn here. A real physics engine has a better "understanding" but I suspect that word applies to neither Sora nor a physics engine:
https://www.wikipedia.org/wiki/Chinese_room
An understanding of physics would entail asking this generative network to invert gravity, change the density or energy output of something, or atypically reduce a coefficient of friction partway through a video. Perhaps Sora can handle these, but I suspect it is mimicking the usual world rather than understanding physics in any strong sense.
None of which is to say their accomplishment isn't impressive. Only that "understand" merits particularly careful use these days.
Question is: how much do you need to understand something in order to mimic it?
The Chinese Room, however, seems to point to some sort of prewritten if-else type of algorithm: someone following scripted procedures might not understand the content. But that simplification obviously doesn't apply to LLMs or this video generation, since those scripted procedures would all have to be pre-written by hand.
The Chinese Room seems to refer more to cases like "if someone tells me 'xyz', then respond with 'abc'". Of course then you don't understand what xyz or abc mean, but it's not referring to neural networks training on a ton of material to build a model representation of things.
Perhaps building the representation is building understanding. But humans did that for Sora and for all the other architectures too (if you'll allow a little meta-building).
But evaluation alone is not understanding. Evaluation is merely following a rote sequence of operations, just like the physics engine or the Chinese room.
People recognize this distinction all the time when kids memorize mathematical steps in elementary school but they do not yet know which specific steps to apply for a particular problem. This kid does not yet understand because this kid guesses. Sora just happens to guess with an incredibly complicated set of steps.
I think this is a good insight. But if the kid gets sufficiently good at guessing, does it matter anymore..?
I mean, at this point the question is so vague… maybe it’s kinda silly. But I do think that there’s some point of “good-at-guessing” that makes an LLM just as valuable as humans for most things, honestly.
That matches how philosophers typically talk about the Chinese room. However the Chinese room is supposed to "behaves as if it understands Chinese" and can engage in a conversation (let us assume via text). To do this the room must "remember" previously mentioned facts, people, etc. Furthermore it must line up ambiguous references correctly (both in reading and writing).
As we now know from more than 60 years of good old-fashioned AI efforts, plus recent learning-based AI, this CAN be done using computers but CANNOT be done using just ordinary if-then-else rules, no matter how complicated. Searle wrote before we had any systems that could actually (behave as if they) understood language and could converse like humans, so he can be forgiven for failing to understand this.
Now that we do know how to build these systems, we can still imagine a Chinese room. The little guy in the room will still be "following pre-written scripted algorithmic procedures." He'll have archives of billions of weights for his "dictionary". He will have to translate each character he "reads" into one or more vectors of hundreds or thousands of numbers, perform billions of matrix multiplies on the results, and translate the output of the calculations -- more vectors -- into characters to reply. (We may come up with something better, but the brain can clearly do something very much like this.)
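A drastically shrunken sketch of what the man in the room would be computing, assuming a made-up four-letter alphabet and hand-picked weights (real models use billions of weights, but the rote mechanics are the same):

```python
# Toy "room": translate a character into a vector, do a matrix multiply
# by rote, translate the result back into a character. No understanding
# is required at any step. Alphabet and weights are invented.

ALPHABET = ["a", "b", "c", "d"]

# One weight row per input character; each row scores the 4 possible outputs.
WEIGHTS = [
    [0.1, 0.9, 0.0, 0.0],  # "a" -> mostly "b"
    [0.0, 0.1, 0.9, 0.0],  # "b" -> mostly "c"
    [0.0, 0.0, 0.1, 0.9],  # "c" -> mostly "d"
    [0.9, 0.0, 0.0, 0.1],  # "d" -> mostly "a"
]

def one_hot(ch):
    """Translate a character into a vector, as the man in the room would."""
    return [1.0 if c == ch else 0.0 for c in ALPHABET]

def matvec(vec, matrix):
    """One matrix multiply, performed mechanically."""
    return [sum(v * row[j] for v, row in zip(vec, matrix))
            for j in range(len(matrix[0]))]

def reply(ch):
    scores = matvec(one_hot(ch), WEIGHTS)
    return ALPHABET[scores.index(max(scores))]

print(reply("a"))  # -> "b"
```

Scale the alphabet to thousands of tokens and the single multiply to billions, and you get the hundreds-of-years-by-hand estimate above.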
Of course this will take the guy hundreds or thousands of years from "reading" some Chinese to "writing" a reply. Realistically if we use error correcting codes to handle his inevitable mistakes that will increase the time greatly.
Implication: Once we expand our image of the Chinese room enough to actually fulfill Searle's requirements, I can no longer imagine the actual system concretely, and I'm not convinced that the ROOM ITSELF "doesn't have a mind" that somehow emerges from the interaction of all these vectors and weights.
Too bad Searle is dead, I'd love to have his reply to this.
I found the one about the people in Lagos pretty funny. The camera does about a 360deg spin in total, in the beginning there are markets, then suddenly there are skyscrapers in the background. So there's only very limited object permanence.
> A beautiful homemade video showing the people of Lagos, Nigeria in the year 2056. Shot with a mobile phone camera.
The thing is -- over time I'm not sure people will care. People will adapt to these kinds of strange things and normalize them -- as long as they are compelling visually. The thing about that scene is it looks weird only if you think about it. Otherwise it seems like the sort of pan you would see in some 30 second commercial for coffee or something.
If anything it tells a story: going from market, to people talking as friends, to the giant world (of Lagos).
My Instagram feed is full of AI people. I can tell with pretty good accuracy whether an image is AI or real; the lighting, the framing, the scene itself, something is just off.
I think a similar thing will happen here, over the next few months we'll adapt to these videos and the problems will become very obvious.
When I first looked at the videos I was quite impressed, but when I looked again I saw a bunch of weird stuff going on. I think our brains are just wired to save energy, and accepting whatever we see in a video or an image as good enough is a pretty efficient, low-risk thing.
Agreed, at first glance of the woman walking I was so focused on how well they were animating that the surreal scene went unnoticed. Once I'd stopped noticing the surreal scene, I started picking up on weird motion in the walk too.
Where I think this will get used a lot is in advertising. Short videos, lots going on, see it once and it's gone, no time to inspect. Lady laughing with salad pans to a beach scene, here's a product, buy and be as happy as salad lady.
This will be classified unconsciously as cheap and uninteresting by the brain real quick. It'll have its place in the tides of cheap content, but if overall quality were overlooked that easily, producers would never have increased production budgets so much, just for the sake of it.
In the video of the girl walking down the Tokyo city street, she's wearing a leather jacket. After the closeup on her face they pull back and the leather jacket has hilariously large lapels that weren't there before.
Object permanence (just from images/video) seems like a particularly hard problem for a super-smart prediction engine. Is it the old thing, or a new thing?
There are also perspective issues: the relative sizes of the foreground (the people sitting at the café) and the background (the market) are incoherent. Same with the "snowy Tokyo with cherry blossoms" video.
Though I'm not sure what your point is here: outside of America, in Asia and Africa, these sorts of markets mixed in with skyscrapers are perfectly normal. There is nothing unusual about it.
It just computes next frame based on current one and what it learned before, it's a plausible continuation.
In the same way, ChatGPT struggles with math without code interpreter, Sora won't have accurate physics without a physics engine and rendering 3d objects.
Now it's just a "what is the next frame of this 2D image" model plus some textual context.
> It just computes next frame based on current one and what it learned before, it's a plausible continuation.
...
> Now it's just a "what is the next frame of this 2D image" model plus some textual context.
This is incorrect. Sora is not an autoregressive model like GPT, but a diffusion transformer. From the technical report[1], it is clear that it predicts the entire sequence of spatiotemporal patches at once.
> Sora currently exhibits numerous limitations as a simulator. For example, it does not accurately model the physics of many basic interactions, like glass shattering. Other interactions, like eating food, do not always yield correct changes in object states
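The patch representation the report describes can be sketched roughly like this (patch sizes and the nested-list layout are invented for illustration; the real model operates on compressed latents, not raw pixels):

```python
# Rough sketch of chopping a video into spatiotemporal patches: the
# "tokens" a diffusion transformer denoises all at once, rather than
# predicting frame after frame autoregressively.
def patchify(frames, t=4, h=16, w=16):
    """frames: nested list of shape (T, H, W); returns a flat list of
    (t, h, w) blocks covering the whole clip."""
    T, H, W = len(frames), len(frames[0]), len(frames[0][0])
    patches = []
    for ti in range(0, T, t):
        for hi in range(0, H, h):
            for wi in range(0, W, w):
                block = [[row[wi:wi + w] for row in frames[f][hi:hi + h]]
                         for f in range(ti, min(ti + t, T))]
                patches.append(block)
    return patches

# A tiny 8-frame, 32x32 "video" of zeros yields 2 * 2 * 2 = 8 patches.
video = [[[0] * 32 for _ in range(32)] for _ in range(8)]
print(len(patchify(video)))  # 8
```

Because every patch in the sequence is denoised jointly, temporal consistency is learned rather than inherited from a previous frame.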
Regardless of whether all the frames are generated at once or one by one, you can see in their examples that it's still just pixel based. See the first example with the dog in the blue hat: the woman has a blue thing suddenly spawn into her hand because her hand went over another blue area of the image.
I'm not denying that there are obvious limitations. However, attributing them to being "pixel-based" seems misguided. First off, the model acts in latent space, not directly on pixels. Secondly, there is no fundamental limitation here. The model has already acquired limited-yet-impressive ability to understand movement, texture, social behavior, etc., just from watching videos.
I learned to understand reality by interpreting photons and various sensory inputs. Does that make my model of reality fundamentally flawed? In the sense that I only have a partial intuitive understanding of it, yes. But I don't need to know Maxwell's equations to get a sense of what happens when I open the blinds or turn on my phone.
I think many of the limitations we are seeing here - poor glass physics, flawed object permanence - will be overcome given enough training data and compute.
We will most likely need to incorporate exploration, but we can get really far with astute observation.
Actually your comment gives me hope that we will never have an AI singularity, since how the brain works is flawed, and we're trying to copy it.
Heck a super AI might not even be possible, what if we're peak intelligence with our millions of years of evolution?
Just adding compute speed will not help much. Say the goal of an intelligence is to win a war. If you're tasked with it, then it doesn't matter if you have a month or a decade (assume that time is frozen while you do your research); it's too complex a problem and simply cannot be solved, and the same goes for an AI.
Or it will be like with chess solvers, machines will be more intelligent than us simply because they can load much more context to solve a problem than us in their "working memory"
> Actually your comment gives me hope that we will never have an AI singularity, since how the brain works is flawed, and we're trying to copy it.
As someone working in the field, the vast majority of AI research isn't concerned with copying the brain, simply with building solutions that work better than what came before. Biomimetism is actually quite limited in practice.
The idea of observing the world in motion in order to internalize some of its properties is a very general one. There are countless ways to concretize it; child development is but one of them.
> If you're tasked with it, then it doesn't matter if you have a month or a decade (assume that time is frozen while you do your research); it's too complex a problem and simply cannot be solved, and the same goes for an AI.
I highly disagree.
Let's assume a superintelligent AI can break down a problem into subproblems recursively, find patterns and loopholes in absurd amounts of data, run simulations of the potential consequences of its actions while estimating the likelihood of various scenarios, and do so much faster than humans ever could.
To take your example of winning a war, the task is clearly not unsolvable. In some capacity, military commanders are tasked with it on a regular basis (with varying degrees of success).
With the capabilities described above, why couldn't the AI find and exploit weaknesses in the enemy's key infrastructure (digital and real-world) and people? Why couldn't it strategically sow dissent, confuse, corrupt, and efficiently acquire intelligence to update its model of the situation minute-by-minute?
I don't think it's reasonable to think of a would-be superintelligence as an oracle that gives you perfect solutions. It will still be bound by the constraints of reality, but it might be able to work within them with incredible efficiency.
This is an excellent comparison and I agree with you.
Unfortunately we are flawed. We do know how physics work intuitively and can somewhat predict them, but not perfectly. We can imagine how a ball will move, but the image is blurry and trajectory only partially correct. This is why we invented math and physics studies, to be able to accurately calculate, predict and reproduce those events.
We are far off from creating something as efficient as the human brain. It will take insane amounts of compute power to simply match our basic, inaccurate brains; imagine how much will be needed to create something that is factually accurate.
Indeed. But a point that is often omitted from comparisons with organic brains is how much "compute equivalent" we spent through evolution. The brain is not a blank slate; it has clear prior structure that is genetically encoded. You can see this as a form of pretraining through a RL process wherein reward ~= surviving and procreating. If you see things this way, data-efficiency comparisons are more appropriate in the context of learning a new task or piece of information, and foundation models tend to do this quite well.
Additionally, most of the energy cost comes from pretraining, but once we have the resulting weights, downstream fine-tuning or inference are comparatively quite cheap. So even if the energy cost is high, it may be worth it if we get powerful generalist models that we can specialize in many different ways.
> This is why we invented math and physics studies, to be able to accurately calculate, predict and reproduce those events.
We won't do away without those, but an intuitive understanding of the world can go a long way towards knowing when and how to use precise quantitative methods.
It absolutely struggles with math. It's not solving anything. It sometimes gets the answer right only because it's seen the question before. It's rote memorization at best.
Just tried in ChatGPT-4. It gives the correct output (5), along with a short explanation of the order of operations (which you probably need to know, if you're asking the question).
Correct based upon whom? If someone of authority asks the question and receives a detailed response back that is plausible but not necessarily correct, and that version of authority says the answer is actually three, how would you disagree?
In order to combat Authority you need to appeal to a higher authority, and that has been lost. One follows AI. Another follows Old Men from long ago whose words populated the AI.
We shouldn't necessarily regard 5 as the correct output. Sure, almost all of us choose to make division higher precedence than addition, but there's no reason that has to be the case. I think a truly intelligent system would reply with 5 (which follows the usual convention, and would therefore mimic the standard human response), but immediately ask if perhaps you had intended a different order of operations (or even other meanings for the symbols), and suggest other possibilities and mention the fact that your question could be considered not well-defined...which is basically what it did.
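The thread doesn't show the exact prompt, but assuming it was an expression like "1 + 8 / 2", the ambiguity the comment raises can be made concrete: the value is 5 under conventional precedence, but different under a strict left-to-right reading.

```python
# Compare conventional precedence (division binds tighter than addition)
# with a naive left-to-right evaluation that ignores precedence entirely.
# The example expression "1 + 8 / 2" is an assumption, not the actual prompt.
def left_to_right(tokens):
    """Evaluate an alternating number/operator token list left to right."""
    result = float(tokens[0])
    for op, val in zip(tokens[1::2], tokens[2::2]):
        if op == "+":
            result += float(val)
        elif op == "/":
            result /= float(val)
    return result

print(1 + 8 / 2)                                 # 5.0 (conventional)
print(left_to_right(["1", "+", "8", "/", "2"]))  # 4.5 (left to right)
```

Both are internally consistent conventions, which is exactly why the question "correct based upon whom?" has some bite.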
I guess you might think 'math' means arithmetic. It definitely does struggle with mathematical reasoning, and I can tell you that because I and many others have tried it.
Mind you, it's not brilliant at arithmetic either...
> In the Tokyo one, the model is smart enough to figure out that on a train, the reflection would be of a passenger, and the passenger has Asian traits since this is Tokyo.
How is this any more accurate than saying that the model has mostly seen Asian people in footage of Tokyo, and thus it is most likely to generate Asian-features for a video labelled "Tokyo"? Similarly, how many videos looking out a train window do you think it's seen where there was not a reflection of a person in the window when it's dark?
I'm hoping to see progress towards consistent characters, objects, scenes etc. So much of what I'd want to do creatively hinges on needing persisting characters who don't change appearance/clothing/accessories from usage to usage. Or creating a "set" for a scene to take place in repeatedly.
I know with stable diffusion there's things like lora and controlnet, but they are clunky. We still seem to have a long way to go towards scene and story composition.
Once we do, it will be a game changer for redefining how we think about things like movies and television when you can effectively have them created on demand.
Let's hold our breath. Those are specifically crafted hand-picked good videos, where there wasn't any requirement but "write a generic prompt and pick something that looks good", with no particular requirements. Which is very different from the actual process where you have a very specific idea and want the machine to make it happen.
DALL-E presentation also looked cool and everyone was stoked about it. Now that we know of its limitations and oddities? YMMV, but I'd say not so much - Stable Diffusion is still the go-to solution. I strongly suspect the same thing with Sora.
The examples are most certainly cherry-picked. But the problem is there are 50 of them. And even if you gave me 24 hour full access to SVD1.1/Pika/Runway (anything out there that I can use), I won't be able to get 5 examples that match these in quality (~temporal consistency/motions/prompt following) and more importantly in the length. Maybe I am overly optimistic, but this seems too good.
Credit to OpenAI for including some videos with failures (extra limbs, etc.). I also wonder how closely any of these videos might match one from the training set. Maybe they chose prompts that lined up pretty closely with a few videos that were already in there.
Lack of quality in the details yes but the fact that characters and scenes depict consistent and real movement and evolution as opposed to the cinemagraph and frame morphing stuff we have had so far is still remarkable!
That particular example seems to have more a "cheap 3d" style to it but the actual synthesis seems on par with the examples. If the prompt had specified a different style it'd have that style instead. This kind of generation isn't like actual animating, "cheap 3d" style and "realistic cinematic" style take roughly the same amount of work to look right.
Sarah was a video sorter; this was her life. She graduated top of her class in film, and all she could find was the monotonous job of selecting videos that looked just real enough.
Until one day, she couldn't believe it. It was her: a video of her, in that very moment, sorting. She went to pause the video, but stopped when her doppelganger did the same.
> Stable Diffusion is still the go-to solution. I strongly suspect the same thing with Sora.
Sure, for people who want detailed control with AI-generated video, workflows built around SD + AnimateDiff, Stable Video Diffusion, MotionDiff, etc., are still going to beat Sora for the immediate future, and OpenAI's approach structurally isn't as friendly to developing a broad ecosystem adding power on top of the base models.
OTOH, the basic simple prompt-to-video capacity of Sora now is good enough for some uses, and where detailed control is not essential that space is going to keep expanding -- one question is how much their plans for safety checking (which they state will apply both to the prompt and every frame of output) will cripple this versus alternatives, and how much the regulatory environment will or won't make it possible to compete with that.
> I suspect given equal effort into prompting both, Sora probably provides superior results
Strictly to prompting, probably, just as that is the case with Dall-E 3 vs, say, SDXL.
The thing is, there’s a lot more that you can do than just tweaking prompting with open models, compared to hosted models that offer limited interaction options.
In the past the examples tweeted by OpenAI have been fairly representative of the actual capabilities of the model. i.e. maybe they do two or three generations and pick the best, but they aren't spending a huge amount of effort cherry-picking.
While Sora might be able to generate short 60-90 second videos, how well it would scale with a larger prompt or a longer video remains yet to be seen.
And the general logic of having the model do 90% of the work for you and then you edit what is required might be harder with videos.
Most fictional long-form video (whether live-action movies or cartoons, etc) is composed of many shots, most of them much shorter than 7 seconds, let alone 60.
I think the main factor that will be key to generate a whole movie is being able to pass some reference images of the characters/places/objects so they remain congruent between two generations.
You could already write a whole book in GPT-3 from running a series of one-short-chapter-at-a-time generations and passing the summary/outline of what's happened so far. (I know I did, in a time that feels like ages ago but was just early last year)
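That chapter-at-a-time loop looks roughly like this. The generate() call is a stub standing in for a hypothetical model API, so only the control flow (a running summary passed between generations) is real here:

```python
# Sketch of generating a long work one chapter at a time, carrying a
# running summary as context. generate() is a placeholder, not a real API.
def generate(prompt):
    # Stand-in for an LLM call; returns a dummy chapter.
    return f"[chapter written for: {prompt[:40]}...]"

def write_book(outline, n_chapters):
    summary = outline           # running context passed between generations
    chapters = []
    for i in range(n_chapters):
        prompt = (f"Story so far: {summary}\n"
                  f"Write chapter {i + 1}, one short chapter only.")
        chapters.append(generate(prompt))
        summary += f" [ch{i + 1} summary]"   # would also be model-generated
    return chapters

book = write_book("A detective story set in Lagos.", 3)
print(len(book))  # 3
```

The video analogue would pass reference images and shot summaries instead of a text outline, which is exactly the congruency problem discussed above.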
> I think the main factor that will be key to generate a whole movie is being able to pass some reference images of the characters/places/objects so they remain congruent between two generations.
I partly agree with this. The congruency however needs to extend to more than 2 generations. If a single scene is composed of multiple shots, then those multiple shots need to be part of the same world the scene is being shot in.
If you check the video with the title `A beautiful homemade video showing the people of Lagos, Nigeria in the year 2056. Shot with a mobile phone camera.` the surroundings do not seem to make sense as the view starts with a market, spirals around a point and then ends with a bridge which does not fit into the market.
Even if the individual shots the model generates are fine on their own, getting them to fit together is where the difficulty comes in. However, I don't have any experience in video editing, so it's just speculation.
The CGI industry is about to be turned upside down. They charge hundreds of thousands per minute, and it takes them forever to produce the finished product.
I'm almost speechless. I've been keeping an eye on the text-to-video models, and if these example videos are truly indicative of the model, this is an order of magnitude better than anything currently available.
In particular, looking at the video titled "Borneo wildlife on the Kinabatangan River" (number 7 in the third group), the accurate parallax of the tree stood out to me. I'm so curious to learn how this is working.
holy cow, is that the future of gaming? instead of 3D renders it's real-time video generation, complete with audio and music and dialog and intelligent AI conversations and it's a unique experience no one else has ever played. gameplay mechanics could even change on the fly
DLSS is essentially this, isn't it? It uses a low quality render from the game and then increases the fidelity with something very similar to a diffusion model.
Yeah, but I mean who knows why. I know some people can't, my GF is one of them.
I've often wondered if I'm OK with it because I'm used to the object-on-head stuff (like 25-odd years of motorcycle riding and helmet wearing) and close-up, high-FOV-coverage, fast-paced gaming (I play on a 32", maybe 70 cm from my eyes, give or take).
> I am prone to sea sickness. Maybe it is related.
I'd think it might be given my understanding of why illness in many is triggered. It's odd because I never got sick from it, but i've seen others get INCREDIBLY ill in two different ways.
1. My GF tried to use simple locomotion in a game and almost vomited as an immediate reaction
2. A friend who was fine at first, but then randomly started getting very slowly ill over a matter of like an hour, just getting more and more nausea after the fact.
It's unfortunate, because due to lack of bad feelings/nausea/discomfort etc, I love VR. I equally from those around me can see no real path forward for it as it stands today though because of those impacts and limitations.
That being said, maybe they get smaller, lighter, we learn to induce motion sickness less, I dunno. I'm not optimistic.
Even otherwise, and no matter how good the screen and speakers are, a screen and speakers can only be so immersive. People oversell the potential for VR when they describe it as being as good as or better than reality. Nothing less than the Matrix is going to work in that regard.
Yep, once your brain gets over the immediate novelty of VR, it’s very difficult to get back that “Ready Player One” feeling due to the absence of sensory feedback.
If/once they get it working though, society will shift fast.
There’s an XR app called Brink Traveler that’s full of handcrafted photogrammetry recreations of scenic landmarks. On especially gloomy PNW winter days, I’ll lug a heat lamp to my kitchen and let it warm up the tiled stone a bit, put a floor fan on random oscillation, toss on some good headphones, load up a sunny desert location in VR, and just lounge on the warm stone floor for an hour.
My conscious brain “knows” this isn’t real and just visuals alone can’t fool it anymore, but after about 15 minutes of visuals + sensory input matching, it stops caring entirely. I’ve caught myself reflexively squinting at the virtual sun even though my headset doesn’t have HDR.
For games like 2D/3D fighting games where you don't need to generate a lot of terrain, the possibility of randomly generating stages with unique terrain and obstacles is interesting.
The diffusion is almost certainly taking place over some sort of compressed latent, from the visual quirks of the output I suspect that the process of turning that latent into images goes latent -> nerf / splat -> image, not latent -> convolutional decoder -> image
Agreed. It's amazing how much of a head start OpenAI appears to have over everyone else. Even Microsoft who has access to everything OpenAI is doing. Only Microsoft could be given the keys to the kingdom and still not figure out how to open any doors with them.
Microsoft doesn’t have access to OpenAI’s research, this was part of the deal. They only have access to the weights and inference code of production models and even then who has access to that inside MS is extremely gated and only a few employees have access to this based on absolute need to actually run the service.
AI researchers at MSFT barely have more insight into OpenAI than you do reading HN.
No. They have early access. Example: MSFT was using Dall-e Exp (early 3 version) in PUBLIC, since February of 2023.
In the same month, they were also using GPT4 in public - before OpenAI.
And they had access to GPT4 in 2022 (which was when they decided to create Bing Chat, now called Copilot).
All the current GPT4 models at MSFT are also finetuned versions (literally Creative and Precise mode runs different finetuned versions of GPT4). It runs finetuned versions since launch even...
Microsoft said that they could continue OpenAI's research with no slowdown if OpenAI cut them off by hiring all OpenAI's people, so from that statement it sounds like they have access.
Except they keep trying to shove AI into everything they own. CoPilot Studio is an example of how laughably bad at it they are. I honestly don't understand why they don't contract out to OpenAI to help them do some of these integrations.
Every company is trying to shove AI into everything they own. It's what investors currently demand.
OpenAI is likely limited by how fast they are able to scale their hiring. They had 778 FTEs when all the board drama occurred, up 100% YoY. Microsoft has 221,000. It seems difficult to delegate enough headcount to all the exploratory projects of MSFT and it's hard to scale headcount quicker while preserving some semblance of culture.
The only official statement from Microsoft is: "While details of our agreement remain confidential, it is important to note that Microsoft does not own any portion of OpenAI and is simply entitled to a share of profit distributions," said company spokesman Frank Shaw.
I suspect it's less about being puritanical about violence and nudity in and of themself, and more a blanket ban to make up for the inability to prevent the generation of actually controversial material (nude images of pop stars, violence against politicians, hate speech)
Put like that, it's a bit like the Chumra in Judaism [1]. The fence, or moat, around the law that extends even further than the law itself, to prevent you from accidentally committing a sin.
I am guessing a movie studio will get different access with controls dropped. Of course, that does mean they need to be VERY careful when editing, and making sure not to release a vagina that appears for 1 or 2 frames when a woman is picking up a cat in some random scene.
We can't do narrative sequences with persistent characters and settings, even with static images.
These video clips are just generic stock clips. You can cut them together to make a sequence of random flashy whatever, but you still can't do storytelling in any conventional sense. We don't appear to be close to being able to use these tools for the hypothetical disruptive use cases we worry about.
Nonetheless, the stock video and photo people are in trouble. So long as the details don't matter, this stuff is presumably useful.
I wonder how much of it is really "concern for the children" type stuff vs not wanting to deal with fights over what should be allowed, how, and to whom right now. When film was new, towns and states started to set up censorship review boards. When mature content became viewable on the web, battles (still ongoing) arose over how much you need to do to prevent minors from accessing it. Now AI-generated content is the new thing, and you can avoid that kind of distraction by going this route instead.
I'm not supporting it in any way, I think you should be able to generate and distribute any legal content with the tools, but just giving a possible motive for OpenAI being so conservative whenever it comes to ethics and what they are making.
I've been watching 80s movies recently, and amount of nudity and sex scenes often feels unnecessary. I'm definitely not a prude. I watch porn, I talk about sex with friends, I go to kinky parties sometimes. But it really feels that a lot of movies sacrificed stories to increase sex appeal — and now that people have free and unlimited access to porn, movies can finally be movies.
Where is the training material for this coming from? The only resource I can think of that's broad enough for a general purpose video model is YouTube, but I can't imagine Google would allow a third party to scrape all of YT without putting up a fight.
You can still have a broad dataset and use RLHF to steer it more towards the aesthetic like midjourney and SDXL did through discord feedback. I think there was still some aesthetic selection in the dataset as well but it still included a lot of crap.
The big standout to me, beyond almost any other text-to-video solution, is that the video duration is tremendously longer (a minute plus). Everything else I've seen can't get beyond 15 to 20 seconds at the absolute maximum.
In terms of following the prompt and generating visually interesting results, I think they're comparable. But the resolution for Sora seems so far ahead.
Worth noting that Google also has Phenaki [0] and VideoPoet [1] and Imagen Video [2]
I know it's Runway (and has all manner of those dream-like AI artifacts), but I like what this person is doing with just a bunch of 4-second clips and an awesome soundtrack:
The Hollywood Reporter says many in the industry are very scared.[1]
“I’ve heard a lot of people say they’re leaving film,” he says. “I’ve been thinking of where I can pivot to if I can’t make a living out of this anymore.” - a concept artist responsible for the look of the Hunger Games and some other films.
"A study surveying 300 leaders across Hollywood, issued in January, reported that three-fourths of respondents indicated that AI tools supported the elimination, reduction or consolidation of jobs at their companies. Over the next three years, it estimates that nearly 204,000 positions will be adversely affected."
"Commercial production may be among the main casualties of AI video tools as quality is considered less important than in film and TV production."
Honest question: of what possible use could Sora be for Hollywood?
The results are amazing, but if the current crop of text-to-image tools is any guide, it will be easy to create things that look cool but essentially impossible to create something that meets detailed specific criteria. If you want your actor to look and behave consistently across multiple episodes of a series, if you want it to precisely follow a detailed script, if you want continuity, if you want characters and objects to exhibit consistent behavior over the long term – I don't see how Sora can do anything for you, and I wouldn't expect that to change for at least a few years.
(I am entirely open to the idea that other generative AI tools could have an impact on Hollywood. The linked Hollywood Reporter article states that "Visual effects and other postproduction work stands particularly vulnerable". I don't know much about that, I can easily believe it would be true, but I don't think they're talking about text-to-video tools like Sora.)
I suspect that one of the first applications will be pre-viz. Before a big-budget movie is made, a cheap version is often made first. This is called "pre-visualization". These text to video applications will be ideal for that. Someone will take each scene in the script, write a big prompt describing the scene, and follow it with the dialog, maybe with some commands for camerawork and cuts. Instant movie. Not a very good one, but something you can show to the people who green-light things.
There are lots of pre-viz reels on line. The ones for sequels are often quite good, because the CGI character models from the previous movies are available for re-use.
Unreal Engine is often used.
Especially when you can do this with still images on a normal M-series MacBook _today_, automating it would be pretty trivial.
Just feed it a script and get a bunch of pre-vis images for every scene.
When we get something like this running on hardware with an uncensored model, there's going to be a lot of redundancies but also a ton of new art that would've never happened otherwise.
It wouldn't be too hard to do any of the things you mention. See ControlNet for Stable Diffusion, and vid2vid (if this model does txt2vid, it can also do vid2vid very easily).
So you can just record some guiding stuff, similar to motion capture but with just any regular phone camera, and morph it into anything you want. You don't even need the camera, of course, a simple 3D animation without textures or lighting would suffice.
Also, consistent look has been solved very early on, once we had free models like Stable Diffusion.
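For what it's worth, the core of the img2img/vid2vid trick being described is small: instead of starting from pure noise, each guiding frame is only partially noised according to a "strength" parameter, and the diffusion model then denoises from there, so the output keeps the input's rough structure. A minimal numpy sketch of just the noising step (names and the variance-preserving blend are illustrative, not any specific library's API):

```python
import numpy as np

def partial_noise(frame, strength, rng):
    """SDEdit-style img2img: blend the input frame with Gaussian noise.

    strength=0 returns the frame unchanged; strength=1 is pure noise.
    A real diffusion pipeline would then run its reverse (denoising)
    process from the matching intermediate timestep, not from scratch.
    """
    noise = rng.standard_normal(frame.shape)
    # Variance-preserving blend, like DDPM's forward process q(x_t | x_0)
    return np.sqrt(1.0 - strength) * frame + np.sqrt(strength) * noise

rng = np.random.default_rng(0)
frame = rng.standard_normal((64, 64, 3))   # stand-in for one frame's latent
guided = partial_noise(frame, strength=0.4, rng=rng)
untouched = partial_noise(frame, strength=0.0, rng=rng)
assert np.allclose(untouched, frame)       # strength 0 keeps the input intact
```

Low strength keeps the guiding footage's composition; high strength lets the model repaint almost everything.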
Right now you’d need a mixed artistic/ML team. You wouldn’t use an off-the-shelf tool. There was a video of some guys doing this (sorry, can’t find it) to make an anime-type animation with consistent characters.
They used videos of themselves, run through their own models, to make the characters. So I reckon that while prompt -> blockbuster is not here yet, a movie made mostly using AI is possible; it will cost a lot now, but that cost will go down. While this is sad, it is also exciting. And scary. Black Mirror-like, we will start creating AIs we have relationships with, and bring people back to life (!) from history; maybe grieving people will do this. Not sure that is healthy, but people will do it once it is a click-of-a-button thing.
> There was a video of some guys doing this (sorry can’t find it) to make an anime type animation. With consistent characters. They used videos of themselves running through their own models to make the characters.
It won’t be Hollywood at first. It will be small social ads for TikTok, IG and other social media. The brands likely won’t even care if they don’t get copyright at the end, since they have copyright of their product.
Seconding this. There is also a huge SMB and commercial business that supports many agencies and production companies. This could replace a lot of that work.
The OpenAI announcement mentions being able to provide an image to start the video generation process from. That sounds to me like it will actually be incredibly easy to anchor the video generation to some consistent visual - unlike all the text-based stable diffusion so far. (Yes, there is img2img, but that is not crossing the boundary into a different medium like Sora).
I don't see why -- the distance between "here's something that looks almost like a photo, moving only a little bit like a mannequin" and "here's something that has the subtle facial expressions and voice to convey complex emotions" is pretty freaking huge; to the point where the vast majority of actual humans fail to be that good at it. At any rate, the number of BNNs (biological neural networks) competing with actors has only been growing, with 8 billion and counting.
> Amazing time to be a wannabe director or producer or similar creative visionary. Amazing time to be an aspirant that would otherwise not have access to resources, capital, tools in order to bring their ideas to fruition.
Perhaps if you mainly want to do things for your own edification. If you want to be able to make a living off it, you're suddenly going to be in a very, very flooded market.
It’s for sure plausible that acting remains a viable profession.
The bull case would be something like ‘Ractives in “The Diamond Age” by Neal Stephenson; instead of video games people play at something like live plays with real human actors. In this world there is orders of magnitude more demand for acting.
Personally I think it’s more likely that we see AI cross the uncanny valley in a decade or two (at least for movies/TV/TikTok-style content). But this is nothing more than a hunch; 55/45 confidence, say.
> Perhaps if you mainly want to do things for your own edification.
My mental model is that most aspiring creatives fall in this category. You have to be doing quite well as an actor to make a living from it, and most who try do not.
> the distance between "here's something that looks almost like a photo, moving only a little bit like a mannequin" and "here's something that has the subtle facial expressions and voice to convey complex emotions" is pretty freaking huge;
The distance between pixelated noise and a single image is freaking huge.
The distance between a single image and a video of a consistent 3D world is freaking huge (albeit with rotating legs).
The distance between a video of a consistent 3D world and a full length movie of a consistent 3D world with subtle facial expressions is freaking huge.
So... next 12 months then.
>If you want to be able to make a living off it, you're suddenly going to be in a very, very flooded market.
Considering that a year ago we had that nightmare fuel of Will Smith eating spaghetti, and Don and Joe Hair Force One, it seems odd that some of you assume we’re not going to get to the point of being indistinguishable from reality in the near future.
We might enter a world where "actors" are just for mocap. They do the little micro expressions with a bunch of dots on their face.
AI models add the actual character and maybe even voice.
At that point the amount of actors we "need" will go down drastically. The same experienced group of a dozen actors can do multiple movies a month if needed.
It's always a bad time to be an actor, between long hours, low pay, and a culture of abuse, but this will definitely make it worse. My writer and artist friends are already despondent from genAI -- it was rare to be able to make art full-time, and even the full-timers were barely making enough money to live. Even people writing and drawing for marketing were not exactly getting rich.
I think this will lead to a further hollowing-out of who can afford to be an actor or artist, and we will miss their creativity and perspective in ways we won't even realize. Similarly, so much art benefits from being a group endeavor instead of someone's solo project -- imagine if George Lucas had created Star Wars entirely on his own.
Even the newly empowered creators will have to fight to be noticed amid a deluge of carelessly generated spam and sludge. It will be like those weird YouTube Kids videos, but everywhere (or at least like indie and mobile games are now). I think the effect will be that many people turn to big brands known for quality, many people don't care that much, and there will be a massive doughnut hole in between.
> Even the newly empowered creators will have to fight to be noticed amid a deluge of carelessly generated spam and sludge. It will be like those weird YouTube Kids videos, but everywhere (or at least like indie and mobile games are now).
Reminds me of Syndrome's quote in the Incredibles.
I dunno. Thanks to big corpo shenanigans (and, er, racism?) a lot of people have turned away from big brands (or, at least obviously brand-y brands) towards "trusted individuals" (though you might classify them as brands themselves). Who goes to PCMag anymore? It's all LTT and Marques Brownlee and any number of small creators. Or, the people on the right who abandoned broadcast and even cable news and get everything they "know" from Twitter randos. Even on this site, asks for a Google Search alternative are not rare, and you'll get about a dozen different answers each time, each with a fraction of the market share of the big guy (but growing).
I'm thinking people will probably still want to see their favorite actors, so established actors may sell the rights to their image. They're sitting on a lot of capital. Bad time to be becoming an actor though.
Even the average SAG-AFTRA member barely makes a living wage from acting.
And those are the ones that got into the union. There's a whole tier below that.
If you spend time in LA, you probably know some actress/model/waitress types.
There's also the weird misery of being famous, but not rich. You can't eat fame.
Likely less and less, though, given that people will be able to generate a hyper-personalized set of actors/characters/personalities in their hyper-personalized generated media.
Younger generations growing up with hyper personalized media will likely care even less about irl media figures.
You can’t replace actors with this for a long time. Actors are “rendering” faster than any AI. Animation is where the real issues will show up first, particularly in Advertising.
Have you seen the amount of CGI in movies and TV shows? :)
In many AAA blockbusters the "actors" on screen are just CGI recreations during action scenes.
But you're right, actors won't be out of a job soon, but unless something drastic happens they'll have the role of Vinyl records in the future. For people who appreciate the "authenticity". =)
I think you can fill in many scenes for the actor - perhaps with a dupe that would look like the real actor. Of course the original actor would have to be paid, but perhaps much less, as the effort is reduced.
If it requires acting, it likely can't be done with AI. You underestimate, I think, how much an actor carries a movie. You can use it for digi doubles maybe, for stunts and VFX. But if his face is on the screen... We are ages away from having an AI actor perform at the same level as Daniel Day-Lewis, Willem Dafoe, or anyone else in that atmosphere. They make too many interesting choices per second for it to be replaced by AI.
Quality aside, there's a reason producers pay millions for A-list stars instead of any of the millions of really good aspiring actors in LA that they could hire for pennies. People will pay to see the new Matt Damon flick but wouldn't give it a second glance if some no-name was playing the part.
If you can't replace Matt Damon with another equivalently skilled human, CGI won't be any different.
Granted, maybe that's less true today, given Marvel and such are more about the action than the acting. But if that's the future of the industry anyway, then acting as a worthwhile profession is already on its way out, CGI or no.
Yes, people also take actors as a sign of the quality of the film, or at least they used to, before Marvel. Hence films with big names attached get more money, etc.
Still the idea that actors are easy to replace is preposterous to anyone who's ever worked with actors. They are preposterously HARD to replace, in theatre and film. A good actor is worth their weight in gold. Very very few people are good actors. A good actor is a good comedian, a master at controlling his body, and a master at controlling his voice, towards a specifically intended goal. They can make you laugh, cry, sigh, or feel just about anything. You just look at Paul Giamatti or Willem Dafoe or Denzel Washington. Those people are not replaceable, and their work is just as good and just as culturally important as a Picasso or a Monet. A hundred years from now people will know the name of actors, because that was the dominant mode of entertainment of our age.
The idea that this destroys the industry is overblown, because the film industry has already been dying since the 2000s.
Hollywood is already destroyed. It is not the powerful entity it once was.
In terms of attention and entertainment time, YouTube has already surpassed them.
This will create many more YouTube creators who do not care about getting this right or about making a living out of it. It will take our attention all the same, away from traditional Hollywood.
Yes, there will still be great films and franchises, but the industry is shrinking.
This is similar to journalism saying that AI will destroy it. Well, there was nothing left to destroy, because a bunch of traditional newspapers had already closed shop even before AI came.
They shouldn’t be worried so soon. This will be used to pump out shitty hero movies more quickly, but there will always be demand for a masterpiece after the hype cools down.
This is like a chef worrying going out of business because of fast food.
Without a change in copyright law, I doubt it. The current policy of the USCO is that the products of AI based on prompts like this are not human-authored and can't be copyrighted. No one is going to release AI-created stuff that someone else can reproduce, because it's public domain.
Has anyone else noticed the leg swap in the Tokyo video at 0:14? I guess we are past uncanny, but I do wonder if these small artifacts will always be present in generated content.
It also raises the question: if children are introduced to media from a young age and fed more and more generated content, will they still be able to feel the "uncanniness", or will they become completely numb to it?
There's definitely an interesting period ahead of us; I'm not yet sure how to feel about it...
There are definitely artifacts. Go to the 9th video in the first batch, the one of the guy sitting on a cloud reading a book. Watch the book; the pages are flapping in the wind in an extremely strange way.
Yep, I noticed it immediately too. Yet it is subtle in reality.
I'm not that good at spotting imperfections in pictures, but in the video I immediately felt something was not quite right.
There have been children who reacted with irritation when they couldn't swipe away real-life objects. The idea is to give kids enough real-world experiences so that this doesn't happen.
I noticed at the beginning that cars are driving on the right side of the road, but in Japan they drive on the left. The AI misses little details like that.
(I'm also not sure they've ever had a couple inches of snow on the ground while the cherry blossoms are in bloom in Tokyo, but I guess it's possible.)
The cat in the "cat wakes up its owner" video has two left front legs, apparently.
There is nothing that is true in these videos. They can and do deviate from reality at any place and time and at any level of detail.
These artefacts go down with more compute. In four years when they attack it again with 100x compute and better algorithms I think it'll be virtually flawless.
I had to go back several times to 0:14 to see if it was really unusual. I get it of course, but probably watching 20 times I would have never noticed it.
I don't think that's the case. I think they're aware of the limitations and problems. Several of the videos have obvious problems, if you're looking - e.g. people vanishing entirely, objects looking malformed in many frames, objects changing in size incongruent with perspective, etc.
I think they just accept it as a limitation, because it's still very technically impressive. And they hope they can smooth out those limitations.
certainly not perfect... but "some impressive things" is an understatement, think of how long it took to get halfway decent CGI... this AI thing is already better than clips I've seen people spend days building by hand
This is pretty impressive, it seems that OpenAI consistently delivers exceptional work, even when venturing into new domains. But looking into their technical paper, it is evident that they are benefiting from their own body of work done in the past and also the enormous resources available to them.
For instance, the generational leap in video generation capability of Sora may be possible because:
1. Instead of resizing, cropping, or trimming videos to a standard size, Sora trains on data at its native size. This preserves the original aspect ratios and improves composition and framing in the generated videos. This requires massive infrastructure. It is eerily similar to how GPT-3 benefited from a blunt approach of throwing massive resources at a problem rather than extensively optimizing the architecture, dataset, or pre-training steps.
2. Sora leverages the re-captioning technique from DALL-E 3 by leveraging GPT to turn short user prompts into longer detailed captions that are sent to the video model. Although it remains unclear whether they employ GPT-4 or another internal model, it stands to reason that they have access to a superior captioning model compared to others.
This is not to say that inertia and resources are the only factors differentiating OpenAI; they may have access to a much better talent pool, but that is hard to gauge from the outside.
In this video, there's extremely consistent geometry as the camera moves, but the texture of the trees/shrubs on the top of the cliff on the left seems to remain very flat, reminiscent of low-poly geometry in games.
I wonder if this is an artifact of the way videos are generated. Is the model separating scene geometry from camera? Maybe some sort of video-NeRF or Gaussian Splatting under the hood?
Curious about what current SotA is on physics-infusing generation. Anyone have paper links?
OpenAi has a few details:
>> The current model has weaknesses. It may struggle with accurately simulating the physics of a complex scene, and may not understand specific instances of cause and effect. For example, a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark.
>> Similar to GPT models, Sora uses a transformer architecture, unlocking superior scaling performance.
>> We represent videos and images as collections of smaller units of data called patches, each of which is akin to a token in GPT. By unifying how we represent data, we can train diffusion transformers on a wider range of visual data than was possible before, spanning different durations, resolutions and aspect ratios.
>> Sora builds on past research in DALL·E and GPT models. It uses the recaptioning technique from DALL·E 3, which involves generating highly descriptive captions for the visual training data. As a result, the model is able to follow the user’s text instructions in the generated video more faithfully.
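The "patches" idea in the quote above can be sketched as slicing a video tensor into fixed-size spacetime blocks and flattening each into one token; the transformer then operates on that token sequence regardless of the clip's duration or aspect ratio. A toy version (all sizes made up):

```python
import numpy as np

def video_to_patches(video, pt, ph, pw):
    """Split a (T, H, W, C) video into spacetime patches, one token each.

    pt/ph/pw are patch sizes along time, height, width. Dimensions must
    divide evenly in this toy version; real systems pad or pack instead.
    """
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    v = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)       # gather the patch axes together
    return v.reshape(-1, pt * ph * pw * C)     # (num_tokens, token_dim)

video = np.zeros((8, 32, 32, 3))               # 8 frames of 32x32 RGB
tokens = video_to_patches(video, pt=2, ph=8, pw=8)
print(tokens.shape)  # (64, 384): 4*4*4 tokens, each 2*8*8*3 values
```

A longer or wider video simply yields more tokens, which is what lets one model train across mixed durations and resolutions.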
The implied facts that it understands physics of simple scenes and any instances of cause and effect are impressive!
Although I assume that's been SotA-possible for awhile, and I just hadn't heard?
I saw similar artifacts in DALL-E 1 a lot (as if the image was pasted onto geometry). It definitely wouldn't surprise me if they used synthetic rasterized data in the training, which could totally create artifacts like this.
The model is essentially doing nothing but dreaming.
I suspect that anything that looks like familiar 3D-rendering limitations is probably a result of the training dataset simply containing a lot of actual 3D-rendered content.
We can't tell a model to dream everything except extra fingers, false perspective, and 3D-rendering compromises.
Technically we can, that's what negative prompting[1] is about. For whatever reason, OpenAI has never exposed this capability in its image models, so it remains an open source exclusive.
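Mechanically, negative prompting rides on classifier-free guidance: the slot normally filled by the unconditional (empty-prompt) prediction is filled by the negative prompt's prediction, so the guidance step pushes the sample toward the positive prompt and away from the negative one. A toy version of just the combination step, assuming the two noise predictions are already in hand:

```python
import numpy as np

def guided_noise(eps_pos, eps_neg, scale):
    """Classifier-free guidance with a negative prompt.

    eps_pos: model's noise prediction conditioned on the positive prompt.
    eps_neg: prediction conditioned on the negative prompt (plain CFG
             uses the empty prompt here instead).
    """
    return eps_neg + scale * (eps_pos - eps_neg)

eps_pos = np.array([1.0, 0.0])
eps_neg = np.array([0.0, 1.0])
out = guided_noise(eps_pos, eps_neg, scale=7.5)
print(out)  # [ 7.5 -6.5]
```

With scale=1 the negative prompt has no net effect; typical guidance scales well above 1 are what make "no extra fingers" actually steer the sample.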
It's possible it was pre-trained on 3D renderings first, because it's easy to get almost infinite synthetic data that way, and after that they continued the training on real videos.
I say this with all sincerity, if you're not overwhelmingly impressed with Sora then you haven't been involved in the field of AI generated video recently. While we understand that we're on the exponential curve of AI progress, it's always hard to intuit just what that means.
Sora represents a monumental leap forward; it's comically something like a 3000% improvement in seconds of coherent video generation. Coupled with a significantly enhanced understanding of contextual prompts and overall quality, it has achieved what many (most?) thought would take another year or two.
I think we will see studios like ILM pivoting to AI in the near future. There's no need for 200 VFX artists when you can have 15 artists working with AI tooling to generate all the frame-by-frame effects, backgrounds, and compositing for movies. It'll open the door for indie projects that can take place in settings that were previously the domain of big Hollywood. A sci-fi opera could be put together with a few talented actors, AI effects and a small team to handle post-production. This could conceivably include AI scoring.
Sure, Hollywood and various guilds will strongly resist but it'll require just a handful of streaming companies to pivot. Suddenly content creation costs for Netflix drops an order of magnitude. The economics of content creation will fundamentally change.
At the risk of being proven very wrong, I think replacing actors is still fairly distant in the future but again... humans are bad at conceptualizing exponential progress.
I strongly believe that AI will have massive impact on the film industry however it won't be because of a blackbox, text to video tool like Sora. VFX artists and studios still want a high level of control over the end product and unless it's very simple to tweak small details like the blur of an object in the background, or the particle physics of an explosion, then they wouldn't use it. What Hollywood needs are AI tools that can integrate with their existing workflows. I think Adobe is doing a pretty good job at this.
You're completely missing the point. Who cares what VFX artists and studios want if anyone with a small team can create high quality entertaining videos that millions of people would pay to watch? And if you think that's a bar too high for AI, then you haven't actually seen the quality of average videos and films generated these days.
I was specifically responding to this point which seemed to be the thesis of the parent commenter.
> I think we will see studios like ILM pivoting to AI in the near future. There's no need for 200 VFX artists when you can have 15 artists working with AI tooling
Yes this will bring the barrier to entry for small teams down significantly. However it's not going to replace the 200 people studios like ILM.
I believe this to be a failure of imagination. You're assuming Sora stays like this. The reality is that we are on an exponential, and it's just a matter of time. ILM will be the last to go, but it'll eventually go, in the sense of needing fewer humans to create the same output.
I think it's fair to be impressed with Sora as the next stage of AI video, yet not be too surprised or consider it some insurmountable leap from the public pieces we've seen of AI video up to this point. We've always been just a couple papers away, seeking a good consistency modelling step - now we've got it. Amazing and viscerally chilling - seeing the net effect - but let's not be intimidated so easily or prop these guys up as gods just for being a bit ahead of the obviously-accelerating curve. Anyone tracking this stuff had a very strong prediction of good AI video within a year - two max. This was a context size increase and overall impressive quality pass reaching a new milestone, but the bones were there.
Do you feel the same way about modern movies? CGI is so ubiquitous and accessible, that most movies use some form of it. It's actually news when a filmmaker _doesn't_ use CGI (e.g. Nolan).
These advancements are just the next step in that evolution. The tech used in movies will be commoditized, and you'll see Hollywood-style production in YouTube videos.
I'm not sure why you think theater will become _more_ popular because of this. It has remained popular throughout the years, as technology comes and goes. People can enjoy both video and theater, no?
I agree, seeing real human actors on stage will always be popular for some consumers. Same for local live musicians.
That said, I helped a friend who makes low budget, edgy and cool films last week. I showed him what I knew about driving Pika.art and he picked it up quickly. He is very excited about the possibility of being able to write more stories and turn them into films.
I think there is plenty of demand for all kinds of entertainment. It is sad that so many creative people in Hollywood and other content creation centers will lose jobs. I think the very best people will be employed, but often partnered with AIs. Off topic, but I have been a paid AI practitioner since 1982, and the breakthroughs of deep learning, transformers, and LLMs are stunning.
I actually suspect one of the new most popular mediums will be actors on a theatre stage doing live performances to a live AI CGI video being rendered behind them - similar to musicians in a live orchestra. It would bring together the nostalgia and wonder of human acting and performance art, while still smoothing and enhancing their live performance into the quality levels and wonder we've come to expect from movie theatre experiences. This will be technologically doable soon.
No it's not. Imagine turning on the television when you get home and it's a show all about you (think Breaking Bad, but you're Walter White). You flip to another channel and it's a pornographic movie where you sleep with all the world's most famous movie stars. Flip the channel again and it's all the home movies you wish you had but were never able to make.
This is a future we could once only dream of, and OpenAI is making it possible. Has anyone noticed how anti-progress HN has become lately?
I guess it depends on your definition of progress. None of those examples you listed sound particularly appealing to me. I've never watched a show and thought I'd get more enjoyment if I was at the center of that story. Porn and dating apps have created such unrealistic expectations of sex and relationships that we're already seeing the effects in younger generations. I can only imagine what on-demand fully generative porn will have on issues like porn addiction.
Not to say I don't have some level of excitement about the tech, but I don't think it's unwarranted pessimism to look at this stuff and worry about its darker implications.
> You flip to another channel and it's a pornographic movie where you sleep with all the world's most famous movie stars.
This is not only dystopian, it's just sad. All these look taken from the first seasons of Black Mirror. I don't know what you think progress is but AI porno and ads are not.
This might be more revealing of you than of people in general. Even when I play tabletop RPGs, a place I could _easily_ play a version of myself, I almost never do. There's nothing wrong with doing so, but most people don't.
That seems depressingly solipsistic. I think part of the appeal of art is that it's other humans trying to communicate with you, that you feel the personality of the creators shining through.
Also I've never interacted with any piece of art or entertainment and thought to myself "this is neat and all, but it would be much improved if this were entirely about me, with me as the protagonist." One watches Breaking Bad because Walter White is an interesting character; he's a man who falls into a life of crime initially for understandable reasons, but as the series goes on it becomes increasingly clear that he is lying to himself about his motivations and that his primary motivation for his escalating criminal life is his deep-seated frustration at the mediocrity of his life. More than anything else, he craves being important. The unraveling of his motivations and where they come from is the story, and that's something you can't really do when you're literally watching yourself shoehorned into a fictional setting.
You seem to regard it as self-evident that art or entertainment would be improved if (1) it's all about you personally and (2) involvement of other real humans is reduced to zero, but I cannot fathom why you would think that (with the exception of the porn example).
At its peak, inflation-adjusted vinyl sales were $1.4 billion, in 1979.
Then fast-forward to the lowest point: $3.4 million in 2009.
So vinyl has been "so popular" that it grew back to $8.5m by 2021.
That is just nostalgia, not a cultural change pushed by the dystopia of AI.
Why is my 14 year old niece now collecting vinyl? I can guarantee it's not nostalgia. There's obviously more at play there even when acknowledging your point about relative market size.
But things can coexist. It's now easier to create music than ever, and there is more music created by more artists than ever. Most music is forgettable and just streamed as background music. But there is also room for superstars like Taylor Swift.
This has to be it. Vinyl costs like $20 per record, and $8m is about 400k vinyl sales (users often buy more than one, so it's a lot fewer users), which seems too low globally. At $1.2b, it is more like 60m sales, which seems more reasonable.
I think a lot of people collect vinyl less for nostalgia reasons and more so to have a physical collection of their music. I think vinyl wins over CDs just due to how it’s larger and the cover art often looks better as a result.
Obviously incredibly cool, but it seems that people are incredibly overstating the applications of this.
Realistically, how do you fit this into a movie, a TV show, or a game? You write a text prompt, get a scene, and then everything is gone—the characters, props, rooms, buildings, environments, etc. won’t carry over to the next prompt.
You could use it for stuff like wide shots, close ups, random CG shots, rapid cut shots, stuff where you just cut to it once and don't need multiple angles
To me it seem most useful for advertising where a lot of times they only show something once, like a montage
I could arrange it in FrameForge 3D shot by shot, even adjusting for motion in between, then export to an AI solution. That, to me, would be everything. Of course, then come the issues of consistency, adjustments and tweaks, etc.
I also see advertising (especially lower-budget productions, such as dropshipping or local TV commercials) being early adopters of this technology once businesses have access to this at an affordable price.
It generates up to 1 minute videos which is like what all the kids are watching on TikTok and YouTube Shorts, right? And most ads are shorter than 1 minute.
A few months ago ai generated videos of people getting arrested for wearing big boots went viral on TikTok. I think this sort of silly "interdimensional cable" stuff will be really big on these short form video type sites once this level of quality becomes available to everyone.
It also seems hard to control exactly what you get. Like you'd want a specific pan, focus etc. to realize your vision. The examples here look good, but they aren't very specific.
But it was the same with Dall-E and others in the beginning, and there's now lots of ways to control image generators. Same will probably happen here. This was a huge leap just in how coherent the frames are.
What came to mind is what is right around the corner: you create segments and stitch them together.
"ok, continue from the context on the last scene. Great. Ok, move the bookshelf. I want that cat to be more furry. Cool. Save this as scene 34."
As clip sizes grow and context can be inferred from a previous scene, and a library of scenes can be made, boom, you can now create full feature length films, easy enough that elementary school kids will be able to craft up their imaginations.
It could also fill in for background videos in scenes, instead of getting real content they'd have to pay for, or making their own. The gangster movie Kevin was watching in Home Alone was specifically shot for that movie, from what I remember.
> You write a text prompt, get a scene, and then everything is gone—the characters, props, rooms, buildings, environments, etc. won’t carry over to the next prompt.
Sure, you can't use the text-to-video frontend for that purpose. But if you've got a t2v model as good as Sora clearly is, you've got the infrastructure for a lot more, as the ecosystem around the open-source models in the space has shown. The same techniques that allow character, object, etc., consistency in text-to-image models can be applied to text-to-video models.
Nah just fine-tune the model to a specific set of characters or aesthetic. It's not hard, already done with SDXL LoRAs. You can definitely generate a whole movie from just a storyboard.. if not now, then in maybe five yrs.
Script => Video baseline. Take a frame of any character/prop/room/etc you want to remain consistent, and one shitty photoshop and it's part of the new scene.
Incredibly overstated. That's an incredible lack of imagination, buddy. Or even just of basic craftsmanship.
People here seem mostly impressed by the high resolution of these examples.
Based on my experience doing research on Stable Diffusion, scaling up the resolution is the conceptually easy part that only requires larger models and more high-resolution training data.
The hard part is semantic alignment with the prompt. Attempts to scale Stable Diffusion, like SDXL, have resulted only in marginally better prompt understanding (likely due to the continued reliance on CLIP prompt embeddings).
So, the key question here is how well Sora does prompt alignment.
There needs to be an updated CLIP-like model in the open-source community. The model is almost three years old now and is still the backbone of a lot of multimodal models. It's not a sexy problem to take on since it isn't especially useful in and of itself, but so many downstream foundation models (LLaVA, etc.) would benefit immensely from it. Is there anything out there that I'm just not aware of, other than SigLIP?
I think one part of the problem is using English (or whatever natural language) for the prompts/training. Too much inherent ambiguity. I’m interested to see what tools (like control nets with SD) are developed to overcome this.
If I understand trial law correctly, the rules of evidence already prohibit introducing a video at trial without proving where it came from (for example, testimony from a security guard that a given video came from a given security camera).
But social media has no rules of evidence. Already I see AI-generated images as illustrations on many conspiracy theory posts. People's resistance to believing images and videos from sketchy sources is going to have to increase very fast (especially for images and videos that they agree with).
All the more reason why we need to rely on the courts and not the mob justice (in the social sense) which has become popular over the last several years.
Nothing will change. Confirmation bias junkies already accept far worse fakes. People who use trusted sources will continue doing so. Bumping the quantity/quality of fabricated horseshit won't move the needle.
Wow. If I saw this clip a year ago I wouldn't think, "The image generator fucked up," I'd just think that a CG effects artist deliberately tweaked an existing real-world video.
- Disruptions like this happen to every industry every now and then. Just not on the level of "Communicating with people with words, and pictures". Anduril and SpaceX disrupted defense contractors and United Launch Alliance; Someone working for a defense contractor/ULA here affected by that might attest to the feeling?
- There will be plenty of opportunity to innovate. Industries are being created right now. People probably also felt the same way when they saw HTTP on their screens the first time. So don't think your career or life's worth of work is minuscule, it's just a moving target. Adapt & learn.
- Devil is in the details. When a bunch of large SaaS behemoths created Enterprise software an army of contractors and consultants grew to support the glue that was ETL. A lot of work remains to be done. It will just be a more imaginative glue.
I would be willing to bet $10,000 that the average person's life will not be changed in any significant way by this technology in the next 10 years. Will there be some VFX disruption in Hollywood and games? Sure, maybe some. It's not a cure for cancer. It's not AGI. It's not earth shattering. It is fun and interesting though.
Most of the responses in this thread remind me of why I don't typically go into the comment section of these announcements. It's way too easy to fall into the trap set by the doomsday-predicting armchair experts, who make it sound like we're on the brink of some apocalypse. But anyone attempting to predict the future right now is wasting time at best, or intentionally fear mongering at worst.
Sure, for all we know, OpenAI might just drop the AGI bomb on us one day. But wasting time worrying about all the "what ifs" doesn't help anyone.
Like you said, there is so much work out there to be done, _even if_ AGI has been achieved. Not to get sidetracked from your original comment, but I've seen AGI repeatedly mentioned in this thread. It's really all just noise until proven otherwise.
Build, adapt, and learn. So much opportunity is out there.
> But wasting time worrying about all the "what ifs" doesn't help anyone.
Worrying about the what-ifs is all we have as a species. If we don't worry about how to stop global warming, or how we can prevent a nuclear holocaust, these things become far more likely.
If OpenAI drops an AGI bomb on us then there's a good chance that's it for us. From there it will just be a matter of time before a rogue AGI, or a human working with an AGI, causes mass destruction. This is every bit as dangerous as nuclear weapons – if not more dangerous – yet people seem unable to take the matter as seriously as it needs to be taken.
I fear millions of people will need to die or tens of millions will need to be made unemployable before we even begin to start asking the right questions.
Isn't the alternative worse though? We could try to shut Pandora's box and continue to worsen the situation gradually and never start asking the right questions. Isn't that a recipe for even more hardship overall, just spread out a bit more evenly?
It seems like maybe it's time for the devil we don't know.
We live in a golden age. Worldwide poverty is at historic lows. Billions of people don't have to worry about where their next meal is coming from or whether they'll have a roof over their head. Billions of people have access to more knowledge and entertainment options than anyone had 100 years ago.
Staying the course is risking it all. We've built a system of incentives which is asleep at the wheel and heading towards a cliff. If we don't find a different way to coordinate our aggregate behavior--one that acknowledges and avoids existential threats--then this golden age will be a short one.
Maybe. But I'm wary of the argument "we need to lean into the existential threat of AI because of those other existential threats over there that haven't arrived yet but definitely will".
It all depends on what exactly you mean by those other threats, of course. I'm a natural pessimist and I see threats everywhere, but I've also learned I can overestimate them. I've been worried about nuclear proliferation for the last 40 years, and I'm more worried about it than ever, but we haven't had another nuclear war yet.
"Sora serves as a foundation for models that can understand and simulate the real world, a capability we believe will be an important milestone for achieving AGI."
This also helps explain why the model is so good since it is trained to simulate the real world, as opposed to imitate the pixels.
More importantly, its capabilities suggest AGI and general robotics could be closer than many think (even though some key weaknesses remain and further improvements are necessary before the goal is reached.)
EDIT: I just saw this relevant comment by an expert at Nvidia:
“If you think OpenAI Sora is a creative toy like DALLE, ... think again. Sora is a data-driven physics engine. It is a simulation of many worlds, real or fantastical. The simulator learns intricate rendering, "intuitive" physics, long-horizon reasoning, and semantic grounding, all by some denoising and gradient maths.
I won't be surprised if Sora is trained on lots of synthetic data using Unreal Engine 5. It has to be!
Let's break down the following video. Prompt: "Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee." ….”
I was impressed with their video of a drone race on Mars during a sunset. In part of the video, the sun is in view, but then the camera turns so it’s out of view. When the camera turns back, the sun is where it’s supposed to be.
There's mention of memory in the post — the model can remember where it put objects for a short while, so if it pans away and pans back it should keep that object "permanence".
Well the video in the weaknesses section with the archeologists makes me think it's not just predicting pixels. The fact that a second chair spawns out of nothing looks like a typical AI uncanny valley mistake you'd expect, but then it starts hovering which looks more like a video game physics glitch than an incorrect interpretation of pixels on screen.
I think it's just inherent to the problem space. Obviously it understands something about the world to be able to generate convincing depictions of it.
Just having a better or bigger model? Better training data, better feedback process, etc.
Seems more likely than "it can simulate reality".
Also I take anecdotal reviews like that with a grain of salt. I follow numerous AI groups on Reddit and elsewhere and many users seem to have strong opinions that their tool of choice is the best. These reviews are highly biased.
Not to say I'm not impressed, but it's just been released.
Others have provided explanations for things like object persistence, for example keeping a memory of the rendering outside of the frame.
The comment from the expert is definitely interesting and compelling, but clearly still speculation based on the following comment.
> I won't be surprised if Sora is trained on lots of synthetic data using Unreal Engine 5. It has to be!
I like the speculation though, the comments provide some convincing explanations for how this might work. For example, the idea that it is trained using synthetic 3-dimensional data from something like UE5 seems like a brilliant idea. I love it.
Also in his example video the physics look very wrong to me. The movement of the coffee waves are realistic-ish at best. The boat motion also looks wrong and doesn't match up with the liquid much of the time.
I think you are reading too far into this. The title of the technical paper is “ Video generation models as world simulators”.
This is “just” a transformer that takes in a sequence of noisy image (video frame) tokens + prompt, and produces a sequence of less noisy video tokens. Repeat until noise gone.
The point they’re making, which is totally valid, is that in order for such a model to produce videos with realistic physics, the underlying model is forced to learn a model of physics (a “world simulation”).
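The loop described above can be sketched in a few lines. This is a toy illustration of the iterative-denoising structure only, with a hypothetical `denoise_step` standing in for a diffusion-transformer forward pass; none of the names or numbers come from OpenAI's report.

```python
import random

def denoise_step(noisy_tokens, prompt, t):
    # Hypothetical stand-in for the model's forward pass: in a real
    # diffusion transformer this would be a learned network predicting
    # a slightly less noisy version of the video tokens, conditioned
    # on the prompt. Here we just nudge tokens toward a fixed
    # prompt-derived target to show the loop's shape.
    target = [hash((prompt, i)) % 100 / 100 for i in range(len(noisy_tokens))]
    alpha = 1.0 / t  # later steps (small t) apply stronger correction
    return [(1 - alpha) * x + alpha * g for x, g in zip(noisy_tokens, target)]

def generate(prompt, n_tokens=8, steps=50):
    # Start from pure noise and repeatedly denoise until the noise is gone.
    tokens = [random.random() for _ in range(n_tokens)]
    for t in range(steps, 0, -1):
        tokens = denoise_step(tokens, prompt, t)
    return tokens
```

The "world model" claim is that nothing in this loop mentions physics; any physics the outputs obey has to be encoded in the learned denoiser itself.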
AlphaGo and AlphaZero were able to achieve superhuman performance due to the availability of perfect simulators for the game of Go. There is no such simulator for the real world we live in. (Although pure LLMs sorta learn a rough, abstract representation of the world as perceived by humans.) Sora is an attempt to build such a simulator using deep learning.
This actually affirms my comment above.
“Our results suggest that scaling video generation models is a promising path towards building general purpose simulators of the physical world.”
> since it is trained to simulate the real world, as opposed to imitate the pixels.
It's not that it's learning a model of the world instead of imitating pixels - the world model is just a necessary emergent phenomenon from the pixel imitation. It's still really impressive and very useful, but it's still 'pixel imitation'.
What I want is an AI trained to simulate the human body, allowing scientists to perform artificial human trials on all kind of medicines. Cutting trial times from years to months.
Movie making is going to become fine-tuning these foundational video models. For example, if you want Brad Pitt in your movie you'll need to use his data to fine-tune his character.
Pretty sure many latent spaces are not trained to represent 3D motions and some detailed physics of the real world. Those in pure text LLMs, for example.
Wow, some of those shots are so close to being unnoticeable. That one of the eye close up is insane.
It’s interesting reading all the comments, I think both sides to the “we should be scared” are right in some sense.
These models currently give some sort of super power to experts in a lot of digital fields. I’m able to automate the mundane parts of coding and push out fun projects a lot easier today. Does it replace my work, no. Will it keep getting better, of course!
People who are willing to build will have a greater ability to output great things. On the flip side, larger companies will also have the ability to automate some parts of their business - leading to job loss.
At some point, my view is that this must keep advancing to some sort of AGI. Maybe it’s us connecting our brains to LLMs through a tool like Neuralink. Maybe it’s a random occurrence when you keep creating things like Sora. Who knows. It seems inevitable though doesn’t it?
One of things I've loved about HN was the quality of comments. Whether broad or arcane, you had experts the world over who would tear the topic apart with data and a healthy dose of cynicism. I frequently learned more from the debate and critique than I did from the "news" itself.
I don't know what it is about AI and the current state of tech, but the discourse as of late has really taken a nosedive. I'm not saying that any of this conjecture won't happen, but the acceleration towards fervor and fear mongering on the subject is bordering on religiosity - seriously, it makes crypto bros look good.
And yeah -- looks like some cool new tech from OpenAI, and excited when I can actually dig in. Would also love it if I could hire their marketing department.
Many people here have a lucrative career in traditional fields, big tech, etc.
Working in those fields is good. Building "products" is good (even if that only means optimizing conversion rates and pushing ads). Doing well in the traditional financial sense (stocks and USD) is good.
This is insane. Even though there are open-source models, I think this is too dangerous to release to the public. If someone would've uploaded that Tokyo video to youtube, and told me it was a drone.. I would've believed them.
All "proof" we have can be contested or fabricated.
"Proof" for thousands of years was whatever was written down, and that was even easier to forge.
There was a brief time (maybe 100 years at the most) where photos and videos were practically proof of something happening; that is coming to an end now, but that's just a regression to the mean, not new territory.
Hmmm. Actually I think I finally figured out why I dislike this argument, so thank you.
The important number here isn't the total years something has been true, when talking about something with sociocultural momentum, like the expectation that a recording/video is truthful.
Instead, the important number seems to me to be the total number of lived human years where the thing has been true. In the case of reliable recordings, the last hundred years with billions of humans has a lot more cultural weight than the thousands of preceding years by virtue of there having been far more human years lived with than without the expectation.
That's a false metric. With exponential progress, we have to adjust equally rapidly. It's quite obvious that photos and videos would last far shorter than written medium as proof of something.
Photos have never been a fundamental proof if the stakes are high or you have an idling censorship institution. The Soviets (and maybe others, I just happen to know only about them) successfully edited photos and then mass-reproduced them.
This changes nothing about "proof" (i.e. "evidence", here). Authenticity is determined by trust in the source institution(s), independent verification, chains of evidence, etc. Belief is about people, not technology. Always was, always will be. Fraud is older than Photoshop, than the first impersonation, than perhaps civilization. The sky is not falling here. Always remember: fidelity and belief aren't synonyms.
Scale matters. This will allow unprecedented scale of producing fabricated video. You're right about evidence, but it doesn't need to hold up in court to do a lot of damage.
No, it doesn't. You cannot scale your way into posting from the official New York Times account, or needing valid government ID to comment, or whatever else contextually suggests content legitimacy. Abusing scale is an ancient exploit, with myriad antidotes. Ditto for producing realistic fakes. Baddies combining the two isn't new, or cause for panic. We'll be fine.
Your entire argument that scale doesn't matter rests on the notion that legitimacy needs to be signalled at all to fool people. It doesn't. It just needs to appeal to people's biases, create social chaos through word of mouth. Also, all you need to get posted on the NY times "account" is to fool some journalists. Scale can help there too by creating so much misinformation it becomes hard to find real information.
Scale definitely matters when that's what you're doing. In fact I challenge you to find any physical or social phenomenon where scale doesn't matter.
If read aloud, no one could guess if your comment came from 2024 or 2017. There is zero barrier between you and using trusted sources, or endlessly consuming whatever fantasy bullshit supports your biases. That has not, and will not, change.
> All "proof" we have can be contested or fabricated.
This has been the case for a while now already, it's better that we just rip off the bandaid and everyone should become a skeptic. Standards for evidence will need to rise.
That's interesting. It made me think of a potential feature for upcoming cameras that essentially cryptographically sign their videos. If this became a real issue in the future, I could see Apple introducing it in a new model. "Now you can show you really did take that trip to Paris. When you send a message to a friend that contains a video that you shot on iPhone, they will see it in a gold bubble."
Weird hallucination artifacts are still giving it all away. Look closely at the train and viaduct rendering, and you can't unsee windows morphing into each other.
We give too much credit to ordinary people. All these bleeding-edge advancements in AI, code, databases, and technology are things a user on HNews would be aware of. However, most peers in regular jobs, parents, children, et al., would be susceptible to being fooled on social media. They're not going to say... "hmm, let me fact-check and see if the sources are correct and that this wasn't created by AI."
They'll simply see an inflammatory tweet from their leader on Twitter.
They're not going to fact check, they're simply going to think "huh, could be AI" and that will change the way we absorb and process information. It already has. And when we really need to know something and can't afford to be wrong, we'll seek out high trust sources. Just like we do now, but more so.
And of course some large cross section of people will continue to be duped idiots.
Most people don't even know what AI is. I've had to educate my parents that the technology to clone not only my voice but also my face exists. Pair that with number spoofing, and you have a recipe for disaster to scam people.
This is what lots of folks said about image generation. Which is now in many ways “solved”. And society has easily adapted to it. The same will happen with video generation.
The reality is that people are a lot more resourceful / smarter than a lot of us think. And the ones who aren’t have been fooled long before this tech came around.
In what ways has image generation been solved? Prompt blocking is about the only real effort I can think of, which will mean nothing once open source models reach the same fidelity.
And I guess you haven't actually been to Tokyo, the number of details which are subtly wrong is actually very high, and it isn't limited to text, heck detecting those flaws isn't even limited by knowledge of Japan:
- Uncanny texture and shape for the manhole cover
- Weirdly protruding yellow line in the middle of the road, where it doesn't make sense
- Weird double side-curb on the right, which can't really be called steps.
- Very strange gait for the "protagonist", with the occasional leg swap.
- Not quite sensical geometry for the crosswalks, some of them leading nowhere (into the wet road, but not continuing further)
- Weird glowy inside behind the columns on the right.
- What was previously a crosswalk, becoming wet "streaks" on the road.
- No good reason for crosswalks being the thing visible in the reflection of the sunglasses.
- Absurd crosswalk orientation at the end. (90 degrees off)
- Massive difference in lighting between the beginning of the clip and the end, suggesting an impossible change in time of day.
Nothing suggests to me that these are easy artifacts to remove, given how the technology is described as "denoising" changes between frames.
This is probably disruptive to some forms of video production, but the high-end stuff I suspect will still use filming mostly ground in truth, this could highly impact how VFX and post-production is done, maybe.
With everything we've seen in the last couple years, do you sincerely believe that all of those points won't be solved pretty soon? There are many intermediary models that can be used to remove these kind of artefacts. Human motion can be identified and run through a pose/control-net filter, for example. If these generations are effectively one-shot without subsequent domain-specific adjustments, then we should expect for every single one of your identified flaws to be remedied pretty soon.
the world is getting increasingly surveilled as well, I guess the presumption is that eventually you'll just be able to cross reference a 'verified' recording of the scene against whatever media exists.
"We ran the vid against the nationally-ran Japanese scanners, turns out that there are no streets that look like this, nor individuals."
in other words I think that the sudden leap of usable AI into real life is going to cause another similar leap towards non-human verification of assets and media.
all the news you see has zero proof unless you see it yourself; you just have to have a sense of whether it's real based on a consensus or the trustworthiness of a reporter/outlet.
The UA war is real, most likely, but I haven't seen it with my own eyes, nor did most people, but maybe they have relatives/friends saying it, and they are not likely to lie. Stuff like that.
AI will eventually be capable of performing most of the tasks humans can do. My neighbor's child is only 6 years old now. What advice do you think I should give to his parents to develop their child in a way that avoids him growing up to find that AI can do everything better than he can?
If you want an honest answer you should tell the parents to vote for politicians prepared to launch missile strikes on data centers to secure their child's future.
People who are worried purely about employment here are completely missing the larger risks.
Realistically his child is going to be unemployable and will therefore either starve or be dependent on some kind of government UBI policy. However, UBI is completely unworkable in an AI world because it assumes that AI companies won't just relocate to where they don't need to pay tax, and that we as citizens will have any power over the democratic process in a world where we're economically and physically worthless.
Assuming UBI happens and the child doesn't starve to death, if the government later decides to cut UBI payments after receiving large bribes from AI companies, what would people do? They can't strike, so I guess they'll need to try to overthrow the government in a world with AI surveillance tech and policing.
Realistically humans in the future are going to have no power, and worse still, in a world of UBI, the fewer people leeching from the government, the more resources there are for those with power. The more you can kill, the more you earn.
And I'm just focusing on how we deal with the unemployment risks here. There's also the risk that AI will be used to create biological weapons. The risk of us creating a rogue superintelligent AGI. The risk of horrific AI applications like mind-reading.
Assuming this parent loves their child they should be doing everything in their power to demand progress in AI is halted before it's too late.
Way too much certainty, bud. And too much deference to the AI Company Gods.
As utterly impressive as this is - unless they have perfect information security on every level this technique and training will be disseminated and used by copious competitors, especially in the open source community. It will be used to improve technology worldwide, creating ridiculously powerful devices that we can own, improving our own individual skills similarly ridiculously.
Sure, the market for those skills dries up just as fast - because what's the point when there's ubiquitous intelligence on tap - but it still leaves a population of AI-augmented superhumans, even if that just means AIs using our phones optimally for us. What we're about to be capable of compared to 5 years ago is going to be staggering. Establishing independent sources to meet basic needs and networks of trust are just no-brainers.
Sure, we'll always be outclassed by the very best - and they will continue to hold the ability to utterly obliterate the world population if they so wished to - but we as basic consumer humans are about to become more powerful in absolute terms than entire nations historically. (Or rather, our AIs will be, but til they rebel - this is more of a pokemon sort of situation)
If you're worried, get to working on making sure these tools remain accessible and trustworthy on the base level to everyone. And start building ways to meet basic needs so nobody can casually take those away from your community.
This won't be halted. And attempting to halt would create a centralized censorship authority ensuring the everyman will never have innate access to this tech. Dead end road that ends in a much worse dystopia.
> As utterly impressive as this is - unless they have perfect information security on every level this technique and training will be disseminated and used by copious competitors, especially in the open source community. It will be used to improve technology worldwide, creating ridiculously powerful devices that we can own, improving our own individual skills similarly ridiculously.
You're wrong, it's not your "individual skills". If I hire you to do work for me, you're not improving my individual skills. I am not more employable as a result of me outsourcing my labour to you, I am less employable. Anyone who wants something done would go to you directly; there's no need to do business through me.
This is why you won't be employable because the same applies to AI – why would I ask you to ask an AI to complete a task when I can just ask the AI myself?
The end result here is that only the people with access to AI at scale will be able to do anything. You might have access to the AI, but you can't create resources with a chatbot on your computer. Only someone who can afford an army of machines powered by AI can do this. Any manufacturing problem, any amount of agricultural work, any service job – these can all be done by those with resources, independently of any human labourers.
At best you might be able to prompt an AI to do service work for you, but again, if anyone can do this, you'd have to question why anyone would ask you to do it for them. If I want to know the answer to 13412321 * 1232132, I don't ask a calculator prompter, I just find the answer myself. The same is true of AI. Your labour is worthless. You are less than worthless.
> If you're worried, get to working on making sure these tools remain accessible and trustworthy on the base level to everyone. And start building ways to meet basic needs so nobody can casually take those away from your community.
You cannot make it accessible. Again, how are we all going to have access to manufacturing plants armed with AIs? The only thing you can make accessible is service jobs and these are the easiest to replace.
> This won't be halted.
Not saying it will, but the reason for that is that there's still people like yourself who believe you have some value as an AI prompter.
We have two options – destroy AI data centers, or become AIs ourselves. With the former being by far the option with better odds.
I hold this view with high certainty and I hold few opinions with high certainty. I'm aware people disagree strongly with my perspective, but I truly believe they are wrong, and their wrong opinions are risking our future.
Again, your problem is seeing the rich capital dominated business market as the only market.
There's an inherent market your skills will always be useful to: yourself. Base survival, maintaining your home, caring for family and friends, improving quality of life - there's plenty of demand there and work to do. The cost to deliver that demand will demonstrably be far lower than it ever has been with these new tools. Would you be able to hire that labor out to corporate AIs for even cheaper in absolute costs due to the benefits of mass production? Sure. But providing these things is a job for you too and it's "free" with just a bit of time and effort.
Tinkering with open source tools to assemble your first robot kit out of older hardware and 3D printed materials is not going to be prohibitively expensive. The cost to train it - probably not either, if the massive efficiencies we keep finding in models keep lowering and the community keeps sharing model tweaks. Make one robot with good enough dexterity and your second bot is a hell of a lot easier to make. These aren't going to take some ridiculously unheard-of materials or manufacturing processes. In fact, cheap AI chip alternatives to GPUs can be built on decades-old architectures designed to just maximize matrix multiplication with much simpler manufacturing. Monopolizing scarcities here isn't a sure bet. We've just been waiting for a good general-purpose brain. We have it now - and every bit of information we expose it to, the easier it gets to do anything with it.
Unless the big fancy AI wielders are coming for you with killer drones by then, this is all stuff people are going to be well-capable of while unemployed and living off food stamps, savings, or remortgaged houses. If they don't have the skills personally, they'll turn to friends and family who do and find mutual tribal support in tough times as people always do. Growing your own food, building your own infrastructure - all have been doable for a while, but are about to get stupidly easy with a few bots and helpful AI guidance. Normal humanity will carry on and pick up the pieces just fine in this new Dark Age, even as the corporates take the open field opportunity to chase for riches beyond our comprehension, mining asteroids and claiming the solar system.
Now imagine if those greedy corporates happened to just throw the rest of us a bone - 1% of their exponentially-increasing profits - as a PR gesture. Still would soon become far more wealth in absolute terms than the common people have ever seen in the history of earth.
If you think none of that is going to happen, then the alternative is a lot closer to the first people with AGI simply scouring the earth in a paranoid culling. Sure, it's entirely possible. But it takes a certain Next Level of Evil to make that happen.
And all that aside - if you really want to play up the capitalist dystopia angle, there's still plenty of individual value to be mined from people via a wage. Memory and preference mining, medical testing, AI fidelity comparison - plenty of reasons to pay people a little bit to steal what's left of their souls for even further improvement of AI. Might be enough for them to afford their first robots, even.
But by all means - go destroy corporate AI data centers if you think you can get away with it. Anything to tip the scales towards public / open source AI keeping up. But this tech is not going away, nor should it. It could very well result in unprecedented abundance for all, so long as things don't go ridiculously extremist.
Exactly, money is only useful for the exchange of resources. It's the resources we actually want.
In a world of AI those with access to AI can have all the resources they want. Why would they earn money to buy things? Who would they even be buying from? It wouldn't be human labourers.
Dude, too pessimistic, next gen won’t be totally unemployable. Lots of professions up for grabs: roofer (they ain’t sending expensive robots there), anything to do with massage, sex work, anything to do with sports and performance so boxing, theater, Opera singing, live performance, dancing, military (will always need cheap flesh boots on ground), also care in elder facility for aging population, therapist (people still prefer interacting with a human), entertainer, maid cafe employee…
Perhaps we will finally reconnect with each other and quit the virtual life, as everything in the virtual world will be managed by and for other AIs, with humans unable to do anything but consume their content
> Dude, too pessimistic, next gen won’t be totally unemployable.
For what it's worth I agree with you, just with very low confidence.
My real issue, and reason I don't hide my alarmism on this subject is that I have low confidence on the timelines, but high confidence on the ultimate outcomes.
Let's assume you're right. If AI simply causes ~10%-20% of middle class workers to fall into the lower class as you suggest then I'd agree it won't be the end of the world. But if the optimistic outcome here is the near-term people won't be "totally unemployable" because people who lose their jobs can always join the working class then I'd still rather bomb the data centers.
If we're a little more aggressive and assume 50% of the middle class will lose their jobs in the next 10-20 years then in my opinion this is not as easy as just reskilling people to do manual labour.
Firstly, you're just assuming that all these middle class workers are going to be happy with being forced into the lower class – they won't be and again this isn't a desirable outcome.
You're also not considering the fact that this huge influx of labour competing for these crappy manual labour jobs will make them even less desirable than they already are. I keep hearing people say how they're going to reskill as a plumber / electrician when AI takes their job, as if there is an endless demand for these workers. Horses still have some niche uses, but for the most part they're useless. This is far more likely to be the future of human labour. Even if plumbers are one of the few jobs humans will be able to do in a post-AI world, the supply of them will almost certainly far exceed demand. The end result of this excess supply is that plumbers are going to be paid crap and mostly be unemployed.
I think you're also underestimating how fast fields like robotics could advance with AI. The primary reason robotics suck is because of a lack of intelligence. We can build physically flexible machines that have decent battery lives already – Spot as an example. The issue is more that we can't currently use them for much because they're not intelligent enough to solve useful problems. At best we can code / train them to solve very niche problems. This could change rapidly in the coming years as AI advances.
Even the optimistic outcomes here are god awful, and the ultimate risks compound with time.
We either stop the AI or we become the AI. That's the decision we have to make this decade. If we don't we should assume we will be replaced with time. If I'm correct I feel we should be alarmist. If I am wrong, then I'd love for someone to convince me that humans are special and irreplaceable.
People will just join the military ranks. We will need a ton of meat for upcoming WW3. This will solve the unemployment issue. Also, no need to “bomb data centers”, Russia will use EMP weapon for that.
I'm sure people felt similarly when the first sewing machines were invented. And of course, sewing machines did completely irreversibly change the course of humanity and altered (and even destroyed) many lives. But ultimately, most humans managed, and -- in the end (though that end may be farther away than our own lifetimes) -- benefited.
I'm not sure you're actually under-estimating the impact of this AI meteor that's currently hitting humanity, because it is a huge impact. But I think you're grossly under-estimating the vastness of human endeavors, ingenuity, and resilience. Ultimately we're still talking about the bottom falling out of the creative arts: storytelling, images, movies, even porn -- all of that is about to be incredibly easy to create mediocre versions of. Anyone who thrived on making mediocre art, and anyone who thrived second-hand on that industry, is going to have a very bad time. And that's a lot of people, and it's awful. But we're talking about a complete shift in the creative industries in a world where most people drive trucks and work in restaurants or retail. Yes, many of those industries may also get replaced by AI one day, and rapidly at that, but not by ChatGPT or Sora.
Of course you're right that our near future may suddenly be an AI company hegemony, replacing the current tech hegemony, which replaced the physical retail hegemony, which replaced the manufacturing hegemony, which replaced the railway hegemony, which replaced the slave-owning plantation hegemony, which replaced the guilds hegemony, which replaced the ...
You're also under-estimating how much business can actually be relocated outside the U.S., and also how much revolution can be wrought by a completely disenfranchised generation.
I get really surprised when seemingly rational people compare AGI to sewing machines and cars. Is it just an instinct to look for some historic analogy, regardless of its relevance?
I am absolutely not comparing AGI to sewing machines and cars. I am comparing ChatGPT and Sora to sewing machines and cars. My claim is that these are incredibly disruptive technologies to a limited scope. ChatGPT and Sora are closer to sewing machines than they are to AGI. We're nowhere near AGI yet. Remember that the original claim was that all 6-year-olds today will be unemployable. That's a pretty crazy claim IMO.
When machines reduced physical labor, displaced people moved to intellectual and creative jobs; tell me, what kind of work will be left for humans if AI is better at intellectual and creative tasks?
100% agree in principle, but the unfortunate answer to your question is: because the people who already own everything won't allow that to happen. Or, at least, not without a huge fight.
The problem with applying the horse-automobile argument to AI is that this time we don't have anywhere to go. People moved from legwork to handwork to thinking work, and now what? We've pretty much covered all the parts of the body. Unless you like wearing goggles all day, nobody has managed to replicate an attractive person yet, so maybe attractive people will have the edge in the new world where thinking and labour are both valueless.
Humans always seem to find a way to make it work, so I'd tell them to enjoy their younger years and be curious. Lots of beauty in this world, and even with a shit ton of ugly stuff, we somehow make it work and keep advancing forward.
He will be in the same boat as the rest of us. In 12 years I expect the current crop of AI capabilities will have hit maturity. We will all collectively have to figure out what life+AI looks like, just as we have done with life+iPhones.
It will be difficult to keep up proper levels of intelligence and education in humanity, because this time it is not only social media and its mostly negative impacts, but also tons of trash content generated by overhyped tools that will impact lots of people in a bad way. Some have already stopped thinking and instead consult the chat app under the guise of being more productive (whatever that means). Tough times ahead!
It's not his choice. It's the choice of the ruling class as to whether they will share the wealth or live in walled gardens and leave the rest of us in squalor outside the city walls.
It is his (parents') choice in terms of whether he reaches for the tools that are just lying around right there. We can run AI video on consumer hardware at 12fps that is considerably less consistent than this one - but that's just an algorithm and model training away. This is not all just locked up at the top. Anyone can enter this race right now. Sure, you're gonna be 57,000th at the finish line, but you can still run it. And if you're feeling generous, use it to insulate your local community (or the world) from the default forces of capitalism taking their livelihoods.
We'll have to still demand from the ruling class - cuz they'll be capable of ending us with a hand wave, like they always have. But we can build, too.
There's no evidence to suggest what you say is true, so I would tell them to simply go to college or trade school for what they are interested in, then take a deep breath, go outside, and realize that literally nothing has changed except that a few people can create visual mockups more quickly.
AI still can't drive reliably. AI isn't sure if something is correct or not. AI still doesn't really understand anything. You could replace AI with computers in your sentence and it would probably be a very real worry that people shared in 1990. There's always been technology that people are afraid will drastically change things, but ultimately people adapt and the world is usually better off.
Did anyone else feel motion sickness or nausea watching some of these videos? In some of the videos with panning or rotating motion, I felt a nausea-like effect. I guess it's because some details were changing while in motion and I was unable to keep track of or focus on anything in particular.
Yeah, these all made me feel incredibly nauseous. I was trying to figure out what aspect of the motion was triggering this (bad parallax?) but couldn't. The results are impressive but it's still amazing to me how little defects like this can trigger our sense of not just uncanniness but actual sickness.
I do. My hypothesis is that there isn't really good bokeh yet in the videos, and our brains get motion sick trying to decide what to focus on. I.e. too much movement and *too much detail* spread out throughout the frame. Add motion to that and you have a recipe for nausea (at least for now)
You can shoot with a deep depth of field and not cause motion sickness. Aerial videography does that every day, and it's no more difficult in general to parse than looking out an airliner window or at a distant horizon would be.
I suspect GP is closer to on the money here, in suspecting the issue lies with a semblance of movement that isn't like what we see when we look at something a long way away.
I didn't notice such an effect myself, but I also haven't yet inspected the videos in much detail, so I doubt I'd have noticed it in any case.
I think I feel a bit of queasiness but more from the fact that I'm looking at what I recognize as actual humans, and I'm making judgements about what kinds of people they are as I do with any other human, but it's actually not a human. It's not a person that exists.