Hacker News

This is insane. But I'm impressed most of all by the quality of motion. I've quite simply never seen convincing computer-generated motion before. Just look at the way the woolly mammoths connect with the ground; their lumbering mass feels real.

Motion-capture works fine because that's real motion, but every time people try to animate humans and animals, even in big-budget CGI movies, it's always ultimately obviously fake. There are so many subtle things that happen in terms of acceleration and deceleration of all of the different parts of an organism, that no animator ever gets it 100% right. No animation algorithm gets it to a point where it's believable, just where it's "less bad".

But these videos seem to achieve full believability for both people and animals. Which is wild.

And then of course, not to mention that these are entirely believable 3D spaces, with seemingly full object permanence. As opposed to other efforts I've seen which are basically briefly animating a 2D scene to make it seem vaguely 3D.




I disagree. Just look at the legs of the woman in the first video: first she seems to be limping, then the legs rotate. The mammoths are totally uncanny for me, as they're both running and walking at the same time.

Don't get me wrong, it is impressive. But I think many people will be very uncomfortable with such motion very quickly. Same story as the fingers before.


> I think many people will be very uncomfortable with such motion very quickly

So... I think OP's point stands. (impressive, surpasses human/algorithmic animation thus far).

You're also right. There are "tells." But, a tell isn't a tell until we've seen it a few times.

Jaron Lanier makes a point about novel technology. The first gramophone users thought it sounded identical to a live orchestra. When very early films depicted a train coming towards the camera, people fell out of their chairs... blurry black and white, at a super slow frame rate, projected on a bedsheet.

Early 3D animation was mindblowing in the 90s. Now it seems like a marionette show. Well... I suppose there was a time when marionette shows were not campy. They probably looked like magic.

It seems we need some experience before we internalize the tells and it starts to look fake. My own eye for CG images seems to be improving faster than the quality. We're all learning to recognize GPT-generated text. I'm sure these motion captures will look more fake to us soon.

That said... the fact that we're having this discussion proves that what we have here is "novel." We're looking at a breakthrough in motion/animation.

Also, I'm not sure "real" is necessary. For games or film what we need is rich and believable, not real.


> You're also right. There are "tells." But, a tell isn't a tell until we've seen it a few times.

Once you have seen a few you can tell instantly. They all move at 2 keyframes per second, which makes all movements seem alien, and everything in an image moves strangely in sync. The dog moves in slow motion since it needs more keyframes, etc. In that street scene, some people look like they move in slow motion and others don't.

People will quickly learn to notice those issues, they aren't even subtle once you are aware of them, not to mention the disappearing things etc.

And that wouldn't be very easy to fix: they need to train it on keyframes, because training frame by frame is too much.

But that should make this really easy for others to replicate. You just train on keyframes and then train a model to fill in between keyframes, and you get this. It has some limitations as we see with movement keeping the same pace in every video, but there are a lot of cool results from it anyway.
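The two-stage idea described above (generate sparse keyframes, then fill in the in-betweens) can be illustrated with a toy sketch. Everything here is illustrative: a simple linear crossfade stands in for what would actually be a learned interpolation model.

```python
import numpy as np

def interpolate_frames(keyframes, n_between):
    """Insert n_between blended frames between consecutive keyframes.

    keyframes: array of shape (k, H, W), e.g. grayscale frames at 2 fps.
    Returns a longer frame array at the higher effective frame rate.
    A linear crossfade stands in for a learned in-between model.
    """
    out = []
    for a, b in zip(keyframes[:-1], keyframes[1:]):
        out.append(a)
        for i in range(1, n_between + 1):
            t = i / (n_between + 1)          # interpolation position in [0, 1]
            out.append((1 - t) * a + t * b)  # crossfade between keyframes
    out.append(keyframes[-1])
    return np.stack(out)

# Two tiny 2x2 "frames" at 2 fps, upsampled with 11 in-betweens per gap,
# i.e. an effective 24 fps.
keys = np.array([np.zeros((2, 2)), np.ones((2, 2))])
video = interpolate_frames(keys, 11)
print(video.shape)  # (13, 2, 2)
```

A real system would replace the crossfade with a model trained to hallucinate plausible intermediate motion, which is exactly where the "same pace in every video" artifact could creep in.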


I have a friend who has worked on many generations of video compression over the last 20 years. He would rather watch a movie on film without effects than anything on a TV or digital theater. He's trained himself to spot defects and now even with the latest HEVC H.265 he finds it impossible to enjoy. It's artifacts all the way down and the work never ends. At the superbowl he was obsessed with blocking for fast objects, screen edge artifacts, flat field colors, and something with the grass.

Luckily, I think he'll retire sooner than later, and maybe it will get better then.


I think a lot of these issues could be "solved" by lowering the resolution, using a low quality compression algorithm, and trimming clips down to under 10 seconds.

And by solved, I mean they'll create convincing clips that'll be hard for people to dismiss unless they're really looking closely. I think it's only a matter of time until fake video clips lead to real life outrage and violence. This tech is going to be militarized before we know it.
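The degradation recipe above (lower the resolution, compress harshly, trim the clip) maps onto a stock ffmpeg invocation. Filenames and exact settings here are illustrative, not a recommendation:

```shell
# Downscale, recompress lossily, and trim to 10 seconds.
#   scale=640:-2 -> drop resolution (height chosen automatically, kept even)
#   -crf 35      -> high CRF = visibly lossy H.264 output
#   -t 10        -> keep only the first 10 seconds
ffmpeg -i input.mp4 -vf "scale=640:-2" -c:v libx264 -crf 35 -t 10 output.mp4
```

Compression artifacts conveniently mask exactly the fine-grained cues (fingers, texture shimmer, sync issues) that viewers use to spot generated footage.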


Yeah, we are very close to losing video as a source of truth.

I showed these demos to my partner yesterday and she was upset about how real AI has become, how little we will be able to trust what we see in the future. Authoritative sources will be more valuable, but they themselves may struggle to publish only the facts and none of the fiction.

Here's one possible military / political use:

The commander of Russia's Black Sea Fleet, Viktor Sokolov, is widely believed to have been killed by a missile strike on 22 September 2023. https://en.wikipedia.org/wiki/Viktor_Sokolov_(naval_officer)

Russian authorities deny his death and have released proof-of-life footage, which may be doctored or taken before his death. The authoritative source Wikipedia is not much help in establishing truth here, because without proof of death it must default to toeing the official line.

I predict that in the coming months Sokolov (who just yesterday was removed from his post) will re-emerge in the video realm, and go on to have a glorious career. Resurrecting dead heroes is a perfect use of this tech, for states where feeding people lies is preferable to arming them with the truth.

Sokolov may even go on to be the next Russian President.


> Yeah, we are very close to losing video as a source of truth.

I think this way of thinking is misguided. No type of media has ever been a source of truth in itself. Videos have been edited convincingly for a long time, and people can lie about their context or cut them in a way that flips their meaning.

Text is the easiest medium to lie in; you can freely make stuff up as you go, yet we don't say "we cannot trust written text anymore".

Well yeah, duh: you can't trust any type of media just because it is formatted in a certain way. We arrive at the truth by using multiple sources and judging each source's past track record. AI is not going to change how sourcing works. It might be easier to fool people who have no media literacy, but those people have always been a problem for society.


Text was never looked at as a source of truth the way video was. If you messaged someone something, they wouldn't necessarily believe it. But if you sent them a video of that something, they would feel they had no choice but to believe it.

> Well yeah duh, you can trust no type of media just because it is formatted in a certain way

Maybe you wouldn't, but the layperson probably would.

> We arrive at the truth by using multiple sources and judging the sources' track records of the past

Again, this is something that the ideal person would do, not the average layperson. Almost nobody would go through all that to decide whether to believe something or not. Presenting them a video of this something would've been a surefire way to force them to believe it, though, at least before Sora.

> people have always been a problem for society

Unrelated, but I think this attitude is by far the bigger "problem for society". It encourages us to look down on some people even when we do not know their circumstances or reasons, all for an extremely trivial matter. It encourages gatekeeping and hostility, and I think that kind of attitude is at least as detrimental to society as people with no media literacy.


During a significant part of history, text was definitely considered a source of truth, at least to the extent a lot of people see video now. A fancy recommendation letter from a noble would get you far. It makes sense: forging it meant investing a significant amount of effort, and therefore planning the deception. It's a different kind of behavior than just lying on a whim.

But even then, as nowadays, people didn't trust the medium absolutely. The possibility of forgery was real, as it has been with the video, even before generative AI.


To back up this claim, when fictional novels first became a literary format in the Western world, there was immense consternation about the fact that un-true things were being said in text. It actually took a while for authors to start writing in anything besides formats that mimicked non-fictional writing (letters, diary entries, etc.).


> No type of media has ever been a source of truth in itself.

'pics or it didn't happen' has been a thing (possibly) until very recently for good reason.


And they've been doctored almost as long as photography has been around: https://en.wikipedia.org/wiki/Censorship_of_images_in_the_So...


As has been pointed out ad nauseam by now, no one's suggesting that AI unlocks the ability to doctor images; they're suggesting that it makes it trivially easy for anyone, no matter how unskilled, to do so.

I really find this constant back and forth exhausting. It's always the same conversation: '(gen)AI makes it easy to create lots of fake news and disinformation etc.' --> 'but we've always been able to do that. have you guys not heard of photoshop?' --> 'yes, but not on this scale this quickly. can you not see the difference?'

Anyway, my original point was simply to say that a lot of people have (rightly or wrongly) indeed taken photographic evidence seriously, even in the age of photographic manipulation (which as you point out, pretty much coincides with the age of photography itself).


> Videos have been edited convincingly for a long time,

You are right but the thing with this is the speed and ease with which you can generate something completely fake.


> Yeah, we are very close to losing video as a source of truth.

Why have you been trusting videos? The only difference is that the cost will decrease.

Haven't you seen Hollywood movies? CGI has been convincing enough for a decade. Just add some compression and shaky mobile cam and it would be impossible to tell the difference on anything.


Of course, any video could be a fake, it's a question of the cost, and corresponding likelihood of that being the case.


Hell, some people have been doubting the moon landing videos for even longer. Video hasn't been a reliable source since its inception.


The truth is to be found in sources, not in the content itself.

Every piece of information should have "how do you know?" question attached.


> Yeah, we are very close to losing video as a source of truth.

We've been living in a post-truth society for a while now. Thanks to "the algorithm" interacting with basic human behavior, you can find something somewhere that will tell you anything is true. You'll even find a community of people who'll be more than happy to feed your personal echo chamber -- downvoting & blocking any objections and upvoting and encouraging anything that feeds the beast.

And this doesn't just apply to "dumb people" or "the others"; it applies to the very people reading this forum right now. You and me and everybody here live in their safe, sound truth bubble. Don't like what people tell you? Just find somebody or something that will assure you that whatever it is you think, you are thinking the truth. No, everybody else is the asshole who is wrong. Fuck those pond-scum spreaders of "misinformation".

It could be a blog, it could be some AI generated video, it could even be "esteemed" newspapers like the New York Times or NPR. Everybody thinks their truth is the correct one and thanks to the selective power of the internet, we can all believe whatever truth we want. And honestly, at this point, I am suspecting there might not be any kind of ground truth. It's bullshit all the way down.


so where do we go from here? the moon landing was faked, we're ruled by lizard people, and there are microchips in the vaccine. at some level, you can believe what you want to believe, and if the checkout clerk thinks the moon is made of cheese, it makes no difference to me, I still get my groceries. but for things like nuclear fusion, are we actually making progress on it, or is it also a delusion? where the rubber meets the road is how money gets spent on building big projects. is JWST bullshit? is the LHC? ITER? GPS?

we need ground truths for these things to actually function. how else can things work together?


I've always found that take quite ridiculous. Fake videos have existed for a long time. This technology reduces the effort required but if we're talking about state actors that was never an issue to begin with.

People already know that video cannot be taken at face value. Lord of the Rings didn't make anyone believe orcs really exist.


> This technology reduces the effort required

Which is a huge deal. It’s absurd to brush that off.

> People already know that video cannot be taken at face value.

No, no they do not. People don’t even know to not take photos at face value, let alone video.

https://www.forbes.com/sites/mattnovak/2023/03/26/that-viral...


Lord of the Rings had a budget in the high millions and took years to make with a massive advertising campaign.

Riots happen due to out of context video clips. Violence happens due to people seeing grainy phone videos and acting on it immediately. We're reaching a point where these videos can be automatically generated instantly by anyone. If you can't see the difference between anyone with a grudge generating a video that looks realistic enough, and something that requires hundreds of millions of dollars and hundreds of employees to attain similar quality, then you're simply lying.


A key difference in the current trajectory is that it's becoming feasible to generate highly targeted content, down to the individual level. This can also be achieved without state-actor-level resources or the time delays traditionally needed, regardless of budget. The fact that it could also be automated is mildly terrifying.


Coordinated campaigns of hate through the mass media, like kicking up war fever before any major war you care to name, are far more concerning and have already been with us for about a century. Look at WWII and what Hitler was doing with it for the clearest example; propaganda was the name of the game. The techniques haven't gone anywhere.

If anything, making it cheap enough that people have to dismiss video footage might soften the impact. It is interesting how the internet is making it much harder for the mass media to peddle unchallenged lies or slanted perspectives. This tech might counter-intuitively make it harder again.


I have no doubt trust levels will adjust, eventually. The challenge is that this takes a non-trivial amount of time.

It's still an issue with traditional mass media. See basically any political environment where the Murdoch media empire is active. The long tail of (I hate myself for this terminology, but hey, it's HN) 'legacy humans' still votes and has a very real effect on society.


It's funny you mention LotR, because the vast vast vast majority of the character effects were practical (at least in the original trilogy). They were in fact, entirely real, even if they were not true to life.


You can still be enraged by things you know are not real. You can reason about your emotional response, but it's much harder to prevent an emotional response from happening in the first place.


... and learning to prevent emotional responses means unlearning how to be human, like burnt-out people.

The only winning move is to not watch.


You can have an emotional response and still act rationally.


The issue is not even so much generating fake videos as creating plausible deniability. Now everything can be questioned for the pure reason of seeming AI-generated.


Yeah, it looks good at first glance. Also, the fingers are still weird. And I suppose for every somewhat-working vid there were dozens of garbage ones. At least that was my experience with image generation.

I don't believe movie makers will be out of business any time soon. They will have to incorporate it, though. So far this can make convincing background scenery.


> I don't believe, movie makers are out of business any time soon

My son was learning how to play keyboard, and he started practicing with a metronome. At some point I was thinking: why is he learning it at all? We can program which key to press at what point in time, and then the software can play by itself! Why bother?

Then it hit me! Musicians have been able to automate all the instruments with incredible accuracy for a long time. But they never do that. For some reason, they still want a person behind the piano / guitar / drums.


Isn't it obvious? Life is about experiences and enjoyment. All of this tech is fun and novel and interesting, but realistically it's exciting for tech people because it's going to be used to make more computer games, social media posts and advertisements; essentially, it's exciting because it's going to "make money".

Outside of that, people just want to know what it feels like to be able to play their favorite song on guitar and to go skiing etc.

Being perfect at everything would be honestly boring as shit.


I completely agree. There is more to a product than the final result. People who don't play an instrument see music in terms of money. (Hint: there's no money in music.) But those who play know that the pleasure is in the playing, and in jamming with your mates. Recording and selling are work, not pleasure.

This is true for literally every hobby people do for fun. I am learning ceramics. Everything I've ever made could be bought in a shop for a 100th of the cost, and would be 100 times "better". But I enjoy making the pot, and it's worth more to me than some factory item.

Sora will allow a new hobby, and lots of people will have fun with it. Pros will still need to do Pro things. Not everything has to be viewed through the lens of money.


You articulated what I wanted to add to this thread -- thank you!

I play the piano, and even though MIDI exists, I still derive a lot of enjoyment from playing an acoustic instrument.


I like this saying: “The woods would be very silent if no birds sang except those who sang the best.” It's fun learning to play the instrument.


I think it's not. If musicians, and only musicians, wanted a person behind the instrument for its own sake, there should be a market for autogenerated self-playing music machines for their former patrons, who wouldn't care. And that's not the case; the market for ambient sound machines is small. It takes equal or more insanity to have one at home than, say, to have a military armored car in the garage.

On the other hand, you've probably heard of the iPod, which I think I could describe as a device dedicated to giving a false sense of an ever-present musician, so to speak.

So, "they" in "they still want a person behind the piano" is not just limited to hobbyists and enthusiasts. People wants people behind an instrument, for some reason. People pays for others' suffering, not for a thing's peculiarity.


I don't think this is entirely accurate. There are entire genres of music where the audience does not want a person behind the piano/guitar/drums. Plenty of electronic artists have tried the live band gimmick and while it goes down well with a certain segment of the audience, it turns off another segment that doesn't want to hear "humanized" cover versions of the material. But the point is that both of those audiences exist, and they both have lots of opportunity to hear the music they want to hear. The same will be true of visual art created by computers. Some people will prefer a stronger machine element, other people will prefer a stronger human element, and there is room for us all.


> I don't think this is entirely accurate. There are entire genres of music where the audience does not want a person behind the piano/guitar/drums.

Hilariously, nearly every electronic artist I can think of stands in front of a crowd and "plays live" by twisting dials etc., so I think it's fairly accurate.

Carl Cox, Tycho, Aphex Twin, Chemical Brothers, Underworld, to name a few.


DJ performances far outnumber "live" performances in the electronic scene. Perhaps you can cherry-pick certain DJs and make a point that they are creating a new musical composition by live-remixing the tracks they play, but even then a significant number of clubbers don't care, they just want to dance to the music. There are venues where a bunch of the audience can't even see the DJ and they still dance because they are enjoying the music on its own merits.

I stand by my original point. There are plenty of people who really do not care if there is a human somewhere "performing" the music or not. And that's totally fine.


If there is no human performing there, then it's a completely different event, so I actually have little idea what we're debating.


Your reasoning is circular. Humans who go to performances of other humans playing instruments enjoy seeing other humans playing instruments. That should not be surprising. The question is whether humans as a whole intrinsically prefer seeing other humans playing instruments over hearing a "perfect" machine reproduction. And the answer to that question is no. There are plenty of humans who really do prefer the machine reproduction.


If you're still talking about whether people want to hear live covers, or recordings, I think it's an apples to oranges comparison therefore I don't see the point in it.


Why does the DJ need to be there, in such a case?


Mainly to pick songs that fit the mood of the audience. At the moment, humans seem to do a better job "reading" the emotions of other humans in this kind of group setting than computers do, and people are willing to pay for experts who have that skill.

An ML model could probably do a good job at selecting tunes of a particular genre that fit into a pre-defined "journey" that the promoter is trying to construct, so I could see a role for "AI DJs" in the future, especially for low budget parties during unpopular timeslots like first day of a festival while people are still arriving and the crew is still setting up. Some of that is already done by just chucking a smart playlist on shuffle. But then you also have up-and-comer or hobbyist DJs who will play for free in those slots, so maybe there's not really a need for a smarter computer to take over the job.

This whole thread started from the question of why a human should do something when a machine can do it better. And the answer is simple: because humans like to do stuff. It is not because humans doing stuff adds some kind of hand-wavey X factor that other humans intrinsically prefer.


> Musicians could automate all the instruments with incredible accuracy since a long time. But they never do that.

What do you judge was the ratio of automated music (recordings played back) to live music played in the last year?


Just to be clear, I was talking about the original sound being produced by a person (vs. a machine). Of course it was recorded and played back a _lot_ more than folks listened live.

But I take your point; maybe I'm just not so familiar with world music. I was talking more about Indian music. While the music is recorded and mixed across several tracks electronically, I think most of it is originally played (or sung) by a person.


His point still stands.

In the US at least, there's the occasional acoustic song that becomes a hit, but rock music is obviously on its way to slowly attaining jazz status. It and country are really the last genres where traditional instruments are common during live performances. Pop, hip hop, and EDM are basically all put together to be nearly computer-perfect.

All the great producers can play instruments, and that's often times the best way to get a section out initially. But what you hear on Spotify is more and more meticulously put together note by note on a computer after the fact.

Live instruments on stage are now often for spectacle, or worse, a gimmick, and it's not the song people came to love. I think the future will have people like Lionclad[1] in it, pushing what it means to perform live, but I expect them to become fewer and fewer as music gets more complex to produce overall.

[1] https://www.youtube.com/watch?v=MuBas80oGEU


Thankfully, art is not about the least common denominator and I'm confident that there will continue to be music played live as long as humanity exists.


Music has a lot of people who believe that not only is their favorite genre the best but that they must tear down people who don't appreciate it.

You aren't better because you prefer live music, you just have a preference. Music wasn't better some arbitrary number of years ago, you just have a preference.

Nobody said one form is objectively better, just that there is a form that is becoming more popular.

But to state my opinion, I can't imagine something more boring than thinking the best of music, performance, TV, or media in general was done best and created in the past.


It's not that I think my tastes in music are objectively better, it's that I strongly feel that music is a very personal matter for many people and there will be enough people who will seek out different forms of music than what is "popular". Rock, jazz, even classical music, are still alive and well.

> But to state my opinion, I can't imagine something more boring than thinking the best of music, performance, TV, or media in general was done best and created in the past.

And to state my opinion, art isn't about "the best" or any sort of progress, it's about the way we humans experience the world, something I consider to be a timeless preoccupation, which is why a song from 2024 can be equally touching as a ballad from the 14th century.


When I was studying music technology and using state of the art software synthesizers and sequencers, I got more and more into playing my acoustic guitar. There's a deep and direct connection and a pleasure that comes with it that computers (and now/eventually AI) will never be able to match.

(That being said, a realtime AI-based bandmate could be interesting...)


My son is an interesting example of this. I can play all the best guitar music on earth via the speakers, but when I physically get the guitar out and strum it, he sits up like he has just seen god, and is in total awe of the sound of it, the feel of the guitar, and the sight of it. It's like nothing else can compare. Even if he is hysterically crying, the physical instrument and the sound of it just make him calm right down.

I wonder if something is lost in the recording process that just cannot be replicated? A live instrument is something that you can actually feel the sound of IMO, I've never felt the same with recorded music even though I of course enjoy it.

I wonder if when we get older we just get kind of "bored" (sadly) and it doesn't mean as much to us as it probably should.


Mirror neurons?


What does this have to do with it?


I'm speculating that one would have more mirror neuron activation watching a person perform live, compared to listening to a recording or watching a video. Thus the missing component that makes live performance special.


The sound feels present with live music. Speakers have this synthetic far away feel no matter how good they are.


What about live music on non-acoustic instruments so it inherently comes through a speaker?


My son isn't even a toddler so I don't think it would possibly be "mirror neurons".


For me the guitar is like the keyboard I am writing on right now. It will never be replaced, because that is how I input music into the world. I could not program that. I was doing tracker music as a teenager, and all of the songs sounded weird, because the timing and so on were not right. And now when I transcribe demos and put them into a DAW, they seem to be milliseconds off, not quite right. I still play the piano parts live, because we don't have the technology right now to make it sound better than a human, and even if we had, it would not be my music, but what an AI performed.


I looked really briefly at AI in music; lots of wild things are being made. It is hard to explain, but one tool was generating a bunch of sliders after mimicking a sample from sine waves (quite accurately).


> Musicians could automate all the instruments with incredible accuracy since a long time. But they never do that. For some reason, they still want a person behind the piano / guitar / drums.

This actually happened on a recent hit, too -- Dua Lipa's Break My Heart. They originally had a drum machine, but then brought in Chad Smith to actually play the drums for it.

Edit: I'm not claiming this was new or unusual, just providing a recent example.


This goes way back. Nine Inch Nails was a synth-first band, with the music being written by Trent in a studio on a DAW. That worked, but what really made the band was live shows, so they found ways, even using two drummers, to translate the synths and machines into human-played instruments.

Also, way before that, back in the early 80s, Depeche Mode displayed the recorded drum tape reel onstage so everyone knew what it was, but when they got big enough they also transitioned into an epic live show with guitars and live drums, as well as synth-hooked drum devices they could bang on in addition to keyboards.

We are human. We want humans. Same reason I want a hipster barista to pour my coffee when a machine could do it just as well.


> Same reason I want a hipster barista to pour my coffee when a machine could do it just as well.

I've wondered about this for a long time too: why on earth is anyone still able to be a barista? It turns out people actually like the community around cafes, and often that means interacting with the staff on a personal level.

Some of my best friends have been baristas I've gone to over several years.


Back before Twitter was born, or perhaps TV, cafes were just that: a place to spend evenings (…just don't ask who watched over the kids).


It’s more than that: doing it well is still beyond sophisticated automation. There are many variables that need to be constantly adjusted for. Humans are still much better at it than machines, regardless of the social element.


If true, probably not for long. Still, my point is that people are the customers. It’s more fun to think about what won’t change. I think we will still have baristas.


A good live performance is intentionally not 100% the same as in the studio; there can and should be variations. A refrain repeated another time, some improvisation here, playing with the tempo there. It takes a good band, who know each other intimately, to make that work, though. (A good DJ can also do this with electronic music.)

A recorded studio version, I can also listen to at home. But a full band performing in this very moment is a different experience to me.


Regarding your point about music:

There are subtle and deliberate deviations in timing and elements like vibrato when a human plays the same song on an instrument twice, which is partly why (aside from recording tech) people prefer live or human musicians.

Think about how precise and exacting a computer can be. It can play the same notes in a MIDI editor with exact timing, always playing note B exactly 18 seconds after note A. Human musicians can't always be that precise in timing, but we seem to prefer how human musicians sound, with all of the variations they make. We seem to comparatively dislike the precise mechanical repetition of music playback on a computer.
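Those timing deviations are easy to imitate mechanically, at least at the crudest level. Here is a minimal sketch (the function name and default jitter size are made up for illustration) of what a DAW's "Humanize" feature roughly does: perturb each note onset with small Gaussian noise.

```python
import random

def humanize(onsets_ms, jitter_ms=12.0, seed=None):
    """Add small Gaussian timing deviations to note onsets (milliseconds),
    roughly imitating a DAW 'Humanize' function. Onsets are clamped to be
    non-negative and re-sorted so notes never swap order."""
    rng = random.Random(seed)
    jittered = [max(0.0, t + rng.gauss(0.0, jitter_ms)) for t in onsets_ms]
    return sorted(jittered)  # cheap guard: preserve note order

# A metronomic 120 BPM quarter-note pattern: one onset every 500 ms.
grid = [i * 500.0 for i in range(8)]
wobbly = humanize(grid, jitter_ms=10.0, seed=1)
```

Of course, skilled players don't produce white noise; their deviations are correlated with phrasing and emphasis, which is exactly the part this kind of naive jitter fails to capture.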

I think the same point generalises to a broader human dislike of sensory repetition. We want variety. (Compare the first and second grass pictures at [0] and you will probably find that the second, which has more "dirt" and variety, looks better.) "Semantic satiation" seems to be a specific case of the same tendency.

I'm not saying that's something a computer can't achieve eventually but it's something that will need to be done before machines can replace musicians.

[0] http://gas13.ru/v3/tutorials/sywtbapa_gradient_tool.php


You can modulate MIDI timing with noise. In some programs, there’s literally a Humanize button.
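A minimal sketch of what such a Humanize button typically does, namely jittering machine-exact note onsets with a little Gaussian noise. The jitter amount and note times here are illustrative guesses, not any DAW's actual defaults:

```python
import random

def humanize(onsets_s, timing_sd=0.012, seed=None):
    """Jitter note onset times (seconds) with Gaussian noise.

    timing_sd is the standard deviation of the jitter; ~10-20 ms is a
    plausible range for subtle humanization.
    """
    rng = random.Random(seed)
    jittered = [t + rng.gauss(0.0, timing_sd) for t in onsets_s]
    # Keep onsets ordered and non-negative so notes never swap or
    # start before the clip does.
    return [max(0.0, t) for t in sorted(jittered)]

# A metronomic quarter-note pulse at 120 BPM (one note every 0.5 s):
grid = [i * 0.5 for i in range(8)]
loose = humanize(grid, timing_sd=0.015, seed=1)
```

Real humanize functions often also jitter velocity and note length, but the timing noise above is the core of the effect.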


Yes. I tried that with some software-based synthesisers (like the SWAM violin and Reason's Friktion) which are designed for human-playing (humans controlling the VST through a device that emits MIDI CC control messages) but my understanding is that the modulation that skilled human players perform with tends to be better/more desirable than what software modulators can currently achieve.


The real dilemma is with composition/song-writing.

The ability to create live experiences can still be a motivating factor for musicians (aside from the love of learning). Yet if AI does the song-writing far more effectively, will musicians ignore it?

It's like Brave New World. Musicians who don't use these AI tools for song-writing will be like a tribe outside modern world. That's a tough future to prepare for. We won't know whether a song was actually the experience and emotions of a person or not.


Even if we assume that people want fully automated music, the process of learning to play educates the musician. Similarly, you'd still need a director/auteur, editors, writers and other roles I have no appreciation or knowledge of to create a film from AI models.

Steam shovels and modern excavators didn't remove our need for shovels or more importantly, the know-how to properly apply these tools. Naturally, most people use a shovel before they operate an excavator.


It's interesting though; the question really becomes: if 10 people used to shovel manually to feed their families, and now it takes 1 person and an excavator, what in good faith do you tell the other 9... "don't worry, you can always be a hobby shovelist"?


They can apply their labor wherever it is valued. Perhaps they will become more productive excavator operators. By creating value in a specialized field their income would increase. Technology does not decrease the need for labor. Rather it increases the productivity of the laborer.

Human ingenuity always finds a need for value creation. Greater abundance creates new opportunities.

Take the inverse position. Should we go back to reading by candlelight to increase employment in candle making?

No, electric lighting allowed people to become productive during night hours. A market was created for electricity producers, which allowed additional products which consume electricity to be marketed. Technological increases in productivity cascade into all areas of life, increasing our living standards.

A more interesting, if not controversial line of inquiry might start with: If technology is constantly advancing human productivity, why do modern economies consistently experience price inflation?


You miss the important point, which is the productivity gain means the average living standard of society as a whole increases. A chunk of what is now regarded as 'toil' work disappears, and the time freed up is able to be deployed more productively in other areas.

Of course, this change is dislocating for the particular people whose toil disappeared. They need support to retrain to new occupations.

The alternative is to cling to a past where everyone - on average - is poorer, less healthy, and works in more dangerous jobs.


That's awesome, sign me up for retraining. Where do I go and who can I talk to so I can be retrained into a less drudgery filled position?

Clearly if there are ways out of being displaced, please share them


The ‘augmented singer’ is very popular, though. https://en.wikipedia.org/wiki/Auto-Tune: “Auto-Tune has been widely criticized as indicative of an inability to sing on key.”


Live play is what, 1% of all music heard in the world? Computers, radios, iPods and phones all play automated reproductions.


> Musicians have been able to automate all the instruments with incredible accuracy for a long time. But they never do that. For some reason, they still want a person behind the piano / guitar / drums.

You've never been to a rave, huh? For that matter, there's a lot of pop artists that use sequencers and dispense with the traditional band on stage.


I can see this being used extensively for short commercials, as the uncanny aspect of a lot of the figures will help to capture people's attention. I don't necessarily believe it will be less expensive than hiring a director and film crew however.


I love these hot takes based on profoundly incredible tech that literally just launched. Acting like 2030 isn't around the corner.


> I love these hot takes based on profoundly incredible tech that literally just launched. Acting like 2030 isn't around the corner.

It seems bizarre to think the gee whiz factor in a new commercial creative product makes critiquing its output out-of-bounds. This isn't a university research team: they're charging money for this. Most people have to determine if something is useful before they pay for it.


Let me guess, hard singularity take-off in 2030? Does the hype cycle not exist for techno-optimists? Just one breathless prediction after another?


Anything less than absolute enrapture is a "hot take"... :)


We’re glad you love them.


> fingers are still weird

Also keep an eye on teeth and high contrast text. Anything small and prone to distortion in low resolution video and images used to train this stuff.


Yeah. I think people nowadays are in a kind of AI euphoria, taking every advancement in AI for more than it really is. The realization of their limitations will set in once people have been working with the stuff long enough. The capabilities of the newfangled AIs are impressive. But even more impressive are their mimicry capabilities.


Are you joking?

We couldn't even create random videos from text prompts a few years back, and now this.

The progress is crazy.

Why do you dismiss this?


Not dismissing, just being realistic. I've observed that AI tools usually amaze most people initially by showing capabilities never seen before. Then people realise their limitations, i.e. what capabilities are still missing. And they're like: "oh, this is no genie in a bottle capable of satisfying every wish. We'll still have to work to obtain our vision..." So the magic fades away, and the world returns to normal, but now with an additional tool that is very useful in some situations :)


I'm still amazed.

The progress isn't slowing down at all right now.

This is probably one of the most exciting developments in the world besides the Internet.

And Gemini's news regarding the 1 million token window shows where we are going.

This will impact a lot of people faster than they realize.


I agree. Skepticism usually serves people well as a lot of new tech turns out to be just hype. Except when it is not and I think this is one of those few cases.


Not who you're replying to but this is a toy.

AI won't make artistic decisions that wow an audience.

AI won't teach you something about the human condition.

AI will only enable higher quarterly profits from layoffs until GPU costs catch up.

What the fuck is the point of AI automating away jobs when the only people who benefit are the already enormously wealthy? AI won't be providing time to relax for the average worker; it will induce starvation. Any attempt to prevent that will be stopped via lobbying to ensure taxes don't rise.

Seriously, what is the point? What is the point? What the fuck is there to live for when art and the humanities are undermined by the MBA class and all you fucking have is 3 gig jobs to prevent starvation?


The problem isn't the tool but the tools using the tool.

It's not ML's fault that we don't have UBI; it's the voters' fault.


I believe AI and full automation are critical for a Star Trek society.

We are not very good at providing anything reasonable today because capitalism is still way too strong and manual labor still way too necessary.

Nonetheless, look at my country, Germany: we are a social state. Plenty of people get 'free' money and it works.

The other good thing: there are plenty of people who know what good is (good art etc.) but are not able to draw. They can now express themselves too. AI as a tool.

If we as society discover that there will be no really new music or art happening I don't know what we will do.

Plenty of people are well entertained with crap anyway.


Sure there are limitations but this is still absurdly impressive.

My benchmark is the following: imagine if someone 5 years ago told you that in 5 years we could do this, you would think they were crazy.


I would not. Five (six, seven?) years ago, we had style transfer with video and everyone was also super euphoric about that. If I compare to those videos, there is clearly progress but it is not like we started from zero 2 years ago.


I don't really know what you mean by "euphoric", this is a term I only know from drugs. Can you define it?


"Blissful/happy", which is why the word euphoria is often abused to be sinister


It means "extremely happy", but it's usually used to refer to a particular moment in time (rather than a general sentiment), and so the word sounds a bit out of place here, to me.


And further down the page:

"The camera follows behind a white vintage SUV with a black roof": The letters clearly wobble inconsistently.

"A drone camera circles around a beautiful historic church built on a rocky outcropping along the Amalfi Coast": The woman in the white dress in the bottom left suddenly splits into multiple people like she was a single cell microbe multiplying.


Sure, but think what it will be capable of two papers ahead :)


Progress in this field has not been linear, though. So it's quite possible that two papers ahead we are still in the same place.


On the other hand, this is the first convincing use of a “diffusion transformer” [1]. My understanding is that videos and images are tokenized into patches, through a process that compresses the video/images into abstracted concepts in latent space. Those patches (image/video concepts in latent space) can then be used with transformers (because patches are the tokens). The point is that there is plenty of room for optimization following the first demonstration of a new architecture.

Edit: sorry, it’s not the first diffusion transformer. That would be [2]

[1] https://openai.com/research/video-generation-models-as-world...

[2] https://arxiv.org/abs/2212.09748
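A hedged sketch of the patch-tokenization step described above: a video tensor is chopped into space-time patches, and each flattened patch (after a learned linear projection into latent space, omitted here) plays the role a word token plays in an LLM. The patch sizes and tensor layout below are illustrative assumptions, not details OpenAI has published:

```python
import numpy as np

def patchify(video, pt=2, ph=4, pw=4):
    """Split a video tensor (T, H, W, C) into flattened space-time patches.

    Each patch spans pt consecutive frames and a ph x pw pixel block.
    Returns an array of shape (num_tokens, patch_dim).
    """
    t, h, w, c = video.shape
    assert t % pt == 0 and h % ph == 0 and w % pw == 0
    return (video
            .reshape(t // pt, pt, h // ph, ph, w // pw, pw, c)
            .transpose(0, 2, 4, 1, 3, 5, 6)   # group the patch axes together
            .reshape(-1, pt * ph * pw * c))

# 8 frames of 16x16 RGB -> (8/2) * (16/4) * (16/4) = 64 tokens of 96 values each
video = np.zeros((8, 16, 16, 3), dtype=np.float32)
tokens = patchify(video)
```

A diffusion transformer then learns to denoise these latent patches; the transformer itself only ever sees the token sequence.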



I think it is misleading. The role of the diffusion network is completely absent from this explanation.


Hold on to your papers~


It’s not perfect, for sure. But maybe this isn’t the final pinnacle of the tech?


> I disagree, just look at the legs of the woman in the first video.

The people behind her all walk at the same pace and seem to be floating. The moving reflections, on the other hand, are impressive make-believe.


Really makes me think of The Matrix scene with the woman in the red dress. Can't tell if they did this on purpose to freak us all out? Are we all just prompts?


I'm 99% sure this is supposed to evoke cyberpunk, but not sure about The Matrix.


If you watch the background, you'll see one guy's pants change color. And also, some of the guys are absolute giants compared to the people around them.


Yep. If you look at the detail you can find obvious things wrong, and these are limited to 60s in length with zero audio, so I doubt full motion picture movies are going to be replaced anytime soon. B-roll background video or AI-generated backgrounds for a green screen, sure.

I would expect any subscription to use this service when it comes out to be very expensive. At some point I have to imagine the GPU/CPU horsepower needed will outweigh the monetary costs that could be recovered. Storage costs too. It's much easier to tinker with generating text or static images in that regard.

Of note: NVDA's quarterly results come out next week.


> Same story as the fingers before.

This is weird to me considering how much better this is than the SOTA still images of 2 years ago. Even though there are weirdo artefacts in several of their example videos (indeed including migrating fingers), that stuff will be super easy to clean up, just as it is now for stills. And it's not going to stop improving.


Agreed and these are the cherry picked examples of course.


> just look at the legs of the woman

Denise Richards hard sharp knees in '97

--

this infant tech is already insanely good... just wait, and rather try to focus on "what should I be betting on 5 years from now?"

I suggest 'invisibility cloaks' (ghosts in machines?)


> But I think many people will be very uncomfortable with such motion very quickly.

Given the momentum in this space, I think you will have to get very uncomfortable super quickly about any of the shortcomings of any particular model.


At second 15 of the woman video, the legs switch sides!! Definitely there are some glitches :)


The left and right side of her face are almost... a different person.


When others create text-to-video systems (e.g. Lumiere from Google) they publish the research (e.g. https://arxiv.org/pdf/2401.12945.pdf). OpenAI is all about commercialization. I don't like their attitude.


Google is hardly a good actor here. They just announced Gemini 1.5 along with a "technical report" [1] whose entire description of the model architecture is: "Gemini 1.5 Pro is a sparse mixture-of-expert (MoE) Transformer-based model". Followed by a list of papers that it "builds on", followed by a definition of MoE. I suppose that's more than OpenAI gave in their GPT-4 technical report. But not by much!

[1] https://storage.googleapis.com/deepmind-media/gemini/gemini_...


The report and the previous one for 1.0 definitely contain much more information than the GPT-4 whitepaper. And Google regularly publishes technical details on other models, like Lumiere, things that OpenAI stopped doing after their InstructGPT paper.


Maybe because GPT-3.5 is closer to what Gemini 1.0 was... GPT-4 and Gemini 1.5 are similarly sparse in their "how we did it and what we used" when it comes to papers.


Not to be overly cute, but if the cutting edge research you do is maybe changing the world fundamentally, forever, guarding that tech should be really, really, really far up your list of priorities and everyone else should be really happy about your priorities.

And that should probably take precedence over the semantics of your moniker, every single time (even if hn continues to be super sour about it)


I'd much rather this tech be open - better for everyone to have it than a select few.

The more powerful, the more important it is that everyone has access.


Do you feel the same way about nuclear weapons tech?

That "the more powerful, the more important it is that everyone has access"?

Especially considering that the biggest killer app for AI could very well be smart weapons like we've never seen before.


I feel this is a false equivalence.

Nukes aren’t even close to being commodities, cannot be targeted at a class of people (or a single person), and have a minutely small number of users. (Don’t argue semantics with “class of people” when you know what I mean, btw)

On the other hand, tech like this can easily become as common as photoshop, can cause harm to a class of people, and be deployed on a whim by an untrained army of malevolent individuals or groups.


So if someone discovered a weapon of mass destruction (say some kind of supervirus) that could be produced and bought cheaply and could be programmed to only kill a certain class of people, then you'd want the recipe to be freely available?


This poses no direct threat to human life though. (Unlike, say, guns - which are totally fine for everyone in the US!)

The direct threat to society is actually this kind of secrecy.

If ordinary people don't have access to the technology they don't really know what it can do, so they can't develop a good sense of what could now be fake that only a couple of years ago must have been real.

Imagine if image editing technology (Photoshop etc) had been restricted to nation states and large powerful corporations. The general public would be so easy to fool with mere photographs - and of course more openly nefarious groups would have found ways to use it anyway. Instead everybody now knows how easily we can edit an image and if we see a shot of Mr Trump apparently sharing a loving embrace with Mr Putin we can make the correct judgement regarding a probable origin.


The bottleneck for bioterrorism isn't AI telling you how to do something, it's producing the final result. You wanna curtail bioweapons, monitor the BSL labs, biowarfare labs, bioreactors, and organic 3D printers. ChatGPT telling me how to shoot someone isn't gonna help me if I can't get a gun.


This isn't related to my comment. I wasn't asking what if an AI invents a supervirus. I was asking what if someone invents a supervirus. AI isn't involved in this hypothetical in any way.

I was replying to a comment saying that nukes aren't commodities and can't target specific classes of people, and I don't understand why those properties in particular mean access to nukes should be kept secret and controlled.


I understand your perspective regarding the potential risks associated with freely available research, particularly when it comes to illegal weapons and dangerous viruses. However, it's worth considering that by making research available to the world, we enable a collaborative effort in finding solutions and antidotes to such threats. In the case of Covid, the open sharing of information led to the development of vaccines in record time.

It's important to weigh the benefits of diversity and open competition against the risks of bad actors misusing the tools. Ultimately, finding a balance between accessibility and responsible use is key.

What guarantee do we have that OpenAI won't become an evil actor like Skynet?


I'm not advocating for or against secrecy. I'm just not understanding the parent comment I replied to. They said nukes are different than AI because they aren't commodities and can't target specific classes of people, and presumably that's why nukes should be kept secret and AI should be open. Why? That makes no sense to me. If nukes had those qualities, I'd definitely want them kept secret and controlled.


An AI video generator can't kill billions of people, for one. I'd prefer it if access wasn't limited to a single corporation that's accountable to no one and is incentivized to use it for their benefits only.


> accountable to no one

What do you mean? Are you being dramatic, or do you actually believe that the US government will not, or cannot, shut OpenAI down if they feel it is required to guarantee state order?


For the US government to step in, they'd have to do something extremely dangerous (and refuse to share with the government). If we're talking about video generation, the benefits they have are financial, and the lack of accountability is in that they can do things no one else can. I'm not saying they'll be allowed to break the law, there's plenty of space between the two extremes. Though, given how things were going, I can also see OpenAI teaming up with the US government and receiving exclusive privileges to run certain technologies for the sake of "safety". It's what Altman has already been pushing for.


> An AI video generator can't kill billions of people, for one.

Not directly. But I won't be surprised if AI video generators aren't somewhere in the chain of causes of gigadeaths this century.


I think it could. The right sequence of videos sent to the right people could definitely set something catastrophic off.


> The right sequence of videos sent to the right people could definitely set something catastrophic off.

...after amazing public world wide demos that show how real the AI generated videos can be? How long has Hollywood had similar "fictional videos" powers?


> ...after amazing public world wide demos that show how real the AI generated videos can be?

How quickly do you think our gerontocracy will adapt to the new reality?


Flat earth Billy can now make videos with a $20 subscription.


I think that's great. Billy will feed his flat-earther friends for a few weeks or months, and pretty soon the entire world will wise up and be highly skeptical of any new such videos. The more of this that gets out there, the quicker people will learn. If it's 1 or 2 videos to spin an election, people might not get wise to it.


Given the last 10 years I have no such faith in the common person.


which will only continue to convince people if the technology stays safely locked away in possession of a single corp.

if it were opened to public faking such videos would lose (nearly) all of its power


Make it high-enough fidelity, and it will be used to convince people to kill billions.


Video can convince people to kill each other now because it is assumed to show real things. Show people a Jew killing a Palestinian, and that will rile up the Muslims, or vice versa.

When a significant fraction of video is generated content spat out by a bored teenager on 4chan, then people will stop trusting it, and hence it will no longer have the power to convince people to kill.


You don't need to generate fake videos for that example. The State of Israel has been killing Palestinians en masse for a long time and intensified the effort over the last 4 months. The death toll is 29,000+ and counting. Two thirds are children and women.

The Israeli media machinery parades photographs of damaged houses that could only have been caused by heavy artillery or tank shells, blaming rebels carrying infantry rifles.

But I agree: as if the current tools were not enough to sway people, they will now have even more means to sway public opinion.


Hamas has similarly been shooting rockets into Israel for a long time. Eventually people get tired and stop caring about long-lasting conflicts, just like we don't care about concentration camps in North Korea and China, or various deadly civil wars in Sub-Saharan Africa, some of which have killed way more civilians than all wars in Palestinian history. One can already see support towards Ukraine fading as well, even though there Western countries would have a real geopolitical interest.


> Especially considering that the biggest killer app for AI could very well be smart weapons like we've never seen before.

A homing missile that chases you across continents and shows you disturbing deepfakes of yourself until you lose your mind and ask it to kill you. At that point it switches to encourage mode, rebuilds your ego, and becomes your lifelong friend.


I don't think it's really that hard to make a nuclear weapon, honestly. Just because you have the plans for one, doesn't mean you have the uranium/plutonium to make one. Weapons-grade uranium doesn't fall into your lap.

The ideas of critical mass, prompt fission, and uranium purification, along with the design of the simplest nuclear weapon possible has been out in the public domain for a long time.


Oof, imagine if our safeguard for nuclear weapons was that a private company kept it safe.


While it's probably too idealistic to be possible, I'd rather try and focus on getting people/society/the world to a state where it doesn't matter if everyone has access (i.e. getting to a place where it doesn't matter if everyone has access to nuclear weapons, guns, chemical weapons, etc., because no-one would have the slightest desire to use them).

As things are at the moment, while supression of a technology has benefits, it seems like a risky long-term solution. All it takes is for a single world-altering technology to slip through the cracks, and a bad actor could then forever change the world with it.


On a geopolitical level 'everyone' does have access.


Do you feel the same way about electricity?


As long as destroying things remains at least two magnitudes easier than building things and defending against attacks, this take (as a blanket statement) will continue to be indefensible and irresponsible.


Should nukes be open source?


I humbly refer you to this comment:

https://news.ycombinator.com/item?id=39389262


ML models of this complexity are just as accessible as nuclear weapons. How many nations possess a GPT-4? The only reason nuclear weapons are not more common is because their proliferation is strictly controlled by conventions and covert action.


The basic designs for workable (although inefficient) nuclear weapons have been published in open sources for decades. The hard part is obtaining enough uranium and then refining it.


If you have two pieces of plutonium and put them too close together you have accidentally created a nuclear weapon… so yeah nukes are open source, plutonium breeding isn’t.


I love it when people make this “nuke” argument because it tells you a lot more about them than it does about anything else. There are so many low information people out there, it’s a bit sad the state of education even in developed countries. There’s people trotting around the word “chemical” at things that are scary without understanding what exactly the word means, how it differs from the word mixture or anything like that. I don’t expect most people to understand the difference between a proton and a quark but at least a general understanding of physics and chemistry would save a lot of people from falling into the “world is magic and information is hidden away inside geniuses” mentality.


Should electricity?


What a load… imagine if everyone else guarded all their discoveries; there'd be no text-to-video, would there?


People defending this need to meditate on the meaning of the phrase "shoulders of giants".


New technology will always be new giants to see from, but open source really is a nice ladder up to the shoulders of giants. So many benefits from sharing the tech


This reminded me of a conversation with a historian. He requested the reconstruction of a monument in France that a game studio had already made.

The studio told him the model was their property, and they wouldn't share it.

Peculiar reasoning, isn't it?


This is meaningless until you've defined "world changing". It's possible that open sourcing AIs will be world-changing in a good way and developing closed source AIs will be world-changing in a bad way.

If I engineered the tech I would be much more fearful of the possibility of malice in the future leadership of the organization I'm under if they continue to keep it closed, than I would be fearful of the whole world getting the capability if they decide to open source.

I feel that, like with Yellow Journalism of the 1920s, much of the misinformation problem with generative AI will only be mitigated during widespread proliferation, wherein people become immune to new tactics and gain a new skepticism of the media. I've always thought it strange when news outlets discuss new deepfakes but refuse to show it, even with a watermark indicating it is fake. Misinformation research shows that people become more skeptical once they learn about the technological measures (e.g. buying karma-farmed Reddit accounts, or in the 1920s, taking advantage of dramatically lower newspaper printing costs to print sensationalism) through which misinformation is manufactured.


The problem is when we start to run out of reliable sources after becoming sceptical of everything.


It will be kind of like most of history where the only trustworthy method of communication is with face to face communication or with a letter or book (perhaps cryptographically) verified from a person you personally know or trust. Sounds good to me
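As a toy illustration of the "(perhaps cryptographically) verified" part: a minimal shared-secret sketch using Python's standard `hmac` module. It assumes the key was exchanged face to face with the person you trust; a real system would more likely use public-key signatures so the key never needs to be shared:

```python
import hashlib
import hmac

# Assumed: sender and receiver exchanged this secret in person.
KEY = b"exchanged-face-to-face"

def sign(message: bytes, key: bytes = KEY) -> str:
    """Produce an authentication tag tying the message to the shared key."""
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify(message: bytes, tag: str, key: bytes = KEY) -> bool:
    """Check the tag; compare_digest avoids timing side channels."""
    return hmac.compare_digest(sign(message, key), tag)

msg = b"this video really is from me"
tag = sign(msg)
assert verify(msg, tag)
assert not verify(msg + b"!", tag)  # any tampering breaks verification
```

Authentication like this proves who sent a message, though of course it says nothing about whether the content itself is AI-generated.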


This is a fantastic write up and great parallel to the state of where we’re headed.


How convenient for all the OpenAI employees trying to make millions of dollars by commercializing their technology. Surely this technology won’t be well-understood and easily replicable in a few years as FOSS


It will be, even if they guard their secret sauce. Let's not be naive about this: obfuscation is, and always will be, a minor nuisance.


>If you have world-changing technology it's better for a megacorp to control it.

You need to watch more dystopian movies.


The wheel should have been a tightly controlled technology?


Ironic, isn't it! OpenAI started out "open," publishing research, and now "ClosedAI" would be a much better name.


TBH they should just rename to ClosedAI and run with it, I and others would appreciate the honesty plus it would be amusing.


However if you are playing for the regulatory capture route (which Sam Altman seems to be angling for) it’s much easier if your name is “OpenAI”.


If you go full regulatory capture, you might as well name it "AI", The AI Company.


You never go "full" regulatory capture.


gottem


Sick burn!


When has OpenAI - for a company named "Open" AI ever released any of their stuff into anything open?


They actually did a few years ago, but that's ancient history in AI terms.

The most recent thing they released was Whisper, which to be fair is the only model with absolutely no safety implications.


From what I remember reading, Open was never supposed to mean open source with the internals freely available, but open as in available for the public to use, as opposed to a technology only for the company to wield and create content with.


They stopped releasing their stuff openly around the time GPT3 came to be.


Whisper was after GPT3 and that was fully open.


More like ClosedAI, amirite?


OAI requires a real mobile phone number to sign up and is therefore an adtech company.


Might be one of the most absurd things said on here. Requiring a phone number for sign up does not automatically mean you are selling ads.


When the time for making money comes, if you don’t think OpenAI will sell every drop of information they have on you, then you are incredibly naive. Why would they leave money on the table when everyone else has been doing it for forever without any adverse effects?


They are currently hiring people with Adtech experience.

The most simple version would be an ad-supported ChatGPT experience. Anyone thinking that an internet consumer company with 100m weekly active users (I‘m citing from their job ad) is not going to sell ads is lacking imagination.


If Google Workspace was selling my or any customers information, at all or "forever", it would not be called Google Workspace, it would be called Google We-died-in-the-most-expensive-lawsuit-of-all-time.


There's a difference. Open AI essentially has 2 products. The chat bot $20 a month thing for Joe shmoe which they admit to training on your prompts, and the API for businesses. Workspace is like the latter. The former is closer to Google search.


Sure, but there is no ambiguity about that, is there? You know that, because they tell you (and, sure, maybe they only tell you, because they have to, by law – but they do and you know)

How do we get from there to "just assume every company in the world will sell your data in wildly and obviously illegal ways", I don't know.


Well..that does seem to be the default. If they don't explicitly say they won't, they probably will. It's a sad world.


We're face to face with AGI and you're worried about ads?? Get your risks in order!!


We're still nowhere near AGI.


The day the AI stops listening to prompts instead of following them is the day I will worry about AGI.


You'd be too late. You're just waiting for someone to imbue a model with agency. We have agency due to evolution. Robots need it programmed into them, and honestly, that is easy to do compared with instilling reasoning. Primitive animals have agency. No animal can reason on the level of GPT. That will get us to HAL2000. If you stick it in a robot, you have the Terminator.


AI doesn’t exist, neither in practice nor in theory. Artificial intelligence is an oxymoron: intelligence is a complex system, while artificial systems are logic systems. You live in a complex universe that you cannot perceive; we perceive it only as noise/randomness. All you can see are the logical systems expressed at the surface of the noise (the Mandelbrot set). Everything you see and know is strictly logical; all known laws of the universe are derived from those logical systems. Hence we can only build logical systems, not complex systems. There is a limit to what we can build here on the surface (Church-Turing). We never have and never will build a complex system.


> Motion-capture works fine because that's real motion

Except in games where they mo-cap at a frame rate less than what it will be rendered at and just interpolate between mo-cap samples, which makes snappy movements turn into smooth movements and motions end up in the uncanny valley.

It's especially noticeable when a character is talking and makes a "P" sound. In a "P", your lips basically "pop" open. But if the motion is smoothed out, it gives the lips the look of making an "mm" sound. The lips of someone saying "post" look like "most".

At 30 fps, it's unnoticeable. At 144 fps, it's jarring once you see it and can't unsee it.
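The smoothing effect described above can be sketched in a few lines. This is an illustrative toy, not any engine's actual pipeline: a lip "pop" is treated as a near-instant step, sampled at 30 Hz, then linearly interpolated up to 144 Hz the way a renderer blends between mo-cap keyframes. All numbers are made up for the example.

```python
def lerp(a, b, t):
    return a + (b - a) * t

def resample(samples, src_hz, dst_hz, duration):
    """Linearly interpolate low-rate samples up to a higher frame rate."""
    out = []
    n = int(duration * dst_hz)
    for i in range(n):
        t = i / dst_hz * src_hz                 # position in source-sample units
        lo = min(int(t), len(samples) - 2)      # index of the keyframe before t
        out.append(lerp(samples[lo], samples[lo + 1], t - lo))
    return out

# Lip openness: closed (0.0) until the "P" pops it open (1.0).
mocap_30hz = [0.0, 0.0, 1.0, 1.0, 1.0]          # pop between frames 1 and 2
rendered_144hz = resample(mocap_30hz, 30, 144, 4 / 30)

# The instantaneous pop is spread across a full 33 ms source interval:
# several rendered frames show the lips only partially open, which reads
# as an "mm" shape rather than a "p".
partial = [x for x in rendered_144hz if 0.0 < x < 1.0]
print(len(partial))  # → 5 in-between frames at 144 fps
```

At 30 fps the ramp hides inside a single frame interval; at 144 fps the in-between frames become visible, which is exactly the "can't unsee it" effect.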


Out of all the examples, the wooly mammoths one actually feels like CGI the most to me, the other ones are much more believable than this one.


Possibly because there are no videos or even photos of live wooly mammoths, but loads and loads of CG recreations in various documentaries.


I saw the cat in the bed grow an extra limb...


Cats are weird sometimes.


Huh, strong disagree. I've seen realistic CGI motion many times and I don't consider this to feel realistic at all.


I’m a bit thrown off by the fact that the mammoths are steaming. Is that normal for mammoths?


Good question :)


You might just be subject to confirmation bias here. Perhaps there were scenes and entities you didn't realize were CGI due to high quality animation, and thus didn't account for them in your assessment.


Regarding CGI, I think it has become so good that you don’t know it’s CGI. Look at the dog in Guardians of the Galaxy 3. There’s a whole series on YouTube called “no cgi is really just invisible cgi” that I recommend watching.

And as with CGI, models like Sora will get better until you can’t tell them apart from reality. It's not there yet, but it's an immense, astonishing breakthrough.


Maybe it's my anthropocentric brain, but the animals move realistically while the people still look quite off.

It's still an unbelievable achievement though. I love the paper seahorse whose tail is made (realistically) using the paper folds.


Serious question: Can one just pipe in an SRT (subtitle file), tell it to compare its version to the mp4, and then command it to zoom, enhance, edit, and basically remould the content? I think this sounds great!


It's possible that through sheer volume of training, the neural network essentially has a 3D engine going on, or at least picked up enough of the rules of light and shape and physics to look the same as unreal or unity


It would have to in order to produce these outputs. Our brains have crazy physics engines too, though; F1 drivers can simulate an entire race in their heads.


I wonder if they could theoretically race multiple people at once like chess masters.


I'm not sure I feel the same way about the mammoths, and as someone who grew up in a snowy area, the billowing snow makes no sense. If the snow were powder, maybe, but that's not what's depicted on the ground.


Pixar is computer generated motion, no?


Pixar's main characters are all computer-animated by humans. Physics effects like water, hair, clothing, smoke, and background crowds use computer physics simulation, but there are handles allowing an animator to direct the motion per the director's wishes.


With extreme amounts of man-hours to do so.


> I've quite simply never seen convincing computer-generated motion before

I’m fairly sure you have seen it many times; it was just so convincing that you didn’t realize it was CGI. It’s a fundamentally biased way to sample it, since you won’t notice the well-executed examples.


Nah this still has the problem with connecting surfaces that never seems to look right in any CGI. It's actually interesting that it doesn't look right here as well considering they are completely different techniques.


It's been trained on videos exclusively. Then GPT-4 interprets your prompt for it.


Just set up a family password last week... Now it seems every member of the family will have to become their own certificate authority and carry an MFA device.

"Worried About AI Voice Clone Scams? Create a Family Password" - https://www.eff.org/deeplinks/2024/01/worried-about-ai-voice...


Don't think of them as "computer-generated" any more than your phone's heavily processed pictures are "computer-generated", or JWST's false color, IR-to-visible pictures are "computer-generated".

This article makes a convincing argument: https://studio.ribbonfarm.com/p/a-camera-not-an-engine


That is such a gem of an article, looking at AI through a lens I haven’t encountered before:

- AI sees; it doesn’t generate

- It is the dual of economics, which pretends to describe but actually generates



