Hacker News
Disney Combines CGI with Neural Rendering to Tackle the ‘Uncanny Valley’ (unite.ai)
139 points by Hard_Space on Nov 30, 2021 | 131 comments

In the sphere of music we observe that autotune has shaped the voices of new generations of singers, pulling them toward the fake/uncanny valley, instead of autotune getting better and climbing out of the uncanny valley. This effect is hypothesized to come from the abundance of autotuned vocals that new singers are exposed to, and train their ears on, from an early age.

This generates a self feeding loop which on a societal level shifts the location of the uncanny valley. Eventually, to the vast majority of people, old not-autotuned music will sound uncanny or off.

Adam Neely made an interesting episode about this: https://youtu.be/yxX2u8iggYI (around 5:00)

I wonder if we will observe something similar in acting and social interaction, where more and more bad or CGI acting gets corrected by uncanny valley methods, which new generations are exposed to en masse, changing their own habits and what it means for facial expressions to be uncanny.

There is an argument to be made that, for economic reasons, it will eventually not pay off to invest in getting 100% out of the uncanny valley; rather, a “good enough” implementation will gain mass traction and thus change the perception of what the uncanny valley is. Old movies will be the uncanny valley.

Not exactly what you’re thinking of, but there is a certain way this already happened: 24 FPS movies. Us old timers who grew up watching movies on actual film all think that higher frame rates look like BBC television. Our kids, who grew up with 60fps games, YouTube, and 120fps interpolating TVs, don’t mind high frame rates and don’t think they look weird, but they sometimes notice how choppy and low-tech 24 fps film looks. (Even though I’m partial to film, these days I hate it when there are horizontal pans in 24fps film, they are unwatchable messes. This didn’t use to bother me when I was younger.) So it’s not exactly uncanny valley, but old movies are now in the not-as-good zone. I guess maybe color film really did the same thing a generation earlier, and now washed out colors are a solid ‘retro’ indicator.

To be absolutely fair, if you grew up before optical audio (or in my case grew up seeing a lot of pre-optical audio media), 24 fps is odd. Before 24fps, frame rates ranged anywhere from 12 up to the 40s.

Then there's animation. If you watch a lot of anime (on threes - 8fps minimum) or Western animation (on twos so 12fps minimum), 24 again doesn't hold any magical value.

Personally, I prefer higher frame rates for the one reason that so many inexperienced cinematographers will not account for strobing in 24fps camera movements.

Auto interpolated TV is crap though. I hate it. I can always see the artifacts and the issue is that it messes with the motion.

When I worked in animation, you'd see someone reviewing content who forgot to disable motion interpolation. All your movement is soft, because the TV doesn't know what an impact looks like. So any foot step will get smoothed out. It totally ruins the character.

I usually have to turn off interpolation too, I also hate it, but whenever I do I end up cringing at the horizontal pans. :P Interpolation does wonderfully good things to horizontal pans, but terrible things to action sequences. A difficult tradeoff.

> Then there’s animation.

Random side note, but I never appreciated how well animators leverage variable frame rates, mixing ones, twos, and threes, until I watched an ex-Disney animator hand-animate a sequence in person. The pauses on threes and flashes of fast motion on ones add a lot of character that I never really noticed when just watching cartoons as a kid. And yes, interpolation smooshes this into a muddy mess.
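For anyone unfamiliar with the jargon, here is a minimal Python sketch of what "ones, twos, and threes" means at 24fps projection. The function names and the sample hold pattern are made up for illustration; real exposure sheets carry far more information.

```python
def expand_holds(drawings, holds):
    """Repeat each drawing for its hold length in projected frames.

    A hold of 1 is "on ones", 2 is "on twos", 3 is "on threes".
    """
    if len(drawings) != len(holds):
        raise ValueError("need exactly one hold per drawing")
    frames = []
    for drawing, hold in zip(drawings, holds):
        frames.extend([drawing] * hold)
    return frames


def drawings_per_second(hold, projection_fps=24):
    """Effective rate of new drawings when every drawing is held `hold` frames."""
    return projection_fps / hold


# A pause on threes, a flash of motion on ones, then settling on twos.
timeline = expand_holds(["A", "B", "C"], [3, 1, 2])
print(timeline)                  # ['A', 'A', 'A', 'B', 'C', 'C']
print(drawings_per_second(2))    # 12.0
print(drawings_per_second(3))    # 8.0
```

The uneven holds are the timing information that a TV interpolator cannot see: it only gets the flat list of projected frames and treats every repeated frame as a gap to fill.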

I wonder if there is a motion interpolation version of something like an edge preserving smoothing filter. In 2D image processing these are used extensively.

Some TVs do a pretty good job of interpolating the actual lines in 2D media.

The issue is almost always down to the motion though.

Imagine a ball bounce. When it hits the ground, the amount of time it stays on the ground and the deformation it makes are how an animator signals to the viewer what the ball is made of.

With a character who has personality, their walk cycle shows you their personality, and that ground to foot impact is super important.

However with interpolation, even if they preserve the distinct edges well, they'll smooth out that motion. So what was meant to be a hard hit, will suddenly be soft. What was meant to be a big brute, will feel like a balloon.

Yes, I understood that. What I was wondering is if the motion interpolation could be designed to 'understand' the sharp edges in the motion profile, much like the 2D smoothing is able to preserve edges even when blurring.
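To make the analogy concrete, here is a rough sketch of a 1D bilateral filter applied to a motion curve (say, one coordinate of a tracked point per frame). This only smooths an existing curve; it is not a frame interpolator and not what any TV actually does. The function name, the parameters, and the sample data are all assumptions for illustration.

```python
import math


def bilateral_1d(samples, radius=2, sigma_t=1.0, sigma_v=0.5):
    """Edge-preserving smoothing of a 1D motion curve.

    Each output sample is a weighted average of its neighbours. The weight
    falls off with distance in time (sigma_t) and with difference in value
    (sigma_v), so samples on the far side of a sudden jump barely count.
    """
    smoothed = []
    n = len(samples)
    for i in range(n):
        total = 0.0
        weight_sum = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w_time = math.exp(-((i - j) ** 2) / (2 * sigma_t ** 2))
            w_value = math.exp(-((samples[i] - samples[j]) ** 2) / (2 * sigma_v ** 2))
            total += w_time * w_value * samples[j]
            weight_sum += w_time * w_value
        smoothed.append(total / weight_sum)
    return smoothed


# A hard impact: the value jumps from 0 to 10 between two frames.
impact = [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0]

edge_preserving = bilateral_1d(impact)
# A huge sigma_v switches the value term off, leaving a plain Gaussian blur.
plain_blur = bilateral_1d(impact, sigma_v=1e9)

print(edge_preserving[3], edge_preserving[4])  # stays close to 0 and 10
print(plain_blur[3], plain_blur[4])            # the jump is smeared out
```

The hard part for real footage is that the filter needs a motion curve to work on in the first place, and the threshold for what counts as an intentional hard hit (sigma_v here) is an artistic call, not something the signal tells you.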

At this point it would probably be easier to do it with some kind of DNN motion interpolation...

Doesn’t basic MPEG preserve edges automatically, as long as the motion interp doesn’t cross too many frames? I thought the idea with most motion block compression schemes was to essentially have tiled flow vectors that slide the source block a little while preserving the internal structure?

This is an interesting point, because we didn't get this at all with resolution— most of the 20th century film catalogue was high definition from the get-go and happily got re-scanned and re-sold as VHS, then DVD, then Blu-Ray, and now as 4K and 8K digital content.

But framerate is different, and there will probably be a debate like there was with colorizing ("Tell Ted Turner to keep his crayons away from my movie!") around whether it's appropriate or desirable to interpolate legacy media up to modern framerates.

As a young person it blows my mind that people still make content at 24 fps when 24 doesn't evenly divide 60.

24 is the right balance for a lot of things. Primarily:

- cost (everything gets more expensive the higher you go, e.g. CG, grading, etc.)

- lighting gets harder at higher frame rates since you have to be shoveling a lot more light in scenarios where you need to be close to open shutter

- style, because so many people are used to 24

Also everyone in the industry is used to 3:2 pulldowns and the like to work with indivisible ratios.
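For those who haven't run into it, a minimal sketch of the 3:2 cadence in Python. It ignores interlaced fields and the NTSC 1000/1001 slowdown, and the function name is just for illustration.

```python
from itertools import cycle


def pulldown_32(film_frames):
    """Map 24fps frames onto a 60Hz display by repeating them 3, 2, 3, 2, ..."""
    shown = []
    for frame, repeats in zip(film_frames, cycle((3, 2))):
        shown.extend([frame] * repeats)
    return shown


print("".join(pulldown_32("ABCD")))       # AAABBCCCDD
print(len(pulldown_32(list(range(24)))))  # 60
```

Every other frame stays on screen 50% longer than its neighbours, which is where the judder on slow pans comes from. On a 120Hz or 144Hz panel each frame can simply be held for 5 or 6 refreshes, so the uneven cadence is not needed.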

>Also everyone in the industry is used to 3:2 pulldowns and the like to work with indivisible ratios.

Maybe the industry is used to it, but for me it just makes it more of a choppy mess.

Why 60 specifically? (I’m aware of some good reasons, of course, but always good to examine the assumptions and question the goal. Europe was at 50, we have some US movies at 48, TVs and some games are now going to 120. Should 60 really be the goal?)

One reason 24 is still here is that there’s been a lot of howling and complaining whenever people try to use high frame rates on movies. A lot of people didn’t like the 48fps Hobbit movies because of the high frame rate (and surely not because of the many boring orc battles). It doesn’t help that the “experts” with strong/loud opinions tend to be older people who prefer the old low FPS style.

But, quietly and surely, things are trending in the higher frame rate direction. As the young people become accustomed to it with online video and TVs that interpolate by default, and the old people who prefer 24fps become the relative minority of the market, the standard will swing.

It's the most common refresh rate. The TVs in my house, most of my monitors, and my phone all have a 60Hz refresh rate. I do have a 144Hz monitor which is technically able to accurately play a 24Hz video, but video is usually encoded in 30 or 60 fps before I can even play it.

Another area where the same thing has happened is MP3 audio compression.

I saw an article a while ago about a college media professor, who surveyed students each year, asking them to compare the perceived sound quality of lossless audio as compared to MP3 compression. Each year, the students' preference slanted more towards the MP3 -- every progressively younger set of students had grown up with the compression artifacts sooner in their lives and just heard it as being normal.

While visiting some relatives, their kids were watching an animated TV show on Apple TV. It was skipping and jumping around every few minutes, so the plot was impossible to follow. The kids didn't seem to mind.

Maybe our future is all just glitch. People become oblivious to past things like color correction, smooth motion, aspect ratios, continuity, conversation, acting, plot, what you have...

not sure I agree with this. films are still shown in 24 frames/sec.

the tv interpolation tech is trash and without a doubt makes films look like a cheap set on BBC

Interpolation is bad. Native 60fps is great. It’s kinda hard to describe, but watching 60fps video on YouTube feels like a relief and is very satisfying. It’s immediately obvious when the video is 60fps.

> without a doubt makes films look like a cheap set on BBC

That’s what I see with HFR too, but the kids do not agree. And reality is high frame rate. These facts prove that we somehow learned to prefer 24 fps, even though it’s objectively less good than higher frame rates. There is definitely something strange going on perceptually that you and I have a hard time un-seeing.

For the first time in my life, this year I have been listening to a lot of music with autotune all over the vocals (K-pop).

Now when I listen to songs with non-autotuned vocals, even very small pitch variations really jump out for me in a really wonderful way. What's interesting is (assuming it's music I like) I have that wonderful reaction regardless of whether the singer is using pitch variations in a very precise expressive way, or whether, to the contrary, they just don't have good control of pitch.

But I do really enjoy both. There's a Rick Beato video where he complains about the harmony vocals in Lenny Kravitz' Fly Away being autotuned - but I love that sound, it's great!

The precise bluesy inflections in Labelle's Lady Marmalade, versus the autotuned perfection of Shinee's Lucifer (a song which reminds me of the former), versus the wonderfully out-of-tune vocal performance on Motley Crue's Live Wire --- it's all great.

I would just worry about future singers growing up only listening to and copying and internalising the sound of autotuned vocals. But is that going to be the case, with such easy access to music from all eras?

I don't think anyone thinks autotune sounds normal, especially when used heavily. They just think it sounds good. Sounding like a natural human is not a requirement, or even a goal for most music. Just as guitar players have been using a myriad of effects pedals for decades, some of which create sounds barely recognizable as guitar. Which reminds me of the MXR talk box which combines vocals and guitar, perhaps most famously used by Peter Frampton and Bon Jovi.

This also reminded me of an older musician I knew that lamented all use of amplification in music. He thought that if the musicians were not moving the air themselves with their own bodies or instruments, then they were giving up some level of purity. I can understand that perspective, and to me overusing auto tune has the same problem, but I don't think the uncanny valley applies here.

Emma Watson in Beauty and the Beast doesn't sound less uncanny to me even after years of liking autotune in music.

> Sounding like a natural human is not a requirement, or even a goal for most music.

You should hear what Chinese operatic singing sounds like.

I think it sounds normal provided the singer is already close to the correct pitch. The only difference I hear is that now the singing sounds better.

There are at least two uses for autotune:

1) making a voice sound artificial, like that Cher song[]

2) Making a voice sound correct.

The GP is talking about 1, while you're talking about 2.

[] https://youtu.be/nZXRV4MezEw

I guess I was talking about both. My real point is that humans are much more forgiving about music than we are about human faces, so the whole idea of the uncanny valley doesn't really apply. Though as I say that I think it may apply if autotune were used on the speaking voices in a film or tv show.

I'm not so sure. Heavily can either mean that they had to correct the pitch from a long distance or it could mean that a lot of the song had to be pitch corrected.

Usually “heavy” autotune refers to using a very fast pitch correction speed, which is what creates the pitch snapping effect
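A toy model of what the speed setting does, in Python. This is not how Auto-Tune itself is implemented (that is proprietary); it just snaps a per-frame pitch track toward the nearest equal-tempered semitone, and `retune` is a made-up parameter for the fraction of the remaining error removed on each frame.

```python
import math

A4_HZ = 440.0  # reference pitch assumed by this sketch


def hz_to_semitones(freq_hz):
    """Distance from A4 in (fractional) semitones."""
    return 12.0 * math.log2(freq_hz / A4_HZ)


def semitones_to_hz(semitones):
    return A4_HZ * 2.0 ** (semitones / 12.0)


def correct_pitch(track_hz, retune=1.0):
    """Pull each frame of a pitch track toward the nearest semitone.

    retune=1.0 snaps instantly (the robotic effect); small values let the
    correction lag behind, so slides and vibrato mostly survive.
    """
    if not 0.0 < retune <= 1.0:
        raise ValueError("retune must be in (0, 1]")
    correction = 0.0  # semitones currently being added to the sung pitch
    corrected = []
    for freq_hz in track_hz:
        sung = hz_to_semitones(freq_hz)
        wanted = round(sung) - sung
        correction += retune * (wanted - correction)
        corrected.append(semitones_to_hz(sung + correction))
    return corrected


sharp_note = [450.0] * 5  # held note, a bit sharp of A4
print(correct_pitch(sharp_note, retune=1.0))  # on 440 Hz from the first frame
print(correct_pitch(sharp_note, retune=0.1))  # drifts toward 440 Hz gradually
```

With the fast setting every frame lands exactly on a note, so a slide between two notes turns into a staircase. That staircase is the snapping sound people mean by "heavy".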

I talked with a friend about a phenomenon I see more often in recent family photos, which is a mouth-only smile while with the rest of their face the subject appears to be wishing they were somewhere else, or that they would like the camera-person to tell them whether they have broccoli in their teeth. My friend suggested (correctly, I think) that it's because when we were kids our parents weren't taking our pictures literally hundreds of times per day. It's not just that they're having their pictures taken, but that they're being asked to light up the room with a smile every time, and it's tiring.

However, your comment made me wonder whether that uncanny, dead-eyed smile might become more normal or even something to strive for. I don't really think it'll happen but imagine a group of friends being like, "hey, Suzn, why's your smile so over-the-top all the time in our selfies?"

I've discovered a weird but related phenomenon.

At some point, I started consciously trying to true-smile (including the eyes, among other things) for all appropriate photos.

After a while, I realized that it was having a positive effect on my mood. I don't know if there's some study around that, but it appears that the process of trying to genuinely smile _created_ a positive mood, rather than resulting from one.

This is a known phenomenon: https://www.sclhealth.org/blog/2019/06/the-real-health-benef...

Though, I do remember reading about a collection of studies that failed to reproduce and this was one of the hypotheses. Perhaps the difference is that you genuinely wanted to smile :)

This has also happened with constant photoshopping of models in advertising and ubiquitous CGI in car commercials. I often get deep uncanny valley reactions from advertising which is extremely off putting, but I can only assume both advertisers and the general public are so used to this stuff they treat it as normal.

Ex: https://i.pinimg.com/originals/8c/6f/73/8c6f73cd70afbdea6a2f... I can’t help but see this and think vampires, though that might be intentional.

> constant photoshopping of models

There was one adult site I went on once where the pictures of all the models had their eyes photoshopped to be extremely white. Like, #FFFFFF white. Also, the saturation of the irises was turned way up, not to mention directed straight ahead. It was absolutely creepy.

I look at that, and I look at many of the posts to /r/InstagramReality [0], and I have to wonder...who actually thinks this looks good?

A minor touch-up is one thing, but so many people filter themselves to look like entirely different people, and they all look the same.

Your example seems interesting, it reminds me of Blade Runner that I recently watched for the first time. The style should be used sparingly and for specific purposes, though - like teal and orange color grading in blockbuster movies isn't (is that getting better recently?).

The fake Tom Cruise on TikTok, while not "photo-real" in any sense of the word, seems to overcome any uncanny valley effects through sheer force of personality. We don't just observe the face, we react to the spirit behind it. In that case, the effects artist found "Vienna's foremost Tom Cruise impersonator" to provide the mocap. And then painstakingly mapped the face overlay to the emotional cues in musculature. And those seem to be the feature detectors we pick up on, rather than the highest fidelity.

So much of TikTok seems firmly in what might be considered the uncanny valley. Lots of just slightly sped-up footage, advanced (but not perfect) TTS, and video effects and filters are everywhere. I think we might just be getting better at tuning out the imperfections than anything else.

Something to think about is what the you of ten years ago would think of that video. Would your brain smooth over the imperfections like it does today, or would they jump out and make it nearly unwatchable?

It is fascinating to me how much we are influenced by the things we see and hear.

I can practically detect which women I meet are active users of TikTok because they display certain nuances and tics which are fairly specific to women who upload videos to that platform. It doesn't bother me, but it is interesting from an ethnographic POV.

Behavior specific to a cultural group is... the norm everywhere. Group norms are how you learn what you're supposed to do.

There's a famous result showing that Japanese women pitched their voices higher when speaking (native) Japanese than they did when speaking (acquired) English.

> I wonder if we will observe something similar in acting and social interaction, where more and more bad or CGI acting gets corrected by uncanny valley methods, which new generations get on mass exposed with and thus change their own habits and what it means for facial expressions to be uncanny.

I don’t think this will happen with CGI. For most people, the vast majority of music they consume is pre-recorded. Thus, whatever is common in pre-recorded music becomes their definition of music. However, with regards to faces and expressions, people still interact with real faces and expressions in real life and movies are just a small proportion of the faces and expressions they encounter. Thus real life will still set the baseline standard and won’t be able to be shifted like what happened to music.

> Eventually, to the vast majority of people, old not-autotuned music will sound uncanny or off.

Yeesh, what a horrifying thought.

Fortunately, I really don't think that will happen. We are exposed to real voices all the time, and we each have one of our own. The "inaccuracies" that autotune obliterates carry character and meaning, and I think there's already the beginnings of a push back against the homogenised corporate sound even beyond the part of the population that never liked it in the first place.

It has its place but it's an "effect", not an "improvement", and it can't turn a bad singer into a good one. Maybe some sort of neural hybrid audio thing like the article discusses for graphics could, though...

But in the case of music, people listen to very little live music and a huge amount of recorded music. It is pretty dystopian to imagine a future where we interact with very few real faces and huge amounts of CGI ones.

That's not the future, that's already happening. And you don't even need to get into "Hatsune Miku" territory.

I was listening to the radio the day after a local Madonna concert, and plenty of people called to mention how old she looked. Sure, she's in great shape for someone over 60 (hell, she's in better shape than me), but with her music being auto-tuned, her pictures and videos being photoshopped, and her public appearances being heavily orchestrated, the "public" Madonna looks and sounds a lot different than the "real" one. Most people were expecting someone in their early 40s. And she's far from being the only one.

> I wonder if we will observe something similar in acting and social interaction, where more and more bad or CGI acting gets corrected by uncanny valley methods, which new generations get on mass exposed with and thus change their own habits and what it means for facial expressions to be uncanny.

I like this comment and have had similar conversations with friends lately -- to your point, it looks like this phenomenon is already happening on TikTok where body language is modulating to emulate uncanny valley CGI.

Also Instagram face filters, used to the point that when you meet the person in real life you don't recognize them.

I think it's had the opposite effect - that people listen to old music a lot more. Even young kids know a lot of eighties music.

New music has something uncanny about it and it's tiring to listen to. I think it's related to how it's mixed (for example guitars have a lot of bass) or how the tempo is quantized too.

So people who are actually talented singers will sound weird in the future? Now that's dystopian.

I honestly hear auto-tune pretty quick. It bothers me that other people don't seem to notice it so easily.

Also see: makeup.

At least for hip hop, I'm definitely not seeing this. Maybe some mainstream stuff will heavily use auto tune, but even acts which are mainstream, like Nas have no need for it.

By and large it just depends on what you're in the mood for. Something I love about the internet is how easy it is to find niche music.

> Eventually, to the vast majority of people, old not-autotuned music will sound uncanny or off.

Only if lockdowns continue for like half a generation. Otherwise people will not stop singing in person outside their family group.

A more likely result is that the acceptable range just gets broader. Just as acceptable sexual orientations, etc. have become more inclusive.

Note to people making videos like this. PLEASE pay some attention to audio recording and mouth noise. I had so much trouble paying attention to the subject matter due to the terrible recording of the voiceover. It sounds like a chocolate ad for squids. I'm very sensitive to this as I've done a lot of voice recording work, but this is an extreme example of what not to do. A combination of a simple pop shield, some post-proc EQ and muting, and drinking a glass of water before recording would have mostly eliminated the problems.

I know this is off-topic, but this is arguably as important as getting the visuals right.

People often argue that bad video with good audio is much more palatable than good video with bad audio.

I used to work for our state broadcaster. It was well known within the industry that if audio channels went down, or were wrong you'd get more complaints than if video quality dropped.

You can close your eyes but not your ears.

As a species we communicate primarily via sound, typically visual cues are secondary.

Last presentation I recorded for a conference, I just started the voice recorder on my phone and held it near my mouth. The sound quality was very surprisingly good. I could do that because I wasn't also videoing myself, so I didn't have to synchronise them up afterwards. Anything is better than the microphone built into the average laptop.

Bought a $35 USB condenser mic for online meetings. It is good even for recording songs (tested by a friend of mine). It's better to not be seen clearly than to not be understood due to a poor microphone (although I did get a better webcam).

Yeah I started to notice it after a minute then found it super distracting.

Another easy solution is simply to stand further back from the mike and talk louder, putting the gain down.

I found this very distracting as well. "Chocolate ad for squids" is a hilarious description — it totally cracked me up and made my day!

Strange, the more un-uncanny the more uncanny it feels. For example, the way the eyes are oriented seems a bit off when compared to the posture of the muscles around the eyes: the eyes seem to look in one direction while these muscles seem to look in another. Is it just me?

I once talked to a painter who painted stunning portraits. He said that it took him less than a day to paint the portrait but then days to make the eyes and mouth believable.

I can confirm. Even when doing a sketch, the key thing to get right is the eyes.

I agree, just look at the chin of the blue background face at 7:00. They've just dug a deeper valley.

But it works so well for stills that I could imagine it taking over fast fashion catalog shots: the nameless manufacturer/designer back-end provides basic pictures (with a lighting reference ball conveniently in frame), and the front-end brand/store chain/online shop composites in a CGI face from their brand identity library if they don't feel like keeping the one supplied.

I think that means we're still on the far side of the uncanny valley

With increased use of fillers/botox/surgery in the cosmetics industry, the real world and the uncanny valley will converge somewhere in the middle.

The beginning of the article seems to hint at improving on Rogue One’s results, but it seems from the rest of the article and the video that they are just trying to automate the inpainting, which gives worse results than Rogue One but is a lot easier and less expensive.

Like this is a step backwards into the uncanny valley. The “improvement” is not having to pay artists to manually paint hair, eyes, and teeth.

Yes, I was hoping to see some progress out of the uncanny valley. But this got even deeper in the valley. My 7-year-old was next to me, and she got scared of all these

I wonder if in the future a new face will be created for every character in film or TV. New generations will think it strange that we had characters in different shows looking exactly the same because they were played by the same actor. And they'll never have to kill off a character because the actor quit.

I think it will be the opposite, like in the movie "Congress" - where a popular actress sells the rights to her likeness, voice and behaviour to a movie studio, she gets fully scanned and then they proceed to make movies with "her" in it using CGI. I think people like seeing specific actors, after all that's why celebrities are a thing.

AI would need to learn acting before it replaces human performers, but I think having a unique face for characters played by the same actor would be interesting.

We kind of see that today with voice actors for animated characters. Audiences still recognize the person behind the character by the voice and manner of speaking, but a different “face” doesn’t diminish the experience. If anything, it enhances it.

I feel like both will happen.

I can even easily imagine a world where generated actors are the norm, and there are a handful of celebrities who bring enough of a crowd to license their likeness (and emotional characteristics) to big movies that people flock to see.

There are a lot of long-running TV shows where the same guest actor appears in more than one episode playing different characters. Murder, She Wrote is a good example of this. "Now, where have I seen that actor before? ... Oh, in a previous episode of this same show." Seeing through the illusion often enhances my enjoyment, actually.

That seems to be a feature in mystery shows; Columbo did the same thing. Some actors played three or four different murderers, but also more out of the spotlight, there were recurring bit-players.

Nero Wolfe made a feature of that, using a repertory cast for all the roles except the leads https://en.wikipedia.org/wiki/Nero_Wolfe_(2001_TV_series)

There are also a bunch in Star Trek, and even a few that hardcore fans miss!

Of course, in Star Trek different make-up can make it harder to spot. For example, James Sloyan ( https://memory-alpha.fandom.com/wiki/James_Sloyan ), who also appears in multiple Murder, She Wrote episodes, both as different characters and as the same recurring character!

There's one particularly galling one:

Marc Alaimo has a major role as the primary villain of Deep Space Nine, Gul Dukat, appearing in 20% of the episodes.

He also appears in various capacities as random single-episode characters in The Next Generation. This is mostly fine. But one of those random characters is a Cardassian Gul. (Gul Macet.) This takes place before DS9 starts airing. But it's incredibly jarring if you've already seen DS9.

Agreed. He is very recognizable in tone and mannerisms, so having him be two different Cardassian Guls raises some plot-line questions.

Especially if you watched reruns on TV (for the youngins: they regularly showed episodes out of order, and if you were in to a show you would have to keep track of the timeline and story arcs yourself)

Robert Culp always seems to play the murderer in Columbo.

Him, Jack Cassidy, and Patrick McGoohan.

That would be too bad IMO, even with animated comics like Archer, I really enjoy this kind of voice cameo and just recognizing voices from other VO work or even reality - e.g. the characters Malory Archer and Len Trexler.

When compute becomes cheap, they may do that for every viewer of every show.

For example, Game of Thrones might have its whole cast replaced with people like locals from your hometown. None of them will be using the same brand of car as you - they'll all use whichever brand wants your custom most.

I did a startup for this specific purpose. I wrote and acquired a global patent on automated actor replacement in filmed media, which included replacing actors, set pieces, backgrounds, audio elements, and pretty much anything that could be modified with personally relevant replacements. I wrote the patent in 2004, and at great personal expense it was global by 2008. My team was the VFX Academy Award winning team behind the film "Babe" (the talking pig), and our system worked. But we launched in 2008, right as the financial crisis hit. Nobody believed our demonstrations of inserting their own people into video right before their eyes, but they wanted to pursue pornography. I refused to pursue pornography. Three times I managed to put together angel investment teams only to have them group-wide choose to pursue porn. I was not going to be "that guy" who ends up hated by every female alive, after a public fantasy sex site allowing anyone to Deep-Porn-Fake anyone they want goes public. So, anyway, I spent far too long pursuing the idea, went personally bankrupt, and switched to working in facial recognition. At least there nobody wants to do porn.

Ooh... So it expires in 2024? That's when all this will start!

Due to filing the preliminary early, the expiration was 2017. Interesting about that date: the day after the patent expired Facebook announced their AR kit/library. https://patents.justia.com/assignee/flixor-inc

...I guess pushing that idea further, they could also replace all the product placement after a few years. For a quality show like The Sopranos, where people are still rewatching it and new generations discovering it that could be a good new source of revenue.

Change all the vintage Fords to similar vintage GMs. Change Coke to Pepsi.

The novelty of this happening the first time would probably get people watching by itself.

I remember seeing promotion for a system that places localized ads in real time in sports broadcast. Isn't that or something like it used nowadays?

Yeah, the NHL is using the ice to show CG ads. The localization I noticed was a Whataburger ad on the ice in MN during a Dallas Stars game, but I was in NC, so the localization wasn’t tailored to my location, just to the average Stars fan's location.

Most football on live TV has ads superimposed onto the locations in the stadium where ads would normally appear, so that people in different countries/regions can see different ads.

I noticed this in MLB as well. Having been to the Rogers Centre/SkyDome many times in my life, it's really jarring to see them throw ads in places they've never been.

Add to that the inconsistencies, and it's just a weird experience. One moment the black-cloth-covered-out camera section of the stadium is just that, the camera pans slightly and all of a sudden a large part of the 200 section is an ad for insert advertiser here. Other times you'll see what looks like on-deck circles but their location, size, and content seem to change every time that section of field is on-screen.

That's just badly done... The good ones use very accurate angle sensors in the camera mountings so the ads are always placed correctly, even when partially offscreen.
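
To make the angle-sensor idea concrete, here is a rough sketch of how pan and tilt readings could keep an ad pinned to a fixed spot in the stadium. Everything in it is an assumption for illustration: an ideal pinhole camera at the origin, a fixed focal length in pixels, no zoom, roll, or lens distortion, and a hypothetical function name `project_ad_anchor`. Real broadcast systems are certainly more involved.

```python
import math

def project_ad_anchor(point, pan_deg, tilt_deg, focal_px=1000.0, width=1920, height=1080):
    """Project a world-space point (x right, y up, z forward) to pixel coordinates.

    Assumed setup: the camera sits at the origin, pan rotates it to the right,
    tilt rotates it up. Returns None when the point is behind the camera.
    The result may lie outside the frame, which is what lets a partially
    offscreen ad stay anchored.
    """
    x, y, z = point
    pan, tilt = math.radians(pan_deg), math.radians(tilt_deg)
    # Undo the pan (rotation about the vertical axis).
    x1 = x * math.cos(pan) - z * math.sin(pan)
    z1 = x * math.sin(pan) + z * math.cos(pan)
    # Undo the tilt (rotation about the camera's horizontal axis).
    y2 = y * math.cos(tilt) - z1 * math.sin(tilt)
    z2 = y * math.sin(tilt) + z1 * math.cos(tilt)
    if z2 <= 0:
        return None
    u = width / 2 + focal_px * x1 / z2
    v = height / 2 - focal_px * y2 / z2
    return u, v
```

Calling this for each corner of the ad on every frame, with the current sensor readings, gives the quad to draw into. Because the anchor is defined in world space, a small error in the measured angles shows up directly as the ad sliding against the stadium, which is presumably why the accuracy of the sensors matters so much.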

The story itself will be made to suit you as well. They already do this with the news, so why not with fantasy universes?

If you're the cynical type who thinks large orgs are all messed up, you get a story where the protagonist looks like you and gets stuck in the same BS that you got stuck in.

>I wonder if in the future a new face will be created for every character in film or TV. New generations will think it strange that we had characters in different shows looking exactly the same because they were played by the same actor

so, in the future media universe there would be no doppelgangers? Seems unnatural.

Can someone tell me why almost all of the “realistic” 3D rendering is about superficial skin deformation?

Hear me out:

- Train an AI to determine facial muscle activation from mocap data

- Build your animation pipeline to use muscle activation to drive model deformation

- Take the actor’s mocap acting and have the system convert their muscle movements to the 3d model’s muscle movements

- Tackle the uncanny valley by refining how well we implement the bone/skin/muscle structure of human faces

- Bonus points: Voice. So much of the sound of a person’s voice is about the shape made by their mouth and throat (and torso). We could have an AI recreate the whole vocal path for a 3D character, resulting in more realistic voice acting.

As someone who's worked on very cutting edge digital humans before, muscle simulation has a very poor cost to benefit ratio for facial animation.

It's almost always better to learn external deformations that emulate the muscle triggers than to simulate it.

The reasons are usually:

- setting up muscles takes a long time and isn't very scalable to many characters

- animators or even actors will do a motion that is needed for the shot that the muscle rig will fail at. Especially for things like high velocity impacts.

- muscles are really hard to art direct. Sometimes you want to end up sculpting a shape into something just slightly different.

- simulations need to run linearly in time and are slow. So for the situations where you need to be able to skip around in time or for realtime, you want an emulation of muscles rather than a simulation
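
A toy sketch of that last point, for anyone who hasn't hit it in practice. This is not how any production rig works: the "simulation" is a single damped spring chasing an animation curve, the "emulation" is a made-up stateless mapping with a fixed gain, and all names and constants are assumptions for illustration. The only thing it shows is the cost difference when you jump to an arbitrary frame.

```python
from dataclasses import dataclass

@dataclass
class SpringState:
    position: float = 0.0
    velocity: float = 0.0

def simulate_to_frame(target_curve, frame, dt=1 / 24, stiffness=40.0, damping=6.0):
    """Time-stepped simulation: the state at `frame` depends on every earlier frame."""
    state = SpringState()
    steps = 0
    for f in range(frame + 1):
        target = target_curve(f)
        accel = stiffness * (target - state.position) - damping * state.velocity
        state.velocity += accel * dt  # semi-implicit Euler step
        state.position += state.velocity * dt
        steps += 1
    return state.position, steps

def emulate_at_frame(target_curve, frame, gain=0.9):
    """Emulation: a stateless mapping from the current input to a deformation.

    Any frame costs one evaluation, so scrubbing the timeline is cheap.
    """
    return gain * target_curve(frame), 1
```

Jumping to frame 100 costs 101 steps with `simulate_to_frame` and a single evaluation with `emulate_at_frame`. In a real pipeline the stateless mapping would be learned from simulation or scan data rather than being a fixed gain.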

Doing more superficial muscle emulations like https://zivadynamics.com/zrt-face-trainer give a better mix of control and realistic behaviour.

Of course, for bodies it becomes a lot easier to use muscle simulations ( and Ziva's own tetrahedral muscle solver is amazing for that )

I've seen musculoskeleton simulation (although passive and not active) used throughout VFX quite a lot; it's basically what Weta Digital has been doing for years. (For example from the Hobbit movie 8 years ago: https://www.youtube.com/watch?v=r45e5Xky35k). Hell, Maya even has its own muscle system (Maya Muscle) that animators can already use out of the box. So the technology to simulate these kinds of things (at adequate realism for animations and movies) is already there.

The problem is that it takes a painstakingly long time to do the modelling for this kind of thing. Anatomy is hard to get right, high quality commercial human template models cost you $10000+, and even then you're going to spend a lot of time tweaking mesh geometry to make the simulation work. And the musculoskeletal models you're creating are all going to have different body shapes and might not even be human, so you need to create a lot of things from scratch (there was a recent paper on a musculoskeleton design tool for precisely this: https://www.dgp.toronto.edu/projects/modeling-musculoskeleta...). So you need a lot of work to do even just passive simulation (where the articulated skeleton just kinematically mimics the animation and the muscle effects are secondary).

Though the bullet points you've listed are definitely all good research topics in themselves. DeepMimic-style active musculoskeleton simulations (where models physically mimic given animation data using only muscle activations) are currently possible for line-segment muscles (the paper: https://mrl.snu.ac.kr/research/ProjectScalable/Page.htm). It uses Deep Reinforcement Learning with lots of training and fine-tuning though, so it's not plug-and-play in its current state. Maybe something similar can also be done with volumetric muscles, but that's definitely going to be a hard paper to write, since FEM simulation is incredibly costly and finicky to run stably, and you need a lot of simulation frames to use DRL.

But I think the fundamental issue is: does this all have to do with actually making a good film/animation, outside of the realm of biomechanics? I really don't think so. The latest remastered Disney films all lack the fluidity of 2D animation of the past; the remade Lion King has some of the best realistically simulated animal characters but is utterly void of any artistic merit. And these graphical improvements don't seem to make the artist's life easier and are making animation even more labor-intensive. Nowadays it feels like artists in those big industries have less room to experiment because of all these "realistic" technical requirements. From this viewpoint, it makes sense why Disney/Pixar has still focused more on superficial skin deformation instead of a more realistic biomechanical approach: when making cartoonish 3D animations you need a lot of squash-and-stretch, and that is hard to do with rigid muscle systems. Better to model your character as a blob of mass; it's easier to animate, keyframe, and employ all kinds of effects to express artistic intent.

As a VFX veteran and 3D graphics production pipeline developer, this is 100% true. Muscle-skeleton simulation systems are very present, and used throughout the VFX industry. When some fantasy character with 4 arms or a fantasy creature is needed, the skeleton and desired range of motions are designed first, the muscle system is used to finish the character's form, and a large amount of secondary physics simulation is added to create a realistic motion performance.

Tried to check out the website linked in your profile and it has some very strange redirect behavior that gets flagged by my anti-phishing browser plugins.

My 3D Avatar Store site closed in 2015. Who knows where the URL points now. I'll put a notice on my profile that I no longer control that link's destination. (The change has been made.)

This is such a wealth of information and a great relief. Thank you. I agree with you that realism and “realism” are secondary to all the other parts that make good film/animation; this particular pain point for me is more about when I’ve been ripped out of immersion during an otherwise great movie.

From the outside it has definitely seemed like either the majority of the industry was unaware of the correlation between muscle and skin, or they just didn’t think that doing it accurately was a priority, and that has been frustrating to reason through.

It’s too bad that the tech is prohibitive right now, but the system Weta is using removes all my concerns and I can see now that it’s just a matter of time.

Possibly even more now, I still think that the future is the kind of “AI assisted kinematic translation” that I talked about above. If we were to fully “capture” the ~600 muscles in the body of a performer, along with fascial layers (to understand the performer’s bodily restrictions), then feed it through a pipeline that can translate human to human or human to non-human, then do simulations on that character so that it has its own restrictions and “preferences” (like steadying with gusts of air from wings instead of from waving arms), I think we’d have less workload on the 3D teams, more flexibility in character design, AND a better looking result.

I think the skin is easier at this point since there's so much data on it compared to muscles. I think your solution might be better but I can see why they are focused on skin.

Personally, I think they are a long way from overcoming the uncanny valley in rendering people.

A lot of the examples look creepy. In my opinion, it is the evil/serious/focused eyes, which do not correspond to the big smile. I believe this tension between contrasting expressions seems dishonest.

I assume the end goal of this is rendering photo-realistic movies of humans without filming actors. Neural voice generation is also quite good at this point. Quite interesting where this could lead to.

It's probably a good hedge for Disney against the power/presence of individual stars affecting their long running franchises.

It's a double whammy in that they can make more product simultaneously and they get full lifetime control over presentation.

It's easy to contract someone to keep their mouth shut and behave in a certain way for a few years, but over 50-60 years it's much harder.

E.g. the presence of someone with strong political opinions, like Harrison Ford in Star Wars, affecting long term sales in certain markets.

If you're an executive more interested in money than politics it must be very tempting.

Even having CGI replacement as a threat to keep actors in line is probably useful.

God that's depressing. :(

In 2030 I'll type in "movie about XYZ", GPT-7 comes up with the script, it crunches in the Disney+ cloud for a few hours, and later I sign on to watch the full generated movie?

Probably not going to be mainstream for feature length movies, but it could be a good tool for prototyping movie ideas and letting target groups watch the demo reel very, very early on in the concept phase.

Why use a target group when you can run a regression model on what hits the nerve for people, using data that's being collected now? :) Then it'll be ML models creating movies for other models to validate and, et voila, you get a super hit movie.

Why even bother releasing it at that point? The humans won't appreciate the majesty of what was produced. I say send them off to camps.

I like the way you think ;) But 'taste' will be really difficult to model for a long-long time.

For actors that seem to play the same character in every movie, it’d be kind of funny to create a stock footage of actors, and just use that to make movies. Tom Cruise running! The Rock punching someone through a wall! And so on.

I think this tech is still mostly only for faces. Doesn't any CGI'd actor have someone else playing the role, with their face edited in?

I think static image canny is getting good. "This(x)doesn't exist.com" proves a lot of realism is down the rabbit hole. Those faces are believable.

But I think moving image uncanny Valley is alive and well. Probably I'm setting myself up to be proved wrong.

The examples have a distinct "police mugshot" look to them.

Lol so true

Original video link in the article doesn't seem to be working.

This looks like a reupload of the same video, so far as I can guess https://www.youtube.com/watch?v=TwpLqTmvqVk

Thanks! What still triggers the uncanny aspect for me is that these faces are all way too symmetrical. Even the most attractive actors have more left-right variation. There's no doubt the light diffusion on the skin is looking way better in these though.

This seems like only the first step of many to make something usable in a film... Like what about handling side views of faces? How do you remove the background and blend it into the actual scene? What about when the character walks through a shadow and it falls unevenly over part of the face? How about characters interacting (kissing etc)? What about actions like 'putting on lipstick'?

All this work, and it's probably going to be bastardized by Big Facial Recognition. I would argue we don't need better facial recognition; it has no legit use cases. We are one freak deepfake incident away from this idea being mainstream.

they're still a long way off making it look realistic imho

Video at the bottom of the page or here https://www.youtube.com/watch?v=k-RKSGbWLng

animation starts around 6:11

To be fair they've been using StyleGAN2 and could probably get much better animations with alias free GAN (aka StyleGAN3). Watch the videos there : https://nvlabs.github.io/stylegan3/

Looks like the video was taken down -- weird.

Something feels really uncanny with them. Especially the nose.

Those faces with dead eyes look like paintings when they start to talk in movies. They look real statically, in comparison to low quality photos.

It will be interesting to see if the better results they are getting come from being able to generate their own superior training data (ground truth).

Why do some of the meshes seem to have extra skin outside of the face, to the sides of the jaw?

At the 3:37 mark in the video on their channel, the last face for the older man looks creepy.


He is the third person from the top. Are we sure that is the expression he made?

That video has now been removed, but it looks like they replaced it with this: https://youtu.be/TwpLqTmvqVk

That was quick. I was just watching it when I wrote the comment.

How far away are we from rendering realistic humans in real-time?

Slightly off topic but there is an interesting argument to be made against the existence of 'Uncanny Valley' – https://youtu.be/LKJBND_IRdI.

too bad the video doesn't work. is it just me, or taken down?
