The emergence of full-body Gaussian Splat deepfake humans (metaphysic.ai)
103 points by Hard_Space 11 months ago | 60 comments



I'm sure the primary use case of this technology is in film. (e.g. Disney populating background extras)

But I would LOVE to see this transformed into a game engine/game dev IDE. Imagine if anyone with a smartphone could create photorealistic character models in under a minute. Hell, players could even provide themselves as character models ad-hoc without involving any devs.

It would be a renaissance moment for game development speed and accessibility, and make things like 'asset store games' irrelevant.

Imagine an LLM built to understand a base game engine's template code, and also able to perform Gaussian splat reconstruction from frames of input video.

It also seems like it could be the missing link for AR/VR crossing the uncanny valley.


> I'm sure the primary use case of this technology is in film. (e.g. Disney populating background extras)

If the recent history of tech is anything to go by, the most popular in-practice use case will be porn. All it will need is a corpus of training data, which it's safe to assume already exists.

After that, deepfaking $WorldLeader$ doing something embarrassing is probably #2 or #3. Just in time for the US elections...


I had a friend who worked in the Treasury Department and got access to government decisions and statistics releases before the general public did. He and his team would often make paper trades to try to guess the effect of these decisions. They were surprised at how often they lost money.

I suspect in the upcoming election, deepfaking the candidates doing something out-of-character would have a similar surprising effect.


> I suspect in the upcoming election, deepfaking the candidates doing something out-of-character would have a similar surprising effect.

One frontrunner is about to be convicted and his corporate empire dissolved, while the other frontrunner is facing an impeachment inquiry and his son is about to experience the full brunt of the Capitol Hill muckraking process.

Deepfakes aren't even going to register in this election (no pun intended).


Down-ballot races are more vulnerable, especially since we no longer have local journalism to help interpret events.


Political deepfaking may be seen during elections, but imagine if someone faked corporate executives acting poorly in order to manipulate or short stock prices. The sky is the limit on abuse.


People will have to learn that nothing can be believed no matter the medium unless there is a verifiable chain of custody to the information.

This is the way it always was back when our primary means of communication was verbal and handwritten. Everyone knew anyone could lie. We then had this bizarre period of time when you had certain types of media, especially photographs and video, that were extremely hard to fake convincingly. That period is now over and we have returned to the norm: you can't believe it unless the provenance of the information is known and the chain of custody is trustworthy.

This needs to be drilled into people starting in elementary school.
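
To be concrete about what a verifiable chain of custody could even look like: the core primitive is a digital signature over the media bytes, made at capture or publication time (roughly what provenance efforts like C2PA build on). A toy sketch using the Python cryptography library, just to illustrate the idea, not a real implementation:

    # The capture device (or publisher) signs the media bytes; anyone with
    # the public key can check they were not altered afterwards. Real
    # provenance schemes (e.g. C2PA) also sign the edit history.
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    signing_key = Ed25519PrivateKey.generate()     # lives inside the camera
    public_key = signing_key.public_key()          # published by the manufacturer

    video_bytes = b"...raw media bytes..."
    signature = signing_key.sign(video_bytes)      # shipped alongside as metadata

    try:
        public_key.verify(signature, video_bytes)  # raises if the bytes changed
        print("signature valid: provenance intact")
    except InvalidSignature:
        print("media was modified after signing")

The hard part, of course, is everything around this: key distribution, trusting the device itself, and handling legitimate edits.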


> This is the way it always was back when our primary means of communication was verbal and handwritten. Everyone knew anyone could lie. We then had this bizarre period of time when you had certain types of media, especially photographs and video, that were extremely hard to fake convincingly. That period is now over and we have returned to the norm: you can't believe it unless the provenance of the information is known and the chain of custody is trustworthy.

Which will result in different groups of people living in different realities: if what evidence you accept comes down to whom you trust as reliable, different societies (or sections thereof) will choose to trust different people.

Which in itself is maybe a return to the historical norm: the truth of Catholicism was obvious to all in 12th-century Rome, while the truth of Islam was equally obvious in 12th-century Mecca.


> Which will result in different groups of people living in different realities

People have always been living in different realities, since there have always been people who believe false statements, and people who doubt true statements.


True, but I think we are talking about the degree of that issue.

With a big enough difference in degree, the same issue starts to look like a different one.


Back in that era (and continuing today), rumours were routinely believed, and significant harm was done via rumour.

Convincing people that photos and video are as believable as a rumour is not as helpful as it should be.


> Convincing people that photos and video are as believable as a rumour is not as helpful as it should be.

It is very helpful to some people, namely those parties who flourish among the ignorant and poorly educated. Remember that the real goal of disinformation isn't to make you believe false things; it's to make you believe nothing.


> People will have to learn that nothing can be believed no matter the medium unless there is a verifiable chain of custody to the information.

How are "people" actually supposed to know there's "a verifiable chain of custody to the information"?

I think this case is instructive regarding the workability of that idea: https://en.wikipedia.org/wiki/Killian_documents_controversy. A major news organization was fooled by a forged document that was later identified as such by amateurs. How is any organization supposed to have "a verifiable chain of custody" for leaked information? That's the avenue for a lot of important information, and without it you'll just have a lot more reporting on press releases.


Something something blockchain something something AI something something tokens?


> the chain of custody is trustworthy

This seems like the hard part. How do I know that anything anyone is sharing is trustworthy? It's not like I'm going to trust any news org, government, non-profit, corporation, anonymous individual, etc.


And the other side of this coin now is that people can do truly horrific things that can be dismissed as created by AI.


How do we verify/trust the chain of custody of, I dunno, "Mark Zuckerberg calling for the killing of all left-handed people"?

What I think will happen is that it'll be like what we have now, but worse: some podcaster will make an offhand remark about Mark Zuckerberg "saying some horrible stuff about left-handed people", causing frantic TikToks claiming that he means killing left-handed people, all based on a supposed "secretly recorded video" that no one can validate due to all of the noise.


Investors will quickly learn to identify deepfakes if they become a regular thing. The damage will be to everyone else on the ground: https://news.ycombinator.com/item?id=38157593


> If the recent history of tech is anything to go by, the most popular in-practice use case will be porn

Recent history? I’m pretty sure as soon as the camera was invented, someone took a nude picture with it.


And promptly put it up in the town square to get back at someone after a breakup.


Let's hope the primary uses are entertainment and maybe communication. It will soon be possible to create a realistic video of anyone doing anything.


Would it be too optimistic to say that if we can create realistic videos of anyone doing anything, the general public will not believe in videos as evidence anymore anyway? So it soon wouldn't matter, except that it would matter for a different reason: we'd lose the ability to use legitimate videos as evidence of anything.

On the other hand, it has already been possible for an extremely long time (millennia) to create text describing anyone doing anything, and in general it seems we can still mostly trust text (and can distinguish somewhat well between false and reputable textual sources), so maybe it's all reputation-based, and will be for video in the future too.


I could also see government agencies using this to frame people for crimes they did not commit.


Thanks to insightface, anyone can already deepfake any video from a single input image in a few seconds.


Which is why we need laws that make doing so without very explicit consent for every generated work a felony offense, with social and penal consequences equivalent to those for rape.


An abundance of laws is little help against the evil intentions of the human heart.


It would give victims a path to defend themselves. Today, they have nothing.


Deepfake videos aren’t covered by existing libel law?


To the best of my knowledge, there is no explicit law in the US; libel/defamation laws have not been adequately tested, and it is understood that they have very serious limitations, especially for those who previously engaged in sex work.

> But no federal law criminalizes the creation or sharing of non-consensual deepfake porn in the United States. A lawsuit is also unlikely to stand up in civil court.

https://www.wbur.org/endlessthread/2023/06/23/deepfake-porno...

> ... if someone makes a deepfake or otherwise posts a nonconsensual video of someone who does sex work, the defense could argue that because the woman’s reputation already involves sexual activity, her reputation is not being defamed by another video of sexual activity. Such an argument — and thus pursuit under defamation as a whole — would be missing the actual point, which is the violation of consent.

https://www.cyber.forum.yale.edu/blog/2021/7/20/deepfake-por...


That won't help against nation states spreading propaganda or manufacturing false evidence. If you thought Russian Twitter trolls were bad in 2016, wait until you see the TikTok trolls of tomorrow.

This won't be regulated away; it will happen, and everyone should be educated and prepared. That's just an unrealistic dream though: enough people will get fooled (I will probably be among them at least some of the time) that it will have consequences. Deepfakes have been used for scams for years already: https://www.youtube.com/watch?v=vqr0oER03SE


Murder isn't regulated away, either. Yet it is still against the law, and society invests in its prevention and in empowering the victims of violence with civil means to seek retribution and protection.

The situation is very similar to CP, and should be treated in a similar fashion.


Is the creation of many unique and realistic-looking human models often the bottleneck in game development? Personally, I'd think the rules of play, the quality of the story, and the immersiveness and variety of settings available in the world make a game. Give me that and it can be text-based or have characters that all look like Pac-Man for all I care.


It is not.

However, the adaptability of character creators is still a hard problem.

Like, I'm not particularly concerned with character aesthetics, but I still found myself rejecting every single goddamned hairstyle for my male OC in Baldur's Gate 3.


Programmers aren't necessarily good at art, and artists aren't necessarily good at programming. Good art is expensive, or difficult, so if you could get it generated (and iterated on) at a very low cost, it might become easier to make a game in your spare time without having to learn a very different, very difficult discipline with little overlap.


To me, it seems the people who predict huge impacts/"democratization" from genAI are at best not in the market for existing content, or at other times actively hate it. That projects onto their predictions.


This is just my random-person take, but personally I'm skeptical about generative photorealism: it's just extra cognitive load that distracts from the payload content, unless some audience needs it as a visual aid.


Eh. You'll have a very tough time doing better than Unreal's MetaHumans.

The most important thing is doing this in a way that works well with the lighting and, of course, is performant. Whenever you see something generating avatars, you need to question whether it's viable in-game, because often it's not.

Suitless motion capture is already a thing. It's not great, but it is a thing. Most games don't actually need that many animations.


All the 3D models generated by these AI solutions are very poor (I think I've tried every service and project under the sun). There are so many imperfections that the result is merely a base, or a tool for a real 3D artist.

That said, MetaHumans don't look much like the subject IMHO, and scanning with RealityScan is a nightmare experience.


Perhaps. But I think mapping scans to MetaHuman-type models will be the better approach for a long time. If you need to create a specific person with high fidelity, then get an artist. If you just need people, then MetaHumans are a great tool.


> Since the advent of Gaussian Splats, in August of this year, the image synthesis research community has clearly embraced this innovative approach to neural recreation of people, things and scenery.

I thought splats were explicitly not neural?


I also find their usage of the word neural very confusing, but this is how they explain it:

> A Gaussian Splat, instead, is a neural representation unit that is not limited in this way – not only can it be assigned anywhere in XYZ/3D space, but it can as necessary multiply and subdivide into additional splats, as coverage requires.*

And the asterisk leads to the following explanation:

> Technically it’s a rasterization unit rather than a neural unit, but in all current Splat implementations that are of any power or interest to the synthesis community, it ends up as a neural unit, passed through standard training processes.

> Amended Friday, December 15, 2023 13:33:47 EET to clarify ‘neural unit’

But even with this I am not sure that I understand what they want to say.


> But even with this I am not sure that I understand what they want to say.

Look at the domain name of TFA. They are saying "neural neural PageRank look at meeee"


What does neural mean nowadays anyway? I thought neural nets got their name from the perceptron's resemblance to interconnected neurons, but what about CNNs or transformers? From what I can tell, neural just means "plenty of parameters optimized using a learning process", in which case Gaussian splats fit the description.

I also think Gaussian splats encode the same information as NeRFs, which have neural in the name.
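
To make that concrete, here's a toy sketch of fitting splat parameters with gradient descent (2D, grayscale, isotropic splats; the real 3DGS pipeline adds covariances, opacity, spherical harmonics and densification, but the "learning process" has the same shape):

    import torch

    # Toy 2D "splatting": fit N Gaussians to a target image by gradient
    # descent. This is the sense in which splats are "parameters optimized
    # using a learning process" even though there is no network: the
    # learnable parameters are positions, scales and colors, and the
    # renderer is just a differentiable rasterizer.
    H, W, N = 32, 32, 50
    target = torch.rand(H, W)                       # stand-in for a real image

    ys, xs = torch.meshgrid(torch.arange(H).float(),
                            torch.arange(W).float(), indexing="ij")

    mu = torch.rand(N, 2, requires_grad=True)       # splat centers in [0,1]^2
    log_sigma = torch.zeros(N, requires_grad=True)  # isotropic scale per splat
    color = torch.rand(N, requires_grad=True)       # grayscale intensity

    opt = torch.optim.Adam([mu, log_sigma, color], lr=1e-2)

    def render():
        cy, cx = mu[:, 0:1] * H, mu[:, 1:2] * W                # (N, 1) each
        d2 = (ys.reshape(1, -1) - cy) ** 2 + (xs.reshape(1, -1) - cx) ** 2
        w = torch.exp(-d2 / (2 * torch.exp(log_sigma)[:, None] ** 2))
        return (w * color[:, None]).sum(dim=0).reshape(H, W)   # additive blend

    for step in range(200):
        loss = ((render() - target) ** 2).mean()    # plain MSE, as in any net
        opt.zero_grad()
        loss.backward()
        opt.step()

No layers, no activations, but the optimization loop is indistinguishable from training a network, which I suspect is why the "neural" label sticks.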


Practically speaking, it's a history-dependent definition. Some set of things are considered "neural", somebody makes an innovation that's based on the existing set of "neural" tools, and now the set of "neural" things is expanded to include the new innovation.

For this post, they have more traditional neural networks embedded in the pipeline (search TFA for "decoder"), and they're thus calling the composite technique a "neural" technique.

As to the CNN/Transformer/... question, those are still very similar to a perceptron. A multi-layer perceptron has a sequence of alternating matmuls and activations. The CNN makes a particular choice of sparse matmul and activation (and a couple of choices of fast implementations to operate on that sparse matmul). A Transformer is one way to wire up many such MLP predictions (recurrent networks being another) to work with long/uneven inputs. There's more going on, but most of it is fluff to make the thing numerically stable and do something interesting. The core structure is still a dressed-up perceptron.
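
Stripped of the fluff, that core really is just the following (a toy numpy sketch with made-up layer sizes):

    import numpy as np

    def mlp(x, weights):
        # The "dressed-up perceptron" core: alternate matrix multiplies
        # with a nonlinearity, then one final matmul for the output.
        for W in weights[:-1]:
            x = np.maximum(x @ W, 0)   # matmul + ReLU activation
        return x @ weights[-1]

    # Example: a 4 -> 8 -> 3 network with random (untrained) weights.
    rng = np.random.default_rng(0)
    weights = [rng.standard_normal((4, 8)), rng.standard_normal((8, 3))]
    y = mlp(rng.standard_normal(4), weights)

A CNN's convolution is one particular sparse, weight-shared choice of W; most of the rest is plumbing.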


> [...] plenty of parameters optimized using a learning process [...]

Would that not make fitting a degree 1,000,000 polynomial a neural method?

> I also think Gaussian splats encode the same information as NeRFs, which have neural in the name.

The collection of source images also encodes the same information but I don't think we should call them neural images.

I would argue we should stick with the original meaning, inspired by biological neural networks: a network of artificial neurons. Calling everything neural because of the hype will benefit no one in the long run, as the term will just lose any meaning.


Anything that requires cuDNN installed in the dev environment.


I thought neural meant df/dx optimization applied to a giant function.


> but what about CNNs or transformers?

They also consist (in the abstract, at least) of interconnected neurons with activation functions.


If kids are depressed from looking at the fakeness of people on Instagram, how are they going to feel when put up against the possibility of these AI-created, perfected people?


Reminds me of the plot device from Surrogates: https://www.imdb.com/title/tt0986263/?ref_=nm_flmg_t_63_act

Basically, everyone was afraid to go outside and show their true nature, and would curate a perfect outward appearance from the safety of their home.


Same as every other time in history when a culture has gone morally bankrupt and nobody can be trusted: people will keep isolating.


How do you become someone people can visibly see as a person of trust in a society like this?

I know religion doesn’t do it, that’s for sure. The only thing I can think of is some kind of authenticity.


New innovations give you PTSD every day. Computers and technology used to be fun. They haven't been for years. We're building ourselves a horrible future. If it wasn't my main income, I would have stopped using them and moved into the woods long ago. I saw this coming. It's no wonder that birth rates are plummeting worldwide. The only help I can think of is divine intervention, or someone having a very well-hidden plan to work it out. For authenticity to work, it needs to be universal, and I'm not sure that is the case.


The birth rates are plummeting because people are taking longer to start their lives. Women are in school until 23 and tend to change partners at 26. This means there is a 9-year window of being marriageable for children until 35, at which point fertility dramatically declines. I'm talking about five trips to an IVF clinic, or a 75% chance of conception using donor eggs, for just a single child.

I don't want to see this too negatively, though. In principle, the pool of women who stay fertile longer will outproduce women who lose their fertility too early. It also means there is genetic selection for longer lifespans in general.


As someone who knocked up their wife, who is well over 35, this year, I think your understanding of the drastic drop-off in fertility at that age is exaggerated.

I do expect all of what you said to be the norm past 40, though.


Yes, same story for me. But I got out of it in 1999. I seem to have mentally foreseen the crash, and it portrayed itself as a mood disorder.

I know many of you will disagree with me on this, but I also think a large problem is the increase in both extremely-low-frequency and radio-frequency electromagnetic fields.

No one can tell me that the bloody nose I get whenever I use laptops that emit high levels of low frequency electromagnetic radiation is psychosomatic.


The "Mandela Effect" is one day going to be a completely understandable and common phenomenon


Seems we have finally crossed the uncanny valley…

Hope the benefits will outweigh the dangers!


The benefits will be to Disney shareholders.


Don’t forget porn shareholders!



