Conversation with Zuckerberg, this time we talked as photorealistic avatars (twitter.com/lexfridman)
222 points by sergiotapia 6 months ago | 191 comments

How did they go from second-life knockoffs to this?

This is actually impressive, and while not perfect, is close enough that they'll probably reach a version that is viable very soon.

The second-life knockoffs were only necessary because of the trade-off they made to target low-price HMD market and on-device GPU+CPU processing rather than PC VR or processing puck design. This photogrammetry based avatar stuff has already been working for years. See Meta's previous "Codec Avatar" in 2021 (https://www.youtube.com/watch?v=bS4Gf0PWmZs) or Unreal's MetaHuman (https://www.roadtovr.com/epic-metahuman-mesh-import-scan-rea...).

what's the score with Epic and Apple Vision btw? I saw Lumen and Nanite now work in UE 5.3 on M2 silicon but not sure if it will be possible to develop apps for Apple Vision with Unreal, they just mention Unity?

edit: looks like Epic is working on support for VisionOS in UE5 https://github.com/EpicGames/UnrealEngine/tree/ue5-main/Engi...

> How did they go from second-life knockoffs to this?

Typically you do one MVP version and as soon as you've knocked that out, start working on a proper version, integrating feedback from the MVP as it comes in. I'm guessing they always wanted to have photorealism, but realized it'd take a while to get in place so they launched a quick version then got back to work.

> it'd take a while to get in place so they launched a quick version then got back to work.

This is not a phone app. Physical demos that propose something that a user has no point of comparison with (new paradigm) absolutely need to be on either end of the maturity scale.

- Either very raw, bare-bones MVPs meant only for lab/research-aware people who won't cling to details and can see the 'principle' of what it could be.

- Or absolutely immaculate, picture-perfect marketing demos that you can show to non-technical people.

Anything in between is a recipe for getting shot down, getting no buy-in, and receiving no insightful feedback.

> Anything in between is a recipe for getting shot down, getting no buy-in, and receiving no insightful feedback.

I don’t agree. I dislike Fb to the extent that I already deleted my account quite a while ago. And I thought the Meta attempt at a meta verse with cartoon characters looked ridiculous.

But the video in the OP link, with the photo real avatars. This is a whole other level. It’s super cool.

The fact that they made a ridiculous cartoon Metaverse first does not detract from the coolness of this photorealistic thing.

Still, I’m not gonna use any Metaverse that Meta will make. Photoreal or otherwise. But that is because I dislike Meta as a company.

The point is that what they have made here is some top notch stuff.

I look forward to seeing Apple compete with this. I hope that with an iPhone and an Apple VR headset I can get to experience something similar in a few years.

> The fact that they made a ridiculous cartoon Metaverse first does not detract from the coolness of this photorealistic thing.

I agree with you on the coolness of this demo. I disagree that the cartoon demo does not detract.

I suspect a large majority of users and advertisers assumed Zuckerberg went off-piste completely. This interview with Lex is much more aligned with what should have been their first public demo of a Metaverse 'vision'.

A negligible number of people have an attention span and memory as strong as you suggest is commonplace, so I think your point is moot.

Have you never heard of the importance of first impressions?

> This is not a phone app.

Really? Definitely looks like something a simple phone should be able to process... Are they not wearing phones on their heads in the demo?!

This "Publish > Feedback > Iterate > Repeat" loop doesn't just apply to smartphone apps, you can basically do this with anything in life, from hardware to mixology.

I guess you could say the same thing about LLMs, since the initial models were terrible and just basically demos, they shouldn't actually have published those at all?

> This "Publish > Feedback > Iterate > Repeat" loop doesn't just apply to smartphone apps, you can basically do this with anything in life, from hardware to mixology

I again disagree.

"Publish > Feedback > Iterate > Repeat" loop works when users can build on it from a previous reality that they can compare against.

For physical demos that propose something a user has no point of comparison for (a new paradigm), you can either show a sketch/concept of what you want to do (to a technical audience) or build a semi-realistic demo for a non-technical audience.

Anything in between will probably be met with confusion or people focusing on details that are not the concept that you are showcasing.

I'm not sure how much facebook paid Fridman for this podcast but whatever it was, whether it was 0 or 10 million, it's probably the best value for money PR they've done in the last decade...or ever?

This is how much they paid: "Do you want to be the first one to interview Mark in the Metaverse or do we pick someone else?"

Why would they pay Lex for this? It's basically his full time job to do interviews for his podcast.

That demo was going to happen one day or another as the tech progresses, and it'll only get better from there.

The implication in my comment is that they might indeed have not paid him anything. And Fridman on principle may not have accepted payment even if offered. But who knows what type of deals go down behind the curtain in the podcasting world, it’s not unheard of to come across interviews that look like paid promotion where the podcaster doesn’t explicitly state it as such.

But my point was more really that even if they paid him 10 million, the PR would have been worth it.

> That demo was going to happen one day or another as the tech progresses

Yes, all amazing technology is going to happen eventually; no one really cares about that. They care about seeing it happen in the present, right in front of their eyes. The metaverse was mocked profusely in 2023, and we have gone through multiple cycles of “This is going to be the year of VR” in past decades. Tech stocks have been flat the last couple of months and Meta is a public company. No one on the outside would have batted an eye if they quietly shut this whole thing down after being so relentlessly mocked for the past year. So saying Zuckerberg doing this demo was inevitable after the fact is a bit flippant I think.

And if facebook did a demo with a bunch of celebrities and put it on their website I’m not sure it would have the same effect as this long interview with Fridman. It wouldn’t have the same effect on me that’s for sure. And I’m pretty sure they would have paid those celebrities.

Anybody who has tried VR knows that the critics have been way off and it's not about if, but about when someone is convinced that VR is amazing

I think it's naive to even talk about payment, that's not how the podcast world works. The way it works is you have an assortment of bros who have no background in journalism and no second thoughts about ethics, and their method for getting high-value guests on is just to be incredibly soft interviewers and incredibly sycophantic to their guests. The result is rich tech CEOs get to go around getting licked up and down by these podcasters without ever having to worry about facing real scrutiny. And you don't even need a quid pro quo: Lex knows he won't book guests if he starts grilling them like a real journalist, and Zuck knows he's not going to accidentally endorse Nazis if he sticks to safe interviews like Lex's. And there really is no limit, Lex will literally try to serve up softball questions to Nazis given half a chance.

Of course anyone in the real world realizes how little credibility these "interviews" have.

Good Lord this is cynical. Perhaps it's just a different medium? Not everything has to be a hardcore interrogation.

There's a time and a place for that.

They didn't, people just haven't been paying attention. They've been working on these avatars for almost a decade in parallel waiting for the hardware to catch up. The cartoon avatars are a function of hardware limitations.

> is close enough that they'll probably reach a version that is viable very soon.

Haven't we seen this type of thinking proven a fallacy often enough in the last decade to at least be skeptical of solving that last 1% until we actually see it?

It is definitely not perfect. It doesn't seem to track distortions of the skin on the face, such as around the mouth on the cheeks. It tracks eye blinks, but does it track pupil dilation? Doesn't seem like it could track forehead creases, though possibly. Do the avatar's eyebrows track the real eyebrows? No way to tell, but an important part of expression. I rarely see the eyes moving, maybe an artifact of being in a dark space with only one thing to look at? Pursing of the lips is not tracked. Tongue? I don't see shoulders moving. It's an impressive demo, yet as a repeated life experience it probably would start to feel artificial and uncanny. The video does discuss some of this. I get the impression some of it is prediction and some of it is tracking.

I doubt casual users care about these.

Casual users probably care the most, though perhaps not consciously. But all of these things are critical for understanding body language and facial expressions, and without those this is just a nice tech demo.

All things that will come with time. What makes for a viable product is if the mind is 'tricked' enough into believing it's seeing an actual person instead of a 3D replica.

That’s the thing. This whole metaverse push has been in the works for a long time. All that ad money being funneled into R&D.

It does not mean they will make a killer product. But behind closed doors they have some amazing tech.

The background is either a white or black void, so they removed the entire "metaverse" from it, which I'm sure fixes most performance issues.

Rendering two fairly realistic models on the screen at once isn't a big challenge these days, if they're the only thing being rendered.

> The background is either a white or black void, so they removed the entire "metaverse" from it, which I'm sure fixes most performance issues.

I doubt it's a performance issue: the 3D model of a rectilinear room has fewer vertices than are used to represent a single earlobe. It most likely is just an unimplemented feature; they also could have "cheated" by chroma-keying a 2D-texture background or rotoscoping the avatars onto another 3D room scene, but what's the point when it's just talking heads?

I think a large part of the black background choice was to hide the missing bodies; any light past chest level is going to push this into uncanny valley pretty fast because the modeling ends at the chest like talking Greek busts.

Totally agree that it’s a stylistic and presentational decision and not a computational limitation.

Surely people have higher expectations for the environment their avatar is going to be in than a literal box?

Games often only have the highest detail models present in cutscenes, where the rest of the game can be limited, and the camera can be controlled to limit how many characters are visible.

> Surely people have higher expectations for the environment their avatar is going to be in than a literal box?

Yes - they do, but you don't have to get fancy with the environment for 2 static avatars locked in place. What are the odds that rendering a room was dropped because it is allegedly GPU-intensive vs it being a feature that is not fully-baked for a public debut yet?

Nvidia had built Maxine back in Oct of 2020. Arun Mallya also published One-shot Talking Head Synthesis in 2021 which does the same thing except removing the background from the rendering

Slightly off topic, but how did Fridman get to the point that he could pull big tech C-execs and celebrities onto his podcast? He has such a monotone voice (no shade meant, I am the same way, no matter how hard I try otherwise) and, though this might just be personal taste, I feel like this unedited long-form interview format is the lowest bar. I would have thought that an excellent interviewer would generally be considered one who is efficient at extracting interesting data (or can at least edit it to appear so) and can engage with & challenge the interviewee; long-form seems like the opposite of this.

> I would have thought that an excellent interviewer would generally be considered one who is efficient at extracting interesting data (or can at least edit it to appear so) and can engage with & challenge the interviewee ...

I agree with the earlier part but disagree with this part of the comment. If the interviewee is someone particularly interesting, I learn a lot more about them when the interviewer just throws softball questions to help the interviewee more or less free-associate. Most of the time when an interviewer tries to "challenge" or "ask hard questions" it comes across as cringeworthy and a waste of time that could have been better spent letting the interviewee speak their mind.

I enjoy Lex's podcast, but he's been extremely well connected from the start. His 6th, 7th, and 8th episodes were Guido van Rossum, Jeff Atwood, and Eric Schmidt respectively.

He's done a great job of marketing himself as a non-judgemental platform for celebs. He's a super positive guy and basically lets his interviewees guide the conversation, and lets the viewer make up their own mind.

I actually like his interviews, because he typically speaks less than his interviewees and doesn't guide the viewer to a certain opinion. Even his interview with Kanye West was like this - it was pretty insightful into Kanye's state of mind and didn't need any commentary.

Lol! I guess I'm biased coming from a slightly-more-engineer perspective than Lex, but my opinion is opposite yours.

I find he drones on way too much, and uses way too many first-person pronouns. "I-me-me-me-I-I-I" - it's pretty narcissistic when the guest is politely sitting there across from him.

Surprised you also brought up the Kanye interview, that's a great example of Lex getting reprimanded by the guest. Lex couldn't let go at a particular moment and kept injecting his opinion that Kanye had to slap down.

He kisses ass and helps replenish/bolster their image. See the Elon interviews where Elon clearly has no clue what he’s talking about.

He often platforms assholes and makes them look good.

No real journalism involved, just talks about peace, love and prosperity to some of the shadiest, shittiest businessmen and politicians of our time.

Be predictable, throw softballs and never challenge anything. Easy as that.

Really? You don't think there are many people like that? How successful have they become?

Larry King was incredibly famous and threw nothing but easy questions.

Check out the Dwarkesh podcast, he has great questions and does comparatively better research.

He’s a combination of good vibes (spread love), scientist, and long-form interview where you let the interesting person do most of the talking. It works with a lot of people including myself.

I’m guessing if you’re a famous scientist you probably don’t care/mind being interviewed by most podcasters, but Lex is the next-level thing to do.

I agree, this Lex guy is such a dud.

Here's a neat article from Nvidia that goes more into the bandwidth savings and some side-by-sides of ML + keypoint vs. compression[1]

Six years ago I went to an arcade called The Rec Room in Toronto where they have a Ghostbusters AR game I got to experience with friends[2]. To this day I still have vivid memories of the experience. When it was all done, we all sat around for 15 minutes in complete silence, just recovering from the experience. I understand the skepticism about the metaverse, but after that experience there's no doubt in my mind it's the future -- you really need to experience it to get a sense of what a more polished version of this feels like.

[1] https://blogs.nvidia.com/blog/2020/10/05/gan-video-conferenc... [2] https://www.youtube.com/watch?v=ar9YwEv2ACk
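To make the bandwidth argument concrete, here is a rough back-of-the-envelope sketch. Every number in it is an illustrative assumption of mine (68 facial keypoints, 2 coordinates each, 16 bits per coordinate, 30 fps, and a 1 Mbit/s compressed video stream for comparison); none of them are taken from the Nvidia article.

```python
def keypoint_bitrate(num_keypoints, coords_per_point, bytes_per_coord, fps):
    """Bits per second needed to send raw keypoint coordinates every frame."""
    bytes_per_frame = num_keypoints * coords_per_point * bytes_per_coord
    return bytes_per_frame * fps * 8


# Hypothetical keypoint stream: 68 points, (x, y) each, 2 bytes per value, 30 fps.
keypoint_bps = keypoint_bitrate(68, 2, 2, 30)

# Hypothetical compressed video call bitrate used only as a comparison point.
video_bps = 1_000_000

savings_ratio = video_bps / keypoint_bps
print(f"keypoints: {keypoint_bps} bit/s, video: {video_bps} bit/s, "
      f"ratio: {savings_ratio:.1f}x")
```

The real savings depend on the actual codecs and keypoint formats involved, so treat this only as a way to see why sending a few dozen numbers per frame is cheaper than sending pixels.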

My wife and I did the VR Star Wars game/experience at Downtown Disney about five years ago, and had a similar reaction. I've played a lot of video games, so I was just really stunned at how they integrated environmental stimuli, e.g. blasting hot air into your face when you stepped into a room filled with molten lava. My wife hasn't played a lot of games, and when Vader showed up she was genuinely terrified -- she's refused to do anything in VR since because it felt too real and too scary.

My mind blowing moment was playing Catan in VR with complete strangers. I wrote about it here: https://p1x3l.com/story/239/social-virtual-reality-and-the-o...

IMO the point is not if you will ever think VR is promising or not, it’s about when you will have your first mind blowing moment playing VR

That is a very impressive demo. Does anyone know if the details of the codec, and the measurements used by the headset to generate the facial stream, have been publicly released yet?

The interview gets quite interesting nearer the end where Zuckerberg talks about his vision for the future, like training AI-backed avatars to simulate real people. Which apparently is not far off from reality - they joke about Fridman doing a podcast where he interviews an AI version of himself in the near future. Reminds me a bit of that Black Mirror episode where Miley Cyrus' character has AI versions of herself contained, or trapped, within consumer electronics.

It also occurs to me that this could be a useful technology for people with social anxiety who have to endure uncontrollable blushing when conversing in real life, or people with awkward facial tics. These could be filtered out or not even recorded before being encoded for transmission, while still giving a realistic appearance, making for a more comfortable presentation of the real-virtual self.

> Zuckerberg talks about his vision for the future, like training AI-backed avatars to simulate real people

This vision of the future sounds like a nightmare to me.

Do you get scared by video games too?

An entire fully "rigged" version of your face is generated from thousands of high resolution photos using a variety of facial expressions, distilled down into a model. Data from eye tracking and the cameras on the HMD are used to estimate the position of your face, lips, eyes etc, which is then fed into the pregenerated rigged model to be rendered.

Are the cameras on the Quest enough to capture all the facial/eye positions needed to drive the rigged facial model? Impressive.

They should open up the Codec avatar creation - lots of people have the hardware and the time/expertise to create them

You might think that, until you see the insane rig that's used to capture these scans: https://www.uploadvr.com/meta-codec-avatars-iphone-scan/

Each scan captures dozens of terabytes of data and lasts for hours, at least for the current high resolution avatars

That's a full Debevec-style light stage, but you can do this stuff with a couple of dSLRs and a home lighting setup with polarised light - kind of standard photogrammetry/hq texture/normal map generation techniques.

You can see from the normal map in the video it's pretty detailed but at least for a single face capture you can do this at home. I'm not sure what secret sauce they have for capturing multiple facial expressions and some ML magic how to morph/animate between those.

Btw you can get good results with things like https://www.unrealengine.com/en-US/metahuman too

I don't see why an iPhone 15 Pro wouldn't be able to capture these scans, especially with the new "spatial video" feature, which takes a "3d" video using multiple lenses.

You'll get decent results but it won't be as good as with dSLRs and polarised studio light - if you want super detailed textures, be able to relight them etc

I wouldn't be surprised if it's some form (a variant or advancement) of FACS, which we've used in animation for ages https://en.wikipedia.org/wiki/Facial_Action_Coding_System

Basically for each of the identified poses you have a key and you grab a pose of it (photogrammetry here), and then you would capture a performance and somehow identify which combination of keys and weights should be translated to your model. Sounds easy but it ain't.
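A toy sketch of that "which combination of keys and weights" step, assuming a plain linear blendshape model (face = neutral + weighted sum of key offsets) and a simple projected gradient descent fit. This is my own illustration and not Meta's method; the 4-number "faces", the three key poses and the function names are all made up for the example.

```python
def blend(neutral, keys, weights):
    """Linear blendshape: neutral + sum_i w_i * (key_i - neutral)."""
    out = list(neutral)
    for key, w in zip(keys, weights):
        for j, (k, n) in enumerate(zip(key, neutral)):
            out[j] += w * (k - n)
    return out


def fit_weights(neutral, keys, observed, steps=500, lr=0.1):
    """Fit key weights to an observed face by projected gradient descent.

    Minimises squared error between blend(...) and `observed`, clamping
    every weight to [0, 1] after each update.
    """
    weights = [0.0] * len(keys)
    deltas = [[k - n for k, n in zip(key, neutral)] for key in keys]
    for _ in range(steps):
        current = blend(neutral, keys, weights)
        residual = [c - o for c, o in zip(current, observed)]
        for i, delta in enumerate(deltas):
            grad = 2.0 * sum(r * d for r, d in zip(residual, delta))
            weights[i] = min(1.0, max(0.0, weights[i] - lr * grad))
    return weights


# Hypothetical data: a "face" is just 4 numbers, with 3 key poses.
neutral = [0.0, 0.0, 0.0, 0.0]
keys = [
    [1.0, 1.0, 0.0, 0.0],  # e.g. "smile"
    [0.0, 0.0, 1.0, 0.0],  # e.g. "brow raise"
    [0.0, 1.0, 0.0, 1.0],  # e.g. "jaw open"
]
true_weights = [0.7, 0.2, 0.4]
observed = blend(neutral, keys, true_weights)

recovered = fit_weights(neutral, keys, observed)
print([round(w, 3) for w in recovered])
```

The hard part in practice is that the observation is noisy camera data rather than a clean blend of the keys, and real faces are not linear, which is presumably where the "ain't easy" comes in.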

Mark talks about a future in which you're wearing glasses all the time to seamlessly integrate meatspace and cyberspace. I dislike the idea of having cameras pointed at everything all the time, and the idea of having such tight integration between digital and physical makes me uneasy. It feels like a subtle push to help enable NFT-style content in the longterm. Maybe this is a sign that I'm growing old, and it'll be completely natural for the younger generation.

The technological showcase is really cool though. Right now people are paying tens of thousands of dollars for fully rigged vtuber avatars and 3D virtual chat models, but in a few years we'll probably have AI tooling that allows you to do it yourself.

The discussion of non-human avatars made me wonder what kind of fantasy avatar Mark and Lex would use. If profile pictures are any kind of indication, I suspect a large number of tech users will opt to be cute anime girls. The days of catgirl-ification grow ever closer.

They discuss an idea of having celebrities and famous people train AI models so fans can interact with them. That seems so dangerous, arguably pushing parasocial relationships to a completely new level. It feels like it fundamentally hacks the human brain. Are we going to reach a point where most interactions are mediated through various layers of AI models? Maybe I'm being too much of a pessimist...

This looks amazing and the wow factor is probably due to the fact that it came out of the blue -- not a day went by in 2023 without the Metaverse being a favorite punching bag for tech writers.

Now someone do a PGP version of the Codec Avatar, where facebook included can't access the raw data, but streaming is still possible. Otherwise we get the Meta version of Worldcoin and merrily continue on the "we're completely fucked" branch of this multiverse.

I could honestly see this being the future of remote working, you still want face-to-face time, here you have it. Want to sit next to people in an "office" you can do it all virtually now. Of course you will need to add in the ability to draw on a whiteboard that's also rendered virtually, and a water cooler in the corner.

If my manager/boss told me I had to use some shitty headset from the cancerous entity that is Meta (ignoring that this also means that Meta will now have full body scans of people, as if that's not the worst idea on earth) to be in VR with my colleagues for my useless morning standups I'd quit on the spot and go do something more fulfilling with my life, for example chewing on gravel and sand.

How many “cancerous” entities do you use every day willingly or not willingly?

Not willingly? Too many, I unfortunately have to work with Meta's horrifically terrible APIs and every time an issue crops up related to their APIs (nearly daily) I say a silent prayer for that company to disintegrate into nothingness.

Willingly, basically none, or as few as I can realistically manage. I don't have a smartphone other than a burner one with GrapheneOS for my bank app, run Linux on everything else, and my work MacBook sits idly in the corner somewhere with 0 battery in it despite protests from my manager.

I'm not just talking about your computer

Well, except the fact that you lose out on spontaneous meeting points like someone heading to the kitchen while you're heading somewhere else and suddenly one of you remembers this one thing and a conversation happens in the hallway.

I’ve heard so much about magical hallway meetings since work from home became a thing over the past few years but I cannot once remember having one.

> but I cannot once remember having one.

Maybe it depends on the person(s) involved? ;)

Bell Labs is a famous example where it seems to have played a role (together with the people working there of course, and other variables).

> But just as important was the culture of collaboration that the company fostered. The leaders of Bell Labs understood that physical proximity could spark innovation, and they designed its facilities to bring experts together in both deliberate and unexpected ways.

> At Bell Labs’ headquarters complex in suburban Murray Hill, New Jersey, all of the laboratory spaces connected to a single, vast corridor, longer than two football fields. Great minds were bound to cross paths there, leading inevitably to spontaneous and meaningful interactions. As author Jon Gertner writes in The Idea Factory: Bell Labs and the Great Age of American Innovation, “a physicist on his way to lunch in the cafeteria was like a magnet rolling past iron filings.” Throughout the labs, employees were instructed to work with their doors open, the better to promote the free flow of ideas.


> and other variables

Easy to gloss over and ignore when they don't support a decision that's already been made.

You can have hallways in VR now.

Hallway conversations make sense when everyone works in the same building. As soon as you have two buildings, it just ends up excluding people.

RTO makes sense if everyone in the company is working in the same building and sharing spaces for the same reason.

Nope. Mark your coffee breaks in software like you do in Slack, then your avatar can just be standing in the coffee room or company cafe and people will be jogged there.

It's exactly the same. You can have the avatars doing anything you want, there is no difference.

Want them to do a pass by every desk in the office when they go to the kitchen, or bathroom? Possible. Want to turn that option off to get down to work? Possible.

It's cool.

Is this a situation that you find yourself in frequently (with serious work sparked by chance conversations)? It sounds awfully cliché.

No, but that's what the higher ups want to believe. :)

Guess it depends on what you think is frequent; maybe once or twice a year when working in an office setting. It always happened at small companies, and usually what was sparked in the conversation had a big impact on what the company worked on.

So for the off chance that once in 6 months you have a conversation in person that "sparks something" everyone has to suffer commutes and all the other crap that comes with WFO?

Yeah I'll pass on that, thanks

No, I think people should be able to make a choice between working in an office vs working remote, which considering how many jobs are remote nowadays, you can kind of already do.

I'm not saying all companies should work in an office, I'm just sharing my viewpoint as someone who prefers working in an office to working remotely.

All of the hallway conversations I've had have been diversions.

The biggest problem with remote communication isn't presence. It's latency. You really feel it when you have those situations where people constantly talk over each other.

People already join always-on video calls. Adding a VR headset doesn't seem like a clear improvement.

Is it really different from current video-conferencing tech? I mean, why would I want to slap the display onto my face?

Maybe this could help all those Meta employees who are apparently too unproductive to work at home...

Imagine being in a meeting of like 12 floating torsos, you are position locked, but your boss enabled X-Y position for himself so he floats in front of you to express his concern of how you are not a team player. If I were at meta I would backdoor a sub-routine that increases gaussian blur if the boss gets too close.

Plenty of public polls on Blind and threads on HNews to see the impact of remote work. Don't think we benefit from ignoring that for many people remote work means hardly working.

> remote work means hardly working

If the only way an employer can tell if their employees are working is by forcing people back to the office, the company has bigger problems than employees not working.

I'm one of those. If I get the chance to work from home I do everything in my power to slack off and fake it. No tools or processes can stop me. I find ways to exploit everything and game the system to make it look like I'm working when I'm not.

Perhaps, but they are Meta. With tech like this, they should be well equipped for remote work.

It would be like Ford employees being largely unable to drive. Which I hope is not the case...

Nobody actually eats their own dogfood. They ply you with wine while they drink grape juice, waiting to take advantage of the drunk fool.

Fucking Zoom declared mandatory RTO, which says a lot about what they sell the rest of us on.

And Ford...heh. Despite incentives, their own employees refused to buy their cars to such an extent that competitor vehicles were banned or relegated to remote parking lots.


> their own employees refused to buy their cars to such an extent that competitor vehicles were banned or relegated to remote parking lots

That's actually a good thing! It means they took it seriously, and their employees were incentivized to fix the problems so that they actually wanted to drive the company's cars.

This is so much more impressive than the Apple demo.

I’m sure some people at Apple are freaking out. Meta has a 10X more affordable product on the market.

They are all in on spatial computing and AI.

The future is exciting indeed.

I’m reading Snow Crash again and have just been blown away by how fiction is turning into reality.

Raven: “You wanna buy some Snow Crash, man?”

Hiro: “Snow Crash?”

Raven: “It’s the most expensive drug there is.”

Snow Crash was a dystopia, though.

That's a matter of perspective. I'd much rather live in the world of Snow Crash where everything is constantly in flux

I really don't get how some people got excited with Apple's VR play. Even when they announced, the Quest line seemed light years ahead of it already.

I swear to god if the Metaverse actually works my friends at Meta are never gonna let me live it down.

I've talked SO MUCH trash

That's pretty impressive. Certainly helps bridge some of the gap lost from remote 2D video collaboration/discussion.

I don't want to be a photorealistic version of my human self. That seems so limited and boring.

I was a bit disappointed they didn't try swapping avatars with each other. Would've been interesting to see the effect of two people temporarily inhabiting each other's skin, albeit virtually.

I wonder if it could be useful technology in helping face transplant patients get used to their new features, in advance of the operation. A virtual mirror, with a reflection of the future.

Yeah I don't get how they looked at SecondLife and VRChat and went "no, people don't want that, people want to be entirely themselves in virtual world".

I think both are useful. If I'm talking with my Mom, or most people I would talk to on Facebook or say Linkedin it seems like a good fit but not most other places. Having an avatar in public places with randoms and the option for a realistic version in private places seems useful.

Sure, but the same technology is what enables you to be a photo-realistic Kzin. Once you have eye, mouth, and body tracking, an effective way to transmit that information in realtime, and a way to apply those motions realistically to a model, then you can swap it out with any compatible model.
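A minimal sketch of that idea: the tracking data is a handful of numbers that gets packed, sent, unpacked, and then applied to whichever model the receiver has loaded. The `TrackingFrame` fields, the 12-byte wire format and the `Avatar` class are all hypothetical names I made up for illustration, not anything from Meta's actual pipeline.

```python
import struct
from dataclasses import dataclass


@dataclass
class TrackingFrame:
    """One frame of (hypothetical) tracking output, independent of any model."""
    jaw_open: float      # 0.0 = closed, 1.0 = fully open
    eye_blink: float     # 0.0 = open, 1.0 = closed
    head_yaw_deg: float  # head rotation in degrees


FRAME_FORMAT = "<3f"  # three little-endian float32 values, 12 bytes total


def encode(frame):
    """Pack a frame into bytes for transmission."""
    return struct.pack(FRAME_FORMAT, frame.jaw_open, frame.eye_blink,
                       frame.head_yaw_deg)


def decode(payload):
    """Unpack bytes back into a TrackingFrame."""
    return TrackingFrame(*struct.unpack(FRAME_FORMAT, payload))


class Avatar:
    """A stand-in for a rigged model; only the jaw range differs per model."""

    def __init__(self, name, jaw_range_cm):
        self.name = name
        self.jaw_range_cm = jaw_range_cm

    def pose(self, frame):
        """Map model-independent tracking values onto this model's rig."""
        return {
            "model": self.name,
            "jaw_drop_cm": frame.jaw_open * self.jaw_range_cm,
            "eyelid_closed": frame.eye_blink,
            "head_yaw_deg": frame.head_yaw_deg,
        }


# The same 12 bytes drive two different models on the receiving side.
payload = encode(TrackingFrame(jaw_open=0.5, eye_blink=1.0, head_yaw_deg=15.0))
received = decode(payload)
human = Avatar("photoreal_human", jaw_range_cm=4.0)
kzin = Avatar("kzin", jaw_range_cm=9.0)
print(human.pose(received))
print(kzin.pose(received))
```

The design point is that the wire format knows nothing about the avatar, so "compatible model" just means anything that exposes the same set of controls.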

They discuss this at some point in the podcast, Zuck specifically saying he is interested in whether the future is photorealistic or more abstract and commenting on some interesting tests they have run that have mixed the two together.

Perhaps it's a feature but I wonder how jarring the lack of progression would be given the static nature of models?

If you were to chat with someone daily for a year, you'd never see their hair grow (or get cut) for example but then I suppose we never actively think of these things, just notice when they change.

Zuckerberg said he wants scanning your face to become a 3 to 5 minute process.

When they achieve that, it will be quick and simple to update your scan.

The people that keep an old scan at that point are the same people that also keep an old photo as their profile picture anyways, and it’s often not due to the technology.

Some are too lazy to update profile pictures no matter how simple it is. Some people don’t know how to do it no matter how simple, but that’s more rare. And some people purposely choose to keep an old photo because they want to be seen the way that they once were rather than the way that they are now.

I, for one, will keep my 3D scan up to date, say every few months or so. Just like how I currently update my profile picture every few months on platforms that I actively use. (For example, I’ve been at my current company for about a year now, and in that time I’ve changed my Slack and company GitLab profile pics one time – from the picture I chose when I joined, to a new up to date photo).

Anyway. I am sure that if people stick to old scans, the Metaverse companies will eventually counter that by virtually “aging” the scanned models that people use. So that even if you don’t change your scanned model, the Metaverses will add grey hairs, wrinkles, etc to it over time.

Red Dead Redemption 2 (5 years old at this point) has natural hair growth, barber shops, hygiene, body shapes that change based on diet, plus a host of other time-oriented realistic simulation. [1][2]

It's just a matter of time before this stuff is incorporated into what we're seeing here.

[1] https://screenrant.com/rdr2-hair-growth-length-time-tonic-sp...

[2] https://www.watchmojo.com/articles/10-most-realistic-feature...

> Perhaps it's a feature but I wonder how jarring the lack of progression would be given the static nature of models?

They talked about this in the recording, calling out that (not) shaving and weight fluctuations may or may not be reflected.

My own thoughts: we're partially there already, considering people hardly update their static avatars/profile/professional headshot pictures on a weekly basis. We're inching towards the "Residual Self-image" of the Matrix universe.

I have this issue with Zoom as well, when people have their perfectly selected picture from 5 years ago for so long, then at one point they turn on their camera and you can't help but see the difference between both.

On Teams, I don't have a picture of myself as my avatar at all. I'm very happy with it just being my initials.

I gotta say all I see is narcissism when looking at those photos. I much prefer ones that have nothing to do with the person's face

This is truly amazing. It's interesting that they used their arms/hands a number of times but they didn't show up so it prevented them from communicating with them. Actually, I'm wondering how weird things were without being able to use their bodies. I wish they would have talked about that (or how eye contact is weirder/easier in this setting).

I'm imagining that with an actual built environment it will be so much nicer as well. This reminds me of when a friend and I visited VR spaces (in VRChat, I think?); it felt like I was visiting a Minecraft universe all over again with a friend.

This looks impressive. What's funny is I had an emotional first reaction that I'm going to be old and gray by the time full body scanning is readily available. No longer my young supple self. Silly, I know.

Take a few hundred photos now for later avatar building. You can already have headshots generated from uploading a variety of ~40 photos of your face and head.

Lex Fridman makes Zuckerberg seem like Robin Williams.

Did anyone else find themselves focusing on the eyes?

Pupil diameter in most humans is affected by autonomic arousal, e.g. in conversation it provides an often unconscious signal to the listener/observer. I didn't detect any dilation or constriction of the pupils in either head image; and for me it introduced some uncanny valley-ness.

The contrast between Mark's lighter iris color and both the blackness and relative smallness of his pupils drew my attention repeatedly. The middle image of the sampled video shows some contrast between iris and pupil but that might have been too noisy for their use. Anyway, I'd be curious what they tried here. It seems they're rendering the pupil; I wonder if they tried playing with the diameter as a fixed proportion of the iris diameter, or whether they tested edge blurring for lighter-eyed individuals to reduce contrast.

I'd be curious to learn, but suspect that sending the "wrong" eye dilation information may be worse (e.g. sending a "beady" eyed signal triggering unconscious emotional responses) than just sending a static pupil size too.

Still a very impressive demo.

So how come they are able to do this in real time, on a headset, over the internet, yet next gen gaming consoles don’t even get close to that level of detail?

Games do a lot more than just "render a high quality bust of a person", you have whole environments and entire systems that are interactive. Most technical demos get away with higher fidelity because of this, and when you finally see it implemented in games, they've been scaled back a lot.

Yuge pipes.

I’ve had the pleasure of sitting on a network that was, in practice, not bandwidth limited and it has led me to conclude that the terrible experience in practice is caused by retail ISPs being absolute dogshit. If you can get on a really well run ISP like Fiber7 in Switzerland, or a $BigCorp network, things are much better and demos like this are no problem.

Game consoles have lots of other details to worry about like the background (this demo is just an empty black background), NPCs and everything they need to do, game logic, physics, etc

They talk about their low bandwidth avatar codec in the first 5 minutes of the interview.

latest consoles could, it's more about the software. Also easier if you have nothing else to render than a face. https://www.unrealengine.com/en-US/metahuman looks pretty good, not many games are using UE5 yet

he explains it during the podcast, that they only send some information over the wire

I'm quite surprised with the overall negative reactions on this thread. It's like everyone's saying "why do we need emails if we already have the fax machine?" when this truly feels like the future

Sure, there's post-processing, scanning your face with this level of definition may take some work currently and the full screen video we see may feel very different for someone wearing those goggles, but these should all be solved as the technology improves

You can fight it all you want, but if it's half as good IRL as this demo suggests, it's obviously here to stay.

Criticism so far seems to make the same few points:

> "Hell, now remote work is ruined. Thanks, Zuckerberg"

I actually think this may help enable remote work in the long run as companies who are unwilling to accept the current WFH/hybrid models see this as a viable compromise.

More importantly, when I call my supplier in Belgium, I'll be able to see them "face-to-face" (avatar-to-avatar?) and develop a more human relationship than just exchanging emails or phone calls.

> "Great, social relationships are going to be even worse"

If that's the use you want to make of it, sure. But it also enables you to connect with loved ones who live far away. It can be so powerful for the elderly who struggle with loneliness to feel closer to their family.

> "These two guys are terrible at conveying emotions with facial expressions"

Ad hominem notwithstanding, if anything this is an endorsement of the casting choice for a tech demo since their expressions would be easier to replicate in the virtual world

In classic HN fashion I think there will have been more positive comments added by the time you finished posting your comment.

> "These two guys are terrible at conveying emotions with facial expressions"

Fridman anticipates and addresses this reaction at the 10:08 mark in the video as well, so for the people who watched it these types of comments will unfortunately come off as unoriginal and stale. Either the people saying that didn't get that far or they felt the need to make the comment anyway.

Aren't the benefits (with your supplier and loved ones) also equally achieved by using videoconferencing that's available today?

They are achieved, but not in an equally immersive way. If you believe "more immersive = better" for those situations, then this is an improvement. I'm inclined to believe most (non-HN) people would take that position.

Impressive, he has legs.

As they kept zooming out I was more and more impressed.

what timestamp is that?

First 10 seconds.

Ah damn I thought there was a full body thing at some point

I'd heard they added legs but this was my first time seeing it.

The interesting part is that even with current tech this entire interview could be performed by ML-powered agents. Create text and emotional markup with LLM, clone voices with Paddle, animate pre-scanned models with SelfTalk or FaceFormer and voila.

In several years we will be unable to say if the content is generated by an actual media person, ML agent or low-paid shadow-performer.

Why does it sound like he's saying "Kodak" instead of "codec" (at ~3:19 in the video)?

Seems like the kind of thing that should lower transmission bandwidth once they get it sorted out. Send the body model once and just articulate the joints/muscles. Probably on the order of 1kB/frame assuming FP16, I'm sure you could do better with compression and diffing.
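A rough back-of-the-envelope check of that estimate, as a sketch only. The parameter count of 512 is an assumption picked so that one frame comes to about 1 kB at FP16; the real codec's payload is not stated in this thread, and the frame rates are just typical values.

```python
FP16_BYTES = 2  # a half-precision float occupies 2 bytes


def frame_bytes(num_params: int, bytes_per_param: int = FP16_BYTES) -> int:
    """Uncompressed payload of one frame of pose/expression parameters."""
    return num_params * bytes_per_param


def stream_kbps(num_params: int, fps: int, bytes_per_param: int = FP16_BYTES) -> float:
    """Uncompressed bitrate in kilobits per second (1 kbit = 1000 bits)."""
    return frame_bytes(num_params, bytes_per_param) * fps * 8 / 1000


if __name__ == "__main__":
    params = 512  # assumed number of joint/muscle parameters per frame
    print(f"{frame_bytes(params)} bytes per frame")
    for fps in (30, 60, 90):
        print(f"{fps} fps: {stream_kbps(params, fps):.2f} kbit/s before compression")
```

Under those assumptions the stream stays well under 1 Mbit/s even at 90 fps, before any compression or diffing between frames.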

I wonder how much post processing this has. Do these glasses reproduce eye blinks too? (honest question)

Yes. These are Quest Pro headsets, which have eye tracking, so they should be able to pick up blinks, squints, winks etc. Apple Vision Pro should also be capable of this.

Apple Vision Pro previews have shown off not-quite-photo-real avatars [1] that feel like they're intentionally toned down slightly into 'good but obviously artificial CG animation' territory to avoid uncanny valley issues.

[1]: https://www.zdnet.com/article/meet-your-digital-persona-appl...

Very interesting. Thanks!

What timeline do people think it'll be until I can use this on a VR device at home?

Zuck said it would take more than a few years.

End of this or next year?

This is like they are in the construct in the Matrix. Pretty cool that Zuck and friends are close to implementing this in reality.

Now they just need to be able to "load the weapons program", and "learn kungfu".

I listened to the entire hour when this got posted earlier today. I still have no idea what the point is here. Does anyone have a better use case for this technology than was presented in this video?

While these avatars seem pretty accurate, a conversation between two completely robotic humans might not be the best showcase of how far off we are from “feels like a real conversation”

Wow Zuckerberg looks exactly as real here as he does in every photo

It looks amazing, but the idea of FETA mining my facial data for marketing info is 100% black mirror.

It’s great looking tech but it couldn’t have come from a worse company.

Futurama has been very clear on the subject: the whole internet, this new toy included, exists for a certain kind of virtual contemplation only.

Yeah learn how to scan entire rooms and beam it into my head without me having to strap a screen to my head (as small as it is) and then we can talk.

Holy uncanny valley Batman!

I think it's pretty cool but a bit jarring. I think I'd rather have the cartoonish avatars until this gets a little better. It's a good start though!

And I agree that seeing facial expressions is critical for the best interactions.

I disagree. I would absolutely have thought this was real if no-one told me. It's only the first few seconds of the video that look uncanny.

I think it's only uncanny valley on Mark Zuckerberg. The dude looks uncanny in real life.

They should have waited for this and skipped the legless avatars.

they previewed this at around the same time

Does anyone know how that would compare to some sort of real time Neural Radiance Field? Is it possible?

That might be what they are doing.

To get full realism for the viewer in the headset as he moves around, they would need to be able to accomplish something like that.

Am I the only one who doesn't get the fuss about this and has no interest in it?


The day Lex Fridman sold out. Recorded for posterity.

Ironically, I am unable to locate this media as a VR180.

This is impressive, hilarious and cringe all at once.

We need to invent a new word for this sensation.

Uncanny Silicon Valley

It's impressive, yes, but the lips are off and delayed. It's pretty bad compared to video.

I hate this so much. It's technically awesome, but I want nothing to do with it. If I want face to face time, I'll get face to face time. My fear is this is going to be shoved down our throats and WFH will become work while being monitored with your meta on. There may be cool applications for disabled persons or isolated folks like the elderly. But I do not want to be on this all day.

Also, I think it's pretty telling that the people who do want to be in VR all day, like dedicated VRChat users or the handful of people who actually work in VR collaboration apps, generally go for either vaguely humanlike abstract avatars (see: any screenshot of Bigscreen Beyond), or wildly un-ordinary robots, impractical anime people, favorite cartoon characters, etc (see: all of VRChat).

> I think it's pretty telling that the people who do want to be in VR all day [...] generally go for either vaguely humanlike abstract avatars

I'd wager that's more a product of technological limitations (and overall awkwardness) than a matter of demographics. Video games and other fully 3D environments tend to avoid photorealism at all cost, because it's compute-expensive and ugly. By comparison, simple cartoon characters, blobs or robots are inoffensive and perfectly usable abstractions. Even this "Avatar Encoder" is 'cheating' by only rendering a relatively static portion of your face. It would be almost unusable in a VRChat-style environment where dynamic lighting and shadows are concerned.

Excellent for capitalizing on parasocial relationships.

Very bleak.

Zuckerberg's "vision" is so hilariously and transparently: "I want to build a new internet so I can own it all from the very start, inject ads and tracking into places never dreamed of before and take a cut from everyone who participates in this brave new world!"

Probably couldn’t have chosen 2 worse people to demonstrate the range of human facial expressions

If you haven't gotten very far with the "photorealistic" part, maybe you couldn't have chosen 2 better people for the demo?

It’s to be expected with beta products. Emotions have been on the Zuckerberg 1.0 roadmap since the beginning. I don’t know about Fridman, though.

Mark Zuckerberg testifying in congress reminds me of the Star Trek movie First Contact when Data was starting to feel anxious around the Borg, so he disabled his emotion chip.

Honestly though he handled that extremely well and made it backfire.

Congressional hearings are purely for political grandstanding. The low seat countered with a cushion, the dumb questions answered with direct, unemotional answers: 'we sell ads, senator'. Nothing came out of the entire process except a few politicians with egg on their faces.

"Data, there are times when I envy you"

My impression is that the device isn't able to track all of the face's subtle movements so the avatars come across as seeming relatively expressionless. For example, I noticed that Lex's and Mark's eyebrows don't seem to move as much as you might expect given the emotions communicated by their voices. I assume this is either because the device literally restricts the movements of the eyebrows (perhaps they're pressed down under the headband) or it just isn't able to track them that well.

Such a negative, ad hominem attack of a comment. The technology is breathtaking. The two people have each contributed so much.

I thought Fridman was just a podcast guy? Not that that's quite nothing, but there are a lot of podcast guys.

Fridman is a rabbit hole, you'll find detractors and defenders and slanderers.

The debate has already been had so I'll just link it



Lex Fridman is a Russian-American computer scientist, podcaster, and writer. He is an artificial intelligence researcher at the Massachusetts Institute of Technology, and hosts the Lex Fridman Podcast, a podcast and YouTube series.

Lex Fridman has also done original research on robotics and computer vision detection of facial expressions. Here is one of his papers; there are several others on related areas.


That's an inadequate description

it was a tongue-in-cheek joke referencing a meme.

You were meant to chuckle at it, not take it seriously.

They also made the same joke in the interview itself

It's not a range test demo. It's a real conversation with real people who aren't prone to melodrama.

As mentioned in the video by Lex, it's the subtleties that make all the difference. I'm astonished with the accuracy of the blinking, mouth movements, subtle cheek variations, etc. It seems more accurate than the realtime feed from my webcam. The only thing I wouldn't like about it is having to wear a headset in order to experience it.

Sure, but from a technical point of view I don't think the range of human facial expressions is that wide anyway. It's just movements of muscles.

Hi mark

I Did Not Hit Her. I Did Not.

So now Facebook not only has your pictures and videos but a full 3D body scan.

Nothing bad could ever come of that /s

This seems like an incredible waste of resources that will invariably lead to work attire invading my remote work life. Veto.

The tech is cool but I'm wary of continuing to evolve the Internet to reward people for their physical appearance rather than their intellectual contributions.

There is ongoing research that would allow you to dress up your avatar however you would like.


Virtually dressing every day seems like a lot of work compared to just using an avatar that has your favorite three or four variations baked in, not to mention the basic assumption there that everyone wants to be a realistic (or realistically shaped) human and not a robot or an anime exaggeration of the human frame.

I'd like to be a robot, personally

How can it be a waste of resources if it saves an incredible amount of resources spent on travel and traffic for face-to-face meetings?
