Hacker News

It’s not live, but it’s in the realm of outputs I would expect from a GPT trained on video embeddings.

Implying they’ve solved single-token latency, however, is very distasteful.




OP says that Gemini had still images as input, not video, and the dev blog post shows it was instructed to reply to each input in relevant terms. Needless to say, that's quite different from what's implied in the demo, and it is, at least theoretically, already within GPT's abilities.


How do you think the cup demo works? Lots of still images?


A few hand-picked images (search for "cup shuffling"): https://developers.googleblog.com/2023/12/how-its-made-gemin...
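Per the blog post, the interaction boils down to interleaving a text question with a handful of still frames in a single prompt. A minimal sketch of assembling such a request payload, assuming a hypothetical JSON schema (the field names `role`, `parts`, `inline_image` here are illustrative, not any real API's):

```python
import base64
import json

def build_turn(frames, question):
    """Assemble one interleaved text-and-image prompt turn from still
    frames, mirroring the blog post's approach of prompting with a few
    hand-picked snapshots rather than live video. The schema is
    hypothetical, for illustration only."""
    parts = [{"text": question}]
    for frame in frames:
        parts.append({
            "inline_image": {
                "mime_type": "image/jpeg",
                # Raw image bytes are base64-encoded for JSON transport.
                "data": base64.b64encode(frame).decode("ascii"),
            }
        })
    return {"role": "user", "parts": parts}

# Three placeholder byte strings standing in for snapshots of the shuffle.
frames = [b"frame-1", b"frame-2", b"frame-3"]
payload = build_turn(frames, "Which cup is the ball under now?")
print(json.dumps(payload)[:40])
```

The point of the sketch is that each "turn" in the demo was a discrete batch of stills plus instructions, not a continuous video stream, so there is no streaming latency to solve.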


Holy crap that demo is misleading. Thanks for the link.



