Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D and Text-to-Image Diffusion (bluestyle97.github.io)
99 points by danboarder on Jan 3, 2023 | 11 comments



As with everything AI related it seems this sub-field is making enormously fast progress.

The most exciting long-term effect of this, in my view, will be the democratization of storytelling. The stories we're given today by Hollywood / Netflix / Apple TV / national TV channels tend towards duplication and are dominated by particular ideologies and beliefs; the lack of optimistic sci-fi is one symptom of this. I think it happens because rising budgets drive centralization. YouTube is far more diverse and vibrant, but even great YouTube content is very low budget compared to a typical TV show or film.

Combining Epic's tech stack (Unreal 5, Quixel MegaScans, MetaHuman) with AI-generated assets could break this and allow the production of stories that existing studios would consider too risky, niche or ideologically unacceptable. Allowing assets to be forked would also make it possible to scale story production far beyond the episodes-in-seasons format of conventional TV: fanfic and canonical lore would blur together, with "canonical" becoming the set of fan-produced episodes blessed by the original creators.


What was the bottleneck that was seemingly lifted to cause this significant progress? Was it the CLIP release? The DALLE paper?


I think it's a combination of many things: very powerful compute, very large datasets, transformers and diffusion models, and a ton of research by lots of researchers easily available on arXiv.


There has been pretty significant, steady progress for almost the last 10 years, but it only recently crossed into the realm of broad usefulness. This has driven an influx of people and funding into the field, which increases the rate of progress even further.


Video games too - imagine modding a game simply by describing your desired changes in natural language.


I was playing around with ChatGPT to generate a rather humorous story (well, at least I thought it was) which got me thinking along these lines. Excited to see what comes out of these technologies.


Unfortunately, none of the sample videos loaded for me. I'm really excited about having text-to-3D. It'll massively speed up game dev for me.

Anything World has rigs and procedurally animates 3D objects, but it isn't true text-to-3D in that it needs to have the 3D model in its catalogue.

https://anything.world/


Having this page open in the background brings my Ryzen 3600 to 60% load and allocates an additional 1 GB of RAM in Firefox. Why is HTML5 video so crippling?


Having 39 videos load and autoplay on page load isn't a very good idea in the first place.


Thinking back to 2005-2008, some web pages with memes rendered hundreds of GIFs simultaneously on a single page and it worked flawlessly. Somehow in 2022 we use videos for the same goal and it works terribly.


It's the browser implementation that's crappy. On my phone they all load and play flawlessly (iOS 16.2).

Still not a good idea to have so many videos at the same time though.
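One common fix for pages like this is to play a video only while it is actually in the viewport, rather than autoplaying everything at load. A minimal sketch of that idea, assuming markup like `<video preload="none" muted loop>`; the play/pause decision is factored into a pure function, and the `IntersectionObserver` wiring is browser-only:

```javascript
// Decide the action for one observed entry: play when the video is at
// least half visible, pause otherwise. Pure function, easy to test.
function actionFor(entry) {
  return entry.intersectionRatio >= 0.5 ? "play" : "pause";
}

// Browser wiring (hypothetical helper, not from the linked page): observe
// every <video> and play/pause it as it scrolls in and out of view.
function lazyPlayAll() {
  const observer = new IntersectionObserver((entries) => {
    for (const entry of entries) {
      const video = entry.target;
      if (actionFor(entry) === "play") {
        video.play();
      } else {
        video.pause();
      }
    }
  }, { threshold: [0, 0.5] });
  document.querySelectorAll("video").forEach((v) => observer.observe(v));
}
```

With `preload="none"` the browser also skips fetching video data until playback is requested, so off-screen videos cost almost nothing.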





