The emphasis here is Single Image, but can this model generate with multiple images too?
We know that a single image of an object physically can't cover all of its sides, so the AI has to guess the rest. That's totally fine for certain scenarios, but in lots of other cases it's trivial to capture multiple images of the same object, and if that offers higher fidelity, it's totally worth it.
I'm aware there are many algorithms or AI models that already do that. I'm asking about Stability's specifically, because if their single-image results are this impressive, surely their multi-image results would also be much better than the current state of the art?
Just tried to run this using their sample script on my 4090 (which has 24GB of VRAM). It ran for a little over 1 minute and crashed with an out-of-memory error. I tried both SV3D_u and SV3D_p models.
[edit]Managed to generate by tweaking the script to generate fewer frames simultaneously. 19.5GB peak VRAM usage, 1 min 25 secs to generate, at 225 watts.[/edit]
I managed to get it working with a 4090. You need to adjust the parameter decoding_t of the sample function in simple_video_sample.py to a lower value (decoding_t = 5 works fine for me).
I also needed to install imageio==2.19.3 and imageio-ffmpeg
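For anyone else hitting the OOM, here's roughly what that change looks like. This is only a sketch based on the comments above: the real sample() in scripts/sampling/simple_video_sample.py has many more parameters, and the names and defaults shown here are assumptions rather than the actual code.

    # Sketch of the VRAM workaround: lower decoding_t so the VAE decodes fewer
    # frames per batch, trading a little speed for a much lower peak VRAM footprint.
    # Parameter names/defaults are assumed from the comments above, not copied from the repo.
    def sample(
        input_path: str = "assets/test_image.png",  # placeholder input image
        version: str = "sv3d_p",                    # or "sv3d_u"
        decoding_t: int = 5,                        # lowered from the repo default; reportedly fits in 24GB
    ):
        # ... the rest of the repo's sampling logic goes here ...
        pass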
I made a Manifold market[0] on the amount of RAM a 5090 will have, and while pretty much nobody has participated, I just checked and the market is amusingly at the 32GB you've also quoted. Just like you, I hope it will be more, but I fear it will be even less.
Even 32GB would be great for a gaming card; any more and you'd never see it on sale, since it would be bought up by the truckload for AI, so of course they're not going to balloon the VRAM. I suspect we'd still be at 16GB if they hadn't launched the 3090 in Sep 2020 with 24GB, before all this craze really started, and lowering it now would be bad optics.
Meanwhile Apple will sell you a chip with 96GB of unified memory for the price of two 4090s ... and that is with the Apple tax ... it's ridiculous. I know the memory bandwidth of the M2 Max is about half that of a 4090, but still, the artificial kneecapping Nvidia does is absurd.
You can add multiple GPUs, but practically speaking you're better off with used 3090s, which you can get two of for the price of one 4090.
I have a 3090 Ti and I can run Q4-quantized 33B models at 30 t/s with 8k context. A 4090 would let me do the same at ~45 t/s; both inference speeds are more than fast enough for most people, so the 3090 is the usual choice. In my tests on RunPod, an H100 with 80GB of memory is around the same speed as a 3090, so slower than a 4090.
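For reference, a setup along those lines looks something like this with llama-cpp-python; the GGUF filename is a placeholder, and actual speeds will depend on your quant and hardware.

    from llama_cpp import Llama

    # Rough sketch: a 4-bit (Q4) quantized ~33B model with an 8k context on a 24GB card.
    llm = Llama(
        model_path="models/some-33b-model.Q4_K_M.gguf",  # placeholder; use whatever Q4 quant you have
        n_ctx=8192,        # 8k context as mentioned above
        n_gpu_layers=-1,   # offload all layers to the GPU
    )

    out = llm("Explain KV caching in one paragraph.", max_tokens=256)
    print(out["choices"][0]["text"])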
Odd statement. I don't really know what you mean by that. Perhaps 'math _works_, code should too' ?
I would definitely agree that it _should_ work.
I'm of the belief that no one should _have to_ publish (e.g. to graduate, get promoted, etc.) in academia, and that publications should only occur if they're believed to be near Nobel Prize worthy, and fully reproducible by code with packaging that will still work in 10 years, from data archives that will still exist in 10 years.
But it seems I have been outvoted by the administration in academia.
Hence, we get this "AI that doesn't run" phenomenon.
It already is effectively just for industry benefit. It's been like that since the start. Work that is too expensive for industry to do (research and discovery) was put into the public sphere such that the role of industry was to take that innovation and optimize it. That's at least how it is intentionally constructed.
My main point was that there is a lot of noise in scientific journals caused by pressures in academia, namely the requirement to publish. If those pressures were removed, the quality of published work would increase and the quantity would decrease.
There are other places to post derivative, non-novel work, like blogs. The field of biology has an immense amount of work that is mostly observational, without strong conclusions or predictivity. A tabulation of observations should definitely be put out by a lab, and much sooner and with far less pressure than today, rather than the typical dance of only releasing the data at publication time. The SRA is one example of a place to share data. The typical workflow could be: put all data immediately onto a public repo, comment on it informally in venues below scientific journals such as blogs, and then, if something truly substantial comes out of it (a novel analytical model that is highly predictive of cell behavior in all situations, for example), publish.
That would help separate the signal from the noise. LLMs are one case where the noise is very strong, in that many papers are simply 'we fine-tuned an LLM'.
So how should knowledge be shared in academia without publishing? Any work worthy of a Nobel Prize (or more likely, a Turing Award) is built on top of significant amounts of other research that itself wasn't so groundbreaking.
That said, I certainly think that researchers can do more to make their code and data more accessible. We have the tools to do so already but the incentives are often misaligned.
Yeah, I'm still debating whether I go with a Mac Studio with the RAM maxed out (approx $7500 for 192 GB) or a PC with a 4090. Is there a better value path with the Nvidia A series or something else? (I'm not sure about tinygrad.)
I have an M1 Max with 64GB and a 3090 Ti. The M1 Max is ~4x slower at inference for the same models than the 3090 (i.e. 7 t/s vs 30 t/s), which depending on the task can be very annoying. As a plus you get to run really large models, albeit very slowly; think about whether that will bother you or not. I will not give up my 3090 Ti and am instead waiting for the 5090 to see what it can do, because when programming, the Mac is too slow to shoot off questions. I use the Mac mostly to better understand book topics now, and the 3090 Ti for fast chat sessions.
You can get a previous gen RTX A6000 with 48GB of gddr6 for about $5000 (1). Disclosure: I run that website. Is anyone using the pro cards for inference?
Perhaps NVIDIA or somebody could invent a RAM upgrade via NVLink? Seems plausible, and not every workload wants another full GPU when extra memory alone is all it needs.
We need AMD to compete, but from what I know their software is inferior to NVIDIA's offering, and most of the current ML stacks are built around CUDA. Still, there's a lot of money to be made in this area now, so competition big and small should pop up.
The memory is inherent to the GPU architecture; you cannot just add VRAM and expect no other bottlenecks to pop up. Yes, they can reduce the VRAM to create budget models and save a bit here and there, but adding VRAM to a top model is a tricky endeavour.
There's a lot of open weights activity around 7B/13B models, which the 4090 will run with ease. But you can run those OK on much cheaper cards like the 4070 Ti (which is of course why they're popular).
And there's a lot of open weights activity around 70B and 8x7B models which are state-of-the-art - but too big to fit on a 4090. There's not much activity around 30B models, which are too big to be mainstream and too small to be cutting edge.
If you're specifically looking to QLoRA fine-tune a 7B/13B model a 4090 can do that - but if you want to go bigger than that you'll end up using a cloud multi-gpu machine anyway.
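To make that concrete, a QLoRA fine-tune of a 7B model on a single 24GB card looks roughly like this with the Hugging Face stack; the base model and LoRA hyperparameters below are just illustrative choices, not a recommendation.

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # Load the base model in 4-bit; this is what keeps a 7B model comfortably inside 24GB.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        "mistralai/Mistral-7B-v0.1",          # illustrative 7B base model
        quantization_config=bnb_config,
        device_map="auto",
    )
    model = prepare_model_for_kbit_training(model)

    # Attach small trainable LoRA adapters; only these receive gradients during training.
    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],  # typical attention projections to adapt
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()        # a tiny fraction of the full parameter count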
The 4090 has more VRAM than most computers have system RAM. I'm surprised this is considered "low RAM" in any way except relative to datacenter cards and top-spec Apple Silicon.
You're comparing RAM amounts to other RAM amounts without considering requirements. 24GB is more than (most) current games would ever require, but is considered an uncomfortably constrictive minimum for most industrial work.
Traditional CPU-bound physics/simulation models have typically wanted all the RAM they could get; the more RAM the more accurate the model. The same is true for AI models.
I can max out 24GB just using spreadsheets and databases, let alone my 3D work or anything computational.
Good to know. I've only been running LLMs at home and most of the open-source ones have been more than small enough to fit in my measly 12GB. But I guess most workloads want so much that 24GB won't fit them at all.
For AI that's either a very fat SDXL model at its max native resolution, or a quantized 34B-parameter model, so it's on the low side. Compare that with the Blackwell AI "superchip" announced yesterday, which appears to the programmer as a single GPU with 30TB of RAM.
Maybe give me lots of money to give Nvidia for a card with more memory then?
Nvidia have held back the majority of their cards from going over 24GB for years now. It's 2024 and my laptop has 96GB of RAM available to the GPU but desktop GPUs that cost several thousands just by themselves are stuck at 24GB.
They don’t get their absurd profit margins by cannibalising their data centre chips.
This is like Intel and their refusal to support ECC memory on consumer parts, when AMD supports it on nearly all Ryzens.
—
Note: your laptop is probably using a 64-bit memory bus for system RAM. For GPUs, the 4090 is 384-bit. That takes up a lot more die area for the bus and memory controller.
But GP's laptop with 96GB of unified memory would be a M2 Max Macbook or better. The M2 Max has a 4 x 128-bit memory bus (410GB/s) and the M2 Ultra is 8 x 128bit (819GB/s), versus a 4090 at 1008GB/s. But see here for caveats about Mac bandwidth: https://news.ycombinator.com/item?id=38811290
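The numbers line up if you multiply bus width by per-pin data rate; the data rates below (GDDR6X at ~21 Gbps, LPDDR5-6400) are my assumptions for the back-of-envelope math.

    # Peak bandwidth ≈ bus width in bytes × per-pin data rate (Gbps)
    rtx_4090 = (384 / 8) * 21.0   # 384-bit GDDR6X at ~21 Gbps/pin  -> ~1008 GB/s
    m2_max   = (512 / 8) * 6.4    # 4 x 128-bit LPDDR5-6400         -> ~410 GB/s
    m2_ultra = (1024 / 8) * 6.4   # 8 x 128-bit LPDDR5-6400         -> ~819 GB/s
    print(rtx_4090, m2_max, m2_ultra)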
Isn't there the risk that if they give the gaming cards enough RAM for such tasks then they'll get bought up for that purpose and the second-hand price will go even higher?
I guess my point is, rather than give the cards more RAM, the gaming cards should just be priced cheaper.
This is unfairly downvoted. They launched the 3090 in Sep 2020 with 24GB, which was more than AMD's 16GB 6900 XT from the same generation. Maybe before blaming Nvidia, blame AMD for not really trying to compete with them? Of course they're not going to release a gaming card with loads more VRAM, because a) the competition doesn't exist, nor does it have gaming cards with more VRAM, b) it would all be bought up for AI workloads, and c) games don't really need more, as the parent said.
Dunno why the defaults for this stuff aren't set for baseline hardware; I feel like I always have to tweak the batch size down in the sample scripts, even with 24GB, because everything assumes 48GB.
Yeah, this is to be expected with early adoption. This stuff comes out of the lab and it's not perfect. The key thing to evaluate is the trajectory and pace of development. Much of what folks challenged ChatGPT with a year ago is long lost in the dust. Go look at stable diffusion this time last year. Dall-E couldn't do words and hands, it nails that 90% of the time in my experience today.
About words: DALL-E is not even close to nailing it 90% of the time. Not even 50%. Maybe they nerf it when you request a logo, but that was my experience over the last few days.
With previous attempts at this problem the shaded examples could be quite misleading because details that appeared to be geometric were actually just painted over the surface as part of the texture, so when you took that texture away you just had a melted looking blob with nowhere near as much detail as you thought. I'd reserve judgement until we see some unshaded meshes.
Seems like a tougher nut to crack than image generation was, since there isn't a bajillion high-quality 3D models lying around on the internet to use as training data; everyone is trying to do 3D model generation as a second-order system, using images as the training data again. The things that make 3D assets good (the tiny geometric details that are hard to infer without many views of the same object, the quality of the mesh topology and UV mapping, rigging and skinning for animation, reducing materials down to PBR channels that can be fed into a renderer, and so on) aren't represented in the input training data, so the model is expected to make far more logical leaps than image generators do.
I know where I could get several hundred terabytes (maybe an exabyte? It’s constantly growing) of ultra high quality STL files designed for 3D printing. I just don’t have the storage or the knowledge of how to turn those into a model that outputs new STL files.
I'd imagine it'd require a ton of tagging, although I have a good idea of how I could leverage existing APIs to tag it mostly automatically: generate three still-image thumbnails of the content, feed those through CLIP, verify that all three views agree on what the STL depicts, and manually tag the ones that fail that test.
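The CLIP check itself could be a pretty small script, assuming the thumbnails are already rendered; the model checkpoint and the candidate labels here are just examples.

    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    labels = ["a dragon figurine", "a phone stand", "a gear", "a vase"]  # example label set
    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    def best_label(thumbnail_path: str) -> str:
        # Zero-shot classify one rendered view of the STL against the label set.
        image = Image.open(thumbnail_path)
        inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
        probs = model(**inputs).logits_per_image.softmax(dim=-1)
        return labels[probs.argmax().item()]

    # Tag only if all three rendered views agree, as described above.
    views = ["front.png", "side.png", "top.png"]   # placeholder thumbnail paths
    guesses = {best_label(v) for v in views}
    tag = guesses.pop() if len(guesses) == 1 else None  # None -> send to manual tagging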
> since there isn't a bajillion high quality 3D models lying around on the internet to use as training data
There aren't a bajillion high-quality 3D models of everything, but there are an unbounded number of high-quality 3D models of some things, due to the existence of procedural mesh systems for things like foliage.
You could, at the very least, train an ML model to translate images of jungles into 3D meshes of the trees composing them right now.
Although I wonder if having a few very-well-understood object types like these, to serve as a base, would be enough to allow such a model to deduce more generalized rules of optics, such that it could then be trained on other object categories with much smaller training sets...
It almost seems easier, in that you have an arbitrary # of real-world objects to scan and the hardware is heavily commoditized (IIRC iPhones have this built in at high res now?)
In context, the conversation was beyond a dichotomy - thankfully. Having only 2 choices leaves conversation at people insisting one is better, and becomes an argument about definitions where people take turns alternating being "right" from the viewpoint of a neutral observer.
It's proposing a solution to the author's observation that everyone is doing it in second order fashion and missing a significant amount of necessary data.
The implication is that rather than doing it the hard way via the already-obtained second-order dataset, it'll be easier to get a new dataset, and getting that dataset will be significantly easier than it was to get the second-order one, since you don't need to worry about aesthetic variety so much as teaching what level of detail is needed in the mesh for it to be "real".
I don't think they have a specific use case for this model; they're throwing ideas at the wall again in the hopes some of them stick and eventually turn into another product. The paper doesn't discuss any of the problems that would need to be solved in order to easily generate game-ready assets, so I think it's safe to assume it currently can't.
For games at the very least you need to consider polygon budget, getting reasonably good UVs, and generating materials which fit into a PBR shader pipeline, at least if it's going to work with rendering pipelines as we know them today (as opposed to rendering neural representations directly, which is a thing people are trying to do but is totally unproven in production).
I'd be willing to bet you could create a diffusion model to map unrefined meshes to UV-fixed and remeshed surfaces. If you had a large enough library of good meshes you just programmatically mess 'em up and use that as the dataset.
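A crude version of the "mess 'em up" step could look like this with trimesh; the paths are made up, and a real corruption pipeline would go well beyond vertex jitter (scrambled UVs, decimation, merged islands, and so on).

    import numpy as np
    import trimesh

    # Pair each clean mesh with a programmatically degraded copy, so a model can be
    # trained to map degraded -> clean. The degradation here is just vertex noise.
    clean = trimesh.load("library/clean_asset.obj", force="mesh")   # hypothetical path
    noisy = clean.copy()
    noisy.vertices = noisy.vertices + np.random.normal(
        scale=0.01 * noisy.scale,          # noise proportional to the mesh's overall size
        size=noisy.vertices.shape,
    )
    noisy.export("dataset/degraded/clean_asset.obj")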
That's assuming your generator produces a normal map; the ones I've seen do not. The only texture channel they output is color, which is the one channel a model trained on images is naturally equipped to produce.
You can generate pretty reliable texture depth maps from just an image. It's going to be trash if you're trying to generate the depth for the entire 3D model, but I presume it will do a good job with just the texture. Then you just use a displacement based on the depth map.
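For instance, a monocular depth model already gets you a usable displacement source from a single photo. A rough sketch with the Hugging Face depth-estimation pipeline (the model choice and filenames are just examples):

    from PIL import Image
    from transformers import pipeline

    # Estimate per-pixel depth from a single photo and save it as a grayscale map,
    # which can then drive a displacement/bump channel on the textured surface.
    depth_estimator = pipeline("depth-estimation", model="Intel/dpt-large")
    result = depth_estimator(Image.open("texture_photo.png"))   # placeholder filename
    result["depth"].save("displacement_map.png")                # normalized depth as a PIL image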
Only if you have multiple images of the same areas so that you can extract actual position. And there is no guarantee that multiple pictures of the same model have the same detail, much less in a manner that can be triangulated with accuracy. A lot of the photogrammetry algorithms discard points that don't match certain error-bars.
So yes, there might be a wooden frame in the middle of that window, but does it match the math on both angles of it? Doubt it.
I don't know much about 3D printing, would be very interested in learning more about this idea if you'd be so kind as to expand on it. Could I have AI spend all day auto scanning what teens are doing on instagram, auto generate toys based on it, auto generate advertisements for the toys, auto 3D print on demand?
I think their suggestion was more "I have a photo of a cool horse, and now I would like a 3D model of that same horse."
Another way of looking at it, 3D artists often begin projects by taking reference images of their subject from multiple angles, then very manually turning that into a 3D model. That step could potentially be greatly sped up with an algorithm like this one. The artist could (hopefully) then focus on cleanup, rigging, etc, and have a quality asset in significantly less time.
The question is whether this actually "creates a 3D model based on the picture", or whether it "finds an existing model that looks similar to the picture and texture-maps it".
Hypothetically, sure, assuming the parent comment that these meshes are sufficient for modelling is correct and that you can find any teens who want a non-digital toy.
I think a good hobbyist application for this would be something like modelling figurines for games, which is already a pretty popular 3D printing application. This would allow people with limited modelling skills to bring fantastical, unique characters to life “easily”.
OP is suggesting that this (AI model? I honestly am behind on the terminology) could replace one of the common steps of 3D printing - specifically, the step where you create a digital representation of the physical object you would want to end up with.
There are other steps to 3D printing in general, though; a super rough outline:
- Model generation
- "Slicing" - processing the 3D model into instructions that the 3D printer can handle, as well as adding any support structures or other modifications to make it printable
- Printing - the actual printing process
- Post-processing - depending on the 3D printing technology used, the desired resulting product, and the specific model/slicing settings, this can be as simple as "remove from bed and use" to "carefully snip off support structures, let cure in a UV chamber for X minutes, sand and fill, then paint"
As I said before, this AI model specifically would cover 3D model generation. If you were to use a printing technology that doesn't require support structures, and handles color directly in the printing process (I think powder bed fusion is the only real option here?), the entire process should be fairly automatable - a human might be needed to remove the part from the printer, but there might not be much post-processing to do.
The rest of your desired workflow is a bit more nebulous - I don't know how you would handle "scanning what teens are doing on instagram", at least in a way that would let you generate toys from the information; generating and posting the advertisement shouldn't be too hard - have a standardish template that you fill in with a render from the model, and the description; printing on demand again is possible, though you'll likely need a human to remove the part, check it for quality and ship it. You could automate the latter, but that would probably be more trouble than it's worth.
Interesting; to be clear, I don't think this is a good idea, and it's kinda my nightmare post-capitalism hell. I just think it's interesting that this could be done now.
On finding out what teens want, that part is somewhat easy-ish. I guess you'd need a couple of agents: one that scans teen blogs for stories and converts them to keywords, then another agent that takes the keywords (#taylorswift #HaileyBieberChiaPudding #latestkdrama etc.) into Instagram; after a while your recommended page will turn into a pretty accurate representation of what teens are into. Then just have an agent look at those images and generate diffs of them. I doubt it would work for a bunch of reasons, but it's an interesting thought experiment! Thanks!
I'd like to play around with something like this, but from my understanding my machine (Macbook, 2021 M1) isn't nearly powerful enough (right?). Are there remote/cloud environments where I can run models like this?
I'm sorry for the dumb, lazy question, but would the input require more than one image? Is there a demo URL to test this? I think it might just be time to buy a 3D printer.
EDIT> Does "single image inputs" mean more than one image?
TBH I always look at the worst-case scenario. I was worried it meant it needed 3 images, input as a single image at different steps of the process, so requiring different angles. I wasn't sure, but thought it best to check. I feel like it would have been clearer to have said something like "generates a 3D model from a single image" (not exact wording, but you catch my drift). Sorry, I am over-analysing, but all feedback is good, right?
> Stable Video 3D (SV3D) is a generative model based on Stable Video Diffusion that takes in a still image of an object as a conditioning frame, and generates an orbital video of that object.
So can it actually output a 3d model? Or just images of what it thinks the object would look like from other angles?
The reference video (https://youtu.be/Zqw4-1LcfWg) says they use a NeRF / structure-from-motion step and then create a mesh from the generated radiance field with marching cubes. This is how most state-of-the-art text-to-object generators work now as well.
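The marching-cubes step at the end is standard; here's a toy sketch with scikit-image where a synthetic sphere stands in for the density field you'd actually sample from the NeRF.

    import numpy as np
    import trimesh
    from skimage import measure

    # Build a toy density grid (positive inside a unit sphere) in place of a NeRF's
    # sampled densities, then extract the zero level set as a triangle mesh.
    grid = np.linspace(-1.0, 1.0, 128)
    x, y, z = np.meshgrid(grid, grid, grid, indexing="ij")
    density = 1.0 - np.sqrt(x**2 + y**2 + z**2)

    verts, faces, normals, _ = measure.marching_cubes(density, level=0.0)
    mesh = trimesh.Trimesh(vertices=verts, faces=faces, vertex_normals=normals)
    mesh.export("object.obj")   # ready for cleanup / retopology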
It crashes with an out-of-memory error on my 24GB 4090, so at least when it comes to their sample script the answer is "a lot". Maybe it's just an inefficient implementation though.
Pretty much every initial Stability release has been inefficient, and resource requirements have dropped a lot once community engines optimized for real consumer hardware appeared for running the model.
OTOH, with their shift to a less open licensing structure, community tooling probably won’t emerge with the same level of energy.
In the repo the model weights file is 9.37GB, whereas sdxl turbo is 13.9GB, and I don't see any mention of huge context windows, so probably it just needs a decent graphics card.
Extracting an object from an image, transforming it (rotating it, for example), and re-blending it into the original image.
Or making 3D game assets from objects you have around. Imagine: take your phone, go around town, into shops, into churches, come back, press a button, and get a huge library of 3D assets to populate your game.
Or, something like this, for IKEA: couple of photos of a room --> extract objects --> let user re-arrange furniture. The room could be either the user's room or an IKEA showroom.
You can do it with existing tools, but this kind of technology reduces it to pressing a couple of buttons.
There are many direct consequences of people being able to directly transform text into textured 3D models, and even vaster indirect consequences if one pauses to reflect. A tight feedback system with good cohesion could revolutionize art, design, mechanical engineering, video games, etc.
Avant-garde/experimental filmmaking is the main beneficiary of all this.
Basically, cool-looking video no one watches. I say that as a huge fanboy/artist myself. It is like Christmas every other day right now. All this VC money being set on fire to make better avant-garde film tools is just wonderful. A dream come true.
It's hard to get camera position tracking for random objects, so it looks like they used simulations. There's probably a lot more plastic children's toy models in Blender than people, fabrics, buildings, &c.
They compare against Zero123-XL, but they should compare against MVDream instead. MVDream is quite good. If you fiddle with the loss you can get even better results.