Toon3D: Seeing cartoons from a new perspective (toon3d.studio)
414 points by lnyan 14 days ago | 111 comments



It's interesting that they used the Planet Express building from Futurama as one of their examples of 3D-inconsistency, because I'm pretty sure the exteriors are in fact computer-generated from a 3D model. Watch the show and you can see the establishing shots usually involve a smooth complex camera move around the building.


Agreed, most or all shots of the Planet Express building and Planet Express ship are 3D renderings, even in the original first few seasons. Beyond that, even some shots of Bender in Space are 3D renderings, especially in cases where a complex and continuous shift in perspective is required.

Non-photo-realistic (NPR) 3D art goes back a surprisingly long way in animations. I rewatched the 1988 Disney cartoon "Oliver and Company" recently, and I was surprised to see that the cars and buildings were "cel-shaded" 3D models. I assumed that the movie had been remastered, but when I looked it up, I found out that it was the first Disney movie ever to make heavy use of CGI[0] and that what I was seeing was in the original. The page I found says:

"This was the first Disney movie to make heavy use of computer animation. CGI effects were used for making the skyscrapers, the cars, trains, Fagin's scooter-cart and the climactic Subway chase. It was also the first Disney film to have a department created specifically for computer animation."

References:

0: https://disney.fandom.com/wiki/Oliver_%26_Company


> "This was the first Disney movie to make heavy use of computer animation. [...]"

Tron came out in 1982, six years before Oliver & Company.

https://en.wikipedia.org/wiki/Tron


I guess it depends on the definition of "heavy use." I know in Tron a few scenes were CG, and there were a few CG+live-action bits, but the majority was filmed on normal physical sets in high-contrast, then painstakingly hand-processed[1] to add the neon "glow".

[1] https://filmschoolrejects.com/tron-costumes-glowing-effect/ Thanks legions of Taiwanese animators (:


From your link: >The 1982 Disney movie is privy to a remarkable number of firsts: the first feature-length film to combine CGI and live-action; the first talking and moving CGI character; the first film to combine a CGI character and a live-action one; the first fully CGI backgrounds… The list goes on and on.

Sounds pretty heavy to me.


And the film OP mentioned, Oliver & Company:

>Eleven minutes of the film used "computer-assisted imagery" such as the skyscrapers, the taxi cabs, trains, Fagin's scooter-cart, and the climactic subway chase

I think Tron wins in terms of CGI


But Disney financed and distributed Tron. It wasn't made by a Disney Studio, and most of the animation was outsourced to a Taiwanese studio because Disney wouldn't lend any of their own talent. So I think it's fair to say that Oliver & Company is the first Disney-made film to use CGI.


The Great Mouse Detective (1986) was earlier and the ending sequence is CG (printed out and traced onto cels so traditional 2D characters could be drawn on top).


That's a good point. What's funny is that "The Great Mouse Detective" was actually the film I was thinking of this whole time - I believe the ending sequence took place in Big Ben, and it looks quite good by 2024 standards. But I forgot the name of the movie and assumed it was "Oliver & Company" because Oliver is a plausible name for an English mouse :)


And large amounts of the "computer" graphics in Tron are hand drawn.


Still lots of CGI.


Probably meant “Disney animated feature”.


Found a pretty cool wireframe video of Oliver and Company.

https://m.youtube.com/watch?v=mix9rStOqoI

Now I am curious to watch it



Cel shaded 3d models?

Wait, you're telling me that computers have enabled us to get by with fewer artists, and have thereby been replacing artists, for a long time now?!

Just like pretty much every industry out there?!

And that it's widely accepted so long as people get their cheap plastic goods from China?!

And that the current outrage won't even be remembered in 20 years?!


Kind of, though it hasn't replaced anyone. 3DCG just became a good-enough basis for artists to build on, which is exactly what AI bros have been fantasizing about and advocating for a couple of years now, yet it gets completely ignored and mocked.

Which suggests that AI hatred doesn't necessarily come from where pro-AI people think it comes from; people may simply find AI art rage-inducing.

Like, it's not that some specific technical aspect of AI is bad or could use improvement. It just sits on the wrong side of the uncanny valley, and the arguments clump around that.

That's the real problem with generative AI.


Isn't a lot of 3D in shows and games "faked" to look good to the viewer?

I remember seeing a blog write-up on what 3D animators do to make things look acceptable. Like making a character 9 feet tall because when the camera panned past them, they looked too short at their "real" in-system height. Or archway doors that are huge but look "normal" to us from the shot's perspective. Or having a short character stand on an out-of-scene blue box so that a conversation with a tall character doesn't look silly due to the extreme height difference. Or a hallway that in real life would be 1,000 feet long but looks about 100 in-world because of how the camera passes it, with each door on that 1,000-foot hallway being 18 feet high, etc.

I wonder if shows like Futurama used those tricks as well, so when you sort of re-create the 3D space the animators were working in by reverse engineering like this, then you see the giant doors and 9 foot people and non-Euclidian hallways, etc. Just because it looks smooth as the camera passes it, doesn't mean that actual 3D model makes sense at other perspectives.


I don't have a ton of experience in this realm but from what I've seen it does happen a lot -- looking good is often better than being right. A great example of this is the way they tilted the models for Zelda's A Link Between Worlds[0]. Basically everything in the world is tilted back so it looks better for the camera angle, which is designed to mimic the feel of A Link to the Past.

[0]: https://www.gameinformer.com/b/news/archive/2013/11/20/the-t...


I saw some video on A Difficult Game About Climbing a while back. The things they did to make the guy appear to grip the rocks and stick normally make the hands utterly bizarre when seen from the side.


Indeed many animated shows that don't look 3d animated have a 3d model somewhere in their pipeline these days. Even if there's not a digital 3d model, there might be a physical model of the main locations in the studio for animators to refer to.


Yeah, Futurama used composited 3D elements from the very first episode in 1999. The vehicles are nearly always 3D.


The exteriors aren't generated from a single 3D model; they're generated from many 3D models of the same thing, which perhaps changed over time or between scenes, like the models of the Enterprise used on Star Trek.

It's... neat? But I'm struggling to think of what the applications of this would actually be. 2D artwork usually doesn't have a consistent 3D space, which they acknowledge, but they don't seem to have overcome that problem in any useful sense. The scenes are barely coherent once they move from one of the originally drawn camera positions.


Both Futurama and Family Guy sometimes use 3d rendering for vehicles, for example, rendering them in a cartoon-looking style and compositing them with flat 2d animation.

Maybe similar kind of things could be an application of this.

Another possible use-case might be a game development studio developing a licensed game based on a 2d cartoon, but making the game 3d. They could use this as a tool for visualization while planning and developing, to iterate quickly and to reference how the original 2d could translate into 3d.


Not really? In those examples the hand crafted 3d assets already exist, this thing could at best recreate the 3d geometries the show creators made themselves. That seems useful mainly for cloning someone else's work.


“Similar kind of thing” meaning for another show that wants to do the same but has not created the 3d assets yet.

A team of 2d artists draws the desired vehicles for the cartoon from two or three angles, and software like this makes a usable 3d model of them.


If you were making 2D drawings with the intent of turning them into a 3D model then you would draw them to be coherent in 3D in the first place. The whole novelty of the research in the OP is that they're trying to reconstruct drawings that were never intended to make realistic sense in 3D.

Even if AI has a place in the 2D to 3D part of a pipeline, surely you'd still want the 2D artwork to be unambiguously representative of what the 3D asset should look like, rather than providing self-contradictory input data and praying that the AI can magically make it make sense.


True. For the second use-case I mentioned it still applies though. Where a studio is making a licensed 3d game based on an existing 2d cartoon.


A show that's been doing purely 2d art can't just integrate 3d art into its pipeline on a whim. If they can, they probably already have the skills to make the model outright.


SpongeBob brazenly violates 3D space rules (I mean, they also have fire underwater...). The writers and artists both draw heavy inspiration from Looney Tunes, where such rules are broken because it's funny to break them.


A refined version of this could be used to make stereoscopic versions of cartoons.

On the other hand, you are probably better off only using the depth prediction and filling in any voids with image generation instead of this mapping process.


I think this is just a device used to demonstrate and advance the technology. I doubt this has a real application in this context given how little work is needed to 3D model these kinds of environments anyways.


With future advancements you could pump out video games for many series.

While rough, these do look better than some implementations of the artwork for cartoon games.


I could see some value maybe in giving an artist feedback on where the model detects inconsistencies between different viewpoints.


That assumes that consistency between viewpoints is actually desirable - part of the charm of 2D animation is that things can be stylized or exaggerated or simplified in ways that don't come naturally in a 3D workflow, where the "default" is for things to fit together realistically and any deviation from that takes additional effort.

If you do want numerous 2D artworks which share a realistically defined 3D space then that can easily be done by making a very rough 3D scene and then painting over it, you don't need any AI for that.


If consistency was highly desirable you'd just model the 3d space from the start...


Maybe you could better construct a 3d model of a demolished landmark from old paintings and photos?


The renders it creates are underwhelming, but it seems good at determining the location and angle of the camera.

I could see it being used to create a "scratch track" that human animators animate on top of. An aid to tweening.


Creating 3D spaces from inconsistent source images! Super fun idea.

I tried a crude and terrible version of something like this a few years ago, but not just with inconsistent spaces lacking a clear ground truth - with purely abstract non-space images which aren't supposed to represent a 3D space at all. Transform an abstract art painting (Kandinsky or Pollock, for example) into an explorable virtual reality space. Obviously there is no 'ground truth' for whatever 'walking around inside a Pollock painting' means - the goal was just to see what happens if you try to do it anyway. The workflow was:

1. Start From Single Abstract Art Source Image

2. SinGan to Create Alternative 'viewpoints' of the 'scene'

3. 3d-photo-inpainting (or Ken Burns, similar project) on original and SinGan'd images (monocular depth mapping, outputs a zoom/rotate/pan video)

4. Throw 3d-photo-inpainting frames into photogrammetry app (Nerf didn't exist yet) and dial up all the knobs to allow for the maximum amount of errors and inconsistency

5. Pray the photogrammetry process doesn't explode (9 times out of 10 it crashed after 24 hours, brutal)
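
In script form it was basically glue around each repo's own entry points. A rough sketch of the shape of it (every command, path, and flag below is a placeholder from memory, not the real invocation; check each project's README):

    # Rough shape of the glue script; all commands/paths/flags are placeholders.
    import subprocess
    from pathlib import Path

    src = Path("pollock.png")
    out = Path("run01")
    out.mkdir(exist_ok=True)

    # Step 2: SinGAN hallucinates alternative "viewpoints" of the painting.
    subprocess.run(["python", "SinGAN/random_samples.py", "--input", str(src),
                    "--out", str(out / "views")], check=True)

    # Step 3: 3d-photo-inpainting on the original plus each SinGAN output
    # (monocular depth estimation, then a zoom/rotate/pan video per image).
    for img in [src, *sorted((out / "views").glob("*.png"))]:
        subprocess.run(["python", "3d-photo-inpainting/main.py", "--image", str(img),
                        "--out", str(out / "frames")], check=True)

    # Step 4: photogrammetry over all the extracted frames, with matching
    # tolerances cranked as loose as the software allows.
    subprocess.run(["meshroom_batch", "--input", str(out / "frames"),
                    "--output", str(out / "recon")], check=True)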

I must have posted an example on Twitter but I can't find the right search term to find it. But for example, even 2019 tier depth mapping produced pretty fun videos from abstract art: https://x.com/jonathanfly/status/1174033265524690949 The closest thing I can find is photogrammetry of an NVIDIA GauGAN video (not consistent frame to frame) https://x.com/jonathanfly/status/1258127899401609217

I'm curious if this project can do a better job at the same idea. Maybe I can try this weekend.


What is a technique/library that can take an image of a 3d environment/drawing of a room and detect a rough mesh highlighting ground, walls, barriers ?


> What is a technique/library that can take an image of a 3d environment/drawing of a room and detect a rough mesh highlighting ground, walls, barriers ?

Well just in case it wasn't obvious, Toon3D, the project being discussed, is doing that. Part of the workflow is asking the user to indicate correspondences between geometry in different images, and each image is processed individually to create blocks of geometry you can toggle on or off visually.

Older projects:

https://github.com/sniklaus/3d-ken-burns

https://github.com/vt-vl-lab/3d-photo-inpainting

I believe there are some NeRF variants that do something like this from a single image as well, but I haven't personally tried any.
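
For the single-image case, a common starting point is monocular depth estimation followed by a naive back-projection; ground and walls then show up as roughly planar clusters you can segment or mesh. A minimal sketch using MiDaS via torch.hub (the model name is real, but the camera intrinsics and the handling of MiDaS's relative depth are rough assumptions):

    # Single image -> rough point cloud via monocular depth (MiDaS).
    # MiDaS_small needs the timm package installed.
    import cv2
    import numpy as np
    import torch

    img = cv2.cvtColor(cv2.imread("room.png"), cv2.COLOR_BGR2RGB)

    midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small").eval()
    transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

    with torch.no_grad():
        pred = midas(transform(img))                      # relative inverse depth
        inv_depth = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=img.shape[:2],
            mode="bicubic", align_corners=False).squeeze().numpy()

    # Back-project with a guessed pinhole camera; the focal length and depth scale
    # are assumptions, since MiDaS depth is only defined up to scale/shift.
    h, w = inv_depth.shape
    f = 0.8 * w
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = 1.0 / np.maximum(inv_depth, 1e-6)
    pts = np.stack([(u - w / 2) * z / f, (v - h / 2) * z / f, z], -1).reshape(-1, 3)
    # From here: RANSAC plane fits to pull out floor/walls, or triangulate the
    # pixel grid directly to get a rough mesh.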


A while back, after I got a Quest 2, I started to dive into the world of photogrammetry. I went through the entire pipeline of building a 3D *model* from photos of an object taken from different angles. The pipeline involved using Meshroom and a few other pieces of software to clean up the mesh and port it into Unity.

In the end, from my (superficial) understanding, the problem with porting anything into VR (say into Unity, where you can walk around an object) is the importance of creating a clean mesh. What tools such as the OP's produce (I haven't dived deep into it yet) are point clouds in 3D space. They do not generate a 3D mesh.

Going from memory of the tools I came across during my research, there are tools like this: https://developer.nvidia.com/blog/getting-started-with-nvidi... Again, this does not generate a mesh. I think it just produces a video, not something you can simply walk around in VR.

My low-key motivation was to build a clone of what Matterport does and sell it to real estate companies. The major gap in my understanding, and the reason I lost steam, is that I was not sure how they are able to automate the step of generating a clean mesh from a bunch of photos from a camera. To me, this is the most labor-intensive part. Later I heard there are ML models that can do this very step, but I have no idea about that.
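
To be concrete about the step I mean: going from the photogrammetry point cloud to a usable mesh. The standard non-ML route is surface reconstruction, e.g. Poisson reconstruction in Open3D. A minimal sketch (the file name and parameters are placeholders); the manual cleanup after this is exactly the part I never saw automated well:

    # Point cloud (from Meshroom etc.) -> triangle mesh with Open3D.
    import numpy as np
    import open3d as o3d

    pcd = o3d.io.read_point_cloud("cloud.ply")
    pcd.estimate_normals(
        search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))

    mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=9)

    # Trim the low-support surfaces Poisson tends to hallucinate, then decimate
    # so the mesh is light enough to walk around in Unity/VR.
    dens = np.asarray(densities)
    mesh.remove_vertices_by_mask(dens < np.quantile(dens, 0.05))
    mesh = mesh.simplify_quadric_decimation(target_number_of_triangles=100_000)
    o3d.io.write_triangle_mesh("mesh.obj", mesh)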


Perhaps using Unreal + Nanite + PCVR would be a better option? Nanite can handle highly complex meshes and algorithmically simplify them in realtime. Basically a highly advanced LOD system. Not sure what the limitations are, but it's worth a try. Also, I highly recommend using Reality Capture for photogrammetry. The pricing is super cheap and you pay per scan.


NeRFs are sort of last year's technology. The latest hype is about gaussian splats.

My understanding is that essentially these technologies take some images as input and then train a model, where the model learns the best way to render the scene so that it reproduces those images. I think for gaussian splats, the scene is represented as a set of "blobs" in space, and every input image has to be rendered from that same set of blobs from its own perspective, so by positioning the splats such that each image is rendered correctly, you can reproduce the scene.

This training is currently very expensive and has to be done for each model, but produces an output that can be explored in real time.

I think the photogrammetry approaches used by Matterport et al. are older and require much higher-quality input data, whereas the newer approaches can work with much less and lower-quality data.
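
As a toy illustration of that "position the blobs until the renders match" loop (my own sketch, 2D only, nothing like the real 3DGS rasterizer): a few dozen 2D Gaussians with learnable center, scale, color, and opacity, optimized by gradient descent against a target image.

    # Toy version of the idea: fit 2D Gaussians to reproduce one target image.
    import torch

    H, W, N = 64, 64, 50                      # image size, number of Gaussians
    ys, xs = torch.meshgrid(torch.linspace(0, 1, H),
                            torch.linspace(0, 1, W), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1)      # (H, W, 2) pixel coordinates

    # Learnable Gaussian parameters: center, log-scale, color, opacity logit.
    mu    = torch.rand(N, 2, requires_grad=True)
    log_s = torch.full((N,), -3.0, requires_grad=True)
    color = torch.rand(N, 3, requires_grad=True)
    alpha = torch.zeros(N, requires_grad=True)

    def render():
        # Per-pixel Gaussian weights, then a simple normalized blend (no depth sorting).
        d2 = ((grid[None] - mu[:, None, None]) ** 2).sum(-1)       # (N, H, W)
        var = torch.exp(log_s)[:, None, None] ** 2
        w = torch.sigmoid(alpha)[:, None, None] * torch.exp(-d2 / (2 * var))
        num = (w[..., None] * color[:, None, None, :]).sum(0)
        return num / (w.sum(0)[..., None] + 1e-6)                  # (H, W, 3)

    target = torch.rand(H, W, 3)              # stand-in for one view of the scene
    opt = torch.optim.Adam([mu, log_s, color, alpha], lr=1e-2)
    for step in range(500):
        loss = ((render() - target) ** 2).mean()   # photometric loss vs. the view
        opt.zero_grad()
        loss.backward()
        opt.step()

The real thing does this in 3D with anisotropic covariances, view-dependent spherical-harmonic colors, and a fast differentiable rasterizer across many calibrated photos at once, but the optimization loop is conceptually the same, and yes, it is expensive.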


https://www.reddit.com/r/sdforall/comments/13lenfm/free_seam...

https://github.com/3DTopia/OpenLRM (They mention NeRF as inspiration, but it seems the original paper it was based on decided to use vision transformers; the open-source version seems to use Meta's DINO as one of its key components.)


Like shrink wrap in rhino?


I have never heard of this. Must be a commercial solution?

Are you saying it can turn a point cloud 3D representation into a fully working and clean 3D mesh for VR?


Rhino3d does this I believe. It is a bit spendy, yes.

It's kind of amazing that they're able to take drawings of a scene someone imagined and then create (bad) 3d models. Imagine if in the future an artist could sketch a couple of images from a scene and then get an accurate 3D model?

Or if a 2D artist could sketch a couple of poses and automatically get a well structured 3D model and textures?

I think there's been a lot of concern in the industry about the impact AI and similar tools will have on artists, but it seems possible to imagine a future where machine-learning-based systems work more directly with an artist, rather than rendering based on language prompts, etc.

I don't know how I feel about all the moral arguments about AI training, etc. To me, what's more concerning is how it could impact people, more so than how it was trained. Even if a perfectly "ethically" trained model learned to produce perfect art and artists became a niche field, I think it could still be a bad outcome for civilization as a whole, because I think there's value in humans producing art, and in having a society where it's (at least somewhat) a sustainable field.

Otoh, I think it's amazing the kinds of images people can produce using image models, so I'm not sure. Ideally we'd be able to support people in what they want without needing there to be a market for it, but the world's not ready for that.


I'm not a graphic artist, but I appreciate how the illustrator's art involves many creative tricks of representation to convey complex meanings.

However, the "messy" reconstructions of 3D space seen in these videos did make me think of the recent hype over LLMs.

That is, the representations have a clear link to the "truth" or "facts" of the underlying material, but are in no way accurate enough to be considered useful as source material for further use.


I've posted this comment before but I'm excited to see if LLMs can write new episodes in the same vein as the previous ones. I think it would be really amusing to see "new" episodes of old cartoons ( albeit with an ensuing copyright shitstorm).


I was surprised by how poorly it reproduces the look from the perspective of specific images. For example, see the Magic School Bus example further down. It feels like their algorithm could probably be tuned more in the direction of "trust the images".


A huge part of art is distinguishing between what "feels" right and what would be the case in reality. Even in the spaces I usually work in, 3D animation and film, things in the background or maybe out-of-focus in the foreground are often distorted and weirdly juxtaposed to make something that looks right even if it wouldn't map to a real-world configuration that makes sense. 2D art is even less tied to real-world representations than that.

What we can see in applications like this is how incredible our brains are at conceptually constructing ideas from relatively abstract representations, and how incredible artists are at operating in the less-defined realms of that space. Maybe a scene seems to have a coherent perspective to the viewer, but the couch and end table in the BG were drawn as they would look shot with a 120mm lens while the foreground is deliberately claustrophobic and drawn like it was shot with a 30mm lens? It could look fine to us because we don't need to reason about the realistic 3D space those characters exist in; we just need to understand that they're in a space like that, because we know what it's like to be in spaces and how people interact with them. Good art gives us just enough to communicate the core ideas, making them the focus of the message, and lets our brains subconsciously make the connections and add all of the context to make a complete 'experience.'

Everything is a potential layer of communication to achieve deliberate artistic effect: the type of couch and end table, the often skewed or exaggerated scale and relationships between objects, etc. It often just doesn't have a coherent real-world representation. Beyond that, in any given shot, things are certainly moved around to aid in composition, emphasize certain interactions, etc. If you notice it, it's a continuity problem. If you don't notice it, job well done. In the overwhelming majority of cases, nobody notices it, and we just happen to have a world where everything from every angle has really compelling composition.

An algorithm that needs to look at the lines and try to figure out a real-world scenario that correlates to that representation might be trying to create something that could never exist in any coherent form.


Why would you have a site with a whole load of videos on it, with all of them set to autoplay and constantly loop? I was watching a video on my second screen, and it stutters each time I try to visit the site.


Is this a Chrome thing? My Firefox on Windows doesn't autoplay the videos for me.


No autoplay in Edge for me, but I definitely have Media Autoplay set to "Block".


Maybe that’s why it locked up my iPhone (Firefox) on load. Only a power cycle fixed it haha.


If you showed the Spirited Away one to Miyazaki, he would probably call it an insult to life itself.


For those wondering, this is a reference to an older video: https://www.youtube.com/watch?v=ngZ0K3lWKRc

So, not hyperbole.


Not pointed at @helloplanets, I just need to note that he responded that way because it was an abomination. And the presenters were all Stockholm syndrome about the situation: "we can use AI to make grotesque monsters that feel no pain. All we want to do is make a machine that replaces human drawing." With this weird implied feeling that he was supposed to congratulate them.

Quoting Miyazaki, which was not especially harsh given they showed him a naked mutant zombie crawling across the ground using its head and arm as legs while constantly trying to arch its butt toward the camera.

> Every morning, not recent days, but I see my friend who has a disability.

> It's so hard for him just to do a high five (waves hand showing difficulty)

> His arm with stiff muscle reaching out to my hand (demonstrates body stiffness)

> Now thinking of him, I can't watch this stuff and find it interesting

> Whoever creates this stuff has no idea what pain is, or whatsoever. I am utterly disgusted.

> If you really want to make creepy stuff, you can go ahead and do it

> I would never wish to incorporate this technology into my work at all

> I strongly feel that this is an insult to life itself.

(room sits in silence awkwardly)


Many YouTube comments seem to have understood this clip as a dismissal of AI in general. And, regardless of whether that's accurate, I disagree with this standpoint. It's not easy to defend this particular example. But seeing how Rain World uses synthetic animation to simulate an alien, yet somehow familiar, ecosystem makes me excited for what's next.

From a Review by Matthewmatosis [1]:

> Not long after setting out, I found myself staying in a quiet place, just moving Slugcat around various obstacles as smoothly as I could. [...] What was happening on screen looked like an animal testing its limits so as to build survival skills. It was then that I knew that this system was a resounding success.

[1]: https://www.youtube.com/watch?v=x-Un2L5tF1w


Miyazaki is famously not a very kind person, especially to his son.


Perhaps there are different types of kindness, because his films are deeply, profoundly kind; and acutely aware of inner life.

Someone with an extreme sensitivity for kindness can easily be seen as a curmudgeon by others, ie, after long years of disillusionment with the human race, or after a traumatic experience, or simply because of how they look.

Some people might be good at being kind 'in the moment', while others need to reflect - and the second kind can be a 'bigger', more encompassing, more effective or beneficial kindness.

And many (all?) of the people who give the most of themselves without hope for any reward genuinely care nothing for external validation or recognition - meaning we don't often hear about them or recognize them.

One could garner a reputation as an absolute arse, while accomplishing fantastically beneficial changes in the world. And conversely, a man could get a reputation as a folksy down-to-earth guy who you'd love to have a beer with, even as he sets the planet on a course to perpetual war. Cough.


I am amazed they didn't seem to talk to any 3D animators before writing this. Because this is just plain wrong:

> The hand-drawn images are usually faithful representations of the world, but only in a qualitative sense, since it is difficult for humans to draw multiple perspectives of an object or scene 3D consistently. Nevertheless, people can easily perceive 3D scenes from inconsistent inputs!

It is difficult for human artists to maintain perfect geometrical consistency. But that is NOT why 2D animation of 3D scenes is geometrically inconsistent! The reason is that artists stylize 3D scenes to emphasize things for specific artistic reasons. This is especially true for something surreal like SpongeBob. But even King of the Hill has stylized "living room perspectives," "kitchen perspectives," etc. The artists are trying to make things look good, not realistic. And they aren't trying to make humans reconstruct a perfect 3D image - they are trying to evoke our 3D imaginations. It's a very different thing.

Pixar and other high-quality 3D animation studios intentionally distort the real geometry of their scenes for cinematic effect: a small child viewed from an adult's perspective might be rendered with a freakishly long neck and stubby little torso, because the animators are intentionally exaggerating visual foreshortening to emphasize the emotional effect of a wee little child. A realistic perspective would be simply boring. These techniques are all over the place in Pixar movies - it's why their films look so good compared to cheaper studios, who really are just moving a virtual camera around a Euclidean 3D space.

I don't want to comment on the technical details. But it really seems like the authors missed the artistic mark.


As someone who works in this space professionally, my face and my palm have never been closer. I have no problem with the project-- research is research and it's not like they're trying to pass this off as a 'solved problem'-- but among a specific subset of tech folks, AI image tools arouse this completely unwarranted "we've solved art" bravado. It inspires them to arrogantly-- sometimes even imperiously-- throw around these baseless assumptions about basic art principles. I worked in software for a long time and I know hubris in software development is nothing new, and can even sometimes be beneficial, but I'm not sure I've ever seen such an intense collective overconfidence in a single subject within the software world.


It's especially funny considering the same is done with real TV cameras. For an easy example, a lot of supposedly square rooms used in sitcoms are actually trapezoidal; the walls meet at obtuse angles. Very few people notice that.


Also, even putting stylization for specific artistic reasons aside, work in this context is always going to get warped for the simple needs of the camera (or "camera"). This goes double for anything pre-HD, where people or characters had to fit pretty tight into the shot to have the perspective close enough for facial expressions and body language. Dig into even the most "realistic" and staid shows of the era and you'll eventually find moments where they had to discreetly move furniture or even walls to make particular shots work.


It kinda looks like a cartoon version of Microsoft Photosynth? https://en.wikipedia.org/wiki/Photosynth


I don’t like to bring unrealistic expectations to this sort of thing, but even so, all the examples look pretty bad. Am I missing something?

In addition to all the noise and haze -- so the intermediate frames wouldn’t be usable alongside the originals -- the start and end points of each element hardly ever connect up. Each wall, door, etc flies vaguely towards its destination, but fades out just as the “same” element fades in at its final position a few feet away.

It’s a lovely idea, though, and it would be great to see an actually working version.


Yes, it looks pretty bad imho. It seems that the researchers learned about a handful of recent techniques, such as Gaussian Splatting, and decided to apply them to a novel domain (hand-drawn images) without any deeper understanding.

Gaussian Splatting is, in my opinion, simply the wrong tool for geometrically inconsistent images even if you manually annotate a bunch of keypoints. Another thing is that the spherical harmonics color representation makes it easy for the model to "cheat" when there are relatively few views, i.e. even when the Gaussian is completely geometrically wrong, it can still show the right color in the directions of the views. Perhaps they should have just disabled the spherical harmonics thing (i.e. making each Gaussian the same color regardless of which direction you're looking at it from), since most cartoons have flat-ish shading anyway.
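
To illustrate the "cheating": with degree-1 spherical harmonics each Gaussian has 4 color coefficients per channel, so with only three or four training views you can solve exactly for coefficients that show each camera whatever color it wants, with no geometric consistency required. A quick numerical sketch (the view directions and target colors here are made up):

    # Degree-1 SH color "cheating" demo: with few views, one Gaussian can show an
    # arbitrary color to each camera.
    import numpy as np

    def sh_basis(d):
        # Real spherical-harmonics basis up to degree 1 at unit direction d = (x, y, z).
        x, y, z = d
        return np.array([0.28209479, -0.48860251 * y, 0.48860251 * z, -0.48860251 * x])

    views  = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)  # 3 view directions
    target = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)  # red/green/blue per view

    B = np.stack([sh_basis(d) for d in views])           # (3 views, 4 basis values)
    coeffs, *_ = np.linalg.lstsq(B, target, rcond=None)  # (4, 3) coefficients per channel

    print(B @ coeffs)   # ~identity: each view sees exactly its own target color

With many well-distributed views the system becomes overdetermined and this trick stops working, which is presumably why it bites hardest in the few-view, hand-drawn setting.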

Furthermore, they didn't attempt any sort of photometric calibration or color estimation between the different views. For example the paintings each show the building in a very different lighting condition, and it seems they made no attempt to handle that at all, leading to a very ugly reconstruction.

Finally, this method requires significant amounts of human work to do the manual annotation for a very subpar result, making us wonder what the whole point of it is. It would seem to me that diffusion models like Sora or Veo could do a much better job if you just want to interpolate between different views. It isn't much different from image inpainting, which diffusion models excel at.


This web page uses over 1.6 gigabytes of RAM.


That might explain why it consistently kills Firefox Focus on my phone.


I imagine Spongebob episodes converted to this 3D format, and watching them with VR goggles, like you're there.



I love this


> The hand-drawn images are usually faithful representations of the world, but only in a qualitative sense, since it is difficult for humans to draw multiple perspectives of an object or scene 3D consistently.

I find this premise unsound. The reason is less that it is difficult and more that it is undesirable in this medium.


It is a good idea, but the results are quite bad. It barely works in their demos, tons of artifacts everywhere.


I can't think of a great application either. Maybe if you want to map camera movements when converting an animated scene from 2d to 3d. It'd probably be easier just to start from scratch though. Simple polygons with a toon shader would work for The Simpsons and Family Guy, I'm sure.


I don't think it will ever catch fire in animation studio workflows. I can't see it beating the current process of applying toon rendering to 3D geometry. Though it may help renderers add variation to the output in a way that's more authentic and less random.

I'm wondering if it's at all useful in understanding / improving AI's ability to infer semantic meaning from even real images in a variety of scenarios? Like the ability to re-interpret an interpreted construction (drawing) of a scene.

One area of application may be helping machines better understand hand drawn human input?


"the current process of applying toon rendering to 3D geometry"

Is this widespread? My sense is that most mainstream TV animation that isn't obviously CGI is still drawn in 2D, with 3D work if used at all being relegated to backgrounds and the like.


I'm sure some marketer has an email chain open with a developer asking if they can use it to help advertise bigger houses to TikTok users who film at home, or something like that. Or maybe advertise luxury products to people who are in large homes.


I'm not a historian, but I remember a tour guide in the Forum Romanum mentioning that the current state of knowledge about how buildings and parts of cities looked stems from their depictions on coins of that period. Perhaps it could be used for that?


That'd be a small enough sample size that it would make sense to just have a human agonize over it.

I feel like this type of thing best applies in the kind of domain they're already in— TV shows with hundreds of hours of content that a machine can comb through looking for reference images to synthesize into these models.


Trying to use this but stuck after exporting from the labeler (guessing that is closed source), lots of questions:

What do I do with this data exactly? I'm not really following the instructions from the README.

Do I need a hefty GPU to run this? Doesn't say anything about hardware.

What am I going to get as a result? Will it generate a 3d model or "point clouds" ?

Do I need multiple inputs (from different angles) through the labeler?

What is the depth estimator being used here? (This is what I'm most interested in, especially whether it's able to detect the ground from multiple angles.)

Guess I'm just really lost here but super eager to use this. We do have a real-world application for it.


The ability to reconstruct a coherent 3d view from a sparse set of photos seems much more useful than doing it for a set of 2d drawings of an entirely imagined space; I don't think 2d artists are cheaper than 3d artists.


Surely they're 2/3 the price right? I'm basing this on the fact that I'd happily draw 1 dimensional pictures for 1/2 the price of a 2d artist.


A little bit off topic, but related: are there any tools to which you can feed a few photos of a room from various angles and it will generate a floorplan or 3d model like this?


Yes, in fact at least one of them got funding from YC: Matterport.

There are many others: Kuula, Cupix, iStaging, EyeSpy360... Real estate companies use them a lot, e.g. to create a virtual tour for prospective buyers.


Yes but they don't just use a random set of photos. They take multiple 360 photos for the specific purpose of feeding them into their virtual tour app. That's why you can pan around.


thank you! very interesting, i'll check those out. do you know of any open source projects?


lumalabs dot ai is pretty neat. Takes videos as input but works very well.


This is very interesting but I feel like the name suggests it's an animation or graphics program more directly? That might be a branding loss


It’s fascinating that the generated Gaussian splats look kind of like a dream. Almost like that was the way we generate 3d scenes in our minds


I see they didn't even try Peppa Pig.


It's hallucinating a bit. There are new things put in that weren't there in the previous frame.


It's cool. It might be useful as a 3d camera movement visualisation tool in pre-production. As a tool for recreating old cartoons in 3D it'll produce results as desirable as those ghastly coloured versions of old bw movies.


Not sure how related they are, but it looks like it could be used to do https://www.wakatoon.com


Thank you HN for showing me enough papers on "Gaussian Splatting" that I was able to pick it out as the method visually from the examples.


Cool, but why? Structure from motion has applications in the real world, but this use case doesn't seem to be that compelling to me.


Will be awesome when we can watch old cartoon shows in VR and look all around the world.


This is so cool!


Amazingly weird


xkcd's yearly april fools had automatic 3d comic conversion done back in 2011: https://web.archive.org/web/20110813115522/http://chatter.re...

Kiiiiind of disappointed to not see the alley from King of the Hill, I tell you h'what.


Holy crap, can you imagine rewatching your favorite shows from different perspectives?


A VR/AR reproduction of old cartoons where you can explore a coherent 3D space would be cool.

It doesn't seem like the OP comes even close to this though.


And the events of the cartoons would play on in a timed manner, so if you're not at the main point at the right time you could miss it. That would be cool.

Not if it looks anything like this... Honestly I'd be surprised if AI could do it justice. In a shot showing one character talking, panning around to see the other characters that AI pasted into the scene wouldn't be enough. Those characters would also have to be animated and show appropriate attention/reactions to what was being said/going on.


i just want to see Steamed Hams from the perspective of the oven



This website is crap on mobile. No image can be enlarged...


