Hacker News
Video Vectorization (vectorly.io)
319 points by xanthine on July 20, 2020 | 117 comments

This is why Flash was such a nice format for some things (not going into the pitfalls of Flash - but in some ways it was fantastic).

I guess these days we have animated SVG, and https://lottiefiles.com/ is getting some traction - but these require you to export in a specific format, of course; you can't just convert/trace a bitmap movie with them. And SVG and Lottie aren't designed for longer/streaming vector animations, and they don't carry synchronised or streaming audio - Flash did all of those things.

Vectorisation of bitmap images does have some artifacts, as is evident in the Simpsons demo on this website - when possible, you should export in a vector movie format directly from the vector animation software.

It is kind of depressing that we don't have an open standard for vector movies (with sound), over a decade after Flash was killed. Sometimes it feels like technology stops or moves backwards.

Actually, I wonder whether we need a vector video format in the first place. Would it not be much simpler to render a series of vector images and their transformations directly with a standard video codec? I mean, all the modern video codecs have vector elements afaik, and most have motion prediction. One could just leave out the whole messy inference and directly tell the decoder "here is a rectangular shape, moving at that speed in this direction", no?

The primitives aren't the same for video codecs as they are for something like Flash. Even the transforms you can do are fairly limited. AV1, for example, gets part of its complexity from allowing some scaling transforms as part of the codec.

The problem is that you could have those concepts in a video codec, but then you'd be in some sort of weird hybrid world. A general encoder would almost certainly never use most of those transforms and primitives. Meanwhile, the decoder would need to understand both to properly render anything. I'd imagine you'd frequently run into cases where the decoder fails simply because whoever wrote it didn't want to be bothered supporting the full set of features for the stream (since encoders don't employ that full feature set).

Then there is the whole danger of having a Turing-complete video stream format. The last thing you'd want is for someone to publish a Bitcoin miner as a YouTube video :)

Flash vector animations are infinitely scalable from a very tiny file size. The vectors in modern codecs are motion vectors. They don't generally have any concept of drawing primitives like splines, polygons, fills, gradients, etc., and they are raster formats so they have a fixed native pixel size.

The idea of a vector codec isn't new though, and prior attempts have been made, though I don't remember enough details to find a reference.

Just for anyone who might not be familiar, "vector" in this context refers to the graphics you would generate in a tool like Inkscape, where you can define complex geometric shapes with just a few datapoints. "Raster" graphics are what Gimp works with, where each pixel is specified (effectively) individually.
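To make that size difference concrete, here's a toy comparison in Python; the shape and byte counts are illustrative, not from any real encoder:

```python
# A "vector" description of a circle: a few numbers, regardless of resolution.
svg_circle = '<svg viewBox="0 0 256 256"><circle cx="128" cy="128" r="100" fill="red"/></svg>'

# The same circle stored as an uncompressed "raster" image: one value per pixel.
width, height, bytes_per_pixel = 256, 256, 3
raster_bytes = width * height * bytes_per_pixel

print(len(svg_circle))   # well under a hundred bytes
print(raster_bytes)      # 196608 bytes - and it grows with resolution
```

The vector description also stays sharp at any zoom level, while the raster version has a fixed native pixel size.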

This is something I've been thinking about for a while as well, and I'll be giving a talk exactly on this subject next week:


There are already vector-graphics runtimes like WebGL and SVG that are more than capable of rendering "video", even within the html video tag, and through modern video streaming architectures like Dash and HLS (will share the link to the talk, and demo links for these things next week).

Those aren't "codecs" in the traditional sense, and I think there is an open question as to whether a "codec" is even necessary. Scrimba (https://scrimba.com/) uses "HTML" as a video codec in production, and it works perfectly, with no "codec" per se behind it.

That said - in 2020, web architectures already exist such that you could easily make a "video codec" for vector graphics, and some standardization would help - though it's not strictly necessary for adoption the way it would be for a "regular" video codec, which doesn't enjoy native vector-graphics runtimes like SVG or WebGL.

If you needed a file format, I honestly think that Lottie is the best option for a vector-video "file format", since it's open source and mostly based on the old Flash standard (https://www.adobe.com/devnet/swf.html).

What's missing from Lottie for it to be a "video" format would be easier integration into video streaming architectures like HLS or DASH, and that's honestly something I'd like to do as an open source project - essentially a way for video players to "play" lottie video files as one of the video options (like, on top of 1080p, 720p and 480p versions, you have the vector version as well)
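In principle a vector rendition could slot into an HLS master playlist right next to the raster ladder. A sketch of what that might look like - with the caveat that the "lottie" CODECS value and all paths are made up for illustration; no such codec string is registered today:

```
#EXTM3U
# Standard raster renditions
#EXT-X-STREAM-INF:BANDWIDTH=5000000,RESOLUTION=1920x1080,CODECS="avc1.640028"
1080p/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720,CODECS="avc1.64001f"
720p/index.m3u8
# Hypothetical vector rendition (codec tag is invented)
#EXT-X-STREAM-INF:BANDWIDTH=250000,CODECS="lottie"
vector/index.m3u8
```

A player that understood the vector rendition could pick it when available and fall back to the raster versions otherwise.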

From the perspective of Vectorly, we clearly understand that there's a lot of skepticism around the idea of a new "codec", and we wanted to avoid the idea that we are actually building a new codec.

Our preferred framing is this: there are already "codecs" for vector graphics that are open and as well established as H264. We're just working on a converter/transcoder from raster to vector, which admittedly will always have artifacts of some kind - though you could just as easily take source vector files and transcode and stream them to a vector "codec" (SVG or WebGL) without the kind of raster-to-vector conversion we're proposing.

Good luck with the company, I think what you're doing is very cool and I can see the use case for low bandwidth conversion for educational/whiteboard video in particular.

I think you're right about current web tech being able to support this. Lottie extended to an open source vector "video" format would be awesome, with HLS/Dash streaming and especially audio (streaming in sync) too. Hope you can find a sponsor for such an open source project

Isn’t this basically the same thing which was famously used to achieve full motion video in the 1992 Amiga demo “State of the Art”¹ and improved one year later in the followup “9 Fingers”²?

1. https://www.pouet.net/prod.php?which=99 https://www.youtube.com/watch?v=J2r7-ygXOzo

2. https://www.pouet.net/prod.php?which=100 https://www.youtube.com/watch?v=tGetanBEKK8

Dunno but the still image definitely reminds me of ‘Another World’ circa ‘91


Apparently, there's not much info out there about how 9 Fingers was made. I found this [1]. A genetic algorithm takes quite a while to come up with a pic; I wonder how long it took them in 1992! [2]

Kind of related, this series of articles on the vector encoder for Another World [3].

1: http://www.mos6502.com/friday-commodore/9-fingers-the-infamo...

2: https://chriscummins.cc/s/genetics/

3: https://fabiensanglard.net/another_world_polygons/index.html

I can give you a glimpse into the past. The Amiga had a very slow CPU (7 MHz), and each instruction took between 4 and 40-ish cycles. However, it had three special-purpose co-processors. For these demos, the Blitter (very fast bit copy and memory fill functionality) and the Copper (change palette and even resolution at a specific vertical scan location) were essential.

The technique is the same one we used to make 3D games: you would draw an outline (taking care to write exactly a single pixel on boundaries; your line-drawing routines had to be written to take that into account) and let the Blitter fill it. The Amiga had 6 overlay planes you could use for up to 64 colors (well, it was actually up to 4096 colors with caveats, but let's ignore that for a moment). So if you drew filled outlines (time-shifted) into those 6 planes, you'd get this nice blurred effect. My guess is they manually wrote pixel coordinates for the images, connected them with lines, filled with the Blitter, and repeated for each frame. That's how things were done back then. Without the Internet we had a lot of free time (I was a kid in high school).

Those special-purpose chips were magical. At the time of the IBM XT and its green-on-black terminals, the Amiga was a space shuttle in comparison. For example, my game had the upper part of the screen in lower resolution (where we painted vector graphics) and the lower part in higher resolution (where we had the game stats).
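The scanline fill the Blitter did in hardware can be sketched in a few lines. This is a simplified software model (the real chip had inclusive/exclusive fill modes and worked on bitplanes, which I'm glossing over), but it shows why outlines had to be exactly one pixel thick:

```python
def outline_fill(rows):
    """Fill between boundary pixels on each scanline, Blitter-style:
    a running flag toggles at every set pixel, so a double-thick
    boundary would toggle twice and break the fill."""
    filled = []
    for row in rows:
        inside = False
        out = []
        for px in row:
            if px:
                inside = not inside
            out.append(1 if (px or inside) else 0)
        filled.append(out)
    return filled

outline = [
    [0, 1, 0, 0, 0, 1, 0],   # two boundary pixels per scanline
    [1, 0, 0, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 1, 0],
]
print(outline_fill(outline))
```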

Here's a quote from Lone Starr:

> "State of the Art was traced by hand with a Genlock overlay and tool I developed. In 9 fingers the process was automatic, my program controlled the videoplayer, digitized one picture, traced it, and skipped to the next frame. For me the equipment at that time was expensive, about 150 Euros for the videoplayer (used the prize money from State of the Art), since it had to show de-interlaced pictures."

I swear I once saw a special encoder format built specifically for compressing south park episodes, but I've never been able to find it again.

Yes!!! I remember this too!!!

But it was so long ago I don’t have the link, I only responded to you to help verify that it was a real thing and you’re not dreaming

Or maybe we all are dreaming 🧐

Ok that got meta quickly

I remember RMVB being very popular for animated shows (I seem to remember Family Guy most of all). It had very good compression ratios.

I remember watching ~50 MB RMVB anime fansub releases circa 2004.

Yes, this was how I was able to watch 200 Naruto episodes before there was streaming video. Good times torrenting all these RealPlayer files on my 50 KB/s DSL connection.

My network provider capped my bandwidth at 1 Mbps because of this, for like five years in the early-to-mid 2000s. They said I used more bandwidth than an entire apartment building (not a small one). Very early days of mainstream pirating. It was a makeshift internet provider that used (stole, I guess) internet access from a nearby university.

Not a special encoder, but i've seen (and used) waifu-2x to upscale low-bitrate/low-resolution video.

I also swear that I've already seen a vector/cartoon-optimized codec, but my googling abilities are lacking.

never let prior art get in the way of a good software patent, i guess!

It just boggles my mind that the front page of a site offering a "patented vector-transcoder [that] converts video to a vector format, reducing bitrates" doesn't include such a video.

Well, it's not at the top of the page, but they did include a short proof-of-concept video of the Simpsons at the bottom of the page.


The video always plays at a (large) fixed size... which is unfortunate for a vector data source, scaling should be relatively easy for such a file format.

(Edit: I just noticed the full-screen button on the video. I really think they should be emphasizing that, since vector video should be able to stay smooth at extreme resolutions.)

Resizeable link! Previous one was a bug


We're still not comfortable putting demos on our front page, for all the bugs everyone's mentioned, and other issues.

Not even sure who posted this on HN, but grateful for the attention / feedback!

It would be useful to detect browsers that don't work at all (e.g. Safari) and at least warn that it's coming. Right now it doesn't inspire much confidence...

When I press full-screen I just get the same video in top-right 25% corner and the rest filled with black.

This is from The Simpsons "Treehouse of Horror XXIII", Season 24 episode 2. This segment starts at 14:45 on the Disney+ version.

There's a relatively low-quality cut on YouTube: https://www.youtube.com/watch?v=Z3ehuHRnC4Q

That is broken on my computer :(

Seems to work best in Firefox, then Chrome has a few more glitches, and it's totally broken in Safari.

I suspect that they only focused on particular versions of Chrome during development because it's a proof of concept.

Then they shouldn't be making the claim that "this codec wouldn't require end-users, OEMs or browsers to install special software to enable playback of these videos." If I have to install a browser I don't normally use, then it requires installing special software.

Yeah, but it is Windows 10 on the latest chrome... pictures other people have linked look cool though.

Seems to work fine on Win 10 and latest chrome for me. I wonder if it might be related to turning on/off the GPU rendering in chrome maybe?

works on Firefox here for me

There's also a "Khan Academy"-style demo:


Tons of artifacts, low framerate.

It boggles your mind that “we are still in the early stages of developing this technology” does not include said technology on the front page? Some research projects are just not suitable to be shown on the front page at every point of their development.

There is a demo page, it's just that the link to it is buried right at the bottom. You'd think that it'd be better to just embed it on the page itself.

It boggles my mind that they slough off the showstoppers of previous attempts with throwaway lines. Nobody has any reason to believe they know what they’re doing.

This reminds me of a paper from 2005 by Daniel Sýkora et al. [1] which tries something very similar, with the specific use case of animation video. The authors describe it best in the abstract I think:

> Video Codec for Classical Cartoon Animations with Hardware Accelerated Playback

> We introduce a novel approach to video compression which is suitable for traditional outline-based cartoon animations. In this case the dynamic foreground consists of several homogeneous regions and the background is static textural image. For this drawing style we show how to recover hybrid representation where the background is stored as a single bitmap and the foreground as a sequence of vector images.

The idea of using prior knowledge about the nature of the content to decide on an encoding scheme makes intuitive sense to me, though I'm not a codec person so I don't know how feasible it would be to make these ideas into the hardware-accelerated codecs we know from other methods.

Of course, these methods would make the most sense when used directly by the animation studios during export, not as an afterthought. But I'll take what we can get.

By the way, the corpus of Sýkora's works [2] is really really impressive in my opinion. He gave a talk in my institute while I was researching methods around neural style, and his take on parametric models, paired with the quality (and speed!) of his results, really left a mark, if not to say they made me seriously question wtf I was doing there. His work is strictly tailored to a professional animation / video production setting, so it seems extremely applicable compared to the toy-like nature of neural style methods. That is not to say he doesn't know about those. His team's recent papers actually fruitfully combine the two.

[1]: https://link.springer.com/chapter/10.1007/11595755_6

[2]: https://dcgi.fel.cvut.cz/home/sykorad/

Daniel's video style transfer tech was used in this - it's really cool. https://ebsynth.com/

Domain specific video formats could be the future. They can reduce bandwidth requirements and offer features not available in pixel videos.

One good example is https://asciinema.org , which plays back terminal sessions. The text in the "video" is selectable!
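For the curious, the asciicast v2 format behind asciinema is just newline-delimited JSON - a header line, then one `[elapsed_seconds, "o", output]` event per line - which is why the text in the "video" can stay selectable. A minimal sketch (timings, dimensions and output below are made up):

```python
import json

# asciicast v2: first line is a JSON header, every following line is
# one JSON-encoded terminal event.
header = {"version": 2, "width": 80, "height": 24}
events = [
    [0.5, "o", "$ echo hello\r\n"],
    [1.2, "o", "hello\r\n"],
]
cast = "\n".join([json.dumps(header)] + [json.dumps(e) for e in events])
print(cast)
```

Because the payload stays as text rather than pixels, the file for a minute of typing is a few kilobytes, not hundreds of megabytes.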

Wow. Could you embed these files in a GitHub README.md, for instance, or does playback require a specific decoder? I've long wondered how developers manage to capture snazzy terminal demos of their projects, as the Linux tools I've tried for this purpose generate bafflingly huge files (~750 MB for roughly a minute of content).

Apparently, you can after rendering the asciicast file to SVG: https://github.com/marionebl/svg-term-cli

Sadly no, people include a preview image that link to asciinema

One overlooked use case is RDP- and VNC-like concepts. Right now Appetize (an app streaming service) uses raster graphics. Vector would allow high-FPS streaming of apps, making running apps in the cloud a realistic option.

Nah, raster encoders can do a pretty great job on vector-style data, whereas vector encoders can't do anything with raster-style data. In other words what happens when you play a video or view a photograph? Are you going to auto-detect that and switch modes? You'll end up just recreating a modern video codec.

The best thing to do is just use a modern video codec and make sure it works well with text and sharp edges.

Most modern application UI can't really be encoded as vectors, there's lots of raster stuff in there plus bitmaps. Many mobile apps right now are a big pile of PNGs and/or raster filters (like drop shadows, etc). So you end up actually needing to ship scene graphs along with bitmaps.

You can certainly leverage that, but it also increases the resource demands (and thus power demands) on the client.

I thought RDP was already a bit more intelligent than "only" sending raster frames, sending things like window dimensions and such to be rendered on the client?

RDP is probably more intelligent than most people are aware.

Try remoting into a machine over RDP and playing a youtube video occupying a large portion of the screen. You may be surprised to find that it plays nearly perfectly, even over non-ideal network conditions (wifi). Try this same exercise with VNC and you will experience a frame maybe once every other second.

I am not sure exactly the heuristics involved, but RDP is certainly switching between modes of operation based on what kind of visual information is on the screen.

RDP also works more reliably over mobile connections than VNC, in my experience

Is there market demand for this?

In a world where videos are still sent around as multi-megabyte gif files, and audio clips are still distributed with a random slideshow on YouTube, I think a lot of users aren't so bothered about efficiency - they just want the simplest thing that works.

Lots of music is most efficiently stored as MIDI, yet how many songs on iTunes are midi?

For video, raster is king because it works for everything.

We (scrimba.com) actually do something similar for e-learning within programming. And it definitely makes sense for both students and as a business.

- Our videos use about 1/100 the bandwidth of a regular video. This matters for rendering (fans running when watching HD?) and for people who live in places with shitty ISPs, like in Germany

- It is rendered and thereby crisp as a Pringle

- No codec needed as we render it as html

- When you pause a video you actually have the text there, not pixels. For programming that allows you to copy and change things!

- We store the context it was recorded in (dev environment) so you can change and run code in the "video" itself. This changes the pedagogy you can do in a video to be more interactive and hands on.

Netflix and others would be interested in mega bandwidth gains for animated content. They usually own both ends of the stream so they can deploy whatever they want.

If you could drastically reduce the size of videos, self-hosting video distribution pages for e.g. training material would suddenly become feasible, if not for individuals then for small companies.

Yep, we serve around 50,000 hours of video a month with a similar format and have ~0 extra bandwidth fees.

If you try to do it, you'll find that self-hosting for these scenarios is surprisingly cheap. You might be able to just use YouTube, anyway.

Well, youtube is exactly not self-hosted. For one reason or another, you might really not want to have to deal with youtube (or vimeo for that matter) and it would be great to have the possibility to opt out of using such a service.

In some parts of the world mobile data is expensive compared to income and this is likely limiting consumption of services like online video.

Random link with some pricing information for Africa [1] claims prices are up from $0.50/GB.

[1] https://kenyanwallstreet.com/mobile-data-pricing-2020-report...

It's not really the same. True, MIDI won't work for say, iTunes, because it doesn't capture a lot of the information needed to recreate the audio of your average song. But for certain types of video, you can capture everything you need to recreate it faithfully with a vector format, at much higher compression rates. I think a better analogy is SVG -- which is not as popular as raster formats in the web, but it certainly has its place.

(As an aside, MIDI is a useful format in its own right, and is still alive and well in the music-making world, even if from a technical perspective it's sort of outdated in 2020.)

They've listed some uses for it — animated videos, e-learning (the likes of Khan Academy). While I doubt that animation would shift to vector video any time soon, it indeed would help e-learning a lot, speaking from the side of the content distributors (smaller files). The only benefit for end users might be faster streaming, but yeah, I agree that the average user can't be bothered about it.

Lots of animation is done in vector tools to start with. If this format goes anywhere it would probably be pretty easy to add exporters to Toon Boom, Animate, OpenToonz, and whatever else people are currently using - I left that industry a decade and a half ago so I'm not too up on what everyone's using right now.

It's a good point. Internet speeds are also quickly increasing in areas getting online for the first time.

IMO the biggest problem is that even for enterprise/large-scale-website scenarios, the combination of the product cost, the cost of downloading the decoder on each machine, and the increased cpu/gpu load (resulting in lower battery life) can drown out any advantage from the size reduction. The fact that it only works for a subset of video content is another problem.

Switching to H265 or AV1 is way more compelling in practice because it works for everything, it's hardware accelerated, the decoders are off-the-shelf, and the quality is significantly improved. A modern GPU can both encode and decode H265.

>Lots of music is most efficiently stored as MIDI, yet how many songs on iTunes are midi?

While storing music as its component instruments is indeed size efficient, converting existing music into MIDIs using some ML/AI algorithm is not yet effective. Vectorly seems to convert _existing_ raster videos into vector ones using some combination of computer vision algorithms.

Playback of a MIDI file also doesn't return the original sound quality, it plays the same notes with generic sampled instruments. For high quality samples of some instruments, this can be OK, but for others, it sounds terrible. As for lyrics, you have to use a speech synth... https://www.youtube.com/watch?v=bJjxy4V00r0

Reminds me of the Spaceballs demos State of the Art and Nine Fingers [1]. Released in 1993 iirc.

[1] https://www.youtube.com/watch?v=PPoYzwib7JQ

I think your technology would be useful for restoring old animated videos. Plus it would be useful for the catchy intro animations startups use to demonstrate their technology.

Also would be nice to use when you have bad internet connection speed to watch e-learning material animations.

For e-learning you may need a hybrid M3u like Playlist approach with video for the presenter and vector graphics for the screen casts.

Manga videos would probably also compress well.

Children animated videos.

What if you reduced the color space of ordinary video to, for example, 8 colors and smoothed out the noise to create large flat surfaces - could you then compress it with vectors?
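The color-reduction step of that idea is easy to sketch per pixel; `posterize` and its parameters here are hypothetical, just to illustrate quantizing each channel before handing the frame to a vectorizer:

```python
def posterize(pixel, levels=2):
    """Quantize each 0-255 channel to the nearest of `levels` values,
    collapsing noisy gradients into flat regions a tracer can follow."""
    step = 255 / (levels - 1)
    return tuple(round(round(c / step) * step) for c in pixel)

# 2 levels per RGB channel = 2**3 = 8 possible colors
print(posterize((250, 130, 3)))   # -> (255, 255, 0)
```

In practice you'd also want spatial smoothing (e.g. a median filter) first, or the quantization just turns noise into speckled regions.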

The average anime or children's cartoon won't compress well with this approach unless it's authored with it in mind - if you look at the average anime episode, it's full of raster effects, layered gradients, 3D perspective, and mixed-in 3DCG. Worse still, the anime is typically sent to the broadcaster in an already-lossy format, so the vectorizer will have to handle macroblocking and noise.

Here are two random screen captures from anime airing this season:



Suffice it to say that vectorizing this content would be difficult and the vector representation wouldn't be particularly small.

We are using a similar concept in production: rendered videos for online education within programming. It is a limited use case of course, but it brings the same benefits: 1/100 the bandwidth; crisp rendered text/content instead of rasterized; easy editing; and the content is searchable/indexable.

Using html gives us some additional benefits in that we can combine rendered with rasterized content. We also get access to a lot more advanced functionality through the browser context.

But how do you convert a video to text? Or do you record straight to text?

We provide the recording tool, so it is recorded straight to rich data.

Woah this is incredible. Even with the increased CPU demands, this is a very large bandwidth savings.

>In practice however, DRM, streaming, analytics and ad placement also require javascript logic to function in web runtimes, so in real-world settings web-video playback can and does use a non-trivial amount of CPU time.

I'm a little skeptical of this claim. No idea how much CPU is used for DRM, but I can't imagine it's on the order of multiple percentage points.

> Woah this is incredible. Even with the increased CPU demands, this is a very large bandwidth savings.

GPUs happen to be very good at drawing shapes. So there's probably a lot of room for improvement, probably even enough that these kinds of videos could be faster than decoding "normal" video (for which modern CPUs and GPUs have dedicated silicon).

That said, I'm still skeptical of this, especially because it will probably fail spectacularly for animation that isn't just vector graphics, which is most animation.

GPUs are good at drawing shapes, but that doesn't actually mean that using the GPU to rasterize high-quality polygons will be fast. In practice most production solutions for this run on a combination of CPU and GPU and use complex shaders to achieve acceptable quality. If you're rendering regular alpha-blended or opaque triangles that's very fast, but they're not going to be anti-aliased and stuff like cubic splines will need to be triangulated on the CPU or GPU. Libraries like Pathfinder and Slug are the current state of the art and they are very complicated.

In comparison, video acceleration on most modern GPUs is done using dedicated silicon that typically has its own performance budget (so it doesn't eat into the GPU time to run shaders or your compositor).

Pathfinder and its ilk would breeze through this sort of thing, and they're still improving constantly too, because comparatively little work has been done in the space.

Guessing very roughly based on the figures in https://raphlinus.github.io/rust/graphics/gpu/2020/06/13/fas... (and Pathfinder is integrating some of the techniques there), I’d estimate that piet-gpu and soon Pathfinder, running on Intel integrated graphics, could comfortably draw each frame of this Simpsons sample video in well under one millisecond (for an effective maximum of 6% GPU/CPU usage), probably even under a tenth of a millisecond (0.6%).
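Those utilization figures follow from simple frame-budget arithmetic; a quick sketch (the 60 fps target is an assumption on my part):

```python
def gpu_utilization(frame_draw_ms, fps=60):
    """Percentage of each frame interval spent drawing."""
    frame_interval_ms = 1000 / fps
    return frame_draw_ms / frame_interval_ms * 100

print(gpu_utilization(1.0))   # 1 ms per frame at 60 fps is ~6% utilization
print(gpu_utilization(0.1))   # 0.1 ms per frame is ~0.6%
```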

(Admittedly you could easily create a more complicated video which would be much more draining, e.g. paris-30k frames would struggle to hit 120fps on integrated graphics. Generally speaking it’s easier to make pathological cases in vector encodings than raster. But let’s forget about that.)

At that point, the hardware separation of video decoder and other graphics stuff (which is a strong point in general, because it’s so computationally expensive) is much less important.

You speak of Pathfinder and Slug as being “very complicated”, and indeed they are; but video decoding is hardly any less complicated, and much more computationally expensive.

Raph's technique relies on compute shaders as a core part of the processing technique. That wouldn't work on an overwhelming majority of phones these days. Pathfinder has a compute-less approach which moves the binning to the CPU, not the GPU. And we're talking about bandwidth-constrained devices. I don't think you're going to out-perform H264.

Slug is not applicable to this scene - it makes several core operating assumptions (that curve data can be pre-processed into a friendlier format, and that shape bounding boxes are well-defined and pre-processed). These hold for font/text assets but not for arbitrary real-time glyphs.

Video decoding is computationally expensive, but it's done by dedicated silicon on most modern GPUs in order to deliver better battery life and better performance. So is encoding. Dedicated silicon is what these other techniques have to compete with - certainly possible, but not easy.

It’s done in dedicated silicon in order to deliver better battery life and better performance, given how extremely computationally expensive it is.

Playing back vector video is simply much less expensive, as it’s doing much less work. Assuming content like the sample videos here, I would expect vector video to be immediately competitive with raster video codecs in power efficiency and performance, once you shift the bulk of the vector rendering from CPU to GPU.

> this is a very large bandwidth savings.

I'm going to need to see some evidence of that! I'm pretty sure H.265 or AV1 could encode better-quality video at that bitrate.

Is this a net positive or a net negative as far as carbon emission go? With traditional techniques, the video can be recorded once and played "anywhere", no matter the level of detail. If you push the effort to the viewers, won't that increase overall power consumption?

Socialization of costs, privatization of profits.

>Our first vectorized proof of concept for animations is a 17 second clip of the Simpsons located here. Keep in mind, our technology is still at a very early stage, and there is much optimization work left to be done.


There isn't a raster version to compare to, but that looks noticeably worse than what I'd expect from one. There's a lot of artifacting when there's motion, and the linework looks... off.


The Khan Academy one looks much better, although there's still some minor artifacting, e.g. the area around the "O" in "O_2" changes a bit when the mouse comes close.

Most hand-drawn animation won't actually vectorize well, especially works from the cel-painting era, where every cel is a real painting:


Look at all that line detail. It creates subtle gradients in the final film, which don't readily transfer to vector graphics.

In this case, I think with some better pixel-based filtering before vectorization, they could've gotten a better result.

However, the much bigger problem is that the vast majority of animation uses paintings for backgrounds that aren't just solid lines and colors.

You can extend this easily and obviously by fitting gradients.
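A minimal sketch of what "fitting gradients" could mean: an ordinary least-squares fit of a linear ramp to one color channel of a region, which a vectorizer could emit as a gradient fill instead of a flat color. The function name and sampling scheme are made up for illustration:

```python
def fit_linear_gradient(samples):
    """Least-squares fit of c(x) = a*x + b to (x, value) samples for
    one channel - the simplest possible gradient fill."""
    n = len(samples)
    sx = sum(x for x, _ in samples)
    sy = sum(v for _, v in samples)
    sxx = sum(x * x for x, _ in samples)
    sxy = sum(x * v for x, v in samples)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# A region whose channel value ramps from 10 to 40 across x = 0..3
a, b = fit_linear_gradient([(0, 10), (1, 20), (2, 30), (3, 40)])
print(a, b)   # -> 10.0 10.0
```

A real encoder would fit 2D (or radial) gradients per region and fall back to a flat color when the fit residual is small enough.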

No patent should be granted for this technique in general - it has been known since the late '80s. Maybe for some details.

Also maybe the patent office shouldn't grant the patent because "use ANN to fit x" is also obvious. Again, only for details.

The goal of a video codec is to recreate the input image as precisely as possible, measured by error metrics against an uncompressed input. I don't even see basic error metrics listed, but the visual fidelity is so noticeably terrible they would likely be absurdly off the charts.
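For reference, the most common such metric is PSNR, which is trivial to compute; a sketch over flattened pixel lists (illustration only, not anyone's actual evaluation code):

```python
import math

def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB: higher is better,
    identical images give infinity."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak * peak / mse)

print(psnr([255, 0, 128, 64], [250, 3, 130, 60]))   # roughly 36-37 dB here
```

Codec comparisons usually report PSNR (or SSIM/VMAF) per bitrate point, which is exactly what's missing from this page.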

I did a similar thing for the same motivation a while back (compressing Khan Academy videos to a minuscule amount of data) and spent an hour or two on it: http://funny.computer/cloud/Endless/khanvx/player.html. I came away thinking it would be exceptionally difficult to make it work in the general case, but it might be useful as a small subset of a larger video codec.

This is not the first time people have tried to do vector video codecs, either. Remember the VSV project from University of Bath? https://www.youtube.com/watch?v=LKaixODJEmo

Hm, unless I miscalculated, the bit rate they use for the Simpsons demo is about equivalent to YouTube's 240p rendition (which uses 257 kbps; they use 250 kbps).

Looking at https://www.youtube.com/watch?v=a1nmZq1KEHk at 240p I think their Simpsons rendition looks better. But that's probably because of the low resolution. Makes me wonder how well H264 would perform with the same bit rate but a higher resolution.

I think it'd be best to have a 1 to 1 comparison between the two right on the vectorly site. I'm thinking that they'd have done that if they believed they were already ready to outperform, but that's a bit of speculation.

I think their demos are pretty good. But it would be nice to offer some comparison of the bandwidth savings.

That being said, they're just starting and I'm sure they could find some fat to squeeze out of the data stream.

AFAIK MPEG-4 experimented with encoding 3D objects, but it never took off. As usual for MPEG, they did not specify how to extract the 3D data from a scene, only how to encode it (actually, how to decode it), so that innovation could happen on the encoding side.

The idea is so obvious that I would be astounded if this company gets anywhere. I'd wager many research teams already attempted this and were never heard from again.

Also note that video compression is pretty impressive these days. A typical 2 hour 1080p movie compresses down to a handful of GiB. Compare that to a typical 1080p action game which is easily ten times that big, because storing all the meshes and textures takes a lot of space, it turns out.

I think the fundamentals are already in place to support direct encoding of 3 spatial dimensions (or more) using essentially the same ideas as in JPEG/MPEG:


Not sure what pros/cons this kind of approach might bring, or if it's already being used in some areas. I find it really hard to believe that the 3D modeling software ecosystem has not experimented with the idea of lossy compression. Does the loss of high-frequency information cause more adverse outcomes in 3D models than in 2D images? Seems like this could be useful in games or other creative areas where mathematical precision of the final compressed artifact is not essential.
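A toy illustration of that idea (a hypothetical sketch using SciPy's `dctn`/`idctn`; real codecs quantize coefficients rather than zeroing them): the separable DCT behind JPEG's 8x8 blocks extends directly to three dimensions:

```python
import numpy as np
from scipy.fft import dctn, idctn

def compress_volume(volume, keep=8):
    """Lossy 'compression' of a 3-D array: take the 3-D DCT and keep
    only the lowest `keep` coefficients along each axis, discarding
    high-frequency detail (JPEG-style, extended to three dimensions)."""
    coeffs = dctn(volume, norm="ortho")
    truncated = np.zeros_like(coeffs)
    truncated[:keep, :keep, :keep] = coeffs[:keep, :keep, :keep]
    return truncated

def decompress_volume(truncated):
    """Invert the DCT to reconstruct an approximation of the volume."""
    return idctn(truncated, norm="ortho")
```

Smooth volumes survive aggressive truncation well; sharp edges ring, which hints at why high-frequency loss might hurt 3D meshes more than photographs.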

Why use computer vision when likely most of these animations come from some software that can/could output a vectorized video format?

You wouldn't have any conversion artifacts that way.

They do not; the vast majority are rendered to raster (PNG) frames, where even the creator of the video didn't have access to the vectors. It is also completely unrealistic to convince everyone to use a new video format, not to mention the gigantic number of videos from the past (e.g. cartoon TV shows from the 90s) that no longer have any source material available, where all that remains are digitized copies.

It sounds cool but given that the demo looks like this [0] on my bog-standard Windows 10 PC with Chrome (and Firefox and Edge) I'm assuming they've still got some bugs to work out... If it's working for anyone here I'd love to see a screen capture of the proper rendering.

[0]: https://i.imgur.com/YO42u2C.png

I just got a black box on my Windows system.

Looks pretty good in Firefox on the Mac though: https://imgur.com/a/rqPgQFZ

Edit: Here it is in video form https://imgur.com/a/B0UOigX

Thank you! That video really is impressive.

Aha. Reminds me of what I saw for two weeks when I wrote an SVG renderer in MATLAB. I only got it working with a subset of the standard and relied on Inkscape's export conventions for parsing. You can do a lot with just Béziers, circles, and straight lines.
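For the curious, evaluating a cubic Bézier (the workhorse of SVG paths) really is just a few lines; this sketch flattens one into line segments the way a simple renderer might:

```python
def cubic_bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]
    using the Bernstein polynomial form."""
    u = 1.0 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)

def flatten(p0, p1, p2, p3, n=16):
    """Approximate the curve as n straight segments, which is all a
    rasterizer needs to fill or stroke it."""
    return [cubic_bezier(p0, p1, p2, p3, i / n) for i in range(n + 1)]
```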

Interesting, maybe it's missing a coordinate; could it be missing the repeated first coordinate? Surprising to see that surfaces of the same color are made up of so many polygons; I guess that could be made even more efficient?


Works well on Firefox 78.0.2 on Ubuntu 20.04

As a kid, I used to trace frames from the Simpsons in Macromedia Flash (later Adobe, later discontinued) as a way of creating high-res images, so reading this article really hit home for me!

While the output of this algorithm (just like my traces) isn't as faithful to the source material as, say, H.264 is, the result looks great and has an amazing style.

This might be a great target for mobile-first webtoons.

Even their job postings on LinkedIn look like somebody anger-wrote them. For an Image Processing role, they've written "no ML/DL engineers", which is telling, as if computer vision isn't in any way linked to whatever it is they are trying to do with compression.

I actually love the slightly 'off' aesthetic on the simpsons video, and I think there's some interesting creative space where this algorithm is deliberately de-tuned for interesting results.

It's awesome, but it also brings back memories of Flash animation.


The funny thing is that Flash animations were the more straightforward way: you create vector animations and export them as such.

Obviously, this is more universal. You can take any input as opposed to something that was vectorized to begin with.

Sounds like https://github.com/fogleman/primitive but for video

Patenting algorithms is supposed to be impossible.

Companies that ignore this and patent the “system and method” for implementing algorithms are being jerks.

They're using this to get VC clout and footing in job posts; since they don't have any revenue, the only way they can convince people is to say "we have a patent". Almost as if a Shark Tank episode wrote their pitches.

Maybe this finally brings back the crispy scaling quality of Flash videos. They seem to be released in lossy raster formats these days.

If the vectors also morph relatively smoothly in time, then monitor-side interpolation (including motion smoothing) wouldn't be needed, and directors would have more control over how interpolation is done. They've complained about monitors trying to do too much. It seems pixels are becoming obsolete for video and movies.
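As a toy illustration of that point (hypothetical, not any real codec's API): once shapes are keyframed control points, a decoder can synthesize a frame at any timestamp by interpolating, so display-rate interpolation comes for free under the director's chosen easing:

```python
def lerp_shape(shape_a, shape_b, t):
    """Linearly interpolate between two keyframe shapes, each given as
    a list of (x, y) control points with matching point counts.
    t = 0 yields shape_a, t = 1 yields shape_b."""
    if len(shape_a) != len(shape_b):
        raise ValueError("keyframes must have matching point counts")
    return [
        (ax + (bx - ax) * t, ay + (by - ay) * t)
        for (ax, ay), (bx, by) in zip(shape_a, shape_b)
    ]
```

A 24 fps animation could then be rendered at 120 Hz by evaluating `lerp_shape` at intermediate t values, with no motion-estimation guesswork on the monitor's side.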

In terms of stream format, it seems that BIFS / MPEG-4 Part 11 originally aimed at the same purpose (probably in a more efficient manner than textual SVG), didn't it? https://en.wikipedia.org/wiki/MPEG-4_Part_11

While this is pretty cool, this naive approach will fail spectacularly for animation that isn't just vector graphics, which is most animation.

This might have a future as part of a regular video codec, being used when there's mostly vector graphics on screen (or just for those areas that are vector graphics).

I imagine a posterization preprocessing step would make this simpler, and we could have very low-bandwidth "video". If this could be done in real time, it would dramatically lower the bandwidth required for two-way video chat.
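A posterization pass is easy to sketch (a hypothetical preprocessing step, not anything Vectorly documents): quantizing each channel to a few levels collapses gradients into flat regions that trace into far fewer shapes:

```python
import numpy as np

def posterize(image, levels=4):
    """Quantize each channel of a uint8 image to `levels` evenly spaced
    values by snapping every pixel to the midpoint of its bucket,
    flattening gradients into solid regions."""
    step = 256 // levels
    return ((image // step) * step + step // 2).astype(np.uint8)
```

For video chat you'd presumably also want temporal stabilization, since per-frame quantization boundaries flicker as lighting shifts.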

Their Android SDK (no release version yet) is available in their GitHub repo, and so are their bulk upload tools (for talking to their servers using, I guess, a pay-to-use API).

If I understand this correctly, will it also allow sending updates to the video on the fly? Example: change the video from time X to time Y to new vectors [V1 ..]?

Not ready yet, just hyping SVG videos for now

That's really interesting, excited to follow the project and see more!

Curious how ML would train on vectorized video instead of rasterized.

Can this technique be used to generate a CAD sketch from a photograph?

Technically, probably, although there are existing and likely better vector conversions for non-animated purposes.

The problem with CAD is that while a conversion to a rough sketch would be possible, 'CAD' implies very high accuracy or exactness, and the work required to make a converted sketch accurate would likely exceed just doing it manually in the first place.

I suppose in the future some ML-aided setup that takes a drawing in AND focuses on dimensional accuracy might fare better than this video-centric solution.

This is really cool. So future videos could be built using only code, without shooting again, because everything can be made with code. Cool!
