
Video Vectorization - xanthine
https://vectorly.io/docs/technology/
======
dharma1
This is why Flash was such a nice format for some things (not going into the
pitfalls of Flash - but in some ways it was fantastic).

I guess these days we have animated SVG, and
[https://lottiefiles.com/](https://lottiefiles.com/) is getting some traction
- but these require you to export in a specific format of course, you can't
just convert/trace a bitmap movie with these. And SVG or Lottie aren't
designed for longer/streaming vector animations, and they don't carry
synchronised or streaming audio - Flash did all of those things.

Vectorisation of bitmap images does have some artifacts, as is evident in the
Simpsons demo on this website - when possible, you should export in a vector
movie format directly from the vector animation software.

It is kind of depressing that we don't have an open standard for vector movies
(with sound), over a decade after Flash was killed. Sometimes it feels like
technology stops or moves backwards.

~~~
choeger
Actually I wonder why we should _have_ a vector video format in the first
place. Would it not be much simpler to render a series of vector images and
their transformations directly to a standard video codec? I mean, all the
modern video codecs have vector elements afaik, and most have motion
prediction. One could just leave out the whole messy inference and directly
tell the decoder "here is a rectangular shape, moving with that speed in this
direction", no?
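As a toy sketch of that idea (every name here is mine, not any real codec's
API), an explicitly transmitted "shape plus motion vector" stream could look
like this in Python:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class MovingRect:
    """One hypothetical stream instruction: a rectangle plus an explicit velocity."""
    x: float
    y: float
    w: float
    h: float
    vx: float  # motion "prediction" stated outright, not inferred by the encoder
    vy: float


def decode_frame(shapes, frame, width, height):
    """'Decode' frame t by advancing each shape along its stated motion vector."""
    img = np.zeros((height, width), dtype=np.uint8)
    for s in shapes:
        x = int(s.x + s.vx * frame)
        y = int(s.y + s.vy * frame)
        img[y:y + int(s.h), x:x + int(s.w)] = 255
    return img
```

A real codec would of course still need residual data for everything the
shapes don't cover, which is where the hybrid mess starts.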

~~~
cogman10
The primitives aren't the same for video codecs as they are for something like
Flash. Even the transforms you can do are fairly limited. Part of AV1's
complexity, for example, comes from allowing some scaling transforms as part
of the codec.

The problem is that you could have those concepts in a video codec, but then
you'd be in some sort of weird hybrid world. A general encoder would almost
certainly never use most of those transforms and primitives. Meanwhile, the
decoder would need to understand both to properly render anything. I'd imagine
you'd frequently run into cases where the decoder fails simply because whoever
wrote it didn't want to be bothered supporting the full set of features for
the stream (since encoders don't employ that full feature set).

Then there is the whole danger of having a Turing-complete video stream
format. The last thing you'd want is for someone to publish a bitcoin miner in
a YouTube video :)

------
teddyh
Isn’t this basically the same thing which was famously used to achieve full
motion video in the 1992 Amiga demo “ _State of the Art_ ”¹ and improved one
year later in the followup “ _9 Fingers_ ”²?

1\.
[https://www.pouet.net/prod.php?which=99](https://www.pouet.net/prod.php?which=99)
[https://www.youtube.com/watch?v=J2r7-ygXOzo](https://www.youtube.com/watch?v=J2r7-ygXOzo)

2\.
[https://www.pouet.net/prod.php?which=100](https://www.pouet.net/prod.php?which=100)
[https://www.youtube.com/watch?v=tGetanBEKK8](https://www.youtube.com/watch?v=tGetanBEKK8)

~~~
0xfaded
I swear I once saw a special encoder format built specifically for compressing
south park episodes, but I've never been able to find it again.

~~~
ThePadawan
I remember RMVB being very popular for animated shows (I seem to remember
Family Guy most of all). It had very good compression ratios.

~~~
Hamuko
I remember watching ~50 MB RMVB anime fansub releases circa 2004.

~~~
shifto
Yes, this was how I was able to watch 200 Naruto episodes before there was
streaming video. Good times torrenting all these RealPlayer files on my 50KB/s
DSL connection.

~~~
dmos62
My network provider capped my bandwidth to 1mbps because of this for like five
years in the early to mid 2000s. They said I used more bandwidth than an
entire apartment building (not a small one). Very early days of mainstream
pirating. It was a makeshift internet provider that used (stole, I guess)
internet access from a nearby university.

------
emmanueloga_
It just boggles my mind that a site offering a "patented vector-transcoder
[that] converts video to a vector format, reducing bitrates" doesn't include
such a video on its front page.

~~~
jlmorton
Well, it's not at the top of the page, but they did include a short
proof-of-concept video of the Simpsons at the bottom of the page.

[https://files.vectorly.io/demo/v0-2-simpsons-250kbps/index.h...](https://files.vectorly.io/demo/v0-2-simpsons-250kbps/index.html)

~~~
joosters
The video always plays at a (large) fixed size... which is unfortunate for a
vector data source; scaling should be relatively easy for such a file format.

(Edit: I just noticed the full-screen button on the video; I really think they
should be emphasizing that, since vector video should be able to stay smooth
at extreme resolutions)

~~~
sb2702
Resizeable link! Previous one was a bug

[https://files.vectorly.io/demo/resizeable/index.html](https://files.vectorly.io/demo/resizeable/index.html)

We're still not comfortable putting demos on our front page, for all the bugs
everyone's mentioned, and other issues.

Not even sure who posted this on HN, but grateful for the attention /
feedback!

~~~
azinman2
It would be useful to detect browsers that don't work at all (e.g. Safari) and
at least warn that it's coming. Right now it doesn't inspire much
confidence...

------
black_puppydog
This reminds me of a paper from 2005 by Daniel Sýkora et al. [1] which tries
something very similar, with the specific use case of animation video. The
authors describe it best in the abstract I think:

> Video Codec for Classical Cartoon Animations with Hardware Accelerated
> Playback

> We introduce a novel approach to video compression which is suitable for
> traditional outline-based cartoon animations. In this case the dynamic
> foreground consists of several homogeneous regions and the background is
> static textural image. For this drawing style we show how to recover hybrid
> representation where the background is stored as a single bitmap and the
> foreground as a sequence of vector images.

The idea of using prior knowledge about the nature of the content to decide on
an encoding scheme makes intuitive sense to me, though I'm not a codec person
so I don't know how feasible it would be to make these ideas into the
hardware-accelerated codecs we know from other methods.

Of course, these methods would make the _most_ sense when used directly by the
animation studios during export, not as an afterthought. But I'll take what we
can get.

By the way, the corpus of Sýkora's works [2] is really really impressive in my
opinion. He gave a talk in my institute while I was researching methods around
neural style, and his take on parametric models, paired with the quality (and
speed!) of his results, really left a mark, if not to say they made me
seriously question wtf I was doing there. His work is strictly tailored to a
professional animation / video production setting, so it seems _extremely_
applicable compared to the toy-like nature of neural style methods. That is
not to say he doesn't know about those. His team's recent papers actually
fruitfully combine the two.

[1]:
[https://link.springer.com/chapter/10.1007/11595755_6](https://link.springer.com/chapter/10.1007/11595755_6)

[2]:
[https://dcgi.fel.cvut.cz/home/sykorad/](https://dcgi.fel.cvut.cz/home/sykorad/)

~~~
dharma1
Daniel's video style transfer tech was used in this - it's really cool.
[https://ebsynth.com/](https://ebsynth.com/)

------
gardaani
Domain specific video formats could be the future. They can reduce bandwidth
requirements and offer features not available in pixel videos.

One good example is [https://asciinema.org](https://asciinema.org) , which
plays back terminal sessions. The text in the "video" is selectable!

~~~
neoncontrails
Wow. Could you embed these files in a GitHub README.md, for instance, or does
the playback require a specific decoder? I've long wondered how developers
manage to capture snazzy terminal demos of their projects, as the Linux tools
I've tried using for this purpose generate bafflingly huge files (~750 MB for
roughly a minute of content).

~~~
sksavant
Apparently, you can after rendering the asciicast file to SVG:
[https://github.com/marionebl/svg-term-cli](https://github.com/marionebl/svg-term-cli)

------
bkm
One overlooked use case is RDP and VNC-like concepts. Right now Appetize (an
app streaming service) uses raster graphics. Vector would allow high-FPS
streaming of apps, making running apps in the cloud a realistic option.

~~~
GordonS
I thought RDP was already a bit more intelligent than "only" sending raster
frames, sending things like window dimensions and such to be rendered on the
client?

~~~
bob1029
RDP is probably more intelligent than most people are aware.

Try remoting into a machine over RDP and playing a youtube video occupying a
large portion of the screen. You may be surprised to find that it plays nearly
perfectly, even over non-ideal network conditions (wifi). Try this same
exercise with VNC and you will experience a frame maybe once every other
second.

I am not sure exactly what heuristics are involved, but RDP is certainly
switching between modes of operation based on what kind of visual information
is on the screen.

~~~
ttsda
RDP also works more reliably over mobile connections than VNC, in my
experience

------
londons_explore
Is there market demand for this?

In a world where videos are still sent around as multi-megabyte gif files, and
audio clips are still distributed with a random slideshow on YouTube, I think
a lot of users aren't so bothered about efficiency - they just want the
simplest thing that works.

Lots of music is most efficiently stored as MIDI, yet how many songs on iTunes
are MIDI?

For video, raster is king because it works for everything.

~~~
black_puppydog
If you could drastically reduce the size of videos, self-hosting video
distribution pages for e.g. training material would suddenly become feasible,
if not for individuals then for small companies.

~~~
kevingadd
If you try to do it, you'll find that self-hosting for these scenarios is
surprisingly cheap. You might be able to just use YouTube, anyway.

~~~
black_puppydog
Well, youtube is exactly _not_ self-hosted. For one reason or another, you
might really not want to have to deal with youtube (or vimeo for that matter)
and it would be great to have the possibility to opt out of using such a
service.

------
royjacobs
Reminds me of the Spaceballs demos State of the Art and Nine Fingers [1].
Released in 1993 iirc.

[1]
[https://www.youtube.com/watch?v=PPoYzwib7JQ](https://www.youtube.com/watch?v=PPoYzwib7JQ)

------
acd
I think your technology would be useful for restoring old animated videos.
Plus, it would be useful for the catchy intro animations startups use to
demonstrate their technology.

Also would be nice to use when you have bad internet connection speed to watch
e-learning material animations.

For e-learning you may need a hybrid M3U-like playlist approach, with video
for the presenter and vector graphics for the screencasts.

Manga videos would probably also compress well.

Children's animated videos, too.

What if you reduced the color space of ordinary video to, for example, 8
colors and smoothed out the noise to create large flat surfaces - could you
then compress it with vectors?
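That preprocessing idea can be sketched in a few lines of NumPy (a rough
illustration, not Vectorly's actual pipeline; function names are mine): blur
away the noise first, then snap the remaining values to a small palette so
large flat regions emerge for the vectorizer.

```python
import numpy as np


def box_smooth(frame: np.ndarray) -> np.ndarray:
    """Cheap 3x3 box blur (edge-padded) to knock down noise first."""
    padded = np.pad(frame.astype(np.float32), 1, mode="edge")
    h, w = frame.shape
    acc = sum(padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3))
    return (acc / 9).astype(np.uint8)


def posterize(frame: np.ndarray, levels: int = 8) -> np.ndarray:
    """Snap every pixel to `levels` evenly spaced values, producing flat regions."""
    step = 256 // levels
    return (frame // step) * step
```

A real pipeline would likely use an edge-preserving filter (median or
bilateral) instead of a box blur, since blurring linework hurts the trace.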

~~~
kevingadd
The average anime or children's cartoon won't compress well with this approach
unless it's authored with it in mind - if you look at the average anime
episode it's full of raster effects, layered gradients, 3d perspective, and
mixed-in 3DCG. Worse still, the anime is typically sent to the broadcaster in
an already-lossy format, so the vectorizer will have to handle macroblocking
and noise.

Here are two random screen captures from anime airing this season:

[https://pbs.twimg.com/media/EdP8gc4WAAAbkWy?format=jpg&name=...](https://pbs.twimg.com/media/EdP8gc4WAAAbkWy?format=jpg&name=large)

[https://pbs.twimg.com/media/EdPwGmyXoAIjlGw?format=jpg&name=...](https://pbs.twimg.com/media/EdPwGmyXoAIjlGw?format=jpg&name=large)

Suffice it to say that vectorizing this content would be difficult and the
vector representation wouldn't be particularly small.

------
laserpistus
We are using a similar concept in production: rendered videos for online
programming education. It is a limited use case of course, but it brings the
same benefits: 1/100 the bandwidth; crisp rendered text/content instead of
rasterized; easy editing; and the content is searchable / indexable.

Using html gives us some additional benefits in that we can combine rendered
with rasterized content. We also get access to a lot more advanced
functionality through the browser context.

~~~
bufferoverflow
But how do you convert a video to text? Or do you record straight to text?

~~~
laserpistus
We provide the recording tool, so it is recorded straight to rich data.

------
vhiremath4
Woah this is incredible. Even with the increased CPU demands, this is a very
large bandwidth savings.

>In practice however, DRM, streaming, analytics and ad placement also require
javascript logic to function in web runtimes, so in real world settings web-
video playback can and does use a non-trivial amount of CPU time.

I'm a little skeptical of this claim. No idea how much CPU is used for DRM,
but I can't imagine it's on the order of multiple percentage points.

~~~
chmod775
> Woah this is incredible. Even with the increased CPU demands, this is a very
> large bandwidth savings.

GPUs happen to be very good at drawing shapes. So there's probably a lot of
room for improvement, probably even enough that these kinds of videos could be
faster than decoding "normal" video (for which modern CPUs and GPUs have
dedicated silicon).

That said, I'm still quite skeptical of this, especially because it will
probably fail spectacularly for animation that isn't just vector graphics,
which is most animation.

~~~
kevingadd
GPUs are good at drawing shapes, but that doesn't actually mean that using the
GPU to rasterize high-quality polygons will be fast. In practice most
production solutions for this run on a combination of CPU and GPU and use
complex shaders to achieve acceptable quality. If you're rendering regular
alpha-blended or opaque triangles that's very fast, but they're not going to
be anti-aliased and stuff like cubic splines will need to be triangulated on
the CPU or GPU. Libraries like Pathfinder and Slug are the current state of
the art and they are very complicated.

In comparison, video acceleration on most modern GPUs is done using dedicated
silicon that typically has its own performance budget (so it doesn't eat into
the GPU time to run shaders or your compositor).

~~~
chrismorgan
Pathfinder and its ilk would _breeze_ through this sort of thing, and are
still improving constantly too, because comparatively little work has been
done in the space.

Guessing very roughly based on the figures in
[https://raphlinus.github.io/rust/graphics/gpu/2020/06/13/fas...](https://raphlinus.github.io/rust/graphics/gpu/2020/06/13/fast-2d-rendering.html)
(and Pathfinder is integrating some of the techniques there), I’d estimate
that piet-gpu and soon Pathfinder, running on Intel integrated graphics, could
comfortably draw each frame of this Simpsons sample video in well under one
millisecond (for an effective maximum of 6% GPU/CPU usage), probably even
under a tenth of a millisecond (0.6%).

(Admittedly you could easily create a more complicated video which would be
much more draining, e.g. paris-30k frames would struggle to hit 120fps on
integrated graphics. Generally speaking it’s easier to make pathological cases
in vector encodings than raster. But let’s forget about that.)

At that point, the hardware separation of video decoder and other graphics
stuff (which _is_ a strong point in general, because it’s so computationally
expensive) is much less important.

You speak of Pathfinder and Slug as being “very complicated”, and indeed they
are; but video decoding is hardly any less complicated, and _much_ more
computationally expensive.

~~~
kevingadd
Video decoding is computationally expensive, but it's done by dedicated
silicon on most modern GPUs in order to deliver better battery life and better
performance. So is encoding. Dedicated silicon is what these other techniques
have to compete with - certainly possible, but not easy.

~~~
chrismorgan
It’s done in dedicated silicon in order to deliver better battery life and
better performance, _given how extremely computationally expensive it is_.

Playing back vector video is simply much less expensive, as it’s doing much
less work. Assuming content like the sample videos here, I would expect vector
video to be _immediately_ competitive with raster video codecs in power
efficiency and performance, once you shift the bulk of the vector rendering
from CPU to GPU.

------
gruez
>Our first vectorized proof of concept for animations is a 17 second clip of
the Simpsons located here. Keep in mind, our technology is still at a very
early stage, and there is much optimization work left to be done.

>[https://files.vectorly.io/demo/v0-2-simpsons-250kbps/index.h...](https://files.vectorly.io/demo/v0-2-simpsons-250kbps/index.html)

There isn't a raster version to compare to, but it looks noticeably worse than
what I'd expect from a raster version. There's a lot of artifacting when
there's motion, and the linework looks... off.

>[https://files.vectorly.io/demo/khan-20kbps/index.html](https://files.vectorly.io/demo/khan-20kbps/index.html)

The Khan Academy one looks much better, although there's still some minor
artifacting - e.g. when the mouse comes close to the "O" in "O_2", it changes
a bit.

~~~
pixelhorse
Most hand-drawn animation won't actually vectorize well, especially shows from
the hand-drawn era, where every cel is a real painting:
[https://caseantiques.com/item/lot-738-the-simpsons-animation-cel-homer-bart-2/](https://caseantiques.com/item/lot-738-the-simpsons-animation-cel-homer-bart-2/)

Look at all that line detail. It creates subtle gradients in the final film,
which don't readily transfer to vector graphics.

In this case, I think with some better pixel-based filtering before
vectorization, they could've gotten a better result.

However, the much bigger problem is that the vast majority of animation uses
paintings for backgrounds that aren't just solid lines and colors.

~~~
AstralStorm
You can extend this easily and obviously by fitting gradients.

No patent should be granted for this technique in general; it has been known
since the late 80s. Maybe for some details.

Also, maybe the patent office shouldn't grant the patent because "use an ANN
to fit x" is also obvious. Again, only the details might qualify.
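For what it's worth, fitting a linear gradient to a flat-ish region really is
a one-liner least-squares problem - a NumPy sketch (function name is mine):

```python
import numpy as np


def fit_linear_gradient(region: np.ndarray) -> np.ndarray:
    """Least-squares fit of a planar gradient a*x + b*y + c to a grayscale region.

    Returns the coefficients (a, b, c); the fitted plane can then be stored
    instead of the pixels, the way a vectorizer would store a gradient fill.
    """
    h, w = region.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Design matrix: one row [x, y, 1] per pixel.
    A = np.column_stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    coeffs, *_ = np.linalg.lstsq(A, region.ravel(), rcond=None)
    return coeffs
```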

------
fefe23
AFAIK MPEG-4 experimented with encoding 3D objects, but it never took off. As
usual for MPEG, they did not specify how to get the 3D data from a scene, only
how to encode it (actually, how to decode it), so that innovation could happen
on the encoding side.

The idea is so obvious that I would be astounded if this company gets
anywhere. I'd wager many research teams already attempted this and were never
heard from again.

Also note that video compression is pretty impressive these days. A typical 2
hour 1080p movie compresses down to a handful of GiB. Compare that to a
typical 1080p action game which is easily ten times that big, because storing
all the meshes and textures takes a lot of space, it turns out.

~~~
bob1029
I think the fundamentals are already in place to support direct encoding of 3
spatial dimensions (or more) using essentially the same ideas as in JPEG/MPEG:

[https://en.wikipedia.org/wiki/Discrete_cosine_transform#3-D_...](https://en.wikipedia.org/wiki/Discrete_cosine_transform#3-D_DCT-II_VR_DIF)

Not sure what pros/cons this kind of approach might bring, or if it's already
being used in some areas. I find it really hard to believe that the 3D
modeling software ecosystem has not experimented with the idea of lossy
compression. Does the loss of high-frequency information cause more adverse
outcomes in 3D models than in 2D images? Seems like this could be useful in
games or other creative areas where mathematical precision of the final
compressed artifact is not essential.
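A minimal sketch of the 3-D DCT idea using SciPy's n-dimensional transform
(illustrative only; a real codec would work blockwise and add quantization and
entropy coding, just like JPEG/MPEG do in 2-D):

```python
import numpy as np
from scipy.fft import dctn, idctn


def lossy_3d_dct(volume: np.ndarray, keep: int = 4) -> np.ndarray:
    """Lossy-compress a 3-D volume by dropping high-frequency DCT coefficients."""
    coeffs = dctn(volume, norm="ortho")
    mask = np.zeros_like(coeffs)
    mask[:keep, :keep, :keep] = 1.0  # keep only the low-frequency corner
    return idctn(coeffs * mask, norm="ortho")
```

Smooth volumes survive this almost untouched, while fine surface detail is
exactly the "high-frequency information" that gets lost.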

------
cphoover
Why use computer vision when likely most of these animations come from some
software that can/could output a vectorized video format?

You wouldn't have any conversion artifacts that way.

~~~
mattigames
They do not; the vast majority are PNG files, where even the creator of the
video didn't have access to the vectors. It is also completely unrealistic to
convince everyone to use a new video format, not to mention the gigantic
number of videos from the past (e.g. cartoon TV shows from the 90s) that no
longer have any source material available, where all that remains are
digitized copies.

------
katmannthree
It sounds cool but given that the demo looks like this [0] on my bog-standard
Windows 10 PC with Chrome (and Firefox and Edge) I'm assuming they've still
got some bugs to work out... If it's working for anyone here I'd love to see a
screen capture of the proper rendering.

[0]: [https://i.imgur.com/YO42u2C.png](https://i.imgur.com/YO42u2C.png)

~~~
hackstack
I just got a black box on my Windows system.

Looks pretty good in Firefox on the Mac though:
[https://imgur.com/a/rqPgQFZ](https://imgur.com/a/rqPgQFZ)

Edit: Here it is in video form
[https://imgur.com/a/B0UOigX](https://imgur.com/a/B0UOigX)

~~~
katmannthree
Thank you! That video really is impressive.

------
wodenokoto
As a kid, I used to trace frames from the Simpsons in Macromedia Flash (later
Adobe, later discontinued) as a way of creating high-res images, so reading
this article really hit home for me!

While the output of this algorithm (just like my traces) isn't as faithful to
the source material as, say, H.264 is, the result looks great and has an
amazing style.

This might be a great target for mobile-first webtoons.

------
villgax
Even their job postings on LinkedIn look like somebody anger-wrote them. For
an Image Processing role they've written "no ML/DL engineers", which is
obvious but still, as if Computer Vision isn't in any way linked to whatever
it is they are trying to do with compression.

------
fredley
I actually love the slightly 'off' aesthetic of the Simpsons video, and I
think there's some interesting creative space where this algorithm is
deliberately de-tuned for interesting results.

------
okaleniuk
It's awesome, but it also brings back memories of the flash-animation.

~~~
cypressious
Exactly.

The funny thing is that Flash animations were the more straightforward way:
you create vector animations and export them as such.

Obviously, this is more universal: you can take any input, as opposed to only
content that was vector to begin with.

------
zcw100
Sounds like
[https://github.com/fogleman/primitive](https://github.com/fogleman/primitive)
but for video

------
sneak
Patenting algorithms is supposed to be impossible.

Companies that ignore this and patent the “system and method” for implementing
algorithms are being jerks.

~~~
villgax
They're using this to get VC clout & footing in job posts since they don't
have any revenue the only way of convincing people for them is to say we have
a 'patent'. Almost as if a shark-tank episode wrote their pitches.

------
rhn_mk1
Maybe this finally brings back the crisp scaling quality of Flash videos. They
seem to be released in lossy raster formats these days.

------
tabtab
If the vectors also relatively smoothly morph in time, then monitor-side
interpolation (including motion smoothing) wouldn't be needed and directors
would have more control over how interpolation is done. They've complained
about monitors trying to do too much. It seems pixels are becoming obsolete
for video and movies.

------
karteum
In terms of stream format, it seems that BIFS / MPEG-4 Part 11 originally
aimed at the same purpose (probably in a more efficient manner than textual
SVG), didn't it?
[https://en.wikipedia.org/wiki/MPEG-4_Part_11](https://en.wikipedia.org/wiki/MPEG-4_Part_11)

------
chmod775
While this is pretty cool, this naive approach will fail spectacularly for
animation that isn't just vector graphics, which is most animation.

This might have a future as _part_ of a regular video codec, being used when
there's mostly vector graphics on screen (or just for those areas that are
vector graphics).

------
bane
I imagine a posterization preprocessing step would make this simpler, and we
could have _very_ low-bandwidth "video". If this could be done in real time,
it would dramatically lower the bandwidth required for two-way video chat.

------
xanthine
Their Android SDK (no release version yet) is available in their GitHub repo,
and so are their bulk upload tools (for talking to their servers using, I
guess, a pay-to-use API).

------
nautical
If I understand this correctly, will it also allow sending updates to a video
on the fly? Example: change the video from time X to Y to new vectors
[V1 ..]?

------
villgax
Not ready yet, just hyping SVG videos for now

------
vslira
That's really interesting, excited to follow the project and see more!

------
jcims
Curious how ML would train on vectorized video instead of rasterized.

------
imtringued
Can this technique be used to generate a CAD sketch from a photograph?

~~~
detritus
Technically, probably, although there are existing and likely better vector
conversions for non-animated purposes.

The problem with CAD is that while a conversion to a rough sketch will be
possible, 'CAD' implies very high accuracy or exactness, and the amount of
work required to make a converted sketch accurate would likely exceed that of
just doing it manually in the first place.

I suppose in future some ML-aided setup that takes a drawing in AND focuses on
dimensional accuracy might fare better than this video-centric solution.

------
iworkfromhome
This is really cool. So future videos could be built using only code, without
shooting again, because everything can be made with code. Cool!

