
Show HN: From Markdown to Video - adzicg
https://www.videopuppet.com/docs/script/
======
gregmac
From a technology point of view, this is really cool.

From the view of someone that occasionally watches videos on YouTube, I am
trying to figure out a nice way to say... I hate it. Or more specifically, I
hate that it generates the voice, and basically enables video content spam.

What we don't need more of is cheap, easy to automatically generate videos
that are basically spam and/or clickbait, trying to get views. The problem
with auto-generated voices in videos like this is as a viewer I can't
distinguish between work that someone put deliberate production time into, and
something churned out by a content farm. The demo video even tricked me at
first: I didn't realize it was a generated voice until a couple sentences in,
at which point I had a visceral negative reaction, the same as when I
accidentally click on a content farm-generated video.

It seems a major feature is automatically syncing the narration to the slides.
Perhaps a way to enhance this while avoiding spam generation is to use the
generated voice only for internal timing, and generate a karaoke-like display
for a narrator (human) to read? As a paid service, you could even provide
professional voice-overs as an add-on.

~~~
hobofan
> The problem with auto-generated voices in videos like this is as a viewer I
> can't distinguish between work that someone put deliberate production time
> into, and something churned out by a content farm.

If machine voiced vs human voiced is the only discernible difference in the
end, this seems like a non-argument.

As someone that is building a tool in the roughly same space (machine voiced
video generation), I can just say that the use-cases go far beyond "content-
farm". It also enables a lot of useful content like e.g. internal training
videos, or paired with browser automation, you can have narrated always up-to-
date video manuals of your product. In the education space, it enables a more
iterative way to produce material where you previously couldn't afford to
tweak parts of a video, as you would have to narrate it again.

And I also don't think that it will amplify the existence of such videos
significantly. There are already YouTube channels that do just that,
and people don't seem to mind. E.g. there is a channel that uploads "car news"
content, which basically just has a narration on top of a series of pictures
of a car, and the amount of views and the rating on those videos is pretty
good. In the end it's just a few fact bulletins stretched into an overly long
video using the same old worn out phrases (just like regular "car news"), and
I don't see why a human would need to waste their time to voice that.

~~~
gregmac
>> The problem with auto-generated voices in videos like this is as a viewer I
>> can't distinguish between work that someone put deliberate production time
>> into, and something churned out by a content farm.

> If machine voiced vs human voiced is the only discernible difference in the
> end, this seems like a non-argument.

The problem is getting to the end -- I don't want to spend several minutes
trying to decide if it's spam or useful. It's simply easier and safer for me
to use "contains auto-generated voice" as a filter to avoid watching garbage.
Specifically I'm talking about videos like the ones discussed in this video
[1].

Though I'd generally agree that good quality content is good quality content,
I personally think there's something lost by using a machine-generated voice.
Good human narrators add nuance and emphasis and energy, and it's much more
interesting when someone is passionate or excited about the topic they're
talking about and you can hear that come through.

Some humans are bad narrators, of course, and the machine-generated voice may
not be worse by comparison. The problem is I'd just rather not listen to an
emotionless voice -- whether it's machine-generated or human -- read a script,
I'd rather just _read_ it myself.

Maybe I'm wrong and the generated voices are much better than I've heard (any
examples?) but I think part of the problem remains in that unless I'm forced
to watch (eg, internal training) or have a recommendation come from someone I
trust, it's still safer to filter out videos with machine-generated voice as
"probably spam/garbage".

> it enables a more iterative way to produce material where you previously
> couldn't afford to tweak parts of a video, as you would have to narrate it
> again

I think this is a very compelling feature, but as a potential consumer of
these videos (either accidentally on youtube or forced via internal training)
I wish someone would come up with a way to enable this without having to
resort to using the emotionless robot voice.

This again could just be my personal preference: I think emotionless robot
voice is pretty much going to always mean somewhere between low- and mediocre-
quality video, and I also think a low quality video is significantly worse
than just having an easily-updatable HTML/PDF/whatever document with
pictures/screenshots/diagrams as appropriate.

[1]
[https://www.youtube.com/watch?v=1PGm8LslEb4](https://www.youtube.com/watch?v=1PGm8LslEb4)

~~~
mgkimsal
> Good human narrators add nuance and emphasis and energy, and it's much more
> interesting when someone is passionate or excited about the topic they're
> talking about and you can hear that come through.

And... there are some humans tasked with making videos for others and they're
just really bad. Again, internal/training videos, etc, done by people without
much passion for, or even knowledge of, the task they're training you on. I
prefer machine generated voice in those cases, or perhaps even some sort of
subtitling that could be piped to the TTS engine of my choice.

------
jez
I haven't gotten the chance to try it out yet, but an alternative in this
space is Komposition, which bills itself as "a video editor built for
screencasters". I gather that mostly means that if you take certain liberties
when recording your screen and voice (putting pauses in the right places),
Komposition will take care of automatically splitting your input media based
on when it determines a transition.

[https://owickstrom.github.io/komposition/](https://owickstrom.github.io/komposition/)

Slightly different aim compared to Video Puppet (the source being plain text
is not the goal, which means you will likely have to edit and re-record a
script multiple times) but still interesting, especially if you'd rather avoid an
auto-generated voice.

~~~
adzicg
you can easily replace auto-generated voice with your own, or a professional
recording in Video Puppet scripts. Just add (audio: file.wav) to your scene.

------
kickscondor
Related: A very basic prototype I’ve been working on, but for video zines
[https://kickscondor.com/slaptrash](https://kickscondor.com/slaptrash)

Seems you could do something along these lines to avoid the video generation
part.

~~~
flanbiscuit
This is amazing! I'm going to have a lot of fun with this. I would love to be
able to save these as videos but I guess I could just use my mac's screen
record functionality

edit: I'm going down a rabbit hole looking through your site. Digging the
twisted early internet aesthetic.

~~~
kickscondor
Hey, thank you flanbiscuit! Got a web link for yourself so we can be friends?

------
worldofchris
Getting a real kick out of using Video Puppet. The idea of creating a video
from assets and a script is not a new one, I first saw it in the context of
Real Estate at a Kaltura conference back in 2012:

[https://connect.mediaspace.kaltura.com/media/Automated+Video...](https://connect.mediaspace.kaltura.com/media/Automated+Video+Slide-show+Creation+with+Node.js+and+Kaltura/1_ji7bdc6a/16039151)

The existing tools for doing this sort of thing seem to either require quite a
bit of programming / video skills e.g. Media Lovin' Toolkit, ffmpeg, sox,
jimp, ImageMagick etc or they are templated / opinionated tools like
[https://www.magisto.com/](https://www.magisto.com/)

What I love about Video Puppet is that it provides a simple and easy to use
set of tools and an API that through GitHub actions allows you to put version
control and early/often feedback loops at the heart of your projects.

I'm using it to document the development story and back story of an Indie
Video Game I'm working on. Previously I was doing it as a Google doc which I
was sharing with my collaborators.

With Video Puppet, it requires little more overhead - I was writing this stuff
already - but when I see and hear the results played back I can immediately
see whether the story makes sense or not. I can see if I am jumping into
talking about something I haven't set up properly or if I am trying to say too
much.

One thing that would help me is to get feedback on fails in the markdown
script quicker, before even pushing to GitHub. For code, including things like
Terraform, I'd use a linter, or CircleCI has a validator tool you can run
locally.
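
A rough sketch of what a local pre-push check could look like, assuming only the two asset syntaxes quoted in this thread (`![](file)` images and `(audio: file.wav)`). This is not Video Puppet's own validator, and `find_missing_assets` is a hypothetical helper that only checks whether referenced files exist next to the script:

```python
import re
from pathlib import Path

# Assumption: only the two asset syntaxes quoted in this thread are checked.
#   ![](london.jpg)      image reference
#   (audio: file.wav)    audio override
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")
AUDIO_REF = re.compile(r"\(audio:\s*([^)\s]+)\s*\)")


def find_missing_assets(script_text, base_dir):
    """Return (line_number, filename) pairs for referenced files that do not exist."""
    base = Path(base_dir)
    missing = []
    for line_number, line in enumerate(script_text.splitlines(), start=1):
        for pattern in (IMAGE_REF, AUDIO_REF):
            for name in pattern.findall(line):
                if not (base / name).is_file():
                    missing.append((line_number, name))
    return missing
```

Calling `find_missing_assets(Path("script.md").read_text(), ".")` from a git pre-commit hook would flag broken references before a build is ever triggered; the real script format supports more than these two patterns, so treat it as a starting point.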

The other place I'm going to start using it is for describing defects in a
product I am coaching a team on. Previously I would do a screen cap and then
upload that to frame.io. Now I can do the screen cap, describe the problem and
stick the whole lot into version control with a bunch of github actions to
point the team to the resulting video.

I will be following this product closely and actively using it.

Great work Gojko!

------
tomatohs
I'm building the reverse, video to markdown. Paircast combines screen
recording, voice transcriptions, and code changes into a markdown guide.
[http://paircast.io](http://paircast.io)

~~~
yoz
Wow, that is FANTASTIC. I've not tried it yet, but it looks like a very
approachable execution of a brilliant idea. I'm a DevRel who's fascinated by
DX and I WANT THIS.

It's a shame it doesn't also capture the code's output and, ideally, the state
of the interpreter. For example: at 4:45 in the demo video, he tries to run
his code and it fails with an error. It's important for both coding tutorials
and DX analysis to capture the text of the output/error.

What would be even better would be capturing the error _and_ the detailed
stack trace, ideally with the state of each stack frame. My employer produces
SDKs for different languages, so it'd be invaluable for debugging.

I can imagine a couple of different ways of doing this which might not be
horrifically complicated to add to the Paircast recorder, though I suspect
you're already going down this road. If you'd like to chat more, yell!

------
adzicg
Just completed full support for scripting videos as Markdown files using Video
Puppet. Check out the post for some basic info. For more examples, see
[https://github.com/videopuppet/examples](https://github.com/videopuppet/examples)

~~~
capableweb
It looks pretty amazing, trying it out right now.

In the meantime, could you write a bit what different pieces of
technology/services you're using to build all this?

~~~
adzicg
sure. the video conversion is running on AWS Fargate, with bits and pieces
running on AWS Lambda. The speech synthesis is either Amazon Polly (neural
voices) or Google Cloud Text to Speech (Wavenet).

Under the hood, the conversion system is using Chrome headless to generate
slides, render markdown and provide syntax highlighting. Most of the video and
audio processing is with FFmpeg and SoX.

------
tdalaa
Looks really really cool, but to really show the power of this, they should
share the full example from their Web site landing page.

------
hombre_fatal
Some feedback on landing page:

- Make the sample script response header "Content-Type: text/plain" so that
it renders in the browser instead of downloading a file.

- Make the sample video demonstrate the three features it says it has, like
image captions.
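
To illustrate the first point, here is a minimal sketch using Python's standard `http.server`. It says nothing about how the Video Puppet site is actually served; `SAMPLE_SCRIPT`, `plain_text_headers`, `ScriptHandler` and `serve` are made-up names. The idea is just that a `text/plain` content type, with no attachment disposition, generally lets browsers show the file inline:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Placeholder script body (assumption); a real server would read the file from disk.
SAMPLE_SCRIPT = "![](london.jpg)\n\nWelcome to London\n"


def plain_text_headers(body):
    """Headers that ask the browser to show the body as plain text."""
    return [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]


class ScriptHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = SAMPLE_SCRIPT.encode("utf-8")
        self.send_response(200)
        for name, value in plain_text_headers(body):
            self.send_header(name, value)
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve the sample script on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), ScriptHandler).serve_forever()
```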

------
formalsystem
I love this. I've been messing around with Premiere Pro and Audacity for the
past couple of days trying to get more into making video. Video Puppet looks
way easier to debug and collaborate on since scrolling back and forth in your
video looking for stuff gets very tedious very quickly.

Is there any way I can add my own voice and then still write the words that I
want my voice to say?

~~~
adzicg
Video Puppet integrates with several voice synthesis services, including
Amazon Polly, which offers custom branded voices [1].

You could create a custom brand voice with Amazon, and we can then integrate
it into Video Puppet.

[1] [https://aws.amazon.com/about-aws/whats-new/2020/02/amazon-po...](https://aws.amazon.com/about-aws/whats-new/2020/02/amazon-polly-launches-brand-voice/)

------
slobodan_
VideoPuppet is excellent. I am using it to create videos for the Five Minutes
Serverless Youtube channel, and so far, results are outstanding. I can create
a video from the markdown file really fast.

------
tir-kaval
If this type of application interests you, have a look at:

[https://savannah.nongnu.org/projects/kinetophone](https://savannah.nongnu.org/projects/kinetophone)

It is an application/shared library for Linux, released as free software. It
has a GUI program for live narration and one, "Vox", for creating video from
PDF or still images using speech synthesis (Festival).

[http://download-mirror.savannah.gnu.org/releases/kinetophone...](http://download-mirror.savannah.gnu.org/releases/kinetophone/Kinetophone_vox_tutorial.pdf)

The Kinetophone shared library could be used as a plug in for presentation
software. Kinetophone's file format is XML. I haven't updated it for years,
and it does require occasional patches to support the latest FFMPEG. It was
originally a commercial application for OS X called Ishmael, back in about '07
which I ported to Linux after my company went out of business.

------
mauricesvay
I remember doing something similar for real estate a few years ago. Could be
an interesting segment for you?

------
peter1125
I think this would be great for all the professors/teachers who suddenly have
to teach courses online. If the lecture can be made beforehand, then the
teacher can just focus on addressing questions or problems on Zoom/Skype (or
whatever platform is used for teaching online).

------
fudged71
I'm trying to imagine all the useful things you could do with code generated
videos.

I'm imagining a daily routine of airplaying the video to your TV with an
annotated dashboard of quantified self metrics, weather forecast, plotted
local Covid-19 cases, health advisories, etc.

~~~
majkinetor
For all of that, video is useless and gives up the benefits a regular
dashboard would have.

Narration is the only useful form of presentation here, but you don't need
this tech for that.

~~~
fudged71
I don't know about that. If I'm making breakfast and can see a screen, both
narration and visuals are useful. I think that's why Google and Amazon came
out with device assistants that have screens

~~~
majkinetor
If you are making breakfast and have a system that rotates dashboards (such as
Grafana for example), that is exactly the same as having a video from the
viewing angle, but otherwise better, since you can still come from the room
and do some interactive stuff not possible if it were only on video.

So, video is only limiting you, nothing else.

I can imagine this being used like PDF in specific contexts: if you need a
100% guarantee that local devices/viewers/etc. won't change any output detail.

------
simbas
I am a tech writer and I write tasks and procedures using DITA-XML. I was
thinking about transforming my .dita files to .mlt to use in Shotcut/melt,
but I think I'm going to use this instead.

~~~
adzicg
Video Puppet can also process YAML and JSON files, so if you are running an
automated conversion from XML, it might be easier to output JSON instead of
Markdown; in any case, should you need help, drop me an email at
gojko@neuri.co.uk
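
As a rough illustration of such a conversion: the JSON schema is not shown in this thread, so the sketch below emits the Markdown scene layout quoted elsewhere here instead (one narration paragraph per scene, scenes separated by `---`). It assumes a simple DITA task topic where each `<step>` has a `<cmd>`, and `dita_task_to_script` is a hypothetical helper, not part of any tool:

```python
import xml.etree.ElementTree as ET

# Assumption: scenes are separated by a line containing only "---",
# as in the Markdown examples quoted in this thread.
SCENE_SEPARATOR = "\n\n---\n\n"


def dita_task_to_script(dita_xml):
    """Turn a DITA task title and its step commands into one scene each."""
    root = ET.fromstring(dita_xml)
    scenes = []
    title = root.findtext("title")
    if title and title.strip():
        scenes.append(title.strip())
    for cmd in root.iter("cmd"):
        # Flatten inline markup and collapse whitespace into a single line.
        text = " ".join("".join(cmd.itertext()).split())
        if text:
            scenes.append(text)
    return SCENE_SEPARATOR.join(scenes) + "\n"
```

Real DITA content (conrefs, nested substeps, images) would need more handling than this; the point is only that each step can map to one narrated scene.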

------
bArray
Yes yes yes! I was literally thinking to implement this myself, but didn't
have time. It's a shame it doesn't appear to be open source though - I might
still end up creating one.

------
npollock
I could see this being really useful for creating product onboarding video
tutorials - wondering if there's an ability to preview and edit/adjust before
exporting the final video?

~~~
adzicg
What kind of preview are you thinking about?

Building a full video is fairly quick compared to traditional editing tools,
so I haven't built any faster preview yet. I usually just build the whole
thing and look at it, then tweak the script and build it again.

You can easily upload just the script file into an existing project and
re-build the video as many times as you like, then download the version you are
happy with at the end.

~~~
npollock
I was thinking about adjusting parameters like: background music volume,
segment length, pauses, etc.

I'll give it a try, perhaps it's so fast that previews aren't really necessary.

I also think this could be an amazing tool for personalizing video marketing
too

~~~
adzicg
I'm planning to build a visual interface that would allow users to preview
individual scenes. Meanwhile, if the main video build is too long because you
have lots of scenes and want to check out how one would look with different
parameters, just create a different script file with that scene only and build
it. that's the beauty of text file editing, you can easily copy and paste and
experiment.
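
A small sketch of that copy-one-scene workflow, assuming scenes are separated by a line containing only `---` as in the examples in this thread. `split_scenes` and `extract_scene` are hypothetical helpers; a `---` inside a fenced code block or a document header would be split incorrectly by this naive version:

```python
def split_scenes(script_text):
    """Split a script into scenes on lines that contain only '---'."""
    scenes, current = [], []
    for line in script_text.splitlines():
        if line.strip() == "---":
            scenes.append("\n".join(current).strip())
            current = []
        else:
            current.append(line)
    scenes.append("\n".join(current).strip())
    # Drop empty chunks, e.g. from leading or trailing separators.
    return [scene for scene in scenes if scene]


def extract_scene(script_text, index):
    """Return the scene at the given zero-based index as a standalone script."""
    return split_scenes(script_text)[index] + "\n"
```

Writing the result of `extract_scene(text, 1)` to a separate script file gives a one-scene script that can be built on its own while experimenting with parameters.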

------
aquark
This looks cool. Something I have looked for on and off without great success
would be a fully scriptable NLE.

Something like this that would support simple fades, transitions, and maybe
animation. The kind of stuff you can do fairly easily in a video editor, but
with lots of fiddly clicks and zooming in and out of timelines.

I'd like to have a script that let me specify when different source media
start, when to apply effects, etc. All written as a basic text file.

Anything obvious out there I've missed?

~~~
adzicg
simple transitions are already supported, and I plan to add a lot more
transitions soon.

you can set transitions globally in the document header, or on individual
scenes. for example, just add

    (transition: crossfade 0.2)

for a 0.2sec cross-fade transition between scenes

video segments (different source media start) are also supported. You can do
something like:

    (video:
      file: stopwatch.mp4
      segment: 00:02 - 00:04)

Check out video and transition sections here for more info:
[https://videopuppet.com/docs/format/](https://videopuppet.com/docs/format/)

------
pomber
Video Puppet + MDX Deck [1] would be a dream.

[1]: [https://github.com/jxnblk/mdx-deck](https://github.com/jxnblk/mdx-deck)

------
gavinray
Wow, I may be biased because this fills a particular niche usecase for me, but
this is truly incredible.

I can't stand hearing the sound of my own voice, but do a lot of tutorial
content production in Markdown for guides for learning material.

This would allow me to re-use all of the existing material I have, which
already includes detailed step-by-step screenshots and text instructions, to
make voice-over videos with slides and publish to Youtube. Amazing!

------
spalas
Super cool!

I created a single video from text using python
([https://www.youtube.com/watch?v=7CIakJ8PMZs](https://www.youtube.com/watch?v=7CIakJ8PMZs)
// [https://github.com/sidpalas/devops-directive-hello-world](https://github.com/sidpalas/devops-directive-hello-world)), but this is
next level!

I'm excited to try it out.

------
DagAgren
I'm not sure I see why you would want to base this on Markdown. Markdown is
designed for a very specific niche, and this falls far outside that niche.

It seems it would make a lot more sense to just design the language from
scratch, rather than try to bend Markdown to do something it was not at all
meant for.

For instance, why would you WANT to have an example like this:

    ![](london.jpg)

    Welcome to London

    ---

    ![](berlin.jpg)

    Welcome to Berlin

~~~
hn_throwaway_99
I totally disagree, and I have the exact opposite reaction. Markdown is
something tons of people know already. I literally just glanced over the
article and felt I could generate a "narrated PowerPoint", which seems like
the main purpose of this, extremely quickly. Why would I want to learn a
completely new language because there are some trivially minor syntax oddities
with using Markdown?

The perfect is the enemy of the good.

~~~
adzicg
btw, for narrated powerpoint, you can actually use Video Puppet directly with
Powerpoint files - just put narration into speaker notes. Here's more info on
that:
[https://videopuppet.com/docs/powerpoint/](https://videopuppet.com/docs/powerpoint/)

~~~
edoceo
Wow, ace feature right here. Sales team about to fall in love with you.

------
grantlmiller
Looks cool, love the idea of using version control for video content. No
mention of pricing that I could find easily.

~~~
capableweb
It's in the footer,
[https://videopuppet.com/docs/pricing/](https://videopuppet.com/docs/pricing/)

> This application is currently in beta version. While in beta, the
> application is free, and allows anyone to upload assets up to 25 MB. We will
> announce commercial pricing later, when the full version becomes ready. For
> now, experiment as much as you like!

------
stjo
I don't know if technically possible, but live preview would be really cool.
Maybe javascript rendering to canvas without the roundtrip to the server for
encoding to mp4.

Easier bulk upload / upload with curl / python requests is needed IMO.

------
cochne
Wow, really nice! I will try using this as an extra resource for teaching my
coding classes.

~~~
adzicg
thanks. that's one of my use cases as well, so as an extra tip, you can
generate code snippets over video or images just by adding a fenced code
block. eg

    ---

    ![](background.jpg)

    ```js
    // anything here will be rendered as a slide on top of the background,
    // with javascript highlighting
    ```

    ---

There's a full example here:
[https://github.com/videopuppet/examples/tree/master/slides](https://github.com/videopuppet/examples/tree/master/slides)

------
mappu
Another way to make it seem more real would be to render a virtual webcam
overlay of a talking head, using UE4 or something. Maybe with an office-style
background.

------
alpb
Nice and simple but there are videos that are easily made with hand-writing
effect using [https://www.videoscribe.co/](https://www.videoscribe.co/). For
example,
[https://www.youtube.com/watch?v=MiybniIIvx0](https://www.youtube.com/watch?v=MiybniIIvx0)
is entirely made in software. These are conceptually similar, but obviously
yours is more text-oriented and minimalistic.

------
0x006A
please delete all those videos on the internet and just post the markdown. the
only use case is spam.

------
mitchtbaum
Great

