
Writing a video chat application from the ground up, part 1 - kimburgess
https://bengarney.com/2016/06/25/video-conference-part-1-these-things-suck/
======
perplexes
This is wonderful. I read the whole series in one sitting. It actually made
video codecs feel way more approachable, rather than some patented black box
magic I'll never understand.

It also reminded me of a recent article about how you can break audio codecs
by guessing which quantizer a packet used, then running it in reverse to
produce speech! Which I suppose is obvious in retrospect: lossy codecs
compress data by making it perceptually similar, whatever the domain.

I also appreciated the ties to video game networking. Gaffer on Games has had
a long-running series on designing multiplayer networking protocols with UDP
and you two approach bit-shaving very similarly (unsurprisingly I suppose -
it's a very specific process with its own tools).

Anyway, thank you! I learned a lot.

~~~
bengarney
It was a blast to write. Glenn is a smart guy with some great content around
game networking. There are good ways to do networking for games and other
real-time applications and TCP isn't really one of them.

~~~
Keyframe
What are your thoughts on ENet?

~~~
bengarney
Good, not great.

~~~
corysama
Anything better than ENet I could investigate?

~~~
bengarney
Gaffer on Games and OpenTNL are good places to look.

------
munificent
My absolute favorite kind of writing is the kind that leaves me itching to try
coding something myself since it now seems so much more approachable than it
did before. Though I knew a bit about encoding, I never would have thought to
build something like this, and yet now I find myself wishing I could carve out
some time to try.

Great job!

~~~
bengarney
Thanks! I really appreciate it.

------
Sanddancer
If I may, I'd like to inject a plea for sanity here. Please, please, pretty
please with sugar on top, don't reinvent the wheel when it comes to chat
protocols. Right now, on my desktop, I have 5 different windows open dedicated
to various chat networks and chat protocols. I have steam chat, a window with
my irssi session, a pidgin session with connections to a slack session and an
aim session, a telegram session, and a skype session. Yes, implementing
someone else's protocols is complicated and sometimes painful, but when you
end up needing a half dozen different /types/ of connections, there is
something wrong and broken with how we're approaching the whole talk to other
people thing.

~~~
proksoup
Your problem statement is accurate.

I work on a chat product that uses XMPP ....

There isn't, I don't think, a common protocol that could be used, is there?

~~~
erlehmann_
XMPP has the jingle extension for audio and video calls:
[https://en.wikipedia.org/wiki/Jingle_(protocol)](https://en.wikipedia.org/wiki/Jingle_\(protocol\))

I have successfully used this for voice calls between the chat clients
Gajim/Pidgin (I do not remember which) and Google Talk (the now-discontinued
XMPP client by Google). I have also done video calls between two Nokia N900
phones, to see if it works. Voice and video via XMPP has worked since
approximately 2009 and is just neglected by the companies for business
reasons.

------
Someone1234
All four parts are available; just click the link in the last paragraph of
each to go to the next.

I love projects/blogs like this, since it is "back to the basics" and we all
learn something by better understanding how things work, from codecs to
compression and beyond. This one is wonderful and one of the best reads this
week.

~~~
bengarney
Thank you very much!

~~~
the_duke
A great read indeed, thanks for that.

------
sim-
While TCP is not ideal for this application by nature of it trying to be a
fully reliable stream protocol, one often overlooked advantage of its
congestion control is that it allows the stream to play nicely with others.
For example, if you develop a datagram-based transport whose data rate results
in congestion at some point in the network path, any TCP going through the
same point would back off to nearly nothing in an attempt to save the link.

You can be greedy and take the bandwidth anyway at the expense of everything
else, but in some conditions this may cause a worse outcome even for your own
traffic. It's likely better to lower your data rate target and drop rarely
than to send too much and drop randomly at a higher rate.
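The back-off behavior described here is TCP's additive-increase/multiplicative-decrease (AIMD) rule, and a UDP sender can play nicely by applying the same idea to its own target rate. A minimal sketch of the idea in Python (the class name and rate constants are illustrative, not from the article):

```python
class AimdRateController:
    """Additive-increase / multiplicative-decrease, the rule TCP uses:
    probe for bandwidth gently while packets get through, and back off
    sharply when loss signals congestion."""

    def __init__(self, rate_kbps=500, increase_kbps=10,
                 decrease_factor=0.5, min_kbps=50, max_kbps=5000):
        self.rate = rate_kbps
        self.increase = increase_kbps
        self.decrease = decrease_factor
        self.min = min_kbps
        self.max = max_kbps

    def on_interval(self, loss_detected):
        """Call once per feedback interval; returns the new target rate."""
        if loss_detected:
            # Multiplicative decrease: cut the rate hard to drain the queue.
            self.rate = max(self.min, self.rate * self.decrease)
        else:
            # Additive increase: probe for spare bandwidth slowly.
            self.rate = min(self.max, self.rate + self.increase)
        return self.rate


ctl = AimdRateController()
print(ctl.on_interval(loss_detected=False))  # 510 -- slow probe upward
print(ctl.on_interval(loss_detected=True))   # 255.0 -- sharp back-off
```

Real congestion controllers (and the codecs driven by them) are far more subtle, but even this shape keeps a datagram stream from starving competing TCP flows.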

------
fovc
The series was amazing. Could not stop clicking through. I was sorely let down
at the end when there was no more to read. Also when it didn't turn into an
open-source high-performing P2P video conferencing app :)

------
ape4
Like the idea of the codec: trying to minimize the error

~~~
vernie
What do you believe other codecs are trying to do?

~~~
jsprogrammer
Minimize size? Maximize speed?

~~~
lotyrin
Minimize error (for given size and runtime constraints), or minimize some
function with error, size, and runtime as factors.

Just minimizing size/runtime gives you "return;"

------
CorvusCrypto
This intro (and series as a whole) was AWESOME! My background didn't really
touch on compression at all and these parts were what I really loved learning
from the post. More please! (Any other resources for compression are welcome
:D)

------
lisper
Same story submitted eight hours earlier:
[https://news.ycombinator.com/item?id=12047710#12050483](https://news.ycombinator.com/item?id=12047710#12050483)

------
ausjke
Is any code available somewhere, and is this Windows-only?

~~~
bengarney
No public code. I do a lot of cross platform dev so it would be easy to port,
but initial work was on Windows.

------
dopeboy
Really neat. Reminded me of a live stream video conference system I built for
an embedded systems course a long time ago:
[http://www.cs.columbia.edu/~sedwards/classes/2009/4840/repor...](http://www.cs.columbia.edu/~sedwards/classes/2009/4840/reports/RVD.pdf)

As far as I remember, bandwidth was more or less unlimited. It was the nasty
syncing issues that gave me nightmares.

------
cm3
Regarding the overhead of H.264 and VP9: using Intel's QuickSync or AMD's VCE
would make sense in a production version, in order to have a fast
implementation. That's VAAPI (and maybe VDPAU) on Linux and BSD. The encoders'
output will look good enough for video streaming.

------
fmilne
Thanks for writing this, I learned a lot! Last weekend I started a peer-to-
peer group video calling project. Seeing your whole approach made the entire
system easier to understand.

~~~
bengarney
Awesome! I'm so glad it helped!

------
Keyframe
imgui is great, but I have a bit of an allergy to C++. There's Nuklear
[https://github.com/vurtun/nuklear](https://github.com/vurtun/nuklear), which
is a re-take on it in ANSI C. It's interesting that GUI rendering takes up so
much of your processing time slot - or is it that everything else takes so
little?

~~~
bengarney
GUI is slow primarily because of naive uploading of six video frames sixty
times per second. Not a detail that mattered for this initial work. It should
take almost no time with a smarter implementation.

~~~
Keyframe
Ah, so you're counting the video upload to texture and its render as part of
the GUI cycle? The number makes sense then. You're basically redrawing the
whole GUI each frame along with the video texture, and uploading/updating that
texture at 60 fps.
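For what it's worth, the "smarter implementation" mentioned above usually amounts to a dirty flag: only upload a texture when a new decoded frame has actually arrived, and otherwise just redraw the cached one. A toy sketch of that bookkeeping (names are made up, and the upload counter stands in for the actual GL texture update):

```python
class VideoPane:
    """Track a remote stream's latest decoded frame and re-upload it to
    the GPU only when it has actually changed since the last draw."""

    def __init__(self):
        self.latest_frame = None    # written by the decoder
        self.uploaded_frame = None  # what the GPU texture currently holds
        self.upload_count = 0       # stands in for e.g. glTexSubImage2D calls

    def receive(self, frame):
        """Decoder hands us a freshly decoded frame."""
        self.latest_frame = frame

    def draw(self):
        """Called once per GUI redraw (e.g. 60 times per second)."""
        if self.latest_frame is not self.uploaded_frame:
            # Dirty: upload the new frame, then remember we did.
            self.uploaded_frame = self.latest_frame
            self.upload_count += 1
        # Drawing the cached texture every GUI frame is nearly free.


pane = VideoPane()
pane.receive("frame-1")
for _ in range(60):  # 60 GUI redraws...
    pane.draw()
print(pane.upload_count)  # 1 -- ...but only one texture upload
```

With six video streams decoding at camera rate rather than GUI rate, this cuts uploads from 360/sec to roughly the cameras' combined frame rate.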

------
ww520
This is excellent. It's not often you see such an in-depth technical post.

The Dear ImGui library looks excellent in its simplicity.

------
felix_thursday
Wow, this is incredible. Nice, succinct post, too.

------
zk00006
Inspired by Silicon Valley?

------
joncrane
When I read the headline all I could think about was Dinesh from Silicon
Valley writing a blog and bragging about his video chat project.

~~~
justinsaccount
When he wrote the video chat app I was wondering if 'middle-out' compression
would even work for a video stream.. you'd have to buffer some amount of the
data in order to get to the 'middle'.

~~~
nostrademons
"Top-down" and "bottom-up" in compression usually refer to which direction you
build the prefix tree. In Shannon-Fano coding (top-down), you start with the
set of all symbols and their frequency distributions and then recursively
divide them into roughly even-sized subsets, assigning a 0 as the prefix to
the first and a 1 as the prefix to the second. In Huffman coding (bottom-up),
you start with a priority queue of each individual symbol/frequency pair, and
then merge the two lowest frequency nodes together, building the tree from the
bottom up. In middle-out coding, presumably, the algorithm decides at runtime
whether to merge two existing codes into a single prefix tree, or to split an
existing prefix tree in some other way. There's some speculation [1] that this
is done in a probabilistic way.

All of these algorithms require known frequency distributions, which requires
that you have the full data available. In typical DEFLATE compressors (gzip,
pkzip, zlib, etc.), this is handled by dividing the input stream into blocks
and compressing each block individually.
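For concreteness, the bottom-up (Huffman) construction described above can be sketched in a few lines: keep a priority queue keyed on frequency and repeatedly merge the two lowest-frequency nodes, prefixing a 0 to one side's codes and a 1 to the other's. (A textbook sketch, not tied to any particular DEFLATE implementation; it assumes at least two distinct symbols.)

```python
import heapq
from collections import Counter


def huffman_codes(data):
    """Build Huffman codes bottom-up by repeatedly merging the two
    lowest-frequency nodes from a priority queue."""
    freq = Counter(data)
    # Heap entries: (frequency, tiebreaker, [(symbol, code), ...]).
    # The unique tiebreaker keeps the heap from ever comparing the lists.
    heap = [(f, i, [(sym, "")]) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # lowest-frequency node
        f2, _, right = heapq.heappop(heap)   # second lowest
        # Merge: everything on the left gets a 0 prefix, the right a 1.
        merged = [(s, "0" + c) for s, c in left] + \
                 [(s, "1" + c) for s, c in right]
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return dict(heap[0][2])


codes = huffman_codes("aaaabbc")
# The frequent symbol gets the shortest code.
print(codes["a"], codes["b"], codes["c"])
```

Shannon-Fano would instead start from the full sorted symbol set and split it top-down; both yield prefix-free codes, but Huffman's are provably optimal for the given frequencies.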

A similar approach could be used for video - you could easily make the block
size a single packet - but in practice, you'd rarely want to use a lossless
compression algorithm for video chat anyway. Most video compression is lossy;
you can drop a lot of detail before the human eye notices. That's how you can
stream a full widescreen movie (which has an uncompressed size of 2 megapixel
* 4 bytes/pixel * 30 frames/second = 240 MB/sec) over a typical 5 Mb/sec
broadband connection.
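Those back-of-the-envelope numbers work out to roughly a 400:1 reduction; a quick check of the arithmetic:

```python
# Uncompressed video, per the figures above:
# ~2 megapixels * 4 bytes/pixel * 30 frames/second
uncompressed_bytes_per_sec = 2_000_000 * 4 * 30   # 240,000,000 = 240 MB/sec
uncompressed_bits_per_sec = uncompressed_bytes_per_sec * 8

broadband_bits_per_sec = 5_000_000  # a typical 5 Mb/sec connection

ratio = uncompressed_bits_per_sec / broadband_bits_per_sec
print(ratio)  # 384.0 -- the kind of ratio only lossy coding achieves
```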

[1] [http://news.mlh.io/i-hacked-the-middle-out-compression-from-silicon-valley-06-16-2015](http://news.mlh.io/i-hacked-the-middle-out-compression-from-silicon-valley-06-16-2015)

