

Camtweet is TwitPic meets live video (from Justin.tv) - emmett
http://www.camtweet.com/account/login_with_invite_code?invite_code=HACKERNEWS

======
emmett
We built this in about 3 days of work on top of the Justin.tv platform; I'd be
curious to hear what the HN community thinks about it. I've set up an invite
code so you all can try it out.

~~~
grinich
Why don't you have a native iPhone app?

~~~
emmett
Because you can't stream live video to or from the iPhone over 3G yet, and it
only works for h264, so it's fairly pointless. The existing apps are pretty
much crap because of Apple restrictions. We're ready to pounce if and when we
think it's a good idea.

~~~
DarkShikari
_Because you can't stream live video to or from the iPhone over 3G yet_

Yes you can; we've already done it. A coworker of mine hacked together an app
to do it in a day or so. There's also some companies offering services to do
it for you (Ripcode, etc).

 _and it only works for h264, so it's fairly pointless._

It's fairly trivial to do realtime transcoding (x264+ffmpeg+segmenter).
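
For concreteness, such a pipeline might be wired up roughly like this (a sketch only; the flags, URLs, and the use of ffmpeg's built-in segmenter are my assumptions, not a description of anyone's actual setup):

```python
# Sketch of an "x264 + ffmpeg + segmenter" realtime transcode, expressed as
# an ffmpeg command line. All flags and paths here are illustrative.

def transcode_cmd(src_url, out_dir, kbps=500):
    """Build an ffmpeg command that transcodes a live feed to H.264
    and cuts it into short segments for HTTP delivery."""
    return [
        "ffmpeg",
        "-i", src_url,              # live source, e.g. an RTMP feed
        "-c:v", "libx264",          # x264 does the H.264 encoding
        "-preset", "veryfast",      # a fast x264 preset
        "-b:v", "%dk" % kbps,
        "-c:a", "aac", "-b:a", "64k",
        "-f", "segment",            # ffmpeg's built-in segment muxer
        "-segment_time", "10",      # ten-second chunks
        "%s/seg_%%05d.ts" % out_dir,
    ]

print(" ".join(transcode_cmd("rtmp://example.com/live/cam1", "/tmp/hls")))
```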

~~~
kvogt
Try doing faster than realtime transcoding of 2000 simultaneous streams. Not
_that_ trivial.

~~~
DarkShikari
Let's do some math then. Let's assume the videos are 320x240, for example.
Since we have so many streams at once, we can assume perfect scaling across
multiple cores and systems.

A 320x240 video encodes at about 195fps on my Core 2 Duo Conroe 2GHz with one
encoding thread, using x264 and preset "veryfast".

Now, if we go up to a Quad at 3GHz and use all four cores, we're up to
1170fps.

A Penryn is about 10% faster per-clock than a Conroe on x264. We're up to
1287fps.

A Core i7 is about 40% faster per-clock than a Conroe on x264, including the
effects of Hyperthreading. We're up to 1802fps, or 60 streams at 30fps, and we
haven't even left our single processor. For all 2000 streams, we'll need 33
CPUs, or maybe a few more if we allow for a bit of overhead, just in case.

Not that hard now, is it? Bonus points: you save bandwidth, since H.264
compresses a lot better than Sorenson H.263 (FLV1).

(Also, preset "ultrafast" is another factor of two faster than _that_ , though
the quality is low enough that nobody should be using it...)
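
The back-of-envelope math above can be reproduced in a few lines (the scaling factors are taken from the comment; the 30fps per stream is my assumption, implied by the "60 streams" figure):

```python
import math

# Reproduce the comment's scaling arithmetic. 30fps per stream is an
# assumption implied by "or 60 streams"; everything else is from the text.
base_fps = 195                            # 320x240, one thread, Conroe 2GHz, "veryfast"
quad_3ghz = base_fps * 4 * (3.0 / 2.0)    # four cores at 3GHz
penryn = quad_3ghz * 1.10                 # ~10% faster per clock
core_i7 = penryn * 1.40                   # ~40% faster still, incl. Hyperthreading

stream_fps = 30
streams_per_cpu = core_i7 / stream_fps
cpus_for_2000 = math.ceil(2000 / streams_per_cpu)

print(round(quad_3ghz), round(penryn), round(core_i7))  # 1170 1287 1802
print(int(streams_per_cpu), cpus_for_2000)              # 60 34
```

The ceiling lands at 34 CPUs, which matches the comment's "33, or maybe a few more if we allow for a bit of overhead".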

~~~
jacquesm
In practice you are not going to do nearly as well as that, your coworker's
hack notwithstanding.

Assuming 'perfect scaling' and discounting things such as networking overhead
skews that picture by a considerable amount.

You'll be lucky to get to 500 Mbit out per physical machine using 16 cores and
streaming to about 2000 users with 75 incoming streams. So that's roughly 5
streams per core.

To just multiplex 2000 streams to 40K viewers with adaptive frame rate you'll
need roughly 30 sixteen-core machines; note that that does _NOT_ include
transcoding yet.

~~~
DarkShikari
_You'll be lucky to get to 500 Mbit out per physical machine using 16 cores
and streaming to about 2000 users with 75 incoming streams. So that's roughly
5 streams per core._

Why do you have the same computers handling encoding as distribution? That's
stupid.

 _To just multiplex 2000 streams to 40K viewers with adaptive frame rate
you'll need roughly 30 sixteen-core machines; note that that does NOT include
transcoding yet._

Then your multiplexer is unimaginably inefficient. Last summer I worked for a
company that multiplexes live streams for _millions of viewers_ ; it is not as
hard as you think, especially since you only have to do the work once per
channel, not once per user, since all users watching a given channel receive
the same stream.

~~~
jacquesm
I think you are missing what adaptive frame rate is all about...

~~~
DarkShikari
You have yet to explain what "adaptive frame rate" actually is in this
context.

Do you mean that each user receives a different framerate based on the
download bandwidth that he has? You really only need a couple framerates to
cover all users--and I suspect you mean "bitrate", not "framerate", in this
case (though changing the framerate _is_ one effective way to improve the
quality of an adaptive bitrate scheme).

~~~
jacquesm
Adaptive means that based on the connection quality every user receives the
maximum quality that their link is capable of while guaranteeing audio
delivery with a minimum lag to stay as close to realtime as possible.

This means that every user gets an individual stream, because connection
quality can change very rapidly over short periods of time: they might switch
a secondary stream on (or off) for a moment, or anything else might change the
quality of the connection. You can't use very large windows, because then a
hiccup in the line would take a long time to recover from, which would degrade
perceived quality.

Video is relatively forgiving by the way, the audio is the real kicker, a
single packet delivered a little bit too late will instantly cause annoyance
whereas with the video you can be a little early or late and it doesn't seem
to matter too much.

Initial encoding happens on different machines (obviously) but a stream
multiplexer that maximizes the end user experience is a lot more complicated
than just encoding to a preset selection of rates, after all every keyframe
and subsequent updates have their encoding scheme determined by the throughput
measured during the transmission of the keyframe(s)+updates just preceding it.

For non-live consumption the situation is a lot easier: you can pre-buffer,
and that takes care of most problems. In live transmissions that is not
possible unless you want horrible lag (which does not work when the connection
is two-way; it screws up the interaction between people).
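
A toy version of the adaptation loop described here might look like the following (the names, rate ladder, and headroom factor are all illustrative assumptions, not Justin.tv's actual scheme):

```python
# Toy per-user rate adaptation: pick the next chunk's quality from throughput
# measured over a short window, so a hiccup is reacted to quickly.

def next_quality(recent_kbps, ladder=(100, 250, 500, 1000), headroom=0.8):
    """recent_kbps: throughput samples from the last few keyframe intervals.
    Returns the highest bitrate the link can sustain with some headroom."""
    if not recent_kbps:
        return ladder[0]            # no measurements yet: start cautious
    # Short window + worst recent sample: recover fast from brief dips.
    budget = min(recent_kbps) * headroom
    affordable = [r for r in ladder if r <= budget]
    return affordable[-1] if affordable else ladder[0]

print(next_quality([900, 1400, 1200]))  # dip to 900 -> budget 720 -> 500
print(next_quality([80, 120]))          # weak link -> lowest rung, 100
```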

~~~
DarkShikari
_Initial encoding happens on different machines (obviously) but a stream
multiplexer that maximizes the end user experience is a lot more complicated
than just encoding to a preset selection of rates, after all every keyframe
and subsequent updates have their encoding scheme determined by the throughput
measured during the transmission of the keyframe(s)+updates just preceding
it._

It's really not that difficult; if you have one encode going per-user, you can
just reinitialize the encoder's ratecontrol, and if you don't have one per
user, you can simply use a short GOP and swap the bitrate at each keyframe, in
the same way that H.264 extended profile proposed using Switching Pictures
(but in a way that real software actually supports). Of course this latter
method requires multiple simultaneous encodes per channel, or an SVC-like
system (which almost nobody supports, and for which there aren't any good
software implementations anyways).
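
The short-GOP switching approach can be sketched like this (a sketch under the assumption of keyframe-aligned simultaneous encodes per channel; the rates and names are made up):

```python
# Sketch of bitrate switching at keyframe boundaries: encode each channel
# simultaneously at several rates with short, aligned GOPs, then hand each
# viewer whichever rendition fits at the next keyframe. Illustrative only.

RATES = (250, 500, 1000)  # kbps renditions, keyframe-aligned

def rendition_for(measured_kbps, rates=RATES):
    """Pick the best rendition a viewer can sustain; switching is only
    legal at a keyframe, so call this once per GOP."""
    fitting = [r for r in rates if r <= measured_kbps]
    return fitting[-1] if fitting else rates[0]

def serve_gop(viewers, gop_index):
    """viewers: {name: measured_kbps}. One encode per rendition per channel,
    shared by every viewer on that rendition (work done once per channel)."""
    return {name: (gop_index, rendition_for(kbps))
            for name, kbps in viewers.items()}

print(serve_gop({"alice": 1200, "bob": 300}, gop_index=7))
# -> {'alice': (7, 1000), 'bob': (7, 250)}
```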

Is it really that important for Justin.TV to have less than one second
latency? The vast majority of the site is completely non-latency-critical
(from my cursory browsing of it). Heck, the broadcast company I worked for
last summer had over 10 second latency and nobody ever complained about that--
and that was live television, including sports games and so forth.

Plus, if Justin.TV is already doing a scheme like this, surely upgrading to a
better, faster encoder is going to _decrease_ , not increase, their CPU load.

 _Video is relatively forgiving by the way, the audio is the real kicker, a
single packet delivered a little bit too late will instantly cause annoyance
whereas with the video you can be a little early or late and it doesn't seem
to matter too much._

It sounds as if your buffer size for audio is too small. There is no reason
why video or audio should act differently in that regard; in fact, audio
should be easier, as losing a video packet will result in a loss of video
until the next keyframe, while in audio, every single frame is a keyframe.
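
The keyframe argument is easy to make concrete (the GOP length, framerate, and audio frame size below are illustrative numbers):

```python
# Why a lost packet costs video and audio differently, per the argument
# above: a dropped video frame ruins everything until the next keyframe,
# while every audio frame decodes independently. Numbers are illustrative.

def video_loss_ms(keyframe_interval_s=2.0, fps=30, lost_frame=5):
    """Frames from the lost frame to the next keyframe are unusable."""
    frames_per_gop = int(keyframe_interval_s * fps)
    return (frames_per_gop - lost_frame) * 1000.0 / fps

def audio_loss_ms(frame_ms=20):
    """One lost audio frame costs only itself."""
    return frame_ms

print(round(video_loss_ms()), "ms of video affected")  # 1833 ms
print(audio_loss_ms(), "ms of audio lost")             # 20 ms
```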

~~~
jacquesm
I think the word 'trivial' means different things to you than it does to me :)

~~~
DarkShikari
It is certainly very simple in the general case (you have a video stream, you
need to transcode it); it just becomes difficult when you try to do fancier
things with it... which is perhaps true of all systems.

