Skype does a clever trick when bandwidth is scarce
115 points by jloughry on Oct 29, 2013 | 65 comments
On a Skype video conference this morning, I noticed the software did something very interesting. Instead of pixelating video when my crappy broadband internet connection slowed down briefly, it zoomed.

It was accomplished so smoothly I wouldn't even have noticed except that I was talking with a room full of people on the other end and suddenly I couldn't see the people on the sides any more.

Instead of reducing the full-frame video resolution when bandwidth grew scarce, as it usually does, this time Skype selected the middle portion of the frame and showed it clearly. It was very subtle. It was done completely smoothly. The effect was aesthetically pleasing, not disruptive at all, and an elegant solution.

Well done, Skype.

I think this behavior comes from the webcam, and not Skype. I have actually seen this exact behavior with the Microsoft LifeCam, which often auto-changes the width of the video stream. The larger width will show "people on the sides," and the more narrow width does not.

I believe that the LifeCam looks for movement in the peripheral area. If there is no movement there, the LifeCam will truncate the sides of the image. I have sometimes been able to force the sides to appear by waving my arms off to the side.

When I first noticed this happening, I was surprised that many "people on the side" don't move or talk at all, thus triggering a truncation. But I started looking for this, and most "people on the side" barely move at all.

In my experience, Skype video quality tends to degrade by simply freezing the screen. I have also noticed this behavior when Skype was off entirely, when recording a video of myself. So I'm pretty sure it's the webcam itself auto-controlling the width, and not Skype.

It is a Microsoft LifeCam VX-1000. If there's enough room in that little thing for this kind of intelligence, I'm impressed.

The intelligence is usually in the driver.

I purposely did not install the drivers, but that's not saying much as the camera, OS, and Skype software are all Microsoft's, so there is no reason to believe Microsoft couldn't run any driver they wanted to in this situation.

If it is indeed the webcam and not Skype, it'd be cool to update the description and/or title accordingly. After all, it's rare to see a MS product given high praise around here - and they certainly seem to deserve it here.

Microsoft owns Skype anyway.

How does the webcam driver know how much bandwidth your socket has? The request to reduce quality in some fashion would have to come from Skype.

It doesn't. That's the point.

I'm glad someone posted my immediate thought.

Pretty sure you saw a resolution drop from "HD" which is 16:9 to "Regular" which is 4:3. It didn't zoom, it just went into a 640x480 mode instead of whatever mode it calls HD. It does it all the time to me. Personally, I find it annoying. The video is framed a way for a reason (so long staff on the sides!). Instead I'd rather see pixelization than a random change in aspect ratio.

This is the explanation. I use skype video pretty extensively and what you describe happens all the time.

If I understand you correctly, it does not show the full picture in degraded quality (more compression artifacts, frame drops), but takes the center part of the image (so perhaps the center 70%) and upscales it to full size. With proper upscaling algorithms this would result in a slightly blurry picture, but (as it uses only 70% of the image) with no compression artifacts or frame drops.

Sounds like a smart idea.

However, I just skyped and noticed nothing of the sort. Frame drops and compression artifacts galore. :-/

I've never seen it do it before today. I wish Skype published release notes with their software updates.

I've seen this on my Mac with its inbuilt webcam. It annoys me intensely because when it switches it usually results in unwatchable video for a period of time before it stabilises, then it suddenly realises it has enough bandwidth and switches back, and that is followed by another period of disruption. Good idea in principle but needs refinement.

Since someone reading this might know, does anyone have any recommendations for webcams for OS X? The Logitech C920 seems to be the most modern HD webcam out there, but the Internet is conflicted on whether Mac's in-built UVC drivers will properly take advantage of it in FaceTime or not or if it will just be seen as an SD camera. So if anyone has any personal experiences with good HD webcams on OS X I'd be very interested to hear...

Edit: Actually, a "weekend project" I never got around to was to use OpenCV to detect faces and automatically encode just those regions with a higher bitrate, and sacrifice the rest. So that the areas we actually care about are encoded better and we can get an objectively better image for the same average bitrate. I'm sure someone must have done this already, though I couldn't find anything when I looked.
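For what it's worth, the bit-allocation half of that idea can be sketched without touching a real encoder. This is only a rough illustration: it assumes face rectangles arrive as `(x, y, w, h)` tuples (the shape OpenCV's cascade detector returns) and builds a per-macroblock quantizer map, where a lower QP means finer quantization and more bits. The function name `qp_map` and the QP values 34 and 26 are made up for the example; how such a map gets fed to an actual encoder depends on the encoder.

```python
def qp_map(width, height, faces, base_qp=34, face_qp=26, block=16):
    """Build a per-macroblock QP grid: face_qp inside face boxes, base_qp elsewhere.

    faces is a list of (x, y, w, h) rectangles in pixels, assumed to lie
    inside the frame. All QP values here are illustrative, not tuned.
    """
    cols = (width + block - 1) // block   # round up: edge blocks still count
    rows = (height + block - 1) // block
    grid = [[base_qp] * cols for _ in range(rows)]
    for (x, y, w, h) in faces:
        first_row, last_row = y // block, (y + h - 1) // block
        first_col, last_col = x // block, (x + w - 1) // block
        for r in range(first_row, min(rows, last_row + 1)):
            for c in range(first_col, min(cols, last_col + 1)):
                grid[r][c] = face_qp      # spend more bits on this block
    return grid
```

Calling `qp_map(64, 48, [(16, 16, 20, 10)])` gives a 3x4 grid in which only the two blocks touched by the rectangle get the lower QP.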

Interesting idea with face detection, though I'm not sure if there would be any wins over what the video codec already does.

I don't think this is necessarily a good idea. For one thing, automatically cutting off the edges of a frame is undesirable, exactly for the reason mentioned (in a conference call, it cuts the people on the edges off).

Secondly, zooming is not as effective as real compression. What I mean by this is: assume you have a high resolution image. One way to save bandwidth is to use a well-designed compression algorithm optimized for the human visual system. The second way is to just shrink the image, and then "stretch" it back to the original size, which -- in a sense -- is what Skype is doing here. Which is going to be more effective?

> One way to save bandwidth is to use a well-designed compression algorithm optimized for the human visual system. The second way is to just shrink the image, and then "stretch" it back to the original size, which -- in a sense -- is what Skype is doing here. Which is going to be more effective?

It really depends on how much data you need to shave off. Beyond a certain point, you'll get better visual results by reducing image size rather than increasing compression. As an example, an SD movie encoded with H.264 to a file size of 300MB would look a lot better than an HD movie encoded to the same file size, even when played back at the same on-screen dimensions.

This made me think... Is that a misfeature or bug in the compression algorithm? Why wouldn't the video codec adapt appropriately? If lower res causes a more even and smooth image, why shouldn't it scale down?

I had typed a lengthy introduction to an explanation, but realized I don't know exactly where all the bits are going in a high-res low-bitrate image. All I can say with confidence is that if you squeeze a high-resolution image into the same size as a low resolution image, the low resolution image will look better. The high resolution image will basically be reduced to storing DC coefficients, so you'll see these giant 16x16/8x8/4x4/etc. pixel blocks of solid color, with maybe a bit of pattern on them that barely correlates with what the original looked like.

Maybe it comes down to the per-macroblock overhead; maybe fewer blocks total in the lower resolution image allows more bits to be allocated to frequency coefficients instead of DC offsets.
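Some back-of-the-envelope arithmetic for the 300MB example, as a sketch only. The assumptions are mine, not from the thread: a 90-minute movie at 24 fps, 720x480 for SD, 1920x1080 for HD, and 300MB taken as 300 * 10**6 bytes. It ignores audio and container overhead, so the numbers only show the ratio between the two cases.

```python
def bits_per_pixel(file_bytes, width, height, fps, seconds):
    """Average encoded bits available per pixel per frame."""
    return file_bytes * 8 / (width * height * fps * seconds)


def macroblocks(width, height, size=16):
    """Number of size x size blocks per frame, rounding up at the edges."""
    return ((width + size - 1) // size) * ((height + size - 1) // size)


FILE_BYTES = 300 * 10**6      # assumed: 300 MB, decimal megabytes
SECONDS = 90 * 60             # assumed: 90-minute movie
FPS = 24                      # assumed frame rate

sd_bpp = bits_per_pixel(FILE_BYTES, 720, 480, FPS, SECONDS)
hd_bpp = bits_per_pixel(FILE_BYTES, 1920, 1080, FPS, SECONDS)

print(f"SD: {sd_bpp:.4f} bits/pixel, {macroblocks(720, 480)} macroblocks/frame")
print(f"HD: {hd_bpp:.4f} bits/pixel, {macroblocks(1920, 1080)} macroblocks/frame")
```

Under those assumptions the SD encode gets six times as many bits per pixel as the HD one, and the HD frame has roughly six times as many 16x16 blocks to describe, which is at least consistent with the per-macroblock overhead guess.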

Yeah, I've observed the effect, but it never occurred to me that it doesn't (or shouldn't) have to be that way :) We should be able to demand codecs that are smarter and more adaptive :)

It's hard to imagine why anyone would want a blocky high-res, low-bitrate stream if the low-res variants are always better for a given timespan of video.

Though this isn't implemented by the reference encoder yet, the VP9 bitstream supports this (encoding a downscaled frame and signaling to the decoder to upscale it before displaying it).

I know from personal experience that LYNQ speeds up the audio stream when the connection gets bad for a short time. It is using time stretching (changing the speed without changing the pitch). I can tell the difference, because I did a lot of audio editing in the past and know how time stretching sounds. I would say that the speed factor is up to 2-3x.

This makes people "speak" incredibly fast occasionally.

I've noticed this with Skype. It's pretty funny: you'll not hear much for a few seconds, and then hear the other party speak really quickly, with all the pauses between words dropped out.
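If the speed-up is the client catching up on audio that piled up while the connection stalled (my assumption; nothing in the thread confirms how Skype actually does it), the catch-up time is easy to estimate. While playing at `speed` times real time, one second of new audio still arrives every second, so the backlog shrinks by `speed - 1` seconds per second. The helper name below is hypothetical.

```python
def catch_up_seconds(backlog_s, speed):
    """Wall-clock seconds needed to drain backlog_s seconds of buffered audio.

    Playback consumes `speed` seconds of audio per second while 1 second
    of fresh audio keeps arriving, so the net drain rate is speed - 1.
    """
    if speed <= 1:
        raise ValueError("playback must be faster than real time to catch up")
    return backlog_s / (speed - 1)
```

So a 3-second stall played back at 2x takes about 3 seconds to clear, and at 1.5x about 6 seconds.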

I'm confused about why this would be a good way to save bandwidth. The same number of pixels are still required to give you a smooth picture. So unless Skype adds more pixels to the zoomed-in area (which would presumably negate the bandwidth savings), you're still getting reduced resolution. Why not forget the zooming and just reduce resolution?

I assume the heuristic they're using is that single faces are the primary use case, and will tend to be in the center of the frame. If I'm right, their solution would use more of the available bandwidth resolving the user's face rather than anything which may be in the background.

OTOH, things in the background are unlikely to move much which is ideal for frame-to-frame compression, so getting rid of those areas is unlikely to really buy you much.

That's how the human eye works; there are denser "pixels" in the foveal (central) region of the retina.

The camera delivers a fixed resolution. Zooming by necessity reduces the resolution of the displayed image; you could probably notice if you were looking for it. The net number of bits crossing the wire is reduced, so it's a win.

>The camera delivers a fixed resolution.

I think you misunderstood the question. Cropping and scaling are both software choices for reducing resolution, so the question is why crop specifically?

I am certain the camera on the remote end was incapable of optically zooming. Skype evidently took the HD frame from the camera on that end, cropped it to something like VGA size, and sent only those pixels across the wire. On my end, it was re-sampled and displayed full-size. The picture on my end did not change physical size; it looked as if the camera zoomed in.
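A minimal sketch of that crop-then-resample pipeline, to make the pixel savings concrete. This is my reading of what happened, not Skype's code: frames are plain lists of rows, the function names are made up, and nearest-neighbour resampling stands in for whatever filter a real client would use.

```python
def center_crop(frame, out_w, out_h):
    """Sender side: keep only the centered out_w x out_h window of the frame."""
    in_h, in_w = len(frame), len(frame[0])
    if out_w > in_w or out_h > in_h:
        raise ValueError("crop is larger than the frame")
    x0 = (in_w - out_w) // 2
    y0 = (in_h - out_h) // 2
    return [row[x0:x0 + out_w] for row in frame[y0:y0 + out_h]]


def upscale_nearest(frame, out_w, out_h):
    """Receiver side: stretch the crop back to the display size."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def pixel_fraction(full_size, crop_size):
    """Fraction of the original pixels that still cross the wire."""
    return (crop_size[0] * crop_size[1]) / (full_size[0] * full_size[1])
```

Assuming "HD" means 1280x720, `pixel_fraction((1280, 720), (640, 480))` comes out to one third, so the sender pushes about a third of the pixels while the picture on the receiving end stays the same physical size.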

Mainly, I was impressed by how smoothly and seamlessly it was done.

Adding a Gaussian blur to images can significantly increase their compressibility (for many codecs). Perhaps that is what's happening here?

Really cool idea. The next step is to use basic face detection to focus towards the largest face in the stream.

Are you sure this wasn't someone twisting the zoom ring on a camera the other end?

> Are you sure this wasn't someone twisting the zoom ring on a camera the other end?

I'm sure. I was watching when it happened, twice. Shortly before it happened, I did see some pixelisation, so I know the bandwidth was flaky. It seems to have started doing this after the most recent software update.

I thought about sending a nice note of thanks to Skype's support address, but it was just so neatly done that I wanted to give them more public praise.

I wonder if this effect could be made less noticeable by layering & blending the cropped image on the last full-size frame. The centre of the feed would continue to move, but the edges of the image would remain stationary.
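Roughly what that could look like, as a sketch under my own assumptions: grayscale frames stored as lists of rows of numbers, a crop that is centered and smaller than the full frame, and a hypothetical `feather` parameter that fades the crop into the stale frame over a few pixels so the seam is less visible.

```python
def composite_center(last_full, crop, feather=0):
    """Lay a centered live crop over a copy of the last full-size frame.

    feather is the width in pixels of the blend band just inside the
    crop's border; feather=0 is a hard paste.
    """
    full_h, full_w = len(last_full), len(last_full[0])
    crop_h, crop_w = len(crop), len(crop[0])
    x0 = (full_w - crop_w) // 2
    y0 = (full_h - crop_h) // 2
    out = [row[:] for row in last_full]       # edges keep the stale pixels
    for cy in range(crop_h):
        for cx in range(crop_w):
            # Distance (in pixels) from this crop pixel to the crop's border.
            edge = min(cx, cy, crop_w - 1 - cx, crop_h - 1 - cy)
            alpha = min(1.0, (edge + 1) / (feather + 1))
            old = last_full[y0 + cy][x0 + cx]
            out[y0 + cy][x0 + cx] = alpha * crop[cy][cx] + (1 - alpha) * old
    return out
```

The obvious catch is that the frozen border would look wrong as soon as anyone at the edge of the room moved.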

Anyone having luck talking to people through XMPP Audio/Video with an opensource client?

It's ironic that Twitter, Facebook, Skype, Yahoo, AOL, Google et al. use XMPP in one proprietary form or another.

It's opensource, many awesome clients are available and yet people still choose possibly backdoored and definitely monitored and text-mined closed-source solutions. I don't understand why the usage of the open solutions is so low. Can someone explain?

Since Google dropped Jabber server federation, open XMPP servers are less useful.

And, for voice/video, getting them to work through NAT is a pain, while Skype JustWorks™. Plus, it is a really good solution: I can change my connection from WiFi to cable, and the video recovers within a few seconds.

That said, I'd love it if everyone ditched proprietary solutions. But, unless you really care about privacy and freedom, it's a hard choice to make.

I tried switching to Jit.si last month. After a week or so, I went back to Skype. Very poor audio and dropped calls.

Get someone in there to completely strip the UI, increase the audio quality, and I'm there. Paid user.

Good to know I'm not alone with that experience, then. All of my clients already have XMPP, but there is no good cross-platform client that JustWorks™. Which is sad. But maybe something new will replace this; for example, WebRTC is on the way.

Google Hangout HD mode is a wider view. If you ratchet down the quality, one notch down from HD (the original view) is a narrower view.

I wonder if video input could be reduced to textures and geometry/transforms for low-bandwidth videoconferencing scenarios.

Sounds a lot like how video quality degraded gracefully in Vinge's "A Fire Upon the Deep". Transcendent, beautiful 3D video with all kinds of other extras degrades seamlessly to janky paper-cut-like actors in the book, IIRC.

Ha! That was actually my first thought after submitting my comment.

It gets really really REALLY processor intensive in most cases. Still might be doable, though.

Skype is also among the NSA partner companies.

Before they partnered with the NSA, it used to be P2P to save server bandwidth... But I don't think the problem is with the company being a partner with the NSA; the problem is that the powers of the NSA are too broad (they can force companies to participate in these mass-surveillance programs, or throw them in jail).

> they can force companies to participate in these mass-surveillance programs, or throw them in jail

Or those companies could band together and give the NSA the collective finger.

Zooming will not reduce the bandwidth requirement, unless the zoomed image is also pixelated.

OP probably meant the image was cropped.

You must have the paid version. When bandwidth is scarce for me, it just drops the call.

This is definitely not Skype.

Doesn't let NSA spy on it anymore? That should save some bandwidth.

Let's make "Langford's parrot" basilisk for NSA that everyone can transmit around when circuits are idle. Make sure it's not compressible; fill up their disk space right quick.

Good one! Might be a good suggestion for WebRTC spec as well!

Wouldn't this get implemented in the client code rather than the API itself?

To achieve the goal of saving bandwidth, it must have been done coöperatively between the remote machine and mine. Their end had to understand that my end was short of bandwidth (remember, this is cheap ADSL, and my broadband speeds are r-e-a-l-l-y asymmetric) and send fewer pixels, followed by which my end had to upsample those pixels to keep the picture the same size.

I don't really understand why at all. You send fewer pixels and the destination machine just displays those as-is. What would giving it more information about /why/ really accomplish? What can it do? Wouldn't it make re-sampling decisions based on the content with or without low bandwidth?

Only when the quality degrades: the remote bitrate estimator tells the other side to reduce the bitrate if the decoder isn't keeping up. That's the spec. Otherwise it stays at the default resolution.

Oh, you're right of course. But my machine still has to complain to the other end that it's not getting frames rapidly enough, so there is still some coöperation going on.
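That feedback loop can be sketched in a few lines. To be clear, this is not the WebRTC algorithm or anything taken from a spec: the class name, the loss thresholds, and the step sizes are all invented for illustration. The receiver counts how many frames it got versus how many it expected, and reports a target bitrate back to the sender, cutting it sharply when frames go missing and raising it slowly when things look healthy.

```python
class ReceiverRateEstimator:
    """Toy receiver-side estimator: multiplicative decrease, additive increase."""

    def __init__(self, start_kbps=1000, floor_kbps=100, ceiling_kbps=2000):
        self.target_kbps = start_kbps
        self.floor_kbps = floor_kbps
        self.ceiling_kbps = ceiling_kbps

    def on_report(self, expected_frames, received_frames):
        """Update the target from one reporting interval and return it.

        The returned value is what the receiver would send back to the
        sender as its requested bitrate.
        """
        loss = 1 - received_frames / expected_frames
        if loss > 0.10:                       # assumed threshold: back off hard
            self.target_kbps = self.target_kbps * 85 // 100
        elif loss < 0.02:                     # assumed threshold: probe upward
            self.target_kbps += 50
        # Between the two thresholds, hold the current target.
        self.target_kbps = max(self.floor_kbps,
                               min(self.ceiling_kbps, self.target_kbps))
        return self.target_kbps
```

Starting from 1000 kbps, a report of 20 frames received out of 30 drops the target to 850, and a clean interval afterwards nudges it back up to 900. What the sender does with the lower target (more compression, lower resolution, or a crop) is a separate decision.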

This is a behavior of the webcam. My Dell laptop does all this.

It's the Web cam, not Skype.

i don't want to say anything

WOW! Skype is sooooo cooool. Now, if they could stop spying on me that'd be great.


and secure the communications and not be centralized.
