Cisco Leap Frogs H.264 Video Collaboration with Real-Time AV1 Codec (cisco.com)
139 points by clouddrover on June 27, 2019 | 55 comments



In my experience, the weakness of current conferencing systems isn't in the software - it's a hardware problem.

Specifically, their ability to capture and transmit usable speech is really poor. Between echo and poor noise gating, I'm not exaggerating when I say that I can make out fewer than 1/3 of the words in my team's conference calls.

I don't care how efficient the video codec is, if I can't clearly understand what the other people are saying, it's all useless.


Are you sure the problem isn’t the phone system? We use Polycoms with Skype and everything is crystal clear. Our AV guy says Polycom is the best you can buy at that price point, and he’s trying to sell me $15,000 systems.

We only ever get reports about problems when people are actually dialing into something over POTS.


It's likely a network problem causing large jitter buffers (see my other comment). Buying Polycom won't solve this problem, unless the jitter buffer is in adaptive mode and it's wildly overestimating the jitter (a problem I've seen on Polycoms and Grandstreams before).


I think the latency is the killer for our remote conference calls. So much extra delay and effort goes into not talking over each other that isn't an issue in a live meeting.


Sounds like your conferencing system has large jitter buffers. If your internet is stable, you should be able to reduce the buffer.

To test your network, go to https://www.dslreports.com/speedtest and see if you have significant jitter or bufferbloat. If your rating is an A, then you're good to go, otherwise it's time to clean up your local network :P
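
For anyone wondering what "jitter" means concretely here, a minimal Python sketch of the running interarrival jitter estimate defined in RFC 3550 (the RTP spec). The buffer-sizing rule at the end (`suggested_buffer_ms`, its 3x multiplier and 20 ms floor) is a hypothetical illustration only, not what Polycom, Grandstream or any other real system does.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate in the style of RFC 3550, section 6.4.1.

    Both lists hold per-packet timestamps in milliseconds.
    """
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        # Difference in transit time between consecutive packets.
        d = (recv_times_ms[i] - recv_times_ms[i - 1]) - (
            send_times_ms[i] - send_times_ms[i - 1]
        )
        # Exponential smoothing with gain 1/16, as in the RFC.
        jitter += (abs(d) - jitter) / 16.0
    return jitter


def suggested_buffer_ms(jitter_ms, multiplier=3.0, floor_ms=20.0):
    """Hypothetical sizing rule: a few multiples of the jitter, with a floor."""
    return max(floor_ms, multiplier * jitter_ms)


# Packets sent every 20 ms; the first network delays them uniformly,
# the second delivers them unevenly.
steady = interarrival_jitter([0, 20, 40, 60], [50, 70, 90, 110])
bursty = interarrival_jitter([0, 20, 40, 60], [50, 80, 85, 130])
print(steady, bursty, suggested_buffer_ms(bursty))
```

A constant delay gives zero jitter no matter how large the delay is, which is why a stable connection can get away with a small buffer.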


Echo cancellation, voice activity detection (VAD, which is what they use for gating), AGC, etc. are all done in software in modern VoIP (or firmware? It's on a dedicated processor). And it's quite good if you spend the money on it.

Those cheap Cisco conference phones do sound terrible, and I don't think people replace them very often.


Money isn't a fix-all for AEC. Frankly, Google's WebRTC AEC is the best commercially available AEC. You can see it in action on a call over Signal Private Messenger: it knocks out background noise (e.g. loud car noises if you're walking down the street) like a champ.


Like as a standalone product or integrated into conferencing systems? Because there are plenty of solutions made by audio companies with more institutional knowledge and time invested than Google.


WebRTC's AEC is integrated into most browsers, PulseAudio (https://www.freedesktop.org/software/pulseaudio/webrtc-audio...) and a variety of other systems.

Google, Mozilla and all the organizations that have collaborated on WebRTC have spent 7+ years tuning this component. Much like Opus versus AMR-WB, I doubt there is anything notably better in the proprietary software/hardware world.

The orgs that develop the WebRTC standard have spent years implementing and experimenting with nearly every algorithm that is available for each of the various components (resulting in Opus, AV1, WebRTC AEC, etc), combing research papers daily and often working with researchers directly.


Personally, I've heard (and been used for testing) proprietary AEC and it is way better than anything using WebRTC. The artifacting can be really bad, enough so that it's worth disabling (although noise reduction is quite good, it's the echo cancellation that kinda sucks and harms voice quality, giving you the classic garbling sound).

Stuff like WebRTC/Opus is limited by the fact it runs in the browser or on consumer hardware. It's not bad, it's just a different problem domain than commercial audio, and has different constraints. High quality conferencing equipment runs on a dedicated unit that your A/V integrator installs either in the room, or in a server closet and connects over Dante to the analog front end in the room, and usually is a package deal with good microphones (which could be wireless, or the hot thing now is to use beamforming).


Great opportunity for someone to combine a HomePod, Apple TV and iPad solution.


That sounds expensive for no reason.


Pretty close to how our Zoom conference rooms are set up... iPad controller + Mac Mini + Revolabs UC


Coincidentally I'm trying out rav1e today (the open-source command line AV1 encoder). At 1080p for a 2m40s movie trailer on a current-model Mac mini 6-core, it's running at 0.153 FPS. That means it's running about 157x slower than real-time. Kudos to these Cisco chaps for being roughly 157x faster than rav1e!
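
The arithmetic behind that figure, as a quick Python sketch. It assumes the trailer is 24 fps: the comment doesn't state the frame rate, but 0.153 x 157 is almost exactly 24. The function names are just for illustration.

```python
def slowdown_factor(encode_fps, source_fps):
    """How many times slower than real time the encoder runs."""
    return source_fps / encode_fps


def encode_time_seconds(duration_s, source_fps, encode_fps):
    """Wall-clock time to encode a clip of the given duration."""
    total_frames = duration_s * source_fps
    return total_frames / encode_fps


SOURCE_FPS = 24.0        # assumption, not stated in the comment
ENCODE_FPS = 0.153       # from the comment
TRAILER_SECONDS = 160    # 2m40s

factor = slowdown_factor(ENCODE_FPS, SOURCE_FPS)
hours = encode_time_seconds(TRAILER_SECONDS, SOURCE_FPS, ENCODE_FPS) / 3600
print(f"{factor:.0f}x slower than real time, about {hours:.1f} hours for the trailer")
```

Under that assumption the 2m40s trailer takes roughly seven hours to encode.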


You simply turn off all the tools that are slow, and you end up with an AV1 encode that is very fast, but quality won't be anywhere close to your normal AV1 encode.

I don't doubt it will be of higher quality than AVC. After all, work on AVC started before 2000 and it was published in 2003.


It is easy to dismiss what Cisco demonstrated like this, but if you listen to the talk, they argue that having more tools available gives you more room to optimize for quality per bit per cycle, in the same way that having more choices allows you to optimize better for just quality per bit (or any other objective, really). So it's not just a matter of turning things off, but of making good choices.

It is also not just beating AVC. They claim that it is higher quality than their own HEVC work, which they invested in heavily before it became clear that the licensing situation was not going to just "work itself out". They once told us that even they were surprised how many of the HEVC tools you could wind up using for real time, if you used them judiciously (basically: all of them).

I have worked with some of the team responsible for this project before, and we collaborated with them on some AV1 tools (most notably the CDEF loop filter, but also entropy coding and a few other things). They know their stuff, and really do seem to be leaps and bounds ahead of everyone else on the RTC side of things.


Two important things it will be:

  1) Royalty (and annoying patent) free.
  2) A reason for hardware decoders to exist.


....Or having a PC roughly 157x faster.


I encoded about 100 1080p-sized images to AVIF yesterday. I was a little disappointed that each of them was taking about 30 seconds, fully loading an i7-2600 CPU. Realtime AV1 encoding is good news, maybe those optimizations can reach the still image format?


Your CPU is missing AVX2 and AVX-512 (and numerous other CPU extensions), so the amount of optimization that can happen for your CPU is limited.

https://en.m.wikipedia.org/wiki/Advanced_Vector_Extensions
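
If you want to check which of these extensions a machine reports, here is a small Python sketch. It assumes Linux on x86, where `/proc/cpuinfo` has a `flags` line; `simd_flags` is a made-up helper name and the list of flags checked is just an example.

```python
def simd_flags(cpuinfo_text, wanted=("avx", "avx2", "avx512f")):
    """Report which of the wanted CPU flags appear in /proc/cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # x86 Linux lists features on lines like "flags : fpu vme ... avx".
        if line.startswith("flags") and ":" in line:
            flags.update(line.split(":", 1)[1].split())
            break  # every core repeats the same line, one is enough
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(simd_flags(f.read()))
    except OSError:
        print("could not read /proc/cpuinfo (not Linux?)")
```

A Sandy Bridge part like the i7-2600 should show `avx` but neither `avx2` nor `avx512f`.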


You should open an issue, the fix might be as simple as changing your build flags.



Real-time is such an unfortunate term when talking about video collaboration. A codec that can achieve real-time encoding speed is still useless if it has a 10 frame latency to output the first data, and arguably not real-time.


The proper term should be low-latency. Real-time implies deterministic timing of events which must meet a deadline.

Real-time can give you low latency but so can a system with a high enough throughput.


~1/3 second is perfectly acceptable to me to be within the range of "real time", and is quick enough that I'd be willing to bet most people won't notice even in a two way conversation.
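
For a sense of scale, a rough Python sketch of how much delay a fixed number of buffered frames adds at a few common frame rates. Pairing the 10-frame example from upthread with 30 fps is an assumption (nobody in the thread states the frame rate), and this counts only the encoder's buffering, not capture, network or decode time.

```python
def buffering_latency_ms(frames_buffered, fps):
    """Delay added by holding back a number of frames before output."""
    return frames_buffered * 1000.0 / fps


for fps in (24, 30, 60):
    print(f"{fps} fps: {buffering_latency_ms(10, fps):.0f} ms for 10 frames")
```

At 30 fps, ten frames come to about 333 ms, which lines up with the ~1/3 second figure.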


For live streaming and chat that's fairly acceptable. However, don't forget there are other potential use cases, particularly since cloud gaming is supposedly becoming a thing again.


... and on what kind of hardware, because it's not much use if you need a supercomputer for it.


Great to see improvements on the AV1 encoding side. I remembered reading something similar coming from Europe -> Allegro DVT AL-E210.

http://www.allegrodvt.com/products/silicon-ips/al-e210/

How does your solution compare to Allegro's?


Cisco are not using dedicated hardware logic like Allegro DVT I think. For phones etc, dedicated hardware will be very useful, but it takes time to get into products, which is why a real time CPU encoder is very interesting.

Looking at the video (https://vimeo.com/344366650) of the Cisco talk, at 20m18s they say a "real time low latency HD software encoder" [emphasis added], which confirms it.


When can we expect to see hardware AV1 encoding/decoding in mobile SoCs?


It's starting to happen. Here's a SoC announcement from Realtek for AV1 decoding:

https://www.realtek.com/en/press-room/news-releases/item/rea...


> hardware AV1 encoding/decoding in mobile SoCs?


"Mobile" is subjective. There are laptops with desktop chipsets after all... Who stops a phone from having a set-top box chip? :)


Fair enough, but those laptops are meant to be used tethered to a power source. Set top box chips are generally far more power hungry than their mobile counterparts, just like desktop chips vs laptop chips. The issue here is that I'm not sure there's much of a market for phones that don't work on battery for more than 15 minutes.

Then again, there's a market for water-cooling blocks for phones so I guess anything is possible.


Those laptops do come with a battery, though, so someone out there is enjoying a whole 22 minutes of battery capacity.

In all seriousness, though, integration into a set top box chip is a sign of commodity: A low power chip with limited application, where it is implemented not to power a large marketing department, but simply out of potential necessity.

This is to me a far bigger indicator of "things coming soon" than, say, integration into a high-end GPU.


So how far away is it until my phone can record 4K @ 60fps like it can with HEVC? I'm guessing we'll need hardware support built on these optimizations?


My guess: 5 years. And the Cisco “optimizations” may work well for a conference call with a perfectly still camera and low spatial and temporal information density. Applying the same techniques to handheld action video would produce very poor results.


I haven't used Webex or any Cisco conferencing offerings recently but I have been very impressed by Zoom. It works really well when using computer audio with AirPods. I've used this setup for meetings across continents without thought to latency. This may be a result of Internet connections improving in general. I found video conferencing to be distracting in the past but surprisingly good now.


Well, real-time 4kp60 HEVC encoding has been a reality since 2014: https://www.businesswire.com/news/home/20140904005279/en

Since that time HEVC encoding has only become faster and less expensive.

This real-time AV1 encoder achieves 1080p30, which is, simply speaking, 8 times fewer pixels per second than that 2014 demo. And compares against H.264 which says it all.
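
The factor of 8 can be checked with a few lines of Python. This assumes "4k" here means UHD at 3840x2160 (not DCI 4096x2160) and "1080p" means 1920x1080.

```python
def pixel_rate(width, height, fps):
    """Pixels per second the encoder has to process."""
    return width * height * fps


uhd_60 = pixel_rate(3840, 2160, 60)   # 4kp60, assuming UHD resolution
hd_30 = pixel_rate(1920, 1080, 30)    # 1080p30

# 4x the pixels per frame times 2x the frame rate.
print(uhd_60 / hd_30)
```

The ratio is the product of 4x the pixels per frame and 2x the frame rate.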


> And compares against H.264 which says it all

They compared against HEVC as well. To quote the article: "This means that we can substantially raise quality, while saving bits, all with a very usable CPU footprint. We have found that the real-world compression/speed trade-offs for AV1 are in fact excellent, and better than HEVC."

And the business problem with HEVC is: "HEVC (aka H.265) comes with unacceptable patent cost, risk and uncertainty."


It hasn't stopped HEVC from winning https://www.youtube.com/watch?v=vgE8-4rcXl0


What do you believe HEVC is winning? The major online video platforms (YouTube, Vimeo, Netflix, Twitch, etc.) are all going to AV1.

HEVC had its shot but the terrible licensing stunted its growth and AV1 will replace it. Leonardo Chiariglione says the MPEG business model is broken and I think he's right:

http://blog.chiariglione.org/a-crisis-the-causes-and-a-solut...


Leonardo Chiariglione also said "AOM will certainly give much needed stability to the video codec market but this will come at the cost of reduced if not entirely halted technical progress. "

which is a very real concern. (a bit tired arguing with people defending Google's contraption vs a codec with innovation from dozens of companies)


AV1 is not just Google's child, but rather Amazon, Cisco, Google, Intel, Microsoft, Mozilla and Netflix's collaboration to create a standard that isn't beholden to MPEG-LA: https://en.m.wikipedia.org/wiki/AV1


And who has paid the programmers on payroll on aomenc?


> which is a very real concern.

It isn't. Innovation in video coding is driven by the need to achieve the same image quality in fewer bits for easier and cheaper distribution. All video platforms, providers, and users have a stake in achieving that improvement.

Video coding is not finished yet. There's still more to do and there will be an AV2 eventually.


I see. You know better than the MPEG chairman. Good to know, I can retire from that debate.


That's an interesting video, which part did you intend to support your HEVC winning claim?

Not the bit when he says AV1 looks much better than x265 (which he started and led development on) I guess.


> which part did you intend to support your HEVC winning claim?

I suggest you see the whole video.


I did, that's why I was a little confused by your summary. It was a lot more nuanced than I was expecting.

I enjoyed the bit when he flashed up the logos of Netflix, Amazon, Vimeo, Microsoft and other AOM members as showing strong support for HEVC from streaming platforms. And the bit when he said that support by AOM member Apple was really key to HEVC success in America. Or that AOM member Samsung was putting HEVC in TVs.

The guy seems smart and informed, but the video had a lot of dramatic irony since many of the corporate big guns he uses to quantify HEVC success seem to be more and more vocal about their dissatisfaction with HEVC and plans to ditch it for AV1 as soon as possible, which is a Pyrrhic kind of victory.


> many of the corporate big guns he uses to quantify HEVC success seem to be more and more vocal about their dissatisfaction with HEVC and plans to ditch it for AV1 as soon as possible

Do you have any source on this?


Version 1 of HEVC was approved in 2013. Version 1 of AV1 was approved in 2018. So if realtime encoding for HEVC appeared in 2014 and we are in 2019, then it seems like the technical progress is on par.


That's for 4kp60; 1080p30 HEVC encoding was at IBC 2013. And there were more encoder projects. MPEG codecs create a lot of interest.


That's a Xilinx based FPGA encoder, which is a different thing from a software encoder using 1 core.

One of the other talks at the event where this was announced was by NGCodec, who were showing off their Xilinx based AV1 encoder, though I believe it's intra only at the moment.


“Less expensive” only if you don’t count the license cost, which is still unknown in HEVC.



