Why don’t podcasts use VBR MP3s? Because iOS and macOS don’t accurately seek them (marco.org)
202 points by okket on Aug 15, 2016 | 85 comments



That's not the only reason. We have been podcasting for over 10 years, and early on we tried to use VBR MP3s because we are tech people who know about that sort of thing.

As it turns out, not everyone is listening using a modern device. When we tried VBR a significant number of people could not listen because their MP3 playing hardware/software of choice did not support VBR files properly. They didn't realize this was the problem. They just complained that the file was corrupted while it was working fine for everyone else.

Also, lots of encoders, including Adobe Audition/Adobe Media Encoder, don't write the headers on VBR files properly. This also causes a lot of players to fail at playing them properly.

https://forums.adobe.com/thread/1072786

When you make a podcast, maximum compatibility is the priority over audio quality and file size. So standard old MP3 it is.


My favorite bug about this was on an _ancient_ MP3 player I had (an EigerMan F20), which supported VBR MP3s...incompletely. It didn't support decoding regions with certain bitrates, so it would just silently skip them, leading to extreme confusion on my part.


I had a similar problem and never considered that...


> When you make a podcast, maximum compatibility is the priority over audio quality and file size. So standard old MP3 it is.

Interesting how the incentives can be different. Stuck without proper broadband and paying for every precious MB for a few weeks, I made a proxy on an external server that downloaded podcasts and re-encoded them as VBR to save my data...


> using a modern device

How are we defining modern here?

I'm pretty certain my Nomad Jukebox 3 supported VBRs fine, and that's coming up on 14 years old now.


Marco and the others mentioned this on the example podcast episode (linked from the article).

Hardware support for VBR is pretty decent now.


I had an old MP3 player that would play VBR files at variable speed. I suppose the speed changed depending on the bitrate of that part of the file, but I wouldn't know how to check it in such detail.


VBR has been the red-headed stepchild of MP3 since the beginning. I have no idea why; maybe it was difficult to write good encoders and decoders for it back in the day. The few times I used them, I found trivial space savings compared to CBR, and to be honest I thought I could hear a difference (maybe it's not humanly possible if the encoder/decoder implementation is good, but what if the implementation is sub-par?). Storage and bandwidth are so cheap, why bother today?


iPhone storage and cellular bandwidth are still extremely expensive. Over $100 for a measly 16GB of storage or 4GB of LTE.


$100 gets you 48GB of storage these days, and at my overage rates it would buy about 7GB of transmission. Not that this really changes your point.


Is 16GB really that measly when it comes to podcasts? Even at, say, 128MB per hour (!) you've got over 100 hours of capacity there. Enough for a full waking week, nonstop.


Android takes up 6.5GB, the dozen-ish apps I use seem to take 3.5GB. So, that leaves 6GB of general storage. The browser cache and all music, photos and podcasts need to fit into those 6GB.

A single episode of a podcast may be an hour and a half. If you are subscribed to 5 weekly podcasts, then that's about 1GB of data you'll need. If you're going on a long trip, maybe you stock up. That might take 3GB or 4GB... But that probably won't fit on your phone. It certainly doesn't fit on mine.


People with 16 GB iPhones really struggle with running out of storage. Some may be lucky to fit one 128 MB podcast on their phone.


>iPhone storage and cellular bandwidth are still extremely expensive. Over $100 for a measly 16gb of storage or 4gb of LTE.

...is "don't use an iPhone until they start adding microSDs to their phones and don't stream" too obvious or am I missing something here?


It's completely absurd when a 64GB microSD card is less than $20.

https://www.amazon.com/SanDisk-microSDXC-Standard-Packaging-...


I've had problems with hardware players that couldn't handle 320kbps blocks of VBR. Restricting the max bitrate fixed the issue but most encoders don't expose this as an option (lame does).
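For reference, capping the maximum VBR bitrate with LAME might look like the sketch below. It assumes the `lame` CLI's `-V` (VBR quality, 0 best to 9 smallest) and `-B` (maximum allowed bitrate in kbps) options; the 256 kbps cap, file names, and the `lame_vbr_command` helper are illustrative, and the helper only builds the argument list rather than running the encoder.

```python
import shlex


def lame_vbr_command(src: str, dst: str, quality: int = 4, max_kbps: int = 256) -> list[str]:
    """Build (but do not run) a LAME command line for VBR with a capped peak bitrate.

    Hypothetical helper: -V picks the VBR quality level, -B caps per-frame bitrate
    so players that choke on 320 kbps frames never see them.
    """
    if not 0 <= quality <= 9:
        raise ValueError("LAME VBR quality must be between 0 and 9")
    return ["lame", "-V", str(quality), "-B", str(max_kbps), src, dst]


cmd = lame_vbr_command("episode.wav", "episode.mp3")
print(shlex.join(cmd))  # lame -V 4 -B 256 episode.wav episode.mp3
```

Passing the list to `subprocess.run` would perform the actual encode if LAME is installed.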


And audio data is small enough that the savings aren't that valuable.


That's not true. In the ATP episode Marco mentioned he wanted the theme song or clips to be able to sound really good (192 or 256kbps stereo), but their normal talking was nearly perfect at 96kbps mono.

Then add in all the gaps in people's speech where you need almost no data and he said he got pretty good savings.


If you want more context to this, there's a lot more in the most recent ATP podcast (with timestamp link included in the article).

Which kind of highlights a problem that I think is only going to grow over time - the general greatness of podcasts has one big black spot in that they're tough to share. I've seen so many discussions recently where there was awesome additional context or info in a recent podcast, but it was tough to share it with people in a format where they'd actually consume it. Whereas, if it had been in a blog, sharing would've been easy. Marco has the right idea with these timestamp links, but even that isn't perfect. A lot of people, especially technical people, won't even click on a non-text source, for better or worse.

It's a shame that a lot of good information is becoming harder to share due to this. It's not even that podcasts are "locked down" or behind a paywall or anything, it's just that people often don't have the time or motivation to listen to them. I don't have a great solution to this; some kind of automatic transcription maybe?


I like podcasts, but they're unnerving to me.

It's too much time wasted getting to the point. And unlike music, I can't program properly while listening to them.

Not to mention a lot of podcasters act in a radio way (which makes sense for radio), with repetitions, talking slowly, endless introductions, chit-chat, etc.


The "radio stuff" isn't nearly as much of an irritant if you're listening to a podcast while commuting.

It's still kind of grating when you listen to a whole bunch of episodes of something in a row, though. There need to be markers in the files, and software that can cut off the intro/ending of the "internal" episodes in a playlist. Like a much more drastic version of gapless playback.


Chapter markers are a potential solution IF podcast producers use them and mark them accurately and IF your podcast player of choice supports Chapter markers.


Overcast[0], the iOS app that Marco makes, supports chapters, and he usually adds them when he encodes his podcast; it's pretty nice. Hopefully he releases his personal podcast production tool soon so that it can be more widely used.

[0] https://appsto.re/us/jhe90.i


2x playback speed helps a lot I find.


Some are better than others in those regards. Most of the ones I listen to, I listen to for entertainment - so I don't mind some meandering. And decent podcast apps will give you reasonable skip options - I will frequently skip intros and outros (I'm looking at you, 99% invisible; I have never cared what your boy has to say.)


Yeah, but it's the intro and outro that makes the money.


Not always. I've stopped listening to Marc Maron's podcast because the last one I listened to started off with a 20 minute monologue about his cat. 99% Invisible used to always end the podcast with a quote from his kid about something; maybe it still does, but once the "main show" is over, I just skip to the end and mark it as read.


I do agree there is too much noise, but at the end of the day they are not a pure way to learn, though they sit on the higher end of https://en.wikipedia.org/wiki/Educational_entertainment


> I don't have a great solution to this; some kind of automatic transcription maybe?

This was the original goal of YouTube, as a project: automatic video transcription to enable better accessibility and indexability for audio/video. It's theoretically in there, but it doesn't do well at all. I'm honestly surprised it doesn't do better, given the Google Voice dataset that's likely backing it.


Actually that was the original goal of Google Video, which never got much traction and was discontinued eventually.

https://en.wikipedia.org/wiki/Google_Videos


The solution is AAC, regarding which a number of prominent podcasters introduced FUD several years ago, and now the podcasting industry is reluctant to give up MP3, unlike pretty much every other audio-related industry in the world. Seriously, MP3 was on its way out the year the Backstreet Boys released that song, and it surprises me that people still stick to it despite the VBR thing being a shitty problem in software for that long. None of the solutions that Marco found are great, either. The solutions were dismissed decades ago in ways like this:

http://www.foobar2000.org/FAQ#why_is_seeking_so_slow_while_p...

Here's an example of podcasters talking each other out of AAC:

http://podcastingguru.com/aac/

His is not the first article I've read conflating "AAC" with some kind of "Enhanced Podcast" buzzword and then dismissing it based on the flaws of "Enhanced Podcasting." I left the radio industry in 2006 and we'd already transitioned all in-house production and delivery away from MP3 then. Probably 50% of the spots we received from external shops in 2006 were MP3.

At some point you have to sit down with your listener and say that MP3 player you bought at Target in 2001 is holding us back, man. Marco is wrong on this one: it's AAC that would deliver all the good things that he wants, not VBR MP3. Fun fact: all AAC is VBR, in a manner of speaking, and it's way better than the hack that is VBR MP3s.


The problem with AAC is in the container format, as Marco explains in his podcast: While some important metadata like chapter marks which a podcast player might want to display are within the first KB in an MP3 file, this same information is interleaved somewhere in the stream in AAC.

Which means that you can't have all features when streaming an AAC file, in a time when streaming podcasts is getting more popular every day. You first have to download the whole file before you can display e.g. chapter information.

Of course you could just embed that information in the RSS feed, but you could do the same with jump tables for VBR MP3s. So AAC does not solve all your problems, you just get a different set of problems.


No, this is incorrect. It's up to the muxer implementation to decide where the index information goes, which is typically at the end of the file. This is simply because writing a file is a single-pass operation, and you don't know the contents of the atom until you've gone through all the data for each stream. Many muxers provide an option to place it at the beginning of the file by adding padding and overwriting it afterward, which solves the progressive streaming problem.
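Whether a given file is laid out for progressive streaming can be checked with a small sketch that walks the top-level boxes (atoms) of an MP4/M4A file and reports whether the `moov` index comes before the `mdat` media data. It assumes the standard ISO base media box header (32-bit big-endian size plus a four-character type, where size 1 means a 64-bit size follows and size 0 means "to end of file"); the helper names and the synthetic test boxes are hypothetical.

```python
import io
import struct
from typing import BinaryIO


def top_level_boxes(f: BinaryIO) -> list[str]:
    """Return the four-character types of the top-level boxes, in file order."""
    types = []
    while True:
        header = f.read(8)
        if len(header) < 8:
            break  # end of file
        size, box_type = struct.unpack(">I4s", header)
        header_len = 8
        if size == 1:  # a 64-bit "largesize" follows the type field
            ext = f.read(8)
            if len(ext) < 8:
                break
            size = struct.unpack(">Q", ext)[0]
            header_len = 16
        types.append(box_type.decode("latin-1"))
        if size == 0 or size < header_len:
            break  # box runs to end of file (0) or the header is malformed
        f.seek(size - header_len, io.SEEK_CUR)  # skip over the box payload
    return types


def is_fast_start(f: BinaryIO) -> bool:
    """True if the 'moov' index box appears before the 'mdat' media data box."""
    types = top_level_boxes(f)
    return "moov" in types and "mdat" in types and types.index("moov") < types.index("mdat")


def box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a minimal box for testing: 32-bit size, 4-char type, payload."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


fast = box(b"ftyp", b"M4A \x00\x00\x00\x00") + box(b"moov", bytes(16)) + box(b"mdat", bytes(64))
slow = box(b"ftyp", b"M4A \x00\x00\x00\x00") + box(b"mdat", bytes(64)) + box(b"moov", bytes(16))
print(is_fast_start(io.BytesIO(fast)), is_fast_start(io.BytesIO(slow)))  # True False
```

ffmpeg's `-movflags +faststart` is one example of the muxer option described above: it rewrites the file so `moov` precedes `mdat`.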


Weren't they talking about .mov files at the time?


The MP4 container was based on the Quicktime file format (.mov).


Thanks! Makes sense.


Those are all problems with a specific, objectively crappy container, as you've noted. There are many alternative containers and no reason the podcasting community couldn't come up with its own, such as by extending Matroska. This obviously creates separate problems, but I'd like to see the podcasting community start to shape its own destiny a bit rather than expect others to innovate for them or kludge older formats to do things they weren't supposed to do.

Why not a podcasting specific container that handles streaming and fixed cases, includes support for all the novel stuff podcasters want to do, and so on? Container development is one of the easier parts of multimedia, and there's too much thinking around a specific codec being tied to a specific container (just like your comment). Legacy support is of course the big elephant in the room, but I'd venture a well-founded guess that the average podcast listener is listening on a device that supports apps. (Maybe I'm wrong.)

I can't think of a single good container format. They're all rubbish. Even Matroska is arcane and very poorly documented, and suffers from the end goals and types of media it is used for being encoded in the design. Containers are a giant space in which to innovate and have been that way for years. Unfortunately, open source multimedia development is largely tied up by a certain group of people with a certain background and very specific use cases in mind.

That context from the podcast would have been awesome in this piece. Why wasn't it there?


One of the OP's requirements is that this work in iTunes, which precludes making a new container format.

I also think existing containers would meet the requirements - for example, Ogg meets the requirement of both being storable and easily streamable. Also, the Matroska spec is being updated to a new version, and being standardized in the IETF CELLAR working group. I'd highly suggest participating if you'd like to help improve both the format and the specification.


Codec Encoding for LossLess Archiving and Realtime transmission (cellar) documents: https://datatracker.ietf.org/wg/cellar/documents/

Matroska draft: https://datatracker.ietf.org/doc/draft-lhomme-cellar-matrosk...


I just don't understand why Marco has bitten so hard into this problem. It doesn't matter at all. Everybody skips the ATP theme song anyway, and VBR will only negligibly improve quality of the rest of the show (as Marco himself has pointed out). And it's not like the theme song sounds bad in its current state! I have never noticed any artifacts or flaws in the recording at all. This is almost literally only something the ATP crew would notice.

Assuming Apple fixes this VBR problem — unlikely — it would still be a far better idea to switch to something like AAC instead of using VBR and massively inconveniencing a) your entire audience on older iOS versions, and b) a large swath of your audience that uses players with equally poor VBR support. This whole brouhaha makes absolutely no sense.


The ATP theme song by Jonathan Mann is unexpectedly awesome, and I never ever skip it. And as Marco pointed out in the ATP episode, this also affects any musical interludes or short musical samples within any podcast.


Yeah, I'm sure lots of people listen to it all the way through. I did for a while myself. But it's not worth inconveniencing so many of your listeners just to make your theme song (or very occasional clips) sound a tiny bit better — especially considering the fact that most people listen to podcasts in their car or even through the iPhone speaker, as Marco himself discovered from his analytics. Others have already noted that the easy solution to this problem exists in the form of AAC, which is supported ubiquitously at this point. Another solution is to just use 128kbps CBR; a 1.3x size increase is really not that big of a big deal, and 128 is transparent for many use cases.

Marco's solution is way, way out of proportion to the actual "problem". You'd think the guy who so often kvetches about Apple prioritizing design over function would know better than to make the same mistake!

Also: I know you don't air your dirty laundry in front of your audience, but I'm a bit disappointed that a careful and analytical engineer like Siracusa didn't call Marco out on this nonsense. (Especially after the .marcosweirdformat discussion, which was downright comical given the inconsequential nature of the problem in the first place.)


What nonsense? He explicitly said in the episode that he studied some options and they were all bad ideas. He made a few million from the sale of Tumblr to Yahoo, has FU money in the bank and he is just scratching an itch and exploring his hobby. I don't see anything wrong with that.


The nonsense is that this is a solution looking for a problem. It's scratching an itch by trying to move a mountain. I am certain that nobody in the history of the podcast has ever complained about the sound quality of the theme song or audio clips, especially after they switched over to 96kbps. Whereas there is no doubt that any technical solution — especially any of the ones proposed, including switching to VBR — would inconvenience a whole lot of people.

(By all means, Marco should continue working in this direction if it interests him. If Apple ends up adding those ID3 tags to their MP3 decoder, it will have a net positive impact on the world. I just think it's a rather foolish endeavor.)


>I am certain that nobody in the history of the podcast has ever complained about the sound quality of the theme song or audio clips, especially after they switched over to 96kbps.

Source?

>Whereas there is no doubt that any technical solution — especially any of the ones proposed, including switching to VBR — would inconvenience a whole lot of people.

It sounded to me like the entire point of his approach is to find a technical solution without inconveniencing people.


"a 1.3x size increase is really not that big of a big deal" - you say this about Marco "All my image resources are vectors so the OverCast App is 5 Megabytes large" Arment? For the same reason that UTF-8 is awesome, VBR podcasts are likewise a perfect system, presuming we can get over the chapter marking hurdles.


Not vector, but drawn by code.


I've always divided 2D graphics of the kind you see in apps (let's ignore 3D graphics cards with shaders/textures/etc. for a bit) into two categories: bitmap and vector graphics. Bitmap is pretty straightforward, and vector graphics (googling for a second) is "...the use of geometrical primitives such as points, lines, curves, and shapes or polygons", which presumably is what Marco is doing to keep his app size so small?


It sounds like he got into it because it piqued his interest so he did a deep dive on the subject. Nothing wrong with that. I will get frustrated with people sometimes for not doing things that they ought to be doing, but I would never sh*t on someone for following their bliss and spending their hobby time on things I personally am not interested in. If this pisses you off then I can't imagine what sort of vitriol you would direct at someone who is into, say, quilting. "Aren't quilts solved problems already?!? You just go down to Target and buy one, WTF people!"


When your solution to an incredibly minor and easily-solvable problem is to ask people to bug Apple engineers about adding new features to a decade-old encoder, maybe you're on the wrong track.

I'm frustrated by Marco's extreme fussiness, not his curiosity. (And there's no doubt his curiosity often leads to interesting places. I learned some great tidbits about MP3 encoding in the last few episodes!)


> When your solution to an incredibly minor and easily-solvable problem

What solution didn't he mention? He mentioned AAC, custom formats, and just doing it and ignoring the share-link issues (LOVE the share links).


I'm referring to the article. I presume that if Apple ends up supporting MLLT ID3 tags natively based on Marco's request, he'll feel free to switch his podcast to VBR without guilt. (Even though this will cause problems for all his non-iOS11 listeners.)


I think you are the one over-reacting really. Marco spent some serious effort investigating something interesting to him and discovered a hole in some important software infrastructure. We can argue about how important the hole is, but there's no doubt it's a hole and filling it in wouldn't inconvenience anybody.

Less seriously, why would you skip over the song? - it's a lovely little song and my only problem with it is that it's an earworm that I find myself internally singing all the time. "John didn't do any research .... da da da". Of course John always does some research anyway. Now if only John would occasionally concede a point or give the other two some credit, maybe just once, life would be perfect.


I mean, we're almost at 200 episodes at this point! Kudos to Jonathan Mann for writing a song that seemingly never gets old, but at this point it just feels like I'm brainwashing myself into remembering a bunch of Twitter handles.


It's not just improving the quality of the theme song, it's decreasing the size of podcasts. A 10-20% reduction in bandwidth costs isn't anything to sneeze at, especially for larger distributors like relay.fm etc.


The theme song by Jonathan Mann is one of the highlights of the podcast for me (up there with finding out whether Casey Liss's iMac is still crashing). Regardless of where I am in the world, I always enjoy singing along to it, though I do tone it down if there are people around me. I have never once even been tempted to skip it, and any effort that goes into enhancing it is effort well spent (at least for this audience of one).


That's MP3s. AAC/MP4 has better efficiency, and I can't imagine any devices in 2016 don't support it...?


If only OGG support was more common :(


Opus is the free software lossy codec of choice now, I think. It can be put in an Ogg container, though.


Yup, .opus files are the Ogg container. While Opus is newer, both Opus and Vorbis are much better than MP3.


OGG Vorbis audio is supported by all Android devices, which is the vast majority of mobile devices in the world.


Is it a vast majority of podcast listeners?


There are many degrees of VBR MP3 seeking support, and it's often surprisingly bad. In the case of Firefox's <audio> element, it was very poor (nearly unusable in my testing) until a few months ago:

https://bugzilla.mozilla.org/show_bug.cgi?id=994561

https://bugzilla.mozilla.org/show_bug.cgi?id=1163667

I had a fun time testing different JS audio players and going through their many confused bug reports, issues and threads, all stemming from this. Even if Apple fixes their stuff, I still wouldn't use VBR MP3 for anything that's going to get streamed. There are always going to be some platforms that screw it up, and even a future browser introducing poor seeking seems very possible.


You're correct that there are different degrees of seeking support.

Seeking VBR MP3 with perfect accuracy is trivial by forward-reading. However, this is obviously highly inefficient on long duration seeks.
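To make the forward-reading approach concrete, here is a minimal sketch that walks MPEG-1 Layer III frame headers and sums per-frame durations (1152 samples per frame) until it reaches the frame playing at a target time. It assumes MPEG-1 Layer III only, no free-format frames, and that any ID3v2 tag has already been skipped via `start`; the function names are hypothetical, and this is not how Gecko's MP3TrackDemuxer is written.

```python
MPEG1_L3_KBPS = [0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0]
MPEG1_SAMPLE_RATES = [44100, 48000, 32000, 0]
SAMPLES_PER_FRAME = 1152  # fixed for MPEG-1 Layer III


def parse_frame_header(data: bytes, pos: int):
    """Return (frame_length_bytes, sample_rate) for an MPEG-1 Layer III header at pos, else None."""
    if pos + 4 > len(data):
        return None
    h = int.from_bytes(data[pos:pos + 4], "big")
    if ((h >> 21) & 0x7FF) != 0x7FF:  # 11-bit frame sync
        return None
    if ((h >> 19) & 0x3) != 0x3 or ((h >> 17) & 0x3) != 0x1:  # MPEG-1, Layer III only
        return None
    kbps = MPEG1_L3_KBPS[(h >> 12) & 0xF]
    rate = MPEG1_SAMPLE_RATES[(h >> 10) & 0x3]
    if kbps == 0 or rate == 0:  # free-format or reserved values: not handled here
        return None
    padding = (h >> 9) & 0x1
    return 144 * kbps * 1000 // rate + padding, rate


def byte_offset_for_time(data: bytes, target_seconds: float, start: int = 0) -> int:
    """Forward-read frame headers from `start` to find the frame covering target_seconds."""
    pos, elapsed = start, 0.0
    while True:
        parsed = parse_frame_header(data, pos)
        if parsed is None:
            return pos  # end of stream or unsupported data: clamp here
        length, rate = parsed
        duration = SAMPLES_PER_FRAME / rate
        if elapsed + duration > target_seconds:
            return pos
        elapsed += duration
        pos += length


def make_frame(bitrate_index: int) -> bytes:
    """Synthetic frame for testing: MPEG-1 Layer III, 44.1 kHz, no CRC, no padding."""
    header = bytes([0xFF, 0xFB, bitrate_index << 4, 0x00])
    length, _ = parse_frame_header(header, 0)
    return header + bytes(length - 4)


frames = make_frame(9) * 10 + make_frame(14) * 10  # 10 frames at 128 kbps, then 10 at 320 kbps
print(byte_offset_for_time(frames, 0.3))  # 5214
```

The cost is linear in the seek distance, and every intervening header has to be read, which is why a long seek into a file that is being streamed is the expensive case.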

For instant seeking support, you need to (partly) depend on the optional VBR headers. This comes with its own set of issues, e.g., the most commonly used Xing header contains only 100 seek table entries, which may not provide enough resolution for large files.
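A rough sketch of how a player can use the Xing header's 100-entry table of contents for an instant seek, modeled on the linear interpolation commonly found in Xing/LAME-derived decoders. Each TOC byte stores, for one percent of the duration, a byte position scaled to 0-255. The sketch assumes the TOC and the header's total byte count (`stream_bytes`) have already been parsed out of the header; the function name is hypothetical.

```python
def xing_seek_offset(toc: bytes, percent: float, stream_bytes: int) -> int:
    """Map a playback position (0-100 % of duration) to an approximate byte offset via a Xing TOC."""
    if len(toc) != 100:
        raise ValueError("a Xing TOC has exactly 100 entries")
    percent = min(max(percent, 0.0), 100.0)
    a = min(int(percent), 99)
    fa = toc[a]
    fb = toc[a + 1] if a < 99 else 256  # the implicit entry after the last one is "end of stream"
    fx = fa + (fb - fa) * (percent - a)  # linear interpolation between neighboring entries
    return int(fx / 256.0 * stream_bytes)


# A perfectly linear TOC, as a constant-bitrate stream would produce.
linear_toc = bytes(int(i * 256 / 100) for i in range(100))
print(xing_seek_offset(linear_toc, 50.0, 1_000_000))  # 500000
```

Because every entry is a single byte and there are only 100 of them, the result is only ever an approximation, which is the resolution problem described above.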

I'm still surprised by the complete lack of support for those headers in AVFoundation, since I would consider it low-hanging fruit for improving usability in the majority of use cases (podcasts excepted).

Disclaimer: I've worked on MP3TrackDemuxer for Gecko/Firefox.


> However, this is obviously highly inefficient on long duration seeks.

Is it that bad? On the podcast, Marco claimed that seeking forward through an MP3 to find the correct time is pretty efficient, and that the reason for all this trouble is that streaming (specifically, requesting the correct byte offset from the server, if you want to jump to a specific time on the stream) is the problem he's trying to tackle here. He said that if it weren't for the streaming use case, he'd be fine with forward-reading.


He did say it wasn't an issue for downloaded files.

For streaming I believe he said you only get eight bits, which over a two hour podcast is about 30 seconds of precision. Then of course the podcast sometimes goes longer, not to mention how long Gruber's podcast goes.
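The "about 30 seconds" figure is consistent with simple arithmetic, assuming the eight bits refer to the Xing TOC's one-byte entries (each a position quantized to 1/256 of the file) and a roughly constant bitrate, so that bytes map linearly to time. The helper below is hypothetical and just does that division.

```python
def xing_toc_resolution(duration_seconds: float) -> tuple[float, float]:
    """Return (seconds between TOC entries, seconds per 1/256 byte-position step).

    The second figure assumes a roughly constant bitrate, so each 1/256 of the
    file's bytes corresponds to about 1/256 of its duration.
    """
    return duration_seconds / 100, duration_seconds / 256


entry_gap, byte_step = xing_toc_resolution(2 * 60 * 60)  # a two-hour episode
print(f"{entry_gap:.1f} s between entries, ~{byte_step:.1f} s per 8-bit step")
# 72.0 s between entries, ~28.1 s per 8-bit step
```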

Sounds like Marco is right: it would work well for a 3-minute song but not a multi-hour podcast.


So is this why I have weird problems fast-forwarding through some NPR podcasts? I'll find myself at the end of the podcast (as indicated by the time bar), but the podcast itself is still playing.

Man, Apple really is neglecting podcasts. Reminds me of an article I read recently: "Podcasts Surge, but Producers Fear Apple Isn’t Listening"

http://www.nytimes.com/2016/05/08/business/media/podcasts-su...


It's not Apple's fault; it's MP3 in general. Seeking in VBR MP3s is hard unless you decode the whole file and make note of the offsets for timing.


And decoding the whole file, in turn, would be easy in any other use-case (iTunes does this for "gapless playback" calculation) but podcasts are quite often streamed, and only once.

Probably the simplest "clever hack", if Apple wanted to fix this on their own, would be for all the podcast feeds registered through iTMS to get their files retrieved by the iTMS servers once, chewed through to calculate this information, and then all the calculated "keyframe" offsets injected as an opaque extra data field for each feed item of the iTMS-served podcast feed, where the Podcasts app can pick it up.
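A sketch of the server-side half of that idea: after one full pass over the file records each frame's byte length, keep a (time, byte offset) pair every few seconds, which a client could then turn into a byte-range request. The 10-second interval, the function names, and the toy one-second "frames" in the usage are hypothetical, as is the feed field the comment imagines.

```python
from bisect import bisect_right


def build_seek_index(frame_lengths: list[int], frame_seconds: float,
                     interval: float = 10.0) -> list[tuple[float, int]]:
    """Keep a (time, byte_offset) pair at the first frame on or after each `interval` mark."""
    index: list[tuple[float, int]] = []
    offset, elapsed, next_mark = 0, 0.0, 0.0
    for length in frame_lengths:
        if elapsed >= next_mark:
            index.append((elapsed, offset))
            while next_mark <= elapsed:  # advance past this frame's start time
                next_mark += interval
        offset += length
        elapsed += frame_seconds
    return index


def lookup(index: list[tuple[float, int]], target: float) -> int:
    """Byte offset of the last index point at or before `target` seconds."""
    times = [t for t, _ in index]
    i = max(bisect_right(times, target) - 1, 0)
    return index[i][1]


# Hypothetical input: 60 one-second "frames" of 100 bytes each.
index = build_seek_index([100] * 60, frame_seconds=1.0, interval=10.0)
print(index[:3])          # [(0.0, 0), (10.0, 1000), (20.0, 2000)]
print(lookup(index, 25))  # 2000
```

For real MPEG-1 Layer III audio at 44.1 kHz, `frame_seconds` would be 1152 / 44100; the client then decodes forward a few seconds from the returned offset to land exactly.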


That article is a little misleading- it's actually about podcasters who want them to embrace/extend podcasting itself so that they (the podcasters, not the company) could better monetize podcasts. I can't imagine that the ways in which this might be done (DRM/subscriptions/user tracking/etc.) would appeal to HN readers, even if it would help the most popular podcasters make more money. IMO, benign neglect is the best possible situation for podcasting as a medium right now.


What are you talking about? Marco's been trying to avoid that, as opposed to what the Stitchers of the world are doing.


"That article" refers to the NYTimes article linked in the previous post not Marco's article. I was confused at first too.


Not a fan of VBR MP3. VBR is great, but as some have shared, maximum compatibility is much more important. Let's do some maths (correct me if I am wrong).

At 128Kbps that is roughly 1MB per minute, so an hour would be about 60MB. Since most podcasts use 64Kbps, we are talking about 30MB per episode; a saving of 20% means 6MB per client. 10,000 listeners saves you 60GB per episode; assuming a weekly episode, that is 240GB per month. On a 2015 estimate of around 80,000 ATP listeners, that is ~2TB of transfer per month. That is the transfer a $10 Linode would give you.
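The same estimate can be checked with a few lines of arithmetic. The inputs (64 kbps, one-hour episodes, a 20% saving, four episodes a month, 80,000 listeners) are the comment's assumptions; the function name is hypothetical and it uses decimal units (1 MB = 10^6 bytes).

```python
def monthly_savings_gb(kbps: float, minutes: float, saving: float,
                       listeners: int, episodes_per_month: int = 4) -> float:
    """Estimate bandwidth saved per month, in decimal GB, by shrinking each episode by `saving`."""
    episode_mb = kbps * 1000 / 8 * minutes * 60 / 1e6  # kilobits per second -> megabytes per episode
    return episode_mb * saving * listeners * episodes_per_month / 1000


print(round(monthly_savings_gb(64, 60, 0.20, 80_000)))  # 1843
```

With exact decimal units a 64 kbps hour is 28.8 MB rather than 30 MB, so the total comes out near 1.8 TB per month, in line with the ~2 TB estimate.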

So is the VBR compatibility risk really worth a saving of $10 to $20?

Since I didn't exactly listen to the podcast, I wonder what problem he has with AAC, because the argument about most people having newer devices with proper support for VBR MP3 also works for AAC. And AAC sounds A LOT better.


You raise a good point, but if you're ever in a place with a shoddy internet connection, you appreciate optimized filesizes.


Not a fan of VBR MP3 files in general. They're more complicated to deal with when writing a codec, due in no small part to the seeking problem. And VBR is one of those optimizations that only works on things that didn't need optimizing in the first place (i.e., low-entropy frames.) There's just not much point to it.

If you need VBR MP3 for some reason, chances are you really need a better format.


Oh, that explains why when I pause music on my iPhone and resume, it doesn't continue playing from the same place.

Damnit, Apple.


How does one actually see the bug report? What is `rdar://27848317` supposed to open?


This is a reference to Apple's internal bug tracking system, named "Radar". The number listed here is the bug ID, and only Apple folks and probably the person who filed the bug can see it.


I assume this is the same problem I see in my car (VW) when streaming Spotify via Bluetooth? The progress bar in the car seems to go at double speed and runs out about halfway through each track.


I thought so too but it's actually a bug in the radio itself (if you're driving a MK6).

You can go to the dealer and reference Technical Service Bulletin (TSB) 91-13-03 and they'll do a software update to fix it.

--- There's another bug in that radio, but I won't tell you until after you see if you want to get it fixed, because it's really annoying.


I actually didn't know that MP3s had a VBR option...I guess this shows how uncommon it is.

I use VBR Ogg whenever I can, and I always thought it was weird that MP3 didn't have support for that feature. I guess I was (as always) a bit ignorant :D.


> VBR encoding is far more space-efficient and better-sounding than constant-bitrate (CBR) encoding

While this may be true for podcasts, it's def false for music files.


How so? Any VBR encoder could choose to produce CBR at the targeted average bitrate. The fact that it doesn't means that there is some better setting in terms of either space or quality.


VBR is a strict superset of CBR (= all CBR files are VBR), so this has to always be true.

BTW, if you're going to claim any mp3 encoding is better sounding than another one, I hope you've done an ABX test.


Using a good encoder like LAME?



