Whereas the "yellow book" CD-ROM standard got to take advantage of over four years of technology advancement. Plus it doesn't have a real time requirement, so there's plenty of time to calculate more advanced checksums and error correction codes.
In many respects, an audio CD is more like a vinyl record than a data storage format.
The design is nearly identical to what one would get if one simply (almost literally) digitized a vinyl record: a single long spiral of data on a disc. The most significant shift is that vinyl records read from outside in, while audio CDs read from inside out. Even the method of manufacture is essentially a clone (audio CDs are "pressed" just like vinyl records are "pressed", just in different pressing machinery).
> built with technology from the late 1970s. It had to handle data rates that were incomprehensibly massive for the time, so only the most insanely basic error handling was feasible.
Yes, the limits of late-1970s technology had a huge impact on the overall design, but at the same time the design also took advantage of the fact that audio played back in real time is reasonably tolerant of a small level of error when reading the disc. Most listeners would never hear the random errors, either because they are below the noise floor of the amplifier technology of the time, or because the resulting analog wave was close enough not to be noticeably different.
Equally significant (and equally interesting) is how they spin. Whereas vinyl records have constant angular velocity—they spin at the same rate from beginning to end, resulting in lower information density as you get towards the inner grooves—audio CDs have constant linear velocity, meaning that the rotational speed gradually changes as the disc is played so that the information density is equal from beginning to end.
For the era, this is very cool tech. (I'm also awestruck about how they managed to get the laser head alignment to work perfectly back then, it's practically magical.)
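As a rough feel for what constant linear velocity implies mechanically, here's a little back-of-the-envelope sketch (the ~1.3 m/s target speed and the 25–58 mm program-area radii are approximate CD figures):

```python
import math

# Constant linear velocity: to keep roughly the same track speed under the
# laser, the disc must spin faster at the inner edge than at the outer edge.
LINEAR_SPEED_M_S = 1.3            # typical CD audio read speed (~1.2-1.4 m/s)

for radius_mm in (25, 40, 58):    # inner edge, middle, outer edge (approx.)
    circumference_m = 2 * math.pi * radius_mm / 1000
    rpm = LINEAR_SPEED_M_S / circumference_m * 60
    print(f"r = {radius_mm} mm -> ~{rpm:.0f} RPM")
# Prints roughly 500 RPM at the start of the disc down to ~200 RPM at the end.
```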
Some drives max out at an advertised 48x, because early CDs still exploded at 52x.
It's actually the opposite: highest information density (musical notes played per surface unit) is in the innermost grooves.
Since linear velocity is lower at the center than at the edge, more surface is dedicated to each piece of data in the outermost grooves, which may increase the quality of songs placed there.
I don't really know if the machine making the disc is precise enough and if the human ear is capable of hearing the difference.
Your definition is "musical notes played per surface unit", but I think that's inverted. Taking a musical note here to mean a fixed note length, more surface per note allows for additional expressiveness. You can even "fit more notes in" if the notes are a shorter duration.
A time span on the outer part of the record has more groove length dedicated to the same amount of playing time, so it has a higher information density in my book. I guess my definition for density would be groove length divided by playing time.
I don't really agree with that, or at least it only makes sense to me if you're willing to go to a level of abstraction so high that it becomes relatively meaningless.
For one thing, CDs store digital data, zeroes and ones, not an analog signal like vinyl. On top of that, the data is written using eight-to-fourteen modulation (every 8 bits of data is written as a unique 14-bit pattern on disc), which keeps run lengths on the disc bounded so the drive never sees overly long stretches without a transition. Then on top of that you have cross-interleaved Reed–Solomon coding (CIRC), which does the actual error correction. You also have subchannel data that contains things like the position within the track and other metadata, and a table of contents at the beginning of the session to easily seek to individual tracks.
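As a toy illustration of that modulation constraint, here's a minimal sketch (my own, not the real EFM table) that checks whether a 14-bit pattern keeps at least 2 and at most 10 zeros between consecutive ones; it ignores the merging-bit rules that apply at codeword boundaries:

```python
# Toy check of the run-length rule an EFM codeword must satisfy: at least 2
# and at most 10 zeros between consecutive ones. Not the real EFM table.
def satisfies_efm_runlength(codeword: int) -> bool:
    bits = f"{codeword:014b}"
    ones = [i for i, b in enumerate(bits) if b == "1"]
    for a, b in zip(ones, ones[1:]):
        gap = b - a - 1  # zeros between two consecutive ones
        if gap < 2 or gap > 10:
            return False
    return True

# Of the 2^14 possible patterns, only a small fraction qualify; the standard
# assigns 256 of the qualifying patterns to the 256 possible data bytes.
qualifying = [w for w in range(2**14) if satisfies_efm_runlength(w)]
print(len(qualifying))
```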
Besides the fact that it's disc-shaped and read in a spiral, I don't find many similarities with vinyl. Sure, the format itself is largely driven by the music track format (which is why you still address CD-ROMs in "minutes" and "seconds" even though it's meaningless there), but then so is Spotify.
That's not a counterargument. That's literally what "digitized a vinyl record" means.
>The design is nearly identical to what one would get if one simply literally (almost) digitized a vinyl record.
The rotating disc format is simply due to technical constraints and not caused by vinyl mimicry (unless you also consider that hard disk drives and floppies are nearly identical to vinyl as well). The rest is completely different from a vinyl record on almost every level, besides the fact that they both store stereo audio.
If anyone is terribly curious what it sounds like, I've put up an OGG copy here: https://www.dropbox.com/s/u88u1xpdmdxbqal/03%20%E5%BC%98%E4%...
Oh well. I'm sure it was a great learning experience anyway. :)
Edit: If anyone is more curious about this, a large number of the rips in the TLMC were ripped by one uploader (or a group of people using a single alias) on the Japanese P2P programs Share and Perfect Dark.
He was so prolific in the scene that he has his own Baidu wiki page (granted, Baidu isn't as strict as Wikipedia about notability, but I still find it impressive).
All of his rips, as far as I know, were affected by this "normalize" setting, so even though he ripped hundreds (thousands?) of albums, a large majority of them probably wouldn't be considered "archive quality".
Unfortunately, the majority of the rips fall over EAC's normalize threshold, which means their volume was decreased, which ends up truncating some of the bits (assuming the album was using the full range of the 16-bit audio).
Edit: I noticed you used the term "fidelity", so you might be talking about being able to perceive the damage to the audio, which is not necessarily what I'm talking about.
The point of EAC (Exact Audio Copy) is to make archival quality copies of audio CDs, this means as close to bit-exact as possible. Modifying the audio in any way would be counterproductive to that purpose, even if it doesn't necessarily affect the actual sound quality.
For this album in particular, the audio CRCs of the other tracks don't match up between TLMC and a fresh rip:
For comparing .wav files you can also treat them as plain binary data, and use standard diff tools. For example I used `cmp -l` to count differences between rips, and `vbindiff` to view them.
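If you want to compare just the audio payload and ignore any header/metadata differences, a small sketch along these lines also works (file names here are placeholders):

```python
import wave

def pcm_bytes(path):
    # Return only the PCM frames, skipping the WAV header.
    with wave.open(path, "rb") as w:
        return w.readframes(w.getnframes())

a = pcm_bytes("rip_tlmc.wav")
b = pcm_bytes("rip_fresh.wav")

differing = sum(x != y for x, y in zip(a, b))
print(f"{differing} differing bytes out of {min(len(a), len(b))} compared")
if len(a) != len(b):
    print(f"length mismatch: {len(a)} vs {len(b)} bytes of PCM")
```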
a) reverse (phase-invert) one of the channels and then sum them; the sum will contain only the stereo content
b) reverse (a) and sum it with the original track to get only the mono content (see the sketch below)
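A rough numpy sketch of the mid/side split those steps are aiming at (it computes the mono and stereo-only parts directly rather than via step (b); channels are assumed to be already loaded as arrays):

```python
import numpy as np

def mid_side(left: np.ndarray, right: np.ndarray):
    left = left.astype(np.float64)
    right = right.astype(np.float64)
    side = (left - right) / 2   # "reverse one channel and sum": stereo-only content
    mid = (left + right) / 2    # what both channels share: the mono content
    return mid, side

# Toy data: a mono signal in both channels plus something only in the left.
t = np.arange(8)
mono = np.sin(t)
left = mono + np.where(t % 2 == 0, 0.5, 0.0)
right = mono.copy()
mid, side = mid_side(left, right)
print(np.allclose(mid + side, left), np.allclose(mid - side, right))  # True True
```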
However, human hearing doesn't just import the signal; the initial processing happens during acquisition itself. Psychoacoustic lossy audio encodings like MP3 rely on this: an MP3 sounds to you very much like the original, but a naive audio diff would find huge differences.
When you've got a .wav file open in a hex editor, it's no longer really about the music itself.
As a fellow ripper of physical CDs in 2018, the point is that a correct rip of CDDA from the CD is the one and only way to be sure you won't ever have to rip that music again.
Any other file, any other delivery/download mechanism, any other format ... you'll be fooling around with that song again sometime. But not if you have the correct, error free rip of the WAV/PCM from the CD.
So it's really not about the sound vs. some other format ...
I've been ripping in AAC 256kbps for 15 years now and have yet to see any reason to revisit any of those CDs. Even if a more efficient encoding mechanism gains mainstream support, I'll just use that for new rips only since storage is cheap enough.
That's exactly the point. Using anything but a lossless codec for ripping music is an unwise decision.
However I store it on a local NAS as well as an external SSD (in case I need it in a portable way, which rarely happens) - I opted for an expensive SSD because it's very light, small and portable, but a traditional external drive would handle music just fine if price is a major concern.
At the end everyone has different requirements, but since I consider ripping a huge amount of music a major task I wouldn't want to repeat, I'd make sure I do it right from the beginning - and opting for a lossy format doesn't fit into that idea.
I maintain both: a tree of lossless files and a mirror copy already encoded as 320 kbit MP3 that is automatically kept in sync (just because I can quickly cram those onto a USB thumb drive and they'll play anywhere -- I can't say the same about FLAC support).
I'm glad iTunes offers to transcode my lossless collection to lossy AAC on the fly before syncing to the iPhone (yes, some people still do that).
Of course my concern is storage in mobile devices. Yours seems to be flac support? However is that really an issue these days? I use ALAC (due to using iOS devices), but even then I never run into compatibility issues. Any worthwhile software/hardware supports either format - with Apple being the big outlier requiring ALAC.
Part of my original point was that I've been using AAC exclusively for 15 years now, and with all the content encoded in AAC now I don't see support for it disappearing in the next 15 either.
Rephrasing the question: What are people typically using these days when not going for FLAC? Some of my mp3s are definitely from the "HDD space is expensive" age 15+ years ago...
That doesn't necessarily mean it was ripped without errors. I think op was trying to make sure he has an exact copy, not just one that sounds alright.
Sure, op has no absolute guarantee, but the fact that his crc32 matches that of the AccurateRip DB is, apart from his statistical approach, another strong indicator that he actually got it right. The alternative is that either the other user who submitted the crc32 to the database coincidentally ended up with exactly the same read errors, or that it's a hash collision (not entirely unlikely given the size of the checksum).
It's reasonable to assume op got a perfect rip.
> That single AccurateRip entry for this album matched my CRCs for all tracks except track #3 – they had 0x84B9DD1A, vs my result of 0xA595BC09. I suspect that original ripper didn’t realize their disk was bad.
(Btw did you see the addition at the beginning of the article? Funky stuff going on...)
I didn't know this existed, but I am grateful to you for sharing it, and now I need to buy a new hard drive.
Were you a fan of Touhou back in its heyday, or is it just one of the many things that rouse your curiosity?
It reminds me of traditional Japanese court poetry, where half the esthetic work is done by knowledge of allusions and borrowings, and the poet is trying to express a familiar theme in a slightly better way rather than seeking novelty like most forms of literature.
Or in simple terms: A large collection of cover songs for the soundtracks of a popular Japanese bullet-hell game series.
TTA usually. There is an Ogg Vorbis torrent about a tenth the size, but I'm not sure it's being kept in sync with the original TLMC.
It used to be the largest torrent in circulation and also the one that broke some client programs due to its huge size.
I've seen colour emojis used in emails to grab my attention (as opposed to something plain black on white)... And I know I don't like it.
A LOT of sites also do 'on-the-fly' unicode emoji swap-outs for little png images. Twitter and Facebook are good examples of this. That way the emoji is always in colour regardless of the platform.
The next release was also announced with an emoji-decorated title...
I guess I'm old already at 32
According to HN, you're past tech retirement age. So yes, you should feel old.
e.g., take fluege.de (a flight comparison site) and flüge.de (a competing flight comparison site). The plain-ASCII transcription of ü is ue.
Now, flüge.de can’t buy fluege.de, or use that domain, and has no alternative domain.
But if you show the punycode version in the URL bar, you just increased the phishing risk significantly, because you've trained users to enter their payment data on an xn--randomblabla.de page.
Can you have an emoji domain with xn--? What about an emoji TLD?
Meanwhile, the total number of TLDs was 1,543 as of April 2018. As of May 2017, there were 255 country-code top-level domains.
8 / 1543 ~= 0.5% of all TLDs support emoji.
8 / 255 ~= 3% of all ccTLDs.
Yes. See this wikipedia page: https://en.wikipedia.org/wiki/Emoji_domain
There are no punycode TLDs, but there are unicode ones, and punycode ones could definitely exist.
For example, among the unicode ones: http://nic.xn--gckr3f0f/en.html (a Japanese unicode gTLD)
Yes. See http://xn--i-7iq.ws
Edit: HN does not let me enter emoji characters.
It's clean, it's simple, it's fast, and more importantly it's NOT phpBB.
What’s next, emoji in stock symbols? Might as well...
And I think it's good that browsers will change it into the dashed version.
1. Decide (policy, enacted by humans) for each TLD if its registry has rules that will prevent abuse, if so whitelist this TLD and show IDNs as text for this TLD, everything else is punycode.
2. Algorithmically detect "confusing" IDNs and show punycode instead for those.
For anyone interested, here are some links to the details of error-correcting codes (ECC) used on CD-ROM:
But I think CDDA possibly uses different (less) ECC... so info in links might not be 100% relevant.
It's a shame this cool tech is not being used anymore. As an information theorist, I used to be able to point to optical disks as an application of my field, and be like "see all that math is useful for something," but now I don't have anything shiny to point to anymore :(
I strongly disagree; there are many shiny things you can point to.
Point to a smartphone (and also the relevant sections of the LTE/HSPA/UMTS/WCDMA/GSM / 802.11 standards documents); there's a literal panoply of coding/error-correction maths that's crucial to every single one of those standards. You can actually go to the ETSI website and find the LTE standard and get free PDFs with the exact parameters of all the codes they use, in enough detail to reimplement them yourself.
In those standards, there are all sorts of lovely little state machines for coping with errors, like LTE's "HARQ loop", where your phone tells the cell base station that it didn't receive a chunk of data correctly; at which point the tower resends a differently punctured version of that chunk, and your phone's modem tries to soft-decode the data using both versions. Oh, and that exchange (including processing on both ends and radio transmission latency) takes under 10 milliseconds to complete -- the standard places a strict deadline on how long your phone has to respond with its acknowledgement/request.
Also, hard-drive platters are extremely shiny (more so than optical disks) and those also use error-correcting codes, as do SSDs. Did you know that bit cells in modern SSDs are so small that the number of electrons it takes to effect a measurable voltage difference at the sense amplifiers is less than 100 (https://twitter.com/whitequark/status/684018629256605696)? All your data lives in those differences of tens of electrons per bit cell; no wonder there's ECC machinery hard at work in SSDs!
I googled "HARQ loop" and because I visit HN so much the first result was https://news.ycombinator.com/item?id=11151232, so I learned that the data-processing portion (where the decode attempt is retried) must complete within 3ms!
I've been wondering for some time about good (bandwidth-efficient) ways to do error correction/recovery for general-purpose high-efficiency byte transports, and just in case, wanted to put this here in case you're/anyone is interested. How good would it be to throw TCP's "flood of ACKs" out the window and instead compare frame checksums (is CRC32 good?) for every, say, 32 frames, and send a bitmask (4 bytes for a 32-frame window) every n frames noting which were correct and which weren't? This would send ~32 times less data (and I just learned a TCP ACK is approx. 74 bytes! https://stackoverflow.com/questions/5543326/what-is-the-tota...)
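To make the idea concrete, here's a toy sketch of the bitmask acknowledgement (the window size, the dummy payloads, and the assumption that per-frame CRC32s travel alongside the data are all mine, not any existing protocol):

```python
import struct
import zlib

WINDOW = 32   # frames per acknowledgement window (made up)

def build_ack_bitmask(received_frames, expected_crcs):
    # Receiver side: set bit i if frame i arrived and its CRC32 matches.
    mask = 0
    for i, (frame, crc) in enumerate(zip(received_frames, expected_crcs)):
        if zlib.crc32(frame) == crc:
            mask |= 1 << i
    return struct.pack("!I", mask)   # 4 bytes covers a 32-frame window

def frames_to_resend(ack_bytes):
    # Sender side: any unset bit means "resend that frame".
    (mask,) = struct.unpack("!I", ack_bytes)
    return [i for i in range(WINDOW) if not (mask >> i) & 1]

frames = [bytes([i]) * 100 for i in range(WINDOW)]   # dummy payloads
crcs = [zlib.crc32(f) for f in frames]               # shipped alongside the data
received = list(frames)
received[5] = b"corrupted"                           # simulate one damaged frame
print(frames_to_resend(build_ack_bitmask(received, crcs)))   # [5]
```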
(My interest/focus is within the domain of good ideas lacking widespread implementation/uptake. I've found that these always seem to be hiding in the woodwork.)
But if your error rates are high and your bandwidth is high, you should already know how to fix this and trade a sliver of bandwidth for error-free delivery, so why is this suddenly TCP's problem and not your transmission medium's?
I replied to the parent comment, FWIW.
I want to design a protocol capable of withstanding both high and low error rate scenarios; everything from usage on 64kbps satellite to flaky home Wi-Fi to wobbly 4GX cellular performance around tunnels and buildings. The (slightly pathological) application I have in mind (mostly as a theoretical test) involves video and interactivity, so basically requires simultaneous high throughput and low latency (essentially worst case scenario). My goal is to build something that responds well to that, perhaps by being able to have its performance metrics retuned on the fly.
I read somewhere a while ago about how TCP is a poor general-purpose catch-all protocol, and that eschewing it where possible (cough where there aren't any proxies) will generally boost performance (if the replacement protocol is well-designed).
It would be awesome if I could assign significance to and act on each incoming network packet. This would mean I wouldn't need to wait for enough packets to come through and be reassembled first. So packets could arrive out of order or get dropped and I'd be able to recover the data from what did make it through, instead of throwing what I did receive away and just starting over.
Yes, a lot of work; I'm trying to figure out if it's worth it depending on the situation.
My original musing was about TCP ACK. If I'm following correctly, I now understand that apparently TCP ACKs apply to 728KB of data (1492 * 500 = 746000), so if a failure occurs... does 728KB of data get resent?! I presume not where the link is slow or congested or has a high error rate, but if that's still happening where a lot of data is being sent, that still feels quite high to me.
Thanks for replying; you helped me properly think through my idea and realize it would never actually work: I hadn't realized that I was assuming packets would arrive in order. If I use a 64-byte string to represent a 512-bit bitmask of ok/fail checksum tests, that would only work if both sides of the link are considering the same set (and sequence order) of packets.
Another thought, with far fewer moving parts (/ chances for imposition/assumption...), was to simply report a running success/failure percentage, with the remote end using that information to get as close as possible to 0% errors. That doesn't narrow down what needs retransmitting though. Back to the drawing board, then...
I have no delusions of reinventing TCP/IP, and I'm not even interested in reinventing the wheel. I do find networking very fun, but I guess I'm still working on understanding how it works, building an accurate mental model, and (in particular) honing/tuning the (very important) sense of knowing what solution approaches will work and what will fail.
TCP has Window Scaling, allowing high-bandwidth links to have up to 1 GB or so on the wire, and Selective Acknowledgement, allowing recipients to acknowledge ranges of correctly received packets if some are lost. I picked 500 entirely out of the air; actual numbers will vary enormously.
A TCP/IP "stack" automatically tunes the settings for each connection, if your stack is modern it will do a pretty good job. Work on improving this automatic tuning in consideration of both theory and real world practical experience is an ongoing topic of network engineering.
You are right though: for unicast networks there is little use of L3/L4 FEC. Maybe with the age of IPFS and other content-addressed networks we will come back to broadcast-FEC networks, like those once used for 3GPP multicast services such as TV.
Does this include things like turbo codes? I've been curious about FEC for a while now
The standard API for talking to a CD-ROM drive does not have a way to get the C1 and C2 error correction data. There is only a way to get 1 bit for each of the 2352 main data bytes, after the C2 error correction, indicating whether or not there was an error with it. But getting that data doesn't actually seem to be useful on the drives I tested it with; they all seem to have weird firmware.
https://github.com/saramibreak/DiscImageCreator is kinda the gold standard for this.
Surely there's some forward error correction built into, well, pretty much any physical media these days? I can't imagine BluRay would need less of it than a CD.
QR codes, dude. ;)
ECC is still used in RAM, and bitwise math is still used in RAID arrays. Again, not shiny, but still valid examples.
I used it to rip a CD collection consisting of dozens and dozens of disks. It was a true zero click interface. It took no command line arguments - customization was via code editing - so it was not particularly user friendly in that regard.
Nowadays look for the downloads of purchased CDs on Amazon or use some GUI app - at least it retains settings from one execution to the next.
A custom CD ripper script was the first program I ever wrote that actually served a useful purpose. Back then, somebody told me not to bother and just use abcde. 16 years later, I have finally come around to follow that advice. :-)
I hear that they instruct you to run EAC under Wine if you have a *nix box.
Due to dependency shenanigans I couldn't run it on Debian so I created an LXC container on my server running Ubuntu 18.04, passed the /dev/sr0 block device and it worked.
I'd be curious how well cdparanoia handles his CD if at all. Furthermore I read a post on a forum where someone tested a bunch of drives with scratched CDs and the Samsung SH-2xxxx drives performed the best. No idea of the exact methodology he used.
(Before you ask: I didn't buy this drive based on his post; I bought it 7 years ago for my desktop on a whim.) I've still gotten 100% track quality out of whipper with some lightly scratched discs.
EAC was the standard because it was the only free tool that could bypass firmware error correction (with 'secure' mode) and log the whole process securely, in a way that could not be tampered with easily.
« EDIT: After further investigation, I no longer believe it’s a factory defect. If I write the beginning or end of the affected track to a blank CD-R and rip it, the rip fails with the same error! Give it a try yourself with minimal.flac. »
Uh... so what's going on there? Something that breaks the error correction algorithm, or what? Can anyone with a burner repro this?
It is likely "weak sectors", the bane of copy protection decades ago and of which plenty of detailed articles used to exist on the 'net, but now I can find only a few:
This comment just made me realize that I haven't had a system with a burner in it for probably about 10 years.. I'd be SOL if I needed to burn a CD for some reason.
Sadly, I wasn't any help other than confirming his issue.
No you actually can; it's called "soft combining": https://en.wikipedia.org/wiki/Hybrid_automatic_repeat_reques... and it's a crucially important feature in high-performance air interfaces (like LTE).
CD1 position 0xFFFF
CD2 position 0xFFFF
Yes. Soft combining (in systems designed for it) isn't even done on bits, it's done on the actual analog (well, a digitised version of the analog value, sometimes termed a "log likelihood ratio") values of the signal in question -- before any quantisation happens.
This isn't to say that you can't apply the techniques here (where stuff obviously gets quantised unless you can find a way to get the raw signal from the photodiodes), but you need access to bits with the least amount of error-correction involved (or find a way to infer what the statistics of the original raw bits are from what the decoding machinery outputs).
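For anyone curious what soft combining looks like numerically, here's a toy simulation (BPSK over Gaussian noise, two receptions of the same bits; all parameters are made up, and real systems combine differently punctured retransmissions rather than plain repeats):

```python
import numpy as np

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=1000)
signal = 1.0 - 2.0 * bits            # BPSK: bit 0 -> +1, bit 1 -> -1
noise_var = 1.0

def llr(received):
    # For BPSK over additive Gaussian noise, LLR = 2*y / sigma^2.
    # Positive means "probably a 0", negative means "probably a 1".
    return 2.0 * received / noise_var

rx1 = signal + rng.normal(0.0, np.sqrt(noise_var), size=bits.size)
rx2 = signal + rng.normal(0.0, np.sqrt(noise_var), size=bits.size)

single = (llr(rx1) < 0).astype(int)                 # decode the first copy alone
combined = (llr(rx1) + llr(rx2) < 0).astype(int)    # add LLRs, then decide

print("bit errors, single copy:", int(np.sum(single != bits)))
print("bit errors, combined:   ", int(np.sum(combined != bits)))
```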
All those compensate for the various ways light may enter the camera (lens mount, lens frontmount, telescope housing, telescope lenses and mirrors) and let you subtract those out after averaging your result.
What OP was talking about was something akin to using the noise in a photo of one subject to reduce noise in another photo of a completely different subject but with the same camera.
That works, to a point, but not great.
There really needs to be a media museum, an international library of all that has ever been produced, from a hundred different recordings of the old composers to whatever crap your 15-year-old neighbour is uploading to SoundCloud.
Also there's https://interviewfor.red/
If we allow piracy like that to exist, how will the owners of the copyright be able to ensure that the works are kept accessible?!
This didn’t seem to be an issue of wear or damage – the CD itself was probably defective from the factory.
It would be very interesting to look at the surface of the disc under a microscope, to see if you can find the defective area --- knowing which track it's in, you can determine the approximate diameter at which it occurs, and then rotate around that diameter looking for abnormalities. The pits and lands are invisible to the naked eye but easily viewable with a light microscope:
At 1200 bits/mm linear density of a CD, the 5KB (40Kbit) defective section will correspond to ~33mm of track --- probably quite obvious if it is a physical defect. (A "logical" defect, where the bits are OK but the ECC is wrong, will not be apparent at the physical level.)
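Here's the same back-of-the-envelope arithmetic in a few lines, plus a rough way to turn a playing-time offset into a radius to examine under the microscope (the constants are approximate CD figures; the defect size and the 10-minute offset are placeholders):

```python
import math

BITS_PER_MM = 1200            # approximate channel-bit linear density
TRACK_PITCH_MM = 0.0016       # ~1.6 micrometre spiral pitch
PROGRAM_START_RADIUS_MM = 25  # program area starts around 25 mm radius
LINEAR_SPEED_MM_S = 1300      # ~1.2-1.4 m/s constant linear velocity

defect_bits = 40_000          # ~5 KB of channel data
print(f"defect spans ~{defect_bits / BITS_PER_MM:.0f} mm of track")   # ~33 mm

def radius_at(seconds):
    # Spiral length played so far = time * linear speed; a spiral with pitch p
    # between radii r0 and r has length ~ pi * (r^2 - r0^2) / p.
    length_mm = seconds * LINEAR_SPEED_MM_S
    return math.sqrt(length_mm * TRACK_PITCH_MM / math.pi
                     + PROGRAM_START_RADIUS_MM**2)

print(f"a spot ~10 minutes into the disc sits near r = {radius_at(600):.0f} mm")
```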
The text is across the entire diameter and I don't know why it would cause problems for only this track. Possibly the shape of the text?
edit: photo https://i.imgur.com/fdtAAPG.jpg
Note that CD uses 780nm infrared so what the human eye sees is not necessarily what the drive sees --- this is why "transparent" or "black" CDs work.
Finally I reassessed my assumptions and found a copy of the track in the wild -- hey! Same glitch. Then I emailed a friend who I knew also had a copy of the album -- same glitch!
Eventually found an obscure forum where someone also complained about the same glitch.
I always wonder how that made it out of the studio -- was it a glitch in the recording equipment of the concert? Surely they used multiple recorders?
I used to worry about this kind of thing, using cdparanoia and so forth on Linux to manage my music. When I went Apple I eventually went iTunes, which is a mixed blessing. I feel like it has made me able to appreciate more music more easily by lowering the cost barrier, but it has reduced the amount I appreciate an album substantially. And it's hugely inconvenient when they disable my account for strange reasons or my internet connection vanishes. So I may return to manual management someday.
I got a nastygram from the IT department for sharing my iTunes catalog. I replied that it was my understanding that the iTunes streaming was legal. They concurred, but said that format-shifting my albums had been illegal. I countered that I format shifted my CDs in the US, where it was legal.
In the end, the IT security people agreed that I was probably totally legal, but they asked me to please refrain from leaving the iTunes sharing turned on because it made their lives hard. That was a compelling argument, and so I turned off iTunes sharing. :-)
EDIT: I should note that format shifting was later made legal in NZ last I knew. Hopefully it still is.
As a used CD seller who often ships internationally with no complaints, I am interested in learning more about what is behind this statement.
If you ship CDs, I have only one request: please please do _not_ put bubble wrap _inside_ the jewel case! Some sellers do this and it just destroys those little plastic retainer teeth.
Till now it has never failed on discs that current, modern and newish devices had issues with.
Everybody who thinks that copyright is required should see the success of 2hu.
ZUN's blanket ban on commercial distribution makes it really difficult and expensive to legally obtain anything when the only distribution channels are specialized doujin stores -- it's still commercial, but the ones getting the profits are various middlemen and not the arrangers and performers.
This doesn't make sense to me. Why didn't this work?
The data is not terribly valuable and probably already mostly stored elsewhere, but I'd like to give it a try. Is there hope for ddrescue? I've had mixed results in the past. Are there better alternatives?
But I was hoping for something better.
Difference in data density? A CD stores ~800 MB, a DVD ~4700 MB.
My technique is akin to what an artist would do in "restoring" a damaged photograph. Often the glitch is only one data value being off, and it does produce an audible click.
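A minimal sketch of that kind of single-sample repair (the spike-detection threshold is arbitrary and the toy data stands in for 16-bit PCM):

```python
import numpy as np

def repair_single_sample_clicks(samples: np.ndarray, threshold: int = 10_000):
    # Replace any sample that jumps far away from the average of its two
    # neighbours -- i.e. a one-sample spike that would be heard as a click.
    out = samples.astype(np.int32).copy()
    for i in range(1, len(out) - 1):
        neighbour_avg = (out[i - 1] + out[i + 1]) // 2
        if abs(out[i] - neighbour_avg) > threshold:
            out[i] = neighbour_avg
    return out.astype(samples.dtype)

pcm = np.array([100, 110, 120, 32000, 140, 150], dtype=np.int16)
print(repair_single_sample_clicks(pcm))   # the spike at index 3 is smoothed out
```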
A lot of value was lost when that site went down.
* The rip log generated by EAC includes a CRC of the audio data. I can calculate this independently to ensure the data is the same as what EAC wrote.
* The rip log itself has a checksum that can validate the CRCs are intact.
* I archive my rips in Google Cloud Storage along with a sha256. If my NAS's copy goes bad I can fetch the backup, and validate that it's the original data (a quick check along these lines is sketched below).
* Store par2 parity information alongside your cloud storage and/or NAS: https://github.com/Parchive/par2cmdline
* Use a file system with strong data checks and repairability, basically meaning, ZFS with a sufficient redundant set up. "zpool scrub" can do wonders, and you can guarantee backups to a different pool are identical to the source.
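For the sha256 step above, a small sketch like this is enough to recompute the hash and compare it against the stored digest (paths are placeholders, and the .sha256 file is assumed to be in the usual `sha256sum` format):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk_size):
            h.update(block)
    return h.hexdigest()

expected = open("album_rip.tar.sha256").read().split()[0]
actual = sha256_of("album_rip.tar")
print("OK" if actual == expected else f"MISMATCH: {actual} != {expected}")
```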