
Compact Disc Structure - arantius
https://byuu.net/compact-discs/structure
======
byuu
Emulator development has cross-sections across many diverse fields of computer
science, and I'm trying to share my experiences with all of them. If anyone
here has suggestions for future articles for my site, I'm all ears ^-^

While folks are thinking about CDs: I've not been able to get an F2/F3-frame
CD image. If anyone has such a thing (not copyrighted, just a CD containing a
text file or public domain music piece is fine), I'd love to get it so I can
implement a CIRC encoder/decoder! My GitHub repository (linked in the article)
has an RSPC Reed-Solomon encoder/decoder, a disc scrambler/descrambler, and
Q-subchannel encoder/decoder. I'm trying to build up a comprehensive ISC-
licensed C++ library for working with every frame level of CD data.

For folks interested in the lead-in / lead-out data, to my knowledge only very
few drives allow reading this raw data, namely older Plextor drives with the
0xd8 (READ_CDDA) command. If there are any hardware engineers who want to save
an entire _generation_ of CD history in a more exacting archival format, we
could desperately use either hacked drive firmware for a modern CD drive, or
custom CD reading hardware to preserve this data. Time is really of the
essence.

The vanguard of CD preservation right now, to the best of my knowledge, is
Claunia's DiscImageChef project:
[http://discimagechef.claunia.com/](http://discimagechef.claunia.com/)

I'm also working with the Game Preservation Society in Japan (
[https://www.gamepres.org/](https://www.gamepres.org/) ), which has archives
of thousands of rare and obscure Japanese CD media, so by all means if anyone
reading this can help, please let me know.

~~~
rossy
> _While folks are thinking about CDs: I've not been able to get an
> F2/F3-frame CD image. If anyone has such a thing (not copyrighted, just a CD
> containing a text file or public domain music piece is fine), I'd love to
> get it so I can implement a CIRC encoder/decoder!_

I don't know if you've already seen this, but I've heard of someone getting an
image like this. They used it to discover what the AccurateRip drive offset
values mean in terms of the physical location of the subchannel data and
interleaved main channel data on the disc. There's a vague description of
their process in there as well.

[https://club.myce.com/t/offsets-handling-syncing-of-audio-
da...](https://club.myce.com/t/offsets-handling-syncing-of-audio-data-vs-q-
channel/100931)

> _Using my neat hardware setup I now have captured a precious 0.1 seconds
> worth of 588-bit channel data frames from a pristine audio CD, each frame
> containing, among other things, a subchannel byte and 24 bytes of audio
> data._

> _I can reproduce the CIRC, get the de-interleaved audio data, and also do
> the Reed-Solomon C1/C2 stuff in software from my analog laser readouts -
> showing an error-free read, identical to what I get with my Plextor using
> somewhat more conventional methods._
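
For anyone wondering where the 588 in that quote comes from, the arithmetic can be sketched as below. The per-frame figures (24-bit sync, 14-bit EFM codes, 3 merging bits, 8 parity bytes) are the ones given in ECMA-130; the variable names are just for illustration.

```python
# Channel-frame arithmetic for an audio CD (figures as given in ECMA-130).
SYNC_BITS = 24        # frame sync pattern
MERGE_BITS = 3        # merging bits after the sync and after every symbol
EFM_BITS = 14         # each 8-bit symbol is written as a 14-bit EFM code
SUBCODE_SYMBOLS = 1   # one subchannel byte per frame
DATA_SYMBOLS = 24     # 24 bytes of audio (six stereo 16-bit samples)
PARITY_SYMBOLS = 8    # 4 bytes of C1 parity + 4 bytes of C2 parity

symbols = SUBCODE_SYMBOLS + DATA_SYMBOLS + PARITY_SYMBOLS
frame_bits = SYNC_BITS + MERGE_BITS + symbols * (EFM_BITS + MERGE_BITS)

# 44.1 kHz stereo 16-bit audio is 4 bytes per sample period.
frames_per_second = 44100 * 4 // DATA_SYMBOLS
channel_bits_per_second = frames_per_second * frame_bits
frames_in_capture = frames_per_second // 10  # the "0.1 seconds" in the quote

print(symbols, frame_bits)            # 33 588
print(frames_per_second)              # 7350
print(channel_bits_per_second)        # 4321800
print(frames_in_capture)              # 735
```

So a 0.1 second capture is only 735 frames, or 7.5 sectors at 98 frames per sector.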

~~~
byuu
That is exactly what I need, but unfortunately the post is from 2004, and he
didn't link to that precious sample of data =(

Still, some very interesting info there. It looks like if and when I get this
data, I may need a follow-up article that goes even deeper.

~~~
rossy
Hmm, poking around on that website, there's some more:

[https://club.myce.com/t/announcing-laser2wav-a-software-
only...](https://club.myce.com/t/announcing-laser2wav-a-software-only-audio-
cd-decoder/117197)

That tarball seems to have fallen off the internet, but there's also this.
Check out laserbits.txt. I wonder if that's the data.

[https://github.com/sidneycadot/Laser2Wav/tree/master/python](https://github.com/sidneycadot/Laser2Wav/tree/master/python)

------
userbinator
For those really curious, the standard explaining the detailed structure of
data CDs is free:

[https://www.ecma-
international.org/publications/standards/Ec...](https://www.ecma-
international.org/publications/standards/Ecma-130.htm)

DVD-ROM is here too, for comparison:

[https://www.ecma-
international.org/publications/standards/Ec...](https://www.ecma-
international.org/publications/standards/Ecma-267.htm)

_And indeed there's really not much point in doing so. Any disc copy
protection scheme trying to mess with CIRC codes, or worse, EFM codes, would
have a really hard time having any drives read the resulting discs._

Some copy protections did play around with EFM, leading to "weak sectors".
Amusingly enough, when I tried to search for info on it for this post, I found
an article previously linked from HN, and which quoted me...

[https://john-millikin.com/%F0%9F%A4%94/error-beneath-the-
wav...](https://john-millikin.com/%F0%9F%A4%94/error-beneath-the-wavs)

------
sohkamyung
Having helped to design some early CD-ROM drives many years ago (initially
using Philips CDROM components), the information here brings back memories.
:-)

On a hardware level, the first thing we do is an eye pattern check to see if
all the modulated signals are coming in as expected [1]. If not, a hardware
check on the components is performed.

[1]
[http://www.repairfaq.org/REPAIR/F_cdfaqd.html#CDFAQD_004](http://www.repairfaq.org/REPAIR/F_cdfaqd.html#CDFAQD_004)

------
nullc
I've long been sad that no one has yet built a state of the art decoder for
the recovery of error riddled CDDA discs.

A lot of old CDs are not holding up as well as advertised, resulting in an
actual loss of archived information.

The RS decoders used in typical CD players don't come anywhere near the
theoretical performance possible from the format; they're only used as erasure
codes. In particular, list decoding with a prior on the audio continuity
(using techniques like the Postfish declipper) should do really well because
of the enormous and channel-independent interleave used by CDDA... but
I've never seen it done. 30 years ago that sort of thing would have been
computationally infeasible, but it should be possible now.

~~~
byuu
> The RS decoders used in typical CD players don't come anywhere near the
> theoretical performance possible from the format, they're only used as
> erasure codes.

Reed-Solomon is fascinating, especially the way RSPC and CIRC each have both P
and Q parity. Depending on the order and how many passes you do of each, you
can recover more and more data. If you can guess _where_ the error is, you can
correct up to twice as many errors. And as a last resort for RSPC, you can
potentially brute force corrections via the checksum values.
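
The "twice as many" rule of thumb follows from the standard Reed-Solomon bound: with n - k parity symbols, a decoder can handle t errors at unknown positions plus e erasures at known positions as long as 2t + e <= n - k. A rough sketch of what that means for the CD codes, using the (n, k) sizes given in ECMA-130; the helper name `max_errors` is made up for this example.

```python
def max_errors(n, k, erasures=0):
    """Unknown-position errors correctable alongside `erasures` known-position ones.

    Uses the Reed-Solomon bound 2*t + e <= n - k.
    """
    parity = n - k
    if erasures < 0 or erasures > parity:
        raise ValueError("more erasures than parity symbols")
    return (parity - erasures) // 2


# (n, k) per codeword, as given in ECMA-130.
CODES = {
    "CIRC C1": (32, 28),
    "CIRC C2": (28, 24),
    "RSPC P": (26, 24),
    "RSPC Q": (45, 43),
}

for name, (n, k) in CODES.items():
    print(f"{name}: {max_errors(n, k)} errors, or {n - k} erasures")
```

This is per codeword and per pass; it says nothing about how many passes or in which order a given drive runs them, which is where the differences between drives come in.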

Reed-Solomon error correction ends up being more of an art than a science, and
it's why different CD-ROM drives have different success rates at reading
damaged discs.

Explaining Reed-Solomon would be a fun article, but probably too intense and
niche a subject. Tons of linear algebra over Galois fields, and intimidating
if not overly complex algorithms like Berlekamp-Massey, Chien search, and
Forney's formula (some of which can be brute forced these days).
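
To give a flavor of the field arithmetic involved, here is a minimal sketch of GF(2^8) as used on CDs, assuming the field polynomial x^8 + x^4 + x^3 + x^2 + 1 (0x11D) with primitive element 2, as specified in ECMA-130. The function names are illustrative and are not taken from any particular library.

```python
PRIMITIVE_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1


def gf_add(a, b):
    """Addition (and subtraction) in GF(2^8) is plain XOR."""
    return a ^ b


def gf_mul(a, b):
    """Shift-and-add multiplication, reducing by the field polynomial."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:          # degree reached 8: reduce back into 8 bits
            a ^= PRIMITIVE_POLY
        b >>= 1
    return result


def gf_pow(a, n):
    """Raise a to the n-th power by repeated multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a):
    """Multiplicative inverse: a^254, since a^255 == 1 for every nonzero a."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^8)")
    return gf_pow(a, 254)


print(hex(gf_mul(2, 0x80)))   # 0x1d: 0x100 reduced by 0x11D
print(gf_pow(2, 255))         # 1: the element 2 has order 255
```

Real decoders usually replace `gf_mul` with log/antilog table lookups, but the shift-and-add form makes the reduction step visible.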

RSPC makes it even more complex with two overlapping channels of parity. CIRC
(cross-interleave) makes that even more complex still with differing delay
slots per byte of error correcting data. See pages 45 and 46:
[http://www.ecma-international.org/publications/files/ECMA-
ST...](http://www.ecma-international.org/publications/files/ECMA-
ST/Ecma-130.pdf)

~~~
tinus_hn
The same error correction is used in QR codes. To correct errors you have to
guess which bits are unsure.

~~~
nullc
You don't have to guess, in that if there are below a threshold number of
errors there is a unique decode which will always be right.

And a decoder that uses Berlekamp-Massey can find that unique decode with a
lot less work than trying every possibility. :)
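
As a small illustration of "no guessing needed", here is a sketch of the simplest case: a code with two parity symbols, which can locate and fix any single bad symbol directly from its two syndromes. It assumes the GF(2^8) field with polynomial 0x11D and parity checks at roots alpha^0 and alpha^1; the symbol ordering and function names are my own and do not follow the ECMA-130 sector layout.

```python
PRIMITIVE_POLY = 0x11D

# Log/antilog tables for GF(2^8) with primitive element 2.
EXP = [0] * 510
LOG = [0] * 256
value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1
    if value & 0x100:
        value ^= PRIMITIVE_POLY
for power in range(255, 510):
    EXP[power] = EXP[power - 255]


def mul(a, b):
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def div(a, b):
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[LOG[a] - LOG[b] + 255]


def syndromes(word):
    """S0 = sum of symbols, S1 = sum of symbol_i * alpha^i; both 0 if valid."""
    s0 = s1 = 0
    for i, symbol in enumerate(word):
        s0 ^= symbol
        s1 ^= mul(symbol, EXP[i])
    return s0, s1


def encode(data):
    """Prepend two parity symbols chosen so that both syndromes become zero."""
    word = [0, 0] + list(data)
    a, b = syndromes(word)
    word[1] = div(a ^ b, 3)      # 3 == 1 + alpha
    word[0] = a ^ word[1]
    return word


def correct_single_error(word):
    """Return a corrected copy, or raise if more than one symbol is bad."""
    s0, s1 = syndromes(word)
    if s0 == 0 and s1 == 0:
        return list(word)
    if s0 == 0 or s1 == 0:
        raise ValueError("more than one error")
    position = LOG[div(s1, s0)]  # S1 / S0 == alpha^position
    if position >= len(word):
        raise ValueError("more than one error")
    fixed = list(word)
    fixed[position] ^= s0        # S0 is the error value itself
    return fixed


codeword = encode(range(1, 25))  # 24 data symbols -> 26-symbol codeword
damaged = list(codeword)
damaged[7] ^= 0x5A
print(correct_single_error(damaged) == codeword)
```

With more parity symbols the same idea generalizes, and that is where Berlekamp-Massey comes in to find the error locator without trying every combination.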

------
pedrow
Fascinating article, thanks.

> Some bands got clever ... by rewinding, you would reveal a hidden "track 0"
> of audio.

Anyone know any discs that have this?

~~~
rossy
There's a good list here:
[https://en.wikipedia.org/wiki/List_of_albums_with_tracks_hid...](https://en.wikipedia.org/wiki/List_of_albums_with_tracks_hidden_in_the_pregap)

I was using Arcade Fire's Reflektor as a test disc for this a little while
back.

------
1wd
> older Plextor drives, which CD-ROM preservationists have the ability to
> acquire, although they are quite pricey

How could such a drive be identified? Is there an exact model? How pricey are
they?

~~~
einr
From
[https://github.com/saramibreak/DiscImageCreator](https://github.com/saramibreak/DiscImageCreator):

 _PX-760, PX-755, PX-716, PX-714, PX-712, PX-708, PX-704, Premium2, Premium,
PX-W5224, PX-4824, PX-4012_

I don't believe this is an exhaustive list; Plextor had a lot of good models
back in the day. You'll want to look at older, true Plextor models, all SCSI
or ATAPI. Once SATA rolled around they'd started rebadging drives from other
manufacturers that do not have the same capabilities (or quality).

Depending on the model you're looking at maybe $50-$150 on eBay, which is
pricy for a used old CD-ROM drive but not a lot of money for someone who
really needs one. The Premium/Premium2 models in particular are sought after
by audio professionals and (used to) get quite pricy.

~~~
aidenn0
$50 isn't that pricey. Similar to the cost of a new BD-ROM drive. I was
imagining a lot more from TFA. What's a good way to hook up an ATAPI drive to
a modern computer? I'm curious if mine still works.

~~~
markrages
It's just an IDE interface. You can buy USB or PCI-express adapters for $20 or
so.

~~~
aidenn0
FWIW not all USB/IDE adapters support ATAPI (I have one that does not). Looks
like there are plenty available that do though.

------
garaetjjte
I don't quite get why scrambling XOR is necessary. Isn't avoiding consecutive
bits already achieved by EFM?

~~~
tenebrisalietum
I think it's because you can still get long sequences of 1's or 0's depending on
which bytes are next to each other. [http://www.reverse-
engineering.info/CD/EFM_Table.html](http://www.reverse-
engineering.info/CD/EFM_Table.html)

~~~
garaetjjte
But there are also 3 "merge bits" between each EFM code, so this shouldn't
matter?

edit: The problem solved by the scrambler is that the number of pits and lands
must be equal on average, or else the drive couldn't distinguish pits from
lands, as the dark/light decision level would no longer be the average of the
output signal. There is an excellent description here:
[https://archive.org/download/CDCrackingUncoveredProtectionAg...](https://archive.org/download/CDCrackingUncoveredProtectionAgainstUnsanctionedCDCopyingKrisKaspersky/CD%20Cracking%20Uncovered%20-%20Protection%20Against%20Unsanctioned%20CD%20Copying%20-%20Kris%20Kaspersky.pdf)
(page 33 in the PDF, under the "Sync Groups, Merging Bits, and DSV" header).
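
For reference, the sector scrambler being discussed can be sketched as follows, assuming the description in ECMA-130 Annex B: a 15-bit shift register with feedback polynomial x^15 + x + 1, preset to 0x0001, whose output is XORed (least significant bit first) into bytes 12 to 2351 of each 2352-byte sector, leaving the 12-byte sync untouched. The function names are illustrative.

```python
SECTOR_SIZE = 2352
SYNC_SIZE = 12


def scramble_table():
    """Generate the 2340-byte XOR sequence from the 15-bit LFSR."""
    lfsr = 0x0001
    table = bytearray()
    for _ in range(SECTOR_SIZE - SYNC_SIZE):
        byte = 0
        for bit in range(8):
            byte |= (lfsr & 1) << bit               # output bit, LSB first
            feedback = (lfsr ^ (lfsr >> 1)) & 1     # taps for x^15 + x + 1
            lfsr = (lfsr >> 1) | (feedback << 14)
        table.append(byte)
    return bytes(table)


TABLE = scramble_table()


def scramble(sector):
    """Scramble or descramble a raw sector; the operation is its own inverse."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("expected a 2352-byte raw sector")
    out = bytearray(sector)
    for i, key in enumerate(TABLE):
        out[SYNC_SIZE + i] ^= key
    return bytes(out)


print(TABLE[:4].hex())  # 01800060
```

Because it is a plain XOR with a fixed sequence, running `scramble` twice returns the original sector, and a sector full of zeros comes out as the table itself rather than a long constant run.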


