The Internet Archive has digitized 25,000 78rpm Gramophone records (archive.org)
699 points by yurisagalov 6 months ago | 97 comments

House of the Rising Sun. As interpreted by Josh White, advisor and confidant to F.D.R. Priceless ;)


I find myself on the Internet Archive a lot during these dog days of summer. Delving into classic texts like Edgar Rice Burroughs's A Princess of Mars or Aldous Huxley's The Perennial Philosophy. Discovering a forgotten H. P. Lovecraft story in the Weird Tales archive. Mining old-time radio shows like Suspense for story inspiration. And using the Internet Arcade for screen grabs that can be used in retro-style game texture art. It makes me think I should do a better job of preserving my own output. You never know what future generations may find useful!

You might enjoy librivox.[1]

It's like the Project Gutenberg of audiobooks, recorded by volunteers. The recordings are new, but many of them are of old books, now out of copyright.

The quality of their readers varies, but there are some surprisingly good readers on there, such as my favorite so far, David Clarke, who did a superb reading of The Count of Monte Cristo.[2]

[1] - https://librivox.org/

[2] - https://librivox.org/the-count-of-monte-cristo-version-3-by-...

When I was in college in the early seventies Josh White Jr regularly performed in the local clubs. Heard more than a few stories about his famous father.


I just noticed the Weird Tales scans the other day. I was mainly looking for Robert E Howard stories but there are lots of other good things in them as well.

Thank you! Made my day. History is amazing.

TIL Josh White.

Fun to read some of these reviews, apparently from random internet folk, like on jungle boogie - https://archive.org/details/78_jungle-boogie_the-bobby-true-...

Some guy just wanted to tell everyone some neat little facts about this thing he apparently knows a lot about. I find it fascinating how much people care to know about things like this.

EDIT: whoever this "arc-alison" character is, they're prolific - I'm finding their informational reviews all over this archive.

Not to take anything away from the effort, but most of that information is on the disk itself - and you can hear that it's not the same as the Kool & the Gang track :-)

I feel really rude for saying that because I would hate to discourage people from contributing on the internet, especially to a project like this. And on that note, this project is awesome! As a crate digger, I can see myself spending a lot of time trawling through these tracks.

It's ok to copy information. Just makes it more available/searchable, for one.

Agreed. To clarify, I wasn't saying that it's not worth the effort, I was saying that in that particular case the reviewer wasn't showing a deep knowledge of the subject.

You should see the amount of esoteric knowledge on the reviews of Grateful Dead shows hosted on the Archive. It's impressive.


So many great shows too. Such an awesome resource for fans.

The records I clicked on have this notice

Digitized from a shellac record, at 78 revolutions per minute. Four stylii were used to transfer this record. They are 3.8mm truncated conical, 2.3mm truncated conical, 2.8mm truncated conical, 3.3mm truncated conical. These were recorded flat and then also equalized with NAB.

The preferred version suggested by an audio engineer at George Blood, L.P. is the equalized version recorded with the 2.3mm truncated conical stylus, and has been copied to have the more friendly filename.

I'm trying to guess but can't imagine what the reasoning for this is. I've tried A/B/C/D testing a few tracks on some crappy speakers and can't discern any difference.

While it's certainly admirable to try to digitize it as thoroughly as possible, I just can't see how a difference of 0.5mm in the stylus width is worth increasing your workload 4x (having to record each record 4 times rather than just once).

From http://great78.archive.org/preservation/:

"During the 78rpm era there are no standards for speed, stylus size, or record/playback equalization. Within the trade there is broad agreement that optimizing playback requires both knowledge of the documentation that’s available on these parameters for each label over time, and some amount of judgment. There are many reasons why judgment is necessary. One reason is that the disc may be worn from being played many times with the correct stylus size. Better results may come from using a different (“the wrong”) size stylus because it sits in a portion of the groove that is in better condition. But there’s no free lunch. Using a smaller size may mean a noisier transfer as it plays a less cleanly molded part of the disc. Using a larger size may increase tracing distortion that is the result of the larger size not fitting all the way to the bottom of the smaller grooves of higher frequencies. [...]"

I remember having a 78 rpm only Victrola, and it having an assortment of different sized styluses.

Looking at the "about" page for the project, it explains that they're using a special turntable with 4 styli that can record simultaneously. So it doesn't really increase the workload by 4 times to archive in this more thorough way.


That's extremely cool. With decent headphones (ATH-M50x), there is a noticeable difference between the styli, so probably worth the extra work for archival.

The next step up would presumably be a 3D-scanning method like IRENE: http://irene.lbl.gov/

Now that is cool.

I can't tell which is which from the filename, but I did a basic layman listen to https://archive.org/details/78_baby-its-cold-outside_frank-l...

1. "friendly filename" sounds good, little static/pop, etc

2. Super loud squeal thing in the background, yuck. Voices sound poorly equalized

3. Quieter than (2) but way more noise than (1), voices causing weird audio artifacts in my headphones (as if they're blowing their available range) and are radically changing volume in the middle of a line

4. Weird squirrely noises on high-volume peaks, sounds like crap on the loudest parts. Right channel is like, totally f'd in the A man.

5. Seems to be same as (1), but with the standardized filename. I gotta agree with mr audio engineer, it just sounds the best.

This may help you refine your intuition about why it could matter: https://www.youtube.com/watch?v=GuCdsyCWmt8&t=5m0s It's an electron microscope animation of a needle passing through a record's groove.

(I would note that due to the way the recording is made, it may be the case that a real needle would jerk around a bit more. On the other hand, it wouldn't have to jerk around much before this entirely stops working, so I'd guess in the end it's probably pretty accurate.)

Wow! Those were very good demonstrations of the storage formats

They aren't really increasing their workload by 4x because their turntables have 4 arms, as pointed out here: https://news.ycombinator.com/item?id=14961307

"Better safe than sorry" is a token mindset when archiving, especially when archiving lossy things.

> I've tried A/B/C/D testing a few tracks on some crappy speakers and can't discern any difference.

You might want to try with a half-decent usb dac and a set of good headphones if the goal was an a/b test?

And you'll need to be well past "crappy speakers" land for the DAC to matter. A good DAC helps with things like lower noise and better EQ at the extremities of the frequency range, and we're talking about 78's. Anytime a transducer (mic or speaker) is involved, it's the vast majority of your coloring right there.

It's possible you just got a good record as well. The benefits may be more prominent on records that are worn or otherwise have playback peculiarities. There has always been an often subjective art to vinyl, both in its production and its playback.

While I'm not sure about any specific record, the reason they'd use different styluses is probably because they don't know which particular style was used to cut the original media. The closer you are, the better the reproduction should be. Too narrow a needle and you'll wobble in the groove too much. Too wide and you'll end up ignoring higher frequencies and lose content. I'd also imagine the state of the media to be relevant, i.e. scratches and other defects will be picked up differently on each size of needle so you could even piece together a composite of each needle if one picked up a scratch and the others didn't.

If you really want to do this thoroughly, you'd probably have to sample the same recording from different records several times. Then you can use a "consensus" algorithm, that reconstructs the original audio in some optimal way. (But better yet to publish the original recordings, so others can still try different algorithms).
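
A minimal sketch of what such a "consensus" step could look like, assuming the transfers have already been aligned sample-for-sample (in practice the alignment, including speed differences, is probably the hard part). The sample-wise median is just one possible choice of consensus, and the function name and the tiny sample lists are made up for illustration:

```python
from statistics import median


def consensus(transfers):
    """Sample-wise median of several transfers that are assumed to be aligned."""
    if not transfers:
        raise ValueError("need at least one transfer")
    # Only the overlapping part of the transfers can be compared.
    length = min(len(t) for t in transfers)
    return [median(t[i] for t in transfers) for i in range(length)]


# Three hypothetical transfers of the same passage; the second one has a click.
a = [0.0, 0.1, 0.2, 0.1, 0.0]
b = [0.0, 0.1, 0.9, 0.1, 0.0]
c = [0.0, 0.1, 0.2, 0.1, 0.0]
print(consensus([a, b, c]))
```

With an odd number of transfers the median simply ignores a defect that shows up in only one of them, which is why it is used here instead of a plain average.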

And India has gone ahead and banned them!


Yeah, I don't know why it's banned. Any idea?

Same here. There has been no notification yet. I am as lost as you. :(

Lots more info here for the curious: http://great78.archive.org

You can see a picture of one of the four-armed turntables here: http://great78.archive.org/preservation/

They had me make a Twitter bot that's tweeting out all of the 78s (with preview audio) as well:


More sample fodder for the EDM artists and rappers. Always a good thing.

The "Bibliothèque nationale de France" (national library of France) did the same kind of thing with hundreds of thousands of vinyl records from their archive, including international ones published in France: http://www.bnfcollectionsonore.fr/

Interesting. Is there a way to search these archives?

Can't forget this one: https://archive.org/details/78_i-dont-want-to-set-the-world-...

The surface noise works well with the song.

Hmm, I mistakenly confused this song with the Opera one Andy played over the PA system. That one's apparently "Sull'aria" from Mozart's "The Marriage of Figaro."

I have no experience with this stuff, but I wonder if they could use a laser record player to capture the record, and then replay it with different simulated stylus sizes. Not exactly kosher probably, but could be an interesting experiment. Plus scanning time could be greatly reduced I imagine.

There've been attempts to do this [1], but I think it's probably not as easy as simply playing it back.

[1] https://mediapreservation.wordpress.com/2012/06/20/extractin...

This is super cool without a doubt, but they could use an actual laser - not photographs to do it:


The laser turntables sound surprisingly bad because unlike a stylus a laser will not remove any foreign particles in the groove. Unless the record is perfectly clean, it's going to sound very noisy indeed.


The Laser turntable was a really bad idea, one that attempts to solve a problem that doesn't really exist in practice: record wear.

A stylus rides on the physical surface of the record. The information (audio) is on the physical height and x-position of the groove, not on the image of the groove. Thus the laser turntable also reads dirt and even damage that wouldn't be read if the stylus was used.

On a proper turntable with a proper (reasonable quality, not-worn) stylus, a vinyl record can be played in excess of 1000 times before noticing audio degradation, according to AES tests done in the '60s. Under an electron microscope it was reported that the vinyl surface appears to "flow" or "compress" under the action of the stylus.

78rpm "shellac" records are designed so the needle wears (!). For this, the osmium needles of the era should be discarded after two record sides. What happened in real life was that the needles weren't discarded, so they developed cutting edges that did damage the records.

Thanks for the extra info. Super interesting.

I don't think I'd agree that the laser turntable was necessarily a bad idea though - I mean if the laser turntable worked as you'd want it to, it could be really nice, because you potentially wouldn't need any moving parts, or at least fewer moving parts (a series of mirrors to position the laser or something). And fewer moving parts -> machine will last longer.

It has more moving parts than a normal turntable.

Normal turntable, moving parts:

    1. turntable spindle
    (2). motor (sometimes integrated into the spindle assembly)
    3. tonearm bearings

ELPJ laser turntable, moving parts:

    1. turntable spindle
    2. motor
    (3). tray loader
    4. tangential tonearm positioning motor
    5. laser servo mechanism for precise positioning
    6. laser focusing servo

Conventional turntables last a long time. In practice they last many generations.

Great point. How hard would it be to create a special cleaning machine?

Record cleaning machines have existed for a long time (dare I say as long as records have existed?). The best kind use vacuum and a surfactant to lift and remove dirt and grime. The cheapest manual ones just use brushes and liquid.

This is incredible. Thanks for sharing

I tried this decades back with a (then) expensive Hughes HeNe laser, a 20x microscope eyepiece for beam focusing and a phototransistor detector feeding a HeathKit audio amp.

The laser spot size covered about three grooves. But for playing three grooves at once, it sounded good enough to be interesting. Didn't follow up, however.

Certainly a bit of everything on there... :)


Any sound restoration software would greatly improve these recordings.

For example this one from 1902: https://archive.org/details/78_medley-of-emmetts-yodles_yodl...

I'm sure Izotope would give the RX license for free in exchange for a blog post (or any other audio software company).

While that's true, I suspect that for archive, the original is best -- anyone can (software licenses notwithstanding) apply current state-of-the-art restoration techniques, but if all we preserve is the restored version then it's going to be really difficult to apply a better restoration technique in ten years.

The Internet Archive already does automatic conversion of uploaded artifacts, and it's very much an additive process: the original file is always preserved, and the new versions and file formats are stored alongside it.

Looks like there are some recordings by Sergei Rachmaninoff himself [1]

[1] https://archive.org/details/georgeblood?sort=&and[]=subject%...

Imagine after World War 3, the aliens sift through the remnants of humanity, find this archive of digitized 78rpm records, and turn into mustachioed corduroy-wearing hipsters.

This is just great! Listening to these songs instantly sets me back to a relaxed inner state. Together with that sizzling noise of the gramophone record in the background, so calm and chilled.

I currently listen to "A Duke Ellington Panorama", just nice!

Thanks for that and keep up the awesome work!

Very cool that they offer 24-bit FLAC downloads. I'm sure this sentiment is shared here, but I am always impressed by the efforts of this organization.

Honest question: Why is archive.org using 24-bit FLAC when >51% of the internet tells you HD Audio is useless and can even worsen the listening experience?

"A consumer will never need 24-bit. Ever."[1]

"24 bit audio is as useless as 192kHz sampling"[2]

I would love to hear a good explanation so I can decide about the future of my audio library.

[1] http://gizmodo.com/5768446/why-24-bit-audio-will-be-bad-for-...

[2] https://people.xiph.org/~xiphmont/demo/neil-young.html

HD audio is useless if you want to listen to audio that somebody else has prepared.

However, if you want to prepare audio (or any media) yourself, it helps to have input material that's much higher quality than your output format so that you can mess around with it and still have something that's at least a little higher quality than your output format.

I guess archive.org hopes that these recordings will be remixed and re-used and incorporated into future creations, as well as preserving their original form.

I feel like [1] hints at the reason:

> Finally, the digital effects used in studios to mix music benefit from the higher 24-bit resolution file for microscopic processing duties.

Using 24-bit over 16-bit keeps the door open to any post-processing the user wants to do (e.g. restoration/enhancement). I very much doubt that they consider it necessary, or even useful, for general listening.
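
To put rough numbers on that headroom: the usual back-of-the-envelope figure for linear PCM is about 6 dB of dynamic range per bit, i.e. 20*log10(2^bits). The sketch below just evaluates that formula; it is the idealized textbook figure, not a measurement of these transfers, and the function name is made up here:

```python
import math


def dynamic_range_db(bits):
    """Idealized dynamic range of linear PCM with the given bit depth, in dB."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 24):
    print(bits, "bit:", round(dynamic_range_db(bits), 1), "dB")
```

The roughly 48 dB gap between the two is margin that gain changes, EQ and de-noising can use up before the rounding errors reach the level of a 16-bit output.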

I think 16-bit is fine for basically every "enjoyment of music" use. I do have a hardware player capable of playing 24-bit audio, and I use it, but it's not some breathtaking experience.

When the source is analog (like these old records), I can see the argument for preserving them at the highest reasonable depth. The fundamental purpose of the archive is to preserve, regardless of whether or not people subjectively prefer the sound.

If the subjective experience of 24-bit is unpleasant to some people, you can always go down, but you can't add back bits that were never captured, so it's my opinion that 24 is the appropriate format here.
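
The "can't add back bits" part is easy to see with integers. This is a deliberately naive sketch: it truncates the low 8 bits, whereas real converters would normally add dither, and the sample value is arbitrary:

```python
def to_16_bit(sample_24):
    """Drop the 8 least significant bits of a 24-bit sample (plain truncation)."""
    return sample_24 >> 8


def back_to_24_bit(sample_16):
    """Scale a 16-bit sample back up; the dropped bits come back as zeros."""
    return sample_16 << 8


original = 1_234_567  # arbitrary value inside the signed 24-bit range
round_trip = back_to_24_bit(to_16_bit(original))
print(original, round_trip, original - round_trip)
```

Going down and back up leaves a sample that is close to the original but not equal to it, and nothing in the 16-bit file says what the difference was.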

Of course, from a philosophical perspective, we're still talking about two methods for reaching infinity and which gets "closer"

How would one go about removing the pops and clicks from recorded audio programmatically?

I really like some of the audio here but it needs some post processing. The only thing I can find to do it is audacity and it doesn't look very friendly to scripting.
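
For a scriptable starting point, one very basic approach is a median-based click detector: compare each sample with the median of its neighbours and replace it when it sticks out too far. This is only a sketch under simple assumptions (samples already decoded to floats, clicks only one or two samples long); the window size and threshold are arbitrary, and dedicated restoration tools do something far more refined:

```python
from statistics import median


def declick(samples, window=5, threshold=0.5):
    """Replace samples that deviate strongly from the local median."""
    half = window // 2
    out = list(samples)
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        local = median(samples[lo:hi])
        # A sample far away from its neighbourhood is treated as a click.
        if abs(samples[i] - local) > threshold:
            out[i] = local
    return out


# Hypothetical snippet with a single click at index 3.
noisy = [0.0, 0.1, 0.2, 1.0, 0.2, 0.1, 0.0, -0.1]
print(declick(noisy))
```

Only the flagged samples are changed, so the rest of the waveform passes through untouched.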

I was musing about this last night. Obviously there are standardized de-noise utilities (and I'm sure much better minds than I have given a lot of thought to those) but the fact that we have four separate recordings gives us more options.

To outline the starting assumption and desired ending points, what we have are four tracks with different needle sizes. The smaller needles tend to "wobble" in the track more (will pop/hiss more) but the larger sizes may miss fine detail (treble) that the smaller needles pick up. The assertion of the Internet Archive is that the needle that is closest in size to the one used to record the track will produce the best output, but again we are not limited to just a single track, we can programmatically combine them to produce a better output. The desired end-goal is a "clean" track with maximal spectral quality and minimal pop/hiss.

I think there are two distinct tasks here. Maximizing spectral quality and de-popping the track.

For the first task, my layman's description of "maximizing spectral quality" would be that we combine the frequency ranges that each track is "best" at. In other words the finest needle has the best treble, while the biggest needle has the cleanest bass. That might be implemented by some kind of averaging, or a weighted average (eg weight tracks that are "most different" from the average track, or from the cleanest track).
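
As a rough illustration of "take each track for the range it is best at", here is a crossover-style split rather than the weighted average mentioned above: the low end comes from the large-stylus transfer and the remainder from the fine-stylus transfer. It assumes the two transfers are perfectly aligned and uses a moving average as a very crude low-pass filter; the names and the filter width are placeholders:

```python
def moving_average(samples, width=3):
    """Crude low-pass filter: mean over a small window around each sample."""
    half = width // 2
    out = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        out.append(sum(samples[lo:hi]) / (hi - lo))
    return out


def blend(large_stylus, fine_stylus, width=3):
    """Lows from the large-stylus transfer, highs from the fine-stylus one."""
    lows = moving_average(large_stylus, width)
    fine_lows = moving_average(fine_stylus, width)
    # Highs are whatever the low-pass filter removed from the fine transfer.
    return [low + (f - fl) for low, f, fl in zip(lows, fine_stylus, fine_lows)]


same = [0.0, 0.3, -0.2, 0.5, 0.1]
print(blend(same, same))
```

If both inputs are the same track the blend gives that track back (up to floating-point rounding), which is a handy sanity check before trying it on real transfers.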

Then you de-pop the resulting track. In terms of machine learning, this should be something that is amenable to deep learning. If you train a net to identify what a "pop" or "hiss" is then you can have it directly produce a clean output, or produce a "pop/hiss track" that you can then subtract from the input waveform (same thing).

If you want something more programmatic, you could again play around with generating a "noise track" by subtracting the "clean" signal (biggest needle) from the "most detailed" signal (finest needle), perhaps also repeating this with each other track as well. The "noise track" would still have some signal inside it and you would need to apply some other method to further separate that out, but you would be working only on a portion of the total signal so in theory you would lose less detail than working on the whole signal.
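
The subtraction idea could be sketched like this, again assuming the two transfers are aligned and level-matched (both are big assumptions). The threshold and the sample values are invented for the example:

```python
def noise_track(detailed, clean):
    """Difference between the 'most detailed' and the 'clean' transfer."""
    return [d - c for d, c in zip(detailed, clean)]


def suspect_positions(noise, threshold):
    """Indices where the difference is large enough to look like a pop."""
    return [i for i, n in enumerate(noise) if abs(n) > threshold]


# Hypothetical aligned snippets: the fine stylus picked up a pop at index 2.
fine = [0.0, 0.1, 0.8, 0.1]
large = [0.0, 0.1, 0.2, 0.1]
noise = noise_track(fine, large)
print(suspect_positions(noise, threshold=0.3))
```

Everything below the threshold is still a mix of real treble detail and hiss, which is the part that would need a smarter method.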

At the end it is part of the long-standing problem of being able to differentiate "noise" (the pops and clicks) versus "data" (the audio of the recorded program).

If anything, let me tell you that "clicks&pops" can be greatly minimized simply by using a better "turntable+tonearm+cartridge" combination!!

Those pops and clicks are part of the charm!

Is there a way to stream these indefinitely on shuffle without having to pick each one manually?
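
If you already have a list of item identifiers (from a search export, for example), an endless shuffle is only a few lines. This sketch yields item page URLs in random order and reshuffles once every item has come up; it does not play anything itself, and the identifiers below are placeholders, not real items:

```python
import random
from itertools import islice


def endless_shuffle(identifiers, seed=None):
    """Yield item URLs forever, reshuffling after each full pass."""
    rng = random.Random(seed)
    pool = list(identifiers)
    if not pool:
        raise ValueError("need at least one identifier")
    while True:
        rng.shuffle(pool)
        for identifier in pool:
            yield "https://archive.org/details/" + identifier


# Placeholder identifiers, for illustration only.
ids = ["78_example-a", "78_example-b", "78_example-c"]
for url in islice(endless_shuffle(ids), 6):
    print(url)
```

Reshuffling per pass means every record is heard once before any repeats.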

Too bad it doesn't seem to be easily searched by label--from a historical perspective, it would be cool to be able to search for, say, Paramount or Gennett or Okeh.

Though it's not in the facets you can indeed search by label (here categorized as Publisher) easily enough:
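
One way such a search might be expressed in a script. The `publisher` field name follows the description above; the `georgeblood` collection name, the search endpoint and the exact query syntax are assumptions on my part, so treat this as a sketch and check the resulting URL in a browser:

```python
from urllib.parse import urlencode


def label_search_url(label, collection="georgeblood"):
    """Build a search URL for one record label (collection name is assumed)."""
    query = f'collection:{collection} AND publisher:"{label}"'
    return "https://archive.org/search?" + urlencode({"query": query})


for label in ("Paramount", "Gennett", "Okeh"):
    print(label_search_url(label))
```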


Cool, thanks.

Some very very good stuff in here. I've gotten pretty into 20s thru 50s music over the past couple of years. I usually buy compilations on LP, though, so it's a treat to find these straight off the 78s. A big portion of the stuff never even makes it to digital.

Just at a glance, I'm seeing The Light Crust Doughboys[1], basically a string band supergroup. Multiple members would go on to found famous western swing bands (Bob Wills, Milton Brown). Very proto-rock-and-roll -- listen to that electric guitar -- Elvis would cover some Western Swing numbers[2] in his early days[3].

Also seeing some older stuff, including a few recordings by the (arguably) best banjo player of all time, Vess L. Ossman[4] (from 1907). Pretty cool to listen to these march numbers and then hear them evolve into jazz/ragtime only a couple years later[5] (this is a recording by Fred Van Eps, the second best banjo player of all time, from 1914).

EDITS: seeing some other personal favorites:

Hank Penny, a favorite western swing singer of mine[6]. He usually does it hot/upbeat/fun.

Blind Blake, a guitarist who could play the fretboard like a ragtime piano[7]!

Oh, and here's the WWII-era Bob Wills I was waiting for[8]. Got that classic Leon McAuliffe pedal steel playing. No Tommy Duncan vocals, unfortunately.

Neat! An old solo Art Tatum[9]! Widely considered the best pianist of all time... And another, a whole album[10]!

Really classic early electric guitar playing on a jump blues number by T-Bone Walker[11]. I actually believe he's one of the first to use the electric guitar in blues.

Great steel guitar playing on this Gene Autry cowboy number[12].

Looks like there's a lot of Django for all you gypsy jazz fans[13]. Never heard this take on Avalon before, I dig it.

Lot more to dig through and lot of obscure stuff I'd like to give a shot, but I'm out of time for now...

1: https://archive.org/details/78_pretty-little-dear_light-crus...

2: https://www.youtube.com/watch?v=4wGCTFWhoqQ

3: https://www.youtube.com/watch?v=8bSVEA0ZAVw

4: https://archive.org/details/78_policy-king-march_vess-l.-oss...

5: https://archive.org/details/78_notoriety-rag_van-eps-trio-ka...

6: https://archive.org/details/78_get-yourself-a-red-head_hank-...

7: https://archive.org/details/78_tampa-bound_blind-blake_gbia0...

8: https://archive.org/details/78_texas-playboy-rag_wills-bob-w...

9: https://archive.org/details/78_deep-purple_art-tatum-mitchel...

10: https://archive.org/details/78_art-tatum_art-tatum-james-swi...

11: https://archive.org/details/78_t-bone-blues_les-hite-and-his...

12: https://archive.org/details/78_silver-haired-daddy-of-mine_g...

13: https://archive.org/details/78_the-quintet-of-the-hot-club-o...

No Pops? There has to be Pops:



(Though these are songs you can already find on CD or on Spotify.)

Absolutely! Was just trying to focus on artists people might not be familiar with.

One of my favorite early big-band tunes: https://archive.org/details/78_streamline_art-shaw-and-his-s...

Your recommendations would be much easier to follow if you linked them next to their context. Like this, there is an unholy amount of scrolling back and forth involved. :(

Sorry, I will change it (EDIT: unfortunately too late to change it now). I guess I didn't consider that. I typically would use inline links but HN doesn't support it as far as I know.

Just curious, what's the copyright on this kind of material?

From the project's "about" page:

"This collection has been made available for use in research, teaching, and private study only. Copyrights that may exist in these materials have not been transferred to the Internet Archive. The Internet Archive does not advise as to the copyright status of items in our collections. Our terms of use require that users make use of the Internet Archive's collections at their own risk and ensure that such use is non-infringing and in accordance with all applicable laws. It is the user’s responsibility to determine whether permission may be required for a given use of these materials, or whether such use is authorized by law."


This is one time I wish I would have read the comments first as I went and searched this information :)

Interesting. Thanks!

Copyright on music is one of the thornier areas of the law: to start with there's the dual nature of music copyrights (both the composition and the specific recording have a copyright). Possibly even more relevant here, sound recordings only came into the federal regime relatively recently in the 70s, leaving the works created before then to a hodgepodge of state laws.

The Archive is taking a fair use stance here, but as with some of their other projects there's a significant "ask for forgiveness, not permission" element at work as well.

I don't know about everybody here, but I am listening to so many things that are new to me on this archive that I'll definitely donate to the archive team today. Congratulations on this fantastic job!

Wow, this is great! I've been a serious record collector for 20 years, but never got into 78s.

My eventual life goal is to do something similar with my Brazilian record collection... I have the skeleton of such a catalog at: https://www.novedos.com/collection.

This is the crowning gem from the Internet Archive (from the 78 RPMs and Cylinder Recordings collection).

Cab Calloway, The Man from Harlem


So the obvious win here besides archiving art is that this is out of copyright sample fodder*

*IANAL and this may not be the case for all the material but I'm sure that there are mountains of inspiration to be mined.

I'm curious about something and I can't find the answer on the web site -- Why were these recordings played and digitized in stereo when the records were mono?

> Why were these recordings played and digitized in stereo when the records were mono?

Probably to allow for better post-processing in case one groove wall is in better condition than the other. They seem to be going for completeness rather than hard drive efficiency.


> This means we deliver both groove walls of 4 different stylus sizes with and without EQ for a total of 16 channels of audio. The most comprehensive presentation of 78rpm discs ever!
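
The 16 comes out as 2 groove walls x 4 stylus sizes x 2 EQ settings. A quick enumeration, using the stylus sizes and the flat/NAB wording from the notice quoted earlier in the thread (the labels are mine, not the Archive's file naming):

```python
from itertools import product

walls = ("left wall", "right wall")
styli_mm = (2.3, 2.8, 3.3, 3.8)
equalization = ("flat", "NAB")

# Every combination of stylus, EQ setting and groove wall is one channel.
channels = list(product(styli_mm, equalization, walls))
print(len(channels))
for stylus, eq, wall in channels[:4]:
    print(f"{stylus}mm, {eq}, {wall}")
```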

What did they smell of? It was really unusual. Tesco, briefly, had an own brand hand soap liquid in the 1990s with exactly the same smell.

Is it possible to search based on genre or geographic origin?

New stuff for Machine Learning. GAN. Magenta.

Thanks for the heads up. This is just amazing!

I am become The Avalanches, mixer of old songs

Very cool
