I used ML to find public domain Krazy Kat comics in newspaper archives

vharuck · on June 28, 2019

This is honestly one of the best uses of ML I've read about. A large amount of boring work that isn't easily handled by a descriptive model.

jf · on June 29, 2019

And, what's more, it was easy. I hope that people are inspired to take ML further than I did; there are a lot of cool things waiting to be re-discovered in newspaper archives.

JoeDaDude · on June 29, 2019

I'm tempted to repeat this for the Little Nemo in Slumberland comic strip [1] that ran in the early 1900s. Like the Krazy Kat comics, the Fantagraphics collections of the strips are long out of print and expensive when found used.

[1] https://en.wikipedia.org/wiki/Little_Nemo

egypturnash · on June 29, 2019

The Fantagraphics collections of Nemo are nothing compared to the Sunday press collections - two amazing books the size of a newspaper broadsheet, with full-size, lovingly-restored art. They were both $100 new and prices have only gone up.

bryanrasmussen · on June 30, 2019

I was thinking The Yellow Kid https://en.wikipedia.org/wiki/The_Yellow_Kid

jf · on June 29, 2019

Yes! I was hoping that people would be inspired to do something like that and I would be delighted to help you however I can. My contact info is in my HN profile.

wodenokoto · on June 29, 2019

I’m shocked at how out of hand the price got for advanced AutoML on such a tiny dataset.

I can’t imagine the ridiculous costs if one were to train on tens of thousands of high- or even mid-res images!

jungobongo · on June 30, 2019

Made an account just to say your work is awesome. As I read the title, I thought to myself, "AI goes boink." Glad to see AI programs being used in novel ways.

P.S.: thanks for all the comics. Cheers :)

aaronbrager · on June 28, 2019

Wonder why he didn’t use the Core ML image analysis tools built into Xcode?

jf · on June 29, 2019

... I didn't know that they existed? I just ran with the first thing I could get working.

swamp40 · on June 28, 2019

Uh, why don't you publish your own $600 book now?

egypturnash · on June 29, 2019

None of these books would have been $600 when new, and IMHO none of them are worth that (and I am a comics nerd who has spent a few thousand dollars on reprints of old comics over the course of her lifetime, including the Kat). I doubt any of the bots listing these collections on Amazon for $600 even actually have a copy available to sell; there is a bunch of weird shit that happens with the prices of stuff that's out of print nowadays.

There is also a good bit of distance between "I have a bunch of scans of Krazy Kat" and "I have a file ready to send to a book printer to turn into a bunch of books of Krazy Kat". Scans need to be cleaned up, grey halftones need to be dealt with (terrible things happen to them when you scan them), and multiple sources ideally need to be checked to find the best possible copy - for instance, compare these two images of the same strip, one from HeritageAuctions.com and one from newspapers.com:

https://d2tsqpgsubhmz2.cloudfront.net/1919-02-02-comics.ha.c...

https://d2tsqpgsubhmz2.cloudfront.net/1919-02-02-newspapers....

Zoom in, look around; one is Herriman's original, with a stain it's acquired somewhere in the century that's passed since he drew it, and a scribble of non-photo blue pencil indicating an area filled with grey halftone when it ran in the paper; one is a scan of a ragged piece of newsprint, with a halftone that's closed up a lot in the printing and scanning process. Neither of these is quite what you'd want to put in a book.

I suppose one could apply more ML to automatically try to clean up all of these diverse scans into something worth putting into a book, but I feel there is a huge can of worms being opened up there.

jf · on June 29, 2019

> There is also a good bit of distance between "I have a bunch of scans of Krazy Kat" and "I have a file ready to send to a book printer to turn into a bunch of books of Krazy Kat".

I assumed as much myself. However, based on the comics that I spot-checked in my Fantagraphics hard copies, I'm no longer sure that is the case?

In particular, I'm really curious how Fantagraphics missed the comic published on 1922-10-29. (See the section titled "Figuring out dates for comics" in my write-up for details.)

> I suppose one could apply more ML to automatically try to clean up all of these diverse scans into something worth putting into a book, but I feel there is a huge can of worms being opened up there.

I, for one, hope that someone opens that can of worms!

egypturnash · on June 29, 2019

Have they been scraping the bottom of the barrel and reprinting stuff they can only find shitty scans of lately? I haven’t been getting their recent volumes, I kinda feel like I have all the Krazy Kat I need in my life right now.

I’m just gonna assume a humorous accident involving craft beer and flannel shirts for the one missing strip. :)

jedberg · on June 28, 2019

Not a terrible idea. The beauty of public domain right here. You can curate a new collection and get paid for your effort.