
I used ML to find public domain Krazy Kat comics in newspaper archives - jf
https://joel.franusic.com/krazy_kat/about/
======
vharuck
This is honestly one of the best uses of ML I've read about. A large amount of
boring work that isn't easily handled by a descriptive model.

~~~
jf
And, what's more, it was _easy_ \- I hope that people are inspired to take ML
further than I did, there's a lot of cool things waiting to be re-discovered
in newspaper archives.

------
JoeDaDude
I'm tempted to repeat this for the Little Nemo in Slumberland comic strip [1]
that ran in the early 1900's. Like the Krazy Kat comics, the Fantagraphics
collections of the strips are long out of print and expensive when found usd.

[1]
[https://en.wikipedia.org/wiki/Little_Nemo](https://en.wikipedia.org/wiki/Little_Nemo)

~~~
egypturnash
The Fantagraphics collections of Nemo are nothing compared to the Sunday press
collections - two amazing books the size of a newspaper broadsheet, with full-
size, lovingly-restored art. They were both $100 new and prices have only gone
up.

------
wodenokoto
I’m shocked how out of hand the price went for advanced auto ml for such a
tiny dataset.

I can’t imagine the ridiculous costs if one were to train on tens of thousands
high- or even mid-res images!

------
jungobongo
Made an account just to say your work is Awesome. As i read the title, thought
to myself "AI goes boink" Glad to see ai programs beings used in novel ways.

P.S: thanks for all the comic. Cheers :)

------
aaronbrager
Wonder why he didn’t use the Core ML image analysis tools built in to Xcode?

~~~
jf
... I didn't know that they existed? I just ran with the first thing I could
get working.

------
swamp40
Uh, why don't you publish your own $600 book now?

~~~
egypturnash
None of these books would have been $600 when new, and IMHO none of them are
_worth_ that (and I am a comics nerd who has spent a few thousand dollars on
reprints of old comics over the course of her lifetime, including the Kat). I
doubt any of the bots listing these collections on Amazon for $600 even
actually have a copy available to sell; there is a bunch of _weird shit_ that
happens with the prices of stuff that's out of print nowadays.

There is also a good bit of distance between "I have a bunch of scans of Krazy
Kat" and "I have a file ready to send to a book printer to turn into a bunch
of books of Krazy Kat". Scans need to be cleaned up, grey halftones need to be
dealt with (terrible things happen to them when you scan them), and multiple
sources ideally need to be checked to find the best possible copy - for
instance, compare these two images of the same strip, one from
HeritageAuctions.com and one from newspapers.com:

[https://d2tsqpgsubhmz2.cloudfront.net/1919-02-02-comics.ha.c...](https://d2tsqpgsubhmz2.cloudfront.net/1919-02-02-comics.ha.com-17113-17707.jpg)

[https://d2tsqpgsubhmz2.cloudfront.net/1919-02-02-newspapers....](https://d2tsqpgsubhmz2.cloudfront.net/1919-02-02-newspapers.com-457807708.jpg)

zoom in, look around; one is Herriman's original, with a stain it's acquired
somewhere in the century that's passed since he drew it, and a scribble of
non-photo blue pencil indicating an area filled with grey halftone when it ran
in the paper; one is a scan of a ragged piece of newsprint, with a halftone
that's closed up a lot in the printing and scanning process. Neither of these
is quite what you'd want to put in a book.

I suppose one could apply more ML to automatically try to clean up all of
these diverse scans into something worth putting into a book, but I feel there
is a _huge_ can of worms being opened up there.

~~~
jf
> There is also a good bit of distance between "I have a bunch of scans of
> Krazy Kat" and "I have a file ready to send to a book printer to turn into a
> bunch of books of Krazy Kat".

I assumed as much myself. However, based on the comics that I spot checked in
my Fantagraphics hard copies, I'm no longer sure that is the case?

In particular, I'm really curious how Fantagraphics missed the comic published
in 1922-10-29. (See the section titled "Figuring out dates for comics" in my
write up for details)

> I suppose one could apply more ML to automatically try to clean up all of
> these diverse scans into something worth putting into a book, but I feel
> there is a huge can of worms being opened up there.

I, for one, hope that someone opens that can of worms!

~~~
egypturnash
Have they been scraping the bottom of the barrel and reprinting stuff they can
only find shitty scans of lately? I haven’t been getting their recent volumes,
I kinda feel like I have all the Krazy Kat I need in my life right now.

I’m just gonna assume a humorous accident involving craft beer and flannel
shirts for the one missing strip. :)

