

Show HN: Build a Kindle book of PG's essays - olasitarska
https://gist.github.com/4104455

======
phreeza
I wrote a script like this a while back and asked pg before releasing it. He
asked me not to, so I refrained.

Edit: I just checked my email, and I actually asked him about putting up the
epub, not the script. Same difference, I suppose.

~~~
reitzensteinm
I can't speak for how pg feels, but that is a _very_ different kettle of fish.

In your case, you'd have been modifying and distributing his content, rather
than providing users a tool for consuming content they have obtained directly
from pg.

Think hosting mirrored ad-free sites vs. creating AdBlock.

He might still have a problem with this (if only because of server load), and
it would have been a good idea to check first, but your experience isn't
evidence one way or another.

------
PanMan
Impressive how short this can be with the right libraries: 40 lines of code to
scrape an index, download the pages, get the right parts, and make them an
ebook.
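
To make the first step of that pipeline concrete, here is a minimal sketch of pulling the essay links out of an index page. It is not the code from the gist: it uses only the standard library, and the names `LinkCollector` and `collect_links` as well as the example.com URL are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs for every <a href=...> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def collect_links(index_html, base_url):
    """Return absolute (url, title) pairs for the links found in index_html."""
    parser = LinkCollector()
    parser.feed(index_html)
    return [(urljoin(base_url, href), title) for href, title in parser.links]
```

Downloading each page and packing the results into an ebook would follow from the list this returns.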

However, wouldn't it be more efficient if one person did this and published
the result? As it stands, everybody has to scrape PG's site. Thanks for the
code, though!

What is the licence on the articles?

------
wslh
Use lxml.html instead of BeautifulSoup

~~~
kami8845
and PyQuery. Shameless plug:
<http://doda.co/7-python-libraries-you-should-know-about>

~~~
harpb
+1 for PyQuery - it is definitely my favorite out of them all.

------
chongli
I tried running it and it chokes on "Chapter 1 of ANSI Common Lisp". I think
that's due to the link being a txt file rather than html, causing an exception
to be thrown: "Error: URL doesn't exist".
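
One way to guard against that kind of link is to filter on the file extension before fetching. This is only a sketch under that assumption; how the script was actually fixed is not shown here, and `is_html_link` and the example.com URLs are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def is_html_link(url):
    """Return True if the URL path ends in .html or .htm (case-insensitive)."""
    path = urlparse(url).path            # drops any query string or fragment
    ext = posixpath.splitext(path)[1].lower()
    return ext in (".html", ".htm")


links = [
    "http://example.com/essay.html",
    "http://example.com/chapter1.txt",   # plain-text link, skipped below
]
html_links = [url for url in links if is_html_link(url)]
```

Checking the Content-Type header of the response would be more robust, at the cost of one request per link.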

~~~
olasitarska
Fixed, thanks!

------
tangue
Those essays are available for free on his site. If you want to scrape them
for yourself, you're in a grey zone, but it will be fine. But hey, authors
deserve respect: if an author wants to publish an ebook, he will.

~~~
georgeorwell
Why am I in a grey zone for manipulating a bunch of bytes that I have
downloaded into a format I find convenient?

The bytes were distributed legally by the copyright holder.

I downloaded the bytes legally using an ISP that I paid for.

The author did not use a robots.txt indicating his wishes that I not download
the bytes using an automatic tool.

The bytes are unprotected by DRM.

I have no intention to distribute the bytes to anyone else.

I have not broken the DRM on my ebook reader.
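
The robots.txt point can be checked mechanically before a script fetches anything. A minimal sketch using the standard library's `urllib.robotparser`; the helper name `may_fetch`, the user agent string, and the rules shown are hypothetical, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser


def may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules that block one directory for every crawler.
rules = "User-agent: *\nDisallow: /private/\n"
allowed = may_fetch(rules, "essay-bot", "http://example.com/essay.html")
blocked = may_fetch(rules, "essay-bot", "http://example.com/private/notes.html")
```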

~~~
tangue
You can scrape for yourself (or in Instapaper, but that's another story), but
sharing a script and telling others to do so is bad.

Call me old-fashioned, but I think an author should have a say in how his
works are structured and distributed.

------
riffraff
Isn't Kindle's format different from epub?

~~~
davidw
Yes, but if you email an epub to yourself, or use KindleGen, it can convert
ePubs. Under the hood, they're pretty similar.

------
sebcioz
Could you upload the result, i.e. the epub file?

~~~
dirkk0
This is the resulting ebook file:
[https://dl.dropbox.com/u/728316/Paul%20Graham%27s%20Essays.e...](https://dl.dropbox.com/u/728316/Paul%20Graham%27s%20Essays.epub)

And this is the (Kindle compatible) .mobi conversion:
[https://dl.dropbox.com/u/728316/Paul%20Graham%27s%20Essays.m...](https://dl.dropbox.com/u/728316/Paul%20Graham%27s%20Essays.mobi)

The conversion was done through this service: <http://www.2epub.com/>

~~~
paulovsk
Thank you, I always wanted to read these on my Kindle.

------
smartial_arts
Ola, this is brilliant!

~~~
olasitarska
Thanks ;) It's a rather simple script, but I've always wanted to read PG's
essays on my Kindle.

------
denzil_correa
Super cool stuff. :-)

~~~
olasitarska
Thanks! :)

