

Project Gitenberg - mdturnerphys
http://gitenberg.github.io/

======
transfire
The idea has a lot of merit. So for that two thumbs up. But I would much
rather see a separate website for it. Using Github feels very strained.
Perhaps Github would be willing to help set you up with your own instance of
their platform which you could modify to better suit the purpose. Maybe even
Project Gutenberg would be interested in participating in that.

BTW, I recently learned that Gutenberg was _not_ his name and is really a
significant historical inaccuracy. His name was Hannes Gensfleisch.
"Gutenberg" was just one of the places his family resided.

~~~
a3_nm
> Perhaps Github would be willing to help set you up with your own instance of
> their platform which you could modify to better suit the purpose.

Alternatively, you could use, and customize, an open-source self-hostable
alternative, such as gitorious
[https://gitorious.org/gitorious](https://gitorious.org/gitorious)

> BTW, I recently learned that Gutenberg was not his name and is really a
> significant historical inaccuracy.

Do you have any source for this fact that Gutenberg wasn't using "Gutenberg"
as his last name?

~~~
csandreasen
I don't think the 'Gutenberg' name could be described as a historical
inaccuracy, but there is some truth to it. From Wikipedia:

 _Wallau adds, "His surname was derived from the house inhabited by his father
and his paternal ancestors 'zu Laden, zu Gutenberg'. The house of Gänsfleisch
was one of the patrician families of the town, tracing its lineage back to the
thirteenth century." Patricians (aristocrats) in Mainz were often named after
houses they owned. Around 1427, the name zu Gutenberg, after the family house
in Mainz, is documented to have been used for the first time._

------
DavidAdams
My biggest question is this: did the idea for this originate with the pun, or
did they think up the great pun afterward?

~~~
sethish
Idea first, pun after. But thanks!

------
prosody
What advantage does this offer over Project Gutenberg's own Distributed
Proofreaders[1]?

[1] [http://pgdp.net](http://pgdp.net)

~~~
fernly
As a long time DP contributor, I'd say the models could not be more different.
DP is elaborately and deeply organized around the model of a great many people
doing a great many small units of work (stop by once a day and proofread just
one page for very specific things). This minimizes the responsibility any one
person needs to feel for a book. And like the citizen science projects at
zooniverse, individual mistakes are corrected in a multi-pass process where
multiple people (5 or 6) see every page.

The new project follows the Github programming model, which means if a person
wants to contribute to a book she has to clone the book, make local changes,
push, issue a pull request. This is far, far more complex than stopping by
pgdp for a ten-minute proofing session. Very few of the drive-by proofers at
DP could manage that technically, or would want to be that involved.

Most importantly, it lacks the QC inherent in the DP model of having multiple
later reviewers catching earlier errors. Who will do line-by-line vetting of
the accuracy of pull requests? Besides the inevitable detail mistakes, there
are potential problems similar to those faced by Wikipedia: who will even
notice if some local zealot decides to insert editorial comments in, or to
bowdlerize the language of, some classic?

It's true that DP's work ends when the book is posted to PG, and that PG has
only a feeble update method (email to their errata contact). I could certainly
see something like this project as an adjunct to PG, dedicated to continually
refreshing the library, but with editorial control over which books go into
it.

~~~
sethish
Oh yes. GITenberg is not a replacement for DP in the slightest. New books
aren't likely to come out of GITenberg as it currently exists. That is what
distributed proofreaders is for.

At some point, I would like to investigate DP tools and see if there is
something I could contribute.

------
kbar13
One thing I would like to see out of this project is a better version control
system for prose. Git is great for code, but it's not good at all for
editing text.

~~~
spain
+1, in code the basic unit is usually a line, but in prose it is the sentence.
I've tried using git with LaTeX and it always ended up with weird situations
where you had to put each sentence on a different line to make it work
effectively.
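
The one-sentence-per-line workaround described above can be automated before committing. This is a minimal sketch, not a robust tool: the function name `one_sentence_per_line` is made up for illustration, and the splitting rule is a naive regex that will break on abbreviations such as "e.g." followed by a capital letter.

```python
import re

# Naive sentence boundary: end punctuation, then whitespace, then something
# that looks like the start of a new sentence. Abbreviations will fool it.
_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z"\'(])')


def one_sentence_per_line(paragraph: str) -> str:
    """Reflow one paragraph so that each sentence sits on its own line."""
    # Collapse the hard-wrapped lines into a single run of text first.
    text = " ".join(paragraph.split())
    return "\n".join(_BOUNDARY.split(text))


if __name__ == "__main__":
    wrapped = (
        "Git tracks changes line by line. When a paragraph is\n"
        "hard-wrapped, one edit reflows every line after it. Putting\n"
        "each sentence on its own line keeps the diff small."
    )
    print(one_sentence_per_line(wrapped))
```

With the text stored this way, editing one sentence touches one line, so a line-based diff stays readable.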

~~~
azernik
I think this could be made easier by using the git database as a substrate -
there's nothing in that that's tied to line-oriented files (maybe the
compression algorithms make some assumptions?).

It's the diff-viewing infrastructure that needs to be completely replaced for
prose.
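
To illustrate the point about the object database: git stores a file as a "blob", which is a short header plus the raw bytes, addressed by a hash of both. A small sketch of the SHA-1 blob ID computation follows. The helper name `git_blob_id` is an assumption for this example, and repositories created with the newer SHA-256 object format would hash differently.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with this content."""
    # The hashed object is "blob <size>\0" followed by the raw bytes.
    # Nothing here inspects newlines: the content is opaque to the store.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Should agree with: printf 'One sentence. Another sentence.\n' | git hash-object --stdin
    print(git_blob_id(b"One sentence. Another sentence.\n"))
```

Since the ID depends only on the bytes, the line-oriented behaviour people run into comes from the diff and merge layers on top, not from storage.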

~~~
davvid
Indeed. git diff --color-words will often do the trick for prose, but the real
solution is plugging a sentence-oriented diff viewer into git difftool. git
difftool is extendable by setting git config variables, so it's very much
doable.
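
As a rough idea of what such a sentence-oriented viewer could look like, here is a sketch built on Python's standard `difflib`. The script name `sentdiff.py`, the function names, and the naive sentence splitter are all assumptions for illustration, not an existing tool.

```python
import difflib
import re
import sys


def sentences(text: str) -> list:
    """Split text into sentences with a naive punctuation-based rule."""
    flat = " ".join(text.split())  # ignore how the prose was hard-wrapped
    return [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]


def sentence_diff(old: str, new: str) -> list:
    """Return unified-diff lines where the unit of change is a sentence."""
    return list(
        difflib.unified_diff(
            sentences(old), sentences(new),
            fromfile="old", tofile="new", lineterm="",
        )
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    # Called as: sentdiff.py OLD_FILE NEW_FILE
    with open(sys.argv[1], encoding="utf-8") as a, \
         open(sys.argv[2], encoding="utf-8") as b:
        print("\n".join(sentence_diff(a.read(), b.read())))
```

It could then be registered with something like `git config difftool.sentdiff.cmd 'python /path/to/sentdiff.py "$LOCAL" "$REMOTE"'` and run with `git difftool --tool=sentdiff`, where the tool name and script path are placeholders. Because the text is re-split into sentences first, re-wrapping a paragraph produces no diff at all.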

------
ldng
That's a great step. I was toying with a related idea last week actually. To
me the next great step would be to create a framework/tools around that to
help translation of those ebooks.

What often happens is that editors have one translation of a book, say Les
Misérables, and keep reprinting the same translation regardless of its
quality. So I was thinking that a GitHub-like platform to foster translation
would be a great idea. Looks like gitenberg might be just the project for
that.

But maybe it should pick a clone (gitlab ?), self host and fork/extend that
tool to ease the use so that non-developers could use the site without git
knowledge. Then again, tailoring it for translation might not be needed.

~~~
guynamedloren
> But maybe it should pick a clone (gitlab ?), self host and fork/extend that
> tool to ease the use so that non-developer could use the site without git
> knowledge

We're on the same wavelength here. I forked GitLab to make a 'github for
writers'. Still backed by git repos, but a simplified web UI for less
technical users. If you're interested in working together, let's chat.

~~~
kaoD
Hey, I was writing a very long comment about GitBook, Penflip, Leanpub,
Softcover... then found your profile and realized you're Penflip's owner.

I'll still reproduce the comment here in case you can get some insights or
anyone else is interested.

---

I've been researching Markdown+Git book publishing (inspired by 'Markdown
to Ebook'[0]) and found that there are already three 'GitHubs for writers':
GitBook[1], Penflip[2] and Arturo.io[3]. Each has its own strengths and
weaknesses:

# GitBook

Just a publishing platform backed by Git.

## Like:

- Standalone app.

- It is a bookstore.

- Publishes to major stores.

## Dislike:

- Ugly MSWord-like typesetting.

- There's no "social collaboration" at all, seems like it's just backed by
Git. Not sure if small-scale collaboration (couple authors) is supported in
the app or you have to deal with Git complexities yourself.

- Seems technical-oriented. No fiction categories.

# Penflip

Penflip seems to fit your idea more _(note: now I know why it fits your idea
:P)_ , being collaborative like GitHub.

## Like:

- Collaborative.

## Dislike:

- It has no integration with bookstores.

- It's not a bookstore and you can't discover books easily.

- Looks more like a "free books" platform using 'free' in the FSF sense.

- It's hard to find a complete book to peek into, but the output seems to be
just as ugly as GitBook's. AFAIK they let you customize output, but the
typesetting is not LaTeX-like and I suspect it won't be up to the job.
Defaults are very important: it should be beautiful right out of the box.

# Arturo.io

Arturo.io's page is currently down (some cert error). It looked just like a
bunch of webhooks for GitHub. Seemed immature and not for less technical users
(still requires Git/Hub knowledge).

---

As non-Git alternatives I found Leanpub[4] and Softcover[5].

# Leanpub

Social bookstore and publishing platform.

## Like:

- Lean Publishing.

- The author-reader interaction is awesome.

- Their PDF output is beautiful.

- It's a bookstore (90% royalties!) with social-network aspects.

- Makes it really easy to create bundles, sharing royalties with other
authors, etc. Really awesome feature.

- Tools for marketing are awesome. Integrated with Google Analytics.

## Dislike:

- I can't download their toolchain (but a local workflow is somewhat
reproducible following 'Markdown to Ebook'[0]).

- Does not rely on Git, but on Dropbox. No proper version control.

- No collaboration.

- Does not publish to major bookstores (but allows you to do so).

# Softcover

Major Leanpub competitor. In the publishing aspect seems to be pretty much the
same, but their store philosophy is different. Their aim is not to be a
bookstore but just a payment processor. You deal with your own marketing, set
up your own domain.

## Like:

- I can download their toolchain. As far as I can tell, I could self-publish
not using their platform without hassle. Not being tied to a provider is a
_HUGE_ selling point for me.

- AFAICT still supports Lean Publishing with their generated landing pages.

- Their PDF output is beautiful.

- It's a book payments processor. 90% royalties!

- Lets you control your own marketing, domain, etc. (has downsides)

## Dislike:

- Since you control your own marketing there are no social network aspects.
Each book is supposed to be its own page. No bookstore. No way to explore and
discover other books.

- A bit too technical.

- DIY version control, no integrated collaboration.

- Does not publish to major bookstores (but allows you to do so AFAICT).

Even though I'd love a Git-backed workflow I'll stick with one of Leanpub or
Softcover because of how beautiful they look. I can still Git it myself. Major
selling point for me and the non-techie friends I've been talking to.

The bookstore integration in both is a big selling point too!

I'm still considering Leanpub since I can replicate their toolchain and it
seems so easy and powerful for my non-tech friends. Letting users discover
your book in the bookstore is really useful.

--- EDIT:

Now that I know you're from Penflip I will summarize:

I can see you're not a competitor with publishing platforms. As far as I can
tell, you're more like GitHub, in the private-repo business instead of taking
a cut from sales. Penflip seems great for social collaborative stuff, but I
wouldn't choose it if I planned on selling my book.

As I said, typesetting is very important. Your platform is awesome, but the
rendering really put off my friends. Penflip books look like HTML rendered to
a PDF (which I guess they actually are). Did you consider moving to
LaTeX-based rendering for PDFs? Markdown -> LaTeX -> PDF is the way to go.

Git is a great selling point, but secondary. Book authors just don't know it
yet, even though it's one of those features that you just love when you try.

---

[0] [https://leanpub.com/markdown-to-ebook](https://leanpub.com/markdown-to-ebook)

[1] [https://www.gitbook.io/](https://www.gitbook.io/)

[2] [https://www.penflip.com/](https://www.penflip.com/)

[3] [https://arturo.io/](https://arturo.io/)

[4] [https://leanpub.com/](https://leanpub.com/)

[5] [https://www.softcover.io/](https://www.softcover.io/)

~~~
sethish
This is a fantastic overview of this part of the publishing space. GITenberg
has a mailing list, and needs to start collecting breakdowns like this. I
would love it if you would join us:
[https://groups.google.com/forum/#!forum/gitenberg-project](https://groups.google.com/forum/#!forum/gitenberg-project)

~~~
kaoD
Thanks for your kind words, I recently researched the topic and thought
someone might benefit from it.

I fail to see how this could be useful for GITenberg though. Do you intend
to publish the books or automate the publishing perhaps? If so, as far as I
can tell GITenberg files are not structured, and won't lend themselves easily
to automated publishing.

I guess the great thing about GITenberg is that anyone could make their own
structured .md version and submit a pull request. It would be cool to have
some automation to generate and release nice PDFs when an .md file is
available.

------
chrisballinger
Congrats Seth! I had to unfollow you while you were making all those repos
because it clogged my feed.

GitHub should really put some work into improving their feed algorithm so one
project can't just clog it all.

~~~
sethish
In retrospect, I didn't need to make 80k+ commits with my own account.

------
FesterCluck
Has any consideration been given to works which may start in this platform? My
wife is an aspiring author, and we'd like more information. I'm sure there are
many topics to cover, and we're interested in hearing all of them. However, I
specifically wonder about the adoption of open source licensing to such works.

Thanks.

~~~
chippy
I imagine it working similar to the original Project Gutenberg site:
[http://www.gutenberg.org/](http://www.gutenberg.org/)

For example, known books by established publishers, but with a self-publishing
arm [http://self.gutenberg.org/](http://self.gutenberg.org/)

------
atheken
This is interesting, but I am not sure that I would have done it with multiple
repos. Why not build a single repo with a convention for adding/updating
works? As it sits right now, there are 2100+ _pages_ of repos. It also means
that in order for me to contribute to more than one of these, I'll need to
pollute my own account with multiple forked repos.

~~~
atheken
From another perspective, one repo should allow you to gain more traction as
all stars/forks/pull requests/commits will be aggregated on it, and thus
produce higher visibility on GitHub (and probably anything that scans github
stats).

Additionally, using a single repo would allow me to fork and specify my own
styles that I want applied to any work I "compile", and these might be hyper-
specific.

I'm actually willing to help consolidate these repos if you're willing to go
in this direction. I'd also like to hear reasoning for multiple repos if
there's something I'm missing.

------
dredmorbius
Nice, but NB that page is _REALLY_ hard to read.

    
    
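        body {
            color: black;                      /* full-contrast text */
            font-weight: normal;               /* avoid the thin/light weight */
            font-family: Verdana, sans-serif;  /* generic fallback if Verdana is missing */
        }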
    

Helps a lot from my experience.

~~~
sethish
Gah! This website was thrown up quickly. I wasn't intending to post to HN
until after I had fixed up the website. Someone beat me to it :-S

Pull requests welcome:
[https://github.com/GITenberg/gitenberg.github.com](https://github.com/GITenberg/gitenberg.github.com)

~~~
mdturnerphys
Sorry about that :-/ A librarian friend posted it yesterday and I thought it
worth sharing here. I would have held off if I'd known the creator was on HN.
I do feel guilty about racking up all this karma.

------
lucb1e
For anyone else who finds the font too thin and light to comfortably read,
this helps:
[https://readability.com/bookmarklets](https://readability.com/bookmarklets)

------
alessiosantocs
I really love the idea behind this! I think it's a way to disrupt the book
industry with all those publishing firms. What's powerful about this is that
every person could be heard and her book could easily spread around the globe.

I found [https://www.penflip.com/](https://www.penflip.com/) a few months
ago... It isn't focused on building a digital library yet, but what I like
about this project is the good execution. It would be nice to merge them
together!

------
ryanackley
I like the idea of git for ebooks. That being said, a lot of the free books
available from project gutenberg have been around for quite some time.

Besides translations, what can people besides the author contribute? Doesn't
it, on some level, ruin the character of these books? If you look at a non-
fiction book from 80 years ago, is it worth bothering to correct the
information when you can probably find it at your fingertips on wikipedia?

~~~
gavinpc
Like others, I am doubtful that this is the best way to go about it.

But to answer your question, the main area where I've found Project
Gutenberg's epubs could be improved is in their navigation outline (the
toc.ncx file). For example, they often use top-level headings for each line
from the title page, then put the entire book under the last line. Whereas
other books are closer to what you'd expect, albeit at inconsistent levels of
detail. For my project, I abandoned their TOC's altogether and created a
simpler format.

The images are also at a bare minimum of resolution. In some cases,
higher-quality versions are available in the public domain (such as on
Wikimedia Commons). Most of the books are also scanned on archive.org, and so
can be referenced there in facsimile. These tend to be higher-resolution scans
(although all the ones I've seen are monochrome).

For corrections in the works proper, I have occasionally submitted corrections
by email but never received a response.

Otherwise, they are perfect, and I thank them for their outstanding work.

 _EDIT_ : There are also rare cases (I think Seneca was the one I came across)
where the id's are not unique across the book, even if they are within the
HTML files. I couldn't find anywhere in the EPUB specification that would
require this, yet for practical purposes I think they should be made unique
across the book, since the division into HTML files is arbitrary.

Further to that, there are some PG books that have a unique (serial) ID on
every paragraph. Again, this is not required, but it's extremely helpful when
it's there (for anchor referencing). It would make the whole library more
usable if this were applied consistently, and the serial id's are apparently
mechanically applied.
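
A rough sketch of how such serial ids could be applied mechanically, and kept unique across the whole book rather than per file. This is an assumption about one possible approach, not how Project Gutenberg does it: the function name `add_paragraph_ids` and the `p00001` id format are invented here, a regex stands in for a real XHTML parser, and no check is made that a generated id collides with one already present.

```python
import re

# Matches an opening <p> tag and captures its attribute string.
# The \b keeps <pre>, <param>, etc. from matching.
_P_TAG = re.compile(r"<p\b([^>]*)>")
# An existing id attribute (but not data-id or similar).
_HAS_ID = re.compile(r"(?<![\w-])id\s*=")


def add_paragraph_ids(files: dict, prefix: str = "p") -> dict:
    """Give every <p> lacking an id a serial one, unique across all files.

    `files` maps file name to XHTML text; a new dict is returned.
    """
    counter = 0

    def tag(match):
        nonlocal counter
        attrs = match.group(1)
        if _HAS_ID.search(attrs):
            return match.group(0)  # keep ids that anchors may already use
        counter += 1
        return '<p id="%s%05d"%s>' % (prefix, counter, attrs)

    # One counter runs over all files, in name order, so numbering follows
    # reading order as long as the file names sort that way.
    return {name: _P_TAG.sub(tag, files[name]) for name in sorted(files)}
```

Because the counter is shared, the ids stay unique even if the book is later re-split into a different set of HTML files.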

------
sethish
If folks are interested in contributing, the mailing list is here:
[https://groups.google.com/forum/#!forum/gitenberg-project](https://groups.google.com/forum/#!forum/gitenberg-project)

------
gluejar
One obvious need is for a build system that makes ebook files out of the git-
managed source. And what should our source be, anyway?

------
fiatjaf
Why don't you add some kind of index/search?

~~~
sethish
Because a parser for the original Project Gutenberg metadata is time-consuming
to write. I wasn't going to submit it to HN until I had an index/search API,
but someone beat me to it.

~~~
arafalov
But what/where is the metadata? Is it functionally equivalent to
Gutenberg's info (e.g. in the RDF dump), or something else?

I was looking to write an alternative search for Gutenberg, based on the RDF
dump, so would be happy to collaborate/discuss ideas.

~~~
sethish
Yep. The RDF/XML data. I have a mirror of it on github:
[https://github.com/sethwoodworth/PG_rdf_metadata](https://github.com/sethwoodworth/PG_rdf_metadata)

I would love to have a complete python parser for the metadata. I strongly
recommend collaborating with the Gutenberg package posted to HN a few weeks
ago (and his rdf branch):
[https://github.com/c-w/Gutenberg/tree/migrate-to-rdf](https://github.com/c-w/Gutenberg/tree/migrate-to-rdf)

GITenberg has a mailing list and would love to have you!

[https://groups.google.com/forum/#!forum/gitenberg-project](https://groups.google.com/forum/#!forum/gitenberg-project)
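
For anyone wondering what a starting point for such a parser might look like, here is a minimal sketch using only the standard library. The `SAMPLE` record is a hand-written, heavily simplified stand-in for a real file from the RDF dump, and its element layout is an assumption; real records carry many more fields and may differ in detail. The function name `parse_record` is also made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes assumed for this sketch.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dcterms": "http://purl.org/dc/terms/",
    "pgterms": "http://www.gutenberg.org/2009/pgterms/",
}

# Simplified, hand-written record (not copied from the actual dump).
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:pgterms="http://www.gutenberg.org/2009/pgterms/">
  <pgterms:ebook rdf:about="ebooks/1342">
    <dcterms:title>Pride and Prejudice</dcterms:title>
    <dcterms:creator>
      <pgterms:agent>
        <pgterms:name>Austen, Jane</pgterms:name>
      </pgterms:agent>
    </dcterms:creator>
  </pgterms:ebook>
</rdf:RDF>"""


def parse_record(rdf_xml: str) -> dict:
    """Extract the ebook id, title and author names from one RDF/XML record."""
    root = ET.fromstring(rdf_xml)
    ebook = root.find("pgterms:ebook", NS)
    about = ebook.get("{%s}about" % NS["rdf"])  # e.g. "ebooks/1342"
    names = ebook.findall("dcterms:creator/pgterms:agent/pgterms:name", NS)
    return {
        "id": about.rsplit("/", 1)[-1],
        "title": ebook.findtext("dcterms:title", default="", namespaces=NS),
        "authors": [n.text for n in names],
    }


if __name__ == "__main__":
    print(parse_record(SAMPLE))
```

A search index could then be built by running something like this over every record in the mirror and collecting the resulting dicts.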

------
Taylorious
I don't understand the weird obsession with Git. It's a version control system,
not the cure for cancer. Anytime someone shoe-horns it into a product they
talk about how Git is so amazing and solves all these problems, but what they
are really talking about is just a version control system, not Git
specifically.

Using Git for just about anything other than what it was built for is a
terrible idea. I mean the underlying system is incredibly powerful and could
be useful in various projects, but the interface is horrific. I swear it's like
someone tried to make Git as difficult as possible to use. Programmers have a
hard time understanding and using Git, non-programmers will just laugh and
walk away. Every time a programmer has an issue with Git, whoever helps them
has to sit down and explain the underlying system for 20 minutes and draw a
bunch of sticks and bubbles. Non-programmers will never put up with this.

~~~
hhsnopek
> Every time a programmer has an issue with Git, whoever helps them has to sit
> down and explain the underlying system for 20 minutes and draw a bunch of
> sticks and bubbles.

This isn't true at all for a lot of people. I know a lot of people that just
read the docs and are able to solve the issues. Others will Google the problem
and find the solution on stack overflow. Everyone learns differently...

> Anytime someone shoe-horns it into a product they talk about how Git is so
> amazing and solves all these problems, but what they are really talking
> about is just a version control system, not Git specifically.

Git is amazing and does solve a lot of problems, but there are problems that
aren't solved by Git. Even Linus himself says this here:
([https://www.youtube.com/watch?v=4XpnKHJAok8](https://www.youtube.com/watch?v=4XpnKHJAok8)).

Using the github API, rather than git, for creating epub books and pdfs is
great. Using git to control changes as they do is perfect as well.

> Non-programmers will never put up with this.

Ermm don't assume that everyone gives up right away. With the GUI interfaces
we have today, Git is really simple once you learn it.

~~~
recursive
> Git is really simple once you learn it.

Pretty much everything is simple once you learn it. That's what learning is.
But git certainly doesn't go out of its way to make that process easy.

~~~
Dylan16807
>Pretty much everything is simple once you learn it.

I wouldn't say so. A lot of things are designed-by-committee implemented-by-
the-lowest-bidder messes that are painful and complex even once you know how
they work.

Git may have some weird design decisions but for the most part it's well-
implemented and follows a simple conceptual model.

