

GITenberg status report - sethish
https://groups.google.com/forum/#!topic/gitenberg-project/i3gV2OjEeAQ

======
jpfr
Taking asciidoc as the master format is already far ahead of the Project
Gutenberg approach of not having a master format at all!

Now, in order to attract manpower, GITenberg needs to empower users who want
to make their favorite book "nice".

A proper table of contents, footnotes, images, formatting of tables,
quotations, prose, etc. There is so much that can be done to improve on the
current (often pitiful) state of the books. Imagine. With the master
format, this is the last time anyone needs to do this kind of work. And
anybody can read your favorite book for free. Nicely formatted and on any
device.

~~~
acabal
I'm currently working on a project that does just that. The goal is to take
texts transcribed at Gutenberg and compile them carefully against strict
typography and quality guidelines. The master format is EPUB, and texts are also
carefully annotated with semantic tags. It's not quite GITenberg, in that our
goal probably won't be to process every Gutenberg text, and the point is
also to process each ebook against an opinionated standard. The project's work
is also released to the public domain.

I've got around 40 books done and I'm 90% ready to launch the project at
standardebooks.org. Drop me a line if you're interested in learning more and
maybe contributing. Contact info is at the site in my profile.

~~~
sethish
If you're interested, I'd love to combine forces. Eric (the author of the
newsletter) joined up with GITenberg. GitHub gives a LOT of surface area for
additional contributors. And with our library initiative, we have the
opportunity to distribute our ebooks directly to libraries. Meaning your
standard ebooks can be the standard ebooks for libraries.

------
sethish
GITenberg status report

Seth started GITenberg back in September of 2012. It was pretty much a
one-person effort. Through this mailing list, a few other people started thinking
about what it could be. I discovered the project and joined up in March of
2014 when I was exploring similar ideas. The project got some good exposure on
Hacker News last August.

### Knight Foundation Grant

When I heard about the Knight News Challenge for
Libraries, I suggested to Seth that GITenberg might be a good fit. Together
with Raymond Yee, Seth and I put together a proposal. We got help from Jenny
Lee, Phoebe Espiritu, and Emily Nimsakont.

[https://www.newschallenge.org/challenge/libraries/feedback/g...](https://www.newschallenge.org/challenge/libraries/feedback/gitenberg-modern-maintenance-infrastructure-for-our-literary-heritage)

There were 676 entrants in the News Challenge, and believe it or not,
GITenberg was one of 22 entries to receive funding. We've been awarded a
$35,000 "Prototype Grant", which will allow us to spend some real development
time to start turning the idea into something that really works. More to the
point, we have a deadline (in late June!) for demonstrating the GITenberg
concept.

Now the work begins.

### Next Steps

Aside from 45,000+ repos on GitHub (a significant achievement by itself),
GITenberg has so far been more concept than reality. If you've tried to adopt a
repo and submit a pull request, you're surely aware that the GITenberg
of today is more of a sketch than a working system. To make it a working
system, we'll have to assemble a lot of cooperating components. Thankfully,
most of the components we need exist, and people are working on them. This
became very clear at the hack day sponsored by the New York Public Library in
January.

So I think it's important to make that sketch more explicit.

### Core Vision

The core vision is that for any text in Project Gutenberg, anyone will be able
to fork a repo, commit a change, and GITenberg machinery triggered by the
commit will derive ebook files and metadata products. The commit can be
submitted as a pull request, and accepted PRs will get fed back into Project
Gutenberg. We hope.

At this point, I should comment on Project Gutenberg. To fulfill its
mission, Project Gutenberg has to be very conservative in its processes and
operations. It doesn't have the resources to engage in speculative projects.
So while Project Gutenberg is enabling the experimentation we're doing
(and is happy that we're doing it), we expect that GITenberg will need to prove
itself before feeding changes back into PG becomes a real thing.

One thing that Project Gutenberg has been thinking about for years is the
source format for its texts. For a good while, that format was 7-bit ASCII
text files, and there was a lot of resistance to migrating to anything more
"modern". Now, the plain text you get from Project Gutenberg is UTF-8. Sort
of. The HTML files are maintained separately, and are not uniform; there's a
lot of hand-coding. Changing the source format to RST, XML, or TEI has been
discussed. The PG ebook files (MOBI and EPUB) are built using a script called
ebookmaker, which digests the HTML files. The HTML files are thus the "source"
files as far as the ebooks are concerned. It should be possible for us to
duplicate this workflow in the GITenberg machinery.
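
One quick way to see what "UTF-8. Sort of." means for a particular file is to test its bytes. The sketch below is only an illustration, not part of ebookmaker or any Project Gutenberg tooling, and the function name `classify_encoding` is hypothetical. It reports whether a file is pure 7-bit ASCII, valid UTF-8, or something else (typically a legacy 8-bit encoding).

```python
import sys
from pathlib import Path


def classify_encoding(data: bytes) -> str:
    """Classify raw bytes as 'ascii', 'utf-8', or 'other' (hypothetical helper).

    Pure 7-bit ASCII is also valid UTF-8, so the stricter check runs first.
    """
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid UTF-8: probably Latin-1, CP1252, or similar.
        return "other"


if __name__ == "__main__":
    # Usage: python classify.py 2701.txt 2701-8.txt ...
    for name in sys.argv[1:]:
        print(name, classify_encoding(Path(name).read_bytes()))
```

Checking ASCII before UTF-8 matters because every 7-bit ASCII file would otherwise be reported as UTF-8.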

On the metadata side, the situation is more obscure, and we're still working to
understand it. There's a set of RDF files, and there are metadata records
associated with each ebook folder.

### Book Formats

We've surveyed the components now available, and we feel that we can also
improve on the existing workflow by migrating away from HTML as a source
format. At this point, asciidoc appears to be the best fit: it can serve as the
source for all the required product files, while at the same time fitting
with the established PG text corpus and with Git-based version control.
It looks like the best choice for ebook and web formats is the HTMLBook flavor
of HTML5.
[http://oreillymedia.github.io/HTMLBook/](http://oreillymedia.github.io/HTMLBook/)
There’s a converter for asciidoc that produces HTMLBook files,
[https://github.com/oreillymedia/asciidoctor-htmlbook](https://github.com/oreillymedia/asciidoctor-htmlbook),
and there are CSS themes that support HTMLBook. We expect that alternate paths
into HTMLBook can be developed (or already exist) for LaTeX and TEI source
formats; Pandoc already covers quite a lot of this ground.
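
To make the asciidoc-to-HTMLBook idea concrete, here is a toy converter for a tiny asciidoc subset: a `= ` book title, `== ` chapter headings, and paragraphs separated by blank lines. It is an illustration only, not how asciidoctor-htmlbook works, and the name `to_htmlbook` is hypothetical. The HTMLBook conventions it leans on are a `<body data-type="book">` root and one `<section data-type="chapter">` per chapter, headed by an `<h1>`.

```python
import html


def to_htmlbook(adoc: str) -> str:
    """Convert a toy asciidoc subset to an HTMLBook-style HTML5 string.

    Hypothetical sketch: handles '= Title', '== Chapter', and blank-line
    separated paragraphs. Text before the first chapter is dropped.
    """
    title = "Untitled"
    chapters: list[tuple[str, list[str]]] = []
    paragraph: list[str] = []

    def flush() -> None:
        # Join the pending lines into one paragraph of the current chapter.
        if paragraph and chapters:
            chapters[-1][1].append(" ".join(paragraph))
        paragraph.clear()

    for raw in adoc.splitlines():
        line = raw.strip()
        if line.startswith("== "):
            flush()
            chapters.append((line[3:], []))
        elif line.startswith("= "):
            flush()
            title = line[2:]
        elif not line:
            flush()
        else:
            paragraph.append(line)
    flush()

    parts = [
        "<!DOCTYPE html>",
        '<html xmlns="http://www.w3.org/1999/xhtml">',
        f"<head><title>{html.escape(title)}</title></head>",
        '<body data-type="book">',
    ]
    for heading, paras in chapters:
        parts.append('<section data-type="chapter">')
        parts.append(f"<h1>{html.escape(heading)}</h1>")
        parts.extend(f"<p>{html.escape(p)}</p>" for p in paras)
        parts.append("</section>")
    parts += ["</body>", "</html>"]
    return "\n".join(parts)


if __name__ == "__main__":
    sample = "= Moby Dick\n== Loomings\nCall me Ishmael.\nSome years ago.\n\nNever mind how long.\n"
    print(to_htmlbook(sample))
```

A real pipeline would handle footnotes, images, tables, and much more; the point is only that asciidoc's plain-text structure maps directly onto HTMLBook's semantic sections.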

Internet Archive seems like the best destination for GITenberg produced ebook
files.

NYPL Labs has done some really nice work on generating covers for PG texts; we
expect to integrate that work as well.

On the metadata side, we've started looking at YAML as an appropriate
serialization for PG-associated metadata. Conversion to MARC and other formats
should be straightforward in the backend.
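
As a rough illustration of why that conversion should be simple, here is a sketch that maps a parsed metadata record (a Python dict, i.e. roughly what a YAML loader would return) onto a few MARC 21 bibliographic fields. The metadata keys `title`, `creator`, `language`, `subjects`, and `url` are assumptions, not an agreed GITenberg schema, and `to_marc_fields` is a hypothetical name. The output is a simplified list of (tag, indicators, subfields) tuples rather than a real MARC record; a production backend would more likely use a MARC library such as pymarc.

```python
# Simplified MARC field: (tag, two-character indicators, [(subfield code, value), ...])
Field = tuple[str, str, list[tuple[str, str]]]

# ISO 639-1 codes -> MARC language codes; a partial table for illustration.
MARC_LANG = {"en": "eng", "fr": "fre", "de": "ger"}


def to_marc_fields(meta: dict) -> list[Field]:
    """Map a hypothetical GITenberg metadata dict to simplified MARC fields."""
    fields: list[Field] = []
    lang = MARC_LANG.get(meta.get("language", ""))
    if lang:
        fields.append(("041", "0 ", [("a", lang)]))  # language code
    if "creator" in meta:
        fields.append(("100", "1 ", [("a", meta["creator"])]))  # author, surname first
    # 245 first indicator: 1 = title added entry (used when a 1XX field exists), else 0.
    ind1 = "1" if "creator" in meta else "0"
    fields.append(("245", ind1 + "0", [("a", meta["title"])]))  # title statement
    for subject in meta.get("subjects", []):
        fields.append(("650", " 4", [("a", subject)]))  # topical subject, source unspecified
    if "url" in meta:
        fields.append(("856", "40", [("u", meta["url"])]))  # HTTP link to the resource
    return fields
```

For example, a record with a title, creator, language, one subject, and a URL yields fields 041, 100, 245, 650, and 856, in that order.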

### Issues

GitHub itself has presented us with a set of challenges to address. The large
number of repos in the GITenberg organization breaks some GitHub tools. For
example, GitHub for Mac became unstable for me, and some third-party
integrations would time out when we tried enabling them. We broke our GitHub
Pages. So we need to understand this better; GitHub support has been very
responsive. There's a separate organization, "gitenberg-dev"
[https://github.com/gitenberg-dev](https://github.com/gitenberg-dev), that
we're using so we can easily work on code until we fully understand how to
work with 50,000 repos. At this point, you probably don’t want to be a member
of the GITenberg organization, but you might want to join gitenberg-dev, even
if you’re not a developer.

The non-programmer usability of GitHub is another problem. We're going to set
up a "github for poets" sandbox to see if this challenge can be addressed.

Despite the Knight grant, and the efforts of some committed volunteers, this
is still a very small effort. GITenberg can't succeed without a lot of help,
cooperation, and collaboration. I hope everyone on this list will help us
nurture that success.

Here’s something each of us can do to get the ball rolling: Decide on a
GITenberg repo to contribute to. Star it on GitHub. Then add it to the list of
active repos at
[https://github.com/gitenberg-dev/wiki/blob/master/activerepo...](https://github.com/gitenberg-dev/wiki/blob/master/activerepos.csv)
(send a PR or create an issue at
[https://github.com/gitenberg-dev/wiki/issues](https://github.com/gitenberg-dev/wiki/issues)).

If you’re new to GitHub, instructions are at
[https://github.com/gitenberg-dev/wiki/blob/add_how_to/how_to...](https://github.com/gitenberg-dev/wiki/blob/add_how_to/how_to.md).

There's a huge amount that we don't know, and so much prior work we've yet to
absorb, but we're really encouraged by all the expressions of support we've
received. Thank you all!

