

Thousands of early English books released online to public by Bodleian Library - diodorus
http://www.bodleian.ox.ac.uk/news/2015/jan-27

======
ommunist
Not exactly released. And not exactly to public. As the site says "Please note
that it is very rare for us to set up free trial access for individuals.
Individual pricing is not available and all trial requests will be considered
on a case-by-case basis."

~~~
tfgg
It seems to be available to the public
[http://www.bodleian.ox.ac.uk/eebotcp/](http://www.bodleian.ox.ac.uk/eebotcp/),
and on github at
[https://github.com/textcreationpartnership/](https://github.com/textcreationpartnership/).

That first site is remarkably poorly designed for actually finding the
information, a common theme I find for websites created by librarians. There
should just be a box at the top listing the download links!

Web developers working for libraries: far too often I visit a site and am
confronted by acres of text explaining what the project is, who is involved,
how to enter data into forms, and all sorts of hand-holding, BUT NOT THE ACTUAL
DATA! Usually there's some link hidden in the least visible part of the
page, like the lower right-hand side, that actually lets me get started. Does
Facebook greet you with paragraphs of text, a welcome message from Mark, and
explanations of what Facebook is and how to use it, or does it just let you dive
in and get started?

edit: And despite all that text it doesn't explain what phases I and II are.

I should say this is an amazing bit of work, and it's really important that
it's being released into the public domain; it's a good sign of the direction
libraries are taking. It's just a little frustrating that the final delivery
step is so obfuscated.

~~~
ommunist
I would like to draw your attention to this: "You don't have permission to
access /cgi/t/text/text-idx on this server. Apache/1.3.39 Server at
eebo.odl.ox.ac.uk Port 80"

------
idiotclock
One of the problems with searching these books is that there is no standardized
spelling in early modern literature.

There is a project at McGill called DREaM that standardizes spelling for
"distant reading" (macro-analysis).[1] It uses a program called VARD (a text
preprocessor trained to correct spelling).[2]

Strangely, this application is licensed under Creative Commons. I think
this means that it is closed source. Does anyone know of any open-source
alternatives?

It also cannot handle such an immense amount of data.[3]

[1] [http://earlymodernconversions.com/introducing-dream/](http://earlymodernconversions.com/introducing-dream/)

[2]
[http://ucrel.lancs.ac.uk/vard/about/](http://ucrel.lancs.ac.uk/vard/about/)

[3] [http://www.matthewmilner.name/2014/11/18/VARD-and-EEBO-TCP/](http://www.matthewmilner.name/2014/11/18/VARD-and-EEBO-TCP/)

~~~
sp332
There are lots of different CC licenses. None of them are closed-source,
although there are some which don't allow modification (No-Derivs / ND).
Others disallow copying for commercial use (NonCommercial / NC), and a couple
allow almost anything (Attribution / BY, No Rights Reserved / CC0).
[https://creativecommons.org/choose/](https://creativecommons.org/choose/)

~~~
dragonwriter
> There are lots of different CC licenses. None of them are closed-source,
> although there are a few which don't allow modification.

Licenses that don't allow modifications are closed-source. At least, they are
inconsistent with the Open Source Definition (specifically, with criterion #3:
"The license must allow modifications and derived works, and must allow them
to be distributed under the same terms as the license of the original
software.")

[http://opensource.org/osd-annotated](http://opensource.org/osd-annotated)

~~~
sp332
Oh right, I was just thinking of open publication.

------
walterbell
15 years of work! Well done to those who funded and performed this extensive
effort.

Files are hosted by github and box. Will Internet Archive be included?

For marketing this accomplishment, a few choice examples of long-inaccessible
text may attract new readers.

~~~
pbhjpbhj
Presumably IA can just grab the files from Box and add them to their own archive.

~~~
walterbell
Not sure how IA works, but casual observation suggests they give upload
accounts to libraries and the cataloging is done by the library, not by IA.

------
Normati
"The text files were created by manually keying the full text of each work,
based on millions of digital facsimile page images"

!!! This is not silicon valley. I wonder how they ensure accuracy.

Link to the books [http://ota.ox.ac.uk/tcp/](http://ota.ox.ac.uk/tcp/)

~~~
th0br0
Well, maybe they used Amazon Mechanical Turk ;)

~~~
jbaiter
I know of a similar German project
([http://www.deutschestextarchiv.de](http://www.deutschestextarchiv.de));
they have two independent non-German speakers transcribe the digital
facsimiles to ensure that the transcriptions are as accurate as possible.
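
A minimal sketch of what double keying buys you, assuming a simple word-level
comparison (the actual QA workflow of these projects isn't described here):
two people key the same page independently, and any position where their
transcripts disagree is flagged for a human adjudication pass.

```python
# Compare two independently keyed transcripts and flag disagreements.
import difflib

def disagreements(key_a: str, key_b: str):
    """Return word-level spans where the two keyed transcripts differ."""
    a, b = key_a.split(), key_b.split()
    sm = difflib.SequenceMatcher(a=a, b=b)
    return [(" ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

# Usage: disagreements("the quick brovvn fox", "the quick brown fox")
# flags the ("brovvn", "brown") pair for a third reviewer to resolve.
```

The idea is that two independent keyers are unlikely to make the same mistake
in the same place, so agreement is strong evidence of accuracy.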

