I like old-fashioned ephemera...
If the text of an older work presents problems for a lot of modern readers, I would prefer an annotated version with a glossary or even full-blown explanations of antiquated customs, concepts, and words.
And I agree, it also takes away some of the flavour of an older book.
Not to sound too negative — the project itself looks interesting and of lasting value.
I know it's not what you meant, but one thing I'd genuinely love from a project of re-typesetting old public-domain works is to ensure that any "typewriter symbols" like "->" or "* * *" are turned into their proper dingbats ("→" and "⁂", respectively.) Also, ensuring the (double and single) quotation-mark characters are used, but that scare-quotes and actual measurements (6'1", etc.) don't use them. Just, general heavy-handed replacement of ASCIIisms with proper Unicode equivalents.
Though, once you start down that road, it brings you to interesting places: if a book says "ye olde shoppe", do you want to keep the "ye", or do you want to consider it a typographical limitation for what should have been properly typeset as "þe"?
Scare quotes are an adaptation of quotation marks for use/mention distinctions, and should use proper quotation marks the same as anything else; measurements should use the prime and double prime characters.
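For what it's worth, the kind of replacement discussed above could be sketched roughly like this in Python. The function name `replace_asciiisms` and the exact patterns are my own assumptions, not anything taken from an existing toolset. It deliberately leaves quotation marks alone, since telling a quote from a measurement or an apostrophe needs more context than a simple regex has.

```python
import re

def replace_asciiisms(text: str) -> str:
    """Swap a few typewriter conventions for Unicode equivalents (illustrative only)."""
    # Feet-and-inches measurements such as 6'1" get prime and double prime.
    text = re.sub(r"(\d+)'(\d+)\"", "\\1\u2032\\2\u2033", text)
    # A line consisting only of "* * *" becomes an asterism.
    text = re.sub(r"^[ \t]*\*[ \t]\*[ \t]\*[ \t]*$", "\u2042", text, flags=re.MULTILINE)
    # The ASCII arrow becomes a rightwards arrow.
    return text.replace("->", "\u2192")
```

A real pass would need many more cases (and a human review afterwards), but this shows the shape of it.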
# pease -> peas (but "pease pudding")
# régime -> regime (but "ancien régime")
# manœuvre -> maneuver (for the Americans)
# manœuvre -> manoeuvre (for the British)
# cosey -> cozy (for the Americans)
# cosey -> cosy (for the British)
Further, they have a script to
> Try to convert British quote style to American quote style
So this is not 'standardized', this is 'Americanized'.
I was going to make a joke about how they'd probably try to get rid of the pesky 'e' at the end of Shakespeare; however, it turns out they actually have the opposite rule:
Shakespear -> Shakespeare
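As an illustration of how rules with exceptions like the ones quoted above might work, here is a minimal Python sketch using regex lookarounds. The `RULES` table and the `modernize` function are hypothetical names of mine; I haven't checked how the actual scripts implement this.

```python
import re

# Hypothetical rule table: (compiled pattern, replacement).
# Lookaheads and lookbehinds guard the "but ..." exceptions.
RULES = [
    (re.compile(r"\bpease\b(?! pudding)"), "peas"),     # but "pease pudding"
    (re.compile(r"(?<!ancien )\brégime\b"), "regime"),  # but "ancien régime"
    (re.compile(r"\bShakespear\b"), "Shakespeare"),
]

def modernize(text: str) -> str:
    """Apply each spelling rule in order and return the rewritten text."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text
```

Note that this sketch is case-sensitive and only knows the exceptions written into each pattern, which is exactly why a rule list like this ends up encoding one dialect's preferences.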
"God, my shepherd! I don’t need a thing. You have bedded me down in lush meadows, you find me quiet pools to drink from. True to your word, you let me catch my breath and send me in the right direction. Even when the way goes through Death Valley, I’m not afraid when you walk at my side. Your trusty shepherd’s crook makes me feel secure."
Don't go there.
Still, if you're not translating between languages, it's probably best to stay with the original. Compare, say, Kipling's "007". This assumes some knowledge of railroading in the steam era. "Modernizing" Kipling would be a terrible mistake. It's probably best to leave anything post-1800 entirely as original. And don't even try to "clean up" Shakespeare. It's been done.
I say this because there are some authors that I read who have revised their works many times, and sometimes I prefer an earlier version over a later revision.
Anyway, it's great to see books stored in git. Mark my words, this will be a big thing for education!
On the other hand, I'm not sure that someone who has written an Emacs minor mode to perform automatic ſ-insertion where it is grammatical to do so is necessarily best placed to speak to the general case here.
That's not old-fashioned spelling or hyphenation: that's old-fashioned typography.
A critical difference, because once you've corrected OCR errors so that you correctly have the spelling-as-in-original, but not done any “modernization”, there is no issue with long-s to address.
I want the old language warts and all. I want to learn how words were expressed in a historical context. That teaches me more than just the words themselves, that transports you to their time.
I searched the Google Groups for 'modernization' and quotes like this come up (about William Wollaston's "The Religion of Nature Delineated"):
> I've got a first draft ready, after about a month's steady work. I could use some proofreading help, in three areas in particular:
> 1. general typos (the use of the long-s in particular I'm sure has led to several)
> 2. suggestions for improved use of commas. They did this weirdly in the 18th century, and I think a new edition could do well by bringing the practice up to date, but it isn't my strong suit
... and in reply ...
> then it sounds like we'll have to do some significant spelling modernization
So they've moved from just 'tasteful' to 'significant'.
I don't see how you can draw a line for this kind of thing. But I guess I don't know very much about the painful process of digitising old books.
The original print edition of this novel contains a pathological number of commas—so much so that a modern reader would find them distracting at best and plainly ungrammatical at worst. The editors of this Standard Ebooks edition have made an effort to remove the most egregious cases of these ungrammatical commas, so that modern readers can better enjoy this unique tale.
I hope we did a good job, but if you disagree then all of the commas we removed were done in a single commit, that you can roll back: https://github.com/standardebooks/william-hope-hodgson_the-h...
You can also review the changes there to see if you agree with our judgment.
This book was a unique case. I think the only other time we've done a big editorial change like this was for Pride and Prejudice, also to remove some crazy commas, but for P&P there is a lot of precedent--other editions of Austen very often make those same kinds of changes.
What will they do with Jane Austen's inconsistent spelling, which was common at the time? https://duckduckgo.com/?q=jane+austen+spelling&t=lm&ia=web
Curious, who is the founder of this project? Interested to hear more about its background and the team behind getting this off the ground.
A couple of questions:
- How come there is no search function?
- Why are authors sorted by first name?
- Do the results of the proofreading get fed back to Project Gutenberg, et al.?
- Will readers like FBReader be able to add this catalogue?
So, sounds like a good idea and I hope it succeeds but it's not quite there yet.
There is no search function because our catalog is so small. It's growing though, so maybe it's about time to add a search function too. Sort by first name is an oversight that I hope to be able to fix later this week.
Our edits don't go back to Project Gutenberg, because our final files are so different from what PG produces merging would be impossible. We also introduce typographical and spelling changes that they might not want to accept.
Yes, FBReader and other readers that use OPDS can add our catalog: https://standardebooks.org/opds/
Edit: Just tried reading the Alice compatible epub file using Calibre 2.55 on Ubuntu 16.04 and it seems to render fine. Maybe you can send a note to our mailing list and we can discuss this in more detail so I can get things fixed?
"acabal was kind enough to respond to a comment that I made on Hacker News (https://news.ycombinator.com/item?id=14570035) and to ask me to bring the discussion here.
The problem that I mentioned was that the two epub2 files that I downloaded didn't render properly in Calibre on Debian even though they rendered properly in Ionic on Nokia N9 and FBReader on Android. It seems that the file has soft hyphens (thanks gwillen) that Calibre was rendering as tabs (or something similar).
In my comment I said that I wasn't sure whether the problem was the files or Calibre.
I have now tried another Calibre installation. This time on Linux Mint 17.3 Rosa and it works perfectly.
It seems that the version delivered by Debian doesn't render correctly. Unfortunately the version that renders wrongly is Calibre 2.5.0 on Debian whereas the one that works is Calibre 1.25 on Linux Mint. So I am still confused.
Time to download Calibre 3.0.0.
So, many thanks to everyone and good luck with the project."
Perhaps I have a different Calibre version; I'll check again. I opened the Budrys file in Calibre's editor and what I saw was that a lot of perfectly correctly spelt words were highlighted as being misspelt. I copied some text from there to Emacs and then saw that there was a hyphen (actually I'm not sure exactly what character it is but it looks like a hyphen) at the point where the reader renders a space.
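If it helps anyone else chasing the same thing, a small Python sketch like the following can name a mystery character in copied text. The helper names are made up for this example; `unicodedata.name` is the standard-library call doing the work. A soft hyphen (U+00AD) is normally invisible unless a line breaks at that point, which would fit what was described earlier in the thread.

```python
import unicodedata

SOFT_HYPHEN = "\u00ad"  # normally invisible unless a line breaks there

def describe_non_ascii(text: str) -> list:
    """Return (code point, Unicode name) for each distinct non-ASCII character."""
    found = sorted({ch for ch in text if ord(ch) > 127})
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in found]

def strip_soft_hyphens(text: str) -> str:
    """Remove soft hyphens, e.g. to compare against a reader that mishandles them."""
    return text.replace(SOFT_HYPHEN, "")
```

Pasting a suspect word into `describe_non_ascii` should settle whether it is a soft hyphen or something else entirely.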
> FBReader and other readers that use OPDS can add our catalog: https://standardebooks.org/opds/
Your site looks very pretty but I feel that it is hard to discover things. Sometimes fewer or smaller graphics can make it easier to find one's way around.
Anyway, I do understand that it is tough to find time to make everything perfect (I'm a software developer and my to do list never gets shorter).
Thanks for finding time to reply.
You may also be interested in our toolset (GNU-compatible only at the moment, we're working on converting everything to Python but we're not there yet): https://github.com/standardebooks/tools
I'm happy to answer any questions anyone has. We're also more than happy to have new contributors, if you're interested in working on and proofreading a public domain ebook that you've been meaning to get to.
Some of you have mentioned concerns about the modernizations we do. The key phrase, I think, is "light modernization". Mostly that just means bringing spelling up to modern standards, and removing a lot of hyphens in words that are no longer hyphenated. A common one, for example, is to-morrow -> tomorrow. Another one we recently added was lacquey -> lackey. Generally we leave punctuation and grammar alone. I liken this to modern books replacing the "long s" character--it's just presentation that doesn't affect the meaning. Modern readers would rather see "successful" instead of "ſucceſsful" even though the latter is what was originally printed.
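To make "light" concrete, here is a rough Python sketch of the two kinds of change just mentioned: the long s as pure presentation, and a small spelling table. The `SPELLINGS` dictionary and the `lightly_modernize` name are illustrative assumptions only, not a description of how the toolset is actually written.

```python
LONG_S = "\u017f"  # the "long s" character

# Hypothetical spelling table, limited to the examples given in this comment.
SPELLINGS = {"to-morrow": "tomorrow", "lacquey": "lackey"}

def lightly_modernize(text: str) -> str:
    """Replace the long s, then apply each spelling substitution."""
    text = text.replace(LONG_S, "s")
    for old, new in SPELLINGS.items():
        text = text.replace(old, new)
    return text
```

Plain substring replacement like this ignores capitalisation and word boundaries, so anything real would need more care plus a human pass over the result.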
I struggled for a long time with my desire to see older books with modern spelling and typography, versus preserving the intent of the author and original publishers. Over time I've come to realize two things:
1. Many books back in the day were heavily edited by the printer and publisher without the author's input anyway, so you'll get various editions over time that look totally different. Jane Austen books are a good example of this--early editions often have a pathological overuse of commas, while later editions published after her death just remove a lot of them without comment. So when we're producing our own ebooks, we accept that there's a level of editorial discretion involved, and that "the author's intent" was a very fuzzy and often totally ignored topic hundreds of years ago anyway. How can we tell what the author's intent was in the first place, if various printers and publishers have meddled with the editions for hundreds of years already?
2. For those of you who want to read the originals in their totally unedited form, other projects like Project Gutenberg or Wikisource already have those faithful transcriptions for you, and places like Internet Archive, Hathi Trust, and Google Books have the page scans for you. By lightly modernizing our own productions, we in no way diminish your access to the painstakingly-preserved digital editions; we're just adding another option for you to read.
Is there an API so that people can download the books (and help by offering a public mirror)? (example for Gutenberg:
Also, I think it might be interesting to have multiple editions in the future. It would be nice to support editions in reduced reading levels for works which are not artistic prose (such as translated prose, or philosophy). For example, in the first paragraph of On Liberty, Mill writes:
> A question seldom stated, and hardly ever discussed, in general terms, but which profoundly influences the practical controversies of the age by its latent presence, and is likely soon to make itself recognised as the vital question of the future.
This sentence has five clauses, and one parenthetical; difficult prose. The first chapter of On Liberty contains a total of 80 adverbs, ~69 uses of passive voice. It also contains a bizarre convention of referencing previous phrases with ordinals.
Are there many people who would struggle to read the original who are nonetheless sufficiently interested that they would read a 'simplified' version?
I've gone ahead and corrected the typo, thanks! :)
I've been meaning to read Don Quixote for a while now. Just downloaded it; hope I find the strength to start it.
I'm glad you picked Don Quixote, I personally took the time to transcribe the over 900 endnotes in our production from the Ormsby edition. AFAIK our edition is the only digital transcription of those endnotes available online--not even Project Gutenberg has them yet. Enjoy!
Slightly off topic: Would there be scope for a bilingual epub format for wide-screen devices displaying native text and translation side by side? This could be useful for Ancient Greek and Latin verse, Norse sagas, devotional readings of sacred texts, Hamlet in the original Klingon and even automagically machine-translated documents.
Dead-tree parallel texts have something of a market at the foreign language section of one's local book-store.
There's no subreddit but we do release announcements on the mailing list, and we have an OPDS feed you can plug in to your reading software of choice: https://standardebooks.org/opds/
People are concerned about how and why you modernise spellings and are wondering whether that could be optional - see comment by throwanem above. Can you comment on that?
Also how are you paying for this?
We're volunteer-based so there's nobody to pay but our web host, which so far has been mercifully inexpensive...
Liberated? That ephemera might actually be integral to the story and you are NOT the arbiters of intent. Please keep your modernizing out of my lit'ratur.
Pap, The Adventures of Huckleberry Finn
Personally, I would like to see our storage formats move towards more dynamic documents, so that the reader can literally flip between the original content and the "modernized" variation. Ebook readers already have integrated dictionaries, why not the discussions and interpretations integrated too? (I ask in a partially rhetorical sense; is fighting the inevitable changing of language really gaining anybody anything? Is it not possible that that "problem" is a red herring?)
I say this because I have never in my life been rendered so furious by the action of a literary editor as to have Shirley Jackson's We Have Always Lived in the Castle completely ruined for me by some importunate jackass who insisted upon putting a long, discursive deconstruction of the entire novel into a preface of an edition I incautiously happened to choose. By the time I realized what I was reading, it was too late, and I found myself unable to appreciate the actual story at all, having had it helpfully predigested for me by this ham-handed oaf who so highly valued his unique and precious insight into what Jackson was really trying to say that he put it first in the book, where it lay in the path of the not perfectly cautious reader as a beartrap in that of a cheerful weekend rambler.
I should like any such initiative as that you describe to take as axiomatic it's worth not doing that kind of thing, is what I'm trying to get across here.
I can imagine a "Do-Gooder" helpfully going through Huck Finn and modernizing all that messy vernacular but that book ain't Mark Twain's and it ain't something I'd want to read. Some folks may have trouble understanding what is being said but modernizing the text would suck the life right out of the story.
To your point, I think the ultimate prize is both the original and a modern/translated digital version. Anecdotally, I might have developed an appreciation of Shakespeare MUCH earlier in life had I known not just what the characters were saying but combined with how they were saying it.
Think of it more like modernizing spelling of Shakespeare, so that we can enjoy the text and not spend time parsing spelling like:
Had, having, and in quest, to have extreame,
A blisse in proofe and provd and very wo,
Before a joy proposd behind a dreame...
Let me not to the marriage of true minds
Admit impediments. Love is not love
Which alters when it alteration finds,
Or bends with the remover to remove.
O no, it is an ever-fixed mark
That looks on tempests and is never shaken;
It is the star to every wand'ring barque,
Whose worth's unknown, although his height be taken.
Love's not Time's fool, though rosy lips and cheeks
Within his bending sickle's compass come;
Love alters not with his brief hours and weeks,
But bears it out even to the edge of doom.
If this be error and upon me proved,
I never writ, nor no man ever loved.
O no, it is an euer fixed marke
That lookes on tempeſts and is neuer ſhaken;
It is the ſtar to euery wandring barke
Whoſe worths vnknowne, although his higth be taken...
In that example, "vnknowne" is pronounced the same as "unknown", despite having an extra "vowel" in the typography.
A mechanical modernization and standardization would seem to run a significant risk of damaging some of these, though proper manual final review would hopefully catch and revert the problematic cases.
How many syllables are in the word 'wandring'?
How many are in the word 'wandering'?
Sadly, the modernization breaks the meter and it's no longer a sonnet. I have a sneaky suspicion The Bard was deliberate in his choice of spelling...
You needn't imagine it; it's happened, more than once. For example: http://www.nytimes.com/2011/01/07/books/07huck.html?pagewant...
As someone who grew up in Hannibal, Missouri where Sam Clemens did, let me tell ya it's still a tad diff'rent from N'York.
If someone intends to "modernize" a book, particularly the dialog, they need to make it clear that it is a translation from one dialect of the language to another. They need to not credit the author with those new words and say they just fixed typography. There needs to be an "as interpreted by", and that person needs the blame by name.
Try thinking about this visually. If I modernize ANYTHING on the Mona Lisa, whose work is it? Are my brushstrokes over his really not changing anything fundamental about the work or is it just another option for viewing Leo's most famous portrait?
PS: And, of course ;-), the idea is to contribute your own designs! It's really just an HTML template with some CSS.
I see that the page is a bit slow. If you need any help to port it to a static format (for performance), please let me know.
Based on your previous choice, I suspect she's much too young for Flaubert? His writing is marvellous.
On the other hand, avoid Malot and his ilk - it's just depressing stuff.
Specifically, see our typography manual, semantics manual, and the step-by-step guide to producing an ebook (all linked from the contributors page) for details.
You can also check out our toolset, which automates a lot of what we do: https://github.com/standardebooks/tools
Thanks for all your work!
Possibly some number of the books have been vastly improved or cleaned up, but there is no way to tell them apart from the ones that are simply dups of the PG files.