Standard Ebooks (standardebooks.org)
701 points by pauloxnet 8 months ago | 153 comments



Editor-in-Chief here, happy to answer any questions!

Of interest might be my blog post on how SE runs on a small VPS using classic web tech: https://alexcabal.com/posts/standard-ebooks-and-classic-web-...

(This post is slightly out of date as there is a database now; but it's used for managing Patrons - and soon a cover art listing and approval system - not for serving the actual ebooks, which are still served as described in the post.)

Our volunteers have spent the last few months preparing a few notable books published in 1928 to be released today, Public Domain Day. Those are the top 5 books in the ebook list, starting with The Mystery of the Blue Train. Check them out!

We welcome new contributors if you'd like to work on producing a new ebook. In the next week we'll also have a brand new cover art database launched, so if you'd rather help by cataloguing new cover art for future ebooks, get in touch at our mailing list!


Hi Alex, I shared the SE link here to help with donations and I hope it's working.

Thank you for your beautiful project.

For a few years now I have been promoting SE on social media every January 1st for Public Domain Day; the thread on Mastodon is the one with the most involvement. https://fosstodon.org/@paulox/111680544393923401

It would be nice to have an SE account on Mastodon that posts about every new book published, since IMHO it's the social network most aligned with the spirit of SE.


That's great, thank you! We've had various people ask about us getting on Mastodon but frankly I really dislike social media and have only the vaguest understanding of how Mastodon works.

If we did, then someone would have to volunteer to run the account, and also the account must be able to delegate posting powers to another user without exposing the account's master password (like Tweetdeck or Facebook are able to do). If that's possible and you're interested in helping, please send me an email!


For what it's worth, they have an Atom feed - https://standardebooks.org/feeds/atom/new-releases
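If someone did want to drive a Mastodon account from that feed, the feed-reading half is small. Here's a rough sketch using only the Python standard library; the sample XML is made up for illustration (I haven't checked which fields the real feed populates), and `list_entries` is just a name I picked.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace (per the Atom spec).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def list_entries(atom_xml: str) -> list[tuple[str, str]]:
    """Return (title, link href) pairs for each <entry> in an Atom document."""
    root = ET.fromstring(atom_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        title = entry.findtext("atom:title", default="", namespaces=ATOM_NS)
        link = entry.find("atom:link", ATOM_NS)
        href = link.get("href", "") if link is not None else ""
        entries.append((title.strip(), href))
    return entries


# Hypothetical sample feed; the real feed's entries will differ.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>An Example Book</title>
    <link href="https://example.com/ebooks/an-example-book"/>
  </entry>
</feed>"""

for title, href in list_entries(SAMPLE):
    print(f"{title}: {href}")
```

Posting the result anywhere is a separate problem, and the delegation question above still applies.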


I have a suggestion: You could optimize the website to be easily readable and navigable on the Kindle's web browser, and recommend it as an option. I've often found it to be the easiest way to get non-store books on my Kindle. I've also noticed that cover images are handled correctly when the ebook is downloaded straight onto the device, with no need for a separate image file.

A hurdle for this though, is that building a good website for the Kindle browser is a pain, as the browser's support for various html/css/js features and standards is all over the place, with no debugging tools available.


I believe our website does have some basic Kindle browser support. The problem, as you noted, is that Kindle's browser is terrible.

I say the same thing in every ebook thread: On a purely technical level Kindle is a terrible ereader designed by people who seem to hate books. Buy almost anything else.


A jailbroken kindle is okay - they make an adequate PDF reader and they can be found easily for less than the alternatives, at least in Britain. I do agree they're somewhat poor when used as intended.

They're also quite a nice embedded ARM Linux machine for a lot less than I could make one or buy one from elsewhere, but I suspect that isn't the core market for a kindle...


Recommendations?


Kobo, with either stock OS or KOReader (I use this, in part because the font size can be easily increased for my daughter who so far needs text larger than stock) or Plato.


Is the build quality and backlighting as good as the Kindle? And do they have a seamless option (no notched screen)?


I no longer have a Kindle to compare, but I'm very happy with build and lighting on my Kobo Libra 2. I've used Kindles since the Kindle 1, and there are some Kobo things that I don't like as well as Kindle, but it's a better-than-decent e-reader and I'm glad to be out from under Amazon's thumb.


I've been happy with the half dozen Kobos in my house.

Weirdly, about half of them have developed a problem after about 5-7 years, whereby they intermittently stop charging. Replacing the battery doesn't seem to fix it. Might be a problem with the soldering of the USB connector to the PCB?

As a bonus they are Linux based, and you can do fun things like replace internal SD cards with bigger ones, login using telnet and install new applications.


I imagine the Kobo is high on that list.


Onyx Boox. They run full Android, which means sync with Play Books, Kindle, Libby, etc.


What’s the problem with the Kindle? I use an old Paperwhite and I have no issues reading epubs on it that I send by email.


Thank you so much for the work you and the whole team, and the contributing community, do! I've read a bunch of classics thanks to your editions, and have donated in the past. This post is a reminder for me to do so again!


Came to say the same. I have a recurring donation set up and I'm always happy to see updates and mentions of the project.


I've been eagerly awaiting the new Lord Peter Wimsey novel! To avoid burnout, I've been reading them as they enter the public domain instead of reading the whole series all at once, and I was hoping that it would be in the first batch this year. Thank you so much for your hard work!


heh, that reminds me of when I used to eagerly hunt used bookstores for anything by henry cecil (an out-of-print humorous writer). it was always exciting to find one I hadn't read before. and then his entire works got reprinted and you would have thought I would just buy and binge read the lot, but somehow the excitement went out of it and I just ended up reading a couple more. I should go back and catch up on him, actually, it's been years and years since I last read one.


The page <title> for collections could stand to lose the "Browse free ebooks in the" preamble. It makes it harder to distinguish when looking at a list of open tabs. Consider:

- "Browse free ebooks in the Encyclopædia Britannica’s Gateway to the Great Books set[…]"[1]

- "Browse free ebooks in the Modern Library’s 100 Best Novels set[…]"[2]

- "Browse free ebooks in the Modern Library’s 100 Best Nonfiction set[…]"[3]

(Indeed, the titles are even much longer than that. It feels SEO-ish; not sure why that would be a priority for a free culture project like Standard Ebooks, especially given the momentum and cachet it already has.)

Collections should also have placeholders for unavailable titles. For example, currently the "Utopian Trilogy" collection[4] contains exactly one item, in spite of the true size of the set it actually belongs to. When an item is not available because of copyright, that (along with the year in which SE will first be allowed to make its own edition available) should be made clear. Where it's unavailable because no one has yet proofed the text for an SE edition, a clear call to action can be made.

And it's seemingly minor, but on the subject of editions, I wish SE followed closer to the print tradition instead of the modern Web milieu and clearly identified its microeditions as exactly that: distinct editions of the same text. (Yes, that means there are possibly dozens (or hundreds?) of different editions, given that errors can be found after the fact and the SE house style may even change, necessitating updates. No, that's not a problem.)

1. <https://standardebooks.org/collections/encyclopaedia-britann...>

2. <https://standardebooks.org/collections/modern-librarys-100-b...>

3. <https://standardebooks.org/collections/modern-librarys-100-b...>

4. <https://standardebooks.org/collections/utopian-trilogy>


>The page <title> for collections could stand to lose the "Browse free ebooks in the" preamble.

That looks like it might be search engine optimization.


It seems like maybe it's for SEO.


I see that you use public domain images for books - do artists also contribute work from scratch (with an appropriate release)?


Nobody has offered as of yet, and if someone did I think the quality would have to be extremely high for me to consider it.


Do you happen to have a wishlist of artwork or a particular project that would benefit from custom artwork? I would like to contribute art to the project, whether it ends up used or not. I used to work as a digital artist professionally.


Sci-fi works are the hardest to find cover art for as naturally there is zero public domain sci-fi themed fine art. If you can paint in a fine art style, contact me via email and let's chat.


This is such a cool project. Every time it hits the front page I browse the selections like I’m at a book store.

Have you considered making books sortable by popularity? It might be more approachable for new users if they see books they recognize at the top.


That's a frequent request but it would also require having our catalog in a database, which we don't have right now. I do think the time for doing that is coming soon for several reasons, but there's no spare time in my day at the moment.


Perhaps there’s no need for a db? If you have basic web logs, some volunteer can find out how many times a book was downloaded etc, and use that to do a one-off “best of 2023” etc? A kind of SE Wrapped thingy?
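To make the idea concrete, here's roughly what such a one-off script could look like. It's only a sketch: it assumes the server writes Common Log Format lines and that downloads can be recognized by file extension, neither of which I know to be true for SE. The sample log lines and paths are invented.

```python
import re
from collections import Counter

# Pulls the request path and status code out of a Common Log Format line,
# e.g. ... "GET /some/path.epub HTTP/1.1" 200 1234
REQUEST_RE = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')


def count_downloads(log_lines, suffixes=(".epub", ".azw3")):
    """Count successful (status 200) GET requests for paths ending in `suffixes`."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match or match.group("status") != "200":
            continue
        path = match.group("path").split("?", 1)[0]  # ignore query strings
        if path.endswith(suffixes):
            counts[path] += 1
    return counts


# Invented log lines, for illustration only.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Jan/2024:00:00:01 +0000] "GET /ebooks/a/book-one.epub HTTP/1.1" 200 1234',
    '203.0.113.6 - - [01/Jan/2024:00:00:02 +0000] "GET /ebooks/a/book-one.epub HTTP/1.1" 200 1234',
    '203.0.113.7 - - [01/Jan/2024:00:00:03 +0000] "GET /ebooks/b/book-two.azw3 HTTP/1.1" 200 999',
    '203.0.113.8 - - [01/Jan/2024:00:00:04 +0000] "GET /ebooks/b/book-two.azw3 HTTP/1.1" 404 0',
    '203.0.113.9 - - [01/Jan/2024:00:00:05 +0000] "GET /about HTTP/1.1" 200 500',
]

for path, n in count_downloads(SAMPLE_LOG).most_common():
    print(n, path)
```

Raw hit counts would overstate popularity (bots, retries, the same reader grabbing several formats), so a real version would probably want some deduplication.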


Do you have any thoughts on providing manually pre-formatted PDF files? Em-dashes, curly quotes, etc. are all nice, it's a step in the right direction, but in the end the EPUB file needs to be interpreted by the ebook reader on the fly and in terms of typesetting quality the outcome is far from what physical books provide, since you still get orphans, weird hyphenations, ugly/misaligned chapter titles. For me, nothing beats reading a print-ready PDF file.


That's a common request but there are no plans to officially offer PDFs. We offer a variety of reflowable file formats, and each format is more burden to maintain; since PDF is a famously difficult format, maintaining it would be even more burden. A reader requiring a PDF can use a tool to convert any of our files to PDF. That's basically what we'd do at the end of the day, anyway.

There's been some mailing list chatter lately on how to best format PDF editions, but that's not being pursued on a project level.


Hi and thanks for the great work! Have you considered offering .mobi or .azw file formats of the books? With the 2023 browser update, even old Kindles now have a fast and functional web browser. It is almost possible to find and download Standard Ebooks directly from the Kindle browser, but for the file format.


We do offer azw3 files for all of our books. https://standardebooks.org/help/how-to-use-our-ebooks#kindle...


Yes, Amazon has changed the game and they only allow downloads in .AZW, .PRC, .MOBI or .TXT format now.

I understand that this is their fault and not yours, but maybe it could be interesting for you to offer one of these formats now that the Kindle browser is actually usable?


> they only allow downloads in .AZW

Do they actually check the file content or just the name?

If the latter, just a .AZW alias to the actual .AZW3 might work ...

(I can't test, don't have a Kindle nowadays)


I don't know. The Kindle browser doesn't have any copy and paste function, so it's difficult to work around any limitation.


I meant the SE site serving ".azw" files that are the same as the .azw3 - I understood that it's the Kindle browser that limits the downloads, right?

Once the files are in the Kindle, it would probably work out OK.
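Something like the following is what I have in mind, as a sketch only. `resolve_alias` is a hypothetical helper, not anything SE has, and whether the Kindle browser checks only the file name is exactly the open question above.

```python
from pathlib import PurePosixPath


def resolve_alias(requested_path: str) -> str:
    """Map a requested '.azw' URL path to the '.azw3' file that actually exists.

    Any other path is returned unchanged.
    """
    path = PurePosixPath(requested_path)
    if path.suffix.lower() == ".azw":
        return str(path.with_suffix(".azw3"))
    return requested_path


# Hypothetical paths, for illustration only.
print(resolve_alias("/ebooks/some-author/some-book.azw"))
print(resolve_alias("/ebooks/some-author/some-book.epub"))
```

In practice this would more likely be a rewrite rule in the web server config than application code, but the mapping is the same.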


Ah, I see. Great idea! I hope they see your comment.


You could probably drop the server and use Cloudflare Pages and an SSG. I use Astro for https://sabine.press/

Edit: oh and Lambda for a total of 2 server functions


Well, the point is not to jump at the new-fangled tech and AWS cloud lock-in :)


I’m curious, why do you have a policy against hosting religious books?


The site actually hosts several "religious books" (try filtering by the "Spirituality" tag -- I've even produced several books on religious topics myself for SE). What it doesn't host are "Religious texts from modern world religions" (what some might call "scriptures," e.g. the Bible or the Quran) which is a much narrower category than "religious books."

As a religious person myself, I actually think this policy is very sensible. Most (nearly all?) religious texts of major world religions were originally written in languages other than English, and so if SE were to try to host those texts the site would have to make an editorial call about which translations of those texts are the "best." That quickly enters very murky theological territory, where one side of a given religion might push for one particular translation, whereas another side would push for another translation.

To give the Bible as an example, Catholics and Orthodox Christians include the deuterocanonical books (e.g. Tobit, Judith, Sirach) in their canons whereas Protestants exclude these. Would the SE version of the Bible include these? Some American fundamentalist Christians claim that the King James Version is the only valid English translation of the Bible, whereas the Revised Version (also available in the public domain) is based on more reliable Greek manuscripts. But some conservative Christians reject the Revised Version and its descendants based on certain theological premises...

Do you catch my drift? IMHO it's very sensible for SE to avoid these sorts of debates entirely by sticking to books where you could argue (with some degree of handwaving) that there really is a "best version" :)


> Most (nearly all?) religious texts of major world religions were originally written in languages other than English, and so if SE were to try to host those texts the site would have to make an editorial call about which translations of those texts are the "best."

Is there a technical reason to disallow multiple translations of the same text? I can see on the "wanted ebooks" page a number of translated titles[0]; so the project does seem to make editorial decisions about which translations to work on. Obviously, where one translation exists, there may be others that have other advantages.

[0] - https://standardebooks.org/contribute/wanted-ebooks


We try to pick the “best” translation that’s in the public domain in the US. Quite often, that’s a single translation unfortunately, but if there are multiple we do try to evaluate them from a reader’s point of view.


> Most (nearly all?) religious texts of major world religions were originally written in languages other than English, and so if SE were to try to host those texts the site would have to make an editorial call about which translations of those texts are the "best."

The site already hosts a number of works that were originally written in languages other than English, and yet it had no problems making an editorial call about which translations of those texts are the "best." The obvious solution would be to just allow multiple translations of foreign-language books.


I think that makes sense, but it still seems a bit arbitrary, I don’t see bookshops having these issues


Yes, bookshops will sell one version of the Bible to Catholics, another to Protestants, another to fundamentalists, another to progressives, etc. :)

In contrast, part of the SE editorial philosophy is that it tries to host the best (based on academic scholarship, translation quality, academic acclaim, etc.) version of each text available in the public domain, which excludes that "something for everyone" sort of play available to a commercial bookstore. You could rightly argue that this is losing something (it's good to have multiple translations to compare if you're reading a text for critical purposes), but the SE editorial philosophy avoids a certain amount of confusion and clutter for the general reader. So there's a deliberate (you could call it "arbitrary" in some sense, if you wish) tradeoff being made here.


US Barnes & Noble can have a few meters of shelves with different versions of the Bible, and a buying guide. It is quite striking if you are not used to it.


Part of the issue would be that the books are translations and the copyright date would be from the translation date.

So modern versions of e.g. the Bible could not be in Standard Ebooks. So easiest to not carry any translations.

Bookshops have no problem with this as part of the purchase price will go to the copyright owners of the translation.


> So modern versions of e.g. the Bible could not be in Standard Ebooks.

There are modern translations that are permissively licensed and are of surprisingly high quality. See the NET Bible as a prime example. It's also the only one I know of with good translation notes that can be had for free.


Modern versions of e.g. Tolstoy's "War and Peace" could not be in Standard Ebooks. So easiest to not carry any translations?


One of the funny things about Bible translations is that more modern translations are based on older manuscripts than older translations, due to advances in archeology. SE can't carry any translations that incorporate the insights of the Dead Sea Scrolls, and having access to some of the oldest Hebrew manuscripts is a pretty big deal when it comes to translating the Tanakh.

It's true, modern versions of War and Peace can't be hosted at SE, but those modern versions generally don't reflect revolutionary leaps in archeology :)


It seems like most of the Christian books on SE are Roman Catholic in orientation (Belloc, Chesterton, etc.) Pilgrim's Progress is a Protestant work, but it would be good to see a better representation of both pre-Reformation and Protestant titles.


Can you provide any specific recommendations?


Sure, how about some classics like:

The Didache

Anselm, Cur Deus Homo

Anselm, Proslogion

Augustine, City of God

Augustine, Confessions

Augustine, On Christian Doctrine

William Law, A Serious Call to a Devout and Holy Life

Luther, The Bondage of the Will

Calvin, Institutes of the Christian Religion

Pascal, Pensées

All of these are in public domain.


My thought was that many/most religious works are public domain and are already readily available elsewhere.


Actually all of what SE has now has content on different sites


I'd imagine that if they host one religion's books, many more religions will come out of the woodwork and demand their books also be included, leading the site to be largely religious texts.


Numerous sites, platforms, stores, etc. host religious books, and that has never happened.


Amazing that you don't use front-end javascript at all. Is there anything you wish HTML+CSS could do better?

Re XHTML - It looks like the website is being served with a text/html content type. Did you give up on your XHTML experiment? How did it go? Maybe it would help if browsers reported back errors to the server, like how "Content Security Policy" reporting works.


We never served as application/xhtml+xml. That has some nasty side effects in browsers, like no incremental rendering. But it’s always been legit to serve XHTML as text/html even if you lose the break-on-error functionality.
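If you still want the break-on-error check without changing the content type, one option is to run pages through an XML parser at build or test time. A minimal sketch in Python follows; this is an assumption about how one could do it, not a description of SE's tooling, and the two sample documents are made up.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if `markup` parses as XML, False on any well-formedness error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Made-up documents: the second leaves <br> unclosed, which HTML parsers
# tolerate but an XML parser rejects.
GOOD = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
    "<body><p>ok<br/></p></body></html>"
)
BAD = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<body><p>unclosed<br></p></body></html>"
)

print(is_well_formed(GOOD), is_well_formed(BAD))
```

One caveat: a plain XML parser also rejects named HTML entities such as `&nbsp;` unless they are declared, so pages that use them would need numeric references or a DTD-aware parser.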


In addition to the Newsletter and Feeds, it would be nice to have a Blog or News section where you can publish news every now and then, for example an article for the public domain day would have been very useful for making new publications known, simplifying sharing and attracting new volunteers or donors.


Thank you for Standard Ebooks!

I remember when Manybooks used to be what you want. But quality dropped precipitously with self-published new novels; I suspect some money is changing hands somewhere.

What happened to Manybooks? Does Standard Ebooks have a plan for avoiding that?


I don't know anything about Manybooks' history, sorry.

At SE we focus exclusively on US public domain titles; that's one of the major philosophical points of the project. The other major point is a high quality standard, so it's in our best interest to keep pursuing that. SE became known due to its quality standard, not because it's more free ebooks. Therefore if we strayed from those points then we'd be just another free ebook site, of which there is no shortage.

Quality is also why we reject self-published books that have been dedicated to the public domain, as those are typically low-quality content to begin with. (Though I wouldn't call every single book we host "high quality content" in the sense that each one is up there with Shakespeare. But books that have survived a hundred years tend to have survived because they're not slush.)


Good to know, as a self-published author myself, that the quality of any site is going to drop as soon as I put my book on there.

Every other book I read now is by an author with NO rating. I have read six this year; none were memorable or my cup of tea, I will give you that, but two of the four- or five-star offerings on Amazon were just as bad. As they say, if you don't open an oyster, you will never find a pearl.


If you want to go to the work of creating a curated selection of high-quality, contemporary, self-published, public domain books no one is stopping you.

That's not the niche SE has chosen to target. You can't expect them to serve every possible use case.


What are the dimensions produced by se build-images?


The expected size for the JPG for the cover is 1400x2100.


It says on each book page "Compatible epub — All devices and apps except Kindles and Kobos." - but i think this is incorrect bc epub is now the preferred format for Kindle.



I email epubs to my kindle frequently and they open and read just like any ebook. I last tried this a week ago and it was fine.


When you use 'Send to Kindle' to send an epub to your device, you are not reading an epub. In the link above, it mentions how Kindle converts epub to an Amazon format before allowing your device to read it. Amazon's formats on the whole are inferior, with poor rendering capabilities, and an automatic conversion means all bets are off in terms of what the ebook will look like.

Kindle will not natively support epub until you can connect it to a USB cable, transfer an epub using a file manager, and it does not get secretly converted.


I did not know that, and it does kind of explain one book with pictures that looked a bit weird. On the whole though, text epubs are totally readable as I get most of my books from non-amazon sites such as smashwords and email them to the kindle to read.


Kobos also support epub.

Almost all my Kobo books are EPUB and work great.


Real EPUBs can crash Kobos and you need to specifically reformat them with a plug-in in Calibre. It may be a recent update that broke it, since I used to have fewer problems.


Inevitably, like everyone who rejects PHP frameworks because "PHP is already a templating language", you just wound up reinventing the framework anyway.

I'm not complaining - It's just, there's a reason everyone goes for the existing frameworks and it isn't addiction to complexity. Raw PHP code is legendarily insecure and prone to XSS and other issues if you don't do things exactly right.

Nice site, though.


> prone to XSS and other issues if you don't do things exactly right.

Not any more so than sites with frameworks. I’ve found XSS issues in Java Spring framework built sites that didn’t “do things exactly right”. A framework doesn’t magically fix that.
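For anyone unfamiliar with what "doing things exactly right" means here: the core of it is escaping user-supplied text at the point of output. A toy illustration in Python (not PHP, and not taken from any real site's code); the function names are made up.

```python
from html import escape


def render_comment_unsafe(user_text: str) -> str:
    """Interpolates user input directly: any markup in it reaches the browser."""
    return f"<p>{user_text}</p>"


def render_comment(user_text: str) -> str:
    """Escapes &, <, > and quotes before interpolating into HTML."""
    return f"<p>{escape(user_text, quote=True)}</p>"


payload = "<script>alert(1)</script>"
print(render_comment_unsafe(payload))
print(render_comment(payload))
```

The point stands either way: forgetting the escape call in one template is enough, with or without a framework. Frameworks that escape by default just change which mistake you have to make.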


No one mentioned magic. Frameworks are designed to do what PHP developers wind up implementing in an ad-hoc, haphazard way themselves, and tend to be better at doing it on average. Any code can have security issues but I'd trust a battle-hardened open source PHP framework over some random coder's hubris any day of the week.


I published a couple of books for the project during a sabbatical in 2021 (The Devil's Dictionary [0] and a cheesy, small H. Beam Piper book named Four-Day Planet).

The process and tools are quite nice and it's very rewarding to see your work in ebook form. It takes a _long_ time to proof and re-read a book, but it's surprising how many times you can do this and how differently you need to read to catch errors versus just enjoying the damn book.

The fascinating part of the project is a _strong_ editorial opinion, which IMO makes the project successful. There is a core group of people that upholds the standards for the project, and the resulting consistency of quality of output derives from that. The team clearly cares about the quality, and has demonstrably maintained that over the huge number of releases.

I even went to the archives of the "San Francisco News Letter and California Advertiser" to collect some of Bierce's original work, making it the most complete and most corrected open-source version of the book. [1] The one previously hosted by Project Gutenberg was quite old and, frankly, quite riddled with transcription errors.

I haven't tried reading the Devil's Dictionary back-to-back since I published it, but I might one day. There's a lot of detail in this work that I never saw until I had it under a microscope.

[0] https://standardebooks.org/ebooks/ambrose-bierce/the-devils-...

[1] https://archive.org/details/san-francisco-newletter-dec-11-1...


For other curious HNers, what differentiates [0] them from Project Gutenberg [1] is the improved typography/styling and the full usage of modern reader techniques. Think of it like, etext != ebook.

[0] https://standardebooks.org/about/what-makes-standard-ebooks-...

[1] https://www.gutenberg.org


So why don’t they contribute these things back to Project Gutenberg? Particularly the typography ones like curly quotes and proper dashes, as those are almost always corrections where the overly-ASCII Gutenberg source doesn’t match the original.


Like PG, our editions are blends of other editions, along with our own updates. Often our edition winds up looking nothing like the PG edition, for example when we combine volumes, extract footnotes into endnotes, remove pagination, and so on.

So submitting back to PG would be more like replacing a PG edition, instead of updating it; and I doubt the original PG submitter would like it if their hard work was simply replaced by someone else who thought their version was an improvement.

Our volunteers do sometimes submit typos they find back to PG. We don't require that, so some producers do, and others don't.


Yeah, I was just looking through A Christmas Carol and observed a handful of editorial changes in the commits <https://github.com/standardebooks/charles-dickens_a-christma...> (bran-new → brand-new, frouzy → frowzy, and “Lowercase some gratuitously uppercased words”). Frouzy → frowzy I’m mildly in favour of. Ditching bran-new definitely loses character (he omitted the d on purpose!). One or two of the lowercased words were mildly strange capitalised (e.g. Idol was inconsistent with the previous paragraph); but the lowercasing of many introduces broad stylistic inconsistency, and direct local inconsistency sometimes; and most of the capitalisations were not gratuitous. In fact, more than a few were clearly to be pronounced, as a form of emphasis (e.g. Poor, One, Us); and some were distinctly proper nouns in the context, the removal of which increases the parse difficulty (e.g. One¹, /(Cold )?(Roast|Boiled)/); and some reflect customs still common or even preferred in their domains (e.g. Act, Angelic, Apostles, Star). I just reckon that commit should be reverted, because from my perspective it’s mostly actively bad, and the rest subjective. I’m curious what your reaction is to my opinion here.

But yes, I see that you’re practising some editorial oversight and not aiming to faithfully represent the original in all regards, which I gather is more generally Project Gutenberg’s goal; and this would obviously contraindicate upstreaming.

On the other hand, when it comes to more stylistic matters, I tend to wish Project Gutenberg had more consistency. There’s too much gratuitous variation in presentation and ridiculous 256-colour backgrounds. It’s often too obvious much of it is the work of a group of individuals rather than a coherent effort.

I’m curious about the footnote-to-endnote thing, because I’m not sure how the various formats in question handle them all, but in print endnotes are almost always just awful. If anything, I’d be expecting to replace endnotes with footnotes. (Me, I’m partial to sidenotes.)

—⁂—

¹ Hickory dickory dock, three mice ran up the clock; the clock struck one, and has been charged with assault and battery.


Yeah, any edition of a book that's "updating" modern English loses me, including messing with capitalization. Not interested. I love the formatting on Standard Ebooks, but they're no use to me if they're "updating" language, aside from things like repairing typesetting and formatting lost or mangled in PG editions.

Agree on notes in print, side notes (on very-wide editions) are best, then foot, then end of chapter endnotes. Full end-of-work endnotes are awful. Maybe they're better in ebooks, than footnotes, though? E-readers' poor UX for not-even-that-advanced features of books is part of why I barely use them, and practically never for any work that'd have notes of any sort.


As someone who regularly compares different scans of old books, I counter: for centuries it’s already been common practice for publishers to update spellings, recapitalize, and even make more drastic changes. You just never noticed because print books don’t have a public commit log.

In the case of Standard Ebooks, “sound‐alike” changes are allowed (so spelling and capitalization changes are allowed when they make sense). Censorship, and even innocuous grammatical changes, are not. Despite generally appreciating old works in their own context, I find the tradeoff in readability for such a widespread practice to be worth it given how minor SE’s alterations are.


Sometimes capitalisation matters are close to purely stylistic, but other times they really are part of the content, guiding pronunciation or emphasis, so that lowercasing them harms the work. What is your opinion of my assessment in the above comment of some of the specific changes in <https://github.com/standardebooks/charles-dickens_a-christma...>?


I haven’t looked into your example, but certainly it can be true that lowercasing can be harmful. It goes without saying, I think, that the SE policy is only to lowercase words when doing so doesn’t harm.

When I see erroneous changes in SE books, I argue to revert, and have generally been successful. In my experience it’s drama‐free, like fixing any other typo.


The problem is when irregular spelling is intended to capture a vernacular. It does a disservice to everyone, erasing the author's intent with homogenized language.


If the spelling is intended to be vernacular, the SE policy is not to change the spelling. I (a mere reader) have successfully reverted dialectal spellings in SE several times.


Editorial commits are all marked as such and contain no non-editorial changes. The tools for compiling ebook files are available at https://github.com/standardebooks/tools, so creating your own versions with only the work you're interested in is straightforward (and can be at least partially automated).


In addition to what Alex has said, as an SE contributor I do try to submit errata to Project Gutenberg where I can find the time and energy. Part of the problem, though, is that PG's errata process (https://www.gutenberg.org/help/errata.html) is quite cumbersome since you have to write an email to their errata team with each individual error. That's a real hassle to try to keep track of and submit. Ideally, if PG had something like a pull request system, I would just be able to find those errors in their code and submit the changes directly, but unfortunately they don't have that, so far as I am aware.

That is one major advantage SE has, I think, which is that we do allow people to make pull requests against any of our ebook repositories and any PRs that get merged are automatically deployed to the site. This makes it much, much easier for tech-savvy people to submit proofreading corrections!


> Part of the problem, though, is that PG's errata process (https://www.gutenberg.org/help/errata.html) is quite cumbersome since you have to write an email to their errata team with each individual error. That's a real hassle to try to keep track of and submit. Ideally, if PG had something like a pull request system, I would just be able to[...]

On the other side of the coin, Standard Ebooks's heavy endorsement/buy-in of GitHub-based workflows is off-putting to broader audiences. (It's pretty off-putting to me, and I'm not even non-technical; I just recognize it as a sort of Conway's Law + Law of the Hammer sort of thing, and it chafes.) I.e., for others what you describe is far less than "ideal".


Typos can be reported by email on SE too. Git is only required when you’re publishing a new book. My observation from watching the mailing list is that emailed typos are fixed quickly. (I always fix typos using pull requests, and those are acted on quickly too.)


You don't have to use GitHub if you don't want to, but you do have to use Git. We've had more than a few producers successfully produce ebooks without using GitHub or Google Groups.


> We've had more than a few producers successfully produce ebooks without using GitHub or Google Groups.

Can you share or document how? https://standardebooks.org/contribute suggests that "Technically inclined readers can produce ebooks themselves" but doesn't provide any point of entry to do so other than a link to the GitHub org, and "No technical experience is necessary. Contact the mailing list if you want to help." just links to the Google Group.


It's very uncommon. If you want to do that, just email me privately and we can set something up.


Also wondering this.


I would love if they offered a download option for a file you could just upload to Lulu (or similar service) to have it printed and mailed to you.

Every time I buy one of these public domain books from Amazon, they are invariably shitty, low-quality "printed by Amazon" versions. I miss the time where you could get a high-quality hardcover, but more and more those seem reserved only for the current week's NYT best-seller books.


I've considered running a campaign to finance a print run of some of our (SE's) books. But the fact is that it's just so easy to find super cheap paper copies of these books almost anywhere. As long as you buy a copy that was printed before, say, 2005 - or from a reputable publisher like Oxford or Penguin - then the edition will already be pro quality. (After that, it's much more likely that you're buying a print-on-demand copy of a raw Project Gutenberg text.)

If we did offer print books, I think the value-add would be making them extremely ornate, one-of-a-kind editions like Arion Press or Folio Society make, and we'd charge a lot for a copy. But even then I'm still not sure the juice would be worth the squeeze, because that's also been done to death... how many more fancy editions of Dracula or whatever does the world need?


I think you might be underestimating the value-add, at least based on the existence of this thread! Yes, quality copies are out there, but easy to find for the Editor-in-Chief of Standard Ebooks doesn't mean easy for everyone. I suspect plenty of people would find a trusted, no-hassle source for a quality print copy worthwhile, just for the simplicity and convenience. Though I totally respect not wanting to waste ink and kill trees reprinting something that's already widely available.


Folio Society basically already does this, at a premium price. Used copies of well-set PD classics from respected publishers like Franklin Library or Modern Library go for pennies and can be shipped to your door from places like Abebooks, or you can easily find them at your nearby library sale/used bookstore/charity shop/etc.

I've been toying with the idea for a while but I think the market is just too saturated, even for premium editions. Maybe the focus should be on reviving more obscure works... not sure.


This could be a cool monetization strategy. I don't really read physical books, but the "classics" on Amazon are often complete ripoffs. Here's Crime and Punishment for $10 just to get the Kindle version: https://www.amazon.com/Punishment-Penguin-Classics-Fyodor-Do...

I feel like these open domain novels published by big publishing houses have the veneer of legitimacy, but projects like the one this thread is about I think could accomplish much more. Especially for authors where the work is translated into English. Plus the cover designs are much cooler.

I will say, the search on their website is kind of slow and could use some work.


Not the greatest example, as the translation is not public domain.


In fairness though, if you sort by price, you can always find classics on Amazon for dirt cheap. E.g.

https://www.amazon.com/Greatest-Works-Dostoyevsky-Punishment...

Although there's no saying as to whether or not they will have proper spellcheck, TOC, if they are legitimately in the public domain, or even if it's the right book with all the pages. That's where a service like Standard Ebooks is superior to the potluck you get from Amazon.


Why is that book a ripoff?


Without discounting the point made by smogcutter about Penguin's edition not actually being public domain: for a classic work, I'd expect to be able to get a paperback for less than $10.* And that involves a real-life physical artifact which (a) necessarily has lower margins than an ebook, and (b) doesn't come with the omnipresent threat that it will evaporate from your device (or your managed online locker or whatever), nor that you'll have to stop reading if your battery dies, nor that you're unable to easily hand it to someone else to let them thumb through or borrow it. For an ebook, $3 or $4 sounds about right. Maybe $5 for a relatively modern translation, as in the case here. Recall that Netflix in comparison is $X per month (fill this in; I don't actually know, but I know the number is not high) and libraries are free-ish. Price points at or around $10 per work or more feel like a shameless ploy to trigger the sensation of "economy" in "false economy" and push people into rent-seeking platforms where they consistently hand over a continual stream of monthly payments in perpetuity for "unlimited" access to select items within the very limited one-month term that the payment gets you.

* NB: whether this is actually the case or not is a separate matter


Yes basically. I understand if a publisher commissioned a translation and put in work but $10 for a DRM digital copy is too steep. Maybe $3-$4 would be reasonable? Crime and Punishment was written in the 1800s, the author is long dead. And it's considered a historic and important piece of literature.

Regardless, it's great that these works are available in high quality for free.


If it was $3 would you buy it today?


Because there are high-quality alternatives that are free and have no DRM, there's no price point at which I would buy it. The only scenario in which I would is if I wanted to read a specific translation.

A physical copy if I wanted one I'd be willing to pay ~$6 for, less if used.


If there’s no price point at which you’d buy it why weigh in with what you believe a reasonable price to be?


Physical media tends to be ~30-40% of the costs, so I think it's more like $6 or $7.


You think what's more like $6 or $7?


Eh, I disagree. $10 to be able to have it this second on my Kindle vs. waiting to get the physical copy feels like a good value.


> I disagree

That doesn't make sense. You disagree with what? My expectation that it shouldn't cost much more than $5 for an ebook of something with a 150-year-old plot?

> $10 to be able to have it this second on my Kindle vs. waiting to get the physical copy feels like a good value

That's nice, I guess—for you. But we weren't talking about you, and we weren't talking about instant gratification.

If I'm buying for reasons where instant gratification isn't a factor—and I'd argue that for books, prioritizing instant gratification is even sillier than with e.g. food and drink or streaming TV shows—should I still pay the premium to be able to get something "this second" if my flight isn't for another three weeks and I'd have been perfectly willing to deal with a delay?


Yes, I disagree with what you said – that an ebook should be priced lower than the print version.

If you want the cheaper one buy the cheaper one. If you aren't in a rush to get it, buy the one that gets delivered in a week instead of in 10 seconds. If you want the ebook, buy the ebook. If you want the printed book, buy the printed book.


It really reads here like you're willfully missing the point.

> it shouldn't cost much more than $5 for an ebook of something with a 150-year-old plot

> for a classic work [...] $3 or $4 sounds about right

That is a direct response to your question to the other commenter about why Penguin's Crime and Punishment for Kindle is a ripoff.


Yes and I disagree with you.

I don’t think italicizing makes the argument more ironclad.

$3 or $4 doesn’t sound about right to me.

To me, the age of something doesn’t necessarily influence its cost. Saying “this book is old so it should be cheap” is something I disagree with.


At this point, you're just propping up and tearing down straw men, including your remarks about whether or not italics make an argument "more ironclad": nobody said they did. The italics are a response to your insistence on not engaging in the conversation you're ostensibly taking part in and trying instead to turn it into a different conversation that no one else is having.

> $3 or $4 doesn’t sound about right to me

No one asked. You asked, on the other hand, a question about the ripoff comment, and that's the question you got an answer to.

No more attempts at conversational sleights of hand, please.


Yes, I asked a question. You gave your opinion. I said I disagreed with it. You said disagree with what? I said I disagree with your opinion. Repeat… several times now lol.


Here's what you're not getting: nobody asked about that.

You cannot say to someone who doesn't like hot dogs, e.g., "What makes hot dogs gross?" and then when someone explains why, respond, "I disagree." That makes no sense. You asked for the information, they gave it to you, so the only possible way that "I disagree" fits at that place in the conversation is if you're saying that you disagree that that's their reason. To respond "I disagree" in the sense that you don't share their taste is to fundamentally change the subject to try to have a different conversation. And it's not interesting, besides.

People have different tastes. That's expected.

Stating that you personally think it is worth the price—especially in this context, as if it's some kind of retort—is just annoying. It isn't illuminating; that you don't share the same opinion is utterly unsurprising and didn't need saying, and it provides no special insight.


Lulu does not make 'high-quality hardcovers'.


Related:

Standard Ebooks - https://news.ycombinator.com/item?id=32215324 - July 2022 (256 comments)

Free and liberated e-books, carefully produced for the true book lover - https://news.ycombinator.com/item?id=25138534 - Nov 2020 (106 comments)

Standard Ebooks: Free public-domain ebooks, carefully produced - https://news.ycombinator.com/item?id=20594802 - Aug 2019 (129 comments)

Standard Ebooks: Free and liberated ebooks, carefully produced - https://news.ycombinator.com/item?id=14570035 - June 2017 (96 comments)


A well run open source ebook project, producing the highest quality ebooks. Always looking for volunteers as well as donations.


What’s the best way to volunteer?



There are HTML versions but they don't seem to reflow

example:

https://standardebooks.org/ebooks/rudolph-erich-raspe/the-su...

A bit off-topic, but I never understood why .epub is a thing. For instance, the linked HTML/XHTML version seems to work just fine (except for the reflow thing, but I assume that's a CSS issue).

.epub seems to be mostly HTML with a few pieces missing. I guess I don't understand why we needed a new format? and not just use a strict HTML subset?

I'd love some strict HTML subset that indicated the file can be used offline. I personally try to make all my webpages so that they can be saved to disk and opened from a single file (though if you embed images/videos this becomes problematic). But I don't have a way to indicate to a reader "Hey you can Ctrl+S this webpage". I'd publish .epub, but the browser won't open them


epub is just a zipped collection of HTML, CSS, fonts and images, and as bog standard as you can get. You can open it with a Zip extractor and see.
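For anyone who wants to check this without a GUI extractor, here is a minimal sketch in Python using only the standard library's `zipfile` module. The helper names `list_epub_entries` and `build_toy_epub` are made up for illustration, and the archive built in the code is a toy stand-in with an EPUB-like layout, not a valid book.

```python
import io
import zipfile


def list_epub_entries(source):
    """Return the names of the files stored inside an EPUB (a ZIP archive).

    `source` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        return archive.namelist()


def build_toy_epub():
    """Build a tiny in-memory ZIP that mimics an EPUB layout (hypothetical contents)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # Real EPUBs store the mimetype entry first and uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", "<container/>")
        archive.writestr("epub/text/chapter-1.xhtml", "<html/>")
        archive.writestr("epub/css/core.css", "p { margin: 0; }")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in list_epub_entries(build_toy_epub()):
        print(name)
```

Passing the path of a downloaded `.epub` file to `list_epub_entries` instead of the toy buffer should list the real book's XHTML, CSS and image files in the same way.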


Huh.. yeah.. so then why doesn't Firefox/Chrome open it?


Edge used to support ePubs directly, but they removed that functionality for some reason. There is a little more to it than just rendering. E.g. ideally you’d want some popup table of contents support.


Chrome opens the HTML pages fine once the epub is unzipped. Some epubs may have DRM, however; I don't have any of those, but they probably won't work.


I have a question regarding books in other languages than English. Is there a technical reason for only allowing English or is it the legal aspect (knowing what is in public domain in countries other than the US) that hinders this?

For example, I live in Iceland, where a number of texts are in the public domain, such as the Edda and the Icelandic sagas. But since we are very few (approximately 380K people), there is no comparable entity and most likely never will be, so the best and probably only way to get something similar would be to be a part of a larger organization.


It’s neither: it’s that our tooling and manual of style[1] are only developed for our specific conventions for English text. Other people have attempted to start alternative projects, but getting that set of standards together takes time and effort, and like you say, 380k people isn’t that many.

If you do decide to start something then SE’s tooling supports a --white-label flag when creating a skeleton, which would at least get the first few productions off the ground.

[1] https://standardebooks.org/manual


Thank you for this, are your tools licensed in such a way that it would be ok to extend and adopt them so that they can be used for other languages?


The code is GPL-3 and the templates are CC0: https://github.com/standardebooks/tools/blob/master/LICENSE....

Feel free to ask on the mailing list if you have any questions, more likely to be picked up there than in a random HN thread :)


I’m happy to see Standard Ebooks here! I’ve read their editions of Nostromo by Joseph Conrad and Vanity Fair by William Thackeray, and the quality is great. I recommend it if you’re interested in classic literature.


Who decides or sets the difficulty level of "reading ease" (which is a sortable metadata attribute on the search page)?

Some classifications seem a bit... nonintuitive. For example, the Autobiography of John Stuart Mill is classified as "very difficult" whereas "The Tempest" by Shakespeare is classified as "fairly easy".

I would classify it the other way around, but what do I know, I'm a nonnative speaker anyway.



This is very nice! I’d love to see this for French literature too.


This is really cool. I'm going to donate.


I wonder if a scan -> OCR -> LLM proofreading pipeline is possible?


I am one of the SE editors/regular contributors and I did play around with this a bit for a poetry collection: https://groups.google.com/g/standardebooks/c/IUvGLmvZrmM/m/s...

I'm sure someone sufficiently determined and good at prompt engineering, and integrating LLMs into a larger toolset, could come up with something even better. I'm personally very skeptical of LLMs as a technology, but even I have to admit that this was a pretty ideal and unobjectionable use of LLMs.

That being said, though it was a fun experiment, I later found that it was easier (and less wasteful of natural resources) to just do the same thing with a bit of custom markup and a search and replace script.


I don't think that's quite what the parent had in mind.

The most natural application of a language model in proofreading is to compute perplexity across the text; if all goes well, errors should be detectable as points of unusually high perplexity. (In principle, this should even be able to spot otherwise undetectable errors like missing words.)
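A rough sketch of the idea in Python, with a big caveat: it swaps in a tiny character-bigram model with add-one smoothing as a stand-in for a real language model, so it only illustrates the "flag the most surprising spot" mechanism. The function names, the training sentence and the sample text are all made up for illustration.

```python
import math
from collections import Counter


def train_char_bigrams(corpus):
    """Count character bigrams and their left-hand contexts in `corpus`."""
    pairs = Counter(zip(corpus, corpus[1:]))
    contexts = Counter(corpus[:-1])
    vocab_size = len(set(corpus))
    return pairs, contexts, vocab_size


def word_surprisal(word, model):
    """Average bits per character transition for `word`, add-one smoothed."""
    pairs, contexts, vocab_size = model
    padded = f" {word} "  # include the word boundaries as transitions
    total = 0.0
    for left, right in zip(padded, padded[1:]):
        prob = (pairs[(left, right)] + 1) / (contexts[left] + vocab_size)
        total += -math.log2(prob)
    return total / (len(padded) - 1)


def most_surprising(text, model, top=1):
    """Return the `top` words the model finds least predictable."""
    words = text.split()
    return sorted(words, key=lambda w: word_surprisal(w, model), reverse=True)[:top]


if __name__ == "__main__":
    # Toy training text (an assumption; a real setup would use a large corpus or an LLM).
    model = train_char_bigrams("the cat sat on the mat and the dog sat on the log ")
    print(most_surprising("the cat sat on tbe mat", model))
```

With a real LLM the per-token log-probabilities would come from the model itself, and context beyond the single word is what would let it catch things like missing words, which this toy cannot do.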


I could see how that would be helpful, but at least for my use case I'm more interested in seeing how LLMs integrated with computer vision can speed up transcriptions. Since a thorough proofread by a human is already baked into the SE production process (and is indeed one of the major selling points), having more automated tools to aid proofreading is nice but doesn't do anything fundamentally different, from my point of view. Whereas if LLMs can be leveraged for transcription, SE producers no longer need to depend on external projects like Project Gutenberg or Wikisource to produce texts (which can take months) or transcribe texts from OCR results by hand (very tedious and error-prone--believe me, I'm speaking from experience!). It would drastically open up the range of possible books someone could reasonably produce (in a timely fashion) for SE.


As a first pass I'm sure it'll save some effort (e.g. l -> 1 in some fonts). I can't imagine it fully replacing any editing/proofreading passes.
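For that particular class of error, even a plain heuristic gets part of the way. The sketch below is not an LLM and not anything from SE's tooling; it is a hypothetical regex pass that flags tokens mixing letters and digits, which is where l/1 swaps tend to show up. The function name and sample sentence are made up.

```python
import re

# A token containing at least one ASCII letter and at least one digit.
MIXED_TOKEN = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")


def find_mixed_tokens(text):
    """Return tokens that mix letters and digits, in order of appearance."""
    return MIXED_TOKEN.findall(text)


if __name__ == "__main__":
    print(find_mixed_tokens("He wa1ked 3 miles in 1928 to the o1d mill"))
```

It will also flag legitimate tokens such as "1st" or "2nd", so the output is a list of candidates for a human to check, not a list of fixes.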


I made a tool like that, and I bet with a more powerful LLM like GPT4, and perhaps a better baseline OCR tool (like GPT4 vision), it could work really well for this sort of thing:

https://github.com/Dicklesworthstone/llama2_aided_tesseract


To have access to what I read and also remember it, I recently created a web app called bookeeper that exports your Kindle highlights to Notion and also generates a personalized AI summary of them. Try it here if you are interested: https://bookeeper.io/?utm_source=hacker_news&utm_medium=book


Standard Ebooks is fantastic! In fact, I love what they're doing so much that I actually built a little SaaS product on top of their ebook collection.

The site is called Modern Serial, and it lets you read books from Standard Ebooks in 10 minutes a day as Substack-style email newsletters.

https://modernserial.com/


Very cool project. Does anyone know of something similar for audiobooks?




LibriVox creates audiobooks of PD texts. I've heard good things about their work, but I personally don't listen to audiobooks in general.


Any audio recording will have its own copyright separate from the base text, so it'll be a while before any quality audiobooks enter the public domain.

For now, your best approach would be to take high-quality ebooks like what Standard Ebooks offers, and use text-to-speech software.


What are the standard dimensions produced by se build-images?



