Major Mode for Reading EPUBs in Emacs (github.com/wasamasa)
207 points by JNRowe 12 months ago | 65 comments

That's a pity the FB2 format is not nearly as popular as EPUB is. It's a single-file, tidy, pure-XML format (by tidy I mean a simple, intuitive, bloat-free, well-documented schema) with purely semantic markup and no formatting. It's a pleasure to write code that processes it. The user, app, or device alone decides how to display it (neither pages nor fonts are specified). Isn't this lovely?

EPUB's zip archive may encapsulate images and fonts. How do you achieve that with a single XML file? I guess it has a reduced feature set?

Fonts are among the last things I want it to encapsulate; I'm kind of a content-formatting separation extremist. In fact I often convert EPUB books to FB2 just to make my PocketBook use the default font for better readability (perhaps there is a switch somewhere, but I couldn't find it).

As for the pictures - I'm comfortable with them being Base64-encoded and encapsulated at the end of the FB2 file.

EPUB is indeed more feature-rich, but from my point of view that is all bloat and unnecessary complexity.

I haven't seen a single ebook that actually benefits from EPUB features. Whatever feature-rich typesetting is worth preserving is rarely published this way anyway; publishers just use PDF (which is sad but true).

I feel like there's a time and a place for not including those things. I write stories (~1-100k words) in Markdown and run them through Pandoc to get epubs, and the results are nice enough with a small bit of custom CSS but I would love to outsource formatting entirely to the viewer. I don't want anything fancier than the sort of formatting you'd see in a mass market paperback.

I recently bought an ebook and found that it was formatted in a custom font, so I "returned" it immediately. I guess this is a philosophical issue to some extent, but I don't think the typical novel is visual content, and I don't know what you're trying to accomplish by overriding my preferred settings for text presentation.

I wrote and use an editor similar to Scrivener that allows the user to edit in Markdown, but saves the project as XML. Images are saved as nodes with a mimetype and a Base64-encoded byte array, and referenced in the document with a UUID.

HTML also allows you to specify an image inline using Base64 encoding; something like

    <img src="data:image/png;base64, fhfhfhfh...">

I don't know what FB2 does, but it's definitely a solvable problem.

Going by the Wikipedia article [1], this is exactly what it does.

[1]: https://en.wikipedia.org/wiki/FictionBook#Differences_from_o...
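
For readers who have not seen the Base64-in-XML approach discussed above, here is a minimal Python sketch. It assumes the FictionBook 2 namespace URI and the `<binary id=... content-type=...>` element layout as I understand them from the public schema; the sample document and the helper names `extract_binaries` and `to_data_uri` are made up for illustration.

```python
import base64
import xml.etree.ElementTree as ET

# Namespace assumed for FictionBook 2.x documents.
FB2_NS = "http://www.gribuser.ru/xml/fictionbook/2.0"

# Tiny hand-written sample, not a real book; the "image" is placeholder bytes.
_PAYLOAD = base64.b64encode(b"not-a-real-png").decode("ascii")
SAMPLE_FB2 = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<FictionBook xmlns="' + FB2_NS + '">'
    "<description><title-info><book-title>Sample</book-title></title-info></description>"
    "<body><section><p>Hello, reader.</p></section></body>"
    '<binary id="pixel.png" content-type="image/png">' + _PAYLOAD + "</binary>"
    "</FictionBook>"
)


def extract_binaries(fb2_text):
    """Return {id: (content_type, raw_bytes)} for every <binary> element."""
    root = ET.fromstring(fb2_text.encode("utf-8"))
    images = {}
    for node in root.iter("{" + FB2_NS + "}binary"):
        # Real files usually wrap the Base64 text across lines; strip that.
        payload = "".join((node.text or "").split())
        images[node.get("id")] = (node.get("content-type"), base64.b64decode(payload))
    return images


def to_data_uri(content_type, raw):
    """Build an HTML-style data URI from raw bytes."""
    return "data:" + content_type + ";base64," + base64.b64encode(raw).decode("ascii")
```

Calling `extract_binaries(SAMPLE_FB2)` gives back the decoded bytes keyed by the element's `id`, which is what a reader would look up when it meets an image reference in the body.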

FB2 supports illustrations (jpg, png). As for CSS, fonts, and other such embeddings - they are anti-features from a reader's perspective. Why instead of adaptable, and user-tunable semantic markup would I prefer some guy's tastes, or some company's corporate style pushed on me?

Besides, there's a big surface for bugs in EPUB. I once bought an ebook that froze my reader until I weeded out the CSS and fonts with Calibre. Ironically, that also made it easier to read.

> That's a pity the FB2 format is not nearly as popular as EPUB is.

It's made for readers by readers, so no DRM. A non-starter for publishing corporations, sadly.

Publishing companies in Russia are pretty successful selling DRM-free books, you can download PDF, EPUB, and FB2 as you buy. They just spam Google "download (book title)" search results and sue every website distributing pirated copies.

Not having page numbers is pretty obnoxious for a book.

For a paper book. But for an electronic book it's the opposite. An author/publisher has no idea about the size of the screen of the device the book is going to be read on + many people like to tweak the font and its size. And you don't need the actual numbers as long as you have a clickable ToC.

Page numbers can indeed be very useful for referencing passages outside the electronic copy, especially if a printed edition exists and you need to collaborate with people who have it, but I would rather use simple anchor tags for this.

Shameless plug for a personal project:


This is really just a one-liner I put in a shell script to pass things to Pandoc so I could read them like a man page in my terminal.

Love the Emacs lib though. The more Emacs in my life the better. :)

A screenshot would be great. I'm not sure what the output would look like and therefore if it's something worth trying. Though as you say it's only a one liner...

Great advice! Thanks!

More things should use Pandoc as their rendering engine. Why reinvent the wheel when you've already got the whole car?

This is really clean. Ugly option for non-emacs plebs like me:

    unzip -qc "$1" "*.htm*" | w3m -T text/html -dump -cols 120

The UNIX way.
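
A rough Python counterpart to the one-liner above, for anyone without `w3m` at hand. This is only a sketch: it walks the archive in zip order (which is not guaranteed to be reading order), strips tags with the standard-library `HTMLParser`, and does no layout at all. `dump_epub_text` and `make_sample_epub` are names invented here, and the sample archive is demo data rather than a valid EPUB.

```python
import io
import zipfile
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(" ".join(data.split()))


def dump_epub_text(epub):
    """Return the text of every .htm/.html/.xhtml member, in archive order."""
    out = []
    with zipfile.ZipFile(epub) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".htm", ".html", ".xhtml")):
                parser = _TextOnly()
                parser.feed(zf.read(name).decode("utf-8", errors="replace"))
                parser.close()
                out.extend(parser.chunks)
    return "\n".join(out)


def make_sample_epub():
    """Build a one-chapter archive in memory (demo data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/epub+zip")
        zf.writestr(
            "OEBPS/ch1.xhtml",
            "<html><head><style>p{}</style></head>"
            "<body><p>Hello  <b>world</b></p></body></html>",
        )
    buf.seek(0)
    return buf
```

`dump_epub_text` accepts a path or a file-like object, so `dump_epub_text("book.epub")` works the same way as the in-memory sample.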

https://github.com/bddean/emacs-ereader is another one. Similar features, the biggest difference is that nov shows a chapter at a time while ereader renders the entire document at once which allows doing isearch over the whole thing.

Both of these take advantage of the shr HTML renderer built into Emacs which in turn uses libxml2 to do the heavy lifting.

From the screenshot, my one big complaint is that it's typeset ragged right instead of flush right (which is, generally, how books and most ebook readers do it), but I suppose that's just a CSS fix, then?

If you scroll down, the README tells you how to make it flush right: https://github.com/wasamasa/nov.el#rendering

Ah, excellent :) that looks much nicer!

>which is, generally, how books and most ebook readers do it

Books flush them right. Ebook readers usually don't, and when they do the result is typographically horrible, because they don't use a smart algorithm for hyphenation and line splitting.

books.app on the mac appears to format things nicely. I'm not sure the algorithm is as good as latex, but it seems pretty good.

Constant spacing between words actually makes things easier to read. Prioritizing no ragged edges over all else prioritizes aesthetics over readability.
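
To make the trade-off in this subthread concrete, here is a naive justification sketch in Python. It fills lines greedily with `textwrap` and then pads the gaps, which is roughly the "dumb" approach being complained about; it is not the Knuth-Plass paragraph-breaking algorithm TeX uses, and it does no hyphenation. The function names are hypothetical.

```python
import textwrap


def justify_line(words, width):
    """Pad the gaps between words so the line is exactly `width` columns."""
    if len(words) == 1:
        return words[0]
    gaps = len(words) - 1
    spaces = width - sum(len(w) for w in words)
    base, extra = divmod(spaces, gaps)
    parts = []
    for i, word in enumerate(words[:-1]):
        # The first `extra` gaps get one more space than the others,
        # which is exactly where the uneven word spacing comes from.
        parts.append(word + " " * (base + (1 if i < extra else 0)))
    parts.append(words[-1])
    return "".join(parts)


def justify(text, width):
    """Greedy line fill, then justify every line except the last."""
    lines = textwrap.wrap(text, width)
    out = [justify_line(line.split(), width) for line in lines[:-1]]
    out.extend(lines[-1:])
    return out
```

On narrow columns a short line such as `["the", "quick"]` at width 12 ends up with a four-space gap, which is the readability cost the comment above describes.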

Nov is my primary ereader these days, particularly since the read-aloud package complements it so well. If I had one complaint, it's that I wish it supported concatenating all chapters of a book into one single buffer.

I really like that this by default remembers your place in the book. Between Nov.el, pdf-tools, pdf-view-restore, and the fact that most non pdf files convert pretty easily to epub I think one can now use emacs as reading software pretty easily and with far less effort.

The bindings are even similar, save for o for outline in pdf-tools versus t for table of contents in nov.

One suggestion would be a function to search the entire document since in buffer search will only search the current chapter.
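
As an illustration of what a whole-book search involves outside Emacs, here is a Python sketch (not nov.el code). It assumes the usual EPUB layout: `META-INF/container.xml` names the OPF package file, whose `<spine>` lists chapters in reading order. Tag stripping is a crude regex, percent-encoded hrefs are not handled, and `spine_paths`, `search_book` and `make_sample` are names invented for this example.

```python
import io
import posixpath
import re
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"
OPF_NS = "http://www.idpf.org/2007/opf"


def spine_paths(zf):
    """Return chapter paths in reading order, as listed in the OPF spine."""
    container = ET.fromstring(zf.read("META-INF/container.xml"))
    rootfile = container.find(".//{" + CONTAINER_NS + "}rootfile")
    opf_path = rootfile.get("full-path")
    opf = ET.fromstring(zf.read(opf_path))
    manifest = {
        item.get("id"): item.get("href")
        for item in opf.iter("{" + OPF_NS + "}item")
    }
    base = posixpath.dirname(opf_path)
    return [
        posixpath.join(base, manifest[ref.get("idref")])
        for ref in opf.iter("{" + OPF_NS + "}itemref")
    ]


def search_book(epub, needle):
    """Yield (chapter_path, snippet) for each case-insensitive hit."""
    tag = re.compile(r"<[^>]+>")
    with zipfile.ZipFile(epub) as zf:
        for path in spine_paths(zf):
            raw = zf.read(path).decode("utf-8", errors="replace")
            text = " ".join(tag.sub(" ", raw).split())
            for m in re.finditer(re.escape(needle), text, re.IGNORECASE):
                start = max(m.start() - 20, 0)
                yield path, text[start:m.end() + 20]


def make_sample():
    """Build a two-chapter EPUB-like archive in memory (demo data only)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "META-INF/container.xml",
            '<container xmlns="' + CONTAINER_NS + '"><rootfiles>'
            '<rootfile full-path="OEBPS/book.opf"/></rootfiles></container>',
        )
        zf.writestr(
            "OEBPS/book.opf",
            '<package xmlns="' + OPF_NS + '"><manifest>'
            '<item id="c2" href="two.xhtml"/><item id="c1" href="one.xhtml"/>'
            '</manifest><spine><itemref idref="c1"/><itemref idref="c2"/>'
            "</spine></package>",
        )
        zf.writestr("OEBPS/one.xhtml", "<p>Call me Ishmael.</p>")
        zf.writestr("OEBPS/two.xhtml", "<p>The whale, the <i>whale</i>!</p>")
    buf.seek(0)
    return buf
```

Usage is `list(search_book("book.epub", "whale"))`; each hit carries the chapter path, so a front end could jump to the right chapter before searching within it.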

Really awesome progress.

Now, if only I could get Emacs running on my iPad mini which is my primary e-reader.

It runs fine in ish.app

I can’t find ish.app in the iOS store. Please post publisher and further details.

Looks like it's https://ish.app

I think you can also do that using DocView[0] and the mupdf backend.

[0]: https://www.gnu.org/software/emacs/manual/html_node/emacs/Do...

"Keybinds can be viewed with F1-m". This information should be in the first chapter of README. :-/

F1-m translates to "C-h m", which is the standard Emacs command to show information about the current major-mode. It's not specific to nov.el.

I did not know that. And I have been using Emacs since 1984. I use "M-X apropos".

C-h ? will show you help on help, among them:

C-h a - apropos

C-h k <key sequence> - detailed docs on the command run by key sequence

C-h m - docs on current minor modes and major mode

C-h w <command> - what keystrokes will run command

There's a lot more as well.

C-h is backspace and also "move cursor left" on many ancient video terminals. So it was always remapped.

C-h v - display the full documentation of variable

C-h k - display documentation of the function invoked by key

M-x apropos RET help RET

Ctrl-k Ctrl-h-m

Go emacs!

I've been heads down looking at implementing EPUB for Polar and there's no way this could work reliably in Emacs. If so I'd be very very very impressed if Emacs could pull it off but I'd bet money that it can't.

I think a good 80% of them could work, but you don't want to be halfway through an EPUB only to find out that code highlights weren't being rendered or that rendering is partly broken for some small percentage of the document.

Where do you guys purchase or download (free) EPUBs? I wanted Christensen's books and didn't find any. They are tied to either Kindle or Nook.

If you enjoy sci-fi, Tor Books gives a free novel in epub and mobi formats each month or so; just sign up for the e-newsletter.

This is literally the reason I'm excited to learn about nov.el and other epub readers; when I get an epub book, I sigh inwardly because there's 2-3 more steps to go before I can actually read it, e.g. email the file to my kindle address, wait for Amzn to ask me to verify it's not spam, then finally download it to my Kindle.

For public domain books there's https://standardebooks.org

Project Gutenberg also gives you EPUB downloads for any of their ebooks. (Though some older ebooks are transcribed/processed as text-only, so the HTML/EPUB is autogenerated and loses the original structure.)

Standard Ebooks are books from Gutenberg that have been cleaned and made to look nicer. See https://standardebooks.org/about/

It's not just "cleaned and made to look nicer". Standard Ebooks editions contain editorial changes from the original text. They're tagged in the changelog, but they're mixed with technical edits so there's no easy way to revert them. I'd rather read the Project Gutenberg versions.

From the same people that run sci hub. Please remember that authors need to eat too though. If you have money you should support them by buying their books or they may write fewer of them for you to enjoy.

This doesn't excuse DRM. Personally I don't read ebooks because (1) I won't accept DRM, (2) I won't pirate. Instead I buy a few books in dead tree format but mainly lots of authors are losing out.

You could buy the dead-tree edition and pirate the ebook, supporting the author but not DRM.

Some publishers do not use DRM - I believe that's the case for Tor and Baen, in SF/Fantasy. Baen sells direct on their site, including monthly bundles.

If they work through big publishers, authors actually get very little from each book purchase. So in practice most of your payment goes to feed the majors. I wish more authors would go the self-publishing route so they can maximize their compensation per sale.

More authors are self publishing than ever before. The problem is that you never hear about them. Additionally, the editing on self-published books tends to be a lot rougher.

Plus, they tend to price very cheap, so they’re probably getting about the same per sale as a normal royalty on a print book. And, any editorial service (proofreading, copyediting, etc.) has to be paid out of the author’s share.

Removing DRM from (purchased) EPUBs or MOBIs and/or converting them works extremely well. Otherwise, there are numerous websites that offer downloads, IRC and BitTorrent.

ebookhunter dot ch.

The pop-ups can be pretty annoying, but just close them and persist; the download links are pretty legit.

It would be nice to have an interface to calibre search of your ebook metadata. This can be done by wrapping calibredb.

Example use case: run M-x calibre-search and a prompt appears. Type title:space tags:scifi and the search is populated with the titles and metadata of matching books. Type to narrow, hit Enter, and the book opens.

Also open-last-book and open-recent-books. I could probably add this to my existing wrapper.
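
A minimal sketch of such a wrapper, written in Python rather than Elisp to keep it short. It relies on `calibredb list` with `--search`, `--fields` and `--for-machine` (JSON output). The exact shape of the JSON, in particular that `formats` is a list of file paths, is my assumption and worth checking against your Calibre version. The function names are hypothetical.

```python
import json
import subprocess


def build_command(query, fields=("title", "authors", "formats")):
    """Build the calibredb invocation for a Calibre search expression."""
    return [
        "calibredb", "list",
        "--search", query,
        "--fields", ",".join(fields),
        "--for-machine",  # ask for JSON instead of the text table
    ]


def parse_books(raw_json):
    """Turn the JSON output into (title, first_format_path) pairs.

    Assumes each entry is a dict with "title" and a "formats" list.
    """
    books = []
    for entry in json.loads(raw_json):
        formats = entry.get("formats") or []
        books.append((entry.get("title", ""), formats[0] if formats else None))
    return books


def search(query):
    """Run calibredb and return parsed results (needs Calibre installed)."""
    result = subprocess.run(
        build_command(query), capture_output=True, text=True, check=True
    )
    return parse_books(result.stdout)
```

`search("title:space tags:scifi")` would return candidates that a completion front end can narrow; opening the chosen path is then up to the caller.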

My solution to this has been to export the library as a bibtex file, which is a built in feature of Calibre, and then use ivy-bibtex (could just as easily use helm-bibtex) to search the library. You need to make sure the file path is part of your bibtex export, but this is easily done. Also make sure you export tags.

Sounds kind of manual compared to what I do now which is wrap calibredb in a script and narrow if needed via rofi.


Presumably would be about as easy to do in elisp.

A call to calibredb in a library of thousands of books on an SSD takes only 1/3 to 1/2 a second.

It is a little more manual, but I think the end result is a little faster during use. This half second delay you mention doesn't happen with my library of over a thousand books. If I had something more automatic, I'd still want to be able to use bibtex. I like having the same interface for reading and citing, as well as the same for both reading books and reading papers.

I bet there is a command line way to export a bibtex library, so it could be automated in elisp. It takes seconds though, and I only do it a few times a month. Doesn't matter though, I settled on this workflow because I was already using bibtex regularly for papers. Probably makes less sense if you aren't already happy with bibtex.

Does anyone know of a tool like pdf-tools but for annotating epub?

Ideally I have an epub that I read on my kobo ereader. Afterwards I have a number of annotations (highlights) that I have listed in plain text.

I would like to have the epub and the highlights side by side and be able to snap directly to each highlight's position within the epub in order to take further notes.
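
The "snap to the highlight" part can be approximated with plain text matching. The Python sketch below assumes you already have each chapter as plain text and the highlights as a list of strings (for example, copied from the device export); it ignores markup, repeated passages, and any position data the reader may store. `locate_highlights` is a hypothetical name, not part of any existing tool.

```python
def normalize(text):
    """Collapse whitespace and case so exported text matches chapter text."""
    return " ".join(text.split()).casefold()


def locate_highlights(chapters, highlights):
    """Map each highlight to (chapter_index, offset), or None if not found.

    Offsets refer to the normalized chapter text, not the original markup,
    and only the first occurrence of a highlight is reported.
    """
    normalized = [normalize(chapter) for chapter in chapters]
    positions = {}
    for highlight in highlights:
        key = normalize(highlight)
        positions[highlight] = None
        for index, chapter in enumerate(normalized):
            offset = chapter.find(key)
            if offset != -1:
                positions[highlight] = (index, offset)
                break
    return positions
```

A viewer could use the chapter index to open the right section and the offset to place point near the passage.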

For annotating, you can try org-noter:


If you want to get your annotations into org mode/emacs, look at this guy's work:


I haven't yet tried it, but I think he does something similar.

This goes well with doom - https://github.com/hlissner/doom-emacs

What's this read-aloud package?

This works quite well.
