UBook – Markdown for eBooks

Hemospectrum · on Feb 8, 2014

I don't think this is a bad idea per se, but it seems to me that comparing it to Markdown is a poor way of communicating the intent and utility of this markup language. Markdown's central design principle is that, by adopting semantic markup conventions from email, it is readable in plain-text form even to someone who hasn't set out to learn the conventions. That severely limits the kind of thing you can express in Markdown, but that's an acceptable (even desired) tradeoff in the majority of situations where Markdown would be useful in the first place.

Because Markdown doesn't cover everything, and because there are other sets of semantic markup conventions used in a plain text setting, there's certainly room for "Markdown-likes" which attempt to replicate the same kind of effort with those other conventions in mind. For example, Fountain[1] is a language for writing screenplays, and its conventions are a fairly close (though not as close as possible) approximation of the same ones used in real screenplays.

I've never authored a manuscript for a book, so I don't know myself whether such conventions for things like tables of contents exist in the publishing world. But I have a feeling that if they do, they wouldn't look like this. What UBook actually reminds me of is troff/nroff macro syntax[2], which was developed for a similar purpose and is still in use today as the lingua franca of Unix manpages.

You've implied in other comment threads that you don't really intend this for writers to use directly. If this were to be the output of a more user-friendly writing tool, I'm curious what such a tool might look like, and how it could differ from existing WYSIWYG rich text editors (where everything looks like a nail and your hammers are labeled "bold" and "bigger font"). For example, how would you represent section and paragraph structure in an editable way without the use of inline control characters that you edit the same way as other text? I don't think I've ever seen an editor built that way, but if that's what you're going for, I look forward to it.

[1]: http://fountain.io/

[2]: http://liw.fi/manpages/

ubook · on Feb 8, 2014

UBook is a universal language to help authors, publishers and book designers easily mark up and format books. Like Markdown, it uses simple, semantic tags to create a plain text file that can be transformed into something more complex such as HTML, saving the compiler the job of managing complex code.

It is designed as an intermediary language but it is hoped it can be a publishing language in its own right.

It can handle any type of book, from simple novels to very complex textbooks. Although similar to markdown it has a number of features to cope with the specific demands of book publishing:

1. The table of contents is compiled automatically using a simple, intuitive method;

2. No footnotes or endnotes;

3. Structural components unique to books are included e.g. scene breaks and page breaks;

4. Complex elements are included e.g. tables, links, images and indexes;

5. The final file is just a zip file; the manuscript is a plain text file.

UBook is a not-for-profit, open project designed to solve a problem. It simplifies the production of ebooks, and focuses on a universal method of doing so. It is at a pre-release stage, and I would welcome any insights you have to offer.

If you have an interest in ePublishing, web languages, HTML, markdown etc. please do check it out.

acangiano · on Feb 8, 2014

One minor nitpick, but I read your logo as UUBook. I would drop the second U and just have Book next to the blue U.

DougMerritt · on Feb 8, 2014

> No footnotes or endnotes;

I must be misreading that. How is lack of support for footnotes and endnotes a positive feature, as opposed to a glaringly missing feature?

> saving the compiler the job of managing complex code.

I re-read this 6 times, trying to figure it out. You mean, the human who creates the book?

On technical forums, I suggest you not use "compiler" in that sense, because we are accustomed to its technical meaning, of software that translates computer language.

On a positive note, I like markdown, and I dislike creating and reading XML by hand (although it works ok for software to deal with), so this seems like a fine idea to me.

ubook · on Feb 8, 2014

Footnotes are bits of text tagged on at the end, like Wikipedia for instance. UBook semantically tags them so they are available where the note exists in the book. Presumably a popup or panel when you touch or click the note marker in the text. It was a bad use of words. It was meant to convey the idea you didn't jump to the end of the chapter or book to read a note. HTML-based formats just use anchors.

Yes, compiler is ambiguous. The idea behind it is a clean interim language a writing tool could export to, and software could then adapt to other formats like Kindle etc.

FraaJad · on Feb 8, 2014

This appears way more complex than Pandoc, which can produce multiple output formats (yes, including ebooks) while using well-known Markdown syntax. Even the extensions to the syntax are a) from popular libraries (GitHub, mmd) and b) easy on the eye.

cies · on Feb 9, 2014

One up for Pandoc.

gamblor956 · on Feb 8, 2014

It seems that you've tackled a problem that doesn't really exist. Nobody really cares if the compiler has to manage complex code--that's what it's there for. Pushing the work on to a human isn't the solution--the current HTML+CSS-based approach of the existing eBook standard already works well enough and there are already plenty of WYSIWYG editors to take care of the underlying source code.

Edit: Note that all of the major eBook standards are variants of HTML, with some specific extensions/modifications that are generally not difficult to deal with when converting between formats.

ubook · on Feb 8, 2014

You've possibly never tried to create ebooks then. Even something simple like a table of contents is handled quite differently by different formats. Many convertors aim to convert Word docs into ebooks, for example, and are poor.

But time will tell. You could be right. If there is no demand it will fail to catch on. But the ambition is to provide a single universal format that can then be output to different formats.

Thanks for the feedback though.

Edit: I meant to add: the goal is not for people to hand code it. The ideal is that the format is used invisibly by writing software, or perhaps as an export format. Obviously I only "released" it an hour ago, so like Markdown, in the first instance it's a rough and ready thing, only usable in a text editor.

acabal · on Feb 8, 2014

For the most part it's either epub or mobi at this point in the ebook game, with the rare occasional pdf. Epub and mobi are essentially interchangeable thanks to Calibre, and if you need PDF then you probably need more complex formatting than a Markdownesque language can provide.

Converting from Word is a problem, but that's for different reasons: 1) authors (understandably) have no idea how to typeset, so each author has their own special bizarre style that's almost always wrong, 2) the HTML produced by Word is a hideous wreck no matter how you slice it, and 3) each different version of Word produces wildly different, but equally monstrous, HTML.

Sadly the word processing game has been won by Word. 99% of non-techie authors will use Word (or maybe Scrivener) to write their novel and it's tough to convince them to learn a formatting language when they can just press 'bold' on the toolbar.

In either case we already have a format to be invisibly used by writing software: HTML/CSS. It's the underlying language for epub and mobi, it's well understood and easily learned, it was specifically designed for book-like documents (and not web apps funnily enough), and programs already export in that format. The problem is that some programs (like Sigil) do a better job of exporting than others (like Word).

Things like TOCs can be handled by using an open, easily-editable format like epub as a base format, then using Calibre to invisibly compile to different formats. Calibre does a flawless job 99% of the time.
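To make the "epub is HTML in an open container" point concrete, here is a minimal sketch that assembles an EPUB 3 skeleton with Python's standard library. The title, identifier, file names and the `build_epub` helper are placeholders invented for illustration, and the result has not been run through a validator such as EPUBCheck. Note that the table of contents (`nav.xhtml`) is itself just an HTML list.

```python
import io
import zipfile

# Points reading systems at the package file (OPF).
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

# Package document: metadata, manifest (all files) and spine (reading order).
CONTENT_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Placeholder Title</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2014-02-08T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>
"""

# The table of contents is an ordinary XHTML list inside a <nav> element.
NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>Contents</title></head>
  <body>
    <nav epub:type="toc">
      <ol><li><a href="chapter1.xhtml">Chapter 1</a></li></ol>
    </nav>
  </body>
</html>
"""

CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><h1>Chapter 1</h1><p>Placeholder text.</p></body>
</html>
"""


def build_epub() -> bytes:
    """Return the bytes of a skeleton EPUB (hypothetical helper, not a full exporter)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as epub:
        # The mimetype entry must be the first file and must be stored uncompressed.
        epub.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        epub.writestr("META-INF/container.xml", CONTAINER_XML)
        epub.writestr("OEBPS/content.opf", CONTENT_OPF)
        epub.writestr("OEBPS/nav.xhtml", NAV_XHTML)
        epub.writestr("OEBPS/chapter1.xhtml", CHAPTER_XHTML)
    return buffer.getvalue()


epub_bytes = build_epub()
print(zipfile.ZipFile(io.BytesIO(epub_bytes)).namelist())
```

Everything a human would want to edit here is plain XML or XHTML, which is why an epub makes a workable base format to hand to a converter.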

Edit: My creds are that I run one of the largest writing communities online, briefly ran an online ebook conversion service, and have composed and published ebooks myself.

cstross · on Feb 8, 2014

> Sadly the word processing game has been won by Word. 99% of non-techie authors will use Word (or maybe Scrivener) to write their novel

That's because publishers rely on Word for workflow because everyone uses Word and they require interoperability.

They don't actually like Word any more than the rest of us do -- they mostly rely on Adobe InDesign for typesetting, and getting Word docs into InDesign is a bit painful unless the author is au fait with the publisher's own style sheet -- and if you talk to their electronic/internet publications specialists they'd love to find a usable alternative platform. Unfortunately they need such a platform to be universal, to support workflow-specific tasks such as copy editing (handled today, not terribly well, via Word's change tracking) or checking page proofs (handled today by going over PDFs until your eyeballs bleed), and to work for everyone they do business with. Scrivener is great, and generates very clean HTML/CSS/ePub, but it stops at the point when you hit "Compile" -- it doesn't interoperate with the proofing/production side of workflow. As for the rest ...

(Credentials: I write novels for a living and in the past week have been discussing this topic with Hachette's head of digital strategy and one of Penguin Random House's digital production specialists.)

dalai · on Feb 8, 2014

If it is not meant to be hand coded then I would think that starting with Markdown is a bad idea. Markdown was meant to be human readable with minimal markup. This means that there are many limitations as to what you can represent in Markdown.

To me, it would make sense to start with something more expressive. There are many different types of books (fiction, technical, poetry, interactive) and many types of semantic information that you will need to encode that publishers may require. Moreover, it is not just semantic information you need to capture. For some books the placement of the elements on the page, the font used, etc. may be important. If this format is also meant as an intermediate format for printed books then you only need to take a look at LaTeX and its numerous packages to get an idea of all the different things someone might require.

Even if you decide to go with something like Markdown, it makes sense to look at mmd and pandoc extensions (e.g. for tables) before coming up with another syntax.

Navarr · on Feb 8, 2014

I liked the $ for use of non-markdown meta-data.. but then you went and changed how paragraphs work. I was also pretty sure that # was the outline marker..

I feel like you removed a lot of usefulness from markdown with this format.

austinstorm · on Feb 8, 2014

Yeah, you clearly have never made an eBook. It's a jungle out there.

alexchamberlain · on Feb 8, 2014

I love Markdown, and I've written a couple of essays in LaTeX. Unfortunately, this reminds me more of the latter. We live in a world of code completion and excessive hard drives. We don't need to use cryptic tags no one can remember. Nice idea, but maybe a little more Markdown and a little less LaTeX?

ubook · on Feb 8, 2014

Markdown is ultimately a pre-HTML format and inherits its weaknesses. It is a display language.

Books have their own foibles; table of contents, notes, indexes etc. These really need a semantic approach.

Finally, I'm not suggesting anyone handcode it. I use a writing tool on my iPad that uses markdown as its file format. I never see the markdown, I just bold text, add headings etc. That was the idea, to keep it in the background. Hence the reason I'm aiming for simplicity.

Incidentally, most books need only ten tags, five of which are used exactly once on the title page.

alexchamberlain · on Feb 8, 2014

Personally, I am fed up with WYSIWYG editors; none of them work, though LibreOffice comes close. Word freezes up if you use more than a couple of pictures.

tmzt · on Feb 8, 2014

Consider paste linking pictures in Word.

My ultimate word processing environment would be an empty web page where you can type Markdown and have the UX adapt to what you are typing, much like entering comments on StackOverflow. When you are done you could apply global styles or even page templates without changing the semantic content. You could even go back to authoring in the utf8-based language if you preferred.

Outline view in Word is useful, section navigation (and section styles) in OpenOffice are useful. Text to table would be great if it actually worked.

trvz · on Feb 8, 2014

This isn't thought through: if someone has started writing his book in Markdown he can't use this markup without converting the expressions; instead of extending the original in a useful way, this project is breaking compatibility.

jayvanguard · on Feb 8, 2014

Isn't it Mark UP if you use funny syntax that you wouldn't normally use? Markdown should use normal syntax that you may have used anyways, losing no readability, but at the cost of some possible ambiguity.

sandGorgon · on Feb 8, 2014

Have you looked at O'Reilly's HTMLBook [1] infrastructure?

[1] https://github.com/oreillymedia/HTMLBook

riffraff · on Feb 9, 2014

honestly I do not understand: how is HTMLBook better than the old DocBook?

I mean, is

    <section data-type="preface"> ... </section>

really an improvement over

    <preface> ... </preface>

?

jjsz · on Feb 8, 2014

I use penflip, which lets me download text as PDF, Word doc, ePub, HTML, .txt, or as markdown source. Who are you targeting?

ubook · on Feb 8, 2014

I have no idea. I have coded for different ebook formats, and used various tools. I just felt it was a mess. I wanted to see if a simple markdown-style language might help create a universal approach that could be used as a basis for outputting to competing formats.

It is an experiment as much as anything, and it is on HN just to see what people think of the idea.

slaxman · on Feb 9, 2014

I think it's a nice idea. This is great for programmers, most of whom already understand markdown.

However, based on my experience, if you need to get widespread adoption, it has to be simpler than MS Word. Why? Because most authors are used to that. It will be the first point of comparison.

My $0.02

jjsz · on Feb 8, 2014

I think you should embed a custom fiddle.

ArtifTh · on Feb 8, 2014

There already is the FB2 format. Sure, it's XML-based and many people may not like it, but why reinvent the same thing again?

bowerbird · on Feb 9, 2014

i have a system that's much more developed than this one.

it's also simpler, more powerful, and actually _working._

meanwhile, posts like this seem to get lots of attention.

even though the person just created an account to post.

i don't get it. can someone give me a pointer or two?

-bowerbird

guard-of-terra · on Feb 8, 2014

This looks like a plain text version of FB2 but without software or device support.

latk · on Feb 8, 2014

For a format to be used, it has to be attractive in some form. Markdown is attractive because it's extremely easy to read and write. HTML is attractive because it can be displayed by all web browsers. LaTeX is attractive because it is extremely powerful, has the best math typesetting, and allows for user-defined abstractions. Your format is not easy to read. There is no implementation that displays it. It lacks many advanced features that make it useful, and those features that it does have are rather weird.

To evaluate the format, let's consider a couple of texts that could be written, with their unique requirements:

- A fantasy novel. The novel may rely on custom fonts, or multi-colored text. Your UBook cannot handle this (but to be fair, neither can Markdown).

- A collection of poems. Modern poems can have rather challenging layout requirements, which UBook does not offer (but to be fair, neither does Markdown).

- A highschool-level math textbook. This requires good support for math typesetting, figures, and tables. UBook has no clear vision to support math typesetting, e.g. through something like MathJax (to be fair, HTML relies on JavaScript or MathML for this, but some Markdown dialects are math-aware).

- A programming book. This requires a form of verbatim input and syntax highlighting. Neither is possible with UBook (to be fair this is a pain in HTML, and only some Markdown dialects are highlighting-aware).

- An academic paper. This has domain-specific requirements (e.g. math typesetting, special typesetting for linguistics) and also requires excellent handling of references (HTML: yes, Markdown: no) and footnotes (HTML: no, Markdown: in some dialects). The footnote handling in UBook is not sufficient (for example, it's not possible to reference the same footnote from multiple locations).

Of course these are rather demanding examples, and there are plenty of texts that don't need many of these features. What use cases remain? Any text without advanced layout or formatting requirements, and without a need for much structure: wide swaths of literature, and non-demanding technical writing. Is your UBook the best choice for those use cases? Wouldn't it be easier to write it as Markdown, and compile it via HTML to a MOBI or EPUB before deploying?

Now that I've looked at the scope of the language, let's look at the finer design decisions of UBook itself.

- For example, the whole book has to be in a single file. This makes it unwieldy to write. Adding an "include" feature could help.

- All commands except the table commands are terribly short, which makes them hard to remember. This makes sense for often-used commands, but seldom-used commands can be longer. For example, "$A" is not as memorable as "$AUTHOR".

- UBook considers a line break to be a paragraph separator. This is bad for two reasons: (1) Advanced users with version control might want to split paragraphs onto multiple lines for better "diff"s. (2) Line breaks and paragraphs are semantically different, and HTML, Markdown, and LaTeX include ways to output a line break without starting a new paragraph.

- The "$E" command (empty line) makes no sense whatsoever. No other format I know supports such a feature (although it can be faked with line breaks).

- Inventing a new link syntax is highly unnecessary. The one used by Markdown is pretty good, and allows us to refer to a link target via a shorthand:

    Some Text [inline](link) or [keyword] link which can be used with [other text][keyword] as well

    [keyword]: some link

- For the given use cases, the index functionality is unnecessary: A text complex enough to require a glossary will likely have more requirements that are not met by UBook, e.g. a way to link to a glossary entry in the text.

- Using the same character to introduce a block-level command and a headline is very confusing. It might be better to use explicit headline commands like "$SUBSECTION" and "$CHAPTER" (compare LaTeX).

- The format appears to lack composability. E.g. it's not obvious whether a block-level image can be included in a quote. It does not seem like a table cell could contain another table. It doesn't seem like a list item could contain multiple paragraphs (and if it does, where does the item end and normal text start?). Of course, this makes it very easy to parse (which is unnecessary, given today's state of parsing technology).

- The fact that the three-character sequence "$Q " has to be included before each quoted line is a WTF in itself (and a misfeature shared similarly by Markdown). This is undue burden on the author as he has to type it, and undue burden on the parser as this makes the language non-context free. Using a quotation-start and quotation-end marker would be much better for all involved.

- I fail to see the difference between horizontal lines and "$S" section separators.

- The "$END" is a horrible anachronism and should die. The end of file is good enough to denote the end. Requiring such a token means that all your examples on the page won't work, and that many authors would be surprised when their magnum opus doesn't display.
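The paragraph-separator point in the list above can be made concrete with a short sketch. It contrasts the rule UBook is described as using (every line break ends a paragraph) with the blank-line rule used by Markdown, where a hard-wrapped paragraph is rejoined. The function names and the sample manuscript are made up for illustration; neither function is a real UBook or Markdown parser.

```python
def paragraphs_per_line(text: str) -> list[str]:
    """Treat every non-empty line as its own paragraph (the rule criticised above)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def paragraphs_blank_line(text: str) -> list[str]:
    """Treat blank lines as separators; wrapped lines are joined with a space."""
    paragraphs: list[str] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        # End of input closes the last paragraph; no explicit end marker is needed.
        paragraphs.append(" ".join(current))
    return paragraphs


# A paragraph hard-wrapped over two lines, followed by a second paragraph.
manuscript = (
    "It was a dark and stormy night,\n"
    "and the rain fell in torrents.\n"
    "\n"
    "The second paragraph starts here.\n"
)

print(paragraphs_per_line(manuscript))
print(paragraphs_blank_line(manuscript))
```

With the blank-line rule an author can rewrap a paragraph for cleaner diffs without changing the output, and reaching the end of the file is enough to finish the document.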

Your UBook in its current state is not a viable markup language. It has a few interesting concepts (indices, footnotes, book-orientation, table-cell merging), but isn't really attractive when we have so many other possible languages. Before redesigning the language, take a look at AsciiDoc, MultiMarkdown, and older markup languages like LaTeX, Perl's POD, and troff. Learn from their mistakes instead of repeating them! Remember that many restrictions of these formats do not apply to you!

UBook as a distribution format is unlikely to ever become a viable option even though it has some nice features like excerpts (I would like to see the ebook format market disrupted, but UBook won't manage that). It would be better to de-emphasize this part of UBook, and focus on it as an authoring tool that can compile to various ebook formats.

philip1209 · on Feb 8, 2014

I would be interested in LaTeX compatibility with Markdown files

mikepurvis · on Feb 8, 2014

Pandoc provides Markdown converters for many formats, including LaTeX. Under the hood, its PDF output is via LaTeX.
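A rough sketch of that workflow, for anyone who wants to script it: the helper below only builds the pandoc command line and runs it if `pandoc` is found on the PATH. The file names and both function names are placeholders; `-o` and `--standalone` are regular pandoc options, and PDF output additionally assumes a LaTeX installation is present.

```python
import shutil
import subprocess


def pandoc_command(source: str, target: str) -> list[str]:
    """Build a pandoc invocation; pandoc infers the output format from the target extension."""
    return ["pandoc", source, "--standalone", "-o", target]


def convert(source: str, target: str) -> bool:
    """Run pandoc if it is installed; return False when it is not available."""
    if shutil.which("pandoc") is None:
        return False
    subprocess.run(pandoc_command(source, target), check=True)
    return True


# Placeholder file names: one Markdown source, three possible targets.
for target in ("book.tex", "book.pdf", "book.epub"):
    print(" ".join(pandoc_command("book.md", target)))
```

Calling `convert("book.md", "book.tex")` would then produce the LaTeX source, assuming `book.md` exists.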

ecspike · on Feb 9, 2014

My frequent path is Orgmode (which is just text and very md-like) which can export to Markdown proper and a bunch of other formats including PDF. It can also leverage pandoc and autogenerate a TOC for you.

jsnk · on Feb 8, 2014

Is there a converter currently?

ubook · on Feb 8, 2014

No, the language itself is pre-release. The posting here is basically a first showing.

The idea is to encourage convertors of course. For the author or book designer to mark it up as a UBook, then be able to export that to anything e.g. Kindle, ePub etc.

The focus at the moment is on defining the tags, then encouraging others to use it as a universal method to semantically encode books in a very simple way.

otikik · on Feb 8, 2014

$I $don't $like $this.

andyl · on Feb 8, 2014

I really want a better markup for producing books. I'm not sure that this is it.