
UBook – Markdown for eBooks - ubook
http://ubook.info/
======
Hemospectrum
I don't think this is a bad idea per se, but it seems to me that comparing it
to Markdown is a poor way of communicating the intent and utility of this
markup language. Markdown's central design principle is that, by adopting
semantic markup conventions from email, it is readable in plain-text form even
to someone who hasn't set out to learn the conventions. That severely limits
the kind of thing you can express in Markdown, but that's an acceptable (even
desired) tradeoff in the majority of situations where Markdown would be useful
in the first place.

Because Markdown doesn't cover everything, and because there are other sets of
semantic markup conventions used in a plain text setting, there's certainly
room for "Markdown-likes" which attempt to replicate the same kind of effort
with those other conventions in mind. For example, Fountain[1] is a language
for writing screenplays, and its conventions are a fairly close (though not as
close as possible) approximation of the same ones used in real screenplays.

I've never authored a manuscript for a book, so I don't know myself whether
such conventions for things like tables of contents exist in the publishing
world. But I have a feeling that if they do, they wouldn't look like this.
What UBook actually reminds me of is troff/nroff macro syntax[2], which was
developed for a similar purpose and is still in use today as the lingua franca
of Unix manpages.

You've implied in other comment threads that you don't really intend this for
writers to use directly. If this were to be the output of a more user-friendly
writing tool, I'm curious what such a tool might look like, and how it could
differ from existing WYSIWYG rich text editors (where everything looks like a
nail and your hammers are labeled "bold" and "bigger font"). For example, how
would you represent section and paragraph structure in an editable way without
the use of inline control characters that you edit the same way as other text?
I don't think I've ever seen an editor built that way, but if that's what
you're going for, I look forward to it.

[1]: [http://fountain.io/](http://fountain.io/)

[2]: [http://liw.fi/manpages/](http://liw.fi/manpages/)

------
ubook
UBook is a universal language to help authors, publishers and book designers
easily markup and format books. Like markdown it uses simple, semantic tags to
create a plain text file that can be transformed into something more complex
such as HTML, saving the compiler the job of managing complex code.

It is designed as an intermediary language but it is hoped it can be a
publishing language in its own right.

It can handle any type of book, from simple novels to very complex textbooks.
Although similar to markdown it has a number of features to cope with the
specific demands of book publishing:

1\. The table of contents is compiled automatically using a simple, intuitive
method;

2\. No footnotes or endnotes;

3\. Structural components unique to books are included e.g. scene breaks and
page breaks;

4\. Complex elements are included e.g. tables, links, images and indexes;

5\. The final file is just a zip file; the manuscript is a plain text file.
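The post doesn't show how UBook marks headings, so what follows is only a hedged illustration of what "compiled automatically" usually involves: scan the manuscript for heading lines and record each one's level, title and anchor. It assumes Markdown-style `#` headings (not UBook's own syntax), and `build_toc`, `render_toc` and `slugify` are hypothetical helper names.

```python
import re

# Assumption: Markdown-style ATX headings ("#", "##", ...).
# UBook's actual heading markers are not shown in this thread.
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")


def slugify(title: str) -> str:
    """Turn a heading title into an anchor id, e.g. 'Part One' -> 'part-one'."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def build_toc(manuscript: str) -> list[tuple[int, str, str]]:
    """Return (level, title, anchor) for every heading, in document order."""
    toc = []
    for line in manuscript.splitlines():
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            title = match.group(2)
            toc.append((level, title, slugify(title)))
    return toc


def render_toc(toc: list[tuple[int, str, str]]) -> str:
    """Render the TOC as plain text, indenting two spaces per heading level."""
    return "\n".join(
        f"{'  ' * (level - 1)}{title} (#{anchor})" for level, title, anchor in toc
    )
```

For example, a manuscript with `# Part One` followed by `## Chapter 1` yields `Part One (#part-one)` with `Chapter 1 (#chapter-1)` indented beneath it. A converter could turn the same tuples into an EPUB navigation document or a Kindle NCX.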

UBook is a not-for-profit, open project designed to solve a problem. It
simplifies the production of ebooks, and focuses on a universal method of
doing so. It is at a pre-release stage, and I would welcome any insights you
have to offer.

If you have an interest in ePublishing, web languages, HTML, markdown etc.
please do check it out.

~~~
DougMerritt
> No footnotes or endnotes;

I must be misreading that. How is lack of support for footnotes and endnotes a
positive feature, as opposed to a glaringly missing feature?

> saving the compiler the job of managing complex code.

I re-read this 6 times, trying to figure it out. You mean, the human who
creates the book?

On technical forums, I suggest you not use "compiler" in that sense, because
we are accustomed to its technical meaning, of software that translates
computer language.

On a positive note, I like markdown, and I dislike creating and reading XML by
hand (although it works ok for software to deal with), so this seems like a
fine idea to me.

~~~
ubook
Footnotes are bits of text tagged on at the end, like Wikipedia for instance.
UBook semantically tags them so they are available where the note exists in
the book. Presumably a popup or panel when you touch or click the note marker
in the text. It was a bad use of words. It was meant to convey the idea that you
don't have to jump to the end of the chapter or book to read a note. HTML-based
formats just use anchors.
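One way an HTML-based ebook can keep a note "available where the note exists" is EPUB 3's `epub:type="noteref"` and `epub:type="footnote"` semantics, which some reading systems show as a popup. A minimal sketch, assuming a hypothetical `[^id]` inline marker and a hypothetical `render_notes` helper; neither is UBook's notation.

```python
import html
import re

# Hypothetical inline marker: "[^id]" refers to a note defined in `notes`.
NOTE_REF = re.compile(r"\[\^([A-Za-z0-9_-]+)\]")


def render_notes(paragraph: str, notes: dict[str, str]) -> str:
    """Render a paragraph with EPUB 3 noteref links plus its footnotes.

    Each note is emitted once as an <aside epub:type="footnote">, even if it
    is referenced several times, so multiple references share one note.
    Raises KeyError if a marker refers to an undefined note.
    """
    order: list[str] = []

    def link(match: re.Match) -> str:
        note_id = match.group(1)
        if note_id not in order:
            order.append(note_id)
        number = order.index(note_id) + 1  # notes numbered by first use
        return f'<a epub:type="noteref" href="#{note_id}">{number}</a>'

    body = NOTE_REF.sub(link, html.escape(paragraph, quote=False))
    asides = [
        f'<aside epub:type="footnote" id="{note_id}">'
        f"<p>{html.escape(notes[note_id], quote=False)}</p></aside>"
        for note_id in order
    ]
    return "\n".join([f"<p>{body}</p>", *asides])
```

The XHTML root element would also need the EPUB namespace declared (`xmlns:epub="http://www.idpf.org/2007/ops"`). Whether a note actually pops up is up to the reading system; the markup only states what the note is.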

Yes, compiler is ambiguous. The idea behind it is a clean interim language a
writing tool could export to, and software could then adapt to other formats
like Kindle etc.

------
FraaJad
This appears way more complex than Pandoc, which can produce multiple output
formats (yes, including ebooks) while using well-known Markdown syntax. Even
the extensions to the syntax are (a) from popular libraries (GitHub, MMD) and
(b) easy on the eye.

~~~
cies
One up for Pandoc.

------
gamblor956
It seems that you've tackled a problem that doesn't really exist. Nobody
really cares if the compiler has to manage complex code; that's what it's there
for. Pushing the work onto a human isn't the solution: the current HTML+CSS-based
approach of the existing eBook standard already works well enough, and
there are already plenty of WYSIWYG editors to take care of the underlying
source code.

Edit: Note that all of the major eBook standards are variants of HTML, with
some specific extensions/modifications that are generally not difficult to
deal with when converting between formats.

~~~
ubook
You've possibly never tried to create ebooks then. Even something simple like
a table of contents is handled quite differently by different formats. Many
convertors aim to convert Word docs into ebooks, for example, and are poor.

But time will tell. You could be right. If there is no demand it will fail to
catch on. But the ambition is to provide a single universal format that can
then be output to different formats.

Thanks for the feedback though.

Edit: I meant to add: the goal is not for people to hand-code it. Ideally,
the format is used invisibly by writing software, or perhaps as an export
format. Obviously I only "released" it an hour ago, so, like Markdown, in the
first instance it's a rough-and-ready thing, only usable in a text editor.

~~~
acabal
For the most part it's either epub or mobi at this point in the ebook game,
with the rare occasional pdf. Epub and mobi are essentially interchangeable
thanks to Calibre, and if you need PDF then you probably need more complex
formatting than a Markdownesque language can provide.

Converting from Word _is_ a problem, but that's for different reasons: 1)
authors (understandably) have no idea how to typeset, so each author has their
own special bizarre style that's almost always wrong, 2) the HTML produced by
Word is a hideous wreck no matter how you slice it, and 3) each different
version of Word produces wildly different, but equally monstrous, HTML.

Sadly the word processing game has been won by Word. 99% of non-techie authors
will use Word (or maybe Scrivener) to write their novel and it's tough to
convince them to learn a formatting language when they can just press 'bold'
on the toolbar.

In either case we already have a format to be invisibly used by writing
software: HTML/CSS. It's the underlying language for epub and mobi, it's well
understood and easily learned, it was specifically designed for book-like
documents (and not web apps funnily enough), and programs _already_ export in
that format. The problem is that some programs (like Sigil) do a better job of
exporting than others (like Word).

Things like TOCs can be handled by using an open, easily-editable format like
epub as a base format, then using Calibre to invisibly compile to different
formats. Calibre does a flawless job 99% of the time.

Edit: My creds are that I run one of the largest writing communities online,
briefly ran an online ebook conversion service, and have composed and
published ebooks myself.

~~~
cstross
_Sadly the word processing game has been won by Word. 99% of non-techie
authors will use Word (or maybe Scrivener) to write their novel_

That's because publishers rely on Word for workflow _because everyone uses
Word_ and they require interoperability.

They don't actually _like_ Word any more than the rest of us do -- they mostly
rely on Adobe InDesign for typesetting, and getting Word docs into InDesign is
a bit painful unless the author is au fait with the publisher's own style
sheet -- and if you talk to their electronic/internet publications specialists
they'd _love_ to find a usable alternative platform. Unfortunately they need
such a platform to be universal, to support workflow-specific tasks such as
copy editing (handled today, not terribly well, via Word's change tracking) or
checking page proofs (handled today by going over PDFs until your eyeballs
bleed), and to work for everyone they do business with. Scrivener is great,
and generates very clean HTML/CSS/ePub, but it stops at the point when you hit
"Compile" -- it doesn't interoperate with the proofing/production side of
workflow. As for the rest ...

(Credentials: I write novels for a living and in the past week have been
discussing this topic with Hachette's head of digital strategy and one of
Penguin Random House's digital production specialists.)

------
alexchamberlain
I love Markdown, and I've written a couple of essays in LaTeX. Unfortunately,
this reminds me more of the latter. We live in a world of code completion and
huge hard drives. We don't need to use cryptic tags no one can remember.
Nice idea, but maybe a little more Markdown and a little less LaTeX?

~~~
ubook
Markdown is ultimately a pre-HTML format and inherits its weaknesses. It is a
display language.

Books have their own foibles: tables of contents, notes, indexes etc. These
really need a semantic approach.

Finally, I'm not suggesting anyone handcode it. I use a writing tool on my
iPad that uses markdown as its file format. I never see the markdown, I just
bold text, add headings etc. That was the idea, to keep it in the background.
Hence the reason I'm aiming for simplicity.

Incidentally, most books need only ten tags, five of which are used exactly
once on the title page.

~~~
alexchamberlain
Personally, I am fed up with WYSIWYG editors; none of them work, though
LibreOffice comes close. Word freezes up if you use more than a couple of
pictures.

~~~
tmzt
Consider paste linking pictures in Word.

My ultimate word processing environment would be an empty web page where you
can type Markdown and have the UX adapt to what you are typing, much like
entering comments on StackOverflow. When you are done you could apply global
styles or even page templates without changing the semantic content. You could
even go back to authoring in the utf8-based language if you preferred.

Outline view in Word is useful, section navigation (and section styles) in
OpenOffice are useful. Text to table would be great if it actually worked.

------
trvz
This isn't thought through: if someone has started writing his book in
Markdown he can't use this markup without converting the expressions; instead
of extending the original in a useful way, this project is breaking
compatibility.

------
jayvanguard
Isn't it Mark UP if you use funny syntax that you wouldn't normally use?
Markdown should use normal syntax that you may have used anyways, losing no
readability, but at the cost of some possible ambiguity.

------
sandGorgon
Have you looked at O'Reilly's HTMLBook [1] infrastructure?

[1]
[https://github.com/oreillymedia/HTMLBook](https://github.com/oreillymedia/HTMLBook)

~~~
riffraff
honestly I do not understand: how is HTMLBook better than the old DocBook?

I mean, is

    <section data-type="preface"> ... </section>

really an improvement over

    <preface> ... </preface>

?

------
jjsz
I use penflip, which lets me download text as PDF, Word doc, ePub, HTML, .txt,
or as markdown source. Who are you targeting?

~~~
ubook
I have no idea. I have coded for different ebook formats, and used various
tools. I just felt it was a mess. I wanted to see if a simple markdown-style
language might help create a universal approach that could be used as a basis
for outputting to competing formats.

It is an experiment as much as anything, and it is on HN just to see what
people think of the idea.

~~~
slaxman
I think it's a nice idea. This is great for programmers, most of whom already
understand Markdown.

However, based on my experience, if you need to get widespread adoption, it
has to be simpler than MS Word. Why? Because most authors are used to that. It
will be the first point of comparison.

My $0.02

------
ArtifTh
There already is the FB2 format. Sure, it's XML-based and many people may not like
it, but why reinvent the same thing again?

------
bowerbird
i have a system that's much more developed than this one.

it's also simpler, more powerful, and actually _working._

meanwhile, posts like this seem to get lots of attention.

even though the person just created an account to post.

i don't get it. can someone give me a pointer or two?

-bowerbird

------
guard-of-terra
This looks like plain text version of FB2 but without software or device
support.

------
latk
For a format to be used, it has to be attractive in some form. Markdown is
attractive because it's extremely easy to read and write. HTML is attractive
because it can be displayed by all web browsers. LaTeX is attractive because
it is extremely powerful, has the best math typesetting, and allows for user-
defined abstractions. Your format is not easy to read. There is no
implementation that displays it. It lacks many advanced features that make it
useful, and those features that it does have are rather weird.

To evaluate the format, let's consider a couple of texts that could be
written, with their unique requirements:

\- A fantasy novel. The novel may rely on custom fonts, or multi-colored text.
Your UBook cannot handle this (but to be fair, neither can Markdown).

\- A collection of poems. Modern poems can have rather challenging layout
requirements, which UBook does not offer (but to be fair, neither does
Markdown).

\- A high-school-level math textbook. This requires good support for math
typesetting, figures, and tables. UBook has no clear vision to support math
typesetting, e.g. through something like MathJax (to be fair, HTML relies on
JavaScript or MathML for this, but some Markdown dialects are math-aware).

\- A programming book. This requires a form of verbatim input and syntax
highlighting. Neither is possible with UBook (to be fair this is a pain in
HTML, and only some Markdown dialects are highlighting-aware).

\- An academic paper. This has domain-specific requirements (e.g. math
typesetting, special typesetting for linguistics) and also requires excellent
handling of references (HTML: yes, Markdown: no) and footnotes (HTML: no,
Markdown: in some dialects). The footnote handling in UBook is not sufficient
(for example, it's not possible to reference the same footnote from multiple
locations).

Of course these are rather demanding examples, and there are plenty of texts
that don't need many of these features. What use cases remain? Any text
without advanced layout or formatting requirements, and without a need for
much structure: wide swaths of literature, and non-demanding technical
writing. Is your UBook the _best_ choice for those use cases? Wouldn't it be
easier to write it as Markdown, and compile it via HTML to a MOBI or EPUB
before deploying?

Now that I've looked at the _scope_ of the language, let's look at the finer
design decisions of UBook itself.

\- For example, the whole book has to be in a single file. This makes it
unwieldy to write. Adding an "include" feature could help.

\- All commands except the table commands are terribly short, which makes them
hard to remember. This makes sense for often-used commands, but seldom-used
commands can be longer. For example, "$A" is not as memorable as "$AUTHOR".

\- UBook considers a line break to be a paragraph separator. This is bad for
two reasons: (1) Advanced users with version control might want to split
paragraphs onto multiple lines for better "diff"s. (2) Line breaks and
paragraphs are semantically different, and HTML, Markdown, and LaTeX include
ways to output a line break without starting a new paragraph.

\- The "$E" command (empty line) makes no sense whatsoever. No other format I
know supports such a feature (although it can be faked with line breaks).

\- Inventing a new link syntax is highly unnecessary. The one used by Markdown
is pretty good, and allows us to refer to a link target via a shorthand:

    
    
        Some Text [inline](link) or [keyword] link which can be used with [other text][keyword] as well
    
        [keyword]: some link
    

\- For the given use cases, the index functionality is unnecessary: A text
complex enough to require a glossary will likely have more requirements that
are not met by UBook, e.g. a way to link to a glossary entry in the text.

\- Using the same character to introduce a block-level command and a headline
is _very_ confusing. It might be better to use explicit headline commands like
"$SUBSECTION" and "$CHAPTER" (compare LaTeX).

\- The format appears to lack composability. E.g. it's not obvious whether a
block-level image can be included in a quote. It does not seem like a table
cell could contain another table. It doesn't seem like a list item could
contain multiple paragraphs (and if it does, where does the item end and
normal text start?). Of course, this makes it very easy to parse, but that is
of little value given today's state of parsing technology.

\- The fact that the three-character sequence "$Q " has to be included before
each quoted line is a WTF in itself (and a misfeature Markdown shares in a
similar form). This places an undue burden on the author, who has to type it,
and on the parser, since it makes the language non-context-free. Using a
quotation-start and quotation-end marker would be much better for all
involved.

\- I fail to see the difference between horizontal lines and "$S" section
separators.

\- The "$END" is a horrible anachronism and should die. The end of file is
good enough to denote the end. Requiring such a token means that all your
examples on the page won't work, and that many authors would be surprised when
their _magnum opus_ doesn't display.
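Two of the suggestions above, blank-line paragraph separation (so one paragraph can span several lines and produce cleaner diffs) and explicit quotation start/end markers instead of a "$Q " prefix on every quoted line, are simple to parse. A minimal sketch, assuming hypothetical `$QUOTE`/`$ENDQUOTE` delimiter lines that are not part of UBook; `parse_blocks` is likewise a hypothetical name.

```python
def parse_blocks(text: str) -> list[tuple[str, str]]:
    """Split text into ("para", ...) and ("quote", ...) blocks.

    - Paragraphs are separated by blank lines; single newlines inside a
      paragraph are joined with spaces, so one sentence per line diffs well.
    - Quotations are delimited by hypothetical "$QUOTE" / "$ENDQUOTE" lines
      instead of a "$Q " prefix on every quoted line.
    """
    blocks: list[tuple[str, str]] = []
    current: list[str] = []
    kind = "para"

    def flush() -> None:
        # Emit the lines collected so far as one block of the current kind.
        if current:
            blocks.append((kind, " ".join(current)))
            current.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if line == "$QUOTE":
            flush()
            kind = "quote"
        elif line == "$ENDQUOTE":
            flush()
            kind = "para"
        elif not line:
            flush()
        else:
            current.append(line)
    flush()  # end of input closes the last block; no "$END" marker needed
    return blocks
```

Reaching the end of the input simply flushes the last block, so the parser never needs an explicit end token. A blank line inside a quotation starts a new quote block, which a renderer could merge if multi-paragraph quotes should stay together.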

Your UBook in its current state is not a viable markup language. It has a few
interesting concepts (indices, footnotes, book-orientation, table-cell
merging), but isn't really attractive when we have so many other possible
languages. Before redesigning the language, take a look at _AsciiDoc_,
_MultiMarkdown_, and older markup languages like _LaTeX_, Perl's _POD_, and
_troff_. Learn from their mistakes instead of repeating them! Remember that
many restrictions of these formats do not apply to you!

UBook as a distribution format is unlikely to ever become a viable option even
though it has some nice features like excerpts (I would like to see the ebook
format market disrupted, but UBook won't manage that). It would be
better to de-emphasize this part of UBook, and focus on it as an authoring
tool that can compile to various ebook formats.

------
philip1209
I would be interested in LaTeX compatibility with Markdown files

~~~
mikepurvis
Pandoc provides Markdown converters for many formats, including LaTeX. Under
the hood, its PDF output is via LaTeX.
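A rough sketch of that route, driving the `pandoc` command-line tool from Python. It assumes `pandoc` is installed and on PATH (PDF output also needs a LaTeX engine); `--standalone`, `--toc` and `-o` are standard Pandoc options, and Pandoc infers the output format from the output file's extension. The helper names `pandoc_command` and `convert` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(source: str, fmt: str, toc: bool = True) -> list[str]:
    """Build a pandoc command converting `source` to `fmt` (e.g. "epub", "pdf", "tex").

    The output file keeps the source's name with the new extension; pandoc
    picks the output format from that extension.
    """
    output = str(Path(source).with_suffix(f".{fmt}"))
    cmd = ["pandoc", source, "--standalone", "-o", output]
    if toc:
        cmd.append("--toc")  # ask pandoc to generate a table of contents
    return cmd


def convert(source: str, formats: list[str]) -> None:
    """Run pandoc once per output format; does nothing if pandoc is missing."""
    if shutil.which("pandoc") is None:
        print("pandoc not found on PATH; skipping conversion")
        return
    for fmt in formats:
        subprocess.run(pandoc_command(source, fmt), check=True)
```

For example, `convert("book.md", ["epub", "pdf"])` would write `book.epub` and `book.pdf` next to the source file, each with a generated table of contents.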

~~~
ecspike
My frequent path is Org mode (which is just text and very md-like), which can
export to Markdown proper and a bunch of other formats including PDF. It can
also leverage Pandoc and autogenerate a TOC for you.

------
jsnk
Is there a converter currently?

~~~
ubook
No, the language itself is pre-release. The posting here is basically a first
showing.

The idea is to encourage convertors of course. For the author or book designer
to mark it up as a UBook, then be able to export that to anything e.g. Kindle,
ePub etc.

The focus at the moment is in defining the tags, then encouraging others to
use it as a universal method to semantically encode books in a very simple
way.

------
otikik
$I $don't $like $this.

------
andyl
I really want a better markup for producing books. I'm not sure that this is
it.

