

Ask HN: How far are we from in-browser typesetting - mgualt

Given advances in computing power and browser abilities, how far are we from
in-browser typesetting? Suppose we ignore issues such as reflow and legacy
browser support. I am asking whether true typesetting, such as is provided by
the LaTeX engine, is now possible with up-to-date technology.
======
_delirium
Formatting for print is one place that's still a bit tricky. Not much control
over how browsers will print things, though some people I know have had
success with PrinceXML as a print converter. Doing lots of "normal" print
things like referencing page numbers is a bit unnatural as well, e.g.
replicating the LaTeX \label \ref pair that lets you write things like "(see
page 256)" without hard-coding it.
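
A minimal sketch of how the \label / \ref idea could be imitated once content
has already been split into pages. The `{label:name}` and `{pageref:name}`
markers and the `resolve_page_refs` function are made up for illustration;
they are not LaTeX syntax or any browser API.

```python
import re

LABEL = re.compile(r"\{label:(\w+)\}")
PAGEREF = re.compile(r"\{pageref:(\w+)\}")

def resolve_page_refs(pages):
    """Two passes: collect label positions, then fill in the references."""
    # Pass 1: record the 1-based page number on which each label appears.
    label_page = {}
    for number, text in enumerate(pages, start=1):
        for name in LABEL.findall(text):
            label_page[name] = number

    # Pass 2: drop the label markers and replace each reference with a number.
    def substitute(match):
        # Unknown labels are shown as "??" so they are easy to spot.
        return str(label_page.get(match.group(1), "??"))

    return [PAGEREF.sub(substitute, LABEL.sub("", text)) for text in pages]

pages = [
    "Intro (see page {pageref:proof}).",
    "Filler.",
    "{label:proof}The proof.",
]
print(resolve_page_refs(pages))
# ['Intro (see page 3).', 'Filler.', 'The proof.']
```

The sketch assumes pagination is fixed before the references are filled in. In
practice, inserting the numbers can change where pages break, which is why
LaTeX sometimes needs to be run more than once.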

~~~
mgualt
Formatting for print? I assume you aren't talking about /actual/ printing on
paper... the whole point of my question is that we need a new typesetting
system for active and web-enabled electronic documents which provides as much
control over the reader experience as PDF does, but is not tied to the print
format and has new features which are now possible.

~~~
_delirium
Ah ok, I took you to be asking what's needed for HTML5 technologies to replace
LaTeX/ConTeXt and similar as general tools for typesetting, which would
include the ability to target both paper and screen output.

If it's for electronic documents purely, then I think we're pretty close, at
least in terms of what the technology supports. Making it _easy_ to do various
kinds of typesetting tasks is perhaps another story; there is a lot of stuff
that is technically possible with the tech modern browsers support but not
easy to do, certainly not with a workflow as automated as LaTeX's.

~~~
mgualt
I think you are absolutely right, and this brings us to a daunting challenge
-- how to plan and implement the document of the future...

------
andrewcooke
what is missing now? what difference is there between "true typesetting" and
current web layout engines? they seem pretty much identical to me.

maths used to be a weak point, but take this page from my blog -
<http://www.acooke.org/cute/Calibratin0.html> - and scroll down to the maths.
then look at the source. the source is basically tex. it's rendered in the
browser using mathjax - <http://www.mathjax.org/>

that's all possible now.

~~~
mgualt
Web layout is nowhere near the kind of typesetting required for professional
quality documents and books.

I am aware of mathjax, which is useful for embedding snippets, but this is
still far from what I am talking about, which is a complete publishing system
such as latex, as is used by publishers for all kinds of documents, but for
electronic active documents with features that go beyond those of books and
PDF.

~~~
andrewcooke
instead of repeating your assertion perhaps you could explain how it is
"nowhere near"?

it's a different toolchain, with different end-users in mind. but you have an
underlying engine that arranges text with arbitrary modifications to font
styles in arbitrary regions.

the kind of differences i see are at the level of whether those regions are
rectangular or not (which css3 is addressing, i think). that is not critical
for most use.

~~~
mgualt
I think it is reasonably self-evident that one doesn't often find webpages
with the quality of typesetting that one finds in many printed books.

The TeX environment nowadays has hundreds of specialized packages which
provide micro-typography support, programmatic diagrams, advanced mathematical
features, detailed cross-referencing, multilanguage support, advanced
treatment of fonts including ligatures, linebreaking, justification, character
protrusion, grayscale balancing... A cursory glance at the PDFTeX author's
thesis work may give an idea of how far the system has evolved:

<http://www.pragma-ade.com/pdftex/thesis.pdf>

It is impossible for me to adequately convey the TeX ecosystem in a comment --
you may consider that my question is aimed at people already familiar with it.

Now, I understand that current web technology seems to be improving to the
point where, as you say, "you have an underlying engine that arranges text
with arbitrary modifications..." So, I am left with the question of whether
the TeX ecosystem can somehow be ported completely to create, say, HTML5
documents or something like this, instead of the current PDF output. I am
talking about something new - this will not be a PDF, and it will not be a
traditional webpage with features such as re-flow or "multi-browser
support"... this would be an electronic publication where the author has
complete control over the viewing experience. The advantage of this would be
the possibility of extending TeX even further, to allow for programmatic
control of the reading experience, to include features such as folding,
nonlinear document structure, embedded media and code, and many other things I
can't even fathom. If the current toolchain does in fact have the capabilities
for TeX to be implemented in it - then it would be very interesting to create
a plan for how to accomplish this goal in the proper way.

~~~
andrewcooke
i'm familiar with tex.

you seem to be asking two things.

(1) - can browsers support an ecosystem that matches what is available to tex?

(2) - can tex be used to replace web pages?

the answer to (1) is yes. browsers can be "programmed" with javascript to
modify layout, just as mathjax does. for example, you could write a javascript
package that formats text for printing by adding page numbers etc. this
requires someone to write the high level logic, but the underlying layout
engine already exists. you would also need a package system - something like
what is being developed (requirejs etc).

the quality of existing results is largely a function of market, maturity, and
expectations. you're comparing books (largely paid with existing toolchains)
against web pages (largely free, all this is pretty new). the layout engine
itself is, as i have already said, pretty much equivalent.

(2) is a confused mess that you don't seem to have thought out properly. and
given the tone of your other replies i sure ain't going to help you.

~~~
mgualt
In the TeX output, say the DVI file, we can (I presume, though I don't know
any details) find all the information about detailed placement and size of all
the characters and diagrams. This is provided in a certain format, according
to a certain conceptual framework.

On the other hand, the web technologies, including javascript, can
theoretically be programmed to respond to such instructions and recreate the
same kind of control we get with PDFs, and even go beyond it.
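
One rough way to picture such a mapping: assume the typesetter's output has
already been decoded into a list of positioned glyphs, and emit one absolutely
positioned element per glyph. The `Glyph` record and `glyphs_to_html` function
below are hypothetical, and reading an actual DVI file is not shown.

```python
from dataclasses import dataclass
from html import escape

@dataclass
class Glyph:
    char: str
    x_pt: float     # horizontal offset from the page's left edge, in points
    y_pt: float     # vertical offset from the page's top edge, in points
    size_pt: float  # font size in points

def glyphs_to_html(glyphs, width_pt, height_pt):
    """Emit one absolutely positioned <span> per glyph in a fixed-size page box."""
    spans = [
        f'<span style="position:absolute;left:{g.x_pt}pt;top:{g.y_pt}pt;'
        f'font-size:{g.size_pt}pt">{escape(g.char)}</span>'
        for g in glyphs
    ]
    return (
        f'<div style="position:relative;width:{width_pt}pt;height:{height_pt}pt">'
        + "".join(spans)
        + "</div>"
    )

page = glyphs_to_html(
    [Glyph("T", 72.0, 72.0, 10.0), Glyph("<", 78.5, 72.0, 10.0)], 612, 792
)
print(page)
```

This ignores fonts, baselines and rules (lines), which a real converter would
have to handle, but it suggests the mapping sits at the level of "place this
glyph at these coordinates".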

What I wonder is: if we want this kind of inter-operability, what is the crux
of the matter? Where precisely does the mapping have to be made?

Thanks for your 2 cents, I "ain't" asking for your "help".

~~~
jasomill
Three things come immediately to mind: non-paginated layouts, immutable
output, and tool support.

First, while most of the basic TeX algorithms for things like line-breaking
and hyphenation could be implemented in terms of browser primitives, the
overall TeX system is irrevocably page-oriented. While the maximum page length
could theoretically be increased to "essentially infinite" through judicious
widening of TeX's integer data types, this would break lots of TeX code and
documents, for both performance and layout reasons.
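
To give a feel for the line-breaking part, here is a simplified "total fit"
breaker: it picks all the breaks of a paragraph at once so that the summed
squared slack of every line except the last is minimal. This is only in the
spirit of TeX's algorithm, which works with boxes, glue and penalties; the
`break_lines` name and the fixed-width character model are assumptions made
for this sketch.

```python
def break_lines(words, width):
    """Choose breaks minimising the summed squared slack of all lines but the last."""
    n = len(words)
    # best[i] is the minimal cost of setting words[i:]; next_break[i] is the
    # index just past the last word of the line that starts at word i.
    best = [float("inf")] * n + [0.0]
    next_break = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        length = -1  # -1 cancels the space counted before the first word
        for j in range(i + 1, n + 1):
            length += 1 + len(words[j - 1])
            # Stop when the line overflows, but let a single overlong word
            # occupy a line of its own.
            if length > width and j > i + 1:
                break
            cost = 0 if j == n else (width - length) ** 2
            if cost + best[j] < best[i]:
                best[i] = cost + best[j]
                next_break[i] = j
    lines, i = [], 0
    while i < n:
        j = next_break[i]
        lines.append(" ".join(words[i:j]))
        i = j
    return lines

print(break_lines("aaa bb cc ddddd".split(), 6))
# ['aaa', 'bb cc', 'ddddd']
```

A greedy first-fit breaker would produce "aaa bb", "cc", "ddddd" here, leaving
one very loose line; looking at the whole paragraph spreads the slack more
evenly.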

Second, one of Knuth's main design goals for TeX was "perfect" backwards and
cross-platform compatibility for documents, to the extent that he implemented
everything in terms of integer math to avoid minor output variations due to
slightly different floating-point implementations on various systems. If this
is a design goal for a "drop-in" replacement system, it's not clear that the
browser is a good target platform, because you generally _want_ browser
vendors to take advantage of things like hardware accelerated rendering that
tend to be incompatible with "pixel-perfect" output for all permissible
resolutions.
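
A small illustration of the integer-math point. TeX measures lengths in scaled
points (1 pt = 65536 sp). The `pt_to_sp` helper and its use of rounding are
simplifications for this sketch and not TeX's exact conversion rules.

```python
SP_PER_PT = 65536  # TeX's scaled point: 1 pt = 2**16 sp

def pt_to_sp(points):
    """Quantise a length in points to a whole number of scaled points."""
    return round(points * SP_PER_PT)

widths_pt = [0.1, 0.2, 0.3]

# Floating point: the result depends on the order of the additions.
left_first = (widths_pt[0] + widths_pt[1]) + widths_pt[2]
right_first = widths_pt[0] + (widths_pt[1] + widths_pt[2])
print(left_first == right_first)   # False with IEEE 754 doubles

# Integers: once quantised, sums are exact regardless of grouping.
a, b, c = (pt_to_sp(w) for w in widths_pt)
print((a + b) + c == a + (b + c))  # True
```

Evaluation order is just one way floating-point results can drift between
implementations; with integers, every machine that does the same additions
gets the same widths.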

In both of the above cases, it's clear that lots of preparatory work needs to
be done simply to understand what a high-quality platform for technical
publishing in "native Web form" should even look like — but chances are good
that it'll look quite a bit different than TeX, if only because the browser is
quite a bit different than the printed page.

Finally, I use Emacs to write TeX documents and I rarely use any particularly
sophisticated editor support beyond simple syntax awareness and highlighting —
and I'm hardly alone. The fact that TeX is, like Emacs, largely implemented
"in terms of itself" means that it's often more natural to implement
automation in the document (where I include various third party packages in my
definition of the "document", my point being simply that the document itself
drives the automation at "runtime") rather than relying on powerful external
editing and preprocessing tools. For authors like me who aren't professional
Web designers and/or JavaScript programmers, the prospect of using anything
related to HTML and DOM as bases for input and macros sounds like a nightmare.
People commonly use TeX equation notation in plain text contexts, because,
unlike SGML derivatives like HTML and MathML, TeX was designed to be written
and (at least to some extent) read by humans. Again, there are lots of
reasonable alternatives ranging from powerful editing tools that could
potentially at this point be implemented in the same language as the document
itself, which would have certain advantages, to extended versions of
preprocessed input formats like Markdown. But again, it'd take lots of work to
establish "one and only one" of these options as a standard.

------
macca321
[http://googledocs.blogspot.co.uk/2010/05/whats-different-
abo...](http://googledocs.blogspot.co.uk/2010/05/whats-different-about-new-
google-docs.html)

------
ChuckMcM
This has been possible for a while, it's even easier with WebGL. I would not be
surprised if it made for a killer 'retina' display demo.

~~~
mgualt
I would be interested to know whether a version of the TeX typesetting program
could be implemented in WebGL or some appropriate tool.

I have been seeing a lot more HTML5 demos with unusual placement of fonts and
text. So it seems a no-brainer to port TeX to the browser (for many years this
was considered impossible for some reason).

~~~
ChuckMcM
Memory available to the browser was limited, and sub-pixel rendering was
impractical. There is a commercial product 'MathType' that uses TeX to render
beautiful mathematics on web pages but I don't know if they have a WebGL
version or not.

If you read Don Knuth's book on TeX you can probably figure out how to make it
run.

~~~
mgualt
I will at least look at Knuth's book, though I doubt I will be able to make it
work!

