
Tectonic – TeX/LaTeX Engine in Rust - xvilka
https://tectonic-typesetting.github.io/en-US/
======
svat
I looked into this project when it was first announced. The “in Rust” part
seems more aspirational than reality. For those who may not know, Knuth
originally wrote TeX in a language called WEB, which is basically Pascal with
some preprocessors for making it usable and documentable. Later extensions to
TeX, including eTeX, pdfTeX, and XeTeX, have also been written in WEB. The
existing TeX distributions (TeX Live, MiKTeX, etc.), at their core, first
translate this WEB/Pascal into (autogenerated and basically unreadable) C,
then run it through a C compiler, etc.

What this project has done is take the autogenerated C translation of
xetex.web and wrap this (unreadable) C code in Rust, which is an odd choice to
say the least. It seems (from Reddit comments by the author) that the reason
is that the author of this project was, at the time, unaware of LuaTeX, which
(starting from a manual, readable translation into C years ago) is actually
written in C.

All these odd choices aside, and barring the somewhat misleading “in Rust”
description (misleading for another reason: the TeX/LaTeX ecosystem is mostly
TeX macros rather than the core “engine” anyway), there are some good user-
experience decisions made by this project. With a regular TeX distribution
these would be achieved with something like latexmk/rubber/arara, which, much
like this project, are wrappers around TeX.

There is still room for someone to do a “real” rewrite of TeX (in Rust or
whatever), but as someone on TeX.SE said, it is very easy to start a rewrite
of TeX; the challenge is to finish it.

~~~
xvilka
Efforts to ditch any C remnants are being made[1].

[1]
[https://github.com/crlf0710/tectonic/](https://github.com/crlf0710/tectonic/)

~~~
svat
Thanks. Is there any code to look at? (Couldn't easily find anything relevant
on that repo…) I didn't mention it earlier, but I'm collecting a list of
alternative TeX implementations[1] so if there's even (say) a dozen lines of
WEB that have been converted to Rust, I'd be very eager to take a look and
compare — it would be illuminating.

[1]:
[https://tex.stackexchange.com/questions/507846](https://tex.stackexchange.com/questions/507846)

Edit: To answer my own question. The relevant source (corresponding to the
main part of xetex.web) is here:
[https://github.com/crlf0710/tectonic/blob/2580c55/engine/src...](https://github.com/crlf0710/tectonic/blob/2580c55/engine/src/xetex_xetex0.rs)
I've posted a comparison of some example code first from the official
xetex.web listing, and then from the “Rust” code here:
[https://gist.github.com/shreevatsa/627399d0150e66d211a264bc0...](https://gist.github.com/shreevatsa/627399d0150e66d211a264bc05b33beb)

You can draw your own conclusions from comparing the code samples, but to
point out a few obvious differences:

• What has the symbolic name “fi_or_else” in the WEB code has become the magic
number “108” in the Rust code. (This is because the author of this project
decided to have their Rust code start from the autogenerated C code which has
already lost this symbolic information.)

• What is simply “if tracing_ifs>0” in the WEB code is 29 lines of Rust code,
involving a magic offset into eqtb.

• The comments from the original are gone.

• Something like “cur_if:=subtype(p);” becomes “cur_if = (*mem.offset(p as
isize)).b16.s0 as small_number;”.

I wonder how maintainable such Rust code will be. These problems are not
insurmountable and the code can always be cleaned up later, I guess… my point
is simply that, at the moment, it is not (for instance) idiomatic Rust code.
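
(To illustrate what I mean by idiomatic: here's a hypothetical sketch, written
by me and not taken from the Tectonic codebase, of how a hand translation from
the WEB source might keep the symbolic layer that the autogenerated C has
lost. The function name and the tracing hook are illustrative.)

    // The WEB source's symbolic `fi_or_else` becomes an enum variant instead
    // of the bare magic number 108 that survives in the c2rust output.
    #[derive(Clone, Copy, PartialEq, Eq, Debug)]
    enum Command {
        // ...the many other TeX command codes elided...
        FiOrElse = 108,
    }

    fn handle_conditional(cmd: Command, tracing_ifs: i32) {
        // WEB's `if tracing_ifs > 0 then ...` stays a one-liner instead of a
        // 29-line raw-pointer walk through a magic offset into eqtb.
        if cmd == Command::FiOrElse && tracing_ifs > 0 {
            // show_current_conditional(); // hypothetical tracing hook
        }
    }

    fn main() {
        handle_conditional(Command::FiOrElse, 1);
    }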

~~~
xvilka
It's the result of an automated c2rust[1] conversion. Of course it will be
cleaned up. c2rust itself provides handy refactoring tools scriptable in Lua.
A lot of the code will be removed; for example, image handling can be done
through the image-rs crate, etc.

[1] [https://github.com/immunant/c2rust](https://github.com/immunant/c2rust)

~~~
svat
All that's fine, and good luck to you; I wish you well.

But I guess I wasn't clear. This project is described as a TeX engine “in
Rust”. Has any code actually been _written in Rust_ (as opposed to Rust code
that wraps C code, as in the main repo, or autogenerated Rust code, as in the
one you linked) for any of the core parts of the TeX engine (as opposed to
system dependencies like I/O)?

I'm genuinely curious, and if there is any such code I'd like to read it (and
compare it to the equivalent WEB code).

I completely understand that this could be done at some future date. The
question is about the description of the current state of the project. The
impression from a statement like “Efforts to ditch any C remnants are being
made” is that the C code is only a few “remnants”, when in fact it appears to
be the entire TeX engine itself.

(Also, I do not foresee any refactoring tool being able to convert, for
instance, “108” into “fi_or_else”, recovering information that's already lost.
The point here is that it would have been better to start with the original
WEB code, not the lossy and autogenerated C code.)

------
fock
And the prize for the most misleading headline this week goes to... THIS.

This is a TeX manager / workflow-automation tool / MiKTeX parrot-clone. It's
not a TeX engine, i.e. something that parses TeX and is extensible and faster
than the alternatives (classic TeX, XeTeX, LuaTeX). To add insult to injury,
putting Rust in the headline triggers a "better, faster TeX" wish for me: I
went there curious how much of TeX they already supported, and came back a
little disappointed.

------
jpfr
This is all well and good. But why XeTeX as a starting point instead of
LuaTeX?

Most distributions have started the transition from pdfTeX to LuaTeX as the
main implementation. And the code from LuaTeX's fresh translation of the
original Pascal into C is pretty nice.

And LuaTeX also has UTF-8 support and modern font handling.

~~~
pantalaimon
> And LuaTeX also has UTF-8 support and modern font handling.

I always thought this was the whole point of XeTeX?

~~~
jcelerier
LuaTeX is a kind of successor to XeTeX. The biggest difference is that
xelatex / latex / etc. all had to use fixed-size memory management, while
luatex supports dynamic allocation (which means no more of the dreaded "TeX
capacity exceeded, sorry." error message).

~~~
ratmice
Dynamic memory allocation is not always a desirable feature. If I were running
a service like Arxiv, I would much rather have fixed memory management than
dynamic.

Similarly, if I were running TeX in WebAssembly, in a tab in a browser, I'd
much rather hit "TeX capacity exceeded, sorry." than have it try to scuttle
off with all the swap...
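
A toy sketch of the trade-off (mine, not from any TeX engine): a
fixed-capacity pool fails loudly and predictably, the way classic TeX's
"capacity exceeded" does, while a growable structure will happily keep eating
memory.

    struct FixedPool {
        mem: Vec<u64>,
        cap: usize,
    }

    impl FixedPool {
        fn new(cap: usize) -> Self {
            FixedPool { mem: Vec::with_capacity(cap), cap }
        }

        // Allocation either succeeds or is refused; it never grows the pool.
        fn alloc(&mut self, word: u64) -> Result<usize, &'static str> {
            if self.mem.len() == self.cap {
                return Err("capacity exceeded, sorry");
            }
            self.mem.push(word);
            Ok(self.mem.len() - 1)
        }
    }

    fn main() {
        let mut pool = FixedPool::new(2);
        assert!(pool.alloc(1).is_ok());
        assert!(pool.alloc(2).is_ok());
        assert!(pool.alloc(3).is_err()); // refused, rather than swapping
    }

For a sandboxed service like arXiv, the bounded version is the safer default.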

------
jimhefferon
Anyone have any experience? I see a date of a year ago on the page.

The latest TeX Users Group meeting, a month or two ago in San Francisco,
featured a number of amazing demos of TeX and LaTeX systems compiling large
documents basically instantly. Amazing what modern CPUs can do. The TUGboat
with the papers should be out soon.

~~~
wenc
> compiling large documents basically instantly

Would be interested to see this. TeX Live could compile documents instantly
even a decade ago, but only documents that were a few pages long.

When you got into dissertation-sized folders of documents with lots of TikZ
images, it took a little longer, but still not that long. (My 160-page
dissertation compiled in 10-15 s back in the early 2010s, on a Core 2 Duo with
a non-SSD drive.)

It was hard to get much faster than that for large documents because LaTeX has
(had?) critical paths that defy parallelization.

If it is indeed possible to achieve instantaneous compilation for large
documents, that would make live-compilation a reality (live compilation
already is a reality but for small documents).

~~~
jimhefferon
One demo compiled the TeXbook, again basically instantly.

~~~
zzleeper
Do you remember which one? Can't seem to find it here:

[https://tug.org/tug2019/program.html](https://tug.org/tug2019/program.html)

~~~
DoctorOetker
I would also really like to know more about this.

Could it possibly be the following entry?

Saturday, August 10, 10:00am. David Fuchs: “What six orders of magnitude of
space-time buys you”

>TeX and MF were designed to run acceptably fast on computers with less than
1/1000th the memory and 1/1000th the processing power of modern devices. Many
of the design trade-offs that were made are no longer required or even
appropriate.

~~~
drfuchs
Yes, indeed.

An absolutely plain-vanilla TeX, exactly as Knuth wrote it and as my tool
chain compiles it, composes all 495 pages of The TeXbook in 0.300 seconds on a
2012 MacBook Pro laptop (the first "Retina" model). Single-threaded, composing
pages in 0.6 msec each, running in well under 1 megabyte total for code and
data. Back on the SAIL mainframe (a DEC PDP-10) that Knuth used to develop
TeX, it was almost exactly 1000 times slower: the pages would tick by every
half-second or so (at night, anyway).

Of course, nowadays we also have lots more memory to throw around. One cool
idea is to modify TeX to retain all the internal data structures for the pages
it has created, and run a nice screen viewer directly from that; Doug McKenna
gave a slick presentation at the recent Palo Alto / Stanford TUG meeting of
such a system that he created in order to enable live viewing of his Hilbert
Curves textbook, including displaying figures of space-filling fractal curves
that can be arbitrarily zoomed, which is simply impossible to do via PDF.

Going further, you can additionally modify TeX so it takes snapshots of its
internal state after every page, and is able to pop back to any of these
states. Presto, now if the user makes an edit on page 223, TeX can quickly
back up to the state of the world just before page 223, and continue on
forward with the modified input. Page 223 gets recomposed and immediately
redisplayed, essentially in real-time. Of course, the trick here is creating
and storing the snapshots efficiently; the TUG demo I gave using The TeXbook
runs in a few hundred megabytes, and does the whole "pop back, recompose a
page, redisplay it" rigamarole in milliseconds.
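
(In Rust-flavored sketch form, the naive version of that idea looks something
like the following; all the names are made up, and the whole difficulty I
glossed over is that `clone()` here has to be made cheap, e.g. copy-on-write,
rather than a full copy of the engine state.)

    #[derive(Clone)]
    struct EngineState {
        // In real TeX this would be mem, eqtb, the page builder, etc.
        input_pos: usize,
        page_no: usize,
    }

    struct IncrementalTex {
        snapshots: Vec<EngineState>, // snapshots[i] = state after page i+1
    }

    impl IncrementalTex {
        fn compose_page(&mut self, state: &mut EngineState) {
            // ...run the page builder; this stub just advances the cursor...
            state.input_pos += 1;
            state.page_no += 1;
            self.snapshots.push(state.clone());
        }

        // User edits page `page` (1-indexed): drop the stale snapshots and
        // resume from the state just after page `page - 1`.
        fn edit_on_page(&mut self, page: usize) -> EngineState {
            self.snapshots.truncate(page - 1);
            self.snapshots
                .last()
                .cloned()
                .unwrap_or(EngineState { input_pos: 0, page_no: 0 })
        }
    }

    fn main() {
        let mut tex = IncrementalTex { snapshots: Vec::new() };
        let mut state = EngineState { input_pos: 0, page_no: 0 };
        for _ in 0..300 { tex.compose_page(&mut state); }
        let mut resume = tex.edit_on_page(223); // state just after page 222
        tex.compose_page(&mut resume);          // recompose page 223 only
    }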

The bad news is that my stuff is still in the proof-of-concept stage, as
there's no support for the well-established extensions to Knuth's TeX
(importing graphics, using system fonts, etc.) that are required by the vast
majority of LaTeX users. I don't expect any of these features to slow things
down appreciably, but time will tell. I intend to do a "Show HN" by and by,
with lots more details, when it's able to handle real-world documents.

My apologies for failing to successfully fly under the radar until things were
ready for prime time. My premature TUG demo was intended to wow Prof. Knuth
sufficiently that he'd approve of a decades-late Ph.D. for me. (Happily, he
did agree, contingent on just one additional feature being added...)

~~~
blt
> _Presto, now if the user makes an edit on page 223, TeX can quickly back up
> to the state of the world just before page 223, and continue on forward with
> the modified input. Page 223 gets recomposed and immediately redisplayed,
> essentially in real-time._

Isn't it possible, in the worst case, that editing the source line that maps
to page 223 could trigger re-rendering arbitrarily far back before page 223?
Like if you wrote all 223 pages without any chapters, parts, \newpage, etc.
How does your program handle this?

~~~
drfuchs
Sure. It seems best to redisplay quickly, then update the screen again when
everything is quiescent (the user hasn’t typed anything for a few tenths of a
second, and the whole document has been fully recompiled with no changes
detected). Usually it’s not even noticeable, though of course there are
degenerate cases where a document oscillates, which gets called out in the UI
in the unlikely case it happens.

------
abdullahkhalids
I am happy somebody is working on this, but I really wish somebody would
rewrite TeX and its derivatives from scratch in a modern language. The syntax
of LaTeX is mostly adequate and should be kept, so people don't have to
relearn everything. But the list of other improvements that need to be made is
a kilometer long.

* The backend should be in a language that allows for easier editing and package development.

* Need modern bidirectional UTF-8 font support.

* Need the compiler to stop producing a bunch of extra files in the same folder, which is a significant adoption barrier.

* Need a way to generate clean HTML output with nice CSS.

* TikZ is great, but it would be excellent if it were possible to include the graphical output of any language by writing inline code (org-mode style); see the sketch after this list.

* Same for mathematics - if I could send input to Sage or Mathematica and print the results from within the TeX file, life would be so much easier.

* Beamer is interesting, but it is hard to make anything but rather bland scientific presentations in it. A framework for rapid design prototyping in Beamer would help so much.

* Etc etc
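
Here is a rough sketch of the inline-code idea from the list above (entirely
hypothetical: the `%%exec` block syntax is made up, and a real implementation
would want caching, error reporting, and sandboxing). A preprocessor extracts
tagged blocks, runs them through an external interpreter, and splices stdout
back into the document before typesetting:

    use std::process::Command;

    fn expand_exec_blocks(src: &str) -> String {
        let mut out = String::new();
        let mut lines = src.lines();
        while let Some(line) = lines.next() {
            if let Some(interp) = line.strip_prefix("%%exec ") {
                // Collect the block body up to the closing marker.
                let mut body = String::new();
                for l in lines.by_ref() {
                    if l.trim() == "%%end" { break; }
                    body.push_str(l);
                    body.push('\n');
                }
                // Run it (e.g. `python3 -c`, `sage -c`), splice stdout in.
                if let Ok(r) = Command::new(interp).arg("-c").arg(&body).output() {
                    out.push_str(&String::from_utf8_lossy(&r.stdout));
                }
            } else {
                out.push_str(line);
                out.push('\n');
            }
        }
        out
    }

    fn main() {
        let doc = "Before\n%%exec python3\nprint(2 + 2)\n%%end\nAfter\n";
        print!("{}", expand_exec_blocks(doc)); // Before / 4 / After
    }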

~~~
jcelerier
> * The backend should be in a language that allows for easier editing and
> package development.

LuaTeX allows writing packages in Lua

> * Need modern bidirectional UTF-8 font support.

LuaTeX supports bidirectional text; not sure if you're referring to something
more specific.

> * Need the compiler to stop producing a bunch of extra files in the same
> folder, which is a significant adoption barrier.

You can just run your build in a build folder, like you would in C, etc.?

~~~
abdullahkhalids
I know it's possible to have workarounds and I have used them as needed. But
the workarounds still have bugs, or are OS-specific (your third one, for
instance), and they don't solve the problem of slow compilation.

There are also fundamental limits to the TeX language, such as the number of
arguments to a macro being limited to 9 [1], or overloading functions being a
pain [2], etc.

The advantage of rewriting from scratch in a modern language is that these
issues can all be dealt with without workarounds.

[1] [https://tex.stackexchange.com/questions/2132/how-to-
define-a...](https://tex.stackexchange.com/questions/2132/how-to-define-a-
command-that-takes-more-than-9-arguments)

[2] [https://tex.stackexchange.com/questions/448877/check-
number-...](https://tex.stackexchange.com/questions/448877/check-number-of-
arguments-in-a-command)

------
xvilka
There is some code that is still in C/C++ (remnants of XeTeX), but there is a
project[1] to convert all the legacy code to Rust.

[1]
[https://github.com/crlf0710/tectonic/](https://github.com/crlf0710/tectonic/)

~~~
desiderantes
Did you even read what you're replying to?

------
currymj
Tectonic is great! I find it works as a "daily driver" now; it's not just
another experimental Rust project. It downloads everything as it's used, and
automatically decides when to rerun compilation to deal with citations, float
numbering, etc.

Rarely do I find I need to fall back on the standard programs.

I strongly recommend giving it a shot if you use LaTeX.

------
fortran77
I really don't get it. What advantage does this bring over the old WEB/tangle
Pascal -> C build system?

~~~
ratmice
What I really like about Tectonic is that a program can use it as a library
and embed the engine without shelling out to execute programs. There are at
this point some system-font dependencies that need to be satisfied before
compilation.

But otherwise it pretty much just works, without a lot of pre-installation
steps.
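
If I remember the crate's README correctly, the simplest embedding looks
roughly like this; `latex_to_pdf` is the high-level helper, and the exact API
may have evolved, so check the current docs:

    // Cargo.toml: add a dependency on the `tectonic` crate.
    fn main() {
        let latex = r#"
    \documentclass{article}
    \begin{document}
    Hello, Tectonic!
    \end{document}
    "#;
        // One call: no shelling out, no .aux/.log droppings in the source tree.
        let pdf: Vec<u8> = tectonic::latex_to_pdf(latex).expect("processing failed");
        std::fs::write("hello.pdf", pdf).expect("write failed");
    }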

------
ngirard
A big problem for Europeans is that Tectonic currently defaults to US letter
paper size:

[https://github.com/tectonic-
typesetting/tectonic/issues/126](https://github.com/tectonic-
typesetting/tectonic/issues/126)

~~~
macintux
You and I have very different definitions of "big".

~~~
chucksmash
Yes, I believe that's what the GP is saying.

------
bayesian_horse
I've been a fan of LaTeX for some time, but I've mainly switched to using
Chromium instead.

Yes, it seems a bit strange, but browser engines have become mature enough to
replace most of what LaTeX can do, and there are even workarounds for things
the browsers can't do natively. For example, "Paged.js" is a polyfill
implementing the CSS Paged Media extensions.

Using various JavaScript (or now WebAssembly) libraries, I can directly render
math, SVG, musical charts, all sorts of quirky text directions, etc.

There are even Knuth-Plass implementations in JavaScript. I hope that someone
smarter than me figures out how to marry that algorithm to the new CSS Houdini
API!
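
For the curious, the heart of Knuth-Plass is just dynamic programming over
break points: minimize total badness over the whole paragraph instead of
greedily filling each line. A deliberately simplified sketch of my own (the
real algorithm also models glue stretch/shrink, penalties, and fitness
classes):

    // best[i] = minimal cost of laying out words[..i]; a line's cost is its
    // squared leftover space, and the final line may be short for free.
    fn break_lines(word_widths: &[f64], space: f64, line_width: f64) -> Vec<usize> {
        let n = word_widths.len();
        let mut best = vec![f64::INFINITY; n + 1];
        let mut prev = vec![0usize; n + 1];
        best[0] = 0.0;
        for i in 0..n {
            if best[i].is_infinite() { continue; }
            let mut width = 0.0;
            for j in i..n {
                width += word_widths[j] + if j > i { space } else { 0.0 };
                if width > line_width { break; }
                let slack = line_width - width;
                let cost = if j + 1 == n { 0.0 } else { slack * slack };
                if best[i] + cost < best[j + 1] {
                    best[j + 1] = best[i] + cost;
                    prev[j + 1] = i;
                }
            }
        }
        // Walk the predecessor chain to recover where each line starts.
        let (mut breaks, mut i) = (Vec::new(), n);
        while i > 0 {
            breaks.push(prev[i]);
            i = prev[i];
        }
        breaks.reverse();
        breaks
    }

    fn main() {
        let widths = [3.0, 2.0, 2.0, 5.0, 1.0, 4.0];
        println!("lines start at words {:?}", break_lines(&widths, 1.0, 8.0));
    }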

~~~
chrismorgan
I’m afraid your desire to meld Knuth-Plass with the CSS Houdini efforts is a
bit like saying “I hope someone figures out how to build a wall for a house
out of paint.” Except probably even less plausible.

~~~
bayesian_horse
Why? Houdini offers an API for laying out DOM objects through custom
algorithms. Knuth-Plass is an algorithm for laying out boxes/text characters.

Maybe I misunderstand something here, but the paint analogy doesn't help!

~~~
chrismorgan
When I first wrote that comment, I wrote something like “you could _do_ it,
but the end result would be absolutely terrible”. Then I decided that actually
it wasn’t possible after all in any meaningful way, and changed it to “except
probably even less plausible.”

In reality, with what I have in mind, it’s kind of a bit of both: what I have
in mind is that you’d need to split each piece of the Knuth-Plass layout (box,
_& c._) into a DOM element of its own first, so that the layout can determine
their sizes and shuffle things around appropriately—since the layout API is
only giving you the set of CSS boxes to lay out and their sizes and any
engine-decided inline breaks in them, and not the ability to inspect what’s in
them or to break them up into further fragments.

Once you’re doing _that_ , it’s probably a bad idea to use the Layout API,
because you get no substantial benefit (a ResizeObserver to notify you when
you need to redo the layout is just about as good), but are using a _lot_ more
DOM nodes (which is bad for performance, and I strongly suspect it’d cancel
out the benefit that the Layout API version can run away from the main
thread), and are using a new, less-well-supported and probably-buggier API to
boot.

Also browsers have some pretty terrible bugs around how hyphenation especially
works when you have zero-width characters, and they have shown _no_ interest
in fixing them. (Chromium’s are the worst, but Firefox has a couple of
interesting ones as well.) Therefore you’d probably need all of your line-
breaking opportunities (most notably, soft hyphens) to be in boxes of their
own. And now you _probably_ won’t get your ﬀ ligature if a hyphen _could_ be
inserted between them, so I’m probably going to have to disqualify it as not
being able to produce the same output.

In the end, I’d be surprised if a variant of
[https://github.com/bramstein/typeset](https://github.com/bramstein/typeset)
using the Layout API as far as possible while retaining identical output
(excepting this soft-hyphens-inside-ligatures case, if my guess is correct)
could get down to even 20× slower than it, or using less than about 10× as
much memory. In practice, I think figures like 100× slower and 500× memory are
more likely. It’s _possible_ that it would be less janky for large amounts of
text, given that it operates in a worklet which may be run off the main thread
by the browser; but I doubt it, due to the increase in other requirements.

This is all assuming that my understanding of what would be needed and
possible is correct—I may have stated it too strongly given my lack of
particularly detailed knowledge in the area.

~~~
bayesian_horse
The ultimate problem I see with breaking words into separate boxes is that
they can't be rendered "in flow" anymore. So I guess the actual text has to be
set outside the Layout API.

However, consider the case of CSS Flexbox. It can also wrap boxes, and there a
smarter way of looking ahead could be helpful and would be within the scope of
the Layout API. Not sure.

And indeed, for rendering PDFs it may not be necessary or beneficial to rely
on the Houdini APIs at all.

------
inamberclad
GitHub says that this project is 80% C and 6% Rust. What's up with that?

~~~
kick
_Tectonic is a modernized, complete, self-contained TeX/LaTeX engine, powered
by XeTeX and TeXLive._

XeTeX and TeXLive are both written in C.

~~~
ratmice
Another thing to consider is that it isn't counting all the Rust dependencies.
You can get away with a lot less code in-tree when things get pushed out into
crates.

------
Shorel
All TeX rewrites seem to keep the batch oriented architecture.

What could improve usability, and enable a modern UI for TeX/LaTeX, is an
incremental TeX engine that computes only the changes in a document instead of
recomputing everything from scratch each time.

~~~
FlorianRappl
Plus an engine that realizes it has to recompute the document immediately,
instead of making you run "pdflatex ..." 2-3 times to get what you actually
wanted.
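
(The reason for the 2-3 runs: cross-references and citations are written to
the .aux file on one pass and read back on the next, so wrapper tools simply
rerun the engine until the .aux file stops changing. A minimal sketch of that
loop; the file names are illustrative, and the pass cap guards against
oscillating documents.)

    use std::collections::hash_map::DefaultHasher;
    use std::hash::{Hash, Hasher};
    use std::process::Command;

    fn aux_fingerprint(path: &str) -> u64 {
        let data = std::fs::read(path).unwrap_or_default();
        let mut h = DefaultHasher::new();
        data.hash(&mut h);
        h.finish()
    }

    fn main() {
        let mut last = aux_fingerprint("paper.aux");
        for pass in 1..=5 {
            Command::new("pdflatex")
                .args(["-interaction=nonstopmode", "paper.tex"])
                .status()
                .expect("failed to run pdflatex");
            let now = aux_fingerprint("paper.aux");
            if now == last {
                println!("stable after {} pass(es)", pass);
                break;
            }
            last = now;
        }
    }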

------
tasubotadas
Wow. This is amazing. I am a user of MiKTeX and it is such a pain to deal
with.

It was about time those Perl scripts were abandoned.

The fact that it produces the final .pdf without needing multiple manual
passes, and the fact that it can automatically fetch the packages you use, is
amazing.

~~~
mehrdadn
Just a heads-up: what nobody tells you is that MiKTeX is something like twice
as slow as TeX Live.

I learned this the hard way.

------
denysvitali
Last time I tried it, it was missing shell-escape support (and it probably
still is).

This command-line argument is needed if you're planning to use packages that
rely on Pygments (which does code highlighting).

~~~
jopython
I depend on minted which needs shell-escape. So this is a bummer.

------
bbanyc
Is this translating Knuth's WEB/Pascal to Rust, or a from-scratch
reimplementation?

~~~
ratmice
It's using web2c, and linking against a cleaned-up version of the generated C
sources to produce a combined C/Rust binary.

There is a further project, linked in the comments above, which runs c2rust
over the generated C sources and is cleaning up the result.

Going directly from WEB -> Rust is expected to be difficult, and will probably
have a lot to learn from the manual conversions above.

------
bloopernova
Cool, I've just started messing around with LaTeX, mostly for pgfgantt,
because Microsoft Project is so damned expensive. In fact, so many working
project-management apps (like OmniPlan) seem to be priced way too high.

------
merricksb
First discussed here in 2017:

[https://news.ycombinator.com/item?id=14450448](https://news.ycombinator.com/item?id=14450448)

