What this project has done is take the auto-generated C translation of xetex.web and wrap this (unreadable) C code in Rust, which is an odd choice to say the least. It seems (from Reddit comments by the author) that the reason is that the author of this project was, at the time, unaware of LuaTeX, which is actually written in C (starting from a manual and readable translation into C years ago).
All these odd choices aside, and barring the somewhat misleading “in Rust” description (misleading for another reason too: the TeX/LaTeX ecosystem is mostly TeX macros rather than the core “engine” anyway), this project does make some good user-experience decisions. With a regular TeX distribution these would be achieved with something like latexmk/rubber/arara, which are likewise wrappers around TeX, much like this project.
There is still room for someone to do a “real” rewrite of TeX (in Rust or whatever), but as someone on TeX.SE said, it is very easy to start a rewrite of TeX; the challenge is to finish it.
Edit: To answer my own question. The relevant source (corresponding to the main part of xetex.web) is here: https://github.com/crlf0710/tectonic/blob/2580c55/engine/src...
I've posted a comparison of some example code first from the official xetex.web listing, and then from the “Rust” code here: https://gist.github.com/shreevatsa/627399d0150e66d211a264bc0...
You can draw your own conclusions from comparing the code samples, but to point out a few obvious differences:
• What has the symbolic name “fi_or_else” in the WEB code has become the magic number “108” in the Rust code. (This is because the author of this project decided to have their Rust code start from the autogenerated C code which has already lost this symbolic information.)
• What is simply “if tracing_ifs>0” in the WEB code is 29 lines of Rust code, involving a magic offset into eqtb.
• The comments from the original are gone.
• Something like “cur_if:=subtype(p);” becomes “cur_if = (*mem.offset(p as isize)).b16.s0 as small_number;”.
I wonder how maintainable such Rust code will be. These problems are not insurmountable, and the code can always be cleaned up later, I guess… my point is simply that, at the moment, it is not idiomatic Rust code.
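To make this concrete, here is a rough sketch (with made-up names, not taken from the actual repository) of what an idiomatic cleanup of the fragments above could look like once the lost symbolic information is restored by hand:

    // Hypothetical cleanup sketch; `IfCode` and `Node` are my own names,
    // not from the Tectonic sources.

    /// Restores the symbolic name the WEB source had for the magic
    /// number 108 that appears in the autogenerated code.
    #[repr(u16)]
    enum IfCode {
        FiOrElse = 108,
        // ... the other conditional codes ...
    }

    /// Stand-in for a node in TeX's `mem` array.
    struct Node {
        b16_s0: u16, // the packed halfword that holds the subtype
    }

    impl Node {
        /// `cur_if := subtype(p)` becomes an ordinary safe accessor
        /// instead of `(*mem.offset(p as isize)).b16.s0 as small_number`.
        fn subtype(&self) -> u16 {
            self.b16_s0
        }
    }

    fn main() {
        let p = Node { b16_s0: IfCode::FiOrElse as u16 };
        assert_eq!(p.subtype(), 108);
    }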
But I guess I wasn't clear. This project is described as a TeX engine “in Rust”. Has any code actually been written in Rust (as opposed to Rust code that wraps C, as in the main repo, or autogenerated Rust code, as in this one you linked) for any of the core parts of the TeX engine (as opposed to system dependencies like I/O)?
I'm genuinely curious, and if there is any such code I'd like to read it (and compare it to the equivalent WEB code).
I completely understand that this could be done at some future date. The question is about the description of the current state of the project. The impression from a statement like “Efforts to ditch any C remnants are being made” is that the C code is only a few “remnants”, when in fact it appears to be the entire TeX engine itself.
(Also, I do not foresee any refactoring tool being able to convert, for instance, “108” into “fi_or_else”, recovering information that's already lost. The point here is that it would have been better to start with the original WEB code, not the lossy and autogenerated C code.)
Not sure how easy it would be to call Pascal from Rust or vice versa, but on x86_64 I believe the calling convention is similar for C and Pascal (strings are different, though).
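For what it's worth, Rust can call anything that follows the C calling convention, so the question mostly reduces to whether the Pascal compiler can export cdecl routines (Free Pascal can). A minimal sketch, with a hypothetical exported routine:

    // Hypothetical: assume the Pascal side was compiled with a `cdecl`
    // modifier on `get_avail`, so it follows the C calling convention.
    extern "C" {
        fn get_avail() -> i32;
    }

    fn main() {
        // All FFI calls are unsafe from Rust's point of view.
        let p = unsafe { get_avail() };
        println!("node index from the Pascal side: {}", p);
    }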
This is a TeX manager / workflow-automation tool / MiKTeX-parroting clone. It's not a TeX engine, in the sense of parsing TeX and being extensible and faster than the alternatives (classic TeX, XeTeX, LuaTeX). To add insult to injury: putting Rust in the headline triggers a "better, faster TeX" wish for me. I went there curious how much of TeX they already supported, and came back a little disappointed.
Most distributions have started the transition from pdfTeX to LuaTeX as the main implementation. And LuaTeX's code, from its new translation of the original Pascal into C, is pretty nice.
LuaTeX also has UTF-8 support and modern font handling.
I always thought this was the whole point of XeTeX?
Similarly, if I were running TeX in WebAssembly, in a browser tab, I'd much rather hit "TeX capacity exceeded, sorry." than have it try to scuttle off with all the swap...
I remember reading that the Unicode-enhanced Computer Modern isn't exactly the same as the original (Latin) Computer Modern that Knuth designed, the former having a few cosmetic issues. If that is indeed the case, is it possible to use the original Computer Modern with a UTF-8 engine?
The latest TeX Users Group meeting, a month or two ago in San Francisco, featured a number of amazing demos of TeX and LaTeX systems compiling large documents basically instantly. Amazing what modern CPUs can do. The TUGboat issue with the papers should be out soon.
Would be interested to see this. Even a decade ago, TeXLive could compile documents instantly, but only documents that were a few pages long.
When you get into dissertation-sized folders of documents with lots of TikZ images, it takes a little longer, but still not that long. (My 160-page dissertation compiled in 10–15 seconds back in the early 2010s, on a Core 2 Duo with a non-SSD drive.)
It was hard to get much faster than that for large documents because LaTeX has (had?) critical paths that defy parallelization.
If it is indeed possible to achieve instantaneous compilation for large documents, that would make live compilation a reality for them (it already is a reality, but only for small documents).
Could it possibly be the following entry?
Saturday, August 10
10:00am David Fuchs
What six orders of magnitude of space-time buys you
>TeX and MF were designed to run acceptably fast on computers with less than 1/1000th the memory and 1/1000th the processing power of modern devices. Many of the design trade-offs that were made are no longer required or even appropriate.
An absolutely plain vanilla TeX, exactly as Knuth wrote it and as my toolchain compiles it, composes all 495 pages of The TeXbook in 0.300 seconds on a 2012 MacBook Pro laptop (the first "Retina" model). Single-threaded, composing pages in 0.6 ms each, running in well under 1 megabyte total for code and data. Back on the SAIL mainframe (a DEC PDP-10) that Knuth used to develop TeX, it was almost exactly 1000 times slower: the pages would tick by every half-second or so (at night, anyway).
Of course, nowadays we also have lots more memory to throw around. One cool idea is to modify TeX to retain all the internal data structures for the pages it has created, and run a nice screen viewer directly from that; Doug McKenna gave a slick presentation at the recent Palo Alto / Stanford TUG meeting of such a system that he created in order to enable live viewing of his Hilbert Curves textbook, including displaying figures of space-filling fractal curves that can be arbitrarily zoomed, which is simply impossible to do via PDF.
Going further, you can additionally modify TeX so it takes snapshots of its internal state after every page, and is able to pop back to any of these states. Presto, now if the user makes an edit on page 223, TeX can quickly back up to the state of the world just before page 223, and continue on forward with the modified input. Page 223 gets recomposed and immediately redisplayed, essentially in real-time. Of course, the trick here is creating and storing the snapshots efficiently; the TUG demo I gave using The TeXbook runs in a few hundred megabytes, and does the whole "pop back, recompose a page, redisplay it" rigamarole in milliseconds.
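In pseudocode-ish Rust, the scheme looks roughly like this (made-up names and a naive deep-copy snapshot; as I said, the real trick is making the snapshots cheap):

    #[derive(Clone)]
    struct EngineState {
        mem: Vec<u32>,   // stand-in for TeX's main memory array
        eqtb: Vec<u32>,  // stand-in for the table of equivalents
        input_pos: usize,
    }

    struct IncrementalTex {
        snapshots: Vec<EngineState>, // snapshots[i] = state just before page i+1
        state: EngineState,
    }

    impl IncrementalTex {
        fn compose_page(&mut self) {
            // Checkpoint before composing, then run the ordinary TeX
            // main loop until the page is shipped out.
            self.snapshots.push(self.state.clone());
            // ... main loop elided ...
        }

        // After an edit on page n, pop back to the state just before
        // page n and continue forward with the modified input.
        fn edit_on_page(&mut self, n: usize) {
            self.state = self.snapshots[n - 1].clone();
            self.snapshots.truncate(n - 1);
            self.compose_page(); // page n is recomposed almost immediately
        }
    }

    fn main() {
        let blank = EngineState { mem: vec![0; 1024], eqtb: vec![0; 1024], input_pos: 0 };
        let mut tex = IncrementalTex { snapshots: Vec::new(), state: blank };
        tex.compose_page();  // page 1
        tex.compose_page();  // page 2
        tex.edit_on_page(2); // roll back and recompose page 2
    }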
The bad news is that my stuff is still in the proof-of-concept stage, as there's no support for the well-established extensions to Knuth's TeX (importing graphics, using system fonts, etc.) that are required by the vast majority of LaTeX users. I don't expect any of these features to slow things down appreciably, but time will tell. I intend to do a "Show HN" by and by, with lots more details, when it's able to handle real-world documents.
My apologies for failing to successfully fly under the radar until things were ready for prime time. My premature TUG demo was intended to wow Prof. Knuth sufficiently that he'd approve of a decades-late Ph.D. for me. (Happily, he did agree, contingent on just one additional feature being added...)
Isn't it possible, in the worst case, that editing the source line that maps to page 223 could trigger re-rendering arbitrarily far back before page 223? Like if you wrote all 223 pages without any chapters, parts, \newpage, etc. How does your program handle this?
So there will be an associated Ph.D. thesis to look forward to at some point? Really curious how you achieved the speed-ups.
What kind of timeline are we talking about? Months? Years?
And I'd also be interested to hear how big a doc. I have a 300+ page book I am developing, with many graphics. Compilation on my several-years old standard laptop is 5-10 seconds. So I wonder what is up.
* The backend should be in a language that allows for easier editing and package development.
* Need modern bidirectional UTF-8 font support.
* Need the compiler to stop producing a bunch of extra files in the same folder, which is a significant adoption barrier.
* Need a way to generate clean HTML output with nice CSS.
* TikZ is great, but it would be excellent if it were possible to include the graphical output of any language by writing inline code (org-mode style).
* Same for mathematics: if I could send input to Sage or Mathematica and print the results from within the .tex files, life would be so much easier.
* Beamer is interesting, but it is hard to make anything but rather bland scientific presentations in it. A framework for rapid design prototyping in Beamer would help so much.
* Etc etc
LuaTeX allows writing packages in Lua
> * Need modern bidirectional UTF-8 font support.
LuaTeX supports bidirectional text; not sure if you're referring to something more specific.
> * Need the compiler to stop producing a bunch of extra files in the same folder, which is a significant adoption barrier.
You can just run your build in a build folder, like you would in C, etc.?
There are also fundamental limits in the TeX language itself, such as the number of arguments to a macro being limited to nine, function overloading being a pain, etc.
The advantage of rewriting from scratch in a modern language is that these issues can all be dealt with without workarounds.
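For example, here's a toy sketch (made-up types, not any real engine's internals) of how a from-scratch macro expander could treat parameters as an ordinary growable list instead of TeX's fixed #1..#9 slots:

    #[derive(Debug, Clone)]
    enum Token {
        Char(char),
        Param(usize), // reference to the n-th argument, unbounded
    }

    struct MacroDef {
        name: String,
        arity: usize, // any number of parameters, not capped at 9
        body: Vec<Token>,
    }

    fn expand(m: &MacroDef, args: &[Vec<Token>]) -> Vec<Token> {
        assert_eq!(args.len(), m.arity);
        let mut out = Vec::new();
        for tok in &m.body {
            match tok {
                Token::Param(i) => out.extend(args[*i].iter().cloned()),
                t => out.push(t.clone()),
            }
        }
        out
    }

    fn main() {
        let pair = MacroDef {
            name: "pair".into(),
            arity: 2,
            body: vec![
                Token::Char('('), Token::Param(0),
                Token::Char(','), Token::Param(1), Token::Char(')'),
            ],
        };
        let args = vec![vec![Token::Char('a')], vec![Token::Char('b')]];
        println!("\\{} -> {:?}", pair.name, expand(&pair, &args));
    }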
Rarely do I find I need to fall back on the standard programs.
I strongly recommend giving it a shot if you use LaTeX.
But otherwise it pretty much just works without a lot of pre-installation steps.
> Letter or ANSI Letter is a paper size commonly used as home or office stationery in the United States, Canada, Chile, Colombia, Costa Rica, Mexico, Panama, the Dominican Republic and the Philippines. It measures 8.5 by 11 inches (215.9 by 279.4 mm). US Letter-size paper is a standard defined by the American National Standards Institute (ANSI, paper size A), in contrast to A4 paper used by most other countries, and adopted at varying dates, which is defined by the International Organization for Standardization, specifically in ISO 216.
It makes sense for there to be a system-wide setting to override the software default, so folks who use some other standard can set it once and never worry about it again.
It should respect LC_PAPER on GNU systems, too.
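Something like this rough sketch (a simplification: glibc actually resolves LC_PAPER through its locale database rather than a hard-coded table, so the env-var fallback chain and the territory mapping below are just illustrative):

    use std::env;

    #[derive(Debug)]
    enum Paper { A4, UsLetter }

    fn default_paper() -> Paper {
        // POSIX precedence: LC_ALL overrides the individual LC_PAPER
        // category, which in turn overrides LANG.
        let loc = env::var("LC_ALL")
            .or_else(|_| env::var("LC_PAPER"))
            .or_else(|_| env::var("LANG"))
            .unwrap_or_default();
        // Strip the encoding suffix ("en_US.UTF-8" -> "en_US") and map
        // a few Letter-using territories; everything else gets A4.
        match loc.split('.').next().unwrap_or("") {
            "en_US" | "en_CA" | "es_CL" | "es_MX" => Paper::UsLetter,
            _ => Paper::A4,
        }
    }

    fn main() {
        println!("default paper size: {:?}", default_paper());
    }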
Yes, it seems a bit strange, but browser engines have become mature enough to replace most of what LaTeX can do, and there are even workarounds for things the browsers can't do natively. For example, "Paged.js" is a polyfill that implements the CSS paged media extensions.
Maybe I misunderstand something here, but the paint analogy doesn't help!
In reality, it's a bit of both. What I have in mind is that you'd first need to split each piece of the Knuth-Plass layout (box, etc.) into a DOM element of its own, so that the layout can determine their sizes and shuffle things around appropriately; the Layout API only gives you the set of CSS boxes to lay out, their sizes, and any engine-decided inline breaks in them, not the ability to inspect what's in them or to break them up into further fragments.
Once you're doing that, it's probably a bad idea to use the Layout API, because you get no substantial benefit (a ResizeObserver to notify you when you need to redo the layout is just about as good), but you are using a lot more DOM nodes (which is bad for performance, and I strongly suspect would cancel out the benefit that the Layout API version can run off the main thread), and you are using a new, less-well-supported, and probably buggier API to boot.
Also, browsers have some pretty terrible bugs around how hyphenation works, especially when you have zero-width characters, and they have shown no interest in fixing them. (Chromium's are the worst, but Firefox has a couple of interesting ones as well.) Therefore you'd probably need all of your line-breaking opportunities (most notably, soft hyphens) to be in boxes of their own. And then you probably won't get your ﬀ ligature if a hyphen could be inserted between them, so I'm probably going to have to disqualify it as not being able to produce the same output.
In the end, I'd be surprised if a variant of https://github.com/bramstein/typeset using the Layout API as far as possible, while retaining identical output (excepting this soft-hyphens-inside-ligatures case, if my guess is correct), could get down to even 20× slower than the original, or use less than about 10× as much memory. In practice, I think figures like 100× slower and 500× the memory are more likely. It's possible that it would be less janky for large amounts of text, given that it operates in a worklet which the browser may run off the main thread; but I doubt it, due to the increase in other requirements.
This is all assuming that my understanding of what would be needed and possible is correct—I may have stated it too strongly given my lack of particularly detailed knowledge in the area.
However, consider the case of CSS Flexbox. It can also wrap boxes, and there, some smarter lookahead could be helpful and would be within the scope of the Layout API. Not sure.
And indeed, for rendering PDFs it may not be necessary or beneficial to rely on the Houdini APIs at all.
XeTeX and TeXLive are both written in C.
What could improve usability, and open the possibility of a modern UI for TeX/LaTeX, is an incremental TeX engine that only recomputes the changed parts of a document instead of everything from scratch each time.
It was about time those Perl scripts were abandoned.
The fact that it now produces final PDFs without needing multiple passes, and that it can automatically fetch the packages you use, is amazing.
I learned this the hard way.
This command argument is needed if you're planning to use packages such as pygments (which does code highlighting).
There is a further project that runs c2rust over the generated C sources and is also cleaning those sources up; it is linked in the comments above.
Going directly from WEB to Rust is expected to be difficult, and will probably have a lot to learn from the manual conversions above.