Tectonic: a modernized, self-contained TeX/LaTeX engine (rust-lang.org)
322 points by JoshTriplett on May 31, 2017 | hide | past | favorite | 92 comments



> Has a command-line program that is quiet

But what about my overfull hboxes???


Thanks for the morning chuckle. Never thought I'd see an hbox joke :)



I, for one, have always preferred all of my `figs` at the end of the document anyway. ;)


That, my friend, is badness 10000.


I feel the overfull hboxes error is symbolic of everything that is wrong with LaTeX.


If I remember correctly, an overfull hbox error means that you must rearrange your text to produce a pleasant render. If that's correct, then I'd argue that this error is everything that's right with LaTeX, because I hate with a passion

    d o c u m e n t s        w i t h        h u g e        s p a c e s.


The huge spacing is actually an underfull hbox (i.e. TeX couldn't fill it enough, so had to stretch the spaces). An overfull box is a box where the contained text overflows the box; usually it's a line where the end juts into the margin, which is even uglier than the underfulls, IMO.


He was probably referring to the fact that preventing an overfull box without rearranging text means breaking the line too early, thus creating that excessive spacing. Also note that underfull boxes are actually a subjective thing; you cannot easily draw a line between "OK spacing" and "too wide". People will have different opinions on this; especially if they have no clue about typesetting, they might not even notice. So what (La)TeX does here is quite clever, in that it creates a layout that will make even the complete novice realize something is wrong. Of course, people can still ignore that, but well, if you just don't care what your thesis looks like...


I really wish TeX would prefer underfull to overfull. So many times I've read two column documents where the author ignored the warnings and the columns overlapped, making things pretty hard to read. (Well, actual overlap is rare, but uncomfortably close is common.)


AFAIK, it does prefer underfull boxes to overfull ones, but sometimes the penalties work out such that it has to put in an overfull one anyway (IME, really long words that it doesn't know how to hyphenate correctly are usually the culprit, and it's more frequent in two-column than single-column layouts). You can probably make it even stricter by tweaking the penalties for underfulls and overfulls. The details escape me, but it's probably pretty easy once you find the right part of The TeXbook to consult.
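For what it's worth, the usual knobs for that kind of tweaking look something like the following. These are standard TeX parameters, but the values here are purely illustrative; tune them per document:

    % Loosen line breaking so TeX prefers wider spacing to overfull lines.
    \tolerance=2000          % accept looser lines before declaring failure
    \emergencystretch=1.5em  % extra stretch allowed on a final breaking pass
    \hbadness=5000           % only warn about seriously underfull boxes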


I assume that “quiet” means “not verbose”, not “silent”. Although I haven’t got it running yet, I believe errors will still be written.


I have a python program that does nothing but filter latex output to remove all the cruft. On my latest paper, raw latex spits out 147 lines of nonsense. After my filter, I see about a dozen lines, including all meaningful errors and warnings and a four-line summary like this:

> Number of underfull warnings : 5
> Number of overfull warnings : 0
> Number of undefined references : 0
> Number of missing citations : 0
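The core of such a filter is tiny. Here's a minimal sketch of the idea (the patterns and names are illustrative, not what the actual script uses):

```python
import re
from collections import Counter

def filter_latex_log(lines):
    """Keep only meaningful lines from a LaTeX log and tally warnings."""
    # Lines worth showing: hard errors, warnings, bad boxes.
    keep = re.compile(r"^(!|.*Error|LaTeX Warning|Overfull|Underfull)")
    counts = Counter()
    kept = []
    for line in lines:
        if keep.match(line):
            kept.append(line)
            if line.startswith("Underfull"):
                counts["underfull"] += 1
            elif line.startswith("Overfull"):
                counts["overfull"] += 1
            elif "undefined" in line.lower():
                counts["undefined references"] += 1
    return kept, counts
```

Feed it the raw log, print the kept lines, and append the counts as a summary.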


Do you have a link to said program?


It's just something I tossed together years ago and have been using ever since. Here it is if you are interested:

https://pastebin.com/9H7icgKJ


If this wraps 120k lines of the actual TeX backbone, why write the API layer in Rust at all? Yet another dependency, just to be buzzword compliant?

We've come a long way since the original literate TeX program.


I think the eventual goal is to rewrite the backend in Rust, and to make that available as a library to other applications.


I'm not sure that's feasible... There are exactly zero independent TeX or TeX-like implementations; everything is patches on top of TeX, which probably has a reason (given the general open-source community's inclination to rewrite things). I am by no means an expert on TeX internals, but even cursory browsing of the code makes it look like a rather massive and complex piece of software.


The C/C++ code adds up to about 120,000 lines.

Yes, it is massive and complex.

Fortunately, the core engine is essentially self-contained. Almost all of the things that make TeX frustrating come from the UX, or from the difficulty of adding new capabilities that call for somewhat more modern C code. By compiling all of the C with a Rust wrapper, we get a single binary out in the end, plus the ability to extend the engine in ways that were just not feasible before.

Revamping the core engine code from C to Rust would be nice, because it could be cleaner and more comprehensible, but it actually doesn't buy you anything in terms of the UX.

Edit: also, I didn't realize this when I made the decision to fork XeTeX, but LuaTeX has shifted to a C implementation that is more or less detached from the original WEB code.


Knuth spent all those years getting rid of bugs and this guy wants to bring them back? I didn't get that impression.


Seriously, and even if TeX isn't perfectly secure, who cares? I don't think people compiling foreign LaTeX documents are a big prize for any hacker; there are just so many more easily weaponized attacks.


It isn't security I'd be worried about. I'm sure there is some real wizardry in the formatting algorithms and many bugs related to it fixed over the years. Rewriting the TeX engine would just be a huge effort to almost certainly arrive at an inferior and incompatible clone.


I agree, "safety" is just the most common reason people give for rust rewrites, so I'm assuming that's part of their rationale.


That is why stuff like pandoc exists: a more modern LaTeX.


Pandoc seems to exist only to support moving from one markup language to another. And markup languages seem to proliferate when people decide to trim down the options from what they had been using, only to build them back up to the point that a new one is formed to trim down the options again.

Which is ultimately frustrating if you ever try to go back to an old document and realize you don't have the exact markup engine that you used back then, so you can't actually re-render it.

Now, to be fair, often this is ok. In particular, you might not care about how it rendered before, and a minimal markdown is fine.

However, if you do care about presentation, it is maddening.


There may be more people who want a TeX-style layout engine in an embeddable form, without the TeX macro language. Or a multiple-scale version of it, which would make it easier to avoid widows and orphans, by making it possible to reconsider earlier page breaks. (Once upon a time, large computers really could hold only one typeset page in memory; no longer.)

If you want something 100% compatible with TeX, you already have it, and it's not clear why you'd care about the implementation language.


Have we really come a long way since the original TeX? We certainly found fancier bugs to put in programs. And we evidently gave up on understanding memory usage at large. We certainly have not found a way to make it easier for folks to read programs.

Seriously, even if you are not familiar with Pascal, it is not as unapproachable as could be imagined.


I love *TeX. You want to install the software, and it pulls in half of the galaxy as dependencies. The equivalent of black holes in software.


So, exactly like npm packages then?


Having just cursed whatever bloody build system we now have for a simple 3-page web app that seems to cache half of the bloody internet, I am curious about the complaint leveled against TeX. Especially since vanilla TeX has very few dependencies and you have to explicitly install every style that has ever been created. :)


'Official' website here [0].

This looks great, although I do wonder to what degree the automated downloading of dependencies limits off-line work. I'm not sure how widespread this is, but one of my biggest productivity hacks is writing with the wifi turned off (I go for pencil and paper for first drafts if possible, but editing, figures and references all require the computer).

[0]: https://tectonic-typesetting.github.io/en-US/


> I am exactly of this philosophy so Tectonic is very careful NOT to require ANY network access for builds if all of the needed resource files have been cached locally. And because the I/O backends are pluggable, you can download the "bundle" file containing the install tree and point Tectonic to use it for fully network-free operation.

— The author (https://www.reddit.com/r/rust/comments/6e2x6m/tectonic_a_com...)


Thanks.


It's hard to talk about a "TeX" engine when there is eTeX, XeTeX, and LuaTeX each with their own strengths and weaknesses heading in different directions. FWIW it seems to me that LuaTeX is actually the most promising of the three, not because of Lua but because of its extensibility.


Good point about the existence of various TeX engines. I recently wrote an article for Overleaf on this exact topic, attempting to explain some of the confusing terminology surrounding "TeX" and the evolution/development of various engines. The piece is called "What's in a Name: A Guide to the Many Flavours of TeX". If anyone is interested, here's the link: https://www.overleaf.com/blog/500-whats-in-a-name-a-guide-to...


Why would I use one instead of the other? I use LaTeX often, but normally let my IDE autodetect and deal with all that stuff on its own.


LaTeX generates DVI files, which need an extra conversion step to be generally useful. PDFTeX generates PDFs directly, and is generally preferable (unless you need legacy Postscript stuff).

XeTeX and LuaTeX deal with Unicode properly and can use normal fonts. Both output PDF files directly, like PDFTeX, but have the above advantages over it (and more). LuaTeX, as the name suggests, embeds a Lua interpreter, which enables far more sophisticated and performant programmatic things (at the cost of compatibility). XeTeX can cope with legacy Postscript things like PSTricks (ick!) by generating the Postscript and automatically converting it, but that's a slow process, and I often rewrite PSTricks pictures in TikZ or Asymptote when I have to deal with them.

I generally use XeTeX, or LuaTeX when I'm okay with sacrificing compatibility for programmability.


Is PSTricks only a compatibility concern, or are people still doing new stuff in it? I can sort of imagine it... TeX packages are usually not the primary concern of the document author, so they will keep using what they learned 20 years ago.

When I wrote my diploma thesis in 2010, I was given the thesis template that the whole working group were using. It was a steaming pile of hacks that had accumulated over the years, with the recommended compilation method being latex -> dvips -> ps2pdf, and only using graphics in EPS format (IIRC). And people were still writing `\"{a}` instead of `ä`. I replaced it with a new template for use with pdflatex that took advantage of such new-fangled stuff as inputenc, cmap, pdfx and hyperref, and committed it together with a long rant of a README that started with something along the lines of "When I looked on my watch, it was the fucking 21st century."


Unfortunately, people are still using PSTricks for new stuff! I work with teaching materials at a university, and some of the people who write them first learned LaTeX in the 1980s and haven't kept up with the three decades of improvements since. The obsolete hoop-jumping is kind of amazing, and I've been getting really good at complicated regex find-and-replace.


It's worth noting as well that LuaTeX is the adopted successor to pdfTeX.


Simple examples of cool stuff you can do with luatex: https://lwn.net/Articles/723818/


Is that the right link?

It takes me to a "subscription required" page for an article titled "What's new in gnuplot 5.2"



I just use Overleaf. Gets work done, easy to work with collaborators, does the package-install thing automatically, real-time preview of what you write, has a dumb-person "rich text" mode, etc.

Only downside is it's cloud-based, so I can't be productive on an airplane.

https://www.overleaf.com/


> real time preview of what you write

When I need that with a LaTeX document, I have vim with the TeX sources open on one half on the screen, and mupdf showing the compiled PDF on the other half, and a terminal in the background is running

  while inotifywait *.tex *.bib; do
    sleep 0.2
    pdflatex main && killall -HUP mupdf
  done

The sleep ensures that vim is finished writing the file before pdflatex runs. Sometimes make or latexmk or whatever is substituted for pdflatex. SIGHUP makes mupdf reload the PDF. (If you're using e.g. Okular, it will pick up the changed PDF automatically, so that step can be skipped.)

(My intention is not to belittle cloud solutions, although maybe a little. :) Just to show how you can implement the real-time-preview feature with a very small shell script.)


You can use

    inotifywait -e move_self,close_write *.tex *.bib

and drop the sleep. This also means you will not needlessly recompile the PDF if you re-open the source file in your editor.

Side note: only one of move_self and close_write is actually required, depending on your editor. Some editors (e.g. vim, emacs) will write the modified content to a separate file, then rename it to the real file name (in order to make the write atomic) => move_self. Others will overwrite the file in place (e.g. gedit, nano) => close_write.
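For the curious, the atomic-rename strategy described above is easy to reproduce in a few lines. A generic sketch (a hypothetical helper, not tied to any particular editor):

```python
import os
import tempfile

def atomic_write(path, data):
    """Write `data` to `path` via temp file + rename.

    This is the vim/emacs-style strategy: a watcher sees one
    move_self event instead of a partially written file.
    """
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic on POSIX when on the same filesystem
    except Exception:
        os.unlink(tmp)  # clean up the temp file on failure
        raise
```

Readers of `path` never observe a half-written file, which is exactly why watching for close_write alone misses these editors.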


Or if you're already using vim anyway, you can use a plugin like vimtex (https://github.com/lervag/vimtex) which automates the compiling after saving the file.


My setup is very similar! I used to do the inotify dance as well, but I now find it a lot easier to just use a vim autocmd on BufWritePost. This is only reasonable if your document compiles relatively quickly, but one clear advantage is the immediate feedback when some error happens.

I also have a makefile to intelligently include custom macro files and run multiple passes when needed.

Couldn't be happier.


latexmk -pvc


FYI, you can simply clone an overleaf document using git, and work offline, pushing changes after:

https://www.overleaf.com/blog/195-new-collaborate-online-and...


> Only downside is its cloud based so I can't be productive on an airplane.

Other downside is that in-browser text editing sucks compared to what you can do (e.g.) with vim.


That looks very interesting. Can anyone compare it to ShareLaTeX? I'm using that currently and have run into some issues: sometimes losing the server connection, and it creates unwanted edits in .tex files just from uploading, opening, and downloading them again. So far it always seems to be whitespace that it messes up.

Edit: Also, ShareLaTeX leaves me wanting more editing features. Most commonly, autocompletion for reference tags based on labels in the project. Kile does this perfectly, but it's a hassle to get it running everywhere.


> autocompletion for reference-tags based on labels in project.

Soon ;)


Just started using Overleaf as well. It actually has Git support, so if you want to use an offline editor for a while you can just pull changes to your machine, work on it, and then commit/push it to the cloud.


Maybe use another name https://coreos.com/tectonic/


CoreOS Tectonic is not trademarked and is not unique.

Looks like OP's project started around November. I believe that's around when CoreOS Tectonic was released as well.


As discussed elsewhere in the thread, Tectonic is actually trademarked by CoreOS.


I don't think CoreOS would care about trademark of this. It may just make people confused, that's all.


> powered by XeTeX

Does XeTeX support microtype these days? It is important if you want "best-in-the-world output" to PDF.


Nope, LuaTeX does a better job and even that is limited as compared to pdfTeX. See http://mirrors.ctan.org/macros/latex/contrib/microtype/micro... page 7, Table 1.


Yeah, I used pdfTeX for my thesis in 2012 for this reason. When I use TeX now, I use LuaTeX, as it also gives good results but with easier font management.


Apparently they currently support protrusion for XeTeX. I use XeTeX for its font, language, and Unicode support, but it would be nice to get full microtypography coverage there.


If I read correctly, it does not say "powered by XeTeX" but "derived from a XeTeX implementation".


Right.

The hope is that the easier development process will make it easier to add fancy features like microtypography ... but for me the real prize is HTML output that doesn't suck, so that's going to be the focus of my work.


I copied my quote from the linked page. You are likely right, though.


Sounds like a rewrite, but it's just a wrapper.


It is currently a wrapper, as the first stage of an incremental rewrite.


This is a terrible idea as Tectonic as a software product already exists: https://tectonic.com

It is a registered "computer software" trademark as well:

http://tmsearch.uspto.gov/bin/showfield?f=doc&state=4805:npp...


The fact that CoreOS have registered a trademark for Tectonic in the US (serial number 86560796 is the relevant trademark on the word mark) is significant and sadly does incline me to thinking renaming the software might be a good idea.

(USPTO trademark search doesn’t really use URLs; it just encodes the transient server state in what passes for a URL, so it expires in a very short time. That’s why your link is dead.)


It's a pity they didn't go with "TeXtonic"... The canonical way to pronounce "TeX" is roughly "tech", since the "X" is supposed to be a Greek chi.

https://en.wikipedia.org/wiki/TeX#Pronunciation_and_spelling

(Of course, "textonic" is also the name of another open source project...)


"Terrible idea" is a rather aggressive way to dismiss hours of someone's work because you don't like the name.


Incorrect; the name is fine. The name is also a legally registered trademark of a company with a commercial business around said name. I'm unsure how well you know US trademark law, but the gist is that you lose a trademark that you don't defend.

This means that the name should be changed voluntarily or CoreOS will be obligated to ask them to change it.

    Word Mark	TECTONIC
    Goods and Services	IC 009. US 021 023 026 036 038. G & S: Computer software, namely, software for developing, managing, updating, and maintaining online server instances and clusters hosted by third parties; computer operating system software; computer operating system software for distributed systems. FIRST USE: 20150406. FIRST USE IN COMMERCE: 20150406
    IC 038. US 100 101 104. G & S: Providing access to hosted computer operating systems and computer applications through the Internet; Peer-to- peer network computer services, namely, electronic transmission of audio, video and other data and documents among computers. FIRST USE: 20150406. FIRST USE IN COMMERCE: 20150406
    Standard Characters Claimed	
    Mark Drawing Code	(4) STANDARD CHARACTER MARK
    Serial Number	86560796
    Filing Date	March 11, 2015
    Current Basis	1A
    Original Filing Basis	1B
    Published for Opposition	December 22, 2015
    Registration Number	5013800
    Registration Date	August 2, 2016
    Owner	(REGISTRANT) CoreOS, Inc. CORPORATION DELAWARE 3043 Mission Stret San Francisco CALIFORNIA 94110
    Attorney of Record	Brian R. Coleman
    Type of Mark	TRADEMARK. SERVICE MARK
    Register	PRINCIPAL
    Live/Dead Indicator	LIVE
Courtesy of http://tmsearch.uspto.gov TESS search for Serial 86560796


So it's like latexmk + lazy downloading? How does it compare against latexmk?


I can't speak for latexmk, but Tectonic has, if I may say so myself, some quite sophisticated logic for managing reruns and figuring out which intermediate files can't be written to disk.

That being said, I wouldn't say that the `latexmk`-type functionality is the thing that is going to make Tectonic unique. Hopefully that will be the HTML output, but I have to implement that first ...
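The essence of that rerun logic is a fixed-point loop: recompile until the auxiliary output stops changing. A toy sketch of the generic latexmk-style approach (not Tectonic's actual code):

```python
import hashlib

def run_until_stable(compile_once, max_runs=5):
    # compile_once() stands in for invoking the TeX engine; it returns
    # the current contents of the .aux file. We rerun until the digest
    # of that output repeats, i.e. cross-references have settled.
    prev = None
    for run in range(1, max_runs + 1):
        digest = hashlib.sha256(str(compile_once()).encode()).hexdigest()
        if digest == prev:
            return run  # output stabilized on this run
        prev = digest
    return max_runs  # gave up; the document may be oscillating
```

With a fake compile step whose .aux output settles after the second run, this returns on the third pass, mirroring the familiar "run latex three times" ritual.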


Come on, you could have called it TeXtonic, what a missed opportunity for a pun.


https://tectonic-typesetting.github.io/en-US/ declares in the section “About the name” that this is entirely deliberate. I like the rationale and agree with it too:

> The name of the project is “Tectonic,” spelled and pronounced like a regular word because it is one. Enough with the cutesy obscurantism. In cases where the name might lead to ambiguities, it should be expanded to “Tectonic typesetting.”

> If you’re feeling expansive, you can interpret the name as suggesting a large change in the TeX world. Or you can think of it as suggesting a salubrious offering for weary TeX users. Either way, the root of the word does go back to the ancient Greek τέκτων, ”carpenter,” which Donald Knuth — the creator of TeX and a devout Christian — might appreciate.


I'd love to see something akin to "Weebly for LaTeX". Maybe I'm just too much of a theoretical noob, but I don't quite enjoy feeling like I'm writing code when trying to write english.


Isn't Overleaf [1] or ShareLaTeX [2] kind of already like that?

[1]: https://www.overleaf.com/

[2]: https://www.sharelatex.com/


I fail to see what these three bullet items really bring to the table.

- Downloads resource files from the internet on-the-fly, preventing the need for a massive install tree of TeX files

- Automatically and intelligently loops TeX and BibTeX to produce finished documents repeatably

- Has a command-line program that is quiet and never stops to ask for input

My understanding is that both MiKTeX for Windows and TeX Live for Linux allow users to install packages on the fly. To compile a file with BibTeX, a user just needs to run TeX a couple of times. And finally, TeX will only ask for user input if there is an error.


It failed to process the first two files I tested it on. I really like the idea, but that is not a great start. Being able to switch between TeX engines would be a good option; it would be great if you could do that with a command-line switch.


This might end up being the Rust "killer app"! (Servo aside, of course.)


How is a thin wrapper for a C library a killer app?


It’s already not a thin wrapper around a C library; it has ~7,000 lines of functionality-adding Rust code, and the plan is to steadily move things to Rust.


Is TeX still a big deal? I used it (and LaTeX) for all of my school work 25 years ago but I don't think I've touched it since then. Even back then, most people were using word processing software to create their documents.

Can a variant of an already somewhat fringe piece of software really become a killer app?


TeX is certainly still a big deal, and is not at all fringe. That's my experience from academic Physics and Computer Science. I've heard that different disciplines vary in this regard (e.g. humanities may use more word processing, possibly because they don't need TeX to typeset math).

Anecdotally: when giving a presentation last week, the event organiser was surprised that my slides were HTML (from pandoc; I'm an ex-Web dev), as she'd never seen such a thing. Everyone else used beamer (TeX slideshow package).


One of the most enjoyable moments this past (academic) year was noticing my son's 7th grade homework sheets looked awfully familiar from a typesetting and font perspective. Sure enough, after inquiring, I confirmed that the teacher was using LaTeX to prepare all her schoolwork.

Simply awesome.


That may be true, but the world of academic Physics and Computer Science people is still a relatively small one.

Anecdotally: in the 20+ years I've been out of school I've sat through hundreds of presentations. I don't think any of them were created with TeX (unless TeX is the underlying technology of PowerPoint, Keynote, Prezi, Impress, etc...).


TeX used to have numerous unique features: fairly good fonts; automatic hyphenation, which makes it feasible to lay out text with full justification; support for non-English Latin characters, etc. The rest of the world has caught up with these.

IMO the ways in which it is still special are: (1) good math typography; (2) integration with automated bibliography systems; (3) easily generatable by other tools. (Lots of things that eventually turn into HTML hit #3, but not lots of things that eventually turn into really good-looking PDFs do.)

These points certainly aren't important in all cases ... but for what it's worth, arxiv.org hosts more than a million scientific papers, the vast majority of which are written in TeX.


It's quite important in mathematics and reasonably important in physics. Almost all of my fellow students in my undergraduate physics labs use LaTeX to typeset their reports, and a quick browse of arXiv should show that it's dominant amongst academics. It's one of the few pieces of software that actually typesets mathematics correctly and well. Don't get me wrong: from what I've seen, recent versions of Word have improved hugely and even innovated with OpenType math extensions which have then been adopted by TeX, but TeX still leads the way.

One of the most significant advantages it has over other systems is sensible defaults. Everything from the choice of default fonts down to the margin sizes is chosen to look good out of the box.


Why does it use xdvipdfmx? XeTeX can output PDF as well, can't it?


XeTeX accomplishes PDF output by piping through xdvipdfmx in the background, so it's the same process.


Given the name issues, why not change the name to rustex?



