Implementing a “mini-LaTeX” in ~2000 lines of code (nibblestew.blogspot.com)
93 points by ingve on Aug 4, 2022 | 68 comments



This is fast becoming a pet peeve of mine: people conflating LaTeX with TeX. TeX is the macro language and the compiler, LaTeX is the library.

Replacing TeX is pretty easy since it is a fairly small language. Replacing LaTeX, on the other hand, is very difficult because it contains a rather huge number of packages that have been developed over the decades.


> Replacing TeX is pretty easy since it is a fairly small language.

Conceptually, easy. In practice, it can go sideways. Take JMathTeX, for instance:

http://jmathtex.sourceforge.net/

The code had (has?) a particularly nasty bottleneck. An exception is thrown while parsing for every single character in a macro name until the macro name resolves to a known TeX command. In Java, throwing an exception requires filling out a complete stack trace. I forked the project to fix that issue and then optimized the font-glyph-to-SVG-path conversion algorithm. The TeX library went from non-real-time rendering of a few formulas to real-time rendering of 1,000 formulas.

https://github.com/DaveJarvis/JMathTeX
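
For the curious, the bottleneck pattern looks roughly like this (a hypothetical Python reconstruction for illustration, not JMathTeX's actual code; in Java the cost is far worse because every throw captures a full stack trace):

    # Hypothetical reconstruction of the bottleneck: probing ever-longer
    # prefixes of a macro name, using an exception as control flow.
    COMMANDS = {"frac", "sqrt", "alpha"}    # stand-in for the known commands

    def resolve(name):
        if name not in COMMANDS:
            raise KeyError(name)            # in Java: fills in a stack trace
        return name

    def lookup_slow(src, i):
        # One exception per character of the macro name until it resolves.
        name = ""
        while i < len(src) and src[i].isalpha():
            name += src[i]
            i += 1
            try:
                return resolve(name)
            except KeyError:
                continue                    # try a longer prefix
        raise SyntaxError("unknown macro: " + name)

    def lookup_fast(src, i):
        # The fix: scan the whole letter run, then do a single probe.
        j = i
        while j < len(src) and src[j].isalpha():
            j += 1
        name = src[i:j]
        if name in COMMANDS:
            return name
        raise SyntaxError("unknown macro: " + name)

    print(lookup_slow(r"\sqrt{2}", 1), lookup_fast(r"\sqrt{2}", 1))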

Here's a screenshot showing the JMathTeX fork integrated into my text editor, KeenWrite:

https://github.com/DaveJarvis/keenwrite/blob/master/docs/scr...

See also the build instructions for Knuth's original TeX:

https://tex.stackexchange.com/a/576314/2148


Not quite. TeX is the typesetting language and the compiler that produces DVI or PDF files. It is also a set of macros that together form plain TeX. It is also all of this together, a typesetting system. The name is somewhat overloaded.

LaTeX is not a library. It’s what is called a format in the TeX world; essentially another set of macros that create an alternative to plain TeX. Yes, there are a huge number of packages that are designed to work with LaTeX, but they are not part of LaTeX. Most distributions, such as TeXLive, include TeX, LaTeX, a large number of these packages, fonts, and much else.


But... persistent memory TeX? Where you only change one letter and the entire document doesn't need to be rendered again? Just that letter?

A way to make TeX work more like WordPerfect 5.1. That's what I want.


https://news.ycombinator.com/item?id=19473910

It talks about CSS layout algorithms, but TeX also does layout.

The upshot: what you ask for is very hard. A single-letter change can result in changes throughout the entire document, and figuring out and tracking the dependencies is not easy.


A real-life editor like this would be great… does any exist?


The closest I know is TeXmacs. The name is a bit misleading, as this is not TeX.

http://www.texmacs.org/tmweb/home/welcome.en.html


Important distinction: TeX is an interpreter, not a compiler.


Some old sage: when you're relatively new to computers, you don't understand the difference between an interpreter and a compiler. After a while, you see the difference, and it's substantial and important. After a while longer, you don't see the difference anymore, though for different reasons than at first.


I once built an interpreter for x86_64 to see what my machine code was doing line by line. Then I read the GDB manual.


A not-so-nice thing about GDB is that if you order it to disassemble a program with slightly unusual/wrong ELF headers, it (GDB, not the program) can segfault, which is quite an unreasonable thing for a debugger to do, IMHO.


To make things a bit more clear (hope that's what you meant!):

One concrete angle is that interpreters and compilers are usually intertwined: interpreters commonly define an abstract machine (some language a bit simpler to execute than the full one), so they have a compilation step up front. And compilers do things like constant propagation, inlining, and other flavours of partial evaluation, so they contain some sort of interpreter.
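
A toy example of that first point (invented names, nothing TeX-specific): even a tiny expression interpreter usually compiles its input to a simpler stack-machine form before executing it:

    # Sketch: "interpretation" as compile-to-abstract-machine, then execute.
    import operator

    OPS = {"+": operator.add, "*": operator.mul}
    PREC = {"+": 1, "*": 2}

    def compile_expr(tokens):
        # Shunting-yard: infix tokens -> postfix "bytecode".
        out, ops = [], []
        for t in tokens:
            if t in PREC:
                while ops and PREC[ops[-1]] >= PREC[t]:
                    out.append(ops.pop())
                ops.append(t)
            else:
                out.append(float(t))
        while ops:
            out.append(ops.pop())
        return out

    def execute(bytecode):
        # The abstract machine: a trivial stack interpreter.
        stack = []
        for instr in bytecode:
            if instr in OPS:
                b, a = stack.pop(), stack.pop()
                stack.append(OPS[instr](a, b))
            else:
                stack.append(instr)
        return stack[0]

    print(execute(compile_expr("2 + 3 * 4".split())))   # 14.0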

Another, more abstract, angle is to consider interpreters as compilers into a "trivial" language consisting only of values and side effects; or, the flip side, to consider compilers as interpreters with a non-standard semantics.


I don't think that's either important or correct.

The biggest distinction I'd make between an interpreter and compiler is whether you need it to be present at execution time†, and if you use TeX to output PDF then obviously you don't need TeX to be present to view the PDF.

Correctness aside, here's what you're replying to:

> TeX is the macro language and the compiler, LaTeX is the library.

What exactly here makes any compiler/interpreter distinction important? How is the communication improved by this "important" "distinction"?

†: Yes yes there are so many nitpicky "corrections" to make here that, indeed, any "distinction" can approach meaninglessness. If you're about to reply with that, read the second bit.


> What exactly here makes any compiler/interpreter distinction important?

In my book, a compiler is an algorithm that (if implemented correctly) terminates for every input. An interpreter (of a Turing-complete language) will obviously occasionally not terminate. This distinction is very important when composing software.

Your distinction has the flaw that you assume that "execution" happens only once, while it is perfectly reasonable for an interpreted program to yield another program.


Your comment seems unreasonably aggressive.

The PDF isn't a program; it's a graphics file. You can't execute it, only view part or all of it. That's true of DVI too (see https://www.mn.uio.no/ifi/tjenester/it/hjelp/latex/dvi.pdf ); although it's easy to get confused by the terminology of "operators" and "commands", neither PDF nor DVI includes things like flow control or subroutines. (PDF does permit embedding things like JS, ICC profiles, and TrueType fonts, all of which permit significantly greater computational expressivity, but none of the relevant subformats can be output by LaTeX or TeX. DVI has none of this.)

LaTeX runs in the TeX interpreter, not in the PDF or DVI file. If TeX were a compiler, LaTeX and similar libraries would be present in TeX's output file, and would execute when that output file was used. If you had a runtime error in LaTeX, it would be logged when you tried to view the PDF or DVI file, and things like LaTeX's layout decisions might depend on the PDF or DVI viewer. Because TeX is an interpreter, not a compiler, execution time is when you run TeX, and LaTeX and similar libraries run when you run TeX, and they produce the output file. They are not present in the output file even when the output file is in a Turing-complete language like PostScript.

It is entirely possible to do your text reflow and layout in PostScript, which is how it would work if TeX were compiling your source file and its libraries into PostScript, but that is not how TeX works, because TeX is an interpreter, not a compiler.

It is important to understand this distinction because it bears on questions like what kinds of output files TeX can generate, when and where you get error messages from LaTeX, and what kinds of incompatibilities you may have to troubleshoot when you are using TeX.

See also https://news.ycombinator.com/item?id=32350924.


I'd like to see an effort whose objective is making (La)TeX documents easily editable without re-running the entire (La)TeX stack over and over again. I think we share this objective as something positive. Like having WordPerfect 5.1 with reveal codes being LaTeX, and real-time rendering of the document.

But your nitpick about if it is a compiler or an interpreter is irrelevant IMO.

You want some persistent runtime as the solution to a more responsive TeX workflow, and you think compiler vs. interpreter is the answer.

I will argue it is the total opposite. A compiler would produce an end result, in this case a plain document. An interpreter will have the runtime inside, as all interpreters do. Therefore, the interpreter is the right solution. What we need is an interpreter that stays in memory, with all its memory structures representing the TeX data, and that updates those structures as we type.

The fact that when TeX runs it reads a file, processes its instructions, and then terminates is an argument to classify it as a compiler. An interpreter gives us a prompt and awaits orders.


I haven't talked at all about whether incremental WYSIWYG editing of TeX documents is desirable, nor indeed about incremental WYSIWYG editing of TeX documents at all. Nor did I talk about whether a more responsive TeX workflow was desirable. The only things I said were desirable were (a) being less aggressive and (b) understanding the distinction between interpreters and compilers.

I have no idea why you would believe I agree with you about these objectives. Perhaps you have me confused with somebody else, in addition to being confused about what a compiler is, and about whether TeX has an interactive mode with a prompt, which it does (as, you point out, do many other interpreters).

Your proposed definition of "compiler", which would make grep a "compiler", as well as CPython when run with a .py file as an argument, but not HotSpot, is almost completely unrelated to the standard definitions. I have quoted a selection of them for your edification in https://news.ycombinator.com/item?id=32362579. The standard term for what you are calling a "compiler" is "batch-mode program".


> I haven't talked at all about whether incremental WYSIWYG editing of TeX documents is desirable, nor indeed about incremental WYSIWYG editing of TeX documents at all. Nor did I talk about whether a more responsive TeX workflow was desirable.

In that case, I find this conversation irrelevant to me. The precise definitions are just that, definitions, and are there to serve an objective, they are not the end themselves. Goodbye.


> If TeX were a compiler, LaTeX and similar libraries would be present in TeX's output file.

Would it? If I compile a statically linked program, I can't see any C, or any of the libraries used, in it.


If you can't see the libraries in the statically linked program, you aren't looking very hard. The linker copies library functions into the statically linked program, fixing up relocations. This is easiest to see if you compile with debugging enabled, because objdump -d will list the symbol names. Otherwise you may have to resort to looking for instruction sequences or, with LTO, even dataflow graphs.

It's true that there isn't any C in the output of the C compiler, but that's because C is the language your input program is written in; it isn't the program itself.


> The PDF isn't a program; it's a graphics file. You can't execute it

PDFs and their viewers[1] can do much[2] more than people think.

1. https://twitter.com/LinguaBrowse/status/1057937468564140033

2. https://opensource.adobe.com/dc-acrobat-sdk-docs/


Because I'm working on a DARPA research program where we are sanitizing PDF files, this is addressed at the end of the paragraph you were quoting; you might have considered reading to the end of the second paragraph of my comment before replying:

> (PDF does permit embedding things like JS, ICC profiles, and TrueType fonts, all of which permit significantly greater computational expressivity, but none of the relevant subformats can be output by LaTeX or TeX. DVI has none of this.)


Why is that an important distinction? It's not clear to me that it is.


It is not.

And it is irrelevant, because the objective here is to produce good-looking documents, not to create or run programs.


It is. People get confused all the time when their "compilation" of a TeX "document" doesn't terminate or yields other typical runtime problems. LaTeX is a very leaky abstraction in that regard. For more than basic competence in using TeX, it is important to understand how it works internally and for that it is helpful to have an understanding of what an interpreter is.


I disagree. I don't need to know the compiled opcodes of a Java program to know that I wrote code that tries to perform a division by zero.

What I need is a clear explanation of the issue using the actual source code I wrote.

That's it. Whatever does this job, a compiler, an interpreter, random fairies, quantum AI overlords, it is not relevant.

Clear error messages that show my source code, and explain the issue, that's the answer to fix these typical runtime problems.



Doesn’t TeX emit a file like a compiler? You could say that a PDF reader, say, is an “interpreter,” but TeX itself? I don’t think so.


A compiler generates a file that can be executed to produce the same computation performed by the original source. Since loops and conditionals(*) aren't apparent in TeX's output, but are performed entirely while TeX runs, it is an interpreter.

Regarding the output, TeX is an interpreter that produces its output as a DVI or PDF file rather than as text on a console or showing it in a GUI.

(*) Both those in the source, and those implicit in TeX's typesetting algorithm.


You mentioned loops in the output as a requirement for a compiler, but in that case all compilers that unroll loops are not compilers. Early shader compilers come to mind.


I think it's a little fuzzier than that; compiled shaders can still take input, such as screen positions and uniforms and textures, and execute computations specified by your source program on that input. DVI files can't; they're just passive data, even if each page is described as "a series of commands".

If your input language is a programming language, which TeX is, and the language processor chews on it and spits out some passive data, such as DVI or PDF, that language processor is an interpreter, not a compiler. A compiler translates an input program into a hopefully equivalent output program. By design, there is no way for a DVI file to be equivalent to a TeX file.


Is an XSLT processor an interpreter or a compiler?

I think the terms are actually not that well defined and different people use different definitions.

TeX is a compiler in the sense that it is a black box that takes input and transforms it into a different output. By this loose definition even a png to jpg converter is a "compiler".

TeX is an interpreter in the sense that it takes code as input, executes it, and that computation just happens to produce data as output.


This makes sense, and I even thought it might be a good idea to think of Assembler (the program) as an interpreter rather than a compiler: each line of code can be seen as a command to emit a piece of machine code, allocate space, change state or mode, etc.
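
That reading can be made literal with a toy: a miniature "assembler" whose main loop interprets each source line as a command to emit bytes, reserve space, or change its own state (the directives and encodings here are invented for the sketch, except that 0xF4 really is x86 hlt):

    def assemble(source):
        # Interpret each line as a command acting on assembler state.
        output, origin = bytearray(), 0
        for line in source.splitlines():
            parts = line.split()
            if not parts:
                continue                   # skip blank lines
            op, *args = parts
            if op == "org":                # change state: set load address
                origin = int(args[0], 16)
            elif op == "db":               # emit literal data bytes
                output.extend(int(a, 16) for a in args)
            elif op == "res":              # allocate zero-filled space
                output.extend(b"\x00" * int(args[0]))
            elif op == "hlt":              # emit a one-byte instruction
                output.append(0xF4)
            else:
                raise ValueError("unknown directive: " + op)
        return origin, bytes(output)

    print(assemble("org 7c00\ndb de ad\nres 2\nhlt"))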


Agreed! Conal Elliott wrote a marvelous post in 02009 describing C as a pure functional DSL whose output is code: http://conal.net/blog/posts/the-c-language-is-purely-functio...


The assembler is not an interpreter because it doesn't produce the output of the program it assembles.

However, it is indeed usually called a translator (or, well, an assembler) rather than a compiler, exactly because its algorithms are simpler, and (even if it does things like resolving labels) a human looks at the output as a 1:1 translation of the input.


It produces the output of running the program it assembles under the nonstandard semantics Koshkin is suggesting.

I think assemblers are not called "compilers" purely for historical reasons: they existed for a decade or so before compilers, and initially the "compiler" was more like what we would call a linker (thus the name "compiler").


The XSLT question is very interesting!

Usually XSLT processors interpret XSLT rather than compiling it, making them interpreters rather than compilers. You could write a compiler for XSLT, and that might be a useful thing to do, but I don't know of such a compiler. (There are several such compilers for the similar XQuery, such as XQC.) XSLT is itself a language that might be especially suitable for writing some compilers in, since it includes a built-in parser and pattern matching, although it generally wouldn't be in my top ten list. If you were running such a compiler in an XSLT interpreter, the interpreter process would be acting simultaneously as an interpreter (for your XSLT) and a compiler (because an interpreter carries out the actions of the program it interprets, like a puppet).

But that wouldn't mean that the XSLT interpreter was a compiler; that would mean the XSLT program it interprets was a compiler.

I agree that if you were to define "compiler" as "a [program] that takes input and transforms it into a different output", then TeX would qualify, as would almost all programs encountered in practice, the exceptions being those that didn't take input, didn't produce output, or that produced output that did not depend on their input. However, this is not a definition of "compiler" that has any currency, and it is well-known that you can make any statement true by redefining the terms used in it, even "TeX is a compiler". In addition to redefining "compiler" as you did, for example, we could redefine "TeX" to mean "kragen" and "a compiler" to mean "a person in the process of taking a giant stinky dump".

The usual definition of a compiler, though, is something like "a program that translates programs from one language into another," and TeX clearly does not fit that definition. For example, Wikipedia:

> In computing, a compiler is a computer program that translates computer code written in one programming language (the source language) into another language (the target language). The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower level language (e.g. assembly language, object code, or machine code) to create an executable program.

Waite and Goos (emphasis in original):

> The term compilation denotes the conversion of an algorithm expressed in a human-oriented source language to an equivalent algorithm expressed in a hardware-oriented target language.

Muchnick, Advanced Compiler Design:

> Strictly speaking, compilers are software systems that translate programs written in higher-level languages into equivalent programs in object-code or machine language for execution on a computer.

Grune, Reeuwijk, Bal, Jacobs, and Langendoen, 2e, emphasis in original:

> In its most general form, a compiler is a program that accepts as input a program text in a certain language and produces as output a program text in another language, while preserving the meaning of that text.

Wirth, Compiler Construction, online revised edition, emphasis in original:

> Computers, however, interpret sequences of particular instructions, but not program texts. Therefore, the program text must be translated into a suitable instruction sequence before it can be processed by a computer. This translation can be automated, which implies that it can be formulated as a program itself. The translation program is called a compiler, and the text to be translated is called source text (or sometimes source code).

Mogensen, Introduction to Compiler Design, 2e:

> A compiler translates (or compiles) a program written in a high-level programming language, that is suitable for human programmers, into the low-level machine language that is required by computers.

Some of these definitions are broader than others (for example, many of them would exclude javac), but TeX doesn't fulfill any of them.

There are fuzzy areas between compilation and interpretation! TeX just isn't one of them. For example:

∙ a Forth or Common Lisp compiler can interpret arbitrary code in the source language at compilation time, making the compiler extensible;

∙ a Scheme compiler interprets macros at compilation time;

∙ a C++ compiler interprets templates at compilation time;

∙ as Koshkin points out in https://news.ycombinator.com/item?id=32357461, by redefining the semantics of a programming language you can justify calling any compiler an interpreter, where for example you interpret "/" as meaning "output machine code to divide two quantities" rather than "divide two quantities", but this trick doesn't work to justify calling any interpreter a compiler;

∙ many interpreters interpret some kind of bytecode for efficiency and consequently contain a compiler that compiles the source code into that bytecode;

∙ to a significant extent we can think of the runtime support library for a programming language like C or Visual Basic as "interpreting" the calls into it from the compiled program, which might be in machine code or not, particularly salient examples here being Nuitka or perl -MO=C;

∙ a common formulation of certain optimizations, such as constant folding and loop hoisting, is as abstract interpretation with non-standard semantics; and

∙ the Futamura projections formulate the whole compilation process as the currying of an interpreter with a source-program argument, followed by possible partial evaluation, and this is in fact how PyPy and a few other JIT compilers work.

In an extreme version of these last two cases, we could imagine concatenating an interpreter with the unmodified source code into an executable, like PyInstaller does, and saying that we have "compiled" the source code. In this case you have something that behaves like a compiler from the user perspective (source code in, executable out), but actually doesn't do any translation. It might be reasonable in this case to argue about whether the program is "compiled" or not, but there is still a crystal-clear distinction between the interpreter and the putative compiler: the interpreter is the thing that is bundled into the executable, and the compiler, if any, is the program that concatenates the interpreter, source code, and libraries.


I would still argue that it is not that straightforward: the term compiler, in the context of computer science, was defined during simpler times, and the field has since expanded, making classification challenging.

I might try to unpack the definitions you provided if I have time, but until then, here are some half-baked nuggets on my mind:

- In the context of a Piet program, what would you call a psd2jpg program? It is software that converts an algorithm expressed in a human-oriented source <<language>> (we can liken layers and masks to comments, and Photoshop to the IDE of choice) to an equivalent algorithm expressed in a hardware-oriented target <<language>>. I would say this is more like converting a file from ASCII to UTF-16, but it does fit the definition rather closely.

- In the context of your PyInstaller example, I would say that reflects the non-computing definition of the verb to compile: (1) to put together (documents, selections, or other materials) in one book or work, (2) to make (a book, writing, or the like) of materials from various sources, (3) to gather together. Yet isn't that what we call the linker?

- I believe that in all those definitions the nature of the compiler is irrelevant. It may be a software program, a hardware ASIC or it may even be the job description of a person. The relevant part of the definition is the action "to compile".

- I might even go as far as saying that a programmer in a Waterfall organisation is a single-pass compiler and a programmer in an Agile organisation is an interactive compiler. They take an algorithm expressed in the business language (a higher-level language) and translate it into a programming language (a lower-level language) while preserving the meaning. Maybe I would even liken the various issues in communication and improperly defined specs to undefined behaviour in C.


My longer-winded comment along the same lines is at https://news.ycombinator.com/item?id=32350841.


> You could say that a PDF reader, say, is an “interpreter,”

Yes. PDF is based on PS, which in turn is a programming language. With variables, conditionals, loops and so on. One cannot render PDF without an interpreter of PostScript.

I'm not sure though what the difference is between PDF and PS, what PDF adds and how much it seems like an extension of the PS language. Never bothered to dig into this rabbit hole.


The main difference is that PostScript is a programming language, and PDF is not. PDF is a random-access file format that does not have variables, conditionals, loops, and so on. Most PDF renderers do not contain PostScript interpreters. PDF does borrow some syntax from PostScript, and the PDF imaging model was initially the same as PostScript's, though PDF has added features since then.


Technically, it is just an interpreter.

But it behaves like a compiler: it reads a file, processes and transforms it, writes the result, and exits.

This is part of the problem: it could be more interpreter-like and persist in memory.


It's interesting to look at the early issues of the official TeX publication TUGboat[1]. The first version of TeX, TeX78, written in the SAIL language, was just out, and there was a concerted effort to produce a more portable version in Pascal, TeX82 (IIRC already known under that name during production).

The first issues of TUGboat then contained several reports about early TeX being ported to various architectures. That often meant rewriting the whole program, as basically no other system besides Knuth's "native" time-sharing OS had SAIL. So there was a port to a (Z80-based) Unix, someone rewriting parts in Fortran, etc. If a port was tied to a specific backend/printer, there wasn't even a need to port Metafont.

TeX was relatively small. That often gets lost in contemporary huge LaTeX distributions.

[1]: https://www.tug.org/tugboat/


Lots of nitpickery here about what exactly it means to be TeX, which IMHO is sort of forest-for-the-trees argumentation.

What's interesting here is that it's a blog post on how to do page-level typesetting, which was a subject of great interest and serious research in the 1970s but in the modern world has been mostly forgotten. And that's somewhat unique. Modern developers still occasionally worry about low-level details like instruction architectures or I/O latency or packet dumps. The lessons of our ancestors about algorithm choice are still alive in our culture.

But virtually no one gives any thought to how their text is laid out at a level of sophistication higher than the CSS box model.

And that's kinda nice to see.


I thought one of the reasons Knuth made TeX was because there were no good math typesetters available. I have no idea if this assumption is correct, but one of the main reasons I've resorted to using LaTeX in the past is for the math typesetting. I haven't found any good alternatives, and it seems incredibly complex compared to basic kerning + line breaks and/or justification for regular English-like text layout.

So if this implementation lacks math typesetting, and that was one of the motivating factors behind the original TeX implementations, then I think this code is missing a pretty complex core feature.


Math typesetting was a goal, but it wasn’t really the main goal.

Knuth started TeX because the proofs for TAOCP Volume II were done with early phototypesetting equipment, and the results were terrible. Not just terrible for math, but very poor even for just regular prose. The reduction in quality from the Linotype machines used for Volume I was unacceptable to him, but it would have been far too expensive to revive that obsolete typesetting process for one book, so he decided to write TeX instead. His goal here was to knock the whole typesetting process so far out of the park that he wouldn’t have to deal with this same problem again for Volume III, Volume IVa, Volume IVb, Volume Va, etc, etc.


That was indeed one of Knuth's primary goals. But perhaps it wasn't one of Pakkanen's goals.


From the author's original post, it looks like you're right in saying that was not one of his goals.

This article says:

> Thus we can reasonably say that the code does contain an implementation of a very limited and basic form of LaTeX.

Which is what I'm primarily debating. I feel like a basic LaTeX implementation would at least attempt mathematical typesetting, but that could be debated :)


I think TeX is just used as the generic eponym for typesetters, as runoff, Scribe, Lout, etc. aren't as popular anymore, to put it politely.


In case there is interest in other implementations of TeX: I'm collecting the ones I know here: https://github.com/tex-other and have an answer here: https://tex.stackexchange.com/questions/507846/are-there-any...


> As you can easily tell, line breaks made at the beginning of the chapter affect the potential line breaks you can do later. Sometimes it is worth it to make a locally non-optimal choice at the beginning to get a better line break possibility much later. Evaluating a global metric like this can be potentially slow, which is why interactive programs like LibreOffice do not use this method.
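
For concreteness, the global metric being described is a classic dynamic program. Here is a minimal sketch (squared leftover space as the badness, no hyphenation or penalties, so far simpler than real Knuth-Plass; the greedy first-fit that interactive programs use instead is a single forward pass with no lookback):

    def justify(words, width):
        # best[i]: minimal total badness for words[i:]; brk[i]: index
        # just past the last word of the line starting at i. Badness
        # of a line is its leftover space squared; the last line is
        # free, per the usual convention.
        n = len(words)
        best = [0.0] * (n + 1)
        brk = [n] * (n + 1)
        for i in range(n - 1, -1, -1):
            best[i] = float("inf")
            used = -1                      # no space before the first word
            for j in range(i, n):
                used += len(words[j]) + 1
                if used > width:
                    break
                badness = 0 if j == n - 1 else (width - used) ** 2
                if badness + best[j + 1] < best[i]:
                    best[i], brk[i] = badness + best[j + 1], j + 1
        lines, i = [], 0
        while i < n:
            lines.append(" ".join(words[i:brk[i]]))
            i = brk[i]
        return lines

    text = "the quick brown fox jumps over the lazy dog again and again"
    print("\n".join(justify(text.split(), 17)))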

Is there not a fast "good enough" algorithm for justifying text vs a slow optimal one? Is it really infeasible for HTML to have justified text that looks something like LaTeX? It's sad that justified text still looks bad on the web with no signs of it getting better even with all the computing power we have now.


I once discussed this with a coworker who had worked a lot on various things related to typesetting, and he claimed the main issue with doing good text justification for the web is the lack of good, free hyphenation dictionaries (or rules) for most languages. In order to do full paragraph justification in a way that makes sense and always looks good you need to be able to hyphenate words dynamically (and fairly aggressively).

LaTeX does come with hyphenation libraries for some languages, but web browsers would need wider support. There's also the question of standardization: you would have to ship hyphenation dictionaries for every language, and browsers would need to agree on them so they render pages the same.

And while LaTeX does ship hyphenation libraries for a bunch of languages, if you try to use them for any of the minor languages, you'll find the results are so-so and you need to manually hyphenate words to get decent results (this was my experience a decade ago, at least).

All of the above could be solved, but sadly I suspect it wouldn't be worth the cost for the players that would have to be involved in solving it.


Firefox does indeed come with hyphenation dictionaries for a bunch of languages, in order to support the CSS `hyphens` property:

https://developer.mozilla.org/en-US/docs/Web/CSS/hyphens

If you look at the language support tables at the bottom of the page, though, other browsers don't support nearly as many languages.


The biggest advantage LaTeX has is that the page width stays the same between various versions of your document, so you can improve it on the fly when you notice issues.

The web tries to support every possible page width, so some widths won't look great.


I set up my static site generator to pre-process text and insert soft hyphens (&shy;). That was based on Ruby's Text::Hyphen, which uses the TeX hyphenation patterns.
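
A rough Python equivalent, for anyone wanting to replicate this (a sketch using the pyphen library, which is likewise built on the TeX hyphenation patterns; run it over text nodes, not raw markup, or it will hyphenate tag names too):

    import re
    import pyphen

    SHY = "\u00ad"                          # soft hyphen, i.e. &shy;
    dic = pyphen.Pyphen(lang="en_US")

    def soft_hyphenate(text):
        # Insert soft hyphens into longer words only; short words
        # rarely gain a break point and just bloat the output.
        def fix(m):
            word = m.group(0)
            return dic.inserted(word, hyphen=SHY) if len(word) > 7 else word
        return re.sub(r"[A-Za-z]+", fix, text)

    print(soft_hyphenate("Supercalifragilistic typesetting considerations"))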


On my 2015 Mac, using just a single core, TeX composes the entire 495-page TeXbook in under 0.2 seconds, or about 0.4 milliseconds per page. That includes reading the 1.3Mb input file, evaluating all macros and such, doing all line-breaking and page-breaking, and writing the entire 1.9Mb dvi file.

So, I don't know where the "too slow" meme is coming from. Is there a browser that can open and scroll to the end of an HTML page that's 500 screen-fulls long in 0.2 seconds?


> So, I don't know where the "too slow" meme is coming from

Perhaps the parent commenter is executing `pdftex` on Windows. Like all Windows ports of UNIX/Linux-first software, LaTeX experiences the double whammy of the NT kernel being slow with small files + Windows Defender scanning. NTFS is not at fault, oddly enough: it works fast enough in Linux with the recent in-kernel NTFS3 driver.

I use the same distribution (MiKTeX) on both Windows and Linux, and Linux is easily 10× as fast as Windows when compiling complex files (TikZ graphs and diagrams, `.bib` files, internal hyperlinks, etc).


I'm building through Pandoc and LaTeX, and just the LaTeX part on my modern computer is orders of magnitude slower than your numbers, ~20 ms per page. Is that a normal difference between plain TeX and LaTeX? Or more to do with the output of Pandoc?


See similar discussion from a couple of weeks ago where I too mentioned "about a millisecond a page": https://news.ycombinator.com/item?id=32204910 — yes some overhead comes from LaTeX, and even more from heavyweight macro packages that one may be using. Some people even end up with several orders of magnitude worse than your 20ms per page :/


> > [...] Evaluating a global metric like this can be potentially slow, which is why interactive programs like LibreOffice do not use this method.

I suspect that another reason for a WYSIWYG office suite not to use such global optimisation is to avoid unpredictable changes to layout in response to seemingly unrelated changes by the user. If you get full reflows, with figures moving around and headings flipping between the previous and the next page while you're typing a sentence, users will probably quickly punch a hole in their monitor.

In (La)TeX this is less of an issue because of the delayed compilation, I guess. Also, *TeX users are probably more tolerant of that kind of mess.


According to previous discussions [1], there's none that'd fit the constraints of text rendering in browsers, AIUI (e.g. dynamic programming approaches are considered infeasible on the web, and hyphenation isn't really a thing). But who knows, CSS is incredibly bloated already ...

[1]: https://news.ycombinator.com/item?id=28537923


The HTML standard mandates a shitty layout algorithm with respect to things like floats, but I don't think it actually mandates shitty line-breaking.


nroff is probably a better comparison than LaTeX for this

https://en.wikipedia.org/wiki/Nroff


He's doing something inspired by Knuth-Plass linebreaking, so either Heirloom Troff or Neatroff would be the better comparison.



I can't express how frustrating editing TeX files is. This archeology needs to die.


I’m also frustrated by it. Do you know alternatives that can handle mathematical typesetting?

I looked at Scribble[1] and Rmarkdown[2] but both seem to offer less flexibility for math, or just invoke LaTeX as is.[3]

[1] https://docs.racket-lang.org/scribble/

[2] https://bookdown.org/yihui/rmarkdown/

[3] https://bookdown.org/yihui/rmarkdown/markdown-syntax.html#ma...

Edit: If you're downvoting this, I'm genuinely curious to hear your thoughts.


"The output contains some typographical niceties, so we can reasonably say that the code does contain an implementation of a very limited and basic form of LaTeX" has to be on the more optimistic side of overstatements.

It's roughly akin to "cat can actually display content quite similar to a <pre> block, so it contains a very limited and basic form of a web browser".

It's not entirely wrong. But mostly because it left "wrong" behind long ago and is in a different country now.



