I ported LaTeX to Javascript (manuels.github.com)
250 points by manuels_ on Jan 19, 2013 | 70 comments



Thanks for your hints! I added an 'Open PDF' button to circumvent popup blockers.

Yes, I used emscripten to port it to Javascript. It was not that hard. Emscripten had three bugs I had to fix (the hardest was to find that the %g format was not supported by emscripten's sscanf).

But it was compiled almost like for x86: first convert the pdftex 'web' source code to C using web2c, then compile it to LLVM bytecode, and then compile the LLVM bytecode to JS.


I would like to compile it on my own with emscripten. Did you use emconfigure or did you write your own compilation script? Couldn't find any pointers to that in the Git, but maybe I didn't look around long enough to find it. How did you deal with kpathsea?


This is awesome. Or witchcraft.

Top, top work. I love it. Now to try with my monster list of packages...


EDIT: I managed to find the answer to this question myself.

I want to generate a bunch of bytes programmatically, then have the user click on a button, and allow the user to save a file containing the generated bytes. This should run entirely client-side, with no talking to the server (except for static loading of the HTML page, JS files, images, css, etc). I've been wanting to do this since I wrote my first Java applet way back in the 1990's, and haven't been able to find a way; it's a personal long-standing unscratched itch for me.

Apparently with HTML5 it's possible to do this, since this project does it! I wasn't able to find where in the code the downloading happens, and I'm not sure what this concept is called, which makes searching difficult. Thanks!

EDIT: Browsing commits instead of the source tree was fruitful [1]. What I'm looking for is called Data URI [2]. The inverse operation -- programmatically processing uploads on the client side -- can be achieved with the File API [3]. Now I need a few days to think about what startups will become possible with these capabilities!

[1] https://github.com/manuels/texlive.js/commit/b7b7eef27846473...

[2] http://en.wikipedia.org/wiki/Data_URI_scheme

[3] https://developer.mozilla.org/en-US/docs/Using_files_from_we...
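A minimal sketch of the Data URI idea described above: generated bytes are base64-encoded and wrapped in a `data:` URL that a link or `window.open()` can point at. The function names `toBase64` and `bytesToDataUri` are hypothetical and not taken from texlive.js. Browsers also ship a built-in `btoa`; the encoder is written out by hand here only to keep the sketch self-contained.

```typescript
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ" + "abcdefghijklmnopqrstuvwxyz" + "0123456789+/";

// Encode raw bytes as base64, three input bytes per four output characters.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const hasSecond = i + 1 < bytes.length;
    const hasThird = i + 2 < bytes.length;
    const b0 = bytes[i];
    const b1 = hasSecond ? bytes[i + 1] : 0;
    const b2 = hasThird ? bytes[i + 2] : 0;
    out += BASE64_ALPHABET.charAt(b0 >> 2);
    out += BASE64_ALPHABET.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
    // Missing input bytes are represented by '=' padding.
    out += hasSecond ? BASE64_ALPHABET.charAt(((b1 & 0x0f) << 2) | (b2 >> 6)) : "=";
    out += hasThird ? BASE64_ALPHABET.charAt(b2 & 0x3f) : "=";
  }
  return out;
}

// Wrap the bytes in a data: URL with the given MIME type.
function bytesToDataUri(bytes: Uint8Array, mimeType: string): string {
  return "data:" + mimeType + ";base64," + toBase64(bytes);
}
```

For a PDF the MIME type would be `application/pdf`. The whole file ends up inside the URL, so the URL grows with the document.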


data: URIs are deprecated for generating content. Use object URLs and Blobs (https://developer.mozilla.org/en-US/docs/DOM/Blob). If you want to force saving the file to a disk use <a download="[filename]">
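A rough sketch of the Blob and object URL approach mentioned above, assuming a browser that supports `URL.createObjectURL` and the `download` attribute (replies in this thread report that Safari and IE10 do not handle this fully). The helper name `offerDownload` and the small interfaces are hypothetical; they exist so the sketch does not depend on DOM typings and can be exercised with stand-ins.

```typescript
// Minimal structural types, so the sketch type-checks without DOM typings.
interface AnchorLike {
  href: string;
  download: string;
  click(): void;
}

interface ObjectUrlApi<B> {
  createObjectURL(blob: B): string;
  revokeObjectURL(url: string): void;
}

// Triggers a "save as" for `blob` and returns a function that releases the
// object URL; call it once the download has had time to start.
function offerDownload<B>(
  blob: B,
  filename: string,
  urls: ObjectUrlApi<B>,
  createAnchor: () => AnchorLike
): () => void {
  const url = urls.createObjectURL(blob);
  const anchor = createAnchor();
  anchor.href = url;
  anchor.download = filename; // corresponds to <a download="[filename]">
  anchor.click();
  return () => urls.revokeObjectURL(url);
}

// Assumed browser wiring (pdfBytes is a hypothetical Uint8Array):
//   const blob = new Blob([pdfBytes], { type: "application/pdf" });
//   const release = offerDownload(blob, "document.pdf", URL,
//     () => document.createElement("a"));
//   setTimeout(release, 1000);
```

Some browsers may require the anchor to be attached to the document before `click()` has any effect, so treat the wiring in the trailing comment as a starting point rather than a tested recipe.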


Although, annoyingly, data: URIs are your only option in Safari (including the latest version), and they also happen to crash the PDF viewer. Safari does have createObjectURL, but then doesn't understand the URL it creates!

Just as annoyingly, IE10 has createObjectURL too, but only allows the created URLs to be used for <img>, <audio> and <video>. To save a blob, you have to use something ms-prefixed instead, and this doesn't allow inline PDF viewing.


I suppose that pdf.js could be used for inline PDF viewing, but at the moment the Firefox 18.0 version idles at ~20% CPU.


Yeah. Interestingly I got all the way to the download with Chrome 25 Beta (for Android) but then it couldn't handle the data URI.


A much more mature LaTeX JavaScript interpreter is MathJax [1].

The projects fill somewhat different niches, however. MathJax's ideal use case is adding math support to a CMS like a blog or wiki. This project looks like it's better for offering a Web-based LaTeX-to-PDF compilation service for articles with 100% compatibility with the original implementation.

Also, this project is a great resume-builder if the author is looking for a job that involves wrangling build systems or cross-compiling to JavaScript!

[1] http://mathjax.org


I added support for packages (you might have to refresh your browser's cache).

The first package supported is geometry. If you want to add another, append the required files to the supported_packages array.

This is what it looks like for the geometry package: https://github.com/manuels/texlive.js/blob/master/website/ma...


Something completely off-topic:

I don't like TeX. I use it and I like the output, but I have never really understood the language, and therefore I don't like it. The syntax for optional arguments ([]) seems very odd to me, as well as the separation between the mouth and the rest. The support for named parameters is IMO very hacky.

Wouldn't a Tcl-based macro processor with the TeX backend be nice? Or is this silly?


I agree. I programmed a few internal LaTeX "packages" for my department at the university. I didn't just type some math; the packages include a lot of strange macros.

The "programming language" is horrible. It's a clear example of a Turing tarpit. For example, you don't have arrays, so you must fake them. You don't have functions, so you have to return the value in a glob@l. Monkey patching is considered an art, but this makes many of the different packages slightly incompatible.

The "printing library" is amazing. If you only want to do a standard thing (and someone else had programmed it) the result is nice.

The other advantage is that everyone knows it, so if you make a package correctly it is easy to use by mathematicians that can't program in LaTeX.


Yes, the TeX language is pretty horrible (indeed, the whole concept of TeX is a testament to the strength of cannabis available in the Bay Area in the 70s - it's not a typesetting system, it's two separate programming languages which typeset as a side effect!). LuaTeX is an attempt to remedy this situation by embedding lua in the TeX engine: http://www.luatex.org/

(It's also no coincidence that the development of BibTeX coincides with the crack epidemic, but I digress.)


The syntax for optional arguments and the lack of support for named parameters are limitations of LaTeX, not TeX. For example, see ConTeXt, a macro package on top of TeX that has a much more consistent syntax and whose macros all accept named arguments.


TeX (PlainTeX) doesn't have support for optional arguments or named parameters (perhaps in some primitives, but they are very different from the optional arguments in LaTeX).

But TeX has an extremely flexible macro system, so it is possible to fake optional arguments. That's what LaTeX does. LaTeX defines a standard way to have optional parameters and makes it easy to define and use them. It has some strange optional parameters like in \newtheorem, because they are really fake optional parameters. The starred versions of commands are also fake, and have to be defined using a trick. But if you are lucky and don't look under the hood, everything works quite well.

I don't know enough about ConTeXt, but I hope that it has a better system for optional arguments.

In some way, this is similar to what happens with some features of high level languages and assembler. For example, the exceptions in C# or Java are translated sooner or later to assembler, but assembler doesn't have exceptions.


Nice, but this constantly crashes Safari when the content goes below the fold. The whole page goes white and then the browser dies.


Sad to hear that. I don't use Safari, but maybe it cannot compete with the JS engines of Firefox and Chrome's version of the Webkit engine.

This script requires a lot of computation in JS and communication between the website and the webworker (~300kb). Maybe Safari is unable to cope with this in a decent amount of time.


I've seen emscripten-generated JS crash Safari before. It's not a performance issue; jsc is plenty fast and beats SpiderMonkey in most benchmarks. However, emscripten seems to exercise certain corners of JS that pose stability problems for jsc.


Which benchmarks are you looking at? And are they actually JSC vs SpiderMonkey or JSC+WebKit vs SpiderMonkey+Gecko? I.e. are they JS benchmarks or DOM benchmarks?


"Beats" might be too strong a word :). I guess the picture from AWFY 64-bit is that both have strengths and weaknesses but overall they're comparable.


Yeah, on that set of benchmarks (the one that everyone is targeting hence not doing anything obviously stupid on), they are.

The problem is that all of the JS engines involved have various failure modes in which they end up way slower than the others, though... We're talking 10x-1000x slower. And the problem with performance is it only takes your script hitting one serious performance bottleneck to make the speed of the rest of it not really matter. :(

Hence my interest in testcases that point out such performance cliffs, so they can be removed.


Anyone want to test a webkit nightly? I don't have a desktop machine handy right now to test on :(

Or even just a bug report at bugs.webkit.org

Thanks :)


Just tried it, crashes in the exact same way:

1. Hitting "Compile" does whatever it should do successfully.
2. Then hitting "Open PDF" opens a new tab, which stalls out and crashes.

I would almost say it's actually a PDF thing that is crashing Safari rather than a JS thing.

At least, that's what has happened consistently on my machine with latest Safari and WebKit nightly.

Edit: However, taking the generated PDF from Chrome and dropping it into Safari loads it fine, so maybe it is a JS thing.

Edit 2: Upon further inspection it appears the new tab is being opened with just a data URL representing the entire PDF. It wouldn't surprise me if that's the problem (Safari not being able to handle huge data as a URL). I recall running into a similar crash in MobileSafari because of that a while ago.


Yup, that is for sure it. I took the Chrome generated PDF, turned it into a data URL, then copy-pasted the data URL into Safari's URL bar. Crash.

Drag and dropping the file itself into Safari, works fine.


Please file a bug, so they can fix it.


Nice debugging, not a fan of making blanket assumptions when something isn't working as expected.


It crashes my Opera also. And this is a funny comment, Safari actually beats Firefox in most tests.


Strange, seems to work fine on iOS.


Some background would be nice. Wasn't LaTeX notoriously hard to port, to the point that a LaTeX app for iPad actually ran in an emulator initially [1]? Is this port based on web2c as well?

[1] http://www.litchie.com/?p=419


This looks like emscripten to me


Nevertheless it's a good effort as a proof of concept.

I'd also appreciate info on whether LaTeX packages will be supported in the future. At the moment this does not seem to be the case, though I can imagine this would be a challenging problem to solve for in-browser compilation of documents.


LaTeX packages are not supported automatically. The script has to download the required files in advance, which is currently only done for the font file etc.

In theory one could search the LaTeX code for '\usepackage' strings, download the required files, and mount them into the virtual filesystem of emscripten.

See https://github.com/manuels/texlive.js/blob/master/website/ma...
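As a sketch of the scanning step suggested above (not how texlive.js actually does it), the following function collects package names from `\usepackage` lines. The name `findPackages` is hypothetical, and the regular expression is a simplification: it handles one optional `[...]` argument and comma-separated names, and skips text after an unescaped `%`, but it does not understand verbatim environments or packages loaded indirectly by other packages. Downloading the files and mounting them into the virtual filesystem is left out.

```typescript
// Return the distinct package names requested via \usepackage, in order of
// first appearance.
function findPackages(source: string): string[] {
  const found: string[] = [];
  for (const rawLine of source.split("\n")) {
    // Drop TeX comments: everything from an unescaped '%' to end of line.
    const line = rawLine.replace(/(^|[^\\])%.*$/, "$1");
    const pattern = /\\usepackage\s*(?:\[[^\]]*\])?\s*\{([^}]*)\}/g;
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(line)) !== null) {
      // A single \usepackage may list several packages separated by commas.
      for (const part of match[1].split(",")) {
        const pkg = part.trim();
        if (pkg !== "" && found.indexOf(pkg) === -1) {
          found.push(pkg);
        }
      }
    }
  }
  return found;
}
```

The result could then be compared against the list of packages the site already knows how to fetch.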


Promising. I am getting some error messages though.

    lstat(/bin) failed ...
    /bin
    :
     
    No such file or directory
    warning: kpathsea: configuration file texmf.cnf not found in these directories:


These errors are safe to ignore. You have to wait a few seconds until compilation has completed.

Note that the browser might prevent the website from opening the PDF (Pop-up blocking)


Yup, had to wait a bit, then enable popups from GitHub [Iceweasel 10.0.12, Debian Wheezy].

Would a download button appearing when the pdf has compiled be harder to do?

Nice demo


Aha okay. You might want to clean that up anyway. :)


OOOOooh.


Awesome! I looked at this approach for https://www.writelatex.com (which does the LaTeX editing in the browser but compiles on the server), but I didn't get very far with it. It would be interesting to read more about how it works / obstacles you had to overcome.


Just wanted to comment that this works better in Internet Explorer 10; documents seemed to compile significantly faster than on Chrome. Running Windows 8 x64, for reference.


Really?! I didn't expect that! (I don't run Windows, and so I didn't try it in IE.) Good to know, thanks!


But the PDF in the data-URL does not seem to open up on IE10.


It seems to run roughly 30% faster on IE10. Interesting.


Wow, I really would like to know how this was achieved...

Note that it is somewhat broken for me: http://imgur.com/twIG75m


Wow is indeed the best way to describe this project!

From what I see, the main work horse is this 250k "binary": https://raw.github.com/manuels/pdftex.js/master/release/pdft...

This webworker script in combination with a .fmt (latex format file) generates the PDF. All in JS. @manuels__ this is awesome!


Thanks ivan_ah! I really didn't expect so much positive response!

Yes, the main work is done in the webworker (~250kb) + the tex format file latex.fmt (~700kb)


This part of the document isn't rendered correctly in the resulting pdf:

    In printing, text is usually emphasized with an
       \emph{italic}  
    type style.
In my PDF, every other letter of the word "italic", starting with the first, is missing: it looks like " t l c ". Maybe something is wrong with the fonts. I'm using the latest Chrome on OSX.


Yes, the font files for italic fonts are currently missing (see https://github.com/manuels/texlive.js/issues/1).

I have not figured out which file is missing yet, but as soon as I find it, I will include it and fix this.


Brilliant! This opens the door to collaborative online generation of LaTeX files.


This is awesome. Care to share some details on the implementation?


Pointlessly awesome. Care to describe why it would make sense to run LaTeX on JS?


In order to generate PDFs on the client. I'm sure there are a ton of web services that would love to do this.


Offline rendering of LaTeX in the browser without needing to download a full installation of LaTeX! I think that's pretty cool.


Running LaTeX on Chrome OS? Frankly, I'm not sure.


Could certainly come in handy if I ever need to quickly recompile my resume and I don't have access to a system with LaTeX installed.


Because: he could


Fun demo, but I can't get the .pdf back right now!


Probably the browser blocks new windows from popping up (window.open() function)


If you want to get around this: When the PDF is ready, give the user something to click on (a button or dialog) and call the window.open() from the onClick callback.


This is awesome but I also had a problem with viewing the file in the browser. What worked was opening it in a tab and then saving the page as a pdf.

Firefox 18.0


Does this presume a local Unix filesystem? It sure doesn't like the lack of a /bin directory on Windows.


No, emscripten emulates a virtual unix filesystem. That's why you see this error on a Windows machine.


It would probably be useful just to say something like "script output" - I expected the right pane to display the compiled result from the left pane, but instead saw error messages.

And yeah, expected error messages like the no-/bin should probably be filtered out. But man, great idea!
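One way the filtering suggested above could look, as a sketch only: drop output lines that match a short list of messages the author has described as safe to ignore. The name `filterExpectedNoise` and the pattern list are assumptions based on the two messages quoted earlier in this thread, not on the project's code.

```typescript
// Patterns for messages the thread describes as expected and safe to ignore.
const EXPECTED_NOISE: RegExp[] = [
  /^lstat\(.*\) failed/,
  /^warning: kpathsea: configuration file texmf\.cnf not found/,
];

// Keep only the lines that do not match any of the known-harmless patterns.
function filterExpectedNoise(
  outputLines: string[],
  noise: RegExp[] = EXPECTED_NOISE
): string[] {
  return outputLines.filter(
    (line) => !noise.some((pattern) => pattern.test(line.trim()))
  );
}
```

Matching whole known messages rather than hiding anything that looks like an error keeps real problems, such as genuine TeX errors, visible to the user.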


I didn't quite get what is being done in Javascript...? Is it the whole typesetting/rendering?


License of the toolchain?


"Hurray, I'd love to hear more about every single software written in the past 70 years compiled to LLVM bytecode and then translated with Emscripten to JavaScript!"

I would hardly call this porting something.

EDIT: I just remembered I ported Microsoft Word to Linux by running it inside a Windows VM.


> EDIT: I just remembered I ported Microsoft Word to Linux by running it inside a Windows VM.

That analogy is not correct. He isn't running existing binaries in an emulator. He cross-compiled code to a new platform, using a new compiler and toolchain, and made sure everything worked in the entirely new environment. And as he mentions in a comment, he found, reported and fixed some bugs in the compiler and toolchain while doing so.

And remember that the web platform is different from a normal desktop environment. You can't just compile code and expect it to run perfectly (it often does for small apps, but generally not for larger ones). For example, the main loop has special requirements, as does threading. It's the same as porting a Windows game to the iPhone: there are different APIs, different expectations of how the OS works and what it allows you to do, etc. Such ports take work.

So your dismissal of his work is very unfair.


That seems excessively cynical to give someone who is sharing something they're obviously quite proud of.


I don't mean to offend, but I still fail to understand the excitement around this.


It's OK, you don't have to be excited about things others are excited about. You also don't have to rain on their parade.


If I had an email signature, I'd set it to this.



