Thanks for your hints! I added an 'Open PDF' button to circumvent popup blockers.
Yes, I used emscripten to port it to Javascript.
It was not that hard. Emscripten had three bugs I had to fix (the hardest was to find that the %g format was not supported by emscripten's sscanf).
But it was compiled almost like for x86: first convert the pdftex 'web' source code to C using web2c, then compile it to LLVM bytecode and the LLVM bytecode to JS.
I would like to compile it on my own with emscripten. Did you use emconfigure or did you write your own compilation script? I couldn't find any pointers to that in the Git repo, but maybe I didn't look around long enough. How did you deal with kpathsea?
EDIT: I managed to find the answer to this question myself.
I want to generate a bunch of bytes programmatically, then have the user click on a button, and allow the user to save a file containing the generated bytes. This should run entirely client-side, with no talking to the server (except for static loading of the HTML page, JS files, images, CSS, etc.). I've been wanting to do this since I wrote my first Java applet way back in the 1990s, and haven't been able to find a way; it's a personal, long-standing unscratched itch for me.
Apparently with HTML5 it's possible to do this, since this project does it! I wasn't able to find where in the code the downloading happens, and I'm not sure what this concept is called, which makes searching difficult. Thanks!
EDIT: Browsing commits instead of the source tree was fruitful [1]. What I'm looking for is called Data URI [2]. The inverse operation -- programmatically processing uploads on the client side -- can be achieved with the File API [3]. Now I need a few days to think about what startups will become possible with these capabilities!
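For reference, the upload half via the File API can be sketched roughly like this (a minimal, illustrative sketch; the helper and element names are made up, not from this project):

```javascript
// Hypothetical helper: read a user-selected file entirely in the browser,
// with no round trip to the server.
function readFileAsText(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onload = () => resolve(reader.result);
    reader.onerror = () => reject(reader.error);
    reader.readAsText(file);
  });
}

// Typical wiring (assumes an <input type="file" id="picker"> on the page):
// document.getElementById('picker').addEventListener('change', async (e) => {
//   const text = await readFileAsText(e.target.files[0]);
//   console.log('read', text.length, 'characters client-side');
// });
```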
data: URIs are deprecated for generating content. Use object URLs and Blobs (https://developer.mozilla.org/en-US/docs/DOM/Blob). If you want to force saving the file to disk, use <a download="[filename]">
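A minimal sketch of that approach (function names are illustrative, not from this project; an ms-prefixed fallback for IE10 is included as a guarded branch):

```javascript
// Sketch only: generate bytes client-side and let the user save them.
function makeObjectUrl(bytes, mimeType) {
  const blob = new Blob([bytes], { type: mimeType });
  return URL.createObjectURL(blob);
}

function saveBytes(bytes, filename) {
  // IE10 exposes an ms-prefixed save API instead of honoring <a download>.
  if (typeof navigator !== 'undefined' && navigator.msSaveOrOpenBlob) {
    navigator.msSaveOrOpenBlob(new Blob([bytes]), filename);
    return;
  }
  const a = document.createElement('a');
  a.href = makeObjectUrl(bytes, 'application/octet-stream');
  a.download = filename;       // forces a download instead of navigation
  document.body.appendChild(a);
  a.click();
  document.body.removeChild(a);
  URL.revokeObjectURL(a.href); // release the blob once the click has fired
}
```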
Although, annoyingly, data: URIs are your only option in Safari (including the latest version), and they also happen to crash the PDF viewer. Safari does have createObjectURL, but then doesn't understand the URL it creates!
Just as annoyingly, IE10 has createObjectURL too, but only allows the created URLs to be used for <img>, <audio> and <video>. To save a blob, you have to use something ms-prefixed instead, and this doesn't allow inline PDF viewing.
A much more mature LaTeX JavaScript interpreter is MathJax [1].
The projects fill somewhat different niches, however. MathJax's ideal use case is adding math support to a CMS like a blog or wiki. This project looks like it's better for offering a Web-based LaTeX-to-PDF compilation service for articles with 100% compatibility with the original implementation.
Also, this project is a great resume-builder if the author is looking for a job that involves wrangling build systems or cross-compiling to JavaScript!
I don't like TeX. I use it and I like the output, but I have never really understood the language, and therefore I don't like it. The syntax for optional arguments ([]) seems very odd to me, as well as the separation between the mouth and the rest. The support for named parameters is IMO very hacky.
Wouldn't a Tcl-based macro processor with a TeX backend be nice? Or is this silly?
I agree. I wrote a few internal LaTeX "packages" for my department at the university. I didn't just type some math; they include a lot of strange macros.
The "programming language" is horrible. It's a clear example of a Turing tarpit. For example, you don't have arrays, so you must fake them. You don't have functions, so you have to return the value in a glob@l. Monkey patching is considered an art, but this makes many of the different packages slightly incompatible.
The "printing library" is amazing. If you only want to do a standard thing (and someone else has programmed it), the result is nice.
The other advantage is that everyone knows it, so if you make a package correctly, it is easy to use for mathematicians who can't program in LaTeX.
Yes, the TeX language is pretty horrible (indeed, the whole concept of TeX is a testament to the strength of cannabis available in the Bay Area in the 70s - it's not a typesetting system, it's two separate programming languages which typeset as a side effect!). LuaTeX is an attempt to remedy this situation by embedding lua in the TeX engine: http://www.luatex.org/
(It's also no coincidence that the development of BibTeX coincides with the crack epidemic, but I digress.)
The syntax for optional arguments and the lack of support for named parameters are limitations of LaTeX, not TeX. For example, see ConTeXt, a macro package on top of TeX that has a much more consistent syntax and whose macros all accept named arguments.
TeX (plain TeX) doesn't have support for optional arguments or named parameters (perhaps in some primitives, but those are very different from the optional arguments in LaTeX).
But TeX has an extremely flexible macro system, so it is possible to fake optional arguments. That's what LaTeX does: it defines a standard way to have optional parameters and makes them easy to define and use. It has some strange optional parameters, like in \newtheorem, because they are really fake optional parameters. The starred versions of commands are also fake and have to be defined using a trick. But if you are lucky and don't look under the hood, everything works quite well.
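For instance, LaTeX's standard way of faking an optional argument looks roughly like this (a minimal sketch; the macro name is made up):

```latex
% One optional argument (defaulting to "note") plus one mandatory argument.
% Internally, \newcommand expands this to \@ifnextchar[ trickery.
\newcommand{\remark}[2][note]{\textbf{#1:} #2}

% \remark{check this}       ->  note: check this
% \remark[warning]{careful} ->  warning: careful
```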
I don't know enough about ConTeXt, but I hope that it has a better system for optional arguments.
In some way, this is similar to what happens with some features of high-level languages and assembly. For example, exceptions in C# or Java are translated sooner or later to assembly, but assembly doesn't have exceptions.
Sad to hear that. I don't use Safari, but maybe its JS engine cannot compete with the ones in Firefox and Chrome.
This script requires a lot of computation in JS and communication between the page and the web worker (~300 KB). Maybe Safari is unable to cope with this in a decent amount of time.
I've seen emscripten-generated JS crash Safari before. It's not a performance issue; jsc is plenty fast and beats SpiderMonkey in most benchmarks. However, emscripten seems to exercise certain corners of JS that pose stability problems for jsc.
Which benchmarks are you looking at? And are they actually JSC vs SpiderMonkey or JSC+WebKit vs SpiderMonkey+Gecko? I.e. are they JS benchmarks or DOM benchmarks?
Yeah, on that set of benchmarks (the one that everyone is targeting hence not doing anything obviously stupid on), they are.
The problem is that all of the JS engines involved have various failure modes in which they end up way slower than the others, though... We're talking 10x-1000x slower. And the problem with performance is it only takes your script hitting one serious performance bottleneck to make the speed of the rest of it not really matter. :(
Hence my interest in testcases that point out such performance cliffs, so they can be removed.
1. Hitting "Compile" does whatever it should do successfully.
2. Then hitting "Open PDF" opens a new tab which stalls out and crashes.
I would almost say it's actually a PDF thing that is crashing Safari rather than a JS thing.
At least, that's what has happened consistently on my machine with latest Safari and WebKit nightly.
Edit: However, taking the generated PDF from Chrome and dropping it into Safari loads it fine, so maybe it is a JS thing.
Edit 2: Upon further inspection it appears the new tab is being opened with just a data URL representing the entire PDF. It wouldn't surprise me if that's the problem (Safari not being able to handle huge data as a URL). I recall running into a similar crash in MobileSafari because of that a while ago.
Some background would be nice. Wasn't LaTeX notoriously hard to port – to the point that a LaTeX app for iPad initially ran in an emulator [1]? Is this port based on web2c as well?
Nevertheless it's a good effort as a proof of concept.
I'd also appreciate info on whether LaTeX packages will be supported in the future. At the moment this does not seem to be the case, though I can imagine this would be a challenging problem to solve for in-browser compilation of documents.
LaTeX packages are not supported automatically.
The script has to download the required files in advance, which is currently only done for the font files etc.
In theory one could search the LaTeX code for '\usepackage' strings, download the required files, and mount them into emscripten's virtual filesystem.
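A rough sketch of that idea (the scanning half only; the function name is made up, and the emscripten filesystem call is indicated in a comment, assuming emscripten's FS API):

```javascript
// Hypothetical sketch: collect package names from \usepackage lines so the
// matching .sty files could be fetched and mounted before compilation.
function findPackages(source) {
  const names = [];
  const re = /\\usepackage(?:\[[^\]]*\])?\{([^}]*)\}/g;
  let match;
  while ((match = re.exec(source)) !== null) {
    for (const name of match[1].split(',')) {
      const trimmed = name.trim();
      if (trimmed) names.push(trimmed);
    }
  }
  return names;
}

// With emscripten, each downloaded file could then be placed into the
// virtual filesystem, roughly:
// FS.createDataFile('/texmf', name, data, true, false);
```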
Awesome! I looked at this approach for https://www.writelatex.com (which does the LaTeX editing in the browser but compiles on the server), but I didn't get very far with it. It would be interesting to read more about how it works / obstacles you had to overcome.
Just wanted to comment that this works better in Internet Explorer 10; documents seemed to compile significantly faster than on Chrome. Running Windows 8 x64, for reference.
This part of the document isn't rendered correctly in the resulting pdf:
In printing, text is usually emphasized with an
\emph{italic}
type style.
In my PDF it looks like every other letter, starting with the first, is missing from the word italic. It looks like " t l c ". Maybe something is wrong with the fonts. I'm using the latest Chrome on OS X.
If you want to get around this: when the PDF is ready, give the user something to click on (a button or dialog) and call window.open() from the onClick callback.
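A minimal sketch of this workaround (the function and element names are made up):

```javascript
// Popup blockers generally permit window.open only when it is invoked
// synchronously from a user gesture such as a click.
function wireOpenButton(button, pdfUrl) {
  button.addEventListener('click', () => {
    window.open(pdfUrl, '_blank'); // runs inside the gesture, so not blocked
  });
}

// Typical wiring:
// wireOpenButton(document.getElementById('open-pdf'), pdfObjectUrl);
```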
It would probably be useful just to say something like "script output" - I expected the right pane to display the compiled result from the left pane, but instead saw error messages.
And yeah, expected error messages like the no-/bin should probably be filtered out. But man, great idea!
"Hurray, I'd love to hear more about every single piece of software written in the past 70 years compiled to LLVM bytecode and then translated with Emscripten to JavaScript!"
I would hardly call this porting something.
EDIT: I just remembered I ported Microsoft Word to Linux by running it inside a Windows VM.
> EDIT: I just remembered I ported Microsoft Word to Linux by running it inside a Windows VM.
That analogy is not correct. He isn't running existing binaries in an emulator. He cross-compiled code to a new platform, using a new compiler and toolchain, and made sure everything worked in the entirely new environment. And as he mentions in a comment, he found, reported and fixed some bugs in the compiler and toolchain while doing so.
And remember that the web platform is different from a normal desktop environment. You can't just compile code and expect it to run perfectly (it often does for small apps, but for larger ones it generally doesn't): for example, the main loop has special requirements, threading as well, etc. It's the same as porting a Windows game to the iPhone, for example - there are different APIs, different expectations of how the OS works and what it allows you to do, etc. Such ports take work.