
Sphinxtr: Creating a Portable PhD Thesis - jterrace
http://jterrace.github.com/sphinxtr/singlehtml/index.html
======
neonkiwi
Unfortunately this (good) product will take at least a generation to make it
to the academic world.

I put my thesis together in HTML using Pandoc, customized to look a bit like a
Tufte publication with margin notes. It had animations, anchor links when
cross-referencing paragraphs, and in my eyes, made sense for a document like a
thesis.

My committee members, on the other hand, were in consensus that a thesis
should be a PDF with page numbers, not some multimedia document with
_hyperlinks._ When I made a PDF[0] and submitted it to the university, someone
even checked it to make sure the page numbering switched from Roman to Arabic
at the correct point as stipulated by the submission guidelines (it didn't,
hence my knowledge of their thoroughness).

Considering that universities today often don't even want a hard copy of a
thesis, this is a world that is tied very strongly to the paper paradigm.

[0] Fortunately Pandoc did this quite nicely, though I did have to make a few
changes to deal with citations. Incidentally, there is no good, cross-platform
way to put an animation in a PDF.
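The Roman-to-Arabic switch that the submission checker looked for is easy to state precisely. Here is a minimal Python sketch. It assumes, hypothetically, that front matter is numbered i, ii, iii, ... starting from the first physical page, and that Arabic numbering restarts at 1 on the first page after the front matter. Guidelines differ on whether the title page counts, so `front_matter_pages` is a parameter rather than a fact about any particular university:

```python
def to_roman(n: int) -> str:
    """Lowercase Roman numeral for 1 <= n <= 3999."""
    if not 1 <= n <= 3999:
        raise ValueError("Roman numerals here cover 1..3999")
    values = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
              (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
              (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in values:
        # Greedily take the largest symbol that still fits.
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def page_label(physical_page: int, front_matter_pages: int) -> str:
    """Printed label for a 1-based physical page.

    Front matter gets i, ii, iii, ...; numbering restarts at 1 (Arabic)
    on the first page after the front matter.
    """
    if physical_page < 1:
        raise ValueError("pages are 1-based")
    if physical_page <= front_matter_pages:
        return to_roman(physical_page)
    return str(physical_page - front_matter_pages)
```

For example, with four pages of front matter, `[page_label(p, 4) for p in range(1, 8)]` gives `['i', 'ii', 'iii', 'iv', '1', '2', '3']`.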

~~~
andreasvc
There's a good reason to require proper page numbers: citations. In general
you can't refer to a particular passage in a huge HTML document. Anchor links
are a technical solution, but that's not how normal citations work.

I think that more generally when a document becomes sufficiently complex it is
better to use a page-oriented, typesetting approach. You'll want to fiddle
with where certain page breaks and figures end up, and if you do that you
might as well stick with one canonical format.

You can view it as the paper paradigm. Alternatively, you can also see it as
sticking to a uniform, canonical format. Getting fluid layouts right is a very
difficult problem, and manually getting the layout right for a fixed format is
simply a much better solution for this purpose.

~~~
jterrace
The HTML format does generate multiple pages for different sections and
chapters, completely configurable. The format above is the singlehtml format
where everything is on one page. I'd much rather have a link to a specific
anchor tag on a page that goes directly to where I want than a citation to a
page number in a 200-page document.

~~~
abbot2
"As shown on pages 105-107" looks much better than "As shown in sections 4.5.4,
5.1 and 5.1.2". Really. You don't have the first option with HTML and/or
sphinx. Oh, and can I have some bold italic text please?

~~~
jterrace
You can _link_ to a section with a URL. No "section xyz" or "page n" needed.

~~~
abbot2
Yes, you can place a link. But the link, unlike the page _range_, points only
to the place, not to the _scope_.

~~~
irollboozers
Why can't you link to divs on a page? When I'm given a page range, I don't
typically first go to some arbitrary page in the middle of a range.
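Sphinx (through docutils) does give each section an id that a URL fragment can point to. The exact id rules belong to docutils. The sketch below uses a hypothetical simplification (lowercase the title, then collapse runs of non-alphanumeric characters into one hyphen) just to show how a section title becomes a linkable `#fragment`:

```python
import re


def section_anchor(title: str) -> str:
    """Approximate a fragment id from a section title.

    Hypothetical simplification of docutils-style ids: lowercase,
    collapse non-alphanumeric runs into a single hyphen, trim hyphens.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


def link_to_section(page_url: str, title: str) -> str:
    """Build a link that jumps straight to a section on a page."""
    return f"{page_url}#{section_anchor(title)}"
```

For example, `link_to_section("http://example.org/thesis.html", "Related Work & Background")` returns `http://example.org/thesis.html#related-work-background`. The host name is a placeholder.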

------
abbot2
As someone who typeset his thesis recently enough to remember all the tiny
tweaks I had to make to get a perfect printout from LaTeX, I would say that
this project has a lot of issues. Some of them are more or less easy to fix,
but some of them will be hard. Consider wrapping a specific paragraph in a
\fussy / \sloppy pair, or hammering a specific float onto a given page, or ...
(this list is really long). Not to mention bibliography tools. I don't claim
that these issues are unsolvable, but the deeper you go into solving them, the
more you will discover that you are basically re-implementing LaTeX.

While the idea of a multi-format thesis, or at least a dual-format one (PDF and
HTML), is very compelling, I doubt there can be a good enough solution for any
thesis that contains more than just text and a negligible amount of
figures/tables/formulae.

~~~
jterrace
Agreed, but for my case, I didn't really care too much about the PDF output.
The printed version of my thesis is going to sit in my university library
untouched forever. I'm much more concerned with having a semantically correct,
searchable, beautiful HTML output.

~~~
abbot2
Why not go the other way round in this case? Typeset your thesis in
semantically correct, searchable, beautiful HTML5, and then just add some
specific CSS to get the hard print copy?

~~~
andreasvc
One big reason is math, which TeX is very good at.

~~~
abbot2
I don't dispute that TeX is good for typesetting a thesis. I doubt that Sphinx,
even with some plug-ins, is good for that. Also don't forget that
reStructuredText has its own problems (not found in HTML and LaTeX). Can I
have an example of a bold italic or italic monospaced phrase in
reStructuredText, please? No, really?

------
frisco
This is what thesis writing procrastination looks like.

~~~
jterrace
To be fair, I only released it after my thesis was done :)

------
Nowyouknow
This is an impressive undertaking, but good lord, what an unfortunate name.

~~~
dljsjr
To be totally fair, a sphincter (I'm assuming this is what you're referring
to) refers to a circular muscle structure in the general case. Not just the
butthole.

There are sphincters in your eyes.

There is a sphincter in your esophagus.

Your body is full of sphincters.

The anal sphincter is just one of many sphincters.

And now you really _do_ know.

~~~
mirkules
That's irrelevant, because the association is ever-present.

As an example, if people lived on Venus, they would be called "Venerials" by
the proper genitive form. Alas, doctors got to it first (Venus, Roman goddess
of love...) so it was changed to "Venusians". (Source: Neil deGrasse Tyson's
podcast, StarTalk Radio).

~~~
crntaylor
_...would be called "Venerials" by the proper genitive form..._

I'm going to keep telling myself that your use of 'genitive' right next to
'Venerials' was entirely conscious and deliberate.

~~~
mirkules
Heh, not at all.

------
jterrace
Other output formats: <http://jterrace.github.com/sphinxtr/>

GitHub repository: <https://github.com/jterrace/sphinxtr>

------
decebalus1
Sorry, but the project name is totally unfortunate.

~~~
jterrace
intentional :)

~~~
troyk
Love the name, laughed my ass off ;)

Disclaimer: I'm immature...

------
adiM
How does this compare to existing text formats that generate multiple outputs,
in particular pandoc (which supports an extended version of Markdown)?

------
andreasvc
The section on typography is actually about formatting. Also, an epigraph is
not the same as a quotation:

    
    
         [...]
         2. (Literature) A citation from some author, or a sentence
            framed for the purpose, placed at the beginning of a work
            or of its separate divisions; a motto.

~~~
jterrace
Pull requests accepted :)

------
muneeb
This is awesome! I can totally see myself using this. We also need better
Makefiles for LaTeX. Changing one line, hitting make, and then watching crap fly
across your screen is not a pleasant experience ...

~~~
jterrace
It's using <http://code.google.com/p/latex-makefile/> which is a really
awesome makefile. It has nice colored output and throws away all the garbage
output.
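To make "throws away all the garbage output" concrete, here is a rough Python line filter. It keeps only TeX errors (lines starting with `!`), LaTeX and package warnings, and overfull/underfull box reports. This is an illustrative heuristic and is not taken from latex-makefile:

```python
import re

# Lines a human usually wants to see from a LaTeX run: TeX errors start
# with "!", plus LaTeX/package warnings and overfull/underfull box reports.
_INTERESTING = re.compile(
    r"^!|LaTeX Warning:|Package \S+ Warning:|^(Overfull|Underfull) \\[hv]box"
)


def filter_latex_output(lines):
    """Return only the lines of LaTeX output worth reading."""
    return [line for line in lines if _INTERESTING.search(line)]
```

In practice you would feed it the lines of the `.log` file or the captured stdout of `pdflatex`.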

~~~
qznc
At least you can build your own scripts around this stuff. Building LaTeX for
me goes like this:

I use vim, change a line, and hit ":w". My git-onNotify [0] script detects a
change and issues "make show". The Makefile uses rubber or latexmk to build a
pdf, then issues "gnome-open $PDF", which opens the new version in my pdf
viewer. If my screen is tiled, the preview on the side just updates.

Essentially, I just save my tex file and wait for the change.

[0] <https://github.com/beza1e1/dot/blob/master/bin/git-onNotify>
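Here is a portable approximation of this save-and-rebuild loop in Python. It polls the file's modification time instead of using inotify-style change notifications. That is a swapped-in technique, chosen because it needs only the standard library. On each change it runs `make show`, as in the setup above. The one-second polling interval is an arbitrary assumption:

```python
import os
import subprocess
import time


def changed(path, last_mtime):
    """Return (did_change, new_mtime) by comparing modification times."""
    mtime = os.stat(path).st_mtime
    return mtime != last_mtime, mtime


def watch_and_build(path, command=("make", "show"), interval=1.0):
    """Poll `path` and run `command` every time it is saved.

    Runs until interrupted with Ctrl-C.
    """
    _, last = changed(path, None)
    while True:
        time.sleep(interval)
        try:
            did_change, last = changed(path, last)
        except FileNotFoundError:
            continue  # editor may be mid-save (write temp file, then rename)
        if did_change:
            # A failed build is reported but does not stop the watcher.
            result = subprocess.run(list(command))
            if result.returncode != 0:
                print(f"build failed with exit code {result.returncode}")
```

Usage would be something like `watch_and_build("thesis.tex")` running in a spare terminal.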

------
pserwylo
Looks nice. Currently, I am quite happy writing my thesis in LyX [0], which
outputs to various formats, including HTML.

I also have a script which periodically converts the source files from LyX to
both HTML and PDF, then dumps them in the webroot of the Apache server running
on my Uni computer. This folder has a .htaccess file which restricts access to
my supervisors and myself using the Uni's LDAP server.

It works a treat for me.

[0] - <http://www.lyx.org/>
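A periodic export-and-publish script of this kind might look roughly like the Python sketch below. Several details are assumptions to check against your LyX version's `--help`: the LyX invocation (`lyx -e FORMAT file`), and the format names and output suffixes in `EXPORT_FORMATS`. The web root is whatever your Apache setup uses. Only the named output files are copied, so an existing LDAP-restricted `.htaccess` in that directory is left alone. The "periodically" part would come from cron or similar:

```python
import shutil
import subprocess
from pathlib import Path

# Assumption: LyX's command-line export ("lyx -e FORMAT file.lyx") with
# these format names and resulting file suffixes; verify for your version.
EXPORT_FORMATS = {"xhtml": ".xhtml", "pdf2": ".pdf"}


def export_all(source: Path) -> list[Path]:
    """Export one LyX file to every format; return the generated files."""
    outputs = []
    for fmt, suffix in EXPORT_FORMATS.items():
        subprocess.run(["lyx", "-e", fmt, str(source)], check=True)
        outputs.append(source.with_suffix(suffix))
    return outputs


def publish(outputs, webroot: Path) -> list[Path]:
    """Copy generated files into the web root, overwriting older copies."""
    webroot.mkdir(parents=True, exist_ok=True)
    published = []
    for path in outputs:
        target = webroot / Path(path).name
        shutil.copy2(path, target)
        published.append(target)
    return published
```

A cron job would then call `publish(export_all(Path("thesis.lyx")), Path("/path/to/webroot"))`, where both paths are placeholders.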

------
pmr_
I wrote my Bachelor's thesis in org-mode and was incredibly happy with it.
There is certainly a difference in requirements for a PhD thesis, but I
cannot think of any missing feature or hard-to-overcome problem right away.

~~~
drothlis
What did you use to convert to pdf / html?

~~~
pmr_
org-mode comes with HTML/pdf/DocBook export and you can extend the exporter
with your own formats.

~~~
drothlis
Did you find it easy to extend with your own formats? Or to modify an existing
format, say html, to produce slightly different output?

I usually use perl/sed/awk + markdown for generating html from my own made-up
mini-formats. I'd love to keep the sources in org-mode instead, but I wonder
whether elisp is the right language for the type of text munging I want to do.
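For a sense of what this kind of line-oriented munging looks like outside perl/sed/awk, here is a Python sketch. It handles a hypothetical mini-format, invented for illustration, with one `term :: definition` entry per line. It emits an HTML definition list and escapes special characters:

```python
import html
import re

# Hypothetical mini-format: one "term :: definition" entry per line.
ENTRY = re.compile(r"^\s*(?P<term>.+?)\s*::\s*(?P<definition>.+?)\s*$")


def glossary_to_html(text: str) -> str:
    """Convert "term :: definition" lines into an HTML <dl>; skip other lines."""
    items = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:
            items.append(
                f"  <dt>{html.escape(match['term'])}</dt>\n"
                f"  <dd>{html.escape(match['definition'])}</dd>"
            )
    return "<dl>\n" + "\n".join(items) + "\n</dl>"
```

Lines without `::` are ignored, so glossary entries can sit alongside ordinary text in the same file.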

~~~
pmr_
elisp isn't really good at text munging (that sounds counterintuitive at
first, given it is the extension language of an editor) and works much better
if your data is structured as s-expressions. AFAIK there used to be a project to
build a modern parser for org-mode files, but I couldn't tell you how much
this has progressed or how usable it is. So, if your export process contains
heavy text-munging I'd avoid it.

Modifying the HTML output (adding classes etc.) was fairly OK, I haven't tried
anything crazy though.

------
lrem
Looks cool. But I guess doing things like using IEEE classes is out of the
question? And does it really lack a way of typesetting inline math?

~~~
jterrace
It has inline math. I just forgot to add an example for it. If by IEEE style,
you mean bibtex, then yes, you can easily swap the bibtex format.

~~~
lrem
I mean writing things meant for publishing in IEEE conferences, using the
IEEEtran class.

Another concern is: is it practical to include figures drawn with TikZ? I find
it the easiest way to lay out many things, but it effectively means LaTeX
lock-in.

------
kghose
Also relevant: <https://github.com/coolwanglu/pdf2htmlEX>

~~~
jterrace
This is a nice project, but different goals. I don't want my HTML output to
look like a PDF. I want it to look like a nicely formatted web document

------
Heliosmaster
I don't get it: what's the problem with writing good handcrafted (no WYSIWYG
editors) LaTeX?!

~~~
qznc
It seems out of place in a web-dominated world. Compare Bret Victor's
"learnable programming" [0] or "Scientific Communication As Sequential Art"
[1], which includes animated or interactive things.

[0] <http://worrydream.com/LearnableProgramming/>

[1] <http://worrydream.com/ScientificCommunicationAsSequentialArt/>

------
freeslave
Here's an Eclipse-based editor specifically built for writing and managing your
thesis: <http://www.chapterlab.com/>

------
_hobgoblin_
I read that as Sphincter...

------
cmccabe
If the tex2html tool is poor, why not improve that tool? Re-inventing TeX
seems like quite a waste of time.

Every time someone invents a new markup format for absolutely no reason, I die
a little bit inside.

~~~
andreasvc
TeX and reStructuredText have very different goals. The first lets you specify
every little typographical detail for a fixed page-oriented output. The latter
focuses on a minimal set of semantic markup which can be used for different
presentation media.

~~~
coliveira
What you said is just the difference between TeX and LaTeX. If you want just
semantic meaning, use LaTeX without lower level TeX commands, then use an
automated tool to convert it to html or other formats.

~~~
andreasvc
No it's not. LaTeX is still page-oriented. You can convert it to HTML and
other formats but not as well as multi-format markup specifically designed for
that purpose.

~~~
cmccabe
The whole point of LaTeX and TeX is not to worry about the exact format, but
to worry about the semantics of the content instead, and let the layout engine
decide for you. Like all automation, this generally works very well, except
when it doesn't. When it doesn't, the solution is to give it a little manual
shove in the right direction, not to rewrite everything in the markup language
flavor of the week.

Incidentally, semantics, not presentation, was originally the point behind
HTML, but it got warped and twisted over time into a kinda-sorta
presentation-oriented language.

reStructuredText, on the other hand, was originally developed to create
documentation for Python programs. It might be good for that purpose (wouldn't
know; haven't used it.) But it's certainly not good for typesetting
mathematics, research papers, academic quotations and so forth. Hence the
large amount of wheel reinvention going on here. It's a little bit like
writing your research paper using JavaDoc comments. Sphincter indeed.

~~~
jterrace
You've never used ReST but you somehow know it's not good for typesetting
research papers?

~~~
cmccabe
I used it a little bit back when I still wrote Python. It seemed a lot like
HTML, but much terser, and whitespace-sensitive. ReST could do the same kind
of stuff as HTML: build bullet lists, create tables, italic text, and so
forth. However, since I had already learned HTML, learning another language
that did the same thing felt like a waste of time. I fully believe that
Restructured Text is a better markup language than HTML in some ways; however,
I simply don't care because the differences are minor, and HTML is so much
more powerful.

On the other hand, TeX was developed by Donald Knuth, a guy who spent his
entire life doing research and writing papers about it. It has excellent math
support, and is a true semantic language. I've written a few papers in TeX and
been very happy with it.

Anyway, if reStructuredText were good at typesetting research papers, there
would be no need for this project, would there?

~~~
jterrace
You're confusing the language with the build system. LaTeX wouldn't be very
useful without the awesome build system. This project is a build system for
producing both high-quality HTML and high-quality PDF (through latex) with a
single, high-level ReST markup language. It also _uses_ latex formulas for
math.

