
Tips and tricks to write LaTeX papers in with figures generated in Python - Wookai
https://github.com/Wookai/paper-tips-and-tricks
======
akshayn
> We also recommend to save the command used to generate a figure in the LaTeX
> file

An approach I have adopted recently is Knitr[1], so this layer of indirection
goes away. With knitr, my data goes directly into the paper repository, and
then my Makefile has something like this:

    
    
      %.tex: graphs/%.Rnw
        Rscript -e "library(knitr); knit('$?')"
    

The nice thing is exactly what the authors recommend: it's much easier to
enforce a standard appearance across all the figures, and automatically
incorporate more recent data into the paper as part of the compilation
process.

[1] [https://yihui.name/knitr/](https://yihui.name/knitr/)

~~~
Wookai
Looks awesome, thanks for sharing!

------
abhgh
I'd also add that for figures Inkscape is invaluable [1]. Save as svg once,
and export it as whatever later. I typically export it to PDF (from within
Inkscape) for pdflatex.

While it's typically indispensable for schematics, I often seem to run into the
use case of combining previously generated plots or figures, or adding a
label/text. Since Inkscape can import pngs, this is a breeze with it. I don't
have to go back to the original code to regenerate plots, or fiddle around
with latex to make minor adjustments.

For stuff generated via matplotlib, I'd strongly recommend seaborn as an
additional library [2]. This is a wrapper over matplotlib. It can prettify
plots with just an import and a 'set' command. You can, of course, use it to
plot too, and for stuff doable in matplotlib using the seaborn alternative is
much easier and looks better with little or no work. And they support pandas
dataframes.

[1] [https://inkscape.org/](https://inkscape.org/)

[2] [https://seaborn.pydata.org/](https://seaborn.pydata.org/)

~~~
fourier_mode
The problem with Inkscape is that any slight change to a figure makes the
user go deep into the workflow pipeline to apply it. Using LaTeX packages
like TikZ or PSTricks instead would simplify the workflow and make the
document more maintainable.

~~~
abhgh
I think this is one of those things that depend on your actual workflow,
content etc. I see your point; for me this hasn't been a problem.

------
Jill_the_Pill
Having just completed a dissertation in LaTeX, with figures online in Overleaf
and Dropbox (some of them screenshots), scripts and data spread across two
computers and an external hard drive, desperate last minute plot text changes
right in the pdf, I just have to ask: WHY DIDN'T YOU POST ALL THIS SOONER?

~~~
Wookai
I'm sorry! It has been online for 4 years now, I simply never thought of
sharing...

------
sigurdjs
If you are serious about making beautiful figures in LaTeX, I would seriously
recommend using TikZ and pgfplots. It is quite easy to automatically generate
TikZ code from Python (after all it is supposed to be read and written by
humans) and all aspects of the figure can easily be customized. I have been
quite successful in generating automated reports with pretty and easily
readable figures using TikZ and pgf.

If anyone is interested I have uploaded a sample script for generating
XY-plots from two numpy lists to github. The code is by no means very good,
but I just wanted to share in case anyone wants to try this approach.

[https://github.com/sigurdjs/python-tikz](https://github.com/sigurdjs/python-tikz)
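
For a rough idea of what generating the plot code from Python can look like,
here is a minimal sketch. It is not the linked script: the function name
`xy_to_pgfplots` and its parameters are made up for illustration, and it only
emits a bare pgfplots axis with one `\addplot coordinates` line.

```python
def xy_to_pgfplots(xs, ys, xlabel="x", ylabel="y"):
    """Return a pgfplots axis environment plotting ys against xs.

    Hypothetical helper for illustration; xs and ys are equal-length
    sequences of numbers (plain lists or numpy arrays both work).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    # pgfplots reads coordinates as a whitespace-separated list of (x,y) pairs.
    coords = " ".join(f"({x:g},{y:g})" for x, y in zip(xs, ys))
    lines = [
        r"\begin{tikzpicture}",
        r"  \begin{axis}[xlabel={" + xlabel + "}, ylabel={" + ylabel + "}]",
        r"    \addplot coordinates {" + coords + "};",
        r"  \end{axis}",
        r"\end{tikzpicture}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(xy_to_pgfplots([0, 1, 2], [0.0, 1.0, 4.0], xlabel="$t$", ylabel="$t^2$"))
```

The output can be written to a `.tex` file and pulled into the paper with
`\input`, provided the preamble loads the pgfplots package.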

~~~
programLyrique
And it's also possible to directly load a CSV file with all the data in LaTeX
and plot it with pgf, which makes it possible to keep all the plotting options
in the LaTeX file:

    
    
      \addplot table[x ={Column1}, y ={Column2}] {myData.csv};
    

The issue is that it can take some time for pgf to load the data and do
computations on them, but you can use the external library of tikz so that it
does not compute the plot again (and save it as a pdf for later uses).
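
On the Python side, producing such a file needs nothing beyond the standard
library. A small sketch, assuming the column names from the snippet above; the
helper name `write_plot_data` is made up. One caveat: pgfplots expects
whitespace-separated columns by default, so a comma-separated file needs
`col sep=comma` in the table options.

```python
import csv
import math


def write_plot_data(path, rows, header=("Column1", "Column2")):
    """Write (x, y) rows to a comma-separated file with a header line."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(header)
        writer.writerows(rows)


if __name__ == "__main__":
    # Placeholder data: one period of a sine wave.
    samples = [(i / 10, math.sin(i / 10)) for i in range(63)]
    write_plot_data("myData.csv", samples)
    # Matching line in the LaTeX file:
    #   \addplot table[x={Column1}, y={Column2}, col sep=comma] {myData.csv};
```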

------
jedberg
> When writing LaTeX documents, put one sentence per line in your source file.

An interesting tip, never thought of that! It changes the way you write a bit,
but it does make finding changes easier, finding errors easier, and forces you
to think more about each sentence since you have to hit "enter" at the end of
each one.

~~~
bo1024
It also works much better with version control software (git). Not only does
it help with diffs, as the article mentions, but it makes merging way easier in
case you and your coauthor change two adjacent sentences at the same time.
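
The diff argument is easy to see with a toy example. This sketch uses Python's
`difflib` as a stand-in for what git shows; the sentences and the helper name
`changed_lines` are invented for illustration.

```python
import difflib


def changed_lines(old, new):
    """Return only the added and removed lines of a unified diff."""
    diff = list(difflib.unified_diff(old, new, "before.tex", "after.tex", lineterm=""))
    body = diff[2:]  # skip the "---" / "+++" file header lines
    return [line for line in body if line.startswith(("+", "-"))]


# One sentence per line, as the article recommends.
old = [
    "We collect data from three sites.",
    "The model is trained on two of them.",
    "We evaluate on the third.",
]
new = list(old)
new[1] = "The model is trained on the first two."  # a coauthor edits one sentence

for line in changed_lines(old, new):
    print(line)
```

Only the edited sentence shows up as removed and added; the neighbouring
sentences stay untouched, so a coauthor changing one of them at the same time
would not collide with this edit.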

------
bonoboTP
I find it useful to work with plots in Jupyter notebooks. Use the "%matplotlib
notebook" magic to get interactive plots inline.

Then you can use savefig when it looks good. Then save the code you used into
some file near the LaTeX sources.

~~~
maksimum
I also use this approach.

To standardize appearance I put appearance modifiers in
`notebook_context/__init__.py`, and then in my second jupyter cell

    
    
      from notebook_context import *
      configure_plotting_for_publication()
    

Example notebook_context:
[https://github.com/maksimt/empirical_privacy/blob/master/src...](https://github.com/maksimt/empirical_privacy/blob/master/src/notebook_context/__init__.py)
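
For readers who don't want to click through, a function like that can be as
small as a single `rcParams` update. The sketch below is a guess at the general
shape, not the contents of the linked file; the default sizes are placeholder
values to adapt to the target column width.

```python
import matplotlib


def configure_plotting_for_publication(font_size=9, width_in=3.3, height_in=2.2):
    """Apply one shared set of rcParams so every figure looks the same.

    Illustrative body only; the defaults are assumptions, not values
    taken from the linked repository.
    """
    matplotlib.rcParams.update({
        "font.size": font_size,
        "axes.labelsize": font_size,
        "legend.fontsize": font_size - 1,
        "xtick.labelsize": font_size - 1,
        "ytick.labelsize": font_size - 1,
        "figure.figsize": (width_in, height_in),
        "savefig.bbox": "tight",  # trim surrounding whitespace on save
        "pdf.fonttype": 42,       # embed TrueType fonts instead of Type 3
    })
```

Calling it once near the top of a notebook affects every figure created
afterwards in that session.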

------
mlthoughts2018
I also recommend separating repetitive parts of plot generating code into
template files, such as with mako or jinja2, and then programmatically
generate sequences of plots by first piping the data into the jinja2 template,
and then using insert commands to insert it into a bigger tex document.

I found this helpful when writing a paper where the appendix needed over 35
different tables of regression results, all with the same format but populated
with data from different subpopulations, which would need to be regenerated
(including updated captions, etc.) any time data cleaning or methodology was
changed.
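
A stripped-down version of the idea, using the standard library's
`string.Template` in place of mako or jinja2 so that it runs without extra
packages. The table layout, the helper name `render_table`, and the numbers are
all placeholders.

```python
from string import Template

# $population and $rows are the only placeholders; everything else is LaTeX.
TABLE_TEMPLATE = Template(r"""\begin{table}
  \centering
  \caption{Regression results for $population.}
  \begin{tabular}{lrr}
    Variable & Coefficient & Std. error \\
    \hline
$rows
  \end{tabular}
\end{table}
""")


def render_table(population, results):
    """Fill the template; results is an iterable of (name, coef, std_err)."""
    rows = "\n".join(
        f"    {name} & {coef:.3f} & {err:.3f} \\\\" for name, coef, err in results
    )
    return TABLE_TEMPLATE.substitute(population=population, rows=rows)


if __name__ == "__main__":
    # Made-up numbers, one table per subpopulation.
    fits = {"group A": [("age", 0.5, 0.125)], "group B": [("age", 0.25, 0.0625)]}
    for population, results in fits.items():
        print(render_table(population, results))
```

Each rendered table can be written to its own file and included from the
appendix with `\input`. Note that this sketch does not escape LaTeX special
characters in variable names.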

~~~
Wookai
That's a great point! Templates are a great tool to generate big tables from
results. I usually do that for most of the results in my papers; it makes it
easier to avoid the odd copy/paste error. I might add this to the tips and
tricks, thanks!

------
unwind
Meta: there seems to be an extra "in" in the title, that makes no sense to me,
at least.

Not a native speaker, though.

~~~
naniwaduni
It's a careless transposition of "papers in LaTeX" → "LaTeX papers in" without
removing the "in".

~~~
Wookai
Indeed, sorry about that!

------
euske
Re: figures in EPS. I think SVG is the way to go. It can be generated with
matplotlib or even a simpler script (it's just XML after all). It can be
hand edited. It's viewable with a browser. And it can be converted to PDF with
rsvg-convert.

I personally find matplotlib a bit unintuitive to use, so I made a 100-line
script for generating SVG. It's great.
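
To give a flavour of how little is needed, here is a sketch of a line plot
written straight to SVG. It is not the 100-line script mentioned above; the
function name `polyline_svg` and its parameters are invented, and it draws no
axes or labels.

```python
def polyline_svg(points, width=300, height=200, margin=10):
    """Return an SVG document drawing the (x, y) points as one polyline."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Fall back to 1.0 so constant data does not divide by zero.
    x_span = (max(xs) - min(xs)) or 1.0
    y_span = (max(ys) - min(ys)) or 1.0

    def to_pixels(x, y):
        px = margin + (x - min(xs)) / x_span * (width - 2 * margin)
        # SVG's y axis points down, so flip it.
        py = height - margin - (y - min(ys)) / y_span * (height - 2 * margin)
        return f"{px:.1f},{py:.1f}"

    path = " ".join(to_pixels(x, y) for x, y in points)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">\n'
        f'  <polyline points="{path}" fill="none" stroke="black"/>\n'
        "</svg>\n"
    )


if __name__ == "__main__":
    with open("figure.svg", "w") as handle:
        handle.write(polyline_svg([(x, x * x) for x in range(-5, 6)]))
```

The resulting file can then be converted with something like
`rsvg-convert -f pdf -o figure.pdf figure.svg`.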

------
knolan
This is probably most useful for postgrad students getting started with
writing with TeX.

It’s worth pointing out that the figures are made using the matplotlib
library, which is primarily based on Matlab’s plotting functionality. This is
perhaps just as useful for new researchers as many of them are taught Matlab
exclusively throughout their undergraduate courses.

~~~
p10_user
It’s great for getting started, but if you start really customizing your plots
the object-oriented interface of matplotlib is really the way to go.

------
jonathanpoulter
A minor plug: I've found I generate graphs and tables in Jupyter notebooks, so
I wrote ipynb-tex, to allow you to reference cells from a notebook directly in
your LaTeX documents. This supports tables, and figures.

[https://github.com/poulter7/ipynb-tex](https://github.com/poulter7/ipynb-tex)

------
mychele
I would suggest checking out matplotlib2tikz and matlab2tikz to get
pgfplots/TikZ figures from matplotlib and MATLAB plots.

~~~
Wookai
Indeed, they're pretty cool (although in my experience the resulting TikZ code
sometimes slows down compilation quite a bit).

------
semi-extrinsic
One itch which (curiously) I can't seem to quite scratch in LaTeX is that it
should be possible to say "plot equation \ref{eq:smth} for X in (-4,4)" and
just get the bloody graph. Why should I need to define the equation again in a
separate place, perhaps even in a separate file?

~~~
kccqzy
LaTeX doesn't have enough information about what your notations mean. You can
very well write nonsensical formulas that look pretty in LaTeX but are
absolutely meaningless.

~~~
bingerman
I wish I had read the TeXbook or something similar sooner to gain knowledge
like this. I used LaTeX for years without knowing the basics and I regret that
a lot.

Also, (v)phantom and smash are something I really should have learned before
all those fancy packages. Nowadays I'm mostly using ConTeXt anyway.

------
joseph8th
Any opinion on the utility of Emacs Org-mode to organize and manage LaTeX? In
particular Org Babel?

~~~
p10_user
I’ve written documents in org mode and converted to pdf via LaTeX, but I find
that if the document gets sufficiently complicated with formatting, I have so
many LaTeX blocks in my org file I might as well be writing LaTeX directly.

Maybe I’m doing something wrong. YMMV

~~~
loskutak
That is true, but org-mode really shines when you want to do literate
programming stuff, e.g. have the matplotlib code directly in the orgfile, ...

------
tapia
I already implement most of the points mentioned there. The most useful (and
new) tip for me, however, was the rasterization part. I normally like to have
PDF figures for my LaTeX papers, but last time I had some graphics with some
thousands of points plotted, which took too long to print from Windows (on
Linux there was no problem, which is why I didn't catch it earlier). In the
end I decided to save the plot as PNG, but was not happy about it, haha. It
would have been good to know the rasterization trick earlier.

~~~
Wookai
Indeed, it's pretty useful to be able to rasterize only parts of the plot!
Glad you find it useful!
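
For anyone who skipped that part of the write-up, the trick boils down to
passing `rasterized=True` to the heavy artist only. A minimal sketch with
made-up data; the helper name `save_scatter_pdf` and the figure size are
assumptions.

```python
import io

import matplotlib
matplotlib.use("Agg")  # no display needed
import matplotlib.pyplot as plt


def save_scatter_pdf(xs, ys, target, dpi=300):
    """Save a scatter plot as PDF with only the point cloud rasterized."""
    fig, ax = plt.subplots(figsize=(3.3, 2.2))
    # The points become a bitmap embedded in the PDF...
    points = ax.scatter(xs, ys, s=2, rasterized=True)
    # ...while axes, ticks, and labels stay vector graphics.
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    fig.savefig(target, format="pdf", dpi=dpi)  # dpi sets the bitmap resolution
    plt.close(fig)
    return points


if __name__ == "__main__":
    xs = list(range(5000))
    ys = [(x * 37) % 101 for x in xs]  # placeholder data
    save_scatter_pdf(xs, ys, "scatter.pdf")
```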

------
Wookai
Thanks all for the great feedback and discussion, I'll update this thread once
I push an update. If you're interested, there was a great discussion on
/r/MachineLearning as well:
[https://www.reddit.com/r/MachineLearning/comments/b2oiaj/d_b...](https://www.reddit.com/r/MachineLearning/comments/b2oiaj/d_best_practice_and_tips_tricks_to_write/)

------
stilley2
Thanks for the write-up! Two notes from my experience: pgf output works well
with LaTeX as well (although it will slow down compilation), and I recommend
not using the pyplot submodule, especially if you'll be running things remotely
over ssh and don't have a display.

~~~
alanbernstein
Would you suggest an alternative to pyplot? What problems does it cause for
you?

~~~
stilley2
I had problems using pyplot over ssh because it can assume there's a display
and fail when it can't find one. Maybe this has changed. I use the OO
interface. For example:
[https://matplotlib.org/gallery/api/agg_oo_sgskip.html](https://matplotlib.org/gallery/api/agg_oo_sgskip.html)

~~~
jpeloquin
Changing the plot backend should fix this.

    
    
      import matplotlib
      matplotlib.use('Agg')
      import matplotlib.pyplot as plt
    

[https://stackoverflow.com/questions/2801882/generating-a-png...](https://stackoverflow.com/questions/2801882/generating-a-png-with-matplotlib-when-display-is-undefined)

~~~
stilley2
I believe Agg is only for bitmap output. While there are probably backends
that work with a headless system, I find the OO option much more flexible.

~~~
p10_user
Agg works for all outputs. I use it in combination with OO over ssh all the
time.
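
A sketch of that combination, along the lines of the gallery example linked
earlier in this subthread: no pyplot import, an Agg canvas attached to the
figure, and the output format chosen at save time. The helper name
`render_line` is made up.

```python
import io

from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def render_line(xs, ys, fmt):
    """Render a line plot to bytes in the given format without pyplot."""
    fig = Figure(figsize=(3.3, 2.2))
    FigureCanvasAgg(fig)  # attach a canvas; no GUI or display involved
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(xs, ys)
    buffer = io.BytesIO()
    # matplotlib picks the writer from the format, so vector output
    # (pdf, svg) works here just as well as png.
    fig.savefig(buffer, format=fmt)
    return buffer.getvalue()


if __name__ == "__main__":
    for fmt in ("png", "pdf", "svg"):
        data = render_line([0, 1, 2], [0, 1, 4], fmt)
        print(fmt, len(data), "bytes")
```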

------
billfruit
Is there a better and more comprehensive plotting library than Matplotlib?
Its 3D plots lack polish. It is also kind of verbose and requires a lot of
boilerplate. Its API is sprawling and hard to remember.

~~~
ujuj
Check Seaborn, it may suit your needs.

[https://seaborn.pydata.org/](https://seaborn.pydata.org/)

~~~
dagw
Also Plotly and Bokeh, although both are more targeted towards producing
interactive web based plots rather than print ready plots.

------
musicale
I have one tip for anyone using LaTeX:

Please stop using the awful Computer Modern typeface.

~~~
edgarvaldes
Any typeface in particular that you do recommend?

~~~
wenc
I used Bitstream-Charter [1] for my dissertation. It looks much better than
Computer Modern.

My resume is typeset in Linux Libertine [2] which is used in this
superlatively beautiful and elegant CV template by Dario Taraborelli [3].
Requires xelatex.

[1]
[http://www.tug.dk/FontCatalogue/charterbt/](http://www.tug.dk/FontCatalogue/charterbt/)

[2]
[http://www.tug.dk/FontCatalogue/linuxlibertine/](http://www.tug.dk/FontCatalogue/linuxlibertine/)

[3] [http://nitens.org/taraborelli/cvtex](http://nitens.org/taraborelli/cvtex)

