
Give me back my sanity: How I had to install Pandoc in a CentOS Docker container - grownseed
https://gist.github.com/grownseed/4fd2e91eca829cc039de
======
peterwwillis
Congratulations! You are now closer to being Enlightened.

You now have a closer understanding of the hundreds of thousands of man-hours
that distribution maintainers spend doing this _every single day_ just to give
you a working application. Keep in mind, of course, that what you just
performed will not replicate properly: it doesn't use a spec file, doesn't
record its build deps, run deps, or permissions, has no uninstall or upgrade,
doesn't use standard paths, probably skips a whole bunch of correct procedure
and bug fixes, uses some real-time dynamic package managing/installing
thingie, yadda yadda....

It's not that software development procedures are broken; actually, they're
really, really well defined for most of the Open Source world. It's just that
most humans have no clue how much work goes into giving them some 'basic'
products. Kind of like how most people have no idea how much of a
multinational effort goes into producing a bottle of Coke.

Could said tool be written with no dependencies? Sure. But that would be a
total nightmare in and of itself. At the end of the day, code reuse and layers
of abstraction are useful... as long as they're designed and used properly.

~~~
VLM
The funny thing is that these maintainers do this all the time, so they work
faster, while he's apparently figuring it all out the hard way for the first
time; by the time he's done, the maintainers have likely already uploaded.
Probably. Or at least they could have.

~~~
peterwwillis
It's usually not that simple. If you already have passed the learning curve of
knowing intimately how to maintain packages for your distro, it's a "simple"
matter of updating the old .spec files with the newer versions and sending the
distro maintainers your newer .spec. But there's no guarantee they'll
incorporate your new package in a timely manner (and it would go into an
-UNSTABLE tree anyway, which his CentOS 6 install isn't using).

------
debacle
This is just a case study in why using CentOS for anything is like amputating
your left leg - if you're going to do it, you better have a damned good
reason, and no matter how justified you are, it's still going to suck.

But I don't disagree with the author's pain point: computing is often far
harder than it needs to be.

~~~
serve_yay
Yeah I'm not sure I understand the choice of OS here unless it was required
for some reason.

~~~
missblit
I have CentOS 7 installed on my desktop computer!

There are dozens of us! Dozens! But I honestly don't have any major
complaints.

1. Packages aren't THAT out of date (Firefox is the latest ESR, GCC is 4.8)

2. Things tend to be nice and stable

3. I sometimes end up needing to compile some stuff from source even on
faster-moving distros anyway

~~~
zorked
That is because CentOS 7 was just released. Let's talk again four years from
now.

~~~
clarkm
Yeah, every time I ssh into a box and see "terminal type not supported"
(screen-256color), I know it's running CentOS 5.x.

Which means ssh doesn't have netcat mode, rsync can't write to log files or
show progress properly, and all my tmux bindings are messed up.

------
captainmuon
Sometimes, the easiest thing is to just compile a binary elsewhere, and drop
it into the system. Especially when you are dealing with CentOS (RHEL,
Scientific Linux, ...).

"Linux"* is usually forward-compatible, meaning that a binary compiled with an
old system will run on a newer one. Maybe you have to bring all your own
libraries with you, and place them on LD_LIBRARY_PATH, but you can get it to
work. I can't count the times I had to extract .so files from RPMs I found on
the net. Feels a bit dirty, but surprisingly works.
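
That extraction workflow looks roughly like this; the package filename is
hypothetical, and `rpm2cpio`/`cpio` unpack the payload without touching the
RPM database:

```shell
# Unpack an RPM's payload into a private directory without installing it.
mkdir -p "$HOME/bundled" && cd "$HOME/bundled"
rpm2cpio /tmp/libfoo-1.2-3.el6.x86_64.rpm | cpio -idmv

# Point the dynamic loader at the extracted libraries, then run the app.
export LD_LIBRARY_PATH="$HOME/bundled/usr/lib64:$LD_LIBRARY_PATH"
"$HOME/myapp/bin/myapp"
```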

Actually, at one point I compiled my own glibc (simultaneously with GCC), and
then the full GTK stack. I played a bit with RPATH so the binaries looked for
libraries at relative, not fixed paths. Then I could just drop the whole usr
tree somewhere, source a script (setting a few variables) and I was able to
run current software, including current versions of Skype, Firefox, and
LibreOffice, on my ancient OS** at work. There are binaries for the latter
two; there is no way I would compile those, too.

It's a bit unfortunate that there is no "culture" of using binaries when
necessary on Linux. If people were more aware of the use cases and details,
the experience would be a lot smoother. For example, creators of binaries
could compile against older system libraries to get maximum compatibility. Or,
you could have bundles of libraries (distribution-agnostic, and not installed
in /usr) to run new apps on an old OS. Ironically, Windows has the
distribution of third-party binaries figured out much better... although I do
not envy Windows for the installation situation there, and in 90% of cases
prefer the package managers of the Linux world.

(* And here, I mean Linux distributions, not just the kernel. So
Linux+glibc+GNU+bash+X11+related stuff, but maybe a few of those are swapped
out for alternatives. Just a reasonably GNU/Linux-Like Posixy System, but not
Cygwin or OS X.)

(** Which I was stuck with for reasons out of my control.)

~~~
fragmede
> The reason that there is no "culture" of using binaries when necessary

I don't understand what you mean.

Outside of Gentoo, most distros are based on binaries - I don't need to
install GCC in order to install programs via 'apt-get' or 'yum install', nor
do I need GCC in order to run them.

--

Assuming you mean 'culture of not using the built-in package manager', here's
what I think:

While I've definitely done "scp a.out $other_system:" that only works for the
most trivial of programs. After you accumulate a few more files, it becomes
"scp foo.tgz $other_system:" because there're some library files/whatever to
be included.

But then it turns out some of those files need to get put in the right place,
so the next step is a rudimentary runme.sh script that sets up the system.

Hang on, what's the last version we 'installed'? Just have runme.sh stick
version info in /etc/foo.conf.

Congratulations, you've got a rudimentary packaging system!

The reason there's no culture of avoiding the package manager is because it's
there to help you so you don't have to manually hunt down rpm files to extract
.so files, among other things. (If I never have to use rpmfind.net again it'll
be too soon.)

------
zwischenzug
Have a look at this:

[http://ianmiell.github.io/shutit/](http://ianmiell.github.io/shutit/)

which is a means of capturing this complexity for automation purposes.

[https://github.com/ianmiell/shutit](https://github.com/ianmiell/shutit)

I'm working on a "shutitdist" to compile "everything" from source:

[https://github.com/ianmiell/shutitdist](https://github.com/ianmiell/shutitdist)

ShutIt can (among many other things) produce depgraphs like this:

[https://raw.githubusercontent.com/ianmiell/shutitdist/master...](https://raw.githubusercontent.com/ianmiell/shutitdist/master/docs/depgraph.png)

------
echion
Thank you for documenting this so others don't have to go through the pain.
The best thanks people could give would be to file bugs for each of your
annoyances that are still relevant. Now that the problem is actually a bit
more reproducible, the bugs should be relatively productive uses of the
maintainers' time.

------
RickHull
This is about the impedance mismatch between building everything from source
and using distro package management. Mixing and matching is painful, as package
managers correlate a lot of package metadata in order to provide smooth
integration and interdependencies of multiple packages.

Building everything from source "by hand" quickly runs into scaling problems,
and admins start to roll their own "package databases" in order to cope.
Fully-baked package management becomes the solution. The problem here is that
distro packages are not the latest development snapshots (most often for good
reasons, presumably). Still, there are times when newer-than-distro-repo
software is required.

The author's Pandoc case _may_ be one of those, but I doubt it. RHEL (and
CentOS) packages should never (by definition) include showstopper bugs. It's
more likely that bugfixes have been backported to the older, stable version.
Still, it's possible that the author is correct, and the official package does
not meet requirements or is otherwise unusable.

In that case, the best path forward is to build or reuse _an updated
package_, rather than installing from development sources "by hand", e.g.
`make install`. It's likely that someone else has already had this problem and
has built an updated package, which you could then install, but let's assume
not.

Now we're into the wonderful world of package building. We can avoid some of
the author's pain here, but mostly there is a tradeoff for having to learn and
deal with the packaging pipeline. The upshot is that package building may be
scripted and synched with development sources, and you are keeping the
metadata inside the packaging system where it belongs and can be managed
(think: future updates).
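
Concretely, on an RPM system that pipeline often boils down to rebuilding a
newer source RPM locally. The SRPM filename below is a placeholder (a real one
could come from a Fedora or EPEL mirror):

```shell
# Rebuild a newer source package against the running distro; the result is
# a normal binary RPM that yum can install, upgrade, and remove.
yum install -y rpm-build yum-utils
yum-builddep -y pandoc-1.13.1-1.src.rpm      # install the recorded BuildRequires
rpmbuild --rebuild pandoc-1.13.1-1.src.rpm   # output lands in ~/rpmbuild/RPMS/
yum localinstall -y ~/rpmbuild/RPMS/x86_64/pandoc-*.rpm
```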

~~~
bunderbunder
I can't help but compare this to OS X, where most applications are bundled
together with any dependencies that you can't safely assume will exist because
they aren't included in the OS.

Yes, it does waste some disk space. But the $5 worth of disk space that this
approach costs me is more than justified by the $5,000 worth of gray hairs it
saves me.

My only lament is that things still fall back to the annoying old-fashioned
way of doing it whenever I need to install software that's more Unix than OS
X. Which, given I'm a programmer, is still pretty much all of it.

~~~
CUViper
Sure, and Windows is much the same wrt bundling. The tradeoff is that now
you've multiplied the responsibility of library updates across everyone that
bundled it. You probably won't care if simple library version updates are
missed, if the app itself doesn't care. You might care more about bugs that
require every app to update, especially if it's a security bug.

~~~
Meekro
Sure there's a tradeoff, but for most people it's really rare to use software
that isn't being actively maintained. So let the maintainers update their
dependencies when there's a security flaw, the auto-update will pull in the
changes, and I'll still benefit from OS X's 30-second install process.

------
city41
I've been there many, many times and have often wanted to live on an island
with no technology too. But to offer the opposite perspective ...

Yesterday I set up Docker and Fig to provide me with Postgres and MySQL
containers so I could run the integration tests for a small SQL-oriented
module I wrote. Other than waiting for the images to download, it was an
absolute breeze. In a matter of moments and a simple config file, I get two
databases at the drop of a hat, and they're not mucking with my main box at
all. Sometimes, I really love software.

~~~
Gigablah
Setting up a private Docker registry though, especially with the (recommended)
setup of an nginx proxy for basic auth and SSL... that's a complete nightmare.
And I still can't get it to work =/

~~~
city41
Maybe by design so you pay for Docker's private hosting? :)

------
taylorbuley
Providing feedback in the right places can work wonders for both the
individual developer and her community. For anyone struggling with CentOS
Docker sanity I suggest reaching out to the team with any questions on a
related Github repo. [https://github.com/CentOS/sig-cloud-instance-
images/issues/](https://github.com/CentOS/sig-cloud-instance-images/issues/)

I recently did such a thing [https://github.com/CentOS/sig-cloud-instance-
images/issues/2...](https://github.com/CentOS/sig-cloud-instance-
images/issues/2#issuecomment-58259012) In addition to a resolution, here's the
caliber of (free, unpaid) support I received:

> ... let us know what other containers would be useful to you? ... Finding
> out what's actually useful to folks helps immensely.

------
zobzu
2 things:

- distro developers put a lot of work into making distros easy to use, which
is why `yum install pandoc` is simple and easy. We often take this for
granted.

- when people create new apps, they very, very often think they can do better
than others and suffer from NIH/the +1 standard. Hence multiple libs, multiple
package managers, and 203939 megs of bloat. Programs that are proud of `make
install` vs `make + make install` generally have this behavior, obviously.

In this case I'd recommend testing with a recent version of Fedora; it's
pretty close to CentOS but much newer. Of course, that means you'll have to go
through distro updates more often.

------
shared4you
I had the exact same problem on an Ubuntu server at work. Since I don't have
root access, I was forced to compile from source. I gave up when I couldn't
compile Cabal.

My solution was MultiMarkdown 4 [1]. It supports many of the basic Pandoc
features I need (tables, etc.) and exports to HTML and PDF. Since it is
written in C, compilation was straightforward. Ten minutes later, my document
was ready. Couldn't have been happier.

[1]:
[https://github.com/fletcher/MultiMarkdown-4](https://github.com/fletcher/MultiMarkdown-4)

------
seagreen
Wildly the best thing I've found so far on dependency management:
[http://nixos.org/docs/papers.html](http://nixos.org/docs/papers.html)

~~~
imsofuture
Yep, I just switched from Ubuntu to NixOS over the weekend. It solves these
problems completely.

~~~
huyegn
Wow, I just read through the documentation for the Nix package manager and
it's a breath of fresh air...Source based builds but with the convenience of
pre-compiled binaries hashed with all the input parameters used to create
those binaries:

[https://nixos.org/nix](https://nixos.org/nix)

Has anyone tried using this for development?

~~~
cstrahan
Yes, and it has been great. I recently had to diagnose a problem in some
software and its dependencies, and I needed a quick way to try different build
flags and dependency versions. Nix's deterministic builds (and having access
to all of the package definitions in one location) were a huge help here: it
only took a couple of _seconds_ of reconfiguring between builds, and I quickly
isolated the responsible library/version out of a couple dozen deps.

------
pilif
In cases where I need later versions of applications than what comes with my
distro (Ubuntu 12.04 ATM), using the original source packages is very helpful,
as they already correctly list all the dependencies and will install cleanly
into the existing system.

So you grab the source package, update the version of the upstream sources,
update the changelog, and build an updated package, with as many original OS
dependencies as possible.

I don't know about RPM sources, but for debs, this is what I had the most
success with.
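
With debs, that workflow is roughly the following; the package name and
versions are placeholders, and `uupdate` comes from the devscripts package:

```shell
# Start from the distro's own source package...
apt-get source foo                 # fetches the .dsc, orig tarball, and debian/
sudo apt-get build-dep foo         # install the recorded build dependencies

# ...graft its packaging onto a newer upstream release, then build.
cd foo-1.0
uupdate -v 1.2 ../foo-1.2.tar.gz   # merges debian/ into the new upstream tree
cd ../foo-1.2
dpkg-buildpackage -us -uc          # produces an installable ../foo_1.2-*.deb
```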

~~~
bmn_
This. The dumbass in the article doesn't know about packaging and made his
life needlessly difficult.

------
kijin
CentOS is seriously in need of a better way to find and install programs not
in the base OS.

Currently, if you want to install something that isn't in the official repos,
you either need to compile from source or download a bunch of RPMs from
random websites that may or may not be legitimate. (What is this, Windows?)
None of the third-party repos have a decent search function, either, so it's
difficult to tell whether they have the version I need. Heck, even the
official repo isn't really searchable on the web; you need to use yum on the
command line.

Example: Using only a web browser, find out the latest version of OpenSSL in
CentOS 6.5 and the list of its dependencies.

Compare this with Ubuntu, where nearly everything you could ever want is
organized into easily searchable PPAs, and looking up a package on the web is
as easy as going to packages.ubuntu.com/package-name. No manual downloading of
packages from random websites. No need to dig into mailing lists to find out
which version of bash actually fixes shellshock.

This is not a matter of rpm vs deb, or yum vs apt. This is not a matter of
stable vs cutting-edge. This is a matter of the general ecosystem being
totally haphazard, fragmented, and user-unfriendly.

------
chubot
This was similar to my experience with Haskell and Ubuntu. The Haskell
ecosystem seems to have both severe version compatibility issues and issues
with transitive dependencies ballooning out of control.

I mean, this is a tool that processes text right? Not sure why there should be
hundreds of megabytes of dependencies at any step.

Actually I was expecting a rant about Docker... it didn't really have anything
to do with Docker, as far as I can tell.

------
pnathan
Hum. As someone who's fooled around here... this is the long way around for
pandoc.

The 'proper' way would be to get the latest Haskell Platform in an RPM (I
think?), and then use that to install pandoc.

As a rule, anything I depend on being up to date is compiled from source,
using the latest tag. The distro maintainers are years out of date, always.
Sometimes this matters... usually it doesn't.

------
gnufied
I am not claiming this is going to be less painful than what the author
experienced, but on Debian/Ubuntu, `apt-get build-dep` can sometimes be very
useful for installing the source dependencies of a package when you are
compiling something newer by hand (such as Emacs 24.4: `apt-get build-dep
emacs`, then compile Emacs).
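
For the Emacs example, the whole dance would look something like this (the
tarball URL is the usual GNU mirror location, but verify it before relying on
it):

```shell
# Pull in everything the distro needed to build its own emacs package,
# then compile a newer release by hand.
sudo apt-get build-dep emacs
wget https://ftp.gnu.org/gnu/emacs/emacs-24.4.tar.gz
tar xzf emacs-24.4.tar.gz
cd emacs-24.4
./configure && make
sudo make install
```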

On another note, I like the additional PPAs some of the more popular open
source projects provide, such as Node.js, MongoDB, etc. These definitely help
in installing _the latest_ version of the required software. However,
maintaining a PPA for a lone-developer project can be a chore (although much
of that can be automated via Launchpad).

------
tedchs
In general, if you're going to run CentOS or RHEL, you need to either be
ready to operate within their versions of packages, or be ready for a lot of
pain. Unfortunately the tradeoff between "stability" and "new" is pretty harsh
with these.
these. It is my hope that Docker alleviates this somewhat, and I think it was
really smart for the author to install this "big bag of source compiled
craziness" under a container instead of polluting the host OS environment.

------
alexbecker
> I would like to extend my gratitude to Google, StackOverflow, GitHub issues
> but mostly, the people who make the beer I drink

This should really just be my email signature.

------
chomp
I've had very similar problems building in-house Openstack RPMs for CentOS 6
(now using Juno.)

You might have better luck in the future just grabbing the el7 stuff (or
Fedora 20) and building it using the CentOS spec file:

[http://pkgs.fedoraproject.org/cgit/pandoc.git/tree/?h=epel7](http://pkgs.fedoraproject.org/cgit/pandoc.git/tree/?h=epel7)

------
mwcampbell
Somebody should contribute a Dockerfile to Pandoc upstream, so jgm can provide
an official Pandoc Docker image (ideally as an automated build done by Docker
Hub). Now, let the bikeshedding begin on which distro to use as the base. I
vote Debian Wheezy, if it has a recent enough Haskell Platform.

------
jph
On CentOS it can be hard to stay current because yum tends to use older
builds. I haven't found an ideal solution for all cases; what I do is build
current versions from source, in `/opt/$program/$version/`.

I also use static linking when possible; this helps make the binary copyable
to other similar servers. I also use Docker and/or Ansible and/or
scripted installs when possible.

This overall approach does mean being committed to updating dependencies and
rebuilding, which is especially important for security updates e.g.
shellshock. It also means specifying environment paths so you get the versions
you want, rather than the default CentOS yum versions. YMMV.
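
A sketch of that layout with placeholder names; the per-version prefix keeps
builds side by side, and PATH selects the active one:

```shell
# Build from source into a versioned /opt prefix so several versions
# of the same program can coexist.
./configure --prefix=/opt/foo/2.1.0
make && make install

# Select a version explicitly (e.g. from a profile script):
export PATH="/opt/foo/2.1.0/bin:$PATH"
command -v foo   # resolves to /opt/foo/2.1.0/bin/foo
```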

------
edwintorok
If the problem is that the version of pandoc in CentOS 6 is too old, then why
not try to use a CentOS 7 container as a base? In the end aren't containers
about choosing whichever OS image is better suited for a task?

~~~
teh_klev
Environment restrictions matter here. If you're running on the latest Citrix
XenServer (v6.2, as we do; as a business we're heavily committed to
XenServer), then you're stuck at CentOS 6.4 (or Ubuntu 12.04 and Debian 7.0).

Try to run anything newer in any sort of mission-critical role, and if it
screws up, you'll be told it's unsupported.

I'm keen on being able to support Docker, but CentOS 6.5 is the earliest
version with out-of-the-box support, and CentOS 6.5 won't be supported by
Citrix until the next XenServer release.

But for us, as a smallish hoster targeting businesses with managed services,
we value long-term support and stability, even if it means having to wait a
bit for the fun stuff to arrive.

~~~
edwintorok
Isn't the Docker container all userspace with no kernel components? What is
the difference between running a Docker container for pandoc with CentOS 7, or
running a Docker container with CentOS 6.4 and manually built pandoc? Isn't
pandoc equally unsupported in both cases? In fact, as the OP mentions, the
pandoc in CentOS 6 was broken, so if it is supported, why wasn't a solution
provided via the support channels?

~~~
teh_klev
_> Isn't the Docker container all userspace with no kernel components?_

Yes but you still need a minimum kernel version to run Docker (2.6.32-431 or
higher according to [0]).

Earlier this year when we were exploring Docker, CentOS 6.4's latest kernel
version didn't meet that requirement so Docker was a no-go. CentOS 6.5 was the
min version with a supported kernel but our Citrix XenServer environment
(6.2SP100) had no official vendor support for >=6.5. We don't run anything on
our Citrix environments unless it's properly supported by them.

Now, because of your comment and doing my fact-checking, I just found out
that CentOS 6.4's kernel is now at version 2.6.32-504, which is damn fine
news for us. That said, we'd still be stuck with CentOS 6.4 containers,
because CentOS 7 containers would still be unsupported by Citrix.

[0]:
[https://docs.docker.com/installation/centos/](https://docs.docker.com/installation/centos/)

------
coolsunglasses
7.8.3 is the current version of GHC.

I have no idea if these are compatible with CentOS or RHEL, but I have install
instructions for Fedora in my guide:
[https://github.com/bitemyapp/learnhaskell#fedora-20](https://github.com/bitemyapp/learnhaskell#fedora-20)
that were contributed by davidfetter on github.

------
tedchs
If the folks behind Haskell want it to be used in enterprise environments, I
would think they'd want to release their own RPMs for recent versions of
RHEL/CentOS/SUSE.

------
andrewstuart
This sounds easy compared to some of the hell I have been through to get
certain things configured, like converting an EC2 PV AMI to HVM. Or maybe
using OAuth in any incarnation.

------
hogu
If you're OK with using conda (a Python package management tool), you can do
`conda install -c wakari pandoc`, and it should just work.

------
throw7
I had to go through similar machinations with getting the latest subversion on
rhel 5. I feel your pain.

------
Nux
So it looks like he needed CentOS 7 and installed v6.

"Oh, I installed years old distribution and it's years old! OMG!"

------
cirosantilli
The _real_ problem is that Pandoc is written in Haskell.

~~~
wyager
In what way is that a problem?

Haskell is certainly a reasonable choice for a markup transpiler.

~~~
cirosantilli
I agree, Haskell is a great language, but its ecosystem has much less support
than C's. Luckily for us, Atwood made jgm write CommonMark in C.

------
smegel
So you had to build a few deps; who hasn't had to do that?

I recall having to build a largish dependency tree to get a recent version of
GTK2 running on Solaris 10, from GLib2, ATK, and Cairo all the way up.

That's life.

