Tell HN: Groff needs your help
67 points by cogburnd02 on Feb 22, 2015 | 65 comments
GNU Groff (which is probably doing the behind-the-scenes formatting work when you use man pages on GNU/Linux) currently has no maintainer, and the most recent stable release was in 2013. Who would like to step up and help?



OpenBSD replaced groff with mandoc[1], which is a much simpler program once you know that it is just for manpages, not a typesetting system.

[1] http://undeadly.org/cgi?action=article&sid=20110314142734


> not a typesetting system.

I like that groff is a typesetting system, it means that I can generate pretty postscript versions of manpages from the command-line and print them off, all in one go. It's really handy that way.
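
For reference, a minimal sketch of that workflow, written in Python purely for illustration. It assumes groff is installed and that the page source is uncompressed; `groff_command` and `render` are hypothetical helper names, while `-mandoc` and `-T` are groff's own options for picking the macro package and the output device.

```python
import shutil
import subprocess


def groff_command(page, device="ps"):
    """Build a groff argv; -mandoc lets groff choose the man or mdoc macros."""
    return ["groff", "-mandoc", f"-T{device}", page]


def render(page, out_path, device="ps"):
    """Run groff on an uncompressed manpage source and save the output."""
    if shutil.which("groff") is None:
        raise RuntimeError("groff is not installed")
    with open(out_path, "wb") as out:
        subprocess.run(groff_command(page, device), stdout=out, check=True)
```

Calling `render("ls.1", "ls.ps")` would leave a PostScript file that can be handed to a print spooler.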


You can do that with Mandoc too, although the quality of the PDF output is not as nice. What makes Mandoc less of a typesetting system than Groff is that it supports fewer macro sets (mdoc, man, eqn, and tbl, the macro sets commonly used in manpages); Groff supports more (ms, me, mom…) because it’s a more general typesetting system.


I'm going to be frank: why? I've recently looked into making my own manpage and it's a pretty old looking system. The docs are not really clear but using some examples and trial and error I got there. My point is though, why does it need a maintainer? The system feels old enough to get deprecated instead of keeping it alive, let alone bring out new releases.

I haven't spent more than a day working with groff though, perhaps many people still use it (every website with info just looked about as old as your average man page), so I may be completely missing the point here. If so, tell me!


Replacing man pages because 'they look old' is a terribly stupid idea.

How about we replace them because there is something better? Or because they no longer fill a need?

Things that are old and work are not to be messed with. Frankly, if you don't get that, I don't want you, or anyone else who thinks like you, making decisions about any UNIX I might work with.

I lament the poor documentation on Linux, and OSX, and I lament the FSF's obsession with info pages. There's nothing wrong with man pages. Maybe we don't need groff to prepare them, but we need man pages.


> Replacing man pages because 'they look old' is a terribly stupid idea.

I fully agree, and it's not what I meant to say. What I meant is writing new pages in a newer language (insert random lightweight text based format, preferably one where the 'see also' part is linkable) and groff is simply deprecated and supported as well. Since there is currently support, why do we need a maintainer? That is my question.

> Things that are old and work are not to be messed with.

Agreed, so what's the maintainer going to do?

> Maybe we don't need groff to prepare them, but we need man pages.

Once again, I agree. Sorry if I sounded like man pages are unnecessary, that is not what I meant.


> What I meant is writing new pages in a newer language (insert random lightweight text based format, preferably one where the 'see also' part is linkable)

A little bit of manpage history:

Roff has been around since the beginning of Unix (in fact, the group at Bell Labs who developed Unix got funding by convincing managers they could come up with a good typesetting system). Roff supports a variety of macro sets; for a long time, the most common one for manpages was the “man” macros.

In the early 1990s, BSD came up with the “mdoc” macros, which are a significant improvement over the original “man” macros. mdoc is inherently semantic, and allows easy searching and conversion to other formats, including HTML. You can search for pages based on function return type or argument type, program authors, include files and environment variables used, and many more. mdoc pages natively support hyperlinking, including links to other manpages, links within the same manpage, and external hyperlinks.

Mdoc is a very pleasant language, and since it’s used in roff you can combine it with other macro sets like tbl (for tables) and eqn (for mathematics). It supports UTF‐8, it can easily be converted to PDF and/or semantic HTML, and provides great searchability. It’s well‐documented and widely supported (mdoc pages are supported out of the box on any system using mandoc or groff for manpages, meaning Linux, OpenBSD, FreeBSD, NetBSD, Mac OS X, Illumos, Minix…).


Okay, I didn't know all that. Perhaps groff is a better language than I was aware of and there is sure something to say for keeping it available.

But is it really all used? I have never heard of searching man pages by e.g. return type (in section 2 or 3 I assume this would be), nor does hyperlinking work (perhaps due to the pager, but still). If only 1% of the people use it, then either it's up to them to maintain it or we just deprecate it in favor of a new system.

And by the way, the new system doesn't have to be only one language, it can be some generic language that other languages can "compile" to if you have the right packages (just like markdown can be parsed to the current man page language).


> But is it really all used? I have never heard of searching man pages by e.g. return type (in section 2 or 3 I assume this would be)

For example, here’s a search for “functions beginning with ‘str’ and with return type size_t”: http://www.openbsd.org/cgi-bin/man.cgi?query=Ft%3Dsize_t+-a+...

On OpenBSD you can do the same from a terminal:

$ apropos -s 3 Ft=size_t -a Nm~^str
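
To make the idea concrete, here is a rough Python sketch of what such a query boils down to. It is not how mandoc's database or apropos is implemented; the three toy pages, `index_page`, and `search` are all made up for illustration, and treating `=` as a case-insensitive substring match and `~` as a regular expression is an assumption based on the apropos syntax above.

```python
import re
from collections import defaultdict

# Toy mdoc fragments (hypothetical); real pages contain many more macros.
PAGES = {
    "strlen.3": "\n".join([
        ".Sh NAME", ".Nm strlen", ".Sh SYNOPSIS",
        ".Ft size_t", '.Fn strlen "const char *s"',
    ]),
    "strcpy.3": "\n".join([
        ".Sh NAME", ".Nm strcpy", ".Sh SYNOPSIS",
        '.Ft "char *"', '.Fn strcpy "char *dst" "const char *src"',
    ]),
    "fread.3": "\n".join([
        ".Sh NAME", ".Nm fread", ".Sh SYNOPSIS",
        ".Ft size_t", '.Fn fread "void *ptr" "size_t size"',
    ]),
}


def index_page(source):
    """Collect the arguments of every macro line, keyed by macro name."""
    fields = defaultdict(list)
    for line in source.splitlines():
        if not line.startswith("."):
            continue  # plain text line, carries no macro
        macro, _, rest = line[1:].partition(" ")
        fields[macro].append(rest.strip())
    return fields


def search(pages, ft_substring, nm_regex):
    """Return pages whose Ft contains a substring AND whose Nm matches a regex."""
    pattern = re.compile(nm_regex)
    hits = []
    for name, source in pages.items():
        fields = index_page(source)
        ft_ok = any(ft_substring.lower() in v.lower() for v in fields["Ft"])
        nm_ok = any(pattern.search(v) for v in fields["Nm"])
        if ft_ok and nm_ok:
            hits.append(name)
    return sorted(hits)
```

With these toy pages, `search(PAGES, "size_t", r"^str")` matches only `strlen.3`: `fread.3` has the right return type but the wrong name, and `strcpy.3` the reverse. The point is that the query works because `.Ft` and `.Nm` label what the text means, not how it looks.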

As for hyperlinks, this of course depends on your output format. less(1) in a terminal doesn’t do hyperlinks. HTML output will, such as in this page: http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man8/... And a distribution could, for example, configure man(1) to trigger Lynx (or even Firefox) looking at Mandoc’s HTML output.

These toolchains are still being actively developed and improved (semantic search, for example, has only been around for a couple years despite the format theoretically supporting it since the beginning). I try to do my part by contributing manpages to projects that don’t have one, converting to mdoc macros when practical, and explaining the great featureset available. The best part is that it is so widely supported—at worst, mdoc falls back to the manpage infrastructure we have now; deployment of a new toolsuite is not a problem compared to converting to some brand new format. At best, it supports all these great new features, and it does so today!


> I have never heard of searching man pages by e.g. return type (in section 2 or 3 I assume this would be).

I'm not sure what you mean by return type, but if you want to, I think you can section your manpages however you want; i.e. you can have section 1 be 'games' instead of 6 if you want to, it just makes installing manpages more difficult.


> How about we replace them because there is something better? Or because they no longer fill a need?

I would be interested to know your opinions on GNU info/Texinfo, and the possibility/difficulty of rewriting all the existing manpages in Texinfo format.


Well... manpages are still very much alive and widely used, and they're so prevalent that moving to something new is a nearly ridiculous requirement. Half the things with manpages in the wild are probably themselves not maintained. Now you have two problems.

We can move to something new, but getting everyone to agree on what that is, then implementing and maintaining it... isn't that more work than just maintaining the working solution everyone already agrees on, and that solves the problem?


Why not format manpages as markdown?

As a consumer you wouldn't need to care about the format behind the scenes, and the format is already well known by programmers: there are plenty of libraries that can read markdown and plenty of people who know it.


markdown isn't semantic. roff macros allow one to clearly specify that function arguments (Fa) and command line flags (Fl) are different things, even if they render similarly.
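
A small Python sketch of that distinction, under stated assumptions: the role names are paraphrased from the mdoc macro descriptions, `parse`, `flatten`, and `role` are hypothetical helpers, and rendering both macros as `*emphasis*` is a simplification standing in for "looks the same on screen".

```python
# Roles for a few mdoc macros (paraphrased; see the mdoc manual for the full set).
ROLES = {
    "Fl": "command-line flag",
    "Ar": "command argument",
    "Fa": "function argument",
    "Ev": "environment variable",
}


def parse(line):
    """Split a one-macro line such as '.Fa file' into (macro, text)."""
    macro, _, text = line[1:].partition(" ")
    return macro, text.strip().strip('"')


def flatten(line):
    """Presentation-only rendering: the macro name is thrown away."""
    _, text = parse(line)
    return f"*{text}*"


def role(line):
    """Semantic reading: keep what kind of thing the text is."""
    macro, text = parse(line)
    return ROLES.get(macro, "unknown"), text
```

Here `flatten(".Ar file")` and `flatten(".Fa file")` give the same string, while `role()` still tells a command argument from a function argument. Once a page is stored only in the flattened form, that information cannot be recovered.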


Is this really needed in a manpage, though? AFAIK manpages are ASCII documents created with some variant of troff/nroff/groff and piped into less -s. There is no need for any semantic distinction if the display is the same, since the best interaction you can have is searching for something.


These are manpages http://imgur.com/a/JZfTr


While I like the venerable look they have, those books aren't what you use daily to document yourself on how your machine works. I was speaking about those manpages we use when we don't know what the letters are for in the itemized output of rsync, for instance.


> While I like the venerable look they have, those books aren't what you use daily to document yourself on how your machine works.

But you can create pretty hard-copy versions of the manpages on your system with groff, the same way these books were created with troff (which groff is a replacement for.)


ASCII is not the only possible output format, and yes, semantic search is useful.


As others have noted, there is a semantic difference between Markdown and roff. But if you really want to write them in Markdown, you can with ronn: http://rtomayko.github.io/ronn/ Ronn also supports outputting to HTML.


I agree that might be a good choice for a more modern alternative, but it still doesn't solve the problem of existing manpages. You could cleverly upconvert on the fly of course, but that's basically the same thing as keeping groff around, except with the added friction of informing everyone that they should be using Markdown now, with new tooling and conventions. Don't underestimate the marketing costs of successfully convincing the diverse Linux ecosystems they should jump onboard.

I only see this working if some big entity slams the gavel and decrees this as the new manpage standard, but I also don't see any such entity having much incentive to do so.


But existing pages would be served with the already existing groff, right? Software doesn't rot, so the existing software will continue to work in the future.


Software doesn't rot but libc changes...


Well, for one time only, you can convert all (make a little converter) existing groff to markdown and then stop supporting groff at all.


This works fine if you're working on a cathedral (one of the BSDs or UNIXes, for example), not so much if you're trying to work with the bazaar that is the Linux software ecosystem. What's your plan? Submit a patch for every piece of software written in the past 15 years?


ikiwiki's source code contains a rudimentary mdwn2man program. It doesn't handle the majority of markdown, but it works for simple pages.


I agree with you. Like I replied to another post:

> > Replacing man pages because 'they look old' is a terribly stupid idea.

> I fully agree, and it's not what I meant to say. What I meant is writing new pages in a newer language (insert random lightweight text based format, preferably one where the 'see also' part is linkable) and groff is simply deprecated and supported as well. Since there is currently support, why do we need a maintainer? That is my question.


Perhaps you'll be interested in this discussion of markup format and tool chain alternatives to GNU info (in the context of Emacs):

http://lwn.net/Articles/625072/

(It got no interest on HN: https://news.ycombinator.com/item?id=8911625)


I think the obsession with "being maintained" is a somewhat unhealthy phenomenon in the FOSS world. If the code works and does what you want, then why would it need constant fiddling and a consistent stream of releases?


The code may need to be adapted to evolving environments. Security bugs might be discovered. In this case I guess "being maintained" just means that someone will react to such events if they happen.


Couldn't programmers from major GNU/Linux distributions be signed up for this duty? They compile new OSes and if they want man pages (heck, who doesn't want man pages?) then it's also up to them to make it work in the new OS. Security bugs can be forwarded to those people by GNU (assuming they support and host the project).

This may not work for everything, they're not going to maintain any old system, but things that we want to keep around but for which there are not necessarily dedicated maintainers to keep it up...


You can write a Groff-like text processor to the 1990 POSIX environment and it will work in a 2015 environment just fine.


I would go further and call this an obsession with using software that is literally changing by the day, in a constant state of flux.

There seems a desire among those users who comment in forums to see recent commits to the software they use as if infrequent commits or no commits in years suggests there is something wrong with the software.

To me, it is the constant changes and updates that give me pause when choosing software. I am actually more skeptical of large, complex software that requires constant updating.

I prefer software which can deliver reliable performance year after year without changes, e.g., daemontools. But hey, what do I know?

In any event, I am glad to see this comment. I wish more folks would call out this silly obsession.

As for groff, I think there is more to roff than just printing manpages. http://heirloom.sourceforge.net/doctools.html


> as if infrequent commits or no commits in years suggests there is something wrong with the software.

I would be reluctant to start using a package that wasn't actively developed. Libraries, APIs and even languages evolve, and I don't want to be stuck relying on old versions of other things because of one package that isn't being updated. You might say that all the other pieces should take version compatibility much more seriously, to avoid this problem, but that's not the world we're in.

More importantly, since all non trivial software has bugs, if there have been no commits for years, either no-one's using it and finding the bugs, or the maintainer isn't merging people's patches. Both options sound bad for me as a user.

I understand that a program could theoretically be 'done', not requiring any further changes, but I think there's hardly any software that could conceivably be so 'done' it wouldn't have any commits in months or years. I've just looked up a few of the most stable major projects I can think of: the Linux kernel, Apache and Subversion all have multiple commits within the last day. GNU coreutils only has five in the last week.


it doesn't look abandoned: http://git.savannah.gnu.org/cgit/groff.git


The Groff Mission Statement looks interesting.

http://www.gnu.org/software/groff/groff-mission-statement.ht...


The question is - do we really need the typesetting part? I had a troff phase (bought some books) ten years ago. But the community of users is small, its capabilities are eclipsed by TeX, and the language is quite arcane.

One could even go a step further and argue that writing man pages would be much easier (and presumably more man pages would be written) if the markup language was switched to e.g. Markdown by default. For my own software, I am already using Pandoc to generate nroff input from Markdown.


> do we really need the typesetting part?

I think so; producing high-quality printed manpages is useful, at least to me.

> the community of users is small

So tell your friends to use groff! :-D

> its capabilities are eclipsed by TeX

In what way? Yes, TeX can do complex mathematics better, but most documents that people want to publish don't terribly need complex math that much.

> and the language is quite arcane.

What specifically is 'arcane' about it, versus something like TeX or markdown?

> One could...argue that writing man pages would be much easier...if the markup language was switched to e.g. Markdown by default.

But then someone would have to rewrite the existing corpus of manpages, and as others have pointed out, using markdown loses some semantics.

I think keeping that functionality in pandoc is the right way to go.


I am the maintainer of a man page that turns into a 260+ page PDF document in letter size.

I haven't hacked on groff, but I recently did a whole bunch of work on the man2html program from the man tools. That code is extremely hacky, like you wouldn't believe!

http://www.kylheku.com/cgit/man/

I have it so that a man page can detect whether it's being compiled by groff or by man2html and re-target some of its macros.

That aforementioned large man page is here; the macros are upfront: http://www.kylheku.com/cgit/txr/tree/txr.1

HTML and PDF here: http://sourceforge.net/projects/txr/files/txr-104/

(The index and hyperlinks in the HTML are due to a post-processing pass, implemented in the "genman.txr" script.)


Mandoc, which is what the BSDs and some other systems use for formatting manpages, has very good HTML output. In fact, the program used to be named “mdocml” because it was written to be a mdoc‐to‐HTML converter.

Unfortunately the -man macros (as opposed to the modern -mdoc macros) aren’t that good for conversion to other formats like HTML, because they’re by nature presentation‐focused. All major troff implementations support the -mdoc macros, though, and -mdoc is much better suited. It’s what I write all my manpages in these days (and it’s a drop‐in replacement—replace foo.1 written in -man with foo.1 written in -mdoc and groff, man, etc will handle it instantly). I also like to convert manpages from -man to -mdoc, or write new pages for programs that don’t have one. It gets a little exhausting to convert long pages like the one you linked, though.

Here’s some documentation on the format of -mdoc pages: http://mdocml.bsd.lv/man/mdoc.7.html


I wonder how good is mandoc's handling of the troff language as such.

The long page that I wrote, though it is based on the old -man macros, is actually to a large extent based on its own macros which are retargettable.

As I started to polish the document for better PDF output, I needed to reach into more of the power of groff, while maintaining compatibility with man2html. That's when I started hacking on man2html to handle more of the troff language. I found that loops didn't work very well and there were issues with nested if/else and such.


Keep in mind, groff can itself generate HTML. I think it may need some pre- and post-processing to do it, though, because (I think) what are links in the HTML version end up being footnotes in the PDF/PS version. Could be wrong about that, though.


The HTML from groff's HTML back-end is pretty much useless garbage.


Well, get into that code, and improve it! :-)


No thanks; that is broken by design. HTML is treated as a typesetter device, more or less, when what is needed is a semantic translation of the high level document structure.

As an analogy to another software system, you wouldn't want to generate HTML from LaTeX by processing the DVI file.

What is needed is the high level macros of a specific package being recognized and translated to HTML at a high level.

However hacky, the man2html program does that (and I made it work better: it has better support for handling more sophisticated macros, and is less buggy. I likely won't invest any more time into it, however, and I'm not going anywhere near groff).
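
A minimal Python sketch of what "translating at a high level" could mean, using a few mdoc-style macros as the example input. This is not man2html's or mandoc's actual code; the `translate` and `to_html` helpers, the `class` attributes, and the `name.section.html` link scheme are all assumptions made for illustration.

```python
import html


def translate(line):
    """Map one source line to HTML by macro meaning, not by glyph placement."""
    if not line.startswith("."):
        return html.escape(line)  # ordinary text line
    parts = line[1:].split()
    if not parts:
        return ""  # a lone "." carries no content
    macro, args = parts[0], parts[1:]
    text = html.escape(" ".join(args))
    if macro == "Sh":  # section header
        return f"<h1>{text}</h1>"
    if macro == "Xr" and len(args) >= 2:  # cross reference to another page
        name, section = html.escape(args[0]), html.escape(args[1])
        return f'<a class="Xr" href="{name}.{section}.html">{name}({section})</a>'
    if macro == "Nm":  # name of the thing being documented
        return f'<span class="Nm">{text}</span>'
    return text  # unknown macro: fall back to its arguments


def to_html(source):
    """Translate a whole page, one line at a time."""
    return "\n".join(translate(line) for line in source.splitlines())
```

For example, `translate(".Xr mandoc 1")` yields a real link to `mandoc.1.html`, which a typesetter-style back-end cannot produce because by the time it sees the text, the cross reference is just characters positioned on a page.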


Does it currently have any important bugs?

I ask because the troff format was designed with the limitations of early-'70s computers in mind, and it seems to be used exclusively for man pages. Why not upgrade to something that better corresponds to how we actually code now (i.e. Markdown)?


If nobody is maintaining it, how can you tell it doesn't currently have any important bugs?

Remember that the worst bugs in the last couple of years have been in the code that nobody was looking at.


Some people must have reported bugs since 2013. If not, then either it's unused or bug free?


For one example: https://bugs.debian.org/cgi-bin/pkgreport.cgi?dist=unstable;...

I haven't compared this with upstream though.


Does markdown have macros equal in power to those of troff?

"The great strength of troff is the flexibility of the basic language and its programmability -- it can be made to do almost any formatting task." -- The UNIX Programming Environment, Brian W. Kernighan and Rob Pike.


GNU Groff is terrible and the only reason it hasn't been replaced with a superior implementation is that all the better versions are politically incompatible with GNU. It's no surprise they're having a hard time finding someone interested in effectively code-laundering and slapping a GPL on it.


Really? What are these better versions?


There are a couple of other troff implementations in use today.

Mandoc is very new and focuses on providing a complete solution for a system that uses manpages: it renders manpages in the terminal or to HTML on the fly, provides a database for semantic search of manuals, and is probably the second most common manpage renderer in common use (after Groff). It’s the default renderer on OpenBSD, FreeBSD, DragonFly BSD, NetBSD, Illumos, and Minix. Unlike the others, it’s not a complete troff implementation; it focuses on manpages (“mdoc”, “man”, “eqn”, and “tbl” macro sets).

Heirloom troff is descended from Sun’s troff, and focuses on nice typesetting supporting various PDF and OpenType features.

Plan 9 troff is, well, Plan 9’s troff, and is used by plan9port to render its own manpages.

There are a few other troff implementations (Neatroff, etc.), but these are the most widely distributed ones these days. I personally use Mandoc for everything except PDF output, for which I use Groff.


I've looked at the kind of PDFs heirloom troff can make, on an iPad, and my eyes, just, wow. There aren't words. If Debian would put heirloom troff in its base-install (along with, say, E.B Garamond or Junicode) instead of groff, there'd probably be more people (1) making beautiful man pages, and (2) viewing man pages graphically (with evince or another PDF viewer) than ever before. It's not like there's a licensing problem, AFAICT.

I have not seen a PDF made by mandoc that made me react that way. :-|


Yes, PDF output is one place where mandoc is not as good as groff/heirloom yet. (The other major one being support for generic preprocessors and other macro sets.) Mandoc works very well for terminal output, HTML output, and semantic searching though.


In which the fundamental problem with not paying for your open source software is revealed.


Paying for software is no assurance that the software will be maintained.


Exactly. If groff were closed-source, the last stable build would be long before 2013.


This is a fundamental problem with open source software.


I'd limit it to FOSS that people don't pay for (directly or indirectly).

When money is changing hands with the expectation that software is maintained in good working order, you won't see abandonment like this. But when nobody is paying or getting paid, maintenance stops, regardless of whether the software is a key component of crypto on the internet or a fundamental part of the documentation of most Linux installations.


oh you mean things like windows XP which people have paid for? I rest my case.


You're talking about the same Windows XP which people are currently paying for and receiving support for?

http://www.theguardian.com/technology/2014/apr/07/uk-governm...


operative word 'have'.


But preferable to what happens to abandoned closed source software.


This is a fundamental problem with software.

The fact that someone might possibly be able to do something about it is a fundamental feature of open source software.



