moreutils (joeyh.name)
154 points by JoshTriplett on July 2, 2016 | 56 comments

This is a pretty good collection of small utilities but I would recommend using GNU Parallel[0] instead of the included parallel. It contains a ton of extremely useful additional features such as distributing jobs to remote computers.

[0]: https://www.gnu.org/software/parallel/

GNU Parallel will also infloop if any process on the system (run by any user) has an empty argv[0] (which can actually happen; parallel scans /proc to find instances of itself). When I reported the bug, the author refused to fix it on the grounds that this behavior is a "malware detector". Between this weirdness and the citation-interface weirdness, I'm not keen on GNU Parallel.

Lolwut? It responds to "malware" (empty argv) by infinite looping? That's a feature?

One big turnoff with GNU parallel is that the first time it is run on a machine, you must interactively accept a EULA. There are ways around it, but as a rule of thumb I avoid command-line tools that do this, because this inconsistent runtime behavior is annoying at best and can easily break and complicate automation.

And no, I don't want to set an environment variable on a fleet of machines to suppress behavior which shouldn't be enabled by default to begin with! <cough> .NET telemetry </cough>

A real shame, because it is an otherwise useful tool IMHO.

What it actually does is beg you to cite parallel if you use it in academic work. There are two ways to silence it:

• Run `touch ~/.parallel/will-cite`.

• Pass the flag `--will-cite`.

Still annoying, although it's easy enough to disable.

Fwiw the Debian & Ubuntu packages have disabled this behavior.

jobflow[1] is what I'd suggest unless there's a very specific reason to use something massive like GNU Parallel. It is used as part of Sabotage Linux's package manager.

[1]: https://github.com/rofl0r/jobflow

Surprised that nobody mentioned 'xargs -I{} -P N' yet. It is a GNU extension, but quite handy, and it comes preinstalled on pretty much every Linux machine.
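A minimal sketch of that pattern (the file names are just placeholders):

```shell
# Run up to 4 jobs concurrently, one per input line.
# -I{} substitutes each line; -P is the GNU/BSD parallelism extension.
printf '%s\n' a.log b.log c.log | xargs -I{} -P 4 echo "would compress {}"
```

Since the jobs run concurrently, the output order is not guaranteed.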

I propose a new project: lessutils. You post about some program that you use for some simple task that's not among the "standard UNIX utils" (ed, sed, AWK, lex, etc.). Then we show you how to do the same task with only the standard utils (i.e. no install needed) -- inevitably someone shows us how to do it in AWK and makes us all feel stupid. :) If we are successful, you get to eliminate one more dependency from your system, not to mention reducing attack surface.

the suckless people are doing this with the coreutils, titled sbase. they adhere to POSIX and nothing else, really


Here's one I use: jq for JSON parsing

Yes. letsencrypt.sh is a nice case study, spending a lot of effort to parse json with essentially bash & grep. It works now but is full of assumptions ("this array of hashes will not contain [ anywhere"). There is nothing in POSIX that's well suited to manipulating json in the way that jq is.
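As a small illustration of the difference (field names here are invented, and this assumes jq is installed): jq parses the structure instead of pattern-matching the text, so a `[` inside a string value is handled correctly.

```shell
# A '[' inside a string value would break a grep-based "parser";
# jq parses the JSON properly and extracts the value as-is.
echo '{"challenges": [{"token": "abc[def"}]}' | jq -r '.challenges[0].token'
# prints: abc[def
```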

May I suggest an "Ask HN." Show your .json input and the desired output.

> not to mention reducing attack surface.

Unless the programs are setuid, is that a real problem? I mean, anyone who can call one of these utilities with some arguments can also call "sh -c ...", no?

Every attack surface is a problem. More code means more places bugs can happen and more risk, even if the risk is low.

Your parachute may be very reliable, but if you keep jumping out of planes you'll eventually die.

My point is that if you have a meter-wide hole in your parachute (sh and other interpreters you can call to execute any code you want), also having small rips in the fabric probably doesn't really matter - if you have to rely on it, you're screwed anyway.

As long as you know you have a meter wide hole you will care enough to guard it. Thousands of smaller holes will eventually go forgotten.

"nobody thought to write long ago"

Quite a few moreutils involve unbounded buffering of input, I wonder if they were left out of Unix originally due to memory limits.

I'm a fan of the unix tools philosophy, but I sometimes wonder if there's much room for new tools to be added to that toolbox. I've always wanted to come up with my own general-purpose new unix tool.

This was something of the inspiration for wf as well. https://github.com/jarcane/wf

Just yesterday, I asked on lobste.rs [1] how I could take my small utility to compute the minimum, maximum, and expected value of a dice roll expressed in D&D notation and make it a better Unix citizen. The commenters were helpful, and with few changes I was able to make the program usable in a pipeline, even if I don't really expect that's going to be a use case for me.

As an aside, my utility is also written in Rust; nice to see more of these small programs written in Rust.

[1] https://lobste.rs/s/fdzyio/how_should_i_make_this_small_prog...

Note that “things like this” can sometimes be done just using sed:

  % sed -i -e "s/root/toor/" -e "/joey/d" /etc/passwd

That's platform dependent. BSD sed doesn't support -i

I don't think that's true anymore, though it was historically. The sed currently shipped with both FreeBSD and OpenBSD supports -i, at least.

You can use Perl in sed mode via "perl -pe" as a more portable replacement for sed.

what should this do?

Once you realize that awk, while useful, is really just a terrible programming language with terrible syntax that you have to write in a string, with random unreadable environment variables to try to hack in useful functionality, then you are finally free.

Are there any modern alternatives out there?

I had to go and write one: http://tkatchev.bitbucket.org/tab/

Hey, that's nice.

Yes. It's ruby. E.g.

  ls -l | awk '{ print $9 }'

  ls -l | ruby -nae 'puts $F[8]'

Full-blown scripting languages are a poor fit for livecoding. (Which is really what shell commands are for.)

awk is a full-blown scripting language too. It has a versatile implied top-level control structure plus code-golf qualities (which is a plus for livecoding). But it's also a bit crazy that we have to switch languages just because one language's top-level structure happens to be a good fit.

> sponge: soak up standard input and write to a file

How does this differ from just redirecting to a file in the shell?

If you do something like:

  echo "hello world" >> dummyfile
  cat dummyfile | sed 's/hello/goodbye/' > dummyfile
dummyfile will be blank afterwards

if you do:

  cat dummyfile | sed 's/hello/goodbye/' | sponge dummyfile
It'll work as you'd expect (or at least as I'd expect). Otherwise `> dummyfile` truncates dummyfile before sed even reads its input.

For the sed example, what you actually want is -i which modifies the given file.


sed -i 's/hello/goodbye/' dummyfile

That's GNU sed syntax. BSD sed supports -i but requires an explicit backup-suffix argument (e.g. `sed -i '' ...`).

Ah, yes. I like it. I've definitely made the exact mistake you point out. Thanks!

I actually think I learned about sponge after I burned myself with this same mistake.

Isn't that what `tee` does, minus the "also displaying on STDOUT" bit?

tee doesn't store all the input before writing as far as I know, so it wouldn't work either in this case.

Is this true? I thought the issue was that the write might happen while the file is still being read?

Does the '>' operator blank the file first, before the read happens?

'>' erases the file first and '>>' appends without erasing.

This isn't the issue.

sponge only writes the file once all input has been read, and - if the file already exists - tries to do the operation atomically by writing to a temporary file and renaming it over the old file.
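That buffer-then-rename behavior can be sketched in plain shell. This is only a rough approximation of sponge (made-up function name; it ignores the permission- and ownership-preservation the real tool handles):

```shell
# Roughly what `... | sponge file` does: soak up all of stdin first,
# then atomically replace the target via rename.
mysponge() {
  tmp=$(mktemp "${1}.XXXXXX")  # temp file next to the target
  cat > "$tmp"                 # buffer the entire input first
  mv "$tmp" "$1"               # rename(2) is atomic on the same fs
}

echo "hello" > dummyfile
sed 's/hello/goodbye/' < dummyfile | mysponge dummyfile
cat dummyfile  # goodbye
```

Because sed holds its own open file descriptor, renaming over the file after it has finished reading is safe.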

jq would be a nice addition to that list.

Are there any tools in moreutils that aren't written in portable C?

I agree, jq is a must.

I doubt joeyh wants to add tools to moreutils that already exist as separate projects of their own.

jq looks like a poor cousin when compared to App::RecordStream, so no, it's not a must.

I can vouch for vipe; I have found it immensely useful at times when throwing awk at my input turns out to be less straightforward than I had hoped.

Good utility, but I would recommend using the GNU one, as mentioned by another user. Although it can infloop on processes with an empty argv[0], there are ways to work around this.

lckdo looks a bit similar to flock(1), though knowing Joey from reputation, I'm pretty sure he had good reasons for including lckdo.

From the lckdo manual: "Now that util-linux contains a similar command named flock, lckdo is deprecated, and will be removed from some future version of moreutils."

vidir and vipe are two of my favorite commands from moreutils.

Also check out combine. It allows you to combine lines from two files using boolean operations:

  combine file1 [OPERATION] file2
Where [OPERATION] is one of [and, not, or, xor]. It allows you to quickly pull out interesting data from your files (the input files don't need to be sorted).
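For readers without moreutils handy, the `and` case behaves roughly like grep's fixed-string whole-line matching against a pattern file (a rough equivalent only; combine also offers not/or/xor and this sketch uses made-up file contents):

```shell
printf 'alpha\nbeta\ngamma\n' > file1
printf 'gamma\nalpha\n' > file2   # note: unsorted input is fine
# `combine file1 and file2` ~ lines of file1 that also appear in file2:
grep -Fx -f file2 file1
# prints: alpha, then gamma (file1's ordering is kept)
```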

ts is very useful for timestamping unbuffered output to do simple profiling on production systems.
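ts comes from moreutils; a rough stand-in using only POSIX shell looks like this (the real tool also supports strftime format strings and relative/incremental timestamps):

```shell
# Prefix each line of input with a wall-clock timestamp, like `cmd | ts`:
echo "something happened" | while IFS= read -r line; do
  printf '%s %s\n' "$(date '+%H:%M:%S')" "$line"
done
```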
