Libc on macOS invokes Perl as a subprocess for string processing (2017) (twitter.com)
98 points by DyslexicAtheist 31 days ago | 54 comments

Well, FreeBSD does it by calling a sh builtin:

  		 * We are the child; make /bin/sh expand `words'.
  &oldsigblock, NULL);
  		if ((pdes[1] != STDOUT_FILENO ?
  		    _dup2(pdes[1], STDOUT_FILENO) :
  		    _fcntl(pdes[1], F_SETFD, 0)) < 0)
  		if (_fcntl(pdesw[0], F_SETFD, 0) < 0)
  		execl(_PATH_BSHELL, "sh", flags & WRDE_UNDEF ? "-u" : "+u",
  		    "-c", "IFS=$1;eval \"$2\";"
  		    "freebsd_wordexp -f \"$3\" ${4:+\"$4\"}",
  		    ifs != NULL ? ifs : " \t\n",
  		    flags & WRDE_SHOWERR ? "" : "exec 2>/dev/null",
  		    flags & WRDE_NOCMD ? "-p" : "",
  		    (char *)NULL);

Seems Linux version also depends on forking to the shell:


For the sake of completeness musl libc also does this.


I guess the lesson is one might not want to implement word expansion in a shell using libc... [ed: if one has ambitions to function as /bin/sh...]

But why? Looking at the wordexp synopsis it doesn't seem like a particularly advanced function. Why don't the various libc's just implement this function directly in C?

Because it's literally meant to perform shell style substitution, with all the myriad of expansions that the shell supports. Similarly to system(), which also passes its input to the shell, it's never safe to use with untrusted input. (The man-page says so.)
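To make the "shell style substitution" point concrete, here is a minimal sketch of calling wordexp() from C. The helper name count_expanded_words is made up for illustration; wordexp(), wordfree(), WRDE_NOCMD and the we_wordc / we_wordv fields are the standard POSIX interface. WRDE_NOCMD is passed only as a precaution and is not a substitute for keeping untrusted input away from this function.

```c
#include <stdio.h>
#include <string.h>
#include <wordexp.h>

/* Hypothetical helper: expand `words` the way a shell would, print each
 * resulting field, and return the number of fields (or -1 on error). */
int count_expanded_words(const char *words)
{
    wordexp_t we;
    memset(&we, 0, sizeof we);

    /* WRDE_NOCMD asks wordexp() to refuse command substitution. */
    int rc = wordexp(words, &we, WRDE_NOCMD);
    if (rc != 0) {
        /* On WRDE_NOSPACE partial results may have been allocated. */
        if (rc == WRDE_NOSPACE)
            wordfree(&we);
        return -1;
    }

    for (size_t i = 0; i < we.we_wordc; i++)
        printf("[%zu] %s\n", i, we.we_wordv[i]);

    int n = (int)we.we_wordc;
    wordfree(&we);
    return n;
}
```

Calling count_expanded_words("one 'two three' four") should report three fields, since the single quotes are interpreted with shell quoting rules rather than treated as ordinary characters.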

The macOS libc is the FreeBSD libc. FreeBSD is upstream. FreeBSD changed it recently to use /bin/sh instead of /usr/bin/perl, because it's a bit smaller and faster.

funny that glob doesn't do that http://xr.anadoxin.org/source/xref/macos-10.14-mojave/Libc-1... however wordexp does.

Glob is easier to code than wordexp. Much fewer rules.

wordexp has to perform command substitution $(command args ...) as well as arithmetic expansion $((a + b * c)).
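A rough sketch of what those two expansions look like from the caller's side. The helper expand_first is a hypothetical name; the flags and error codes are the POSIX ones. Whether the expansion happens in-process or in a forked shell is up to the libc and is not visible here.

```c
#include <stdio.h>
#include <string.h>
#include <wordexp.h>

/* Hypothetical helper: expand `words` with the given WRDE_* flags and copy
 * the first resulting field into `out`. Returns 0 on success, otherwise the
 * nonzero error code reported by wordexp(). */
int expand_first(const char *words, int flags, char *out, size_t outlen)
{
    wordexp_t we;
    memset(&we, 0, sizeof we);

    int rc = wordexp(words, &we, flags);
    if (rc != 0) {
        /* On WRDE_NOSPACE partial results may have been allocated. */
        if (rc == WRDE_NOSPACE)
            wordfree(&we);
        return rc;
    }

    /* Copy the first field, or an empty string if there were no fields. */
    snprintf(out, outlen, "%s", we.we_wordc > 0 ? we.we_wordv[0] : "");
    wordfree(&we);
    return 0;
}
```

With this helper, the input "$((1 + 2 * 3))" is evaluated arithmetically, while "$(echo hi)" combined with WRDE_NOCMD is expected to fail with WRDE_CMDSUB instead of running the command.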

Slightly OT, you know how people hated on C++ ranges for readability? Honestly this messy soup of C is no better.

What in particular upsets you about it? It's pretty readable to me, and unlike C++ ranges, is actually fairly terse.

Cryptic variable names, weird indentation, ternary operator, string escaping, statements without braces.

> weird indentation

This is regular BSD style, used in lots of places. Tabs for indentation, 4 spaces for wrapped lines.

> ternary operator

Totally fine here IMO. Would get really verbose without.

> without braces

Again, regular BSD style and a matter of taste.

Yeah, but I think they have a point about the variable names. Really hard to tell the significance of anything there.

Looks transparent to me. pdes must be "pipe descriptors". ifs probably refers to the IFS concept in the shell; it may have come from getenv("IFS"). wordexp is described as influenced by IFS, and this function may have to look at that variable even though it's calling an external program to pass it different arguments (which is a bit weird at first glance: the program seems dedicated to the purpose; why can't it handle that aspect internally). flags is probably coming from the wordexp flags argument; it's being tested using the documented bitmask constants in the API.

Basically the coder here didn't just make up identifiers; they have connections to pervasive Unix concepts.
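For readers who haven't seen the idiom the identifiers refer to, here is a simplified sketch of the pipe + fork + dup2 + execl pattern. It is not the FreeBSD code: capture_sh is a made-up helper, it hardcodes "/bin/sh" where the excerpt uses _PATH_BSHELL, and it skips the signal masking and close-on-exec handling that the real function does.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `sh -c script` and copy up to outlen-1 bytes of
 * its standard output into `out`. Returns the child's exit status, or -1. */
int capture_sh(const char *script, char *out, size_t outlen)
{
    int pdes[2];                /* pdes[0]: read end, pdes[1]: write end */
    if (outlen == 0 || pipe(pdes) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(pdes[0]);
        close(pdes[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: route stdout into the pipe, then become the shell. */
        close(pdes[0]);
        if (pdes[1] != STDOUT_FILENO) {
            if (dup2(pdes[1], STDOUT_FILENO) < 0)
                _exit(127);
            close(pdes[1]);
        }
        execl("/bin/sh", "sh", "-c", script, (char *)NULL);
        _exit(127);             /* only reached if execl() failed */
    }

    /* Parent: read until EOF or until the buffer is full, then reap. */
    close(pdes[1]);
    size_t used = 0;
    ssize_t n;
    while (used < outlen - 1 &&
           (n = read(pdes[0], out + used, outlen - 1 - used)) > 0)
        used += (size_t)n;
    out[used] = '\0';
    close(pdes[0]);

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

The ternary in the quoted excerpt covers the case this sketch handles with a plain if: when the pipe's write end already is descriptor 1, there is nothing to dup2, and the original instead clears the close-on-exec flag so the descriptor survives the exec.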

How would you do this in C++ without string escaping, or extra memory allocation for string processing? Double quotes have to be passed embedded in the arguments passed to the shell.

One way (in C or C++) would be to define constants:

    #define DQ "\""
Then instead of "\"blah\"" we have DQ "blah" DQ.

It's valid C++, and you will see its ilk in C++ code bases.
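A small sketch of why this works: adjacent string literals are concatenated by the compiler, so the macro spelling and the escaped spelling end up as the same bytes. The names escaped, with_macro and same_spelling are just for illustration.

```c
#include <string.h>

#define DQ "\""   /* a string literal holding a single double quote */

/* Adjacent string literals are merged at compile time, so there is no
 * runtime string processing or extra allocation involved. */
static const char escaped[]    = "echo \"blah\"";
static const char with_macro[] = "echo " DQ "blah" DQ;

/* Returns nonzero when both spellings produce identical strings. */
int same_spelling(void)
{
    return strcmp(escaped, with_macro) == 0;
}
```

Whether DQ "blah" DQ is easier to read than the backslash form is a matter of taste; the generated code is the same either way.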

Seems it's very old code. New Libc doesn't do this. This is Libc from Mojave:


The 'perl' code was a part of Libc v825.24, which seems to be included between 10.7 (Lion) and 10.8 (Mountain Lion).

Of course I still find it hilarious that even the old code did that!

The current version replaces the perl subprocess with an sh subprocess. Doesn't seem like much of an improvement.

Well, wordexp's purpose is literally to "perform shell-style word expansions", as quoted from the man page. It even supports command substitution if you don't pass WRDE_NOCMD.

So really, the entire premise of that POSIX function is horrible[1]. Just like system(), which also explicitly executes the given command line using the shell. These functions are not safe to use with untrusted input (e.g. remotely), ever.

EDIT: [1] But arguably only as horrible as calling out to the shell is in general. If you e.g. use it as part of a shell utility that assumes full POSIX-permissioned access to your user anyway, it's not unreasonable because there isn't any privilege escalation at all. Though I'd argue that in the case of system() it's probably more clear to the developer that a shell callout is happening. And also, that the "shell-style" expansion performed here is kinda muddily defined.

OTOH the standard itself mentions implementing the function using a shell in a subprocess [1]. POSIX is full of nastiness such as this; hopefully not much maintained and widely used software actually needs wordexp.

[1] https://pubs.opengroup.org/onlinepubs/9699919799/functions/w...

Discussed in 2015 (the code was out of date even then): https://news.ycombinator.com/item?id=9025572

That's obviously wrong.

It should use Emacs instead.

I’m actually surprised there isn’t a library called libemacs. It would fulfill the mythos and be really useful for a lot of tools.

Emacs doesn't do anything in C other than run Lisp; Emacs really is just a Lisp implementation with a GUI and text-editing focus. libemacs would be just like Lua or CPython.

Emacs has over a quarter million lines of C, which must be there for a reason.

But then companies could use it to assault rms’s Freedom ;)

All it would (need to) be is just an ELisp interpreter.

Guile's VM already supports this.

Turns out this is actually documented in the manpage for wordexp()! (And refers to the mentioned fact that it now calls 'sh' directly.)


> Do not pass untrusted user data to wordexp(), regardless of whether the WRDE_NOCMD flag is set. The wordexp() function attempts to detect input that would cause commands to be executed before passing it to the shell but it does not use the same parser so it may be fooled.

> The current wordexp() implementation does not recognize multibyte characters, since the shell (which it invokes to perform expansions) does not.

Perl doesn't depend on libc?

It does, but that's "not a problem" as long as it doesn't use this function to implement something that's executed during this function.

Shall somebody send a pull request [0]

    -    /* XXX this is _not_ designed to be fast */
    +    /* XXX this is _not_ designed to be safe */
[0] https://github.com/Apple-FOSS-Mirror/Libc/blob/2ca2ae7464771...

/* This function computes the expansion rate of spacetime. This version contains an additional factor that causes it to accelerate to allow for rapid testing of other aspects of the physics engine. This MUST be removed before the production release or spacetime will accelerate forever and experience heat death instead of reaching steady state. -God */

Isn't the line below that line enough? /* wordexp is also rife with security "challenges",

My assumption was that those security "challenges" were related to expansion (wordexp()) per se; not to the way wordexp() was implemented in this particular case.

Now I see why you can't change /usr/bin on macOS. Actually, there are both perl5 and python2.7 in /usr/bin, so libc does have a choice (that is, if the tweet is true)...


You can’t change /usr/bin because that’s a common malware attack vector.

It also has the nice effect of forcing user installed utilities to install in the /local/ variants (which user build projects should be doing on Linux iirc), so an OS update doesn’t overwrite user data.

Also, if it bothers you, you can just turn it off. It's just difficult enough that people can't simply put some screenshots on a webpage to work around it because they are too lazy to code properly.

I find it hard to believe that there's any software out there that doesn't, eventually, invoke Perl as a subprocess... I mean, it's Perl.

That's essentially what the OG tweet is saying: Pinnacle of software development: you can solve the problem with three lines of Perl, but you don’t, because of a non-argument against Perl. Since there didn't used to be that many arguments against perl/Perl it worked its way into a lot of systems even if it wasn't actually implementing the system.

Of course Perl, having fallen out of vogue, probably wouldn't be used today but it used to be everywhere so its footprint is still pretty large.

Also - I can't help but see the irony in shelling out to perl given experienced Perl developers always tell the less experienced ones to avoid shelling out from Perl if possible and to only do that as a last resort if there isn't an existing library to solve the problem.

> experienced Perl developers always tell the less experienced ones to avoid shelling out from Perl if possible

The reason for that is obvious, right? By induction, shelling out from Perl would only result in the called process shelling back to Perl. So it's much better to just call that Perl code directly.

Only loud-mouthed inexperienced middle managers tell their juniors to follow NIH and do everything in pure perl. It's pure fear to be broken by changed dependencies.

More experienced managers tell their devs to shell out to standard tools like sh, wget, curl, dig, mysqlclient and not use the builtin pure-perl libs. The tools are much better, the code is 10x smaller and faster, and you are getting updates for free (e.g. ssl). Even in C I very often call system("wget http://...") and avoid libcurl.

> Only loud-mouthed inexperienced middle managers tell their juniors to follow NIH and do everything in pure perl. It's pure fear to be broken by changed dependencies.

It's not NIH if you're advocating using CPAN modules or existing libraries.

Sure, use the right tool - but don't advocate to juniors a method that can lead to OS command injection because they're not experienced enough to know to, or how to, sanitise their inputs.
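One way to call an external tool without exposing yourself to command injection is to skip the shell entirely and pass an argument vector. A minimal sketch, assuming a POSIX system; run_argv is a made-up helper, while posix_spawnp() and waitpid() are the standard calls.

```c
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run argv[0] (looked up in PATH) with the given
 * arguments and no shell in between. Returns the exit status, or -1. */
int run_argv(char *const argv[])
{
    pid_t pid;

    /* Each argv element reaches the program as-is; nothing is parsed,
     * expanded, or split the way system() would via "sh -c". */
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With an argv such as {"echo", "$(exit 7); exit 7", NULL}, the second element is handed to echo as a literal string, whereas pasting the same text into a system() command line would have the shell interpret it. The called program can still misread a hostile argument as an option, so this removes only the shell layer of the problem.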

Not exactly related to the link but apparently the author of that tweet blocked me. I don’t recall having ever had any interactions with them. Is there a way to contact that person and figure out why? I have no idea what I did and I’m quite puzzled.

There is a good chance they subscribe to a blocklist, so you could be blocked by any one of a thousand people. Imagine the old PGP web of trust, but for crafting perfect echo chambers. I wonder if anybody has ever done the math on that.

You find out someone has blocked you, and your instinct is to communicate with them? The specific thing they have explicitly disallowed? I'd reconsider this, and just move on.

Is that how you are supposed to deal with that? How is one supposed to improve as a human if one does not know where their mistakes are? If I did something wrong I would like to know.

I did not contact that person anyways and I doubt there is a way. But at least knowing when I was blocked could help me deduce what might be the reason.

Honestly I do find this quite upsetting.

> Is that how you are supposed to deal with that?

I believe the commonly accepted answer is "Yes, it's not my job to give you the necessary information to improve yourself". The same people who say that also often ask "Why aren't the people around me improving themselves?" :P

(Personally, I would prefer it if people pointed out my mistakes, and I do the same for others as a courtesy, not an obligation. I do understand if they don't have the energy to do that, but I think if they don't, then they forfeit the right to complain about a lack of improvement)

> I believe the commonly accepted answer is "Yes, it's not my job to give you the necessary information to improve yourself". The same people who say that also often ask "Why aren't the people around me improving themselves?"

But that’s in no way comparable. When I screw up in the real world I can tell from people’s responses: body language, their actions, etc. In this case you just discover at some point that someone blocked you, without any indication of when that happened or why.

//edit: also, even weirder: in an effort to see where our interactions might have been, I found a tweet from 2014 by myself about the same topic: https://mobile.twitter.com/mitsuhiko/status/5264923088676700...

It could just be irrational intolerance on the part of the other person.

Not good looking at all, but the latest commit to that repo is from Oct 11, 2012.

So how well does it reflect reality?

The official sources are published by Apple on https://opensource.apple.com

This repository is just a snapshot that somebody else prepared and uploaded to GitHub, but apparently it is not maintained.
