The history and use of /etc./glob in early Unixes (utcc.utoronto.ca)
52 points by zdw 11 hours ago | 25 comments





The linked C source file is an excellent example of ancient C, from when it was still closer to high-level assembly:

https://www.tuhs.org/cgi-bin/utree.pl?file=V2/cmd/glob.c


User-defined functions were implemented in a similar way in early shells, as external execs. As the script was parsed, functions were dropped into /tmp without their wrappings and then called as external programs. Since they would still reference parameters as $1, $2, etc., it just worked: function bodies and standalone sh scripts had the same interface! Such a clever idea to avoid managing an interpreted call stack in the parent.

Recent versions of Bash don't expand the * (et cetera) patterns when there is no match, which, although sometimes useful, still feels like a hack to me.

That's been around since the original Bourne shell; /etc/glob, from what I can see from its source, would refuse to run the command if the resulting expansion turned out completely empty; and the globs with no matches would be simply removed.
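The policy described above can be sketched in a few lines. This is a minimal illustration of the comment's description, not a port of glob.c: the function name `expand_args`, the explicit `names` list standing in for a directory listing, and the use of Python's `fnmatch` rules are all assumptions made for the example.

```python
from fnmatch import fnmatchcase


def expand_args(args, names):
    """Expand wildcard arguments against `names` (a stand-in directory listing).

    Patterns that match nothing are dropped. If patterns were present but
    none of them matched anything, return None so the caller can refuse to
    run the command.
    """
    expanded = []
    saw_pattern = False
    matched_any = False
    for arg in args:
        if not any(ch in "*?[" for ch in arg):
            expanded.append(arg)          # plain word: passed through untouched
            continue
        saw_pattern = True
        hits = sorted(n for n in names if fnmatchcase(n, arg))
        if hits:
            matched_any = True
        expanded.extend(hits)             # a pattern with no hits adds nothing
    if saw_pattern and not matched_any:
        return None                       # nothing matched at all: refuse
    return expanded
```

With a listing of `a.c`, `b.c` and `notes.txt`, the arguments `ls *.c *.h` would expand to `ls a.c b.c` (the unmatched `*.h` disappears), while `rm *.o` would yield `None`.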

> PS: I don't know why expanding shell wildcards used a separate program in V6 and earlier, but part of it may have been to keep the shell smaller and more minimal so that it required less memory.

See, I thought it was a nice separation of concerns and wondered why we lost such a nice approach, until I read:

> How escaping wildcards works in the V5 and V6 shell is that all characters in commands and arguments are restricted to being seven-bit ASCII. The shell and /etc/glob both use the 8th bit to mark quoted characters, which means that such quoted characters don't match their unquoted versions and won't be seen as wildcards by either the shell

at which point I suddenly became a fan of ditching it. I do wonder if there's not some better way to factor that functionality out...
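For anyone wondering what the 8th-bit trick looks like in practice, here is a rough sketch. It only illustrates the idea from the quoted passage; the names `QUOTE_BIT`, `quote`, `is_wildcard` and `strip_quotes` are made up for the example and are not taken from the V5/V6 sources.

```python
QUOTE_BIT = 0x80  # the 8th bit, unused by 7-bit ASCII


def quote(text):
    """Mark every character as quoted by setting its 8th bit."""
    if any(ord(ch) > 0x7F for ch in text):
        # The scheme has no room for characters that already use the 8th bit.
        raise ValueError("only 7-bit ASCII can be quoted this way")
    return "".join(chr(ord(ch) | QUOTE_BIT) for ch in text)


def is_wildcard(ch):
    """A quoted '*' is chr(0xAA), so it never compares equal to a wildcard."""
    return ch in ("*", "?", "[")


def strip_quotes(text):
    """Clear the 8th bit again before handing the argument to the command."""
    return "".join(chr(ord(ch) & 0x7F) for ch in text)
```

The `ValueError` branch is the part that hurts: anything outside 7-bit ASCII cannot pass through such a scheme at all.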


An important thing to remember is that even after the move to the PDP-11, early Unix systems had to deal with 32kB as the entire space available to a userland program, both code and data (including stack).

You mean 32k words, not 32k bytes, right[1]? And AFAIK by V5 or V6, Unix could use split instruction and data spaces if the MMU supported it, giving a bit more headroom. But, yeah, memory was very tight, and a lot of very clever tricks were used to get around it.

[1] Even worse, the top 4kW/8kB was reserved for I/O.
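The arithmetic behind those figures, as a quick worked check. It takes the comment's 8kB I/O reservation at face value; the 16-bit address width and 2-byte word are the PDP-11's, and the variable names are just for the example.

```python
ADDRESS_BITS = 16      # PDP-11 addresses are 16 bits wide
BYTES_PER_WORD = 2     # a PDP-11 word is 16 bits

total_bytes = 2 ** ADDRESS_BITS                  # 64 kB addressable
total_words = total_bytes // BYTES_PER_WORD      # the "32k words" above

io_bytes = 8 * 1024                              # top 8 kB, per the footnote
usable_bytes = total_bytes - io_bytes            # what is left below the I/O area
usable_words = usable_bytes // BYTES_PER_WORD

print(total_bytes, total_words, usable_bytes, usable_words)
```

So 32k words is 64kB, and taking 4kW off the top leaves 28kW (56kB).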


Why would I want to factor out some syntactic functionality of one specific (and not very well thought out) shell to reuse, again?

But if you really insist, you can write your own glob(1) that would invoke glob(3) for you, sure. There is also wordexp(3) although I believe its implementations had security problems for quite some time?
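A standalone expander along those lines is only a few lines in most languages. The sketch below uses Python's `glob` module as a stand-in for glob(3), which is an assumption for the sake of a short example (the two do not share every rule). Keeping an unmatched pattern verbatim is also just one possible policy, chosen here to mimic Bourne-style shells.

```python
import glob
import sys


def expand(patterns):
    """Return the sorted matches for each pattern, in argument order.

    A pattern that matches nothing is passed through verbatim.
    """
    out = []
    for pattern in patterns:
        hits = sorted(glob.glob(pattern))
        out.extend(hits if hits else [pattern])
    return out


if __name__ == "__main__":
    # Print one expanded path per line, like a tiny glob(1).
    for path in expand(sys.argv[1:]):
        print(path)
```

Remember to quote the patterns when calling it from an interactive shell, otherwise the shell expands them first.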


> at which point I suddenly became a fan of ditching it. I do wonder if there's not some better way to factor that functionality out...

Just use backslash escaping like we do practically everywhere else in the Unix world?


That's kind of a cure worse than the disease. Just ditch escaping completely.

The way Murex works is each parameter is first compiled into an AST, and then globbing only works against the unquoted tokens.

Globbing is also a separate builtin, which allows for other types of wildcard matches like regex too. E.g. https://murex.rocks/tour.html#filesystem-wildcards-globbing

So you have the best of both worlds: inline globbing for convenience and also wildcard matching as a function too.
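To make the "glob only the unquoted tokens" idea concrete, here is a toy version. It is not how Murex is implemented: the `Token` class, the single-quote-only tokenizer and the `names` list standing in for a directory listing are all simplifications invented for this sketch.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class Token:
    text: str
    quoted: bool


def tokenize(line):
    """Split on whitespace; each single-quoted run becomes a quoted token."""
    tokens, i = [], 0
    while i < len(line):
        if line[i].isspace():
            i += 1
        elif line[i] == "'":
            end = line.index("'", i + 1)   # ValueError if the quote is unterminated
            tokens.append(Token(line[i + 1:end], True))
            i = end + 1
        else:
            end = i
            while end < len(line) and not line[end].isspace() and line[end] != "'":
                end += 1
            tokens.append(Token(line[i:end], False))
            i = end
    return tokens


def expand_tokens(tokens, names):
    """Glob unquoted tokens against `names`; quoted tokens stay literal."""
    out = []
    for tok in tokens:
        if tok.quoted or not any(ch in "*?[" for ch in tok.text):
            out.append(tok.text)
        else:
            hits = sorted(n for n in names if fnmatchcase(n, tok.text))
            out.extend(hits or [tok.text])  # no match: keep the pattern as typed
    return out
```

Because the quoted flag lives on the token rather than inside the characters, there is no escaping step and no restriction on the character set.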


Why is there a period after etc in the title? Another example of HN's stupid automated title editing?

Probably the submitter typed it on a phone instead of copy-paste and "etc" got autoincorrected.

Sweet.

I use xterm.js a lot and have a "shell backbone" that I use to make shell based access to APIs, S3 and other things "cloud." This is essentially how I implement globbing as well. The convenience is that you can run glob by itself to get an idea of exactly what kind of automated nightmare you are about to kick off.

Anyways.. mine currently has V3 behavior. My shell command exec routine could actually benefit from that hack. What's old is new again?


binaries in /etc/ -- i mean __really__

Fun fact: the Linux kernel itself actually also looks for `/etc/init` before it even looks for `/bin/init`

https://github.com/torvalds/linux/blob/4a5df37964673effcd9f8...


Even now you'll come across this, for example "/etc/rmt" probably exists, and other tape-related binaries if installed.

Yes, really. That's what /etc was for.

[flagged]


This reads like slop for some reason; even to my non-native brain.

I wonder what the point of these accounts is - they show up in almost every post now. If the goal is farming karma, they aren't doing a very good job.


This is a php.ini level of madness, and I'm glad it's gone from (semi-)modern shells. A formal (e.g. programming) language should be defined in its entirety by its formal grammar, its semantics by a formal spec, etc. There's barely any good reason to let the system administrator change the logic and semantics of deployed code.

You could argue that Lisp reader macros also somewhat violate this rule. As a longtime Lisp fan, I dislike reader macros, but I'm more conflicted about macros in general. A good macro system should aim to provide enough context for IDEs and LSPs to aid the developer, but Lisp macros are entirely about just transforming the AST. It's usually just better to evolve the language.


It's not there to give the system administrator flexibility. It's there because early Unix was heavily constrained, and doing things with lots of little overlays (and what was decades later known as "Bernstein chaining") rather than one big program was the way to architect stuff. exit(1), goto(1), and if(1) were all external commands in the Thompson shell.

* https://v6sh.org


I would argue with almost anyone else that this is a poor design, but...

Thank you for your perspective, work, and contributions.


The other thing to bear in mind is that it’s undergone literally decades of evolution while still being backwards compatible.

The shells weren’t originally intended to be Turing complete. They were just a job launcher. What you use today would have been unimaginable when these shells were first designed.

Whereas all other programming languages have had a drastically smaller evolution in comparison and yet still had a worse compatibility story.

It’s very easy to be critical of the Bourne shell (and compatible shells too) because they are archaic by modern standards. But they weren’t written to solve modern problems. So it’s like looking at a bicycle and complaining that the designers didn’t design a sports car, while ignoring the fact that the technology didn’t exist and that push bikes are still good enough for millions to use daily.



