Hacker News new | past | comments | ask | show | jobs | submit login
Relational shell programming (might.net)
162 points by g3orge on Jan 12, 2012 | hide | past | favorite | 24 comments



There was an entire system called "rdb" written around this philosophy of flat-files-as-relations:

http://en.scientificcommons.org/43495509

I'd be curious if anyone actually used it -- is it obscure because it is a bad idea or because it didn't offer much more over the shell, etc.?

"There are many database systems available for UNIX. But almost all are software prisons that you must get into and leave the power of UNIX behind. Most were developed on operating systems other than UNIX. Consequently their developers had very few software features to build upon, and wrote the functionality they needed directly, without regard for the features provided by the operating system. The resulting database systems are large, complex programs which degrade total system performance, especially when they are run in a multi-user environment. UNIX provides hundreds of programs that can be piped together to easily perform almost any function imaginable. Nothing comes close to providing the functions that come standard with UNIX. Programs and philosophies carried over from other systems put walls between the user and UNIX, and the power of UNIX is thrown away. The shell, extended with a few relational operators, is the fourth generation language most appropriate to the UNIX environment. 1. Fourth Generation Systems In recent years, a variety of developments in programming language design have emerged. Object-oriented languages are becoming common, and languages explicitly supporting multiple tasks and inter-task communication are also gaining popularity. "


You beat me to posting this.

Also of interest: NoSQL - a non-SQL RDBMS, seems to have been inspired by rdb and implemented using awk (http://www.strozzi.it/cgi-bin/CSA/tw7/I/en_US/nosql/Home%20P...):

NoSQL is a fast, portable, relational database management system without arbitrary limits, (other than memory and processor speed) that runs under, and interacts with, the UNIX1 Operating System. It uses the "Operator-Stream Paradigm" described in "Unix Review", March, 1991, page 24, entitled "A 4GL Language". There are a number of "operators" that each perform a unique function on the data. The "stream" is supplied by the UNIX Input/Output redirection mechanism. Therefore each operator processes some data and then passes it along to the next operator via the UNIX pipe function. This is very efficient as UNIX pipes are implemented in memory. NoSQL is compliant with the "Relational Model".


Interesting, didn't know about that one. It looks more modern and usable.

I'm trying to think if I would actually download it and try to use it. Or I could just keep on making use of that all the hard-earned sed/awk magic line noise. :)


If you're a student in mattmight's class, you shouldn't read this.

N hfrshy gbby gung negvpyr zvffrf vf gung frg qvssrerapr pna or vzcyrzragrq (zhpu zber rssvpvragyl guna uvf dhnqengvp nytbevguz), hfvat naq .

Vs lbh unir gjb svyrf N naq O juvpu ner frgf (v.r., ab vagreany qhcyvpngrf), N - O pna or rkcerffrq ol "fbeg N O O | havd -h".

Bs pbhefr, vs lbh TAH pberhgvyf ba lbhe flfgrz (naq vs lbh'er ehaavat Yvahk, lbh nyzbfg pregnvayl qb), vg'f rira rnfvre naq zber rssvpvrag: "pbzz -23 N O".


That's the exercise at the bottom of the page. ;)

I hope my students don't see this comment.

Edit: Thanks for "encrypting" your post. Decrypting it in one tweet's worth of sed will be next Tuesday's homework.


Really? Sed? Me thinks they should TRy harder ;-)


Matt, you should know better. You've scared everyone who doesn't read HN out of the class.


>> Of course, if you GNU coreutils on your system (and if you're running Linux, you almost certainly do), it's even easier and more efficient: "comm -23 A B".

Maybe I'm misunderstanding your point, but I don't think GNU's coreutils is the only way to get comm[1].

Nice update to the original post, by the way. :)

[1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/co...


No, you understood my comment; I assumed since the manpage I was looking at made no reference to a standard, that it was nonstandard. It seems you're right, and comm(1) is more widely available, excellent.


The author misses the key command for performing set operations:

    comm
This can perform set differences, unions, and even symmetric difference.

    comm -23 a b   # a minus b
    comm -12 a b   # a and b
    comm -3  a b   # a symmetric difference b (with some extra tabs)


Thank you! I couldn't remember what this command was called and I was scanning that post trying to see if he'd mention it.


A more idiomatic way to phrase the awk in the Selection example:

    awk -F ":" '{ if ( $3 != $4 ) print }' /etc/passwd
would be like this:

    awk -F ":" '$3 != $4 { print }' /etc/passwd
The first is obviously valid awk, but the second highlights awk's unique "pattern { action }" syntax.


I would argue the following below is a little nicer:

    awk -F':' '$3 != $4' /etc/passwd


  `pattern action`
also idiomatic sed


join is also worth mentioning: as expected, performs relational join of supplied input, equivalent to SQL's STRAIGHT / LEFT / RIGHT JOIN.

http://man.cat-v.org/plan_9/1/join


I just posted an update. Thanks for pointing this out!


It looks you changed the URL of your blog post too?

I got two different RSS items for the post and the update, and the original post's location (/articles/relationa-algebra-in-bash/) no longer resolves. Was that intended?


Also don't forget about comm(1), which can do an equijoin and set difference on whole line records.


some one-liners that are often useful:

    cat a b | sort | uniq > c      # c = a union b
    cat a b | sort | uniq -d > c   # c = a intersect b
    cat a b b | sort | uniq -u > c # c = set difference a - b


  Sed and awk have become a lost art. They're the only
  languages I know that frequently beat perl in semantic
  density. If you haven't learned them yet, you'll be
  impressed with what they can do for you.
I recently learned that sed had labels, and that you could repurpose its inner spaces as state. I sat down for a minute, while running a ~tetris written in sed.


Nice article. I actually worked for a Japanese company that uses relational shell scripting for developing web systems. They use space separated tables and have developed a set of commands to process text files. Those commands are pretty useful and easy to use. Few examples would be:

- self 1 2 fileName - selects first and second column from the file. There is corresponding tagself command which assumes the first row of the file is the column name. - Join commands which are essentially implementations of sql joins. - Sum commands. One that I remember is sm2.

   $ cat data
     a 15 asdf
     a 30 aaa
     b 13 bbb

   $ sm 1 2 data
     a 45
     b 13
So you would chain these with unix native commands such as awk, sed, grep and do some interesting stuff. Here is a very simple example shows how it would look like:

cat data | self 1 2 | sm 1 2 | grep some_expression > new-data

There is a lot other of these commands that do various sort of thing with text and html files. Anyone who understands japanese might wanna check this out http://www.youtube.com/watch?v=1MYKp7dswQg.


These tools have been bacon saving on more than one occasion. When it's time to marry your shell hacking output to Oracle I recommend checking out external tables.


Did someone say "bacon"?

http://basic-converter.org

BaCon, by Peter van Eerten, provides a useful replacement for shell scripts when you need more power, but don't want to move up to the complexity of C.


Another relevant command originating (if I guess correctly) in AT&T UNIX and present in GNU coreutils: paste. There are also ways to put the whole SQL abstraction over text files, one of which is Perl DBD::CSV.




Applications are open for YC Summer 2021

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: