
Relational shell programming - g3orge
http://matt.might.net/articles/sql-in-the-shell/
======
chubot
There was an entire system called "rdb" written around this philosophy of
flat-files-as-relations:

<http://en.scientificcommons.org/43495509>

I'd be curious if anyone actually used it -- is it obscure because it is a bad
idea or because it didn't offer much more over the shell, etc.?

"There are many database systems available for UNIX. But almost all are
software prisons that you must get into and leave the power of UNIX behind.
Most were developed on operating systems other than UNIX. Consequently their
developers had very few software features to build upon, and wrote the
functionality they needed directly, without regard for the features provided
by the operating system. The resulting database systems are large, complex
programs which degrade total system performance, especially when they are run
in a multi-user environment. UNIX provides hundreds of programs that can be
piped together to easily perform almost any function imaginable. Nothing comes
close to providing the functions that come standard with UNIX. Programs and
philosophies carried over from other systems put walls between the user and
UNIX, and the power of UNIX is thrown away. The shell, extended with a few
relational operators, is the fourth generation language most appropriate to
the UNIX environment. 1. Fourth Generation Systems In recent years, a variety
of developments in programming language design have emerged. Object-oriented
languages are becoming common, and languages explicitly supporting multiple
tasks and inter-task communication are also gaining popularity. "

~~~
prakashk
You beat me to posting this.

Also of interest: NoSQL - a non-SQL RDBMS, seems to have been inspired by rdb
and implemented using awk ([http://www.strozzi.it/cgi-
bin/CSA/tw7/I/en_US/nosql/Home%20P...](http://www.strozzi.it/cgi-
bin/CSA/tw7/I/en_US/nosql/Home%20Page)):

 _NoSQL is a fast, portable, relational database management system without
arbitrary limits, (other than memory and processor speed) that runs under, and
interacts with, the UNIX1 Operating System. It uses the "Operator-Stream
Paradigm" described in "Unix Review", March, 1991, page 24, entitled "A 4GL
Language". There are a number of "operators" that each perform a unique
function on the data. The "stream" is supplied by the UNIX Input/Output
redirection mechanism. Therefore each operator processes some data and then
passes it along to the next operator via the UNIX pipe function. This is very
efficient as UNIX pipes are implemented in memory. NoSQL is compliant with the
"Relational Model"._

~~~
chubot
Interesting, didn't know about that one. It looks more modern and usable.

I'm trying to think if I would actually download it and try to use it. Or I
could just keep on making use of that all the hard-earned sed/awk magic line
noise. :)

------
jemfinch
If you're a student in mattmight's class, you shouldn't read this.

N hfrshy gbby gung negvpyr zvffrf vf gung frg qvssrerapr pna or vzcyrzragrq
(zhpu zber rssvpvragyl guna uvf dhnqengvp nytbevguz), hfvat naq .

Vs lbh unir gjb svyrf N naq O juvpu ner frgf (v.r., ab vagreany qhcyvpngrf), N
- O pna or rkcerffrq ol "fbeg N O O | havd -h".

Bs pbhefr, vs lbh TAH pberhgvyf ba lbhe flfgrz (naq vs lbh'er ehaavat Yvahk,
lbh nyzbfg pregnvayl qb), vg'f rira rnfvre naq zber rssvpvrag: "pbzz -23 N O".

~~~
mattmight
That's the exercise at the bottom of the page. ;)

I hope my students don't see this comment.

Edit: Thanks for "encrypting" your post. Decrypting it in one tweet's worth of
sed will be next Tuesday's homework.

~~~
joshcorbin
Really? Sed? Me thinks they should TRy harder ;-)

------
twp
The author misses the key command for performing set operations:

    
    
        comm
    

This can perform set differences, unions, and even symmetric difference.

    
    
        comm -23 a b   # a minus b
        comm -12 a b   # a and b
        comm -3  a b   # a symmetric difference b (with some extra tabs)

~~~
softbuilder
Thank you! I couldn't remember what this command was called and I was scanning
that post trying to see if he'd mention it.

------
minimax
A more idiomatic way to phrase the awk in the Selection example:

    
    
        awk -F ":" '{ if ( $3 != $4 ) print }' /etc/passwd
    

would be like this:

    
    
        awk -F ":" '$3 != $4 { print }' /etc/passwd
    

The first is obviously valid awk, but the second highlights awk's unique
"pattern { action }" syntax.

~~~
malkomalko
I would argue the following below is a little nicer:

    
    
        awk -F':' '$3 != $4' /etc/passwd

------
dexen
_join_ is also worth mentioning: as expected, performs relational join of
supplied input, equivalent to SQL's STRAIGHT / LEFT / RIGHT JOIN.

<http://man.cat-v.org/plan_9/1/join>

~~~
mattmight
I just posted an update. Thanks for pointing this out!

~~~
rufibarbatus
It looks you changed the URL of your blog post too?

I got two different RSS items for the post and the update, and the original
post's location (/articles/relationa-algebra-in-bash/) no longer resolves. Was
that intended?

------
klochner
some one-liners that are often useful:

    
    
        cat a b | sort | uniq > c      # c = a union b
        cat a b | sort | uniq -d > c   # c = a intersect b
        cat a b b | sort | uniq -u > c # c = set difference a - b

------
agumonkey

      Sed and awk have become a lost art. They're the only
      languages I know that frequently beat perl in semantic
      density. If you haven't learned them yet, you'll be
      impressed with what they can do for you.
    

I recently learned that sed had labels, and that you could repurpose its inner
spaces as state. I sat down for a minute, while running a ~tetris written in
sed.

------
byambatsogt
Nice article. I actually worked for a Japanese company that uses relational
shell scripting for developing web systems. They use space separated tables
and have developed a set of commands to process text files. Those commands are
pretty useful and easy to use. Few examples would be:

\- self 1 2 fileName - selects first and second column from the file. There is
corresponding tagself command which assumes the first row of the file is the
column name. \- Join commands which are essentially implementations of sql
joins. \- Sum commands. One that I remember is sm2.

    
    
       $ cat data
         a 15 asdf
         a 30 aaa
         b 13 bbb
    
       $ sm 1 2 data
         a 45
         b 13
    

So you would chain these with unix native commands such as awk, sed, grep and
do some interesting stuff. Here is a very simple example shows how it would
look like:

cat data | self 1 2 | sm 1 2 | grep some_expression > new-data

There is a lot other of these commands that do various sort of thing with text
and html files. Anyone who understands japanese might wanna check this out
<http://www.youtube.com/watch?v=1MYKp7dswQg>.

------
ErikRogneby
These tools have been bacon saving on more than one occasion. When it's time
to marry your shell hacking output to Oracle I recommend checking out external
tables.

~~~
ThaddeusQuay2
Did someone say "bacon"?

<http://basic-converter.org>

BaCon, by Peter van Eerten, provides a useful replacement for shell scripts
when you need more power, but don't want to move up to the complexity of C.

------
minopret
Another relevant command originating (if I guess correctly) in AT&T UNIX and
present in GNU coreutils: paste. There are also ways to put the whole SQL
abstraction over text files, one of which is Perl DBD::CSV.

