I'd be curious if anyone actually used it -- is it obscure because it is a bad idea or because it didn't offer much more over the shell, etc.?
"There are many database systems available for UNIX. But almost all are software prisons that you must get into and leave the power of UNIX behind. Most were developed on operating systems other than UNIX. Consequently their developers had very few software features to build upon, and wrote the functionality they needed directly, without regard for the features provided by the operating system. The resulting database systems are large, complex programs which degrade total system performance, especially when they are run in a multi-user environment. UNIX provides hundreds of programs that can be piped together to easily perform almost any function imaginable. Nothing comes close to providing the functions that come standard with UNIX. Programs and philosophies carried over from other systems put walls between the user and UNIX, and the power of UNIX is thrown away. The shell, extended with a few relational operators, is the fourth generation language most appropriate to the UNIX environment. 1. Fourth Generation Systems In recent years, a variety of developments in programming language design have emerged. Object-oriented languages are becoming common, and languages explicitly supporting multiple tasks and inter-task communication are also gaining popularity. "
NoSQL is a fast, portable, relational database management system without arbitrary limits, (other than memory and processor speed) that runs under, and interacts with, the UNIX1 Operating System. It uses the "Operator-Stream Paradigm" described in "Unix Review", March, 1991, page 24, entitled "A 4GL Language". There are a number of "operators" that each perform a unique function on the data. The "stream" is supplied by the UNIX Input/Output redirection mechanism. Therefore each operator processes some data and then passes it along to the next operator via the UNIX pipe function. This is very efficient as UNIX pipes are implemented in memory. NoSQL is compliant with the "Relational Model".
Interesting, didn't know about that one. It looks more modern and usable.
I'm trying to think if I would actually download it and try to use it. Or I could just keep on making use of that all the hard-earned sed/awk magic line noise. :)
>> Of course, if you GNU coreutils on your system (and if you're running Linux, you almost certainly do), it's even easier and more efficient: "comm -23 A B".
Maybe I'm misunderstanding your point, but I don't think GNU's coreutils is the only way to get comm[1].
No, you understood my comment; I assumed since the manpage I was looking at made no reference to a standard, that it was nonstandard. It seems you're right, and comm(1) is more widely available, excellent.
It looks you changed the URL of your blog post too?
I got two different RSS items for the post and the update, and the original post's location (/articles/relationa-algebra-in-bash/) no longer resolves. Was that intended?
Sed and awk have become a lost art. They're the only
languages I know that frequently beat perl in semantic
density. If you haven't learned them yet, you'll be
impressed with what they can do for you.
I recently learned that sed had labels, and that you could repurpose its inner spaces as state. I sat down for a minute, while running a ~tetris written in sed.
Nice article. I actually worked for a Japanese company that uses relational shell scripting for developing web systems. They use space separated tables and have developed a set of commands to process text files. Those commands are pretty useful and easy to use. Few examples would be:
- self 1 2 fileName - selects first and second column from the file. There is corresponding tagself command which assumes the first row of the file is the column name.
- Join commands which are essentially implementations of sql joins.
- Sum commands. One that I remember is sm2.
$ cat data
a 15 asdf
a 30 aaa
b 13 bbb
$ sm 1 2 data
a 45
b 13
So you would chain these with unix native commands such as awk, sed, grep and do some interesting stuff. Here is a very simple example shows how it would look like:
cat data | self 1 2 | sm 1 2 | grep some_expression > new-data
There is a lot other of these commands that do various sort of thing with text and html files. Anyone who understands japanese might wanna check this out http://www.youtube.com/watch?v=1MYKp7dswQg.
These tools have been bacon saving on more than one occasion. When it's time to marry your shell hacking output to Oracle I recommend checking out external tables.
BaCon, by Peter van Eerten, provides a useful replacement for shell scripts when you need more power, but don't want to move up to the complexity of C.
Another relevant command originating (if I guess correctly) in AT&T UNIX and present in GNU coreutils: paste. There are also ways to put the whole SQL abstraction over text files, one of which is Perl DBD::CSV.
http://en.scientificcommons.org/43495509
I'd be curious if anyone actually used it -- is it obscure because it is a bad idea or because it didn't offer much more over the shell, etc.?
"There are many database systems available for UNIX. But almost all are software prisons that you must get into and leave the power of UNIX behind. Most were developed on operating systems other than UNIX. Consequently their developers had very few software features to build upon, and wrote the functionality they needed directly, without regard for the features provided by the operating system. The resulting database systems are large, complex programs which degrade total system performance, especially when they are run in a multi-user environment. UNIX provides hundreds of programs that can be piped together to easily perform almost any function imaginable. Nothing comes close to providing the functions that come standard with UNIX. Programs and philosophies carried over from other systems put walls between the user and UNIX, and the power of UNIX is thrown away. The shell, extended with a few relational operators, is the fourth generation language most appropriate to the UNIX environment. 1. Fourth Generation Systems In recent years, a variety of developments in programming language design have emerged. Object-oriented languages are becoming common, and languages explicitly supporting multiple tasks and inter-task communication are also gaining popularity. "