
There's a Relational Database in Your Unix CLI - philk10
https://spin.atomicobject.com/2019/06/16/unix-cli-relational-database/#.XQa5nMJWYag.hackernews
======
orf
[https://github.com/BurntSushi/xsv](https://github.com/BurntSushi/xsv) is all
you need here, not a mess of sort commands.

    
    
       xsv join enrollments.csv courses.csv | xsv table
    

Uses an index as well.

~~~
da_chicken
Yeah, xsv is great for one time tasks like this. If you're going to need to do
a lot of queries, though, it's easier just to .mode csv .import in SQLite.

~~~
6thaccount2
Doesn't this convert all data to strings? I think that makes some queries
wonky.

Or does it search through each row and see that one element out of 1000
integers was a float, so the whole column should be floats?

~~~
da_chicken
The format is .import CSVFILE TABLENAME [0]. If TABLENAME already exists, it
would use that schema. Otherwise it will all be TEXT data using the first row
as column names. However, SQLite uses dynamic typing (they call it "type
affinity") so it _kind of_ doesn't matter.

I like SQLite a lot, but one of the funniest things about it is how they try
to sell you that type affinity is a great thing [1] when really it's not at
all what you want in 99% of RDBMS situations. There's a reason that everybody
from MySQL and PostgreSQL to MS SQL Server, Oracle, and DB2 use static typing.

[0]:
[https://sqlite.org/cli.html#csv_import](https://sqlite.org/cli.html#csv_import)

[1]:
[https://www.sqlite.org/datatype3.html](https://www.sqlite.org/datatype3.html)

~~~
6thaccount2
Will doing a where clause (ex: where column_x > 3) actually work if column_x
is now all strings? I recall my query executing and returning incorrect
results. I figured out it would work if I explicitly caste the where clause as
( where float(column_x) > 3) or something like that (this was a few years
ago).

Edit: I assume most databases like Oracle (it's SQL packages looked bizzare to
me until I started learning Ada and figured out it has the same syntax) use
static typing for performance, query optimization, and error checking? Note
that although I use SQL almost daily, I am no expert on the innards. SQLite
has a different use case. It seems to be for embedded and ad-hoc analysis
where dynamic typing makes more sense. This is just a guess though.

------
smitty1e
The mighty SQLite is well worth considering.
[http://www.sqlitetutorial.net/sqlite-import-
csv/](http://www.sqlitetutorial.net/sqlite-import-csv/)

------
rongenre
While that's cool, you can do it with way more functionality by loading comma
or tab-separated data into sqlite.

~~~
Maxious
It even will use the CSV headers to make the table schema for you
[https://sqlite.org/cli.html#csv_import](https://sqlite.org/cli.html#csv_import)

------
ineedasername
There's quite a bit more to a relational database than the equivalent of
multiple tables that might share a "key", and some utility to join.

Further, if you're going to steer clear of easy options like SQLlite, there
are more robust commands than "join". Awk, for example, can do much more than
just join. Though if we're keeping this to what can be done via CLI, you still
have tools like python. Between Pandas and, if you really must have SQL
syntax, Pandasql, you can simulate an RDB even more simply.

------
gumby
> Did you know that there’s a relational database hiding in your Unix shell?

No I did not know this because there isn't.

There likely happens to be a 'join' program installed on your computer and
like any program it can be invoked from your shell.

~~~
shakna
> There likely happens to be a 'join' program installed on your computer and
> like any program it can be invoked from your shell.

'join' is a Linux coreutil, so though it might not always be there for
strictly UNIX, it will be for Linux.

Calling a coreutil part of the shell does make sense in some contexts, and I'd
suggest when talking about what you can do with a cli, this is one of those
times.

~~~
HillaryBriss
there's also this list:
[https://en.wikipedia.org/wiki/List_of_Unix_commands](https://en.wikipedia.org/wiki/List_of_Unix_commands)
which includes join (and, importantly, awk), as you pointed out

------
HillaryBriss
IDK, what about indexes? what about atomic transactions which
add/update/delete rows to/from/from multiple tables?

~~~
PaulHoule
It's more accurate to say that you can implement the relational algebra via
the Unix command line

[https://en.wikipedia.org/wiki/Relational_algebra](https://en.wikipedia.org/wiki/Relational_algebra)

~~~
HillaryBriss
ok. yeah. pursuing that, i'm wondering if there's a simple way to convince
someone that this is true.

one idea I saw somewhere was that because awk is part of a Turing complete
language, it can be used to implement the relational algebra. though I don't
know how to prove that myself, except by actual construction: implementing
each part of the relational algebra definition.

looking over the definition of relational algebra, at first I wondered: what
about a theta join? it's not directly built into the CLI "join" command. but
then the article points out that if you can obtain a Cartesian product of two
tables then you can use a Select to filter out the rows that do not meet the
criteria of your theta join.

and then there's division, which I can only guess would depend heavily on some
careful awk scripting.

and so it goes. IDK. it seems a bit laborious to prove it all by construction.

