Yeah, xsv is great for one time tasks like this. If you're going to need to do a lot of queries, though, it's easier just to .mode csv .import in SQLite.
The format is .import CSVFILE TABLENAME [0]. If TABLENAME already exists, it would use that schema. Otherwise it will all be TEXT data using the first row as column names. However, SQLite uses dynamic typing (they call it "type affinity") so it kind of doesn't matter.
I like SQLite a lot, but one of the funniest things about it is how they try to sell you that type affinity is a great thing [1] when really it's not at all what you want in 99% of RDBMS situations. There's a reason that everybody from MySQL and PostgreSQL to MS SQL Server, Oracle, and DB2 use static typing.
Will doing a where clause (ex: where column_x > 3) actually work if column_x is now all strings? I recall my query executing and returning incorrect results. I figured out it would work if I explicitly caste the where clause as ( where float(column_x) > 3) or something like that (this was a few years ago).
Edit: I assume most databases like Oracle (it's SQL packages looked bizzare to me until I started learning Ada and figured out it has the same syntax) use static typing for performance, query optimization, and error checking? Note that although I use SQL almost daily, I am no expert on the innards. SQLite has a different use case. It seems to be for embedded and ad-hoc analysis where dynamic typing makes more sense. This is just a guess though.
There's quite a bit more to a relational database than the equivalent of multiple tables that might share a "key", and some utility to join.
Further, if you're going to steer clear of easy options like SQLlite, there are more robust commands than "join". Awk, for example, can do much more than just join. Though if we're keeping this to what can be done via CLI, you still have tools like python. Between Pandas and, if you really must have SQL syntax, Pandasql, you can simulate an RDB even more simply.
> There likely happens to be a 'join' program installed on your computer and like any program it can be invoked from your shell.
'join' is a Linux coreutil, so though it might not always be there for strictly UNIX, it will be for Linux.
Calling a coreutil part of the shell does make sense in some contexts, and I'd suggest when talking about what you can do with a cli, this is one of those times.
ok. yeah. pursuing that, i'm wondering if there's a simple way to convince someone that this is true.
one idea I saw somewhere was that because awk is part of a Turing complete language, it can be used to implement the relational algebra. though I don't know how to prove that myself, except by actual construction: implementing each part of the relational algebra definition.
looking over the definition of relational algebra, at first I wondered: what about a theta join? it's not directly built into the CLI "join" command. but then the article points out that if you can obtain a Cartesian product of two tables then you can use a Select to filter out the rows that do not meet the criteria of your theta join.
and then there's division, which I can only guess would depend heavily on some careful awk scripting.
and so it goes. IDK. it seems a bit laborious to prove it all by construction.