
Show HN: cq – Query CSVs using SQL - jolmg
https://github.com/jolmg/cq
======
cosmie
This is a really slick little utility! I'll definitely be adding it to my data
munging grab bag.

For those who deal with this sort of thing often, here are some similarly
useful tools:


Really useful for generating table definitions and/or importing data from a
csv to a variety of databases (bunch of other useful utilities in the repo as
well):
[https://csvkit.readthedocs.io/en/latest/scripts/csvsql.html](https://csvkit.readthedocs.io/en/latest/scripts/csvsql.html)

Similar to cq for querying a CSV file directly:
[http://harelba.github.io/q/](http://harelba.github.io/q/)

Not used often, but really handy when you come across a csv file it's designed
for: [http://colin.maudry.com/csvtool-manual-page/](http://colin.maudry.com/csvtool-manual-page/)

Diff csv files:
[http://paulfitz.github.io/daff/](http://paulfitz.github.io/daff/)

Can convert flatfiles between a bunch of different formats. Useful blunt tool
to get things to and from csv files (among others):
[https://github.com/dflemstr/rq](https://github.com/dflemstr/rq)

And for searching JSON, there's jq:
[https://stedolan.github.io/jq/](https://stedolan.github.io/jq/) and gron:
[https://github.com/tomnomnom/gron](https://github.com/tomnomnom/gron)

Or use in2csv from csvkit or rq to get the JSON over to CSV and go from there.
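
For instance, a JSON-to-SQL pipeline out of those pieces might look like this
(a sketch, assuming a flat JSON array in data.json; csvsql names its stdin
table "stdin" by default, if I remember right):

      in2csv -f json data.json | csvsql --query 'select * from stdin limit 5'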

------
jolmg
I made this because I kept having to process CSVs. I would write a quick Ruby
script each time, but Ruby isn't really ideal for things like joins, so I
wrote this instead.

~~~
dmlittle
Nice! As I was reading the title I thought to myself how this could easily be
done by importing the CSV to SQLite and querying against it. This is exactly
what you did but with a really clean and easy to use interface. This is
awesome.

~~~
jolmg
Thanks. I haven't checked, but I'm hoping it would be simple to add support
for PostgreSQL and/or MariaDB/MySQL, since I found that SQLite doesn't
support some SQL constructs (I can't remember which).

The big advantage of SQLite is the lack of static types for columns. I can
just add numbers without explicitly typing a column as numeric.
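
For example, this is what I mean (a quick sketch; SQLite's type affinity
coerces the imported text once you apply an arithmetic operator):

      $ cq t:=- -q 'select foo + bar as total from t' <<EOF
      foo,bar
      1,2
      EOF
      total
      3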

~~~
theonemind
SQLite doesn't do right joins or full outer joins. It can do left joins and
inner joins.
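
The usual workaround is emulating a full outer join with two left joins and a
union, which should work here too (a sketch; the files, table names, and
columns are made up, assuming := assigns files the same way it assigns stdin):

      $ cq a:=a.csv b:=b.csv -q '
        select a.id, a.x, b.y from a left join b on a.id = b.id
        union all
        select b.id, a.x, b.y from b left join a on a.id = b.id
        where a.id is null;
      '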

------
loa_in_
I have not tried it out, but the := notation and bash expansion example look
very compelling.

One could in fact template SQL scripts that expect tables a, b, c as input
and just substitute different inputs into premade operations.

~~~
jolmg
That's pretty good. Seems like a good reason to add an -f alternative to -q
for providing an SQL file.

In fact, now that I remember, it has happened that I'm given a CSV, and I
write a big query against it. Then, after analyzing the results, they tell me
they made a mistake with the CSV and give me a corrected one, under a
different name to avoid confusion over which is the bad one and which is the
fix. I just change the file in the assignment and rerun the same SQL.
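
Until -f exists, something like this already covers that rerun workflow (a
sketch; report.sql and the CSV names are made up):

      $ cq t:=data_v1.csv -q "$(cat report.sql)"
      # ...corrected file arrives...
      $ cq t:=data_v2_fixed.csv -q "$(cat report.sql)"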

------
vkaku
Good stuff, it's finally here! Now what I'd like is for someone to integrate
the DDL/DML stuff as well (Git and CSV). The idea behind that is:

1. 'create table' should create a new .csv and a new .schema.csv;

2. 'insert into' should insert records into the .csv;

3. 'commit' should do a git commit

That way we get a human readable database + all the transaction semantics and
data versioning + all the query goodness. Would like to know if there's a
project doing exactly this.

~~~
jolmg
The query is just passed as-is to sqlite, so you can do inserts (and create
tables, too):

    
    
      $ cq t:=- -q '
        insert into t values (2,3);
        select * from t;
      ' <<EOF
      foo,bar
      1,2
      EOF
      foo,bar
      1,2
      2,3
    

I can't think of a good use-case for doing CREATE TABLE from this tool,
though. It would always be simpler to just create the CSV file.

Also, I can't see the point of interpreting "commit" in the SQL as a git
commit. This tool just passes the SQL to an established SQL engine (only
SQLite for now), and I don't want to get into interpreting my own dialect of
SQL; it doesn't sound KISS. In the same vein, interpreting "create table" to
create a schema.csv also sounds out of scope, both because it would mean
interpreting the statement instead of passing it along, and because I can't
see the point either.

Ultimately, my aim is for this tool to be more analogous to jq than to sqlite.
It's not so much an RDBMS, although you can query a directory of CSVs by
globbing like it's a database. It's mainly an ad-hoc querying tool.

EDIT: I don't know anymore. I'm growing to like your idea, and thinking of how
it can be supported. I'll keep it in consideration.

Does anyone have a concrete use-case for a human readable database with
automatic git commits like that? It wouldn't be fast enough for anything
intensive. I can only see it as a CLI version of Excel.
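
In the meantime, a thin wrapper could fake the insert-then-commit part
(untested sketch; the file, columns, and values are made up):

      cq t:=servers.csv -q "select * from t union all select 'web4','10.0.0.4'" > servers.tmp
      mv servers.tmp servers.csv
      git add servers.csv && git commit -m 'add web4'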

It may also be nice to be able to use that to export a DB to SQL and then use
cq to turn that into a set of CSV files. Would anyone really find something
like that useful, though? Maybe that's pointless.

~~~
vkaku
Concrete use-case: infrastructure as code. Sometimes we like to keep track of
servers, environments, accounts, or whatnot, and a human-readable, verifiable
representation is useful for checking differences over a period of time, for
audit purposes. Git/SCM is a good tool for versioning and atomicity; the CSV
is for readability.

------
mcpherrinm
Command-line tools for CSVs are great. I've shoved a CSV into sqlite many
times before for this sort of thing.
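
For reference, the manual version of that dance with the sqlite3 shell looks
something like this (data.csv and the table name are placeholders):

      sqlite3 -cmd '.mode csv' -cmd '.import data.csv t' :memory: 'select count(*) from t;'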

Some related tools:

[https://github.com/BurntSushi/xsv/](https://github.com/BurntSushi/xsv/)
[http://ebay.github.io/tsv-utils/](http://ebay.github.io/tsv-utils/)

------
Existenceblinks
I can imagine we could have a new kind of shell program (process)
communication via pipe: instead of reading lines, reading records
([cartesian] products). awk/sed/cut already do a sort of column-based
processing. We could try records as an interchangeable data format (without
making it hierarchical, e.g. JSON).

~~~
jolmg
A disadvantage this has over awk/sed/cut is that those tools process their
input line by line; they don't need to wait for their input to complete
before they start outputting results. cq/sqlite, on the other hand, needs to
read the whole input file before starting to work. That's because at the time
SQLite imports the file into the in-memory database, it doesn't know what
query you're going to ask of it; the query might require processing the data
in any order, so it's simplest to slurp the whole file rather than load it
lazily based on the query.
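
A quick way to see the difference (assuming the stdin assignment from the
README):

      # awk prints 1 immediately, then 2 five seconds later:
      (echo a,1; sleep 5; echo b,2) | awk -F, '{print $2}'
      # cq prints nothing until the producer closes the stream:
      (echo n; echo 1; sleep 5; echo 2) | cq t:=- -q 'select * from t'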

------
karmajunkie
This is awesome. I couldn't begin to guess how many times I've imported CSVs
into sqlite to clean them up and trim them down. Great job!

------
oftenwrong
If I have to run complex queries over a CSV I usually use COPY to import it
into Postgres.

~~~
jolmg
This does the same thing, only with SQLite. The main point of this tool is
to collapse the steps of 1) starting your SQL client, 2) loading the CSV, and
3) running the query into a single command.

You can also join those steps in a single command with Postgres, something
like:

    
    
      psql << EOF
        create table ...
        copy tablename from filename csv ...
        copy (select ...) to stdout csv ...
        drop table ...
      EOF
    

However, this tool does the above more concisely and combines well with
brace expansion, file tab-completion, and <() process substitution.
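
For concreteness, a filled-in version of that psql one-shot could look like
this (table, column, and file names are assumptions; \copy is psql's
client-side COPY, and a temp table drops itself at session end):

      psql -q <<'EOF'
        create temp table t (foo int, bar int);
        \copy t from 'data.csv' with (format csv, header)
        \copy (select foo, bar from t where foo > 1) to stdout with (format csv, header)
      EOF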

