
Q – Execute SQL on text. Supports joins across files, RPM available - harelba
http://github.com/harelba/q
======
mjn
If you need only a subset of full SQL, e.g. just joins, counting/aggregation,
and date manipulation (as in several of these examples), I've found it fairly
easy to work with a mixture of the standard Unix join(1) [found on nearly all
systems], and some of the additions from Google's crush-tools
([https://code.google.com/p/crush-tools/](https://code.google.com/p/crush-tools/)), mainly 'aggregate', 'grepfield', 'funiq', and 'convdate'. I find
chaining them together a bit easier than writing monolithic SQL statements,
but there's probably some crossover point at which that wouldn't be true.

It'd be interesting to compare runtimes as well. I would guess that there's
some overhead in loading into the DB up front, but that you might gain some
speedup by converting longer chains of Unix pipes into one SQL query. On the
other hand you might lose some parallelism. Would take some testing on
different kinds of queries and data sets to get an idea of the pros/cons I
think.
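
The kind of pipeline described above, joining two files on a key and then aggregating, can be sketched in Python for comparison (the file contents and field names here are invented for illustration):

```python
import io
import csv
from collections import defaultdict

# Hypothetical stand-ins for the two delimited files being joined.
users = io.StringIO("1,alice\n2,bob\n")
visits = io.StringIO("1,3\n1,5\n2,2\n")

# Build a lookup table from the first file (the join(1) side).
names = {row[0]: row[1] for row in csv.reader(users)}

# Join on the user id and aggregate counts (the 'aggregate' side).
totals = defaultdict(int)
for user_id, count in csv.reader(visits):
    totals[names[user_id]] += int(count)

print(dict(totals))  # {'alice': 8, 'bob': 2}
```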

~~~
ozgune
Another interesting alternative could be using or writing a PostgreSQL foreign
data wrapper.

There is one that supports tabular data (file_fdw), and another for JSON
files (json_fdw). If you have files in other formats, you can also write your
own fdw for them. This way, you get complete SQL coverage.

Also, if you don't want to pay the overhead of parsing the file every time,
you can use the new materialized views feature for caching:
[http://www.postgresql.org/docs/9.3/static/rules-materializedviews.html](http://www.postgresql.org/docs/9.3/static/rules-materializedviews.html)

(Disclaimer: Enthused Postgres user.)

------
jdp
Matt Might wrote a blog post about using standard CLI tools for this purpose:
[http://matt.might.net/articles/sql-in-the-shell/](http://matt.might.net/articles/sql-in-the-shell/)

Rather than making a custom tool to issue SQL, the idea is that regular CLI
tools map well to the traditional relational algebra operations. sed is like
selection, cat is like union, etc.

------
unwind
It's written in Python, and seems to use SQLite under the hood.

I guess it just applies tokenization and throws the text into a temporary
database.

Quite similar to the Go project
[https://github.com/dinedal/textql](https://github.com/dinedal/textql), at
least superficially.
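
That tokenize-and-load-into-SQLite approach is easy to sketch with Python's standard library (the sample lines and column names below are made up):

```python
import sqlite3

# Whitespace-delimited input lines, as q might receive on stdin.
lines = ["server1 200", "server1 404", "server2 200"]

# Load the tokenized text into a temporary in-memory table...
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (c1 TEXT, c2 INTEGER)")
db.executemany("INSERT INTO t VALUES (?, ?)", (line.split() for line in lines))

# ...then run ordinary SQL over it.
rows = db.execute("SELECT c1, COUNT(*) FROM t GROUP BY c1 ORDER BY c1").fetchall()
print(rows)  # [('server1', 2), ('server2', 1)]
```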

~~~
j_s
A lot of similar tools were discussed when that Go project came up 3 weeks
ago:

[https://news.ycombinator.com/item?id=7175830](https://news.ycombinator.com/item?id=7175830)

    
    
      > MS ADO / ODBC
      > Perl DBI
      > npm j (with jqa)
      > Ruby (csv2sqlite) 
      > Python (csvkit)
      > Go (textql, comp)
      > Java (optiq, openrefine, H2 SQL)
      > R (sqldf)
      > Haskell (txt-sushi)
      > XML (xmlstarlet, xmllint, xmlstar)
      > HTML (HtmlAgilityPack, Chrome $x())
      > Postgres file_fdw
      > Oracle external tables
      > SQL Server OPENDATASOURCE and OPENQUERY
      > Log file viewers (MS LogParser, Apache asql, lnav)

~~~
harelba
Hi, I'm q's creator, Harel.

There are obviously lots of other tools which can provide a similar
capability, and while I haven't checked all of them out, I really believe
that most of them do a great job. However, my rationale for creating this
tool was to provide a seamless addition to the Linux command-line toolset - a
tool, as most Linux commands are, and not just a capability. The distinction
I'm making here is that tools are reusable and composable, whereas a
capability is usually less reusable in different contexts. I'm sure that some
of the above are definitely tools. I just hope that the tool I have created
provides value to people and helps them with their tasks.

As I posted elsewhere here, my complete rationale for creating the tool is
available in the README of the GitHub project. Comments and issues are most
welcome.

Harel Ben-Attia

------
hipsters_unite
This seems like a cool project, but Q is already a well-used JS promise
library so there's a bit of a name conflict there even if the libs are aimed
at very different tasks...

~~~
mbreese
It's also a single letter in the alphabet. Anyone who names their project a
single letter (or two) is expecting some amount of collision. The real
question is whether or not the collision is worth it.

In this case, it might be... they are trying to make a command-line tool. So
in theory, you'll be typing the command often, meaning that a short name is
preferable.

But honestly, it would have probably been a better idea to use a more
descriptive name.

~~~
harelba
Hi, I'm q's creator (HN made the name uppercase, but it's actually a
lowercase q). The reasoning was that it's used as a command-line tool, and
used often. So "q" and not "Q" :) I'm currently preparing the Debian package,
and one-letter names are not allowed, so it's going to be named "qsql" there.

------
ajtulloch
Dataset ([https://dataset.readthedocs.org/](https://dataset.readthedocs.org/))
is a similar project that provides ad-hoc SQL querying of structured data -
example usage:

    
    
      import dataset
    
      db = dataset.connect('sqlite:///:memory:')
    
      table = db['sometable']
      table.insert(dict(name='John Doe', age=37))
      table.insert(dict(name='Jane Doe', age=34, gender='female'))
    
      john = table.find_one(name='John Doe')

~~~
mseebach
Dataset provides ad-hoc SQL querying of ... SQL databases.

It looks like a neat API, but it's not really similar to this project.

------
sz4kerto
Eh, q as a name is going to be a serious problem, since q is already the
language for programming kdb+, a column-oriented database. As both are
related to databases, it's difficult to defend this name.

~~~
BMarkmann
That was my initial impression, too... I thought Kx had added text processing
to kdb.

------
zeckalpha
Don't join
([https://en.wikipedia.org/wiki/Join_%28Unix%29](https://en.wikipedia.org/wiki/Join_%28Unix%29)),
sed, and grep get you most of the way there?

~~~
harelba
Hi, I'm q's creator, Harel Ben-Attia.

The Linux toolset is really great, and I use it extensively. The whole idea
of the tool is not to replace any of the existing tools, but to extend the
toolset with concepts that treat text as data. In a way, it's a metatool
which provides an easy and familiar way to add more data-processing concepts
to the Linux toolset. There are many cases where I use 'wc -l' to count rows
in a file, but if I need to count only the rows where a specific column is
larger than some value X, or get the sum of some column per group, then q is
a simple and readable way to do it properly, without any need for "tricks".
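
The two examples above (counting rows where a column exceeds some X, and summing a column per group) correspond to SQL along these lines; this is a hypothetical sketch against an in-memory SQLite table rather than q itself, with invented values:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE f (c1 TEXT, c2 INTEGER)")
db.executemany("INSERT INTO f VALUES (?, ?)",
               [("a", 10), ("a", 50), ("b", 70)])

# Count only the rows whose column exceeds the value X (here X = 40).
over_40 = db.execute("SELECT COUNT(*) FROM f WHERE c2 > 40").fetchone()[0]

# Sum a column per group.
per_group = db.execute(
    "SELECT c1, SUM(c2) FROM f GROUP BY c1 ORDER BY c1").fetchall()

print(over_40, per_group)  # 2 [('a', 60), ('b', 70)]
```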

My rationale for creating it is also explained in the README of the GitHub
project.

Any more comments are most welcome.

Harel

------
ww2
Name collision with Q/kdb+.

------
jmnicolas
What I really wished for when I read the title was something that would let
me write regexes verbosely.

~~~
AUmrysh
In what way would you write regexes verbosely? I'm actually quite interested
in the idea because regexes can be confusing to write at times, and it's
difficult to remember which form to use where, if you use them in many
languages/interfaces.

There are tools like Regexper[1] that let you visualize a regex as an
automaton graph, and tools like text2re[2] which allow you to put in text and
visually generate a regex to match it.

I feel like better regex tools should exist on the command line, and it's
potentially a great place for such tools to be rapidly developed and adopted.
There are GUI tools for this like poirot[3], but the command line still exists
because of its accessibility, uniformity, and extensibility.

links:

[1] [http://www.regexper.com/](http://www.regexper.com/)

[2]
[http://txt2re.com/index.php3?s=24%3AFeb%3A2014+%22This+is+an...](http://txt2re.com/index.php3?s=24%3AFeb%3A2014+%22This+is+an+Example%21%22&submit=Show+Matches)

[3] [http://www.espgraphics.com/poirot/](http://www.espgraphics.com/poirot/)

~~~
peterwwillis
I'm probably heavily biased, but to me Perl is the best command-line regex
tool. Perl was invented to gather data and report on it, and its regex engine
is incredibly fast and powerful. As an added bonus it supports some Python and
PCRE-specific extensions. But this q tool is useful for people who either
don't know Perl or can get what they need done faster with SQL than with
scripting.

In terms of 'verbosity', you can embed comments inside a regular expression,
build a regular expression over multiple lines, or make a set of regex
objects and interpolate them into larger regexes. Perl has copious amounts of
documentation to help you understand the many ways to use regexes in Perl.

[http://perldoc.perl.org/perlrequick.html](http://perldoc.perl.org/perlrequick.html)
[http://perldoc.perl.org/perlretut.html](http://perldoc.perl.org/perlretut.html)
[http://perldoc.perl.org/perlfaq6.html#How-can-I-hope-to-use-...](http://perldoc.perl.org/perlfaq6.html#How-can-I-hope-to-use-regular-expressions-without-creating-illegible-and-unmaintainable-code%3f)
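
For comparison, Python (which q itself is written in) offers the same facility through re.VERBOSE, which ignores whitespace in the pattern and allows inline comments. A small sketch with a made-up date pattern:

```python
import re

# re.VERBOSE lets a pattern span lines, with comments after '#'.
# Literal spaces must then be written explicitly, e.g. as \s.
date = re.compile(r"""
    (?P<day>\d{1,2})         \s  # day of month
    (?P<month>[A-Z][a-z]{2}) \s  # three-letter month name
    (?P<year>\d{4})              # four-digit year
""", re.VERBOSE)

m = date.match("24 Feb 2014")
print(m.group("day"), m.group("month"), m.group("year"))  # 24 Feb 2014
```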

~~~
theOnliest
> As an added bonus it supports some Python and PCRE-specific extensions.

This is a bit of a strange thing to say, since nearly all of the advanced
regex features showed up in Perl _first_. PCRE stands for "Perl Compatible
Regular Expressions," so there are certainly no extensions there that didn't
originally come from Perl. I'm less sure about Python, but I get the sense
that it borrows from Perl regular expressions as well.

~~~
peterwwillis
[http://perldoc.perl.org/perlre.html#PCRE/Python-Support](http://perldoc.perl.org/perlre.html#PCRE/Python-Support)

    
    
      PCRE/Python Support
      
      As of Perl 5.10.0, Perl supports several Python/PCRE-specific extensions to the
      regex syntax. While Perl programmers are encouraged to use the Perl-specific
      syntax, the following are also accepted:
      
          (?P<NAME>pattern)
          Define a named capture group. Equivalent to (?<NAME>pattern).
      
          (?P=NAME)
          Backreference to a named capture group. Equivalent to \g{NAME} .
      
          (?P>NAME)
          Subroutine call to a named capture group. Equivalent to (?&NAME).

------
spacemanmatt
I'm still not seeing anything easier than ETLing into a regular old database,
at which point you have the power and flexibility of a regular old database.
Which is pretty spiffy, actually.

Are people really so bad at databases that they'll gladly suffer hacks like
this to avoid using one?

------
spacemanmatt
I have ended up with MySQL and PostgreSQL on nearly every Linux system I've
installed. It is baffling to me that someone would decline these tools in
favor of text-file drivers or SQLite, with the exception of using SQLite as
an embedded config database or something.

~~~
super_mario
Yeah, I do the same: import the CSV (it doesn't have to be comma separated)
file into PostgreSQL and analyze it using SQL. Alternatively, I will load it
into IPython/pandas and work with it there.

This is useful, however, for a one-off, throwaway query with familiar SQL
syntax - if you don't want to use awk, that is.

------
gtrubetskoy
The credit should go to SQLite, which is extremely versatile. A while back I
wrote Thredis (thredis.org), which lets you use SQL commands inside Redis
(also using SQLite, of course).

------
pointernil
Now imagine some kind of Rx.NET library made applicable to the output of
'tail -F'... Reactive log-file processing?

------
aethertap
This is really clever, and I'm definitely going to give it a workout on my
system. Thanks!

------
chris_wot
Cool idea!

~~~
npongracic
Cool idea, but I've read about something similar on Scott Hanselman's blog;
it's called LogParser:
[http://www.hanselman.com/blog/AnalyzeYourWebServerDataAndBeE...](http://www.hanselman.com/blog/AnalyzeYourWebServerDataAndBeEmpoweredWithLogParserAndLogParserLizardGUI.aspx)

