
PgOSQuery: Expose the operating system as a Postgres database - Shish2k
https://github.com/shish/pgosquery
======
JonnieCache
[https://github.com/shish/pgosquery/blob/master/pgosquery/__i...](https://github.com/shish/pgosquery/blob/master/pgosquery/__init__.py)

Wow. That does look easy.

~~~
Shish2k
Indeed it is; TBH, having seen how easy this was to make, I wonder if there
would be any merit to making a stripped-down postgres package which is just
daemon + SQL parser + FDW interface (so no built-in storage engine) -- the FDW
interface is amazing for building SQL interfaces for arbitrary data sets, but
the whole postgres bundle seems a little heavy-weight and complex to install
if that's the only thing you want it for...

Or I guess you could start from the other direction, and make a FDW/Multicorn-
compatible API for SQLite? (Facebook's OSQuery is SQLite-based, but it looks
like the virtual table interface over there is way more complex)

~~~
johnteslade
I tried this with sqlite and the apsw python library. A quick example is at
[https://github.com/johnteslade/pysqliteosquery](https://github.com/johnteslade/pysqliteosquery)

The interface is not quite as nice as Multicorn's but certainly nicer than the
C/C++ SQLlite interface. But you could probably create a wrapper around apsw
so it looked similar to Multicorn.

~~~
Erwin
FYI, you can use ":memory:" as the sqlite data source, instead of having to
conditionally drop the tables (unless you actually do want to store some
persistent data on disk).

~~~
johnteslade
Good spot. The in memory table also runs 3 times quicker.

------
sehrope
Now _this_ is cool! (and I'm not saying that just because I was thinking about
it yesterday[1]).

It's amazing to see how with the multicorn[2] FDW it takes less than 40 lines
of python to put this together.

[1]:
[https://news.ycombinator.com/item?id=8529044](https://news.ycombinator.com/item?id=8529044)

[2]: [http://multicorn.org/](http://multicorn.org/)

~~~
oooooomg
Give me a break with all these "Oh, I commented about a similar thing
yesterday therefore I am a prophet" comments.

------
vezzy-fnord
Nice. Now if only we could find a way to expose the operating system as a
hierarchical file system.

~~~
dragonwriter
> Now if only we could find a way to expose the operating system as a
> hierarchical file system.

Step 1: The OS-as-Postgres-DB system in this thread.

Step 2: FUSE filesystem wrapping the DB, as in
[https://github.com/petere/postgresqlfs](https://github.com/petere/postgresqlfs)

 _Q.E.D._

------
Beltiras
Run postgresql as root, make an interface for the filesystem, hook up to a
Django ORM and you have a webpage with the power of bash? (yes, bad idea on so
many levels but still, potential is .... staggering.)

~~~
rosser
Postgres can't be run as root. From src/backend/main.c:

    
    
        if (geteuid() == 0)
        {
            write_stderr("\"root\" execution of the PostgreSQL server is not permitted.\n"
                         "The server must be started under an unprivileged user ID to prevent\n"
              "possible system security compromise.  See the documentation for\n"
                      "more information on how to properly start the server.\n");
            exit(1);
        }

~~~
Morgawr
Comment it out, for science!

------
Erwin
Cool, I got inspired by this system too -- the aspw library and examples made
it extremely easy to get started. I happen to have a lot of non-relational yet
tabular data sitting around and the interface to read it is Python. It took a
short time before I could join up several of my proprietary databases to do
queries on them.

One thing I am planning on doing is to let the user mix in just about any data
source: any list of objects with properties/values can be used really. So you
can take your database (in proprietary format) and join it up with a text file
(quite a useful thing for me). But you could also mix in an API call (again
done using your identity). I've been thinking about how many REST APIs suport
a subset of SQL via letting you seletct what fields to output, how to order
things etc. -- if you instead allow the user to process it via SQL/sqlite you
will allow the user quite powerful remixing/aggregation facilities.

------
shawn-butler
What benefit does this afford over existing SNMP tools?

[http://www.oid-info.com/get/1.3.6.1.2.1.25.4.2.1](http://www.oid-
info.com/get/1.3.6.1.2.1.25.4.2.1)

or

[http://www.oid-info.com/get/1.3.6.1.2.1.25.5.1.1](http://www.oid-
info.com/get/1.3.6.1.2.1.25.5.1.1)

~~~
kbenson
Complex joining of multiple data sets with filters and conditions utilizing a
query language that has been utilized to great effect for longer than SNMP has
existed.

~~~
shawn-butler
The output of snmptable is already hierarchal and there is already well-
defined MIB for thousands of data sources.

I guess using SQL instead of regex is the benefit you mean? Certainly more
readable.

Not that it really matters much to conversation at hand but standard SQL is
not really that much older than SNMP. Both were late 80s if memory serves.

~~~
odysseas
About 15 years apart to be exact: 1974 for SQL
[http://www.almaden.ibm.com/cs/people/chamberlin/sequel-1974....](http://www.almaden.ibm.com/cs/people/chamberlin/sequel-1974.pdf)
and 1988 for SNMP
[http://tools.ietf.org/html/rfc1065](http://tools.ietf.org/html/rfc1065) .

~~~
dragonwriter
shawn-butler is technically correct -- he refers to _standard_ SQL (first
standardized in 1986) and SNMP being about the same time, and both in the late
1980s, and this is correct.

Of course, SQL was around quite a while before being standardized.

~~~
kbenson
I looked it up before posting, and saw the same thing, that SQL was
standardized the year before the SNMP RFCs were put out, but that it was in
use since the 70s. That made the statement correct by either a little or a
lot, depending on how you wanted to look at it, so I felt safe in making it.

------
crxgames
Never worked with FDWs before, but man that looks simple to work with.

------
DrJ
am I understanding this requires one to run Pg on every server that you want
to run the query on?

~~~
johnmudd
Not necessarily. You could put a Python (with psutil module) server on
machines and connect from a single Postgres/FDW server.

------
peterwwillis
While this looks nice, it still depends on Postgres. Why not just import a set
of key/values into _any_ SQL database? Really dumb 5-minute example below.

    
    
      ~/$ ps af | perl -lne 's/(^\s+|\s+$)//g;next if(!@h&&(@h=split(/\s+/,$_)));@_=map{s/"/\\"/g;$_}split(/\s+/,$_,@h);print "INSERT INTO process_table (".join(",",@h).") VALUES (".join(",",map{"\"$_\""}@_).");"'
      INSERT INTO process_table (PID,TTY,STAT,TIME,COMMAND) VALUES ("19789","pts/8","Ss+","0:00","/bin/bash");
      INSERT INTO process_table (PID,TTY,STAT,TIME,COMMAND) VALUES ("30616","pts/23","Ss+","0:00","/bin/bash");
      INSERT INTO process_table (PID,TTY,STAT,TIME,COMMAND) VALUES ("29661","pts/22","Ss+","0:00","/bin/bash");

~~~
Sir_Cmpwn
The reason is obvious - the postgres-driven version lets you query the system
in realtime, rather than repeatedly updating the database through a manual
process every so often.

~~~
peterwwillis
Ohhhh, I didn't realize that was the end goal. In that case yeah, this is
useful, if you want to run Postgres on every node in your network (which,
yikes). Would still rather separate the query engine from the data feed,
though. My ideal would be a database application that sends out a request to
update the database and returns new records as they come in (or sorted/ordered
if preferred)

------
gjvc
Am I alone in that whenever I see a TL;DR at the top of something, I
immediately lose interest because it appears to want to appeal to the
idiocracy.

Ironic, I know.

~~~
dragonwriter
A first-party TL;DR is just another name for what in more formal contexts is
called an "abstract" or "executive summary".

I'm not particularly fond myself of that name having caught on for that use,
but I can't manage to be as put-off by it as you say you are.

~~~
swasheck
Yeah, I think I'd rather see a format of:

Abstract:

Content:

