
Records: Python library for making raw SQL queries to Postgres databases - infinite8s
https://github.com/kennethreitz/records
======
cjauvin
I had planned to "show it to HN" soon, but the occasion is too good now: I
have also created a thin "psycopg2 wrapper", pretty much in the same spirit as
Records (but different in purpose), which allows you to perform SQL operations
with plain data structures:

[https://github.com/cjauvin/little_pger](https://github.com/cjauvin/little_pger)

I have been using this for myself for quite some time, but feedback would be
very appreciated!

Edit: A cool thing about it (I believe) is that it provides an "upsert
wrapper" which uses either the new PG 9.5 "ON CONFLICT" mechanism if it's
available, or an "update if exists, insert if not" two-step fallback for
earlier versions.
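The version switch described above can be sketched roughly like this (the
function name and SQL generation are hypothetical; little_pger's actual API
differs):

```python
def upsert_sql(table, key_cols, cols, server_version):
    """Return upsert SQL: native ON CONFLICT on PG >= 9.5 (version
    90500), else a two-step "update, then insert if missing" fallback."""
    col_list = ", ".join(cols)
    values = ", ".join("%%(%s)s" % c for c in cols)
    where = " and ".join("%s = %%(%s)s" % (k, k) for k in key_cols)
    if server_version >= 90500:
        updates = ", ".join("%s = excluded.%s" % (c, c)
                            for c in cols if c not in key_cols)
        return ("insert into %s (%s) values (%s) "
                "on conflict (%s) do update set %s"
                % (table, col_list, values, ", ".join(key_cols), updates))
    # Pre-9.5 fallback: two statements, not safe under concurrency.
    update = "update %s set %s where %s" % (
        table,
        ", ".join("%s = %%(%s)s" % (c, c) for c in cols if c not in key_cols),
        where)
    insert = ("insert into %s (%s) select %s "
              "where not exists (select 1 from %s where %s)"
              % (table, col_list, values, table, where))
    return update + ";\n" + insert
```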

~~~
aschampion
Neat library. Just to note that this approach to fake upsert is not safe for
concurrent sessions. There were exactly two concurrency-safe ways to implement
upsert prior to 9.5:

1\. An exclusive lock on the table, which is slow as hell

2\. A PL/pgSQL function that attempts to INSERT, catches the duplicate key
exception and instead UPDATEs (also handling the case that the row vanishes),
which is slower than hell

This is why ON CONFLICT is such a huge deal. If it were as simple as checking
keys with a SELECT, then sending some to INSERT and some to UPDATE, it would
just be nice syntactic sugar.
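For reference, option 2 is essentially the merge_db example from the
PostgreSQL manual, held here as a string (the loop handles the row vanishing
between the failed INSERT and the retried UPDATE):

```python
# The classic pre-9.5 concurrency-safe upsert, adapted from the merge_db
# example in the PostgreSQL documentation. The LOOP retries if the row
# is deleted between the duplicate-key INSERT failure and the UPDATE.
MERGE_DB_SQL = """
CREATE FUNCTION merge_db(key INT, data TEXT) RETURNS VOID AS
$$
BEGIN
    LOOP
        UPDATE db SET b = data WHERE a = key;
        IF found THEN
            RETURN;
        END IF;
        BEGIN
            INSERT INTO db (a, b) VALUES (key, data);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing, loop back and try the UPDATE again
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;
"""
```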

~~~
approaching
> 2\. A PL/pgSQL function that attempts to INSERT, catches the duplicate key
> exception and instead UPDATEs (also handling the case that the row
> vanishes), which is slower than hell

That's not true, at least not when you're operating on a single row at a time.
According to my benchmarks the function approach was usually within 10% of the
native upsert -- even outperforming it by around 5% depending on data
distribution. For insert-or-select (instead of update) the function
consistently outperformed INSERT ... ON CONFLICT DO NOTHING.

~~~
aschampion
Good point -- I was talking in the context of bulk upserts but didn't make
that explicit.

------
dtheodor
Sorry, I don't see the point of this library. What exactly does it provide? It
looks like a very thin wrapper (150 lines of code) over psycopg2 that does
exactly the same thing, just with a slightly different API.

~~~
pekk
More than one library by Kenneth Reitz can be described in this way. It is a
way of complaining about the color of the bikeshed.

------
monkmartinez
This looks like a (very thin) wrapper of the psycopg2 library, which
implements the python DBAPI. Tablib functionality is integrated for bonus
points. 137 lines.

So am I correct in my observation that this is just a different way to
use/call the psycopg2 library?

The more I look at the code, the more I wonder where is this going? Will the
psycopg2 requirement become a problem or a benefit over time?

SQLAlchemy is pretty darn good... there are more ways to connect to a DB in
Python than I can shake a stick at. I guess I am confused...

Edit: Fixed wording on DBAPI

~~~
masklinn
> This looks like a (very thin) wrapper of the psycopg2 library, which is a
> wrapper around the python DB API

psycopg2 is an implementation of DBAPI2 (with extensions). As the name
indicates, DBAPI is an API which database libraries ought to implement; it's
not a library.

~~~
monkmartinez
You are right, thanks.

------
Grue3
>Rows are represented as standard Python dictionaries

That seems inefficient. You're always going to have the same columns in each
row, so storing column names for each row is a huge waste of memory. Django's
"cursor.fetchall" returns a list of tuples, which seems to be a saner way to
represent db rows in Python.
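The overhead is easy to measure; a quick comparison (exact sizes vary by
Python version, but the ordering holds):

```python
import sys
from collections import namedtuple

Row = namedtuple("Row", ["id", "name", "email"])

as_tuple = (1, "alice", "alice@example.com")
as_named = Row(1, "alice", "alice@example.com")  # same layout as a plain tuple
as_dict = {"id": 1, "name": "alice", "email": "alice@example.com"}

# The dict carries its own hash table per row; tuples store the column
# names once (on the class, or in the cursor description).
print(sys.getsizeof(as_tuple), sys.getsizeof(as_named), sys.getsizeof(as_dict))
```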

~~~
glifchits
Reitz has a public Facebook post on this. Here are his comments:

Harry Percival Looks nice! Y u no namedtuple?

Kenneth Reitz Dictionaries are a much more predictable interface ;)

Kenneth Reitz Just because you can do something, doesn't mean you should!

~~~
kenneth_reitz
Well, that's creepy.

~~~
glifchits
Hahaha... I can only imagine. Sorry to creep you out. Your Facebook profile is
just a click away from your personal website, and your discussion about
Records is up front and center on your feed for me :)

------
benatkin
Looks awesome. A couple questions:

If you do a query that returns 100 records and use rows.next(), does it just
get them from the cursor without pulling down the rest from the database? Do
you need to dispose of the cursor or is it automatically cleaned up?

The export formats are just an attribute, not a method call. It would need to
use @property, IIRC. Does this go against the Python guideline that "explicit
is better than implicit"?
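On the second question: attribute-style exports are indeed implementable with
@property, so that attribute access runs code. A hypothetical sketch
(ResultSet and its csv field are made up, not Records' real classes):

```python
class ResultSet:
    def __init__(self, rows):
        self._rows = rows

    @property
    def csv(self):
        # Computed on every access: rows.csv, no parentheses needed.
        header = ",".join(self._rows[0].keys())
        lines = [",".join(str(v) for v in r.values()) for r in self._rows]
        return "\n".join([header] + lines)

rows = ResultSet([{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}])
print(rows.csv)
```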

~~~
pyre
He's using cursors directly from his database library[1], which is
psycopg2[2] (we can also see that the cursor_factory is how the dicts are
generated). The psycopg2 docs say[3]:

    
    
      When a database query is executed, the Psycopg cursor usually
      fetches all the records returned by the backend, transferring
      them to the client process. If the query returned an huge amount
      of data, a proportionally large amount of memory will be
      allocated by the client.
    

[1]
[https://github.com/kennethreitz/records/blob/master/records....](https://github.com/kennethreitz/records/blob/master/records.py#L108)

[2]
[https://github.com/kennethreitz/records/blob/master/records....](https://github.com/kennethreitz/records/blob/master/records.py#L82)

[3] [http://initd.org/psycopg/docs/usage.html#server-side-
cursors](http://initd.org/psycopg/docs/usage.html#server-side-cursors)
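For queries where that matters, psycopg2's server-side ("named") cursors
avoid the full transfer. A sketch of the pattern (stream_query is a
hypothetical helper; the import is deferred so the snippet stands alone):

```python
def stream_query(conn, query, batch_size=1000):
    """Yield rows one at a time, fetching batch_size rows per round
    trip instead of the whole result set up front."""
    import psycopg2.extras  # assumes psycopg2 is installed

    # Passing name= makes this a server-side cursor: rows stay on the
    # backend until fetched.
    with conn.cursor(name="stream_cursor",
                     cursor_factory=psycopg2.extras.DictCursor) as cur:
        cur.itersize = batch_size
        cur.execute(query)
        for row in cur:
            yield row
```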

------
sauere
It does look neat, but what would the advantages be over using the SQLAlchemy
tools (not the ORM)?

~~~
hobarrera
This does look a lot simpler to use, especially if I want to quickly run a
console, grab some data, and work on it.

A psql shell doesn't have the power of python once I get the results, and
SQLAlchemy isn't as trivial to use.

------
Wonnk13
Interesting. I love requests. SQL isn't my area of expertise, so I'm
genuinely surprised that in 2016 we're still building libraries like this for
Python; are existing libraries really that bad?

~~~
ForHackernews
> are existing libraries really that bad?

I would say 'no'. SQLAlchemy is very good, but extremely complex and powerful.
I think people don't understand that you can use the core features without any
of the ORM overhead, if you don't want it.

If you want to "Just write SQL" you can do that, easily:
[http://docs.sqlalchemy.org/en/latest/core/tutorial.html#usin...](http://docs.sqlalchemy.org/en/latest/core/tutorial.html#using-
textual-sql)

~~~
mastazi
I'm just curious: in the first example of the section that you linked, what's
the purpose of the "|| ', ' || " bit? I think the double pipe is used for
concatenation in SQL, but I don't understand why the ', ' part needs to be
concatenated like that.

~~~
zzzeek
that example intends to produce a single-column result that concatenates
two fields into a comma-separated string, e.g. "your name, you@somesite.com".

I think jumping into that example directly is probably kind of disorienting.
The SQL in that string is a little bit weird/contorted just to illustrate,
"it's text, do whatever you want!". I also wrote that example like ten years
ago.

~~~
mastazi
Oh OK I get it, it makes perfect sense but yes, jumping to that example
directly might be disorienting. Maybe a one-line explanation (just like the
one you gave me) placed immediately above or below the code example would make
it more readable!

------
aj_nikhil
How to connect to my db?

    
    
      >>> db = records.Database('postgres://localhost:5432')
      >>> rows = db.query('select * from customer_customer')
      Traceback (most recent call last):
        File "<console>", line 1, in <module>
        File "/Users/ns/teabox_django_env/lib/python2.7/site-packages/records.py", line 109, in query
          c.execute(query, params)
        File "/Users/ns/teabox_django_env/lib/python2.7/site-packages/psycopg2/extras.py", line 223, in execute
          return super(RealDictCursor, self).execute(query, vars)
      ProgrammingError: relation "customer_customer" does not exist
      LINE 1: select * from customer_customer
    

------
bobby_9x
I've tried lots of different python sql libraries and the only one that I
didn't have to abandon because of limitations is sqlalchemy.

This sounds nice, but I'm skeptical.

------
kpmah
It still needs a bit of work, but I wrote a small library for Python to use
SQL in the Yesql style:
[https://github.com/KMahoney/pyesql](https://github.com/KMahoney/pyesql)

~~~
draegtun
It's good to see the SQL Phrasebook Pattern still in use. They were quite
popular before ORMs caught on. I'm currently writing one in Rebol so they must
have come back into fashion :)

There is another (slightly older) Clojure library -
[https://github.com/ray1729/sql-phrasebook](https://github.com/ray1729/sql-
phrasebook)

This was based on the venerable Perl CPAN module -
[https://metacpan.org/pod/Data::Phrasebook::SQL](https://metacpan.org/pod/Data::Phrasebook::SQL)

Also an interesting read -
[http://www.perl.com/pub/2002/10/22/phrasebook.html](http://www.perl.com/pub/2002/10/22/phrasebook.html)
|
[http://ootips.org/yonat/patterns/phrasebook.html](http://ootips.org/yonat/patterns/phrasebook.html)

------
yahelc
This is amazingly well-timed. Just yesterday I was complaining that the Python
SQL libraries are annoying and verbose, and that I wished there was a library
as good as requests is for HTTP.

~~~
masklinn
wrt basic behaviour, this library seems to save all of two lines which can
easily be bundled in a convenience function:

    
    
        import psycopg2
        import psycopg2.extras

        def query(connection, q):
            cursor = connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
            cursor.execute(q)
            return iter(cursor.fetchone, None)
    
        # db = records.Database('postgres://...')
        db = psycopg2.connect('postgres://...')
        # rows = db.query('select * from active_users')
        rows = query(db, 'select * from active_users')
        # rows.next()
        next(rows)
        # for row in rows:
        #    spam_user(name=row['name'], email=row['user_email'])
        for row in rows:
            spam_user(name=row['name'], email=row['user_email'])
        # rows.all()
        list(rows)

~~~
monkmartinez
You are right, thanks for the example. I checked the psycopg2 library after
reading the announcement which drove my confusion. Psycopg is a super
straightforward db driver. I wonder if this is the start of an orm built on
top of psycopg2...? color me confused.

------
eeZi
For interactive data analysis, this is one _very_ nice IPython library:

[https://github.com/catherinedevlin/ipython-
sql](https://github.com/catherinedevlin/ipython-sql)

Bonus points for native Pandas integration.

------
pmorici
Why not just use Psycopg directly?

------
Animats
Amusingly, this is the other extreme from Diesel for Rust, discussed
yesterday. This is the totally dynamic approach, while Diesel is the totally
static approach.

~~~
dikaiosune
That's a good point. As much as I love python, I think (for me) this
illustrates why I tend to switch languages once I get over ~500-1000 lines.
Seeing diesel was the exact opposite for me - total overkill when I just need
a report, but very tempting if I'm going to have to evolve and maintain a
larger code base.

~~~
natrius
The Python community just doesn't have good tools to type dictionaries. In
JavaScript, you can use a tool like tcomb[1] that lets you specify the
structure of your data and add functionality onto that data. Throwing raw
dictionaries around in your program isn't maintainable, but if you adopt the
Clojure ethos that functions operate on data that meets a set of requirements,
you'll have maintainable code without a type system that gets in your way when
you don't need it.

In general, I think most Python developers would be happier with Clojure or
JavaScript written in a functional style (e.g. heavy use of tcomb,
Immutable.js, and transducers). I'm happier, at least.

[1] [https://gcanti.github.io/tcomb/](https://gcanti.github.io/tcomb/)
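That Clojure-style "data that meets a set of requirements" can be
approximated in plain Python without any library. A minimal sketch
(check_shape is a hypothetical helper, far less capable than tcomb):

```python
def check_shape(schema, data):
    """Validate that data has every field in schema with the right type.

    Returns data unchanged on success, so checks can be chained at
    function boundaries.
    """
    for key, expected_type in schema.items():
        if key not in data:
            raise KeyError("missing field: %s" % key)
        if not isinstance(data[key], expected_type):
            raise TypeError("%s should be %s" % (key, expected_type.__name__))
    return data

user_schema = {"name": str, "email": str, "age": int}
user = check_shape(user_schema, {"name": "ada", "email": "ada@x.org", "age": 36})
```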

------
vram22
There is also PyDAL, which seems roughly at the same abstraction level as
SQLAlchemy Core:

[https://pypi.python.org/pypi/pyDAL](https://pypi.python.org/pypi/pyDAL)

Small example of use:

[http://jugad2.blogspot.in/2014/12/pydal-pure-python-
database...](http://jugad2.blogspot.in/2014/12/pydal-pure-python-database-
abstraction_30.html)

------
sergiotapia
Reminds me of RedbeanPHP:
[http://www.redbeanphp.com/index.php?p=/querying](http://www.redbeanphp.com/index.php?p=/querying)

You would just require the single redbean.php file, and you could query your
database. The fact that this library is similar is clearly a great sign!

------
bitwize
Eleven years ago I actually wrote something like this in Scheme for a bespoke
Web app. I called it QB (for Query Builder) and basically implemented a useful
subset of SQL as a macro-based language inside Scheme. When you unquoted
Scheme expressions into a query, it did the Right Thing in terms of escaping
strings, etc.

------
talideon
I'm not in the practice of self-promotion, but I'll just throw this out there:
[https://github.com/kgaughan/dbkit](https://github.com/kgaughan/dbkit)

------
malkia
Is there support for the binary protocol (it's much faster for some things),
e.g. to avoid converting floating point numbers back and forth?

------
heed
Overall a cool wrapper, but is it safe to simply overwrite the cursor
variable without closing it first? Will that clear fetched result sets out of
memory?
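In CPython the old cursor is usually closed when it's garbage-collected, but
that's an implementation detail; closing explicitly is the safe pattern. A
sketch with the stdlib's sqlite3 standing in for psycopg2 (contextlib.closing
works the same way with psycopg2 cursors):

```python
import sqlite3
from contextlib import closing

conn = sqlite3.connect(":memory:")

# closing() guarantees cursor.close() runs, instead of relying on the
# garbage collector to free the fetched result set.
with closing(conn.cursor()) as cur:
    cur.execute("select 1")
    print(cur.fetchone())  # (1,)
```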

------
Beltiras
I suppose this tool has its place in the ecosystem. I just wonder why anyone
would use this when there are ORMs...

EDIT: I'm not saying this is a waste of time from a developer who obviously
puts out quality projects. I'm saying I don't see _myself_ using it. I'm most
familiar and comfortable with the Django ORM and have overseen a large project
with it. There were several raw queries when I took over, but by the time I
left there were none, and all the replacements I made were clean ORM queries
more performant than their raw SQL predecessors.

~~~
xyzzy4
ORMs are more difficult to use than queries, and they often make them slower.
It's lose/lose.

~~~
collyw
They aren't always more difficult.

I am pretty familiar with the Django ORM where you hit limitations fairly
early (if you are writing any moderately complex queries). But I still use it
for maybe 90% of the stuff I do because that 90% involves pretty simple
queries, and it saves a ton of typing, and integrates well with the rest of
the Django ecosystem.

Updating data is a good example. I recently split up some database strings
into multiple columns. At first I just wrote the SQL to update a joined table.
Then I needed to get regexes involved to pull out certain parts of the string.
Doing that in SQL gets ugly, and it's far simpler just to use the ORM to pull
out the data required and operate on it in Python in a loop.

I don't see the ORM as an either/or, but as a complementary tool to SQL.
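The pull-it-into-Python loop described above looks roughly like this (the
rows stand in for whatever the ORM returns; field names are made up):

```python
import re

# Hypothetical rows as the ORM might hand them back; the regex split
# is the point, not the ORM calls.
rows = [{"id": 1, "full_name": "Lovelace, Ada"},
        {"id": 2, "full_name": "Hopper, Grace"}]

for row in rows:
    m = re.match(r"(?P<last>[^,]+),\s*(?P<first>.+)", row["full_name"])
    if m:
        # Doing this split in SQL gets ugly fast; in Python it's one match.
        row["first_name"] = m.group("first")
        row["last_name"] = m.group("last")
```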

------
edward
v0.2.0:
[https://github.com/kennethreitz/records/commit/27ed8d0292529...](https://github.com/kennethreitz/records/commit/27ed8d02925290980626783ffacd83c3b23dac02)

------
Myrmornis
I assume this is just the early stages of something the author was considering
working on.

------
saurik
I have something similar, that I think "goes further" than this, which I use
as part of my Python Cydia backends. I got extremely addicted to using the
moral equivalent of TCL's uplevel, inspect.currentframe(-...), to build
interfaces in Python where I don't have to pull out parameters. My "cyql"
interface (which is barely anything, really: just a thin wrapper for psycopg2,
which is itself excellent; in Clojure I wrap a slightly modified copy of the
PostgreSQL JDBC driver to provide an even better implementation that does
"compile"-time type SQL statement verification), uses this to allow me to
remove what feels like _all_ of the boilerplate.

[http://gitweb.saurik.com/cyql.git](http://gitweb.saurik.com/cyql.git)

In addition to .run (which returns the number of affected rows) and .all
(which returns an array of ordered dicts?), I have .has (which returns a bool
and wraps an exists query), .gen (which returns a generator and iterates a
cursor), and .one (which verifies you only got back a single row and returns
just that row). I also have easy support for transactions (either on an
existing connection or in one line as part of making a connection), easily
turning off synchronous commit (for log tables), and I carefully manage
autocommit to make certain that I am using the minimum number of possible SQL
statements to the server (which I care about a lot).

[http://gitweb.saurik.com/cyql.git/blob/HEAD:/__init__.py](http://gitweb.saurik.com/cyql.git/blob/HEAD:/__init__.py)

That said, I agree with the post by masklinn below: neither the linked library
nor my library are solving problems that I believe are worth inheriting code
from someone else over. There is some basic configuration of psycopg2 which is
necessary, but the underlying library itself is what works here. I have spent
over a decade thinking about how I like to build SQL interfaces, and have now
implemented a similar interface for myself in numerous languages, each time
evolving the design slightly (and sometimes having enough of an epiphany that
I go back and retrofit some of the older ones), but it ends up being built
around the way I think about stuff.

[https://news.ycombinator.com/item?id=11053877](https://news.ycombinator.com/item?id=11053877)

And that also means that as I learn more and "level up", I start making
different decisions. My implementation in Clojure stresses stored procedures a
lot more, as while it took me a long time to really figure out how to use them
in my workflow, I now see them as exceedingly correct and feel a lot of the
code I've written in the past where I had tons of free statements is
essentially "what I wrote from back when I didn't know how to use the database
to organize my API layer" (though I still haven't worked out some of the
tooling around shifting to stored procedures, and have been distracted with
other higher-level problems the last couple years).

Essentially, I'm arguing that the same will happen to you. Put differently:
some problems are hard, and some problems are easy; I find a lot of libraries
that seem to be solving easy problems that new developers think are hard, and
a lot of libraries that pretend to solve a hard problem, but only because the
problem looked easy and the result doesn't actually work (such as the
PostgreSQL drivers that were available in Ruby for a long time, which were all
unusably bad). Wrapping something that works well so it is slightly easier for
you to use can be valuable if it is upstreamed into the original project, but
even then is likely to be something you will paper over yourself in time as
you will think about the problem differently than they did.

    
    
        # at the top of the code somewhere
        dsn = {'port': ..., 'user': '...', 'password': '...', 'database': '...'}
        
        with cyql.connect(dsn) as sql:
            provider, account, key = sql.one('''
                select
                    "payment"."provider",
                    "payment"."account",
                    "payment"."transaction"
                from "cydia"."payment"
                where
                    "payment"."id" = %(payment_id)s
            ''')
        
        with cyql.connect(dsn) as sql:
            sql.run('''
                update "cydia"."token" set
                    "token" = %(token)s,
                    "email" = %(email)s,
                    "country" = %(country)s,
                    "shipping" = %(shipping)s,
                    "billing" = %(billing)s,
                    "data" = %(data)s
                where
                    "id" = %(token_id)s and
                    "token" is null
            ''')

------
ForHackernews
Meh. SQLAlchemy will already do almost exactly the same thing, if you use its
lower-level functions and avoid the fancy ORM layer. It's also cross-database,
so you can use the same code with SQLite for development, and deploy to MySQL
or Postgres.

~~~
rspeer
Why would you want to switch databases when you deploy? That's asking for bugs
that only get revealed in production.

Also, the intersection of SQLite, MySQL, and Postgres is a pretty terrible
database. You can be a lot more effective if you decide which one you're
writing for.

~~~
ForHackernews
> Why would you want to switch databases when you deploy? That's asking for
> bugs that only get revealed in production.

Only if you deploy directly to production!

Any competent organization should have _at least_ a staging environment (and
probably some other pre-staging testing environments) where you deploy and run
your full application stack, and only promote verified builds to production
after they pass QA on earlier environments.

~~~
ta0967
the top-level comment in this thread praises SQLAlchemy for being "cross-
database, so you can use the same code with SQLite for development, and deploy
to MySQL or Postgres.", and your parent just says that's a recipe for
disaster.

~~~
ForHackernews
It's pretty common to run SQLite on your local machine while developing.
Apparently the parent is confused by reading "deploy" as "deploy directly to
production".

------
stuaxo
Does it have a safe way to build queries, as well as the pure SQL?
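Records passes query parameters through to psycopg2 (visible in the traceback
upthread), so the safe way is placeholder-style parameters: the driver
handles quoting, and user input is never spliced into the SQL string. A quick
demonstration of the idea with the stdlib's sqlite3 (which uses ? where
psycopg2 uses %s):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table users (name text)")

# The embedded quote is harmless: the driver binds the value, it never
# becomes part of the SQL text.
conn.execute("insert into users values (?)", ("o'brien",))
row = conn.execute("select name from users where name = ?",
                   ("o'brien",)).fetchone()
print(row)  # ('o'brien',)
```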

------
alan-wagner
so, is nobody going to point out that Han Solo is _Harrison_ Ford, not Henry
Ford? :)

~~~
kenneth_reitz
I'm going to pretend that was totally intentional.

------
_paulc
It looks like a number of people have done something similar, so I thought I
would mention my attempt - pgwrap [1][2]. This is also a thin wrapper over
psycopg2. It provides the following -

    
    
        * Simplified handling of connections/cursors
            * Connection pool (provided by psycopg2.pool)
            * Cursor context handler
        * Python API to wrap basic SQL functionality
            * Simple select, update, delete, join methods extending the cursor
              context handler (also available as stand-alone methods which
              create an implicit cursor for simple queries)
        * Query results as dicts (using psycopg2.extras.DictCursor or any other PG cursor factory)
        * Callable prepared statements
        * Logging support
    

Essentially you can do stuff like:

    
    
      >>> import pgwrap
      >>> db = pgwrap.connection(url='postgres://localhost')
      >>> with db.cursor() as c:
      ...     c.query('select version()')
      [['PostgreSQL...']]
      >>> v = db.query_one('select version()')
      >>> v
      ['PostgreSQL...']
      >>> v.items()
      [('version', 'PostgreSQL...')]
      >>> v['version']
      'PostgreSQL...'
    
      >>> db.create_table('t1','id serial,name text,count int')
      >>> db.create_table('t2','id serial,t1_id int,value text')
      >>> db.log = sys.stdout
      >>> db.insert('t1',{'name':'abc','count':0},returning='id,name')
      INSERT INTO t1 (name) VALUES ('abc') RETURNING id,name
      [1, 'abc']
      >>> db.insert('t2',{'t1_id':1,'value':'t2'})
      INSERT INTO t2 (t1_id,value) VALUES (1,'t2')
      1
      >>> db.select('t1')
      SELECT * FROM t1
      [[1, 'abc', 0]]
      >>> db.select_one('t1',where={'name':'abc'},columns=('name','count'))
      SELECT name, count FROM t1 WHERE name = 'abc'
      ['abc', 0]
      >>> db.join(('t1','t2'),columns=('t1.id','t2.value'))
      SELECT t1.id, t2.value FROM t1 JOIN t2 ON t1.id = t2.t1_id
      [[1, 't2']]
      >>> db.insert('t1',{'name':'abc'},returning='id')
      INSERT INTO t1 (name) VALUES ('abc') RETURNING id
      [2]
      >>> db.update('t1',{'name':'xyz'},where={'name':'abc'})
      UPDATE t1 SET name = 'xyz' WHERE name = 'abc'
      2
      >>> db.update('t1',{'count__func':'count + 1'},where={'count__lt':10},returning="id,count")
      UPDATE t1 SET count = count + 1 WHERE count < 10 RETURNING id,count
      [[1, 1]]
    

Also it allows you to create callable prepared statements (which I find really
useful in structuring apps):

    
    
      >>> update_t1_name = db.prepare('UPDATE t1 SET name = $2 WHERE id = $1')
      PREPARE stmt_001 AS UPDATE t1 SET name = $2 WHERE id = $1
      >>> update_t1_name(1,'xxx')
      EXECUTE _pstmt_001 (1,'xxx')
    

[1]
[https://github.com/paulchakravarti/pgwrap](https://github.com/paulchakravarti/pgwrap)
[2] [https://pypi.python.org/pypi/pgwrap](https://pypi.python.org/pypi/pgwrap)

