Hacker News new | past | comments | ask | show | jobs | submit login
Records: Python library for making raw SQL queries to Postgres databases (github.com)
257 points by infinite8s on Feb 7, 2016 | hide | past | web | favorite | 129 comments

I had planned to "show it to HN" soon, but the occasion is too good now: I have also created a thin "psycopg2 wrapper", pretty much in the same spirit as Records (but different in purpose), which allows you to perform SQL operations with plain data structures:


I have been using this for myself for quite some time, but feedback would be very appreciated!

Edit: A cool thing about it (I believe) is that it provides an "upsert wrapper" which makes use of either the new PG 9.5 "on conclict" mechanism if it's available, or an "update if exists, insert if not" two-step fallback, for previous versions.

Neat library. Just to note that this approach to fake upsert is not safe to concurrent sessions. There were exactly two concurrency-safe ways to implement upsert prior to 9.5:

1. An exclusive lock on the table, which is slow as hell 2. A PGPSQL function that attempts to INSERT, catches the duplicate key exception and instead UPDATEs (also handling the case that the row vanishes), which is slower than hell

This is why ON CONFLICT is such a huge deal. If it were as simple as checking keys with a SELECT then sending some to INSERT and some to UPDATE it would just be nice syntax sugar.

> 2. A PGPSQL function that attempts to INSERT, catches the duplicate key exception and instead UPDATEs (also handling the case that the row vanishes), which is slower than hell

That's not true, at least not when you're operating on a single row at a time. According to my benchmarks the function approach was usually within 10% of the native upsert -- even outperforming it by around 5% depending on data distribution. For insert-or-select (instead of update) the function consistently outperformed INSERT ... IGNORE.

Good point -- I was talking in the context of bulk upserts but didn't make that explicit.

Wow - this looks really really similar to my wrapper library (pgwrap [1]). See comment below [2]

[1] https://github.com/paulchakravarti/pgwrap [2] https://news.ycombinator.com/item?id=11054944

Why would I use it instead of SQLAlchemy? Simplicity?

Sorry, I don't see the point of this library. What exactly does it provide? It looks like a very thin wrapper (150 lines of code) over psycopg2 that does exactly the same thing, just with a slightly different API.

More than one library by Kenneth Reitz can be described in this way. It is a way of complaining about the color of the bikeshed.

I wonder if Reitz considered submitting a PR to psycopg2 with his ideas for improvement?

This looks like a (very thin) wrapper of the psycopg2 library, which implements the python DBAPI. Tablib functionality is integrated for bonus points. 137 lines.

So am I correct in my observation that this is just a different way to use/call the psycopg2 library?

The more I look at the code, the more I wonder where is this going? Will the psycopg2 requirement become a problem or a benefit over time?

SQLalchemy is pretty darn good.... there are more ways to connect to a DB in python than I shake a stick at. I guess I am confused...

Edit: Fixed wording on DBAPI

> This looks like a (very thin) wrapper of the psycopg2 library, which is a wrapper around the python DB API

psycopg2 is an implementation of DBAPI2 (with extensions). As the name denotes, DBAPI is an API which database libraries ought implement, it's not a library.

You are right, thanks.

>Rows are represented as standard Python dictionaries

That seems inefficient. You're always going to have the same columns in each row, so storing column names for each row is a huge waste of memory. Django's "cursor.fetchall" returns a list of tuples, which seems to be a saner way to represent db rows in Python.

Reitz has a public Facebook post on this. Here are his comments:

Harry Percival Looks nice! Y u no namedtuple?

Kenneth Reitz Dictionaries are a much more predictable interface ;)

Kenneth Reitz Just because you can do something, doesn't mean you should!

Well, that's creepy.

Hahaha... I can only imagine. Sorry to creep you out. Your Facebook profile is just a click away from your personal website, and your discussion about Records is up front and center on your feed for me :)

%s/'s/ is not/i Fixed it, you're welcome.

I learned from a coworker recently that namedtuple is a much lower memory usage than a dictionary, so if you're dealing with lots and lots of data regularly, and lots of processes on a single machine, (i.e. memory is a concern) namedtuple might be a better choice.

One argument I've seen for using dictionaries is that namedtuples restrict you to column names that are valid Python identifiers, which is a smaller set than postgres allows for column names (postgres allows the '$' and non-Latin characters).

So does python:

>>> d = {'ñanáöß$': 1} >>> d {'ñanáöß$': 1} >>> d.keys() dict_keys(['ñanáöß$']) >>>

The parent was talking about namedtuple.

    >>> collections.namedtuple("test", "$")
    Traceback (most recent call last):
    ValueError: Type names and field names can only contain alphanumeric characters and underscores: '$'

That's because attribute names can't be called "$", so `sometuple.$` would be a syntax violation.

I guess everyone is aware of that, thanks,

The parent comments were discussing why choosing dicts over namedtuples for db-originated columns, and the fact that field names are restricted on namedtuples but not on dicts is a very compelling point.

Making examples usind dicts and explaining why namedtuples have restrictions is completely missing the point.

HN supports "<pre><code>" blocks in a similar manor to Markdown, but using a two-space prefix (rather than Markdown's 4-space prefix):

  >>> d = {'ñanáöß$': 1}
  d {'ñanáöß$': 1}
  >>> d.keys()

Named tuples seem to offer the best of both worlds, as they're immutable.

This memory waste argument is only true if you want to fetch all rows, which you could possibly avoid with different techniques. Otherwise, a dictionary seems like a very simple, convenient, Pythonic, and intuitive representation of a row.

This lib also looks brand new, so there is plenty of time for the community to optimize the hell out of it.

If this were a big deal, it could be optimized using something like a key-sharing dict implementation [1]. You can keep the dict interface to accommodate column names that aren't valid identifiers, and avoid most of the memory overhead.

[1] https://www.python.org/dev/peps/pep-0412/

python-sqlite3's RowFactory does this as well.

I've made a pull request implementing namedtuple support at https://github.com/kennethreitz/records/pull/6.

I'm also a fan of NamedTupleCursor. Not sure how NamedTuples compare to raw tuples in terms of memory consumption, but they make code look a lot nicer.

They're essentially the same footprint IIRC.

Namedtuple CPU-speed performance for some common operations is TERRIBLE compared to dictionaries, at least on 2.7. Orders of magnitude difference. So bad that it really matters. It's shocking how bad the implementors got this.

namedtuple is just a thin wrapper around tuple. It literally constructs a class definition that sub-classes tuple as a string and then executes it. You can see the string template that it uses here[1]. If you're interested in something like namedtuple there are other[2] things[3] you can use depending on your use-case.

[1] https://github.com/python/cpython/blob/master/Lib/collection...

[2] http://stackoverflow.com/a/2648186

[3] https://pypi.python.org/pypi/frozendict/

It's really weird that immutable tuples take a performance hit, compared to mutable lists (almost 50% on access!?!).

That's because Python 2 has a special case for lists[1]. This does not exist in Python 3[2] (and therefore neither does the performance difference).

[1] https://github.com/python/cpython/blob/2.7/Python/ceval.c#L1...

[2] https://github.com/python/cpython/blob/master/Python/ceval.c...

It looks like list access got slower as a result though.

Curious what the operations you're talking about are. Get?

Pickling, for one.

  >>> nt = namedtuple('nt', 'a b')
  >>> nt10k = [nt(1, 2) for i in range(10000)]
  >>> dict10k = [{'a':1, 'b':2} for i in range(10000)]
  >>> timeit pickle.dumps(nt10k)
  10 loops, best of 3: 26.3 ms per loop
  >>> timeit pickle.dumps(dict10k)
  100 loops, best of 3: 2.49 ms per loop

If you look at the the StackOverflow in my sibling comment to yours, running 'obj.attrname' on a namedtuple takes longer because it needs to translate 'attrname' to an integer index and then run (e.g.) 'obj[0]'. Outside of that, I'm not sure.

probably so you don't have to remember what order they were returned in

Looks awesome. A couple questions:

If you do a query that returns 100 records and use rows.next(), does it just get them from the cursor without pulling down the rest from the database? Do you need to dispose of the cursor or is it automatically cleaned up?

The export formats are just a an attribute, not a method call. It would need to use @property IIRC. Does this go against the guideline in Python that "explicit is better than implicit"?

He's using cursors directly from the his database library[1] which is psycopg2[2] (and we can also see that the cursor_factory is how the dicts are generated). The psycopg2 docs say[3]:

  When a database query is executed, the Psycopg cursor usually
  fetches all the records returned by the backend, transferring
  them to the client process. If the query returned an huge amount
  of data, a proportionally large amount of memory will be
  allocated by the client.
[1] https://github.com/kennethreitz/records/blob/master/records....

[2] https://github.com/kennethreitz/records/blob/master/records....

[3] http://initd.org/psycopg/docs/usage.html#server-side-cursors

I don't get why the export functionality would be combined into a query helper. Aren't those two quite different functions?

Since he's not using server side cursors, I believe all rows are pulled into ram when the query executes.

It does look neat, but what would the advantages be over using the SQLAlchemy tools (not the ORM)?

This does look a lot simpler to use, especially if I want to quickly run a console, grab some data, and work on it.

A psql shell doesn't have the power of python once I get the results, and SQLAlchemy isn't as trivial to use.

Interesting. I love requests. SQL isn't my area of expertise so im genuinely surprised in 2016 we're still building libraries like this for Python; are existing libraries really that bad?

> are existing libraries really that bad?

I would say 'no'. SQLAlchemy is very good, but extremely complex and powerful. I think people don't understand that you can use the core features without any of the ORM overhead, if you don't want it.

If you want to "Just write SQL" you can do that, easily: http://docs.sqlalchemy.org/en/latest/core/tutorial.html#usin...

I'm just curious: in the first example of the section that you linked, what's the purpose of the "|| ', ' || " bit? I think the double pipe is used for concatenation in SQL, but I don't understand why the ', ' part needs to be concatenated like that.

that example is intending to produce a single column result that concatenates two fields into a comma-separated string, e.g. "your name, you@somesite.com".

I think jumping into that example directly is probably kind of disorienting. The SQL in that string is a little bit weird/contorted just to illustrate, "it's text, do whatever you want!". I also wrote that example like ten years ago.

Oh OK I get it, it makes perfect sense but yes, jumping to that example directly might be disorienting. Maybe a one-line explanation (just like the one you gave me) placed immediately above or below the code example would make it more readable!

SQLAlchemy tries to provide a pythonic wrapper for most database operations, in an implementation-agnostic way, which can be a convenience but also makes it large and complex. It really isn't the thing you want to use if what you want is to just send a plain snippet of SQL to your database. There isn't a simple 'requests-for-sql' type of thing, until now.

No they aren't, this just looks like an itch that has been scratched. Outside of the ORM libraries there isn't much fanfare for this sort of thing though.

I spent a couple of hours looking for exactly this yesterday for a new Tornado based project and ended up using queries: https://github.com/gmr/queries which seems on the same level for my purposes of wanting to cleanly pass raw SQL and get python data structures in return. Queries also provides asynchronous API interaction with Tornado which will be nice for me.

Coincidently queries was inspired by Kenneth Reitz's work on requests.

I think it is often tempting to write something like this as it is quite a light thing and can reacquaint someone with SQL if they haven't used it for a while. I know I was tempted after pouring through ORMs.

[edit] despite negative expectations, I still upvoted OA cause even "wrong" innovation is better than stagnation.

pyscopg2 is pretty great. Every lib I've seen is based on it. Many apps use it directly. It is the lowest, comfortable layer you'd want. What you need on top of that is Either;

  1) So minor, most just write a few helper functions/classes

  2) So elaborate and opinionated it needs years of use/work to get right. SQLAlchmey, Django ORM.

  3) So specific, customized to your requirements that general purpose libs/frameworks don't fit.
Some people get tired of redoing #1 or don't like #2's opinions or think there should be something lighter. So, they build something they hope to be middle ground. I don't think there is a middle ground. As you use it you realize the complexity (dbs/SQL are much, much more complicated/powerful than you think, the impedance between relations and objects) and middle ground projects remain "toys" or grow into SQLAlchemy.

This works on multiple databases, yet still uses plain old sql. I think it's simpler than using a multi-database ORM for plain old sql, or using a databases' driver directly, like psycopg2.

Edit: actually it just works on postgres. I don't know if there are plans to make it work on other SQL databases.

Really? Have you looked at the code? It looks like it's hardwired to psycopg2 and only works with Postgres. I could be wrong.

The github page says "Records is a very simple, but powerful, library for making raw SQL queries to Postgres databases" - so it does seem PG specific

I have now, and you are correct. It also says so in the README. The API looks like it's designed to be compatible with any SQL database, though.

> are existing libraries really that bad?

No, but they're meant for other use-cases. This fills in a gap for "simple, but powerful", as advertised. :)

How to connect to my db ?

>>> db = records.Database('postgres://localhost:5432')

>>> rows = db.query('select * from customer_customer')

Traceback (most recent call last): File "<console>", line 1, in <module> File "/Users/ns/teabox_django_env/lib/python2.7/site-packages/records.py", line 109, in query c.execute(query, params) File "/Users/ns/teabox_django_env/lib/python2.7/site-packages/psycopg2/extras.py", line 223, in execute return super(RealDictCursor, self).execute(query, vars) ProgrammingError: relation "customer_customer" does not exist LINE 1: select * from customer_customer

I've tried lots of different python sql libraries and the only one that I didn't have to abandon because of limitations is sqlalchemy.

This sounds nice, but I'm skeptical.

It still needs a bit of work, but I wrote a small library for Python to use SQL in the Yesql style: https://github.com/KMahoney/pyesql

It's good to see the SQL Phrasebook Pattern still in use. They were quite popular before ORMs caught on. I'm currently writing one in Rebol so they must have come back into fashion :)

There is an another (slightly older) Clojure library - https://github.com/ray1729/sql-phrasebook

This was based on the venerable Perl CPAN module - https://metacpan.org/pod/Data::Phrasebook::SQL

Also an interesting read - http://www.perl.com/pub/2002/10/22/phrasebook.html | http://ootips.org/yonat/patterns/phrasebook.html

Awesome! I've always wanted something like this. Thanks for sharing.

This is amazingly well-timed. Just yesterday I was complaining that the Python SQL libraries are annoying and verbose, and that I wished there was a library as good as requests is for HTTP.

wrt basic behaviour, this library seems to save all of two lines which can easily be bundled in a convenience function:

    def query(connection, q):
        cursor = connection.cursor(cursor_factory=psycopg2.extras.DictCursor)
        return iter(cursor.fetchone, None)

    # db = records.Database('postgres://...')
    db = psycopg2.connect('postgres://...')
    # rows = db.query('select * from active_users')
    rows = query(db, 'select * from active_users')
    # rows.next()
    # for row in rows:
    #    spam_user(name=row['name'], email=row['user_email'])
    for row in rows:
        spam_user(name=row['name'], email=row['user_email'])
    # rows.all()

You are right, thanks for the example. I checked the psycopg2 library after reading the announcement which drove my confusion. Psycopg is a super straightforward db driver. I wonder if this is the start of an orm built on top of psycopg2...? color me confused.

For interactive data analysis, this is one very nice iPython library:


Bonus points for native Pandas integration.

Why not just use Psycopg directly?

Amusingly, this is the other extreme from Diesel for Rust, discussed yesterday. This is the totally dynamic approach, while Diesel is the totally static approach.

That's a good point. As much as I love python, I think (for me) this illustrates why I tend to switch languages once I get over ~500-1000 lines. Seeing diesel was the exact opposite for me - total overkill when I just need a report, but very tempting if I'm going to have to evolve and maintain a larger code base.

The Python community just doesn't have good tools to type dictionaries. In JavaScript, you can use a tool like tcomb[1] that lets you specify the structure of your data and add functionality onto that data. Throwing raw dictionaries around in your program isn't maintainable, but if you adopt the Clojure ethos that functions operate on data that meets a set of requirements, you'll have maintainable code without a type system that gets in your way when you don't need it.

In general, I think most Python developers would be happier with Clojure or JavaScript written in a functional style (e.g. heavy use of tcomb, Immutable.js, and transducers). I'm happier, at least.

[1] https://gcanti.github.io/tcomb/

There is also PyDAL, which seems roughly at the same abstraction level as SQLAlchemy Core:


Small example of use:


Reminds me of RedbeanPHP: http://www.redbeanphp.com/index.php?p=/querying

You would just require the single redbean.php file, and you could query your database. The fact that this library is similar is clearly a great sign!

Eleven years ago I actually wrote something like this in Scheme for a bespoke Web app. I called it QB (for Query Builder) and basically implemented a useful subset of SQL as a macro-based language inside Scheme. When you unquoted Scheme expressions into a query, it did the Right Thing in terms of escaping strings, etc.

I'm not in the practice of self-promotion, but I'll just throw this out there: https://github.com/kgaughan/dbkit

Is there support for the binary protocol (it's much more faster for some things) - e.g. avoid coverting back and forth floating point numbers.

Overall cool wrapper but is it safe to simply overwrite the cursor variable without closing it first? Will that clear fetched result sets out of memory?

I suppose this tool has it's place in the ecosystem. I just wonder why anyone would use this when there are ORMs ....

EDIT: I'm not saying this is a waste of time from a developer who obviously puts out quality projects. I'm saying I don't see myself using it. I'm most familiar and comfortable with the Django ORM and have overseen a large project with it. There were several raw queries when I took over but once I left it there were none and all the replacement I did was clean ORM queries more performant than their raw SQL predecessors.

It's for people like me who do not use ORMs. I also use things like procedures and multiple result sets. I'm downright vulgar these days.

Because not everything is an Object. Not every program uses Objects. OO is only one (an overused one) of several software paradigms.

RDBMS <-> Objects are a poor match. So much so, many pursued NoSQL to find relief from the pain.

Maybe this is just me being so accustomed to my hammers that I solve everything like it was a nail. My toolset involves exactly that: modelling the space of requests with a complete enough relational database that I can find columns and keys for all incoming data. For the occasional tidbit that doesn't fit properly, stick it in the _extra json field.

ORMs are more difficult to use than queries, and they often make them slower. It's lose/lose

They aren't always more difficult.

I am pretty familiar with the Django ORM where you hit limitations fairly early (if you are writing any moderately complex queries). But I still use it for maybe 90% of the stuff I do because that 90% involves pretty simple queries, and it saves a ton of typing, and integrates well with the rest of the Django ecosystem.

Updating data is a good example. I recently split up some database strings into multiple columns. At first I just wrote the SQL to update a joined table. Then I needed to get regexes involved to pull out certain parts of the string. Doing that in SQL gets ugly, and its far simpler just to use the ORM to pull out the data required and operate on it in Python on a loop.

I don't see the ORM as an either / or, but as a complimentary tool to SQL.

Not really. Very complex queries are usually quite simple in django, while I'd have to sit down a while to think how to write the raw sql myself.

It does have the overhead of having to learn the ORM, but once you do that, it pays off.

I've yet to encounter a query I could not write using the Django ORM effectively.

ORMs are pretty terrible for anything to do with analytics or data science, for example.

Really anything that wants to pull all your data back into python is going to be horrible for analytics. What's really needed in this space is something that is smart enough to push most data-intensive operations down to the database and pull only the minimum amount of stuff back to python to do anything that's CPU intensive.

Because maybe an ORM is overkill sometimes (frequently? always?). I frequently find myself trying to figure out how to do something in the ORM "DSL" that's pretty easy (for me) in SQL.

In addition to the array of other correct responses, I'd all stored procedures or functions. They're so much easier to interact with via SQL, if the ORM supports them at all.

If you've already got the SQL handy, this is pretty useful.

Anyone who has ever used ORMs would know the answer :)

One example is if you're using the JSON features of Postgres. ORMs usually are not suited to these kinds of use cases.

Peewee orm supports json, jsonb, hstore, arrays, full text search, server side cursors...

Django's ORM has the JSONB and features from HSTORE for PostgreSQL.

> Django's ORM

That's the catch right there.

Because one tool never fits all situations.

Because doing something as simple as `select sum(length(content)) from app_comment;` in Django ORM is not possible.

It's possible with the ORM, but much more complicated than the SQL you intend to get:



`.all()` is completely redundant (it just clones the queryset).

What is the generated SQL?

You can still use SQL from the Django ORM.

Exactly, using .raw

I assume this is just the early stages of something the author was considering working on.

I have something similar, that I think "goes further" than this, which I use as part of my Python Cydia backends. I got extremely addicted to using the moral equivalent of TCL's uplevel, inspect.currentframe(-...), to build interfaces in Python where I don't have to pull out parameters. My "cyql" interface (which is barely anything, really: just a thin wrapper for psycopg2, which is itself excellent; in Clojure I wrap a slightly modified copy of the PostgreSQL JDBC driver to provide an even better implementation that does "compile"-time type SQL statement verification), uses this to allow me to remove what feels like all of the boilerplate.


In addition to .run (which returns the number of affected rows) and .all (which returns an array of ordered dicts?), I have .has (which returns a bool and wraps an exists query), .gen (which returns a generator and iterates a cursor), and .one (which verifies you only got back a single row and returns just that row). I also have easy support for transactions (either on an existing connection or in one line as part of making a connection), easily turning off synchronous commit (for log tables), and I carefully manage autocommit to make certain that I am using the minimum number of possible SQL statements to the server (which I care about a lot).


That said, I agree with the post by masklinn below: neither the linked library nor my library are solving problems that I believe are worth inheriting code from someone else over. There is some basic configuration of psycopg2 which is necessary, but the underlying library itself is what works here. I have spent over a decade thinking about how I like to build SQL interfaces, and have now implemented a similar interface for myself in numerous languages, each time evolving the design slightly (and sometimes having enough of an epiphany that I go back and retrofit some of the older ones), but it ends up being built around the way I think about stuff.


And that also means that as I learn more and "level up", I start making different decisions. My implementation in Clojure stresses stored procedures a lot more, as while it took me a long time to really figure out how to use them in my workflow, I now see them as exceedingly correct and feel a lot of the code I've written in the past where I had tons of free statements is essentially "what I wrote from back when I didn't know how to use the database to organize my API layer" (though I still haven't worked out some of the tooling around shifting to stored procedures, and have been distracted with other higher-level problems the last couple years).

Essentially, I'm arguing that the same will happen to you. Put differently: some problems are hard, and some problems are easy; I find a lot of libraries that seem to be solving easy problems that new developers think are hard, and a lot of libraries that pretend to solve a hard problem, but only because the problem looked easy and the result doesn't actually work (such as the PostgreSQL drivers that were available in Ruby for a long time, which were all unusably bad). Wrapping something that works well so it is slightly easier for you to use can be valuable if it is upstreamed into the original project, but even then is likely to be something you will paper over yourself in time as you will think about the problem differently than they did.

    # at the top of the code somewhere
    dsn = {'port': ..., 'user': '...', 'password': '...', 'database': '...'}
    with cyql.connect(dsn) as sql:
        provider, account, key = sql.one('''
            from "cydia"."payment"
                "payment"."id" = %(payment_id)s
    with cyql.connect(dsn) as sql:
            update "cydia"."token" set
                "token" = %(token)s,
                "email" = %(email)s,
                "country" = %(country)s,
                "shipping" = %(shipping)s,
                "billing" = %(billing)s,
                "data" = %(data)s
                "id" = %(token_id)s and
                "token" is null

Meh. SQLAlchemy will already do almost exactly the same thing, if you use its lower-level functions and avoid the fancy ORM layer. It's also cross-database, so you can use the same code with SQLite for development, and deploy to MySQL or Postgres.

Why would you want to switch databases when you deploy? That's asking for bugs that only get revealed in production.

Also, the intersection of SQLite, MySQL, and Postgres is a pretty terrible database. You can be a lot more effective if you decide which one you're writing for.

targeting both MySQL and Postgresql is very reasonable and is commonplace. Openstack does it, for example. Lots of apps do. This assumes you're doing an app that's mostly CRUD. If you're doing a BI kind of app and need fancy analytical functions, then yes you'd target Postgresql.

Hey, wait. We don't even have a project defined but you're already talking about being effective. Isn't it a bit early? ;)

A lot of CRUD apps (which, I think, a lot of websites/webapps are) don't usually use anything but the basic SQL stuff.

> Why would you want to switch databases when you deploy? That's asking for bugs that only get revealed in production.

Only if you deploy directly to production!

Any competent organization should have at least a staging environment (and probably some other pre-staging testing environments) where you deploy and run your full application stack, and only promoted verified builds to production after they pass QA on earlier environments.

the top-level comment in this thread praises SQLAlchemy for being "cross-database, so you can use the same code with SQLite for development, and deploy to MySQL or Postgres.", and your parent just says that's a recipe for disaster.

It's pretty common to run SQLite on your local machine while developing. Apparently the parent is confused by reading "deploy" as "deploy directly to production".

It's sad that ORMs are always restrictive and raw SQL can't be safely and easily modified. I'd always wished for a library that one could just write SQL, and the library could do the transformations.

I.e. one writes plain SQL strings, but they're parsed under the hood, so AST transformations can be applied, like adding extra WHERE clauses conditionally or applying LIMIT.

Sadly, I haven't found any good Python SQL parsing library. That was long time ago, though - maybe someone had written one already.

Peewee [1] is my tool of choice because of its convenience and flexibility combined. I can create pretty complex queries using its Query API, but also execute arbitrary queries when I know I need something a little more nonstandard. Plus the maintainer (Charles Leifer) is an awesome dude who has always been really helpful to his users.

[1] Peewee: https://github.com/coleifer/peewee

Thanks for the kind words!

That’s sort of like SQL Alchemy Core works[1]. The ORM stuff is all built on top of that and completely optional.

[1] http://docs.sqlalchemy.org/en/latest/core/tutorial.html

Unfortunately, no, that's not what I meant, sorry.

I can't give SQLAlchemy (Core or ORM) a string like "SELECT * FROM foo AS f LEFT JOIN bar AS b ON b.id = f.bar_id WHERE f.baz > %(baz)s" and then transform it, by, say, appending the LIMIT clause or adding extra WHERE condition. AFAIK, there's no way to provide a raw SQL string and then say something like `query.where("NOT f.fnord")` OR `query.limit(10)` and get the updated SQL.

With ORMs or non-object-mapping wrappers if I want transformations, I have to use their own language instead of SQL. I do, but don't really want to.

Or things have changed and this is what SA can do this nowadays? I'll be more than happy to learn that I'm wrong.

you know what the problem is there, that a. parsing your SQL string into a tokenized structure b. altering it based on your given instructions c. re-compiling it back into a string, all on top of an interpreted scripting language, and then d. sending it to a database so that the database can parse it a second time, is just so inefficient, for a particular API style that is arguably not even any better than just learning how to write Core expressions. Core and ORM expressions can be cached to their resultant SQL to save on the tokenized-structure->string step too.

there are certainly SQL parsers that can easily produce such tokenized structures and from a technical standpoint, your API is pretty simple to produce, with or without shallow or deep SQLAlchemy integrations. It's just there's not really any interest in such a system and it's never been requested.

You actually can do that with SA. See the example here: http://docs.sqlalchemy.org/en/latest/core/tutorial.html#spec...

If you specify the columns involved with your text query, then you get back a 'TextAsFrom' object, and you can apply other transformations to it like .where() or .unique().

At least you can do that by adding other SA objects.

I used to do exactly this kind of filtering, by building up by SA query, back in the day (about 2008) when I last used SA in anger.

Agreed! I was heavily exploring building this upon SQLAlchemy, but decided that the first version needed only to support the only database I use regularly: Postgres.

There's a very big chance that SQLAlchemy will be integrated into the project, to allow for connections to multiple database types.

To generate raw SQL queries that is database engine independent, you can use: https://pypi.python.org/pypi/python-sql

Does it have a safe way to build queries, as well as the pure sql ?

so, is nobody going to point out that Han Solo is Harrison Ford, not Henry Ford? :)

I'm going to pretend that was totally intentional.

It looks like a number of people have done something similar so though I would mention my attempt - pgwrap [1][2]. This is also a thin wrapper over psycopg2. This provides the following -

    * Simplified handling of connections/cursor
        * Connection pool (provided by psycopg2.pool)
        * Cursor context handler 
    * Python API to wrap basic SQL functionality 
        * Simple select,update,delete,join methods extending the cursor 
          context handler (also available as stand-alone methods which
          create an implicit cursor for simple queries)
    * Query results as dict (using psycopg2.extras.DictCursor or any other PG Cursor factory)
    * Callable prepared statements
    * Logging support
Essentially you can do stuff like:

  >>> import pgwrap
  >>> db = pgwrap.connection(url='postgres://localhost')
  >>> with db.cursor() as c:
  ...     c.query('select version()')
  >>> v = db.query_one('select version()')
  >>> v
  >>> v.items()
  [('version', 'PostgreSQL...')]
  >>> v['version']

  >>> db.create_table('t1','id serial,name text,count int')
  >>> db.create_table('t2','id serial,t1_id int,value text')
  >>> db.log = sys.stdout
  >>> db.insert('t1',{'name':'abc','count':0},returning='id,name')
  INSERT INTO t1 (name) VALUES ('abc') RETURNING id,name
  [1, 'abc']
  >>> db.insert('t2',{'t1_id':1,'value':'t2'})
  INSERT INTO t2 (t1_id,value) VALUES (1,'t2')
  >>> db.select('t1')
  [[1, 'abc', 0]]
  >>> db.select_one('t1',where={'name':'abc'},columns=('name','count'))
  SELECT name, count FROM t1 WHERE name = 'abc'
  ['abc', 0]
  >>> db.join(('t1','t2'),columns=('t1.id','t2.value'))
  SELECT t1.id, t2.value FROM t1 JOIN t2 ON t1.id = t2.t1_id
  [[1, 't2']]
  >>> db.insert('t1',{'name':'abc'},returning='id')
  INSERT INTO t1 (name) VALUES ('abc') RETURNING id
  >>> db.update('t1',{'name':'xyz'},where={'name':'abc'})
  UPDATE t1 SET name = 'xyz' WHERE name = 'abc'
  >>> db.update('t1',{'count__func':'count + 1'},where=   {'count__lt':10},returning="id,count")
  UPDATE t1 SET count = count + 1 WHERE count < 10 RETURNING id,count
  [[1, 1]]
Also it allows you to create callable prepared statements (which I find really useful in structuring apps):

  >>> update_t1_name = db.prepare('UPDATE t1 SET name = $2 WHERE id = $1')
  PREPARE stmt_001 AS UPDATE t1 SET name = $2 WHERE id = $1
  >>> update_t1_name(1,'xxx')
  EXECUTE _pstmt_001 (1,'xxx')
[1] https://github.com/paulchakravarti/pgwrap [2] https://pypi.python.org/pypi/pgwrap

Applications are open for YC Winter 2020

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact