Hacker News new | past | comments | ask | show | jobs | submit login
Simple SQL in Python (github.com/nackjicholson)
233 points by polyrand on Aug 13, 2020 | hide | past | favorite | 79 comments

The syntax follows https://github.com/krisajenkins/yesql/ which works with Clojure. So then the DB queries can be defined in a language agnostic way, so you don't need a gateway API to access the DB :)

There are many opportunities to grow this, e.g. transforming the SQL files into stored procedures, and being able to lint and check against a DB schema definition. However, I'm not sure that the indirection step of hiding the SQL away from the code actually makes sense in the long run. A tight coupling allows filters and other optimisations to be easily added to the end of a query, and saves one lookup that is not yet supported in your editor.

Hey, thanks for the opportunity ideas! You're totally right about the editor lookup problem. I wrote this project, I use this project in a production setting and not being able to jump to the SQL from code does bother me.

Seems similar in inspiration to https://pugsql.org/ which is a python reimplementation of hugSQL, more or less.

I'd say an advantage of pugSQL is that it's sqla-core under the hood, so lots of stuff will Just Work and you can drop down to core to commit war crimes if you have to.

Exactly my first thought.

Happy user of PugSQL here. I just wish there was a VSCode extension for autocompletion.

I haven't used pug, but I read through it a bit and like their approach. I think they have a more intricate way of parsing SQL than we have in aiosql. We're doing it quick and dirty with some regular expressions.

> I'd say an advantage of pugSQL is that

Plus it’s got a cool name

is there something like this for ruby?

[hp]ugSQL seems to have been inspired by[1] Yesql, a library with ``a similar take on using SQL'' that has also inspired this[2] rom-rb adapter; not a Ruby guy so there may be something a lot better/closer, but this is what I came up with after a couple minutes of searching.

[1] https://www.hugsql.org/#faq-yesql

[2] https://github.com/rom-rb/rom-yesql (documentation: https://www.rubydoc.info/gems/rom-yesql)

Small nitpick: SQLAlchemy is not just an ORM, it provides a lot of functionality to interact with SQL databases without mapping results to objects.

They call the non-ORM part "core". https://docs.sqlalchemy.org/en/13/core/

Indeed, to me the whole point of SQLAlchemy is the composable SQL DSL; I never use the ORM part.

No interest in the tool itself, but love the neutral outline of use cases it's a good/bad fit for - more software should do this.

Go has something similar to this https://github.com/kyleconroy/sqlc

My initial thought is I'd rather just have a module that would find project-related .sql files and parse into named blocks based on the comments, treating them as named strings. No db entanglement required, just a simple way to organize queries in sql files. You would use it more like this:

    import sqlite3
    import project_queries as queries  # a module that loads queries from local .sql files as strings in module namespace

    conn = sqlite3.connect('myapp.db')
    cursor = conn.cursor(queries.get_all_users)
    #                    ^^^^^^^^^^^^^^^^^^^^^
    users = cursor.fetchall()
    # >>> [(1, "nackjicholson", "William", "Vaughn"), (2, "johndoe", "John", "Doe"), ...]
aiosql may add some features to help with passing values to the query. I see `^` used with `:users` in an example but don't quite get it.

    -- name: get-user-by-username^
I often need to generate DDL details like table and column names in the query, which you don't want escaped, along with data like values and ids where you would want escaping (:users or %(users)s).

It works nicely to use string formatting (`CREATE TABLE {table_name} ...`) for DDL along with interpolation (`WHERE user_id IN %(user_ids)s`) in the same query.

Perfect, thank you! I wasn't aware of importlib.resources.

It's new. A good habit I've developed recently is to read all the PEPs since the last thing, and all the library changes. Python 3 is in full swing and some great things are in there already

it’s only a decade old or so ;)

Oh it was a fun ten years

i know how lucky i was to only start on 3.5. Avoided a LOT of pain.

You should check out the typing stuff in 3.8. Combined with mypy, gets a lot of the way there to statically checked python without the hassles of statically typed code.

The "^" marks a query returning one or zero values:


I’ll unhelpfully note that I could swear I’ve seen something that does what you describe, and thought that this project was what I recalled - alas.

AnoSQL is a project along these lines I’ve seen on here before.

I forked aiosql from anosql in order to support asyncio based sql drivers. Huge thanks to Honza Pokorny for starting this adventure.

That’s the one, thank you!

Nice idea, but this pattern hides your queries away from the developer. In practice, developers need the query transparent and not hidden in a file somewhere away from the code they're reading. Otherwise you're building a pattern for using queries without knowing the query's cost. If you need raw SQL, use SQLAlchemy Core and just don't use the ORM features.

I use roughly this approach in Java, and I would argue that it makes the queries more available to the developer.

Yes, they're in a different file. But it's another file in the same codebase, and it's directly referenced by the code that's using it. This means it's not really any harder to track down than the code to a function that lives in a different file. It's really NBD.

And, in return for putting your SQL in .sql files, you get all sorts of nice things. Syntax highlighting for your SQL, for starters. And it's somewhere that autoformatters and linters and SQL unit testing tools and the like can get at easily. And you get SQL code that isn't being forced through whatever laundry wringer and or cheese grater is imposed by your language's string literal syntax. (This last one admittedly isn't such a problem in Python.)

This really isn't too far off from why your JavaScript code is generally happiest in .js files and your CSS is generally happiest in .css files, rather than making them all inline in your HTML.

> And, in return for putting your SQL in .sql files, you get all sorts of nice things. Syntax highlighting for your SQL, for starters.

Until recently I was looking for a good scheme to keep my sql in .sql files primarily for this reason. But some time in the past few releases, pycharm has started detecting SQL in strings, and if I add an instance of my DB as a data source to the project, it autocompletes table/column names, etc.

I still see some appeal to putting my SQL in .sql files, but good IDE support has given me most of the benefit while keeping it in local strings, plus some help with prepared statements that would have required some added effort if I'd moved it to SQL files.

I definitely see an advantage there. But I still tend to prefer non-IDE solutions for things like code formatting and linting, because it means you have the option to configure them once in the build settings and forget about it.

With IDE-based solutions, even if you can guarantee everyone is using the IDE (Which is already something I don't even want to guarantee. Friends don't make vim-loving friends use heavyweight IDEs. But I digress.), you still have to also remember to get everyone to set their IDE up properly, and verify that they did it. Otherwise you have a tendency for lint to collect unnoticed in the code for a while before it becomes obvious that someone missed a step in the onboarding checklist.

> This means it's not really any harder to track down

Right. It’s not much different than a constants file.

What tools let you do this in Java? I'm looking at porting my QueryFirst extension to Java, and I couldn't find anything in the space.


I stick the *.sql files in the resources, and then created my own helper functions to grab them by filename.

I was very new to Java at the time, so I had to spin my wheels on wrapping my head around how Java manages resources for an afternoon. It would have been done in no time at all if I had known about the Reflections library at the time.

It feels just like an include to me. Why can't the developer look in the file?

Some developers just put some hard mental restrictions on themselves.

Sure, with SQLAlchemy Core you might start out by adding some `conn.execute` statements where necessary. But, inevitably, you'll want to use some queries in multiple places. Some of them you'll even want to use in multiple different files. Your schema will surely change over time, and your queries along with them. It would be a shame if you needed to hunt for queries across all of your files and make the necessary changes to each copy of your queries which need to be change.

It seems much more convenient to have some part of your code base, a separate module, which is dedicated to communication with the database.

The rest of your code communicates with that module through function calls and doesn't need to worry about how it's implemented. Realistically, you'll probably always use a database, but this does make it much easier to switch between databases, or decide you want to use a flat-file instead, or even use dynamodb for some high-traffic part which doesn't need much consistency.

Once you have a module, it's incredibly natural to de-duplicate queries. Instead of writing out each query once for each time it's called, you can stick it into a helper function and call that function from many places. You can organize queries by their purpose or which tables they use, making it easier to change all the queries impacted by each schema change.

In a world where you have large queries SQLAlchemy lets you give meaningful names to subqueries and different expressions in the larger queries, making them easier to understand. If you share some subqueries between queries... that module dedicated to working with your db starts to make a lot of sense!

In all, I can think of a lot of situations where "hides your queries away from the developer" is a cost out-weighed by many other benefits.

> It seems much more convenient to have some part of your code base, a separate module, which is dedicated to communication with the database.

That module doesn't have to be .sql file. You can organize your code in whatever language to achieve what you are describing. And have IDE help with Jump/Peek definition.

Yeah this approach is definitely way nicer with some tooling support.

For example https://github.com/cashapp/sqldelight for Kotlin integrates with IntelliJ to provide ctrl+click navigation (see gif in their readme), and https://github.com/simolus3/moor/ for Dart has similar features that integrate with VSCode (https://moor.simonbinder.eu/docs/using-sql/sql_ide/).

(...since everyone is linking their favorite libraries with a similar approach!)

How is this different from having a repository of queries somewhere - this is basically a query repository but not in the active language in fact.

Your queries should be hidden away from the developer - I mean obfuscation is never the goal but separation is... If your query is written inline in some business logic that section of code has poor tests and is unreliable. SQL is complex, I absolutely adore it but it's not simple - keep it isolated from the logic of your system and, if possible, use a layered architecture approach that allows the entire persistence layer to be detachable.

> How is this different from having a repository of queries somewhere - this is basically a query repository but not in the active language in fact.

Looks like they do fancy stuff like template substitution &etc instead of just opening a text file and feeding it to the sql engine.

I was looking at it and thinking "why not just use jinga?" but then you wouldn't get the 'query manager' object and name spacing (and probably other features I've missed).

It actually seems a fairly good way to mix languages, I haven't looked at the code to see what they're really up to but I like the concept.

Sorry - is it because you don't analyze the queries even though you write them (and thus are responsibly for optimization) that is the issue? Or do you have the same complaints about .NETs Entity Framework?

Yes, this feels like it reinvents prepared statements in a less portable way. It kind of goes against the spirit of their opening justification for using it: "SQL is code, you should be able to write it, version control it, comment it, and run it using files." If that's the goal encourage the use of a way to leverage native SQL that could run on any platform or engine.

> If you need raw SQL, use SQLAlchemy Core and just don't use the ORM features.

How heavy is that?

Or as a developer use the orm for ease and ignore learning a new language. Saving a couple miliseconds is really not that important to most devs.

> Saving a couple miliseconds is really not that important to most devs.

Users care.

Not really. Talk to an end user. If the query took minutes it doesnt ruin their day.

Sorry to break the news. But end users have one requirement...don't fucking annoy me

That it. That's the key to any saas.

For me, the new language is Python, not SQL.

Looks kinda nice, but can it handle "IN (...)" statements correctly? Or conditionally adding some WHERE param? If not, this is rather simplistic and couldn't replace writing inline queries, and then I think I'd rather not mix together 2 approaches.

I've built pre-parsing layers for queries that can explode out params properly for IN() handling - though depending on your SQL variant you can also get away with string passing an array to get around that.

Conditional WHERE seems unapproachable with this tool - which is one of my doubts about the need of a tool like this.

If you instead approach your SQL by having each query wrapped in a function in an isolated part of your code base you, as a company/team/whatever, can choose how much logic to let reside inside of those functions - building up conditional WHERE clauses is a very common thing to need to do.

Take a look at JinjaSQL (https://github.com/hashedin/jinjasql). Supports conditional where clauses as well as in statements. It uses Jinja templates, so you get a complete template language to create the queries.

SQL should have been in .sql files from the start. SQL in string literals is an industry-wide decades long aberration. I fixed this problem for C# with QueryFirst. I'm delighted to see all the support here the approach, which is clearly the way forward for data access.


I really like this approach because as you said, it gets really messy to do string replacements inline in a python file.

I'm curious if any people here are comfortable using stored procedures as an alternative to this.

Stored procs give you the benefit of your sql being easy to change in a sql editor, but you also get query planner caching of the results which means it will likely execute faster than string replacement inline sql.

I think it really depends on the project. Stored procs are really nice for a lot of different use-cases, but one major downside is that they can be challenging to version control.

Storing this sort of stuff outside of the database with the Python project itself would make version control trivially easy.

Why are stored procs hard to version control?

They're just text in a text file.

Almost, but not exactly. If you import a Python file you get exactly what's in that file. A stored proc is not tied to a file- you can change the file and forget to update the proc, or change the proc without updating the file.

That's true. I've never attempted to keep procs in VC, but I think it would be pretty easy if you wanted to.

I'm relatively new to Python and I was looking for something like this. I do wonder if there is a well-known Python equivalent of Dapper: https://github.com/StackExchange/Dapper

You don't need Dapper for Python. Python is a dynamic language. If your problem is hydration[0] it can be done easily, even with Python classes.


For query building SQLAlchemy can do that and more.

I think that, with JSON support in most RDBMS, ORM as a concept has become way easier to handle. I come to think that this is the promise of the 90's Object oriented databases being fulfilled somehow.


Hydration is not a big problem. However, Dapper does much more than that: it handles multi-mapping (eg mapping users and posts from a single query), query parameterization etc. Surely, you can do it all ad hoc, but Dapper provides very convenient set of primitives out of the box.

> it handles multi-mapping (eg mapping users and posts from a single query)

You can achieve all that with JSON support in SQL, which most RDBMS have. You can then de-serialize JSON rows into Python objects. No need for an ORM.

Looks like a cool idea. I wonder if I can make it to work easily with this other lovely python library https://github.com/pudo/dataset

Looks like we had the same idea at the exact same time! https://news.ycombinator.com/item?id=24148552

I'm a fan of using raw sql like this but curious if anyone has solved deduplicating common statements and dynamic queries in a good way.

For example:

1) Imagine lots of queries having to show the results in the context of a user and therefore use the same JOIN and WHERE clause all over. Not being DRY, this breaks down when having to change the clause at all.

2) Imagine a reporting page that allows for filtering and ordering by different columns and therefore need some way to compose the final sql.

Don't query builders already let you do that? By making the WHERE clause map to a data structure, you can just duplicate the data structure to duplicate the WHERE clause e.g. https://bokwoon95.github.io/sq/query-building/reusing-where.... (full disclosure: I'm the author).

Total agree and I use query builders all the time. (Yours definitely looks nice.) I was curious though how others might of handled these issues when using libraries like the one posted here which is just raw sql.

+1 for query builders being an excellent choice for a lot of applications and uses. Keeping things DRY being one. We haven't solved the repetition in SQL with aiosql. We also haven't tried. :)

I've always wanted to build a larger app based on this approach. I quite like Massive, and 5 years (!) ago I wrote https://gist.github.com/grncdr/4555208#file-sss-py as a one-off response to a tweet from Ted Dziuba (is he still around?)

Nice approach with annotations in the comments.

I wrote something a little more tounge in cheek a while ago -- I modified SQL syntax to include functions with arguments


Anyone who is attracted to this approach should look into stored procedures instead: https://www.postgresqltutorial.com/postgresql-create-procedu...

Stored procedures are not available for SQLite. For Postgres the syntax for SPs are so horrible that I'd rather use this.

Edit: It's quite telling that the example in the link don't show how to return a resultset from a SP in Postgres.

I like this idea a lot actually. I've been pondering of doing the same thing in Go.

There are some edges though... for example what if you want to do further composition based on if/else clause.

You could have a look at our attempt, it allows dynamic SQL using templating.


Funny to see on HN just started building a version of this that actually generates functions from sql queries so you get all the IDE goodness and sightly smaller runtime overhead

This could go one step further by implementing an import hook, then you could do import users; users.get_all_users(conn)

It's something I've been meaning to try with other languages.

We wrote a similar lib and use namespaces: https://quma.readthedocs.io/en/latest/usage.html#running-que...

Ibis Project is also similar: https://ibis-project.org/

Applications are open for YC Summer 2023

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact