Show HN: Parse your Postgres queries into a fully-typed AST in TypeScript (github.com/pg-nano)
116 points by aleclarsoniv 80 days ago | 21 comments
Hey all, I'm the creator of @pg-nano/pg-parser. I'm using it in pg-nano[1] to statically analyze Postgres schemas spread across multiple SQL files for a couple of reasons:

1. The CREATE statements need to be executed in topological order, so pg-nano's dev command can run them without issue.

2. pg-nano has a plugin system like Vite that allows SQL generation based on the parsed schema.
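The topological ordering in point 1 can be sketched as a plain depth-first sort. This is an illustration only, not pg-nano's actual implementation; the `SqlObject` shape and its `dependsOn` field are hypothetical, standing in for dependencies already extracted from the parsed AST (e.g. tables referenced by foreign keys):

```typescript
// Hypothetical shape: one parsed CREATE statement plus the names of
// the objects it references.
type SqlObject = { name: string; dependsOn: string[]; sql: string }

// Depth-first topological sort: every object's dependencies are
// emitted before the object itself; cycles are reported as errors.
function sortTopologically(objects: SqlObject[]): SqlObject[] {
  const byName = new Map(objects.map(o => [o.name, o] as [string, SqlObject]))
  const sorted: SqlObject[] = []
  const visiting = new Set<string>()
  const visited = new Set<string>()

  const visit = (o: SqlObject): void => {
    if (visited.has(o.name)) return
    if (visiting.has(o.name)) {
      throw new Error(`Circular dependency involving ${o.name}`)
    }
    visiting.add(o.name)
    for (const dep of o.dependsOn) {
      const target = byName.get(dep)
      if (target) visit(target)
    }
    visiting.delete(o.name)
    visited.add(o.name)
    sorted.push(o)
  }

  for (const o of objects) visit(o)
  return sorted
}

const order = sortTopologically([
  { name: 'post', dependsOn: ['user'], sql: 'CREATE TABLE post (author_id int REFERENCES "user")' },
  { name: 'user', dependsOn: [], sql: 'CREATE TABLE "user" (id int PRIMARY KEY)' },
])
// "user" now precedes "post", so both CREATE statements can run in order.
```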

Probably to the surprise of no one, working with an untyped AST feels like you're back in the days of JavaScript, because well... you are. Most of you know by now just how great TypeScript and static types in general are, especially if you appreciate SQL.

So why is this project worth sharing with you?

Well, writing the AST type definitions by hand would have taken me way too much time. It would also be a bear to keep up to date as Postgres continues to evolve.

To my surprise, I discovered that libpg_query, the C library used under the hood, includes JSON definitions in its /srcdata/ folder. I figured I could use them to generate the type definitions. Genius, right? Okay... maybe not genius, but still cool, I think.

You see, those JSON definitions provided by libpg_query? They don't exactly contain the TypeScript definitions (was that obvious?). No, no. I had to translate them into TypeScript definitions. (I'm sure you could have done it, yes yes. But did you? No siree bob)

It was painstaking, but overall really not too hard. Time-consuming? Yes, but not as much as writing the type definitions by hand. So... was it worth it? Only time will tell. I hope you find it as useful as I do. And that's all I've got, so thanks for reading.

P.S. The build for Windows is broken, so if anyone could lend a hand, you would be a true hero.

[1]: https://github.com/pg-nano/pg-nano (not ready for production use)




Hey, this is really cool.

Suggestion: check out the Slonik library for Postgres. It encourages writing raw SQL using string template literals (i.e. sql`select foo from bar where zed = ${someParam}`). It also supports strong typing of results, but the programmer needs to create their own Zod validators manually to check this and get strong typing support.

Seems like this tool could essentially pre-create the types for any raw Postgres SQL statement?

I think this approach of using raw SQL is much better than a custom query builder like kysely, where half the time I'm just trying to translate the SQL that I know in my head to some custom, library-specific API. But the benefit of using kysely is the automatic typing support. Being able to use raw SQL and get query types "for free" would be amazing.
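The template-literal style mentioned above can be sketched in a few lines. This is not Slonik's actual implementation, just a minimal tagged template showing the core idea: interpolated values become `$1`, `$2`... placeholders, so user input never lands in the query text itself:

```typescript
// Sketch of a Slonik-style sql tag (not the real library's API).
type Query = { text: string; values: unknown[] }

function sql(strings: TemplateStringsArray, ...values: unknown[]): Query {
  // Join the literal parts, inserting a numbered placeholder where
  // each interpolated value appeared.
  const text = strings.reduce(
    (acc, part, i) => acc + (i > 0 ? `$${i}` : '') + part,
    '',
  )
  return { text, values }
}

const zed = 42
const q = sql`select foo from bar where zed = ${zed}`
// q.text   is 'select foo from bar where zed = $1'
// q.values is [42]
```

The resulting `{ text, values }` pair is the shape most Postgres drivers accept for parameterized queries, which is what makes this style both ergonomic and injection-safe.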


> Seems like this tool could essentially pre-create the types for any raw Postgres SQL statement?

It's a syntax parser that produces an AST. So only information explicitly defined in the syntax is available. To infer input/output types for an arbitrary SQL command, you need introspection (the most foolproof way being PQdescribePrepared[1]).

> Being able to use raw SQL and get query types "for free" would be amazing.

That's basically what pg-nano does, but you need to use Postgres functions, rather than "$1" or "?" placeholder templating. Of course, some people prefer co-locating their raw SQL inside their TypeScript files, in which case, pg-nano is not for them.

[1]: https://www.postgresql.org/docs/current/libpq-exec.html#LIBP...


This is interesting. You seem to provide extra functionality over libpg-query beyond the TypeScript types, like the walk function, right? I assume that's why these changes can't be easily merged into the main library and you chose to fork entirely.

As an aside, do you think it's possible to use your libraries to get the return type of an arbitrary Postgres query, even if it's dynamic? I have a library that always requires users to write the return type of queries manually, and I'd like to automate that step.


The main reason I didn't contribute my changes via PR is I wanted a package without "deparse" support, which adds considerably to the install size. I also didn't want pre-compiled binaries for every supported platform to be published to NPM, preferring a postinstall script that only downloads the necessary binary. I don't know how the walk function would be received by the original maintainers, as I didn't ask.

> do you think it's possible to use your libraries to get the return type of an arbitrary postgres query, even if it's dynamic?

Yes it is. I've solved that exact problem in pg-nano. I use the `describePrepared` feature of libpq: https://github.com/pg-nano/pg-nano/blob/4cca3dbe6be609c479e4...


If it needs to be in topological order, how do you handle tables whose DDL includes mutually referencing columns (column A1 references B, column B1 references A)?


That's not supported yet, but it will involve removing the foreign key constraint from one table (by parsing and rewriting the CREATE statement before running it). Then, after the referenced table has been created, the FK constraint is added back with an ALTER TABLE statement.

There's also the DEFERRABLE constraint setting[1] for one-to-one relationships (which can usually be avoided via a joint reference table). This pattern should already work in pg-nano.

[1]: https://www.postgresql.org/docs/current/sql-set-constraints....


Books have been written about SQL Injection. But in the end, SQL Injection just means that you get a different AST than what was intended.

So the simplest, stupid check for injection is to parse the query and see if multiple statements are found where only one was intended.

Better checks can easily be imagined.
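That single-statement check is small enough to sketch. This assumes the parse-result shape used by libpg_query wrappers, where the result carries a `stmts` array with one entry per top-level statement (the shape is an assumption here; the check itself operates on whatever the parser returns):

```typescript
// Assumed shape of a libpg_query-style parse result: one entry in
// `stmts` per top-level statement in the input.
type ParseResult = { stmts: unknown[] }

// Reject any query that parsed into more (or fewer) statements than
// the single one the caller intended.
function assertSingleStatement(result: ParseResult): void {
  if (result.stmts.length !== 1) {
    throw new Error(
      `Expected exactly 1 statement, found ${result.stmts.length} (possible injection)`,
    )
  }
}

// A benign query parses into one entry...
assertSingleStatement({ stmts: [{}] }) // ok
// ...while `select 1; drop table users` would parse into two,
// and assertSingleStatement would throw.
```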


offtopic: I tried writing some TypeScript, but ran into problems using Jest to test the code. I wanted to write in "up to date" ES6 but I'm not very experienced and wasn't sure which docs or examples to follow.

Should typescript code be written as .mts files, with "type = module" in package.json?

What test layout works best? (i.e. __tests__ in project root? filename.test.mts in the same directory as code?)

Are there any good examples of jest.config.mts (mjs?) and tsconfig.json?

Is the typescript compiler supposed to build the test files too? Is it correct to have a ./build/ in your project root, with all built files including tests under that? Do you then strip out the tests when deploying?

This would be targeting an AWS Lambda environment or similar, not browser based so no bundling, is that correct?


Sharing some packages, tsconfig.json, and jest.config.ts I used in a recent project that I'm pretty happy with below. (I'm not using `"type": "module"` in my package.json.)

Relevant npm packages:

- @jest/globals (I use these to import `describe`, `test`, `expect` and other test-related functions)

- ts-jest

My tsconfig.json:

  {
    "compilerOptions": {
      "target": "ES6",
      "module": "ESNext",
      "lib": [
        "ES6",
        "DOM",
        "ES2017",
        "DOM.Iterable"
      ],
      "moduleResolution": "Bundler",
      "jsx": "react-jsx",
      "declaration": true,
      "strict": true,
      "esModuleInterop": true,
      "skipLibCheck": true,
      "forceConsistentCasingInFileNames": true,
      "outDir": "./dist"
    },
    "include": [
      "src",
      "__tests__",
      "*.config.ts",
      "sandbox.ts",
      "sandbox-ui.js"
    ]
  }
and my jest config (jest.config.ts):

  import type { Config } from "jest";

  export default {  
    preset: "ts-jest",  
    testEnvironment: "jsdom",  
    testMatch: ["**/__tests__/**/*.test.ts"],
    globals: {  
      fetch: global.fetch,  
    },  
  } as Config;
I have my tests set up under the `__tests__` directory in project root as you noted, and use a `.test.ts` suffix on all relevant test files. Doing this, ts-jest handles the actual transpilation + execution of the tests (just by running jest), and you don't have to worry about including them in your built solution. I have a separate `tsconfig.build.json` for actually building my project (this... is probably inefficient... but it works well for me :P).


You are a fantastic person, thank you for helping this random internet stranger :) This looks to be exactly what I was after!


This is extremely cool. Well done! If I was able to use this with node-postgres without schema management, but views and plpgsql, I'd finally have what I always wanted for pg <-> ts.

I think this library is going to set a new standard for db integration!


libpg_query is a good library!

"C library for accessing the PostgreSQL parser outside of the server environment"

https://github.com/pganalyze/libpg_query?tab=readme-ov-file#...

  pg_query wrappers in other languages:
  - Ruby: pg_query
  - Go: pg_query_go
  - Javascript (Node): pgsql-parser
  - Javascript (Browser): pg-query-emscripten
  - Python: psqlparse, pglast, psqlparse2
  - OCaml: pg_query-ocaml
  - Rust: pg_query.rs


libpg-query-node already supports AST types via `@pgsql/types` which is generated by `pg-proto-parser` (using the protobuf file in libpg_query): https://github.com/launchql/pg-proto-parser

(v15 exports these, I think they still need to be added to v16).

I've worked with the maintainers of libpg-query-node and they are very friendly and open to improvements. My suggestion would be to work with them to upstream useful changes vs. forking.


Question: we are using kysely.dev as a PostgreSQL query builder and porsager's postgres.js for high performance. Is this something that can complement our stack, or something to replace it entirely?


You could use both side-by-side if you prefer query builders for certain tasks, but that means you'll be bundling two Postgres drivers in your application server, which could mean hitting the connection limit of your Postgres instance. Although, pg-nano has a way to limit the number of parallel connections at the cost of reduced throughput.


Well, the question you saw coming (hopefully): how does it compare to Prisma use cases?

One thing I really like about Prisma is only updating my schema and having migrations generated as the "diff".


I wouldn't compare the two directly, as they serve different preferences. I would say that using Postgres functions is more powerful, but that may not matter for your app, depending on its complexity and needs. Ultimately, I'm not concerned with persuading Prisma users to switch over just yet.

As far as migrations go, pg-nano is taking the same “schema diffing” approach that I assume Prisma does, where the active schema of your Postgres instance is compared to the desired schema (defined via SQL files in pg-nano's case) and a migration plan is generated from there. In the context of migrating a non-local Postgres instance, pg-nano still has some R&D to do.
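The "schema diffing" idea can be illustrated with a toy example. This is not pg-nano's or Prisma's actual algorithm, just a sketch of the concept for a single table's columns, with a made-up `Columns` shape:

```typescript
// Hypothetical representation: column name → column type.
type Columns = Record<string, string>

// Compare the live schema against the desired one and emit the ALTER
// statements that would migrate one to the other.
function diffColumns(table: string, live: Columns, desired: Columns): string[] {
  const plan: string[] = []
  for (const [name, type] of Object.entries(desired)) {
    if (!(name in live)) {
      plan.push(`ALTER TABLE ${table} ADD COLUMN ${name} ${type}`)
    }
  }
  for (const name of Object.keys(live)) {
    if (!(name in desired)) {
      plan.push(`ALTER TABLE ${table} DROP COLUMN ${name}`)
    }
  }
  return plan
}

const plan = diffColumns(
  'account',
  { id: 'int', legacy_flag: 'bool' }, // what the database currently has
  { id: 'int', email: 'text' },       // what the SQL files declare
)
// plan[0] is 'ALTER TABLE account ADD COLUMN email text'
// plan[1] is 'ALTER TABLE account DROP COLUMN legacy_flag'
```

A real migration planner also has to handle type changes, renames, constraints, and data-preserving transforms, which is where the remaining R&D mentioned above comes in.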


Are those migrations still editable, out of curiosity? Oftentimes I'll want to have migrations add things that simply aren't possible with some ORMs (e.g. generated columns, which can't be schema'd in most ORMs).

The main thing holding me back from Prisma is precisely what you like about it - if the migrations are auto-generated and I can't edit them afterward, I won't be able to do what I need to.


You can edit Prisma migrations. They are plain SQL files with no magic.


+1 have had to edit Prisma files after generating a migration, it went fine. As ‘moltar says, no magic.


hey cool project! thanks for your efforts in the open-source space



