That friction and vague uneasiness, for me, comes from the nagging thought that it's simply not correct code. In my case I prefer to write as much SQL as reasonably possible to avoid having to write any code in an imperative language.
+1. And the subset of those applications concerned with simply converting result sets <-> JSON over HTTP may well even tolerate use of an API generator (eg PostgREST, PostGraphile, Hasura), reducing that thin layer to merely a membrane.
Interesting. You mean make all the queries as detailed and specific as possible, processing things as much as possible in SQL so you don't have to mess around with the data you get from/put in the DB in an imperative language?
Do you have any good examples of this? Code bases that can be read?
Messing with data significantly outside of SQL is often asking for trouble.
SQL queries are compiled into very efficient operations which would take a lot longer to get right imperatively. Not only that, but database engines are improving all the time, so the same code you wrote which declares your desired transformations tends to get faster over time and there is nothing to update or refactor. The transformations you write are sort of timeless because they are strongly decoupled from the implementation and hardware.
Lots of data transformation steps in imperative languages require the persistence of an intermediate calculation (e.g., np.ndarray). SQL databases only do this when the query planner deems it absolutely necessary. It will show up in your query plan as a "Materialize" step.
The EXPLAIN feature of SQL is the most useful performance debugging tool. It also alerts me to a potential logic flaw quickly when the proposed plan looks insane. I have personally replaced several analytical programs a very large bank used to monitor loan originations and performance.
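For example (the tables here are invented, not the bank's actual schema), getting the plan takes one extra keyword:

    -- EXPLAIN shows the plan without running the query;
    -- EXPLAIN (ANALYZE, BUFFERS) runs it and reports actual rows, timings, and I/O.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT b.borrower_id, sum(l.balance) AS total_balance
    FROM loans l
    JOIN borrowers b ON b.borrower_id = l.borrower_id
    WHERE l.originated_on >= date '2020-01-01'
    GROUP BY b.borrower_id;
    -- If the planner decides it must persist an intermediate result,
    -- a "Materialize" node shows up in this output.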
I don't use any special SQL language features to this day. The most sophisticated it typically gets involves lots of subqueries, joins, and window functions. The real skill is distilling the convoluted imperative mess into its essence. I got really good at this.
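To make that concrete, the classic "latest status per loan" question that imperative code tends to loop over is one window function (schema invented):

    -- Keep only the most recent status row per loan (hypothetical table).
    SELECT loan_id, status, status_date
    FROM (
        SELECT loan_id, status, status_date,
               row_number() OVER (PARTITION BY loan_id ORDER BY status_date DESC) AS rn
        FROM loan_status_history
    ) ranked
    WHERE rn = 1;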
Half the work effort is usually spent socializing the changes I have to make to get their logic working correctly in SQL, because it changes the behavior of their existing application. Oftentimes I find the client's imperative code attempting to perform a logical operation such as a join, but implementing it incorrectly.
Their existing imperative code frequently produced subtly wrong results, or depended on the order of the rows returned from the database (undefined behavior). Yikes.
What they actually wanted was provably implemented incorrectly, or relied on undefined behavior. The mistakes, and the proper resolution in SQL, could easily be verified with pen and paper on a sample set of loans if necessary to drive the point home.
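The ordering bug in particular is easy to show on a toy example (names made up):

    -- Undefined behavior: with no ORDER BY the rows can come back in any order,
    -- and that order changes with the plan, the data, and the Postgres version.
    SELECT loan_id, payment_amount FROM payments;

    -- Correct: state the order the downstream logic actually depends on.
    SELECT loan_id, payment_amount FROM payments ORDER BY loan_id, paid_on;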
> Not only that, but database engines are improving all the time, so the same code you wrote which declares your desired transformations tends to get faster over time and there is nothing to update or refactor. The transformations you write are sort of timeless because they are strongly decoupled from the implementation and hardware.
Only true up to a certain extent. Counterexample: performance regressions due to changes in the query planner / its input statistics. Changes aren't always positive and logically equivalent plans can have very different perf characteristics.
Fully agree. I am mostly on the analytical side. When my client uses Snowflake it's usually smooth sailing because it's so automated and performant. When I have my own analytical Postgres instance on my local machine I tune cost and memory consumption parameters in postgresql.conf, but I only rarely run into major gotchas. If my client uses IBM... I go for a walk on the beach or go out to lunch when I launch my query.
Your point that equivalent plans can have very different perf characteristics is very true. I always try to review the query plan with EXPLAIN if my query takes more than a minute and rewrite the logic if necessary.
Very cool. So you write it with solid, declarative SQL and you can trust that it will be rock solid and optimized. I need to learn SQL instead of just doing NoSQL all the time. Thanks for the explanations.
I load JSON into Postgres all the time these days for analyzing data for a government client and use Postgres JSONB operators to untangle and index it. JSONB because I don't care about preserving the original string representation of the record (strings in fields are still completely intact).
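Roughly, the query/index side looks like this, assuming the documents sit in a one-column JSONB table called records (field names are placeholders for whatever your records contain):

    -- ->> extracts a field as text, -> keeps it as jsonb, @> tests containment.
    SELECT record->>'agency'                          AS agency,
           (record->'amounts'->>'obligated')::numeric AS obligated
    FROM records
    WHERE record @> '{"fiscal_year": 2023}';

    -- A GIN index makes the containment (@>) and key-existence queries fast.
    CREATE INDEX records_record_idx ON records USING gin (record);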
Although I heavily lean on psql with \COPY and stdin or stdout pipes to a compressor like zstd (psql can natively pipe CSV to and from any program reliably, even on Windows), I found loading JSON records to be extremely frustrating this way.
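For CSV the pipe workflow is a one-liner each way; something like this (table and file names are placeholders, and \copy runs the program on the client side):

    -- Export: stream CSV straight into zstd, no temp file.
    \copy (select * from loans) to program 'zstd > loans.csv.zst' with (format csv, header)

    -- Import: decompress on the fly and load.
    \copy loans from program 'zstd -dc loans.csv.zst' with (format csv, header)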
Whatever you do, NEVER use pipes in PowerShell. They don't stream; they buffer fully in RAM until your computer crashes. Microsoft is insane.
Since you use NoSQL, you can write a very short Python program that uses psycopg2 directly to load a list of dicts as JSONB rows into a single Postgres table with one column (I call mine "record").
At that point you can basically use Postgres as a NoSQL database and structure the records if you want using a VIEW.
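i.e., roughly this shape (the JSON paths are placeholders):

    -- The single-column table the loader fills, one JSONB document per row.
    CREATE TABLE records (record jsonb NOT NULL);

    -- A VIEW gives the documents a relational shape without copying any data.
    CREATE VIEW loans AS
    SELECT record->>'loan_id'               AS loan_id,
           (record->>'principal')::numeric  AS principal,
           (record->>'originated_on')::date AS originated_on
    FROM records;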
We're in the process of documenting their use of JSON for financial records keeping as a design defect.
> Their existing imperative code frequently produced subtly wrong results, or depended on the order of the rows returned from the database (undefined behavior). Yikes.
I'm in agreement with GP; I work daily on code that does it the traditional way, and I use the "do as much as possible in SQL" approach in my own side projects, and it always seems more maintainable to push all the data logic into SQL.
> Do you have any good examples of this? Code bases that can be read?
I don't know of any projects that do this, but I do this for many of my side projects.
Keeping the business logic in-database was a major move in my career and a huge source of that "job well done" feeling.
My perfect combo today relies on golang as the (thin) API layer, querying data and doing CRUD through PL/SQL procedures, enforced by triggers and constraints.
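A minimal sketch of that shape, written here as Postgres PL/pgSQL with invented names (not my actual schema):

    -- Constraints keep the invariants inside the database itself.
    CREATE TABLE accounts (
        id      bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
        balance numeric NOT NULL CHECK (balance >= 0)
    );

    -- The Go layer only ever calls procedures like this one.
    CREATE PROCEDURE transfer(from_id bigint, to_id bigint, amount numeric)
    LANGUAGE plpgsql AS $$
    BEGIN
        UPDATE accounts SET balance = balance - amount WHERE id = from_id;
        UPDATE accounts SET balance = balance + amount WHERE id = to_id;
    END;
    $$;

    -- Triggers enforce rules a CHECK constraint cannot express.
    CREATE FUNCTION forbid_account_delete() RETURNS trigger
    LANGUAGE plpgsql AS $$
    BEGIN
        RAISE EXCEPTION 'accounts are never deleted, only closed';
    END;
    $$;

    CREATE TRIGGER accounts_no_delete
        BEFORE DELETE ON accounts
        FOR EACH ROW EXECUTE FUNCTION forbid_account_delete();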