
I posted this hours ago and then stepped away. The story captures so much about the Lee I knew so well. I'll add one piece of praise for Lee's early architecture of Cloudflare.

Everything was controlled by a single Postgres database that made very heavy use of stored procedures, which called other procedures, which called others. It was one giant program inside the database. It took me a while to comprehend what he'd done, but it was really great. The database ran everything, and all those functions made sure that audit logs were kept and that calls were allowed for the user ID being passed in; some of these procedures even made external API calls, doing things like obtaining SSL certificates.

It was a magnificent monolith inside a database.
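To give a flavor of the pattern for anyone who hasn't seen a system built this way: this is not Lee's actual code, just my own minimal sketch of the shape described above, with invented table and function names. Every entry point validates the caller and leaves an audit trail as part of the same call:

    CREATE FUNCTION set_zone_setting(p_user_id int, p_zone_id int,
                                     p_key text, p_value text)
    RETURNS void AS $$
    BEGIN
        -- every entry point re-checks authorization for the user ID passed in
        IF NOT user_may_edit_zone(p_user_id, p_zone_id) THEN
            RAISE EXCEPTION 'user % may not edit zone %', p_user_id, p_zone_id;
        END IF;

        UPDATE zone_settings SET value = p_value
        WHERE zone_id = p_zone_id AND key = p_key;

        -- the audit log is written as a side effect of the same call,
        -- so no caller can forget it
        INSERT INTO audit_log (user_id, zone_id, action, at)
        VALUES (p_user_id, p_zone_id, 'set:' || p_key, now());
    END;
    $$ LANGUAGE plpgsql;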

I worked on the periphery of the database (it was truly Lee's domain): he'd tell me what output to expect or what API to create, I'd code to his spec, and we'd just hook it up.

If any single artefact represents what he did at Cloudflare, it's that database. And he used to code it on a laptop we called "The Beast" because it was so crazily heavy and overloaded with memory etc. that he effectively carried a mini test Cloudflare wherever he went.




I’m sure he did a great job, but I’ve seen those things be a huge nightmare for anyone other than the irreplaceable genius code wizard who created them.


I guess you're using the word genius ironically, but I've also seen such things be radical in the sense that they're built in a way that no one else would have done, yet it ends up working perfectly for the business and being far more successful than anyone would have predicted. There's a certain type of mind that is immune to fashionable or normal technical choices, that often goes along with remarkable productivity. I suppose the word 'genius' might be a good fit for that.

The interesting thing is that such cases can also actually be the huge nightmare you write about, in the sense that there are tradeoffs in the original design that originally made sense but become harder to deal with later, but also in the sense that their nonstandardness can be hard for more typical engineers to work with. (This is often referred to as the 'hiring problem', though that can be misleading.)

The trouble with the typicalness of more typical engineers is that they're steeped in a technical culture which they regard as the proper and obvious way to do things, but which really is mostly just the set of conventions they're used to. Those two things are easy to conflate when all your peers agree on what's proper and obvious (for example that you should never build a complex system entirely in Postgres stored procedures, or whatever the weird thing is).

Those conventional assumptions are at least part of what makes such systems a 'nightmare', because if you're working on a system that feels 'wrong', then you fight it rather than blend with it, and that makes it harder to maintain. Everything that's objectively wrong with it will loom larger in your experience than it would if you didn't have the feeling that it should never have been built that way in the first place. Worse, many things that aren't objectively wrong with it, in the sense that they aren't intrinsic to the system or the domain, end up blowing up into major issues because people find they can't work with this system in the way that they're used to, and it's too hard to abandon ways that you're used to—it always feels like the system is at fault. Meanwhile, everything it got right will fade into the background as you assume that any system that had been built 'properly' (i.e. the way you'd have done it) would have gotten all those things right too, at the same cost or better.

It's hard for the later engineer who experiences the 'nightmare' to fully appreciate that if it weren't for that weird, obviously 'wrong' system, they wouldn't have their job in the first place (nightmare and all), because the business that succeeded was built on the weird system.

I don't know anything about this case other than what John posted above, but I've seen something like it in other situations, and in fact I would say that PG and YC have this in common with the case of Lee and Cloudflare as John describes it—as remote from one another as Postgres procedures and custom Lisps must be in every other respect.

What I wish we could teach younger engineers is: when you run across a system like this, pause. Then, pause. Then pause again. The creativity at the core of it can be remarkable. Often you see things that one person did which would have required a large team in other hands, and therefore would never have happened otherwise. Maybe that's one reason why these systems tend to be found at the start of companies: more conventional approaches wouldn't have worked at a stage when hiring a team wasn't an option yet; and on the other hand, the teams you do hire at the later stages tend to have more conventional engineers whose minds (or rather, whose conditioning) can't grok why you'd ever have built something so weirdly.

Usually the later engineers bond with each other around how much they dislike the 'legacy' system, and lobby to replace it, and usually get their way in the end. That's a cycle-of-life thing, but it's also sad when we prefer our conventions and assumptions over seeing, and honoring, the creativity in what came before. Sometimes masterpieces get paved over because of that. Of course, not every nightmare system is an unappreciated masterpiece. But some are, and the bias in our industry is extremely strong toward perceiving them as nightmares. I've seen people take chainsaws to masterpieces without realizing that that's what they were doing, and it makes me sad to think of it. But it makes me happy to read what John wrote to honor Lee, because that's how I'd want someone to write and feel about something I'd built, if I built something like that.


> and in fact I would say that PG and YC have this in common with the case of Lee and Cloudflare as John describes it—as remote from one another as Postgres procedures and custom Lisps must be in every other respect.

Oh, you reminded me of something fun. I had a lot of freedom under Lee when I first joined Cloudflare (that's partly how Cloudflare became a Go user) and one of my first jobs was writing a new WAF service... I made an entire working version in Arc. I almost made all our customers write rules using it.


And those conventions change over time. So you might write something using Java EJBs at some point and then the world moves on and engineers no longer know how those things work. What looked like a great architecture is now confusing for most engineers.


That's true, and weird unorthodox architectures can actually end up being more stable in the long run, especially if their design was an intrinsic fit for the domain. A weird architecture that is a good fit for its domain often has simple ways of meeting key requirements that would have been more costly to develop with more conventional techniques. I'm just hallucinating this from your comment upthread, but a requirement like "making sure audit logs are kept" is the sort of thing I mean. Any architecture can do that, but some make it much easier. When a business starts out with a weird architecture in its early days and ends up succeeding, it's possible that that weirdness gave some competitive advantage.


I think there is a lot of truth to this caricature, but it rubs me the wrong way how you paint the "later" programmers as blinded by fashion or tradition. "Geniuses vs ignorant sheeple" is a false dichotomy.

Yes, in some similar situations, some programmers will see things that way. But have some faith and respect for individuals' abilities to reason and apply their experience. As you implied, these kinds of systems are often optimized so that one person could run them.

So it's often a trade off — "this would not be easy for a medium or a large team to work on, but it's the easiest thing for me as an individual to maintain by myself that gets the job done." And that's OK. And yes, it is natural to shift over time. But this is precisely because you now want 40 developers maintaining it instead of 1.

Overengineering is real. But architectural tradeoffs that don't work well for large teams are real too.


> as remote from one another as Postgres procedures and custom Lisps must be in every other respect.

Just a fun fact: Parts of postgres (before the SQL days, I think) used to be in lisp. To this day we still have some weird function/macro names stemming from those (e.g. lcons).

  * Once upon a time, parts of Postgres were written in Lisp and used real
  * cons-cell lists for major data structures.  When that code was rewritten
  * in C, we initially had a faithful emulation of cons-cell lists, which
  * unsurprisingly was a performance bottleneck.  A couple of major rewrites
  * later, these data structures are actually simple expansible arrays;
  * but the "List" name and a lot of the notation survives.


Thanks for the poignant homage to creative problem solving. Even if we appreciate the effectiveness of such a solution, the sheer artistry in it is highly under-appreciated. Most people sadly walk around with so many blinders that they cannot really _see_. _"[...] pause. Then, pause. Then pause again."_ indeed.


I actually think that doing things in stored procedures could be a great hack to squeeze out some extra performance, and I'm sure that a system like this can accomplish some important business goals.

However, in the example I'm thinking of, it was much more dysfunctional. The advantage of stored procedures there was that the organization's release process was so broken that it was much faster for sales and marketing to get features added by going through Carl, the DBA, instead of submitting the request to the dev team. This was because Carl was able to add the logic by sending one query rather than going through the broken months-long release process.

Carl was actually a real nice guy, although hard to get a meeting with due to his great importance and busy schedule.


Reading code is harder than writing it, and separating taste from business logic is even harder. Often what looks like an ugly wart is a very compact representation of domain knowledge for a special use case. Untangling that knowledge and re-mastering that use case in a refactoring, or even a rewrite, goes wrong far too often.


Yeah, stored procedures making API calls to external services sounds like a nightmare of tech debt to unwind.

I'm sure when you have a brilliant engineer like this you can make it work though.


Step back and take an honest look, and in the vast majority of systems, it's hard to deny that stored functions and/or procedures are one of the most powerful tools in the toolbox. If anything, they're widely underutilized. This is especially the case with Postgres -- it has built-in support for functions in several non-SQL languages (including Python).
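For anyone who hasn't tried it, here's a minimal sketch of what that looks like with PL/Python (the function is invented for illustration; it requires the plpython3u extension to be installed):

    CREATE EXTENSION IF NOT EXISTS plpython3u;

    -- a Python function living inside the database
    CREATE FUNCTION normalize_hostname(raw text)
    RETURNS text AS $$
        return raw.strip().lower().rstrip(".")
    $$ LANGUAGE plpython3u;

    SELECT normalize_hostname('  Example.COM.  ');  -- example.com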

In practice, the aversion to stored funcs/procs mostly arises from political concerns. Most devs don't understand databases very well, and in most companies, independent DBA teams and independent dev teams end up burning tons of energy fighting for turf.

One of the advantages of a startup is that the smart people are free to utilize the best tools in the toolbox for the problem at hand, rather than having to conform to the social mores around "don't use that tool because it's under the control of the other team and we'll never be allowed to touch it again".


I personally think consideration of this approach speaks to the technical range and sophistication of the practitioner. It is unusual, but we're in agreement that this is mostly due to soft and orthogonal issues.

(Tooling could be improved to make the development and debugging process more seamless.)


Working with stored procedures is just the pits. I wonder why you would describe that as "great". Maybe it's optimized, but so would be a giant executable written in assembly.


As long as you have the necessary tooling for source control, versioning, code search, etc. -- why is it the pits?

I mean, if you stick to a rigorous procedure of automatically exporting your stored procedures to files and committing them to git with messages... how is that different from any other development?

Serious question -- I've done tons of database work but only a little bit with stored procedures, so wondering if I'm missing a crucial detail here.
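(For context, the kind of automated export I have in mind would be something like this in Postgres, pulling each function's definition out of the catalog so a script can write them to files and commit them; the schema name here is just an example.)

    -- one row per function, with its full CREATE OR REPLACE definition
    SELECT p.proname,
           pg_get_functiondef(p.oid) AS definition
    FROM pg_proc p
    JOIN pg_namespace n ON n.oid = p.pronamespace
    WHERE n.nspname = 'public';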


I'd highly recommend playing around with stored procedures. They, at a minimum, can provide data integrity guarantees that you can't get with anything else. As an example, for ages the only way to do a reliable (pseudo atomic) upsert call on most databases was through a stored procedure. Once I discovered stored procedures, I started using them everywhere. After a while I realised that you need to put your logic in the appropriate place for all of the usual reasons (performance, maintainability, security etc) and resisted the temptation to dump all business logic into stored procedures, but I still use them a lot. They're like the bridge between the dba world and the traditional developer world.
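For reference, the classic pseudo-atomic upsert (from before INSERT ... ON CONFLICT existed) looked roughly like this as a Postgres function; the table and column names are invented:

    CREATE FUNCTION upsert_setting(k text, v text)
    RETURNS void AS $$
    BEGIN
        LOOP
            -- try to update an existing row first
            UPDATE settings SET value = v WHERE key = k;
            IF FOUND THEN
                RETURN;
            END IF;
            BEGIN
                INSERT INTO settings (key, value) VALUES (k, v);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- a concurrent insert won the race; loop and retry the UPDATE
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;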

"An elegant weapon, for a more civilised age..."


Not OP, but I want to say there's a lot more that goes into the development process than just source control, versioning, and code search. There's linters, and dozens of different kinds of tests, and ways of automatically running those tests, and layers of procedure before code changes go public. This seems much harder to do for stored procedures. Especially considering that stored procedure languages are fragmented between different databases; you don't just pay the cost of having to change the stored procedures when changing databases, you are also paying for the fact that it is harder to build automated tools to analyze the stored procedures, because the API for each database is going to be different.

That being said, there are lots of factors that influence how maintainable a specific stored procedure is. Team size, business needs, company culture, etc. Stored procedures get a bad rep from the places where they are train wrecks. YMMV.


Let's say that when you deploy, you want to direct 1% of traffic to the new code, then 5%, then 25%, then 50%, and finally 100%. And if monitoring metrics go red at any point, automatically roll back. A typical canary deployment.

How do you do that with a sproc? With normal application code, you can have the load balancer send an arbitrary percentage of traffic to the new instances of the service. A sproc binds the code and the data together. But you only have one database, so there's no way to do a gradual deploy of the code. That would be like doing a gradual deploy of a schema migration. It's only possible if you have a lot of databases.

Also, unless you're using MS SQL Server, debugging sprocs is a pain. The only workaround I've seen is to add a debug or loglevel arg to the sproc. But actually pausing execution and inspecting data? Good luck.


> Let's say that when you deploy, you want to direct 1% of traffic to the new code, then 5%, then 25%, then 50%, and finally 100%. And if monitoring metrics go red at any point, automatically roll back.

> How do you do that with a sproc?

Wrap the real sproc (well, the set of real sprocs, since old and new will sometimes coexist) with a wrapper sproc that probabilistically directs to the appropriate working sproc based on a table value (or anything else the sproc can read) that is updated to reflect the current stage in the conversion process, including any monitoring-based reversals.
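A rough sketch of that wrapper, with invented names; the rollout percentage lives in a config table that the deploy tooling (or a monitoring-triggered rollback) updates:

    CREATE FUNCTION handle_order(p_order jsonb)
    RETURNS jsonb AS $$
    DECLARE
        pct numeric;
    BEGIN
        SELECT rollout_pct INTO pct
        FROM deploy_config WHERE proc_name = 'handle_order';

        IF random() * 100 < coalesce(pct, 0) THEN
            RETURN handle_order_v2(p_order);  -- canary
        ELSE
            RETURN handle_order_v1(p_order);  -- stable
        END IF;
    END;
    $$ LANGUAGE plpgsql;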


How do you correlate those errors at the application level? If only 1% of sproc calls are the new sproc, any elevated error rates in the application will be hard to see over typical noise.


> How do you correlate those errors at the application level? If only 1% of sproc calls are the new sproc, any elevated error rates in the application will be hard to see over typical noise.

By using the DB's error-reporting features to report errors back to the application, including information about which ultimate implementation was handling the request.


That only covers errors that occur inside the sproc, not errors that are caused by the sproc. For example: What if the new version of the sproc succeeds, but the data it returns causes the application code to fail in some way? The sproc metrics would report 100% success and the application error rate wouldn't get above noise until a significant fraction of traffic was using the new buggy version of the sproc.

It's possible to add more epicycles to try and correct for these deficiencies, but the truth is that sprocs are dangerously double-edged. Yes, your code runs next to your data, but to get parity with application code (canary deploys, debugging, logging, etc) you must reinvent lots of tooling.


@ggreer, the grand-grand...parent, wrote:

> there's no way to do a gradual deploy of the code

There could be version numbers?

    some_procedure_v1()
    some_procedure_v2()
and the new v2 application server code calls the v2 stored procedures. And if the v2 app server code gets rolled back / cancelled, then the v2 stored procedures automatically won't be used any more.


Version numbers work great as schemas. You could have v1.proc and v2.proc.

Also, for an API schema, only exposing views + stored procedures and hiding tables in private schemas makes the SQL easier to refactor. This is something we recommend at postgrest.org: http://postgrest.org/en/v7.0.0/schema_structure.html#schema-...
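A minimal sketch of that structure, with made-up names: the tables live in a private schema, and the exposed schema contains only views and functions, so the tables can be refactored without breaking callers:

    CREATE SCHEMA private;  -- real tables, never exposed
    CREATE SCHEMA api;      -- the public contract

    CREATE TABLE private.users (
        id    serial PRIMARY KEY,
        name  text NOT NULL,
        email text NOT NULL
    );

    -- callers see only this shape; email stays hidden
    CREATE VIEW api.users AS
        SELECT id, name FROM private.users;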

Sharing in case this is useful to you.


Interesting idea — I had never thought about that (I mean, looking at tables as internals, and procedures + views as the public interface). I'll think about that the next time I'm doing a data migration.

Thanks for sharing. PostgREST looks like it's intended for people building a web app who don't want to write application server code; instead, they just install a database?

Also, I imagine PostgREST is nice for internal admin apps? Then one "just" needs to write JavaScript and connect to PostgREST via REST?

Actually, it'd be nice if the https://postgrest.org homepage explained some different use cases :-) I had to think for a little while, and I'm still not sure I understand all the cases where PostgREST can be useful.

Edit: Now I see there are explanations here: https://postgrest.org/en/v7.0.0/ ("Motivation: Using PostgREST is an alternative to manual CRUD programming" etc.) if I just scroll down a bit. I didn't do that until now; I probably stopped reading at the "Sponsors" text & images previously, or thought it was a documentation page.

This is quite cool: "Creating new views happens in SQL with known performance implications. A database administrator can now create an API from scratch with no custom programming".

It looks a bit like GraphQL also: https://postgrest.org/en/v7.0.0/api.html#resource-embedding

Nice that it's written in Haskell, which was my favorite language long ago :-)

(PS. http://postgrest.org redirect to: http://postgrest.org/en/v6.0/ which is not the latest version (& not https), maybe you'd like to redirect to: https://postgrest.org/en/v7.0.0/ instead)


> PostgREST looks like it's intended for people building a web app who don't want to write application server code; instead, they just install a database?

Besides the usual webapps, I've also used it for building an IoT project (sensor logic in the db) and a GIS project (postgis/pgrouting). Some users have also used it for open data projects (OpenAPI is provided).

It's also great for internal admin apps and there are ready-made libs like https://github.com/Wellers0n/react-admin-postgrest-client.

Those are some of the use cases off the top of my head. In general, I think it's good for cases where you'd like to use the database as a "processing engine" instead of a "dumb storage".

The docs definitely need a lot of improvement. Thanks a lot for your feedback!


Exactly this. Treat the DB schema as you would any typical API schema. A lot of the techniques used for evolving application APIs can be used for sprocs and views as well, e.g. versioning for breaking changes, adding optional parameters or new result fields for non-breaking changes. Fundamentally I don't think there's much difference between say, a DB schema defined with sprocs/view or an HTTP schema defined with OpenAPI. Both describe an API contract between two remotely communicating processes, the former case just happens to use a SQL dialect/transport to do it.


Interesting that you mention doing this (version numbers) with views too, I didn't think about that ...

... Maybe that could be a way to "preview" a database migration before running the migration for real. There could be a view 'Some_table_view_v2' that shows how the table would look after an upcoming data migration, and the v2 app server code would use that new view. Then one could be more certain that the data migration will work fine.

(At the same time, one might need to be careful to let just a small fraction of the requests use the new view, if the view is a bit, or a lot, slower than the real table.)
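Something like this is what I mean (hypothetical names and transformations): a view that presents the data as it would look after the migration, so the v2 code can be tested against it before anything is rewritten in place:

    -- how the table would look once the migration splits the name column
    CREATE VIEW some_table_view_v2 AS
        SELECT id,
               split_part(full_name, ' ', 1) AS first_name,
               split_part(full_name, ' ', 2) AS last_name,
               lower(email) AS email
        FROM some_table;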


>Let's say... you want... 1%... then 5%, then 25%

Honestly though, it feels like the argument you are making here is that it's impossible in general to incrementally roll out database changes? Sure, it's definitely not as simple or easy as spinning up new cloud application instances and diverting http requests, but it's not impossible or infeasible by any means. The problem with databases is that the best solutions are always so subjective and context specific. It's very hard to write out simple, easy-to-understand, "rules," or "concepts," or whatever. Every answer, to every hard problem, is "it depends."

What are we rolling out here? Does the sproc write data or read data? Both? From the same table/schema/server/etc? Does it move data around? How much data? Do you already have read-replicas? How is data replicated to those? Do you have any parallel-write setups with eventual consistency? What parts of the higher level application/platform utilize this part of the database that we want to change? Can we spin up a new DB instance, with the new sproc, and establish data replication/change resolution with the legacy system as we are rolling out the change? Should we move everything to the new system and set up the compat piece as a _forward_ compat on the legacy side instead? etc, etc, etc.

>But actually pausing execution and inspecting data? Good luck.

There are plenty of ways to do this. Copy-paste the code of the proc into your DB IDE of choice and execute it line by line. If the proc is using temp tables or @variables or anything else, you should easily be able to instantiate those locally too and see what the values look like. Use a transaction and roll back, and you can run this stuff over and over on prod if you really want to be sure of what it does. As long as you aren't locking stuff in the DB improperly.
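For example, the transaction-and-rollback trick is as simple as this (the sproc and table names are invented, assuming a Postgres-style stored function):

    BEGIN;
    SELECT my_sproc_under_test(42);  -- run the logic for real
    SELECT * FROM affected_table;    -- inspect what it changed
    ROLLBACK;                        -- leave the database untouched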

There's also the idea of writing sprocs in a more testable/verifiable way. Maybe you have progress tracking tables with status id columns and timestamps. Maybe you split up a long sproc into phases which each write their outputs into "intermediate tables" in some sort of "processing" schema. Maybe you write things to files somewhere to be consumed by some federated monitoring system. Idk, I could think of random ideas all day here. A lot of them probably sound dumb at face value, but I really don't think any of these goals sound infeasible.


It's a completely valid question, so I'll try to give a serious answer, from my perspective. Firstly, I'd like to say that SPs can be quite awesome, for several of the reasons already stated. It is normally one of the few ways one can truly enforce integrity, access controls, and logging at the DB level. (Also, most of my experience is from an MS SQL and C# background, so some of this may not translate over to PostgreSQL quite so well)

1. Verbosity and fitness for purpose: while there is a kind of beauty in pure set-based SQL querying, stored procedures by their nature tend to be a procedural paradigm, and in my opinion, they are not particularly well suited to it. They are insanely verbose by today's standards, and doing anything typically requires several times more code than the equivalent in most modern languages (your mileage may vary with different database engines!). This results in large, monolithic SPs, which are difficult to read, understand, and test. The "standard library" is generally far less extensive than mainstream languages, leading to sub-standard, verbose, slow implementations (string concatenations and manipulations are a good example in MS SQL).

2. Code navigation, dependency management: the inverse of point 1 - in general, the tooling around databases for code searching, navigation, etc., is quite woeful in comparison to non-database languages. In MS SQL, dependency management is so unreliable that the official tooling misses several classes of dependencies, depending on the type of syntax you happened to use. In your question, you start by saying that if you have the necessary tooling in place, it's all good. In my experience, rarely are even "barely good enough" tools in place, let alone anything close to what exists in other environments.

3. Unit testing: since databases tend to be stateful, the very concept of unit testing is a bit of an alien one. People don't tend to write SPs with unit tests in mind, and at least in the MS SQL world, the few "unit testing" frameworks that did exist tended to be incredibly painful to use - to the point that they were generally not worth it. Brittle, expensive integration tests are not the same.

4. Execution plan caching: the way databases handle caching of SP execution plans leaves much to be desired - generally, the first set of parameters that an SP gets called with ends up determining the execution plan for all future calls. Even in well managed code bases, this ends up being an issue, occasionally needing a DBA to manually go in, identify a misbehaving execution plan, and evict it. Vary-by-param and no-cache options do not fully solve this. While the need for such mechanisms is obvious, it is surprising that these have been a pain for 20+ years. Now, execution plan caching also happens for normal SQL too, so my point is rather that the culture around SPs tends to have an unreasonable fear of dynamic SQL, which would be far more appropriate _where the query actually is of a dynamic nature!_
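For concreteness, in the MS SQL world that manual intervention looks something like this (the procedure name is invented):

    -- flag the procedure so its cached plan is thrown away and rebuilt
    -- on the next execution
    EXEC sp_recompile 'dbo.GetOrdersByCustomer';

    -- or opt out of plan reuse entirely when (re)creating the procedure
    CREATE PROCEDURE dbo.GetOrdersByCustomer @CustomerId int
    WITH RECOMPILE
    AS
        SELECT * FROM dbo.Orders WHERE CustomerId = @CustomerId;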

I can imagine people replying "but clearly, you're doing things that you shouldn't be - that's not what SPs are for!". And I agree, in a large part that is my point too. But then, what exactly should SPs be used for?


Thank you so much for writing that up!


I've found the opposite. It can take a long time to get your head into that space but it produces much more maintainable and robust systems in my experience.


I have a similar experience. I've worked with a product that consists of tens of thousands of stored procedures implementing all the business logic. Because you are right next to the data, you can get away with doing many things you otherwise couldn't.

Also, it acted like a barrier: the core team, who understood the business well, worked on the stored procedures, keeping things fast and stable.

Another team was doing the tech to interface with the database and UX/UI was then implemented in various technologies in parallel and changed/got recycled every few years to whatever was in fashion.

The difference in culture between the two teams was huge, it seemed to work out well in their case.


It also saves a lot of extra work. With more conventional systems you often have to pull the data out and move it into a structure native to the language doing the processing.

In the database you don't need to do any of this. The data is there. The type information is there.


It is great: your code runs right near the data it’s working with.

If it’s the pits, you need a better suited development environment. I use emacs and interactively code inside the DB. Python, Lua, JavaScript, PGSQL, whatever.


I'm a journeyman in this space, and don't know emacs well. Can you help me unpack your comment here? You write python in the database directly?


not OP but likely:

1) coding via repl - e.g. one emacs buffer has stored procedure code, send functions over via keyboard shortcut to SQL repl buffer connected to DB to evaluate it 'live' (jupyter notebooks are basically a copy of this mode of operation in a more consumer-friendly package)

2) w/r/t Python in the db: as OP mentions pgsql, Postgres has support for in-db extensibility using a variety of languages, of which Python is one; probably referring to this: https://www.postgresql.org/docs/current/plpython.html


Thanks!


The last time I worked with stored procs was with an Oracle 8 database, over 20 years ago. PL/SQL was awful.


I work with one extremely large application written in Oracle stored procedures. It is a pain, but pretty efficient.



