Everything was controlled by a single Postgres database that made very heavy use of stored procedures: procedures that called other procedures, which in turn called others. It was one giant program inside the database. It took me a while to comprehend what he'd done, but it was really great. The database ran everything, and all those functions made sure that audit logs were kept and that calls were allowed for the user ID being passed in; some of the procedures even made external calls to APIs, doing things like fetching SSL certificates.
It was a magnificent monolith inside a database.
I worked on the periphery of the database (it was truly Lee's domain): he'd tell me what output to expect or what API to create, I'd code to his spec, and we'd just hook it up.
If any single artefact represents what he did at Cloudflare, it's that database. And he used to code it on a laptop we called "The Beast" because it was so crazily heavy and so overloaded with memory that he'd carry around a mini test Cloudflare wherever he went.
The interesting thing is that such cases can also actually be the huge nightmare you write about, in the sense that there are tradeoffs in the original design that originally made sense but become harder to deal with later, but also in the sense that their nonstandardness can be hard for more typical engineers to work with. (This is often referred to as the 'hiring problem', though that can be misleading.)

The trouble with the typicalness of more typical engineers is that they're steeped in a technical culture which they regard as the proper and obvious way to do things, but which really is mostly just the set of conventions they're used to. Those two things are easy to conflate when all your peers agree on what's proper and obvious (for example that you should never build a complex system entirely in Postgres stored procedures, or whatever the weird thing is).

Those conventional assumptions are at least part of what makes such systems a 'nightmare', because if you're working on a system that feels 'wrong', then you fight it rather than blend with it, and that makes it harder to maintain. Everything that's objectively wrong with it will loom larger in your experience than it would if you didn't have the feeling that it should never have been built that way in the first place. Worse, many things that aren't objectively wrong with it, in the sense that they aren't intrinsic to the system or the domain, end up blowing up into major issues because people find they can't work with this system in the way that they're used to, and it's too hard to abandon ways that you're used to; it always feels like the system is at fault. Meanwhile, everything it got right will fade into the background as you assume that any system that had been built 'properly' (i.e. the way you'd have done it) would have gotten all those things right too, at the same cost or better.
It's hard for the later engineer who experiences the 'nightmare' to fully appreciate that if it weren't for that weird, obviously 'wrong' system, they wouldn't have their job in the first place (nightmare and all), because the business that succeeded was built on the weird system.
I don't know anything about this case other than what John posted above, but I've seen something like it in other situations, and in fact I would say that PG and YC have this in common with the case of Lee and Cloudflare as John describes it—as remote from one another as Postgres procedures and custom Lisps must be in every other respect.
What I wish we could teach younger engineers is: when you run across a system like this, pause. Then, pause. Then pause again. The creativity at the core of it can be remarkable. Often you see things that one person did which would have required a large team in other hands, and therefore would never have happened otherwise. Maybe that's one reason why these systems tend to be found at the start of companies: more conventional approaches wouldn't have worked at a stage when hiring a team wasn't an option yet; and on the other hand, the teams you do hire at the later stages tend to have more conventional engineers whose minds (or rather, whose conditioning) can't grok why you'd ever have built something so weirdly.
Usually the later engineers bond with each other around how much they dislike the 'legacy' system, and lobby to replace it, and usually get their way in the end. That's a cycle-of-life thing, but it's also sad when we prefer our conventions and assumptions over seeing, and honoring, the creativity in what came before. Sometimes masterpieces get paved over because of that. Of course, not every nightmare system is an unappreciated masterpiece. But some are, and the bias in our industry is extremely strong toward perceiving them as nightmares. I've seen people take chainsaws to masterpieces without realizing that that's what they were doing, and it makes me sad to think of it. But it makes me happy to read what John wrote to honor Lee, because that's how I'd want someone to write and feel about something I'd built, if I built something like that.
Oh, you reminded me of something fun. I had a lot of freedom under Lee when I first joined Cloudflare (that's partly how Cloudflare became a Go user) and one of my first jobs was writing a new WAF service... I made an entire working version in Arc. I almost made all our customers write rules using it.
Yes, in some similar situations, some programmers will see things that way. But have some faith and respect for individuals' abilities to reason and apply their experience. As you implied, these kinds of systems are often optimized so that one person could run them.
So it's often a trade-off: "this would not be easy for a medium or a large team to work on, but it's the easiest thing for me as an individual to maintain by myself that gets the job done." And that's OK. And yes, it is natural to shift over time. But this is precisely because you now want 40 developers maintaining it instead of 1.
Overengineering is real. But architectural tradeoffs that don't work well for large teams are real too.
Just a fun fact: parts of Postgres (before the SQL days, I think) used to be written in Lisp. To this day we still have some weird function/macro names stemming from that (e.g. lcons).
* Once upon a time, parts of Postgres were written in Lisp and used real
* cons-cell lists for major data structures. When that code was rewritten
* in C, we initially had a faithful emulation of cons-cell lists, which
* unsurprisingly was a performance bottleneck. A couple of major rewrites
* later, these data structures are actually simple expansible arrays;
* but the "List" name and a lot of the notation survives.
However, in the example I'm thinking of, it was much more dysfunctional. The advantage of stored procedures there was that the organization's release process was so broken that it was much faster for sales and marketing to get features added by going through Carl, the DBA, than by submitting a request to the dev team: Carl could add the logic by sending one query instead of waiting out the months-long release process.
Carl was actually a real nice guy, although hard to get a meeting with due to his great importance and busy schedule.
I'm sure when you have a brilliant engineer like this you can make it work though.
In practice, the aversion to stored funcs/procs mostly arises from political concerns. Most devs don't understand databases very well, and in most companies, independent DBA teams and independent dev teams end up burning tons of energy fighting for turf.
One of the advantages of a startup is that the smart people are free to utilize the best tools in the toolbox for the problem at hand, rather than having to conform to the social mores around "don't use that tool because it's under the control of the other team and we'll never be allowed to touch it again".
(Tooling could be improved to make the development and debugging process more seamless.)
I mean, if you stick to a rigorous procedure of automatically exporting your stored procedures to files and committing them to git with messages... how is that different from any other development?
Serious question -- I've done tons of database work but only a little bit with stored procedures, so wondering if I'm missing a crucial detail here.
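For the export half of that workflow, one approach (a sketch, assuming your functions live in a hypothetical `app` schema) is to pull the full definitions straight out of the Postgres catalogs and redirect the output to a file you commit:

```sql
-- Emit a complete CREATE OR REPLACE FUNCTION statement for every plain
-- function in the app schema; pipe this through psql to a file and commit it.
SELECT pg_get_functiondef(p.oid)
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE n.nspname = 'app'
  AND p.prokind = 'f'   -- plain functions only (skip aggregates, windows, procs)
ORDER BY p.proname;
```

With that in a cron job or pre-release hook, the diff history in git looks much like any other codebase's.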
"An elegant weapon, for a more civilised age..."
That being said, there are lots of factors that influence how maintainable a specific stored procedure is. Team size, business needs, company culture, etc. Stored procedures get a bad rep from the places where they are train wrecks. YMMV.
How do you do that with a sproc? With normal application code, you can have the load balancer send an arbitrary percentage of traffic to the new instances of the service. A sproc binds the code and the data together. But you only have one database, so there's no way to do a gradual deploy of the code. That would be like doing a gradual deploy of a schema migration. It's only possible if you have a lot of databases.
Also, unless you're using MS SQL Server, debugging sprocs is a pain. The only workaround I've seen is to add a debug or loglevel arg to the sproc. But actually pausing execution and inspecting data? Good luck.
> How do you do that with a sproc?
Wrap the real sproc (well, the set of real sprocs, since old and new will sometimes coexist) with a wrapper sproc that probabilistically directs to the appropriate working sproc based on a table value (or anything else the sproc can read) that is updated to reflect the current stage in the conversion process, including any monitoring-based reversals.
Monitoring works by using the DB's error-reporting features to report errors back to the application, including information about which underlying implementation was handling the request.
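A minimal sketch of that wrapper pattern in PL/pgSQL (all names here are hypothetical; `compute_price_v1`/`_v2` stand in for the old and new implementations):

```sql
-- Stand-ins for the real old and new implementations.
CREATE FUNCTION compute_price_v1(order_id bigint) RETURNS numeric
  LANGUAGE sql AS $$ SELECT 100.0 $$;
CREATE FUNCTION compute_price_v2(order_id bigint) RETURNS numeric
  LANGUAGE sql AS $$ SELECT 99.0 $$;

-- Config table the wrapper reads; updating new_pct ramps (or reverses) the canary.
CREATE TABLE rollout (proc_name text PRIMARY KEY, new_pct numeric NOT NULL);
INSERT INTO rollout VALUES ('compute_price', 5);  -- start at 5%

-- The wrapper is what callers actually invoke.
CREATE FUNCTION compute_price(order_id bigint) RETURNS numeric
LANGUAGE plpgsql AS $$
DECLARE
  pct numeric;
BEGIN
  SELECT new_pct INTO pct FROM rollout WHERE proc_name = 'compute_price';
  IF random() * 100 < coalesce(pct, 0) THEN
    RETURN compute_price_v2(order_id);  -- canary path
  ELSE
    RETURN compute_price_v1(order_id);  -- stable path
  END IF;
END;
$$;

-- Ramp up later with: UPDATE rollout SET new_pct = 50 WHERE proc_name = 'compute_price';
```

It's extra machinery, as the sibling comment notes, but the routing decision lives in one row and can be reversed with a single UPDATE.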
It's possible to add more epicycles to try and correct for these deficiencies, but the truth is that sprocs are dangerously double-edged. Yes, your code runs next to your data, but to get parity with application code (canary deploys, debugging, logging, etc) you must reinvent lots of tooling.
> there's no way to do a gradual deploy of the code
There could be version numbers?
Also, for an API schema, only exposing views + stored procedures and hiding tables in private schemas makes the SQL easier to refactor. This is something we recommend at postgrest.org: http://postgrest.org/en/v7.0.0/schema_structure.html#schema-...
Sharing in case this is useful to you.
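A sketch of that layout (schema and role names are illustrative, not anything PostgREST mandates): the real table sits in a private schema, and the exposed view is the API contract, so the table can be refactored without breaking clients.

```sql
CREATE SCHEMA private;  -- real storage, not exposed
CREATE SCHEMA api;      -- the only schema the API role can see

CREATE TABLE private.users (
  id        bigserial PRIMARY KEY,
  email     text NOT NULL,
  pass_hash text NOT NULL
);

-- The view is the public contract; note it hides pass_hash entirely.
CREATE VIEW api.users AS
  SELECT id, email FROM private.users;

-- Assuming a web_user role for API connections:
GRANT USAGE  ON SCHEMA api   TO web_user;
GRANT SELECT ON api.users    TO web_user;
```

Later you can split or rename `private.users` and just keep the view's column list stable.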
Thanks for sharing. PostgREST looks like it's intended for people building a web app who don't want to write application-server code; instead, they just install a database?
Actually, it'd be nice if the https://postgrest.org homepage explained some different use cases :-) I had to think for a little while, and I'm still not sure I understand all the cases where PostgREST can be useful.
Edit: Now I see there are explanations here: https://postgrest.org/en/v7.0.0/ ("Motivation: Using PostgREST is an alternative to manual CRUD programming", etc.) if I just scroll down a bit. — I didn't do that until now; I probably stopped reading at the "Sponsors" text & images previously, or thought it was a documentation page.
This is quite cool: "Creating new views happens in SQL with known performance implications. A database administrator can now create an API from scratch with no custom programming".
It looks a bit like GraphQL also: https://postgrest.org/en/v7.0.0/api.html#resource-embedding
Nice that it's written in Haskell, which was my favorite language long ago :-)
(PS. http://postgrest.org redirects to http://postgrest.org/en/v6.0/, which is not the latest version (& not https); maybe you'd like to redirect to https://postgrest.org/en/v7.0.0/ instead.)
Besides the usual web apps, I've also used it for an IoT project (sensor logic in the DB) and a GIS project (PostGIS/pgRouting). Some users have also used it for open-data projects (OpenAPI is provided).
It's also great for internal admin apps and there are ready-made libs like https://github.com/Wellers0n/react-admin-postgrest-client.
Those are some of the use cases off the top of my head. In general, I think it's good for cases where you'd like to use the database as a "processing engine" instead of a "dumb storage".
The docs definitely need a lot of improvement. Thanks a lot for your feedback!
... Maybe that could be a way to "preview" a database migration before running it for real. There could be a view 'Some_table_view_v2' that shows how the table would look after an upcoming data migration, and the v2 app-server code would use that new view. — Then one could be more certain that the data migration will work fine.
(At the same time, one might need to be careful to let just a small fraction of the requests use the new view, in case the view is a bit, or a lot, slower than the real table.)
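A toy sketch of that idea (table and column names are made up): suppose an upcoming migration will split a `full_name` column, and you want v2 code to see the post-migration shape before the data actually moves.

```sql
-- Hypothetical existing table, pre-migration.
CREATE TABLE some_table (
  id        bigserial PRIMARY KEY,
  full_name text NOT NULL
);

-- The "preview" view shows the shape the table would have after migration.
CREATE VIEW some_table_view_v2 AS
SELECT id,
       split_part(full_name, ' ', 1) AS first_name,
       split_part(full_name, ' ', 2) AS last_name
FROM some_table;

-- v2 app code reads some_table_view_v2; once results look right (and fast
-- enough), run the real ALTER TABLE / backfill and drop the view.
```

The caveat above applies: the view recomputes `split_part` on every read, so it can be slower than the migrated columns would be.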
Honestly though, it feels like the argument you are making here is that it's impossible in general to incrementally roll out database changes? Sure, it's definitely not as simple or easy as spinning up new cloud application instances and diverting http requests, but it's not impossible or infeasible by any means. The problem with databases is that the best solutions are always so subjective and context specific. It's very hard to write out simple, easy-to-understand, "rules," or "concepts," or whatever. Every answer, to every hard problem, is "it depends."
What are we rolling out here? Does the sproc write data or read data? Both? From the same table/schema/server/etc? Does it move data around? How much data? Do you already have read-replicas? How is data replicated to those? Do you have any parallel-write setups with eventual consistency? What parts of the higher level application/platform utilize this part of the database that we want to change? Can we spin up a new DB instance, with the new sproc, and establish data replication/change resolution with the legacy system as we are rolling out the change? Should we move everything to the new system and set up the compat piece as a _forward_ compat on the legacy side instead? etc, etc, etc.
>But actually pausing execution and inspecting data? Good luck.
There are plenty of ways to do this. Copy-paste the code of the proc into your DB IDE of choice and execute it line by line. If the proc uses temp tables or @variables or anything else, you should easily be able to instantiate those locally too and see what the values look like. Wrap it in a transaction and roll back, and you can run this stuff over and over on prod if you really want to be sure of what it does, as long as you aren't locking stuff in the DB improperly.
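The transaction-and-rollback trick looks like this (the `accounts` table is a made-up stand-in for whatever the proc touches):

```sql
BEGIN;

-- hypothetical fragment lifted from inside the procedure under inspection
UPDATE accounts SET balance = balance - 100 WHERE id = 42;

-- inspect intermediate state exactly as the proc would see it mid-flight
SELECT id, balance FROM accounts WHERE id = 42;

ROLLBACK;  -- nothing persists; safe to repeat as often as needed
```

Just be aware the open transaction holds locks on the rows it touched until the ROLLBACK, which is the "locking stuff improperly" caveat above.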
There's also the idea of writing sprocs in a more testable/verifiable way. Maybe you have progress tracking tables with status id columns and timestamps. Maybe you split up a long sproc into phases which each write their outputs into "intermediate tables" in some sort of "processing" schema. Maybe you write things to files somewhere to be consumed by some federated monitoring system. Idk, I could think of random ideas all day here. A lot of them probably sound dumb at face value, but I really don't think any of these goals sound infeasible.
1. Verbosity and fitness for purpose: while there is a kind of beauty in pure set-based SQL querying, stored procedures by their nature tend to be a procedural paradigm, and in my opinion, they are not particularly well suited to it. They are insanely verbose by today's standards, and doing anything typically requires several times more code than the equivalent in most modern languages (your mileage may vary with different database engines!). This results in large, monolithic SPs, which are difficult to read, understand, and test. The "standard library" is generally far less extensive than mainstream languages, leading to sub-standard, verbose, slow implementations (string concatenations and manipulations are a good example in MS SQL).
2. Code navigation, dependency management: the inverse of point 1 - in general, the tooling around databases for code searching, navigation, etc, is quite woeful in comparison to non database languages. In MS SQL, dependency management is so unreliable, that the official tooling misses several classes of dependencies, just depending on the type of syntax you happened to use. In your question, you start by saying that if you have the necessary tooling already obtained, it's all good. In my experience, rarely are even "barely good enough" tools in place, let alone anything close to what exists in other environments.
3. Unit testing: since databases tend to be stateful, the very concept of unit testing is a bit of an alien one. People don't tend to write SPs with unit tests in mind, and at least in the MS SQL world, the few "unit testing" frameworks that did exist tended to be incredibly painful to use - to the point that they were generally not worth it. Brittle, expensive integration tests are not the same.
4. Execution plan caching: the way databases handle caching of SP execution plans leaves much to be desired - generally the first set of parameters that an SP gets called with ends up determining the execution plan for all future calls. Even in well managed code bases, this ends up being an issue, occasionally needing a DBA to manually go in, identify a misbehaving execution plan, and evict it. Vary-by-param and no-cache options do not fully solve this. While the need for such mechanisms is obvious, it is surprising that these have been a pain for 20+ years. Now, execution plan caching also happens for normal SQL too, so my point is rather that the culture around SPs tends to have an unreasonable fear of dynamic SQL, which would be far more appropriate _where the query actually is of a dynamic nature!_
I can imagine people replying "but clearly, you're doing things that you shouldn't be - that's not what SPs are for!". And I agree, in a large part that is my point too. But then, what exactly should SPs be used for?
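In Postgres terms, point 4 shows up as the generic-vs-custom plan cache: a static query inside PL/pgSQL can settle on a cached generic plan that is wrong for skewed parameters, while `EXECUTE` plans with the actual value each call. A sketch (the `orders` table is hypothetical):

```sql
-- Assumes an orders(status text, ...) table exists.
CREATE FUNCTION orders_for(p_status text) RETURNS bigint
LANGUAGE plpgsql AS $$
DECLARE
  n bigint;
BEGIN
  -- Dynamic SQL: re-planned with the real parameter value on each call,
  -- so a rare status and a dominant status can get different plans.
  EXECUTE 'SELECT count(*) FROM orders WHERE status = $1'
    INTO n USING p_status;
  RETURN n;
END;
$$;

-- Since Postgres 12 you can also steer the cache directly per session:
-- SET plan_cache_mode = force_custom_plan;
```

Which of the two costs more (planning every call vs. occasionally running a bad generic plan) is workload-dependent, which is part of why this has stayed painful for decades.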
It also acted as a barrier: the core team, who understood the business well, worked on the stored procedures, keeping things fast and stable.
Another team built the tech to interface with the database, and the UX/UI was then implemented in various technologies in parallel and changed/got recycled every few years to whatever was in fashion.
The difference in culture between the two teams was huge, it seemed to work out well in their case.
In the database you don't need to do any of this. The data is there. The type information is there.
1) coding via repl - e.g. one emacs buffer has stored procedure code, send functions over via keyboard shortcut to SQL repl buffer connected to DB to evaluate it 'live' (jupyter notebooks are basically a copy of this mode of operation in a more consumer-friendly package)
2) With regard to Python in the DB: since the OP mentions pgsql, note that PostgreSQL supports in-database extensibility using a variety of languages, Python being one of them: https://www.postgresql.org/docs/current/plpython.html (probably what was being referred to).
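A minimal PL/Python example (the function name and logic are invented for illustration; installing the extension requires superuser rights, since plpython3u is an untrusted language):

```sql
CREATE EXTENSION IF NOT EXISTS plpython3u;

-- A small Python function living inside the database.
CREATE FUNCTION slugify(title text) RETURNS text
LANGUAGE plpython3u AS $$
import re
# lowercase, collapse runs of non-alphanumerics to a single dash
return re.sub(r'[^a-z0-9]+', '-', title.lower()).strip('-')
$$;

-- SELECT slugify('Hello, World!');  -- returns hello-world
```

Once defined, it's callable from any query or stored procedure like a native function.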