Even the first "myth" that was "busted" had me cringing, putting all the stored procedures into individual files and using a source control system to manage those. The person I replaced had done the same thing with 500+ stored procedures and still had never bothered to actually update the files. He just updated the stored procedures on the development server. Partly because he was a massive fuckup, but also because it was "out of band" management.
I ended up moving the code out of SourceAnywhere (an all around terrible version control system and company) and placing the code into Github, and cleaning up the database so that it could actually be dumped to an SQL file and then loaded from that file as an NUnit test, run by the CI server that I set up.
Even with these huge strides forward, it has taken me months to clean up and fix these problems, while still adding new features to the existing code and database. The experience has left me feeling that the low quality of code that crap coders can get away with in stored procedures, and the lack of testability still leads me to strongly avoid stored procedures because I ended up having to fix them with sweat, tears, and blood.
SQL Server is just plain awesome at this stuff. Schema is all scriptable (it even prompts you to let it script itself if you mess with things in the modeling tools). That means that you get automagic change scripts that tuck nicely into source control and are just waiting to be sucked in by your automated build and deploy system of choice.
Procs and everything else are all just scripts, which get their own files. So just like you have individual files for all 500 of your classes, you'll have 500 .sql files for those procedures. It's not difficult in any way.
It sounds like you ran into problems because you inherited a system from people who went out of their way to mess things up. Messed up things are hard to work with, regardless of what technology they live in. Try not to blame the database for the fact that you can build ugly things with it.
I love this quote:
UNIX was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. - D. Gwyn
I know you are attempting to highlight a weakness in my argument by highlighting how my reservations are fairly generic, but by the same token, I have never been lucky enough to see high quality PHP code. I don't think PHP encourages best practices, just like Stored Procedures. Either that, or PHP is just such a large community that the number of bad coders, even though it might be a constant percentage across all languages, but because PHP is so large I've run into a lot of them. Who knows?
While I have no doubt that there are some real superstars that can make beautiful PHP code and write wonderful stored procedures, there is still the 99% of programmers who can't and end up making unmaintainable nightmares.
There is a lot of bad PHP code because there are a lot of people who start hacking using PHP. However, any language somebody first starts hacking in will result in ugly code (you should have seen my Scheme code when I first started with that in college! :).
Unfortunately, the idea of "teaching languages" doesn't really work, because people want to continue using what they've already learned.
(BTW, I don't think you deserved downvotes. You are continuing the conversation, after all, not trolling. I really think we need to make downvoting more painful.)
"(BTW, I don't think you deserved downvotes. You are continuing the conversation, after all, not trolling. I really think we need to make downvoting more painful.)"
It would help if the voting was non-anonymous.
Anything can be mis-used. Whether or not we should use it should depend less on potential for mis-use, and whether it can be used correctly (and in a multi-developer environment, whether or not you can enforce correct usage in some way, either through technical or social means).
Don't delay - www.redgate.com
But "can" isn't the same as "will" -- in practice, there are practical issues that make this painful.
The other point is that using advanced Oracle/DB2 features to accomplish things means that you need to keep some RDBMS high priests around. That means lots of money.
It has a few drawbacks but mostly works ok.
You end up with the equivalent set of scripts in source control, but it seems like they'd be a lot more readable (they won't be all lumped on top of eachother and out of order, etc.). Is there another advantage to what you're doing that makes it worth deviating from the "best practice" way of doing things? Is, for example, the -merge functionality something that you use on a regular basis to roll backward?
Well, DB objects were put into version control much later than other source files, so I wanted a way to guarantee that what's there in source control matches what we actually have in the database server.
You're suggestion is that basically developers maintain the discipline of having changes in the files reflect what they run in the DB, right? I thought its significantly cleaner to have the database as the single source-of-truth and take everything from there.
but it seems like they'd be a lot more readable
Didn't quite get that part. The generated scripts are saved in one-per-file and reflect the name of the table/procedure/etc... that it contains.
Isn't that basically what the 'best practice' way of doing it is?
Merging is just to take the latest set of source control and bring your DB up to date with it.
How do you handle version control? I'm certainly interested in a better way to do it.
1. Yes, you can store your packages in SVN or whatnot, but two developers cannot work on the same package on the same database at the same time without stomping on each other's work.
2. Managing database changes: yes, you can analyze explicit dependencies for a given object, except that EXECUTE IMMEDIATE statements are not tracked. Nor, of course, are any queries done by your java webapps or whatnot.
3. "And PL/SQL has object oriented features too." We've moved away from most of the object oriented features because either we encountered bugs (in 10g) or because performance was abysmal.
4. EXECUTE IMMEDIATE -- I love the idea of execute immediate, but in practice, it's has problems. Performance tends to be poor (queries often need to be re-parsed) and as noted above, they break dependency tracking. If over-leveraged, they can also result in highly unreadable code.
5. Unit testing -- I've looked at several PL/SQL unit testing packages and they've always seemed to me to be fairly half baked. Unit testing procedures with side effects is also often inherently difficult because they rely on database state. I'm not against it, but it certainly isn't as easy as using jUnit. That said, if anyone has successfully used a PL/SQL unit testing package, I'd be curious to hear it.
5. Portability -- Irrelevant if you're using oracle specific features (which, you probably should if you're using Oracle).
None of this is to say that I'm against stored procedures. On the contrary, they provide some level of encapsulation, and I've generally found PL/SQL to be fairly transparent to understand. If I'm doing some complicated joins and/or several sequential DML statements with exception handling in a transaction, I'd much rather do it in PL/SQL than from an external language. I just think the article paints a picture that's a little to rosy.
Relating to #5: doesn't your objection to unit-testing code that has side-effects also apply to say Rails/Django/PHP code that also modifies state in a data store? I suppose you could argue that if you are using Rails/Django/PHP, you could mock your entire database access layer (maybe through your ORM), and that this would be very difficult inside PL/SQL. However, mocking the entire data access layer is pretty expensive, in any server-side environment if you want to get some actual testing done (as in nuances of resolving foreign key constraint issues, or how locking of rows/tables must work for your code to run concurrently and deadlock-free).
I am not arguing for or against PL/SQL. Just wondering why you think those points might have different obstacles in different environments.
Yes, but it's trivial to create both developer instances and an integ instance of Rails/Django/PHP so you don't end up trying to both modify code on the production server. Doing that with the db, especially if you want to test against "production scale" data sets, is at the very least going to be expensive.
If you need to bring up a "production scale" server for a developer to go along with their virtual machine with application code, then you have to do it no matter what environment the application code runs in. I imagine modifying data using experimental code on a production system is not really desirable no matter what the code is written in.
Pros: Stored Procedures tend to be fast if you know what you are doing. Stored Procedures are a good way to abstract away the "relational" differences between OO and SQL.
Cons: Stored Procedures are a good way to write a bunch of horrible to maintain code. Most developers spend their time writing front-middle end code and don't write SQL that well so they write horrible SPs. Even worse is someone who really understands SQL and writes almost the entire application in a SP and writes awful front-end code. If I had a dollar for every time I opened up a 2k+ line SP with nested queries, queries written as strings and 'eval'd or an awful number of joins I'd probably be arrested for suspicion of dealing drugs.
In almost every company there are a small group of people who call themselves "DBA's" and believe in writing unmaintainable code for speed and rally against those using ORM's or things that basically remove them from the equation .
Its crazy the amount of pain I've witnessed and felt because all of a sudden I realize I have no clue with a SP is doing or why its written a certain way. And having to sit down next to these DBA's so they can "explain"(mostly fix it because they have no time to explain) it to me.
My last job we finally got the OKAY to try out Entity Frame work(.net ORM) but the DBA literally rallied for weeks, showing how much faster SP's are. We finally came to an agreement where we would use entity with autogenerated SP's as a backend and fall back to SP's on slow calls. I'm still not sure why he was given a say but it is what it is and I don't work there anymore.
FWIW I work on a trading system that happily handles thousands of commits/sec, in Oracle, along with complex calculations of exposure and the like.
Which again was still being built and had no need to be a high performance machine. In fact none of the products that the company served up needed to be high performance. These guys were smart and bored and built huge monolithic systems full of complicated approaches to simple problems.
Edit: Further more nobody wanted to work with this guy on the team. He was an "expert" on everything and ran every single conversation about anything, including areas he had no experience in, into the ground because nobody's opinions were valid because he clearly knew everything. /rant
And when scaling matters, you often need to do the opposite. And scaling is often more important than speed.
Yes, often doesn't scale because the database is an extremely limited resource that generally doesn't scale horizontally.
> or you can move the data to the compute
Which is what I said, which allows you to scale computation horizontally.
> There is no situation ever
You just named one, however I see where you're misinterpreting me or I you; when I said do the opposite, I meant move the computing away from the database.
But anyway, that's what indexes are for in RDBMS, so you still have a point, even figuratively :)
> Myth #1: Stored procedures can't be version controlled
Of course they can, but unlike normal code which runs/builds directly from the file-system, a sproc tends to be written directly on the database server (since you won't get all those fancy refactoring/debugging tools unless you do so). Copying it to the fs before committing your changes is a step that people often forget, or one that requires yet more tools.
> Myth #2: Managing the impact of changes in the database is hard
> Myth #3: Database tools lack modern IDE features
His examples of basic refactoring tools hardly compares to something like ReSharper.
> Myth #5: Code in the database can’t be properly encapsulated and reused, you need an object-oriented language for that
They can be encapsulated, but it's a pain. Even just getting data from one stored procedure to another can involve hackery like opening a connection to the database from inside the calling procedure. See this link: http://www.sommarskog.se/share_data.html - none of these will work in even close to all situations.
No they don't. At least not more than once.
Changing something on your dev database and forgetting to save your change scripts and check them in is analogous to changing html directly on the production web server and forgetting to save it locally and check it in to source control. It's a very bad idea, and nobody would ever actually do that. But if you did do it, your changes would evaporate during the next deploy, and after a few minutes of venting to the sky you'd learn to never ever do that again.
Admittedly in most SQL editors, you can open up the file, then connect the IDE to the database. But it's hardly seamless (particularly if you have multiple databases), and certainly isn't like normal development where the filesystem is exactly what's running, and is the only user visible copy. The tools just all feel extremely immature compared to any IDE I've used.
And to clarify, I'm talking about the dev SQL server, not developing in production.
No matter what you use to modify your database, be it a modeling tool, the table editor that comes with the database, a fancy VS.NET plugin, or a mysql command prompt, you're still producing SQL change scripts. Those change scripts always need to get saved out to a file and checked into source control.
This is especially true if you're developing on a local database with multiple stages between you and production. You need scripts for everything so that your build can push things out to the next environment down the chain and eventually live.
If you're not doing that, I can't imagine the pain you're introducing for yourself.
When dealing with scale problems, the biggest asset is flexibility as you will likely need more than one kind of "database" and your logic layer will be the glue that holds it all together.
Stored procedures are simply too focused a tool and often tied to a particular database to offer any comprehensive solutions.
I've worked 2 places with Oracle stored procedures. At both places, they were a giant nonperformant mess of spaghetti that could never be cleaned up because changing things == breaking things. And since all of the application logic had been written as transactional SQL with lots of joins over normalized tables, it was gridlock heaven inside the DB and adding more cores to the single machine barely even helped.
Are you arguing we should all use COBOL and every other stupid-ass approach, as well?
Good apocryphal quote, it's impossible to convince someone of something if their job depends on it not being true.
"noSQL" is useful for for application layer - where caching and short-term storage are important; but most "noSQL" advocates I've seen seem to think that it should replace an RDBMS, when that couldn't be further from the truth. "noSQL" is useless as a structured backing store where RDBMS are used today, especially for large systems.
Show me an autosharding solution for an open source SQL database that doesn't involve paying tons of money. Until then I'm not convinced I should ditch MongoDB.
So how is there no room for NoSQL databases? I understand that most sites running NoSQL at the moment could just as well (or even better) run on Postgresql, even with 'biggish data', but is for 'enormous' data and near real time analysis of it, NoSQL not a proper solution over 'traditional' RDBMSs. Not trolling; really curious what people have to say.
Do I understand it correctly that Oracle 'Exadata' just uses fiber cables and such to threat the entire rack as 'one node' or does it work differently? How does Oracle scale? Any information about that (tech information, not opinions/interpretations of reality as OP posted)?
Just go with both. MongoDB can't make your data live & evolve the way an RDBMS can. You can't inject intelligence in your noSQL database, but you can sure extract data, a lot of it, a ton of it, with as many concurrent clients you wish.
If you can't find a use for a relational model in any project, then I'm not sure what to say. But if you can, maybe try a postgres & Mongo combo for the best of both worlds?
- Most web languages have a wealth of tools for testing, introspection and instrumentation that don't exist or are hacks in PL/SQL. Unit tests, basic logging, aspects that log function execution time, posting to error tracking & app metrics services, etc.
- PL/SQL is not expressive or concise. String processing, in particular, is a nightmare. This means more code, buggier code & thus reduced developer happiness.
- At scale, developers need to be parsimonious with joins in the data layer, otherwise partitioning is impossible. This is technically possible in PL/SQL, but runs completely against the grain of SQL, so I can't imagine this actually happening in the real world. This is probably my company's biggest source of technical debt.
- SQLPlus doesn't return a non-zero error code when certain errors happen when installing PL/SQL, so release scripts have to install then assert the user_errror table is empty. This is just ugly.
Lots of other things are fairly hand waving. Stored procedures can be version controlled... if you write some code to do it yourself, or buy RedGate's tool for that. Still, I've never worked in a shop that did as good a job at version controlling their SQL as they did their code. It's harder. Stored procedures can be refactored... but I'm not sure what tool is being used, and I don't want to have to evaluate 20 different tools to find the good one. I know I currently have better tools for writing compiled code than I do for writing database sprocs.
There are many valid reasons for doing as little as possible in the DB layer, most of which are listed here, and I just don't see this article as successfully debunking them.
In a world of light-weight HTTP services, micro-frameworks, and high-quality ORM, the making the 'right' architectural decisions seems downright simple -- or at least, it seems easier to get everyone involved to agree on how things should fit together.
I've recently been tasked with rewriting some of our scheduled processing code from what it is now (some very badly-written C#) to a collection of stored procedures and functions. The biggest issue is that the code does a lot of string processing, much of it very unusual. T-SQL seems to lack all of the facilities I require to actually do that job, and my boss won't give me even the leeway to write new .NET code for the database, which would make my job not only easier, but the project more maintainable - I think because a previous programmer poisoned the water with his abysmally bad code. (The company now fears all .NET code. It's that bad.)
I think the matter of maintainability outweighs all of the "myths" this article "busts," but the article addresses none of that.
At Loopt we used MSSQL from the beginning. Things didn't start this way (because I was young and foolish), but in the end:
* Each developer had their own local instance of the DB(s) for development.
* All schema was kept under source control (now using RedGate SQL Source Control, a fantastic product, and earlier as simple sql scripts in Hg/Svn and database projects in Visual Studio).
* Unit tests of the DB were done in code (C#). They set up their own initial state, ran against the local DB instance, and cleaned up after themselves by rolling back transactions.
* Commits in source control triggered automatic SQL updates in dev using RedGate SQL Compare (though for a time we used the SQL tooling in Visual Studio, also command line scriptable, for this task).
* Debugging performance was easy with performance monitor, set statistics (profile|io|time) on, and built in management views. Right click -> "Show what's expensive" kind of easy.
* Debugging functionality was easy with step-in line by line debugging from the C# code calling it.
* In many cases 5-10 round trips were avoided by keeping data local. Lots of network bandwidth was saved as well by only returning final results.
* Used correctly, stored procedures encourage correct handling of parameters and can increase security -- Web server code didn't have read or write access to the raw data. It was constrained to calling the stored procedures (no way to dump all password hashes, for example). Think of it as another layer of defense.
We tried to keep most of the complexity out of the DB because T-SQL is a terrible language. You'd have to be insane to write anything you didn't have to in it. To get optimal performance we occasionally had to do complex optimizations (table hints, indexed views, etc.), but all of that was still far easier than proper cache invalidation. Loopt ran with no cache because the DB worked just fine when tuned correctly.
My recent experience has actually pushed into the more extreme conclusion that even the database task itself is too complex for a single tool, so we use different kinds of databases to accomplish different aspects of the same db task.
If you add the logic layer to this picture it becomes apparent the most valuable asset on has is flexibility, keep your tools simple and focused.
The article says 'if you have millions of users, you should be able to afford decent hardware.' - that is misleading, because the cost of the hardware is miniscule compared to the cost of the Oracle licenses you would need.
My take: Stored procedures are a valid tool that belongs in everyone's toolset. I dislike them for one reason only: I'd rather define as much as possible in _one_ language. Having a codebase that is mixed between languages can be worth the tradeoff, but I'd like to avoid it where possible.
What about the fact that implementing a hash table is an advanced topic in pl/sql. You get dictionary style data tables that can only be indexed by binary integer?
Encapsulation is nice and everything ... but oh yea the entire database is visible to all the code all the time. What does data encapsulation even mean in that environment?
What about memory management? What do you do when your code is using too much ram? It seems much less clear than in java.
What about the fact that I have never been able to measure a significant runtime difference between pl/sql and raw jdbc code? Usually the cost of inserting rows into a heavily indexed table outweighs all other factors.
I'm not opposed to running code in the db but in practice everything feels like it is 10 years behind the state of the art.
This is brilliant timing actually, as I've been wondering lately why stored procedures are so maligned as they do seem to have a lot of advantages. As well as being informative and well-written this post is entertaining, I enjoyed it a lot :)
If you want to do meaningful version control for your DB, you need to do version control for all of those things too. It's not impossible, but it is very difficult, because databases also have data. If you want to add a row to your table, for example. you can't just change the code you used to create it. Instead, you need to write new code that modifies the existing schema by adding columns.
Also, I wouldn't be too much of a purist about the data/logic distinction. Sometimes doing a little processing on the data in a stored proc before returning it makes sense. On the other hand, I would hesitate to put very complex logic in a stored proc. So as usual, there are going to be grey areas. Do what makes sense.
Clearly a sales pitch for Oracle tools.
SP are just fine if your building Enterprise apps being used by a few hundred people.. you're environment is fixed and probably a handful of developers will ever touch the code.
Could you imagine writing a huge open source app using SP? Not every has the same toolset, and pretty much most of them won't have access to Oracle's debuggers/version control/etc.
Also, the argument of "If you have a few million users, you can afford to scale your database up" is complete bullshit. If I were the marketing department for Oracle, this would be the sort of seed idea I'd be trying to place in order to sell more licenses.
productivity and maintainability IMHO should be the most important decision makers when choosing what to use. what tools allows me to be the most productivity as a developer? who easy is it for others I work with to change what I've developed? optimize later.
they are very successful solutions developed in a variety of ways. let's not forget that. arguing over what is best is missing the point.
Could argue "fair use" but since lawyers know you cannot afford to go to court...
Anyone happen to know?
Some of this article is extremely true, a couple things are utter bullshit though.
I'll go with the bullshit first.
1) source control
Source control is mandatory, so you have to do it anyway.
But for any Oracle architecture where there is actual PLSQL work beyond funky table functions, meaning triggers, jobs and their ilk, you need to keep around a large schema with enough consistent data, especially with all the issues one can have with the way Oracle optimizes the execution plan of its queries depending on the statistics of each table. You need this information when you develop, so you need amounts of data that approach the order of magnitude of the production system, which is potentially huge.
It's simply impossible for every developer to enjoy their own instance, unless each dev has his own server, and even then: there's far too much maintenance to be made on an active Oracle database, and once a certain scale in the schema has been reached it's practically impossible to maintain an acceptable level of performance on a frigging desktop, especially when your devs are doing more than just PL/SQL, but, also, you know, work on the other tiers.
So in the end, you have one dev instance, two at most, and you keep versioning of the code just for looks because unless your system is trivial (only reversible transactions), any procedure, job or trigger that is executed on a database will massively change it forever. So in the end if your business logic lies at least partly on PL/SQL code, older versions of sources are only kept as a trace, not for proper versioning.
Besides, there are so many encoding issues and your database is so critical that you can't even risk anyone commiting a file rogue-style because if the sql was edited with textpad, notepad, notepad++, created with touch, right click + new, saved as from toad or came from a copy-paste of an e-mail sent from an ipad, the encoding will be different and will mess up tons of stuff. Things like accents in code comments can be interpreted badly and "eat" the following line feed character, which in turns comments the next line, which can lead to ultimate data corruption if it concerns a piece of code that is invoked by a trigger.
This is a risk that you don't, ever, want to take. So your PL/SQL SVN server is only for the two/three people that are entitled to actually write the Patch. Don't leave anything to chance here. The rest of the devs follow the chain of command of writing something, testing on the dev server, crashing it and eventually ruining it, send their script by email or dropbox or whatever to the devs in charge of the patch, who then use SVN to version it properly.
2) Unit testing
Unit testing is cool and all, but if all it does is essentially compile scripts. You can only unit-test functions if they are isolated enough, and isolation is all relative when the code is in the database. Let's unit test a select table function: first you disable triggers on this table, then you insert test data, then you do your test, then you delete the inserted data, then you reactivate the triggers. Happy? No! Why? Because it will take me a week to write all the test case inserts & delete!
I'm very caricatural here, but this is what happens in real cases. Cases where the unit test is very hard, much, much harder to design than the code it's supposed to test itself, not because of bad logic or whatever, but because in the end, it's mostly data that's required, and sometimes a simple select unit test requires a metric ton of one-line table inserts and deletes, and there can be a factorial quantity of test cases for tables with more than a couple dozen columns.
"Just peer review" is the sensible and agile way to do it. TDD is not.
The rest of the article is mostly spot on. I loved the power of PLSQL for this particular system (railway undertaking system). I lack distance still, but I have no idea how we would have handled it had the customer said "no oracle, use [insert NoSQL tech of your choice]". The kind of calculations that were required was: "okay this train here is 5mins late and that train there is 6mins late, we have 6 thousands concurrent wagons circulating in the system that need to be rerouted before any train, anywhere, leaves its station, just warn me asap if this incurs any delay in the deliveries of high priority merchandise within the week, but if possible, fix it automatically yourself".