I also very much liked the idea of using `Range` headers for pagination (which should be out-of-band but rarely is).
I'm not convinced that this is the future of web development, but it's a nice refreshing view that contains a few very practical ideas.
Even if you don't care about this at all, spend the 12 minutes to watch the introductory presentation.
(1 - In fact with me it's pretty much a matter of policy: any external system should access data via views named for the external system. Then, as inconvenient as it sometimes is, Postgres' dependency mechanism will keep me right about which fields in which tables are depended upon by which external systems.)
I'm also unsure of using the DB's authentication system. While approachable, I'm at a place where imho, for most systems there is a conflation of users, accounts, and logins. IMHO, a user may only have a single account, but a modern system should support multiple logins... Supporting social/jwt/oauth style logins in addition to local database backed logins is generally the best option for public facing sites/applications.. and even then jwt/oauth will allow for more seamless integration with internal SSO options.
Then again, this type of approach may work well if you want to be able to use it as an additional abstraction of your database access from a UI backing API.
It is up to the developer to use this responsibly. I don't think the creator is suggesting to drop PostgreSQL instances directly onto the net. This would be ideal in a layered backend, and in fact, makes systems way more composable.
It's like all those web framework inventors didn't read past chapter 2 of their database manuals. So they wrote a whole pile of code that forces you to add semantics in another language elsewhere in your code in a language that makes impedance stark. PostgreSQL is advanced technology. Whatever you might consider doing in your CRUD software, PostgreSQL has a neat solution. You can extend SQL, add new types, use PL/SQL in a bunch of different languages, background workers, triggers, constraints, permissions. Obviously there are limits but you don't reinvent web servers because Apache doesn't transcode video on the fly. Well, you do if you're whoever makes Rubby on Rails.
The argument that you don't want to write any code that locks you to a database is some stunning lack of awareness, as you decide to lock yourself into the tsunami of unpredictability that is web frameworks to ward off the evil of being locked into a 20 year database product built on some pretty sound theoretical foundations.
Web developers really took the whole "let's make more work for ourselves" idea and ran with it all the way to the bank.
You'd have to pay me a million dollars a year to do web development.
I worked on a rails app that handled a billion requests per day. The problem isn't performance of the web framework, those are easy to load balance and split into C or cache when you need it. The problem is scaling your database, keeping your data secure, and iterating to meet business goals with a growing codebase and infrastructure. A mess of stored procedures would restrain you from doing all three.
And I know, I worked on a codebase in 1999 that did this because of the "performance gains". It ended up bricking the project due to inability to iterate.
Your argument has a non sequitur right here. A mess of [foo] is a mess; the layer it is in does not matter; the language it is in does not matter. A mess of application layer code is equally effective in preventing scale, security and effectiveness.
The original post is right. Web developers treat their databases poorly. A database is an interface to your data that maintains integrity. Maintaining integrity almost always means stored procedures, as some validation is not expressible as relational integrity and basic type validation.
Now, if you are at the point where your database fully guarantees integrity of data going in and coming out, a REST interface is a small step away. This project is very welcome.
 The typical web developer treats a database as a data store. It is also a data store, but a well designed database is much more than than.
GP is correct. Methods for scaling/optimizing the application layer are clear and well-known. Scaling the data layer is a huge challenge. This is why the market is filled with snake oil databases promising linear scalability and perfect consistency/reliability, etc.
Cassandra, HBase, CouchDB etc even MongoDB have built in scalability as a first order priority from day one and have been largely successful at it e.g. iCloud, EA Online, PSN. Databases like this are a nightmare to work with for smaller datasets but work incredibly well with larger ones.
It's always a shame to see HN act like you scale vertically and magically every problem is solved.
When this is seen (and IME it's a pretty minority opinion) I think it's there as a reaction to the massive overuse and hype regarding a lot of newer-gen DBs. There's absolutely no doubt that there are good uses for them, but those cases are pretty niche compared to the level of their uptake.
Its only in the past couple years they really started mentioning the fact it was "tunable consistency" blatantly rather rather than burying it in a couple places in the manual.
"Than (what will certainly be given the expressiveness and level of abstraction they provide) a mess of stored procedures."
Sure, it's difficult to scale ACID. But if what you need is a way to serialize objects, you'll probably be better off with something like Gemstone/GLASS, a document store or some other kind of object database?
If your problem domain actually fits working with structured data, then using an SQL system makes a lot of sense. The obvious example for "web scale" here is Stackoverflow. Sure their architecture has grown a little since it was 2xIIS+2xSQL Server -- but they got pretty far on just that.
I'd wager that because in almost every case I've ever seen it's true. You just don't tend to see every table in a normalized dataset bearing the traffic load.
If that is the case, rolling that particular piece of data out to a more easily scalable store will largely fix the problem, if caching, async writes and buffered writes didn't already.
Everything else can very easily sit in PostgreSQL, avoid race conditions, maintain data integrity, have permissions controlled and be accessed from multiple languages directly without requiring an API layer. Then you can use a foreign data wrapper to let PG query that other data source (mongo, couchbase, redis, whatever) and join the results with the other data in the database just like it's all one bit happy dataset.
As another poster said, a mess is a mess and honestly I don't know why he takes a shot at Rails since Rails has some of the best first class support for leveraging PostgreSQL features these days.
Wrote an entire post about it: http://www.brightball.com/ruby-postgresql/rails-gems-to-unlo...
There is only one database for everything in the business? Of course it doesn't scale. The problem you describe stems from solving every business request by adding yet another table to 'the' database.
It's a monolithic solution. It doesn't matter if you use database features or not. There is no difference between a mess of stored procedures and a mess of business logic classes. It's still a mess.
The problem boils down to the "the database" idea described earlier. There are very, very few normalized datasets that I've ever seen that have write scaling concerns on more than 1 or two tables.
Move those to a separate datastore that is built for it and you've largely solved your problem. Postgres can even connect to outside datastores to run queries against them for sake of reporting.
I always have a caching strategy (usually varnish in front of nginx) with Rails unless it's literally only supporting a handful of users, and anytime I need to support non-cacheable hits like writes to more than 50 or so concurrents I consider swapping in Node or Go or something reasonably performant to handle just those writes.
Lately I've been looking into Elixir as a Rails alternative for APIs for performance and scalability. I am very intrigued by a PostgreSQL based REST API.
The other 95% of your code can be slow Rails. You know those pages where a user adds another email address, or where they report a comment as being hateful, or where they select what language they want the app to be in, or where you have your teams page, or your business partners page, or your API docs and key registration / invalidation.
The database doesn't scale without pain though. You have joins, you're going to need to get rid of them. You have one table on a machine, you're going to need to split it. You have speedy reliable writes, you are going to have to either make due with inconsistency and possibly have a whole strategy to clean up the data after the fact or lose the speediness.
I'm intrigued about shuffling the serialization of JSON to Postgres, but that is different than what the OP was talking about.
I'm not ever going to argue for heavy stored procedure usage but there are definitely times when it makes sense and more times still when using the features in your database instead of setting up multiple different standalone systems for pubsub, search, json data, etc when your database can do it all makes sense.
It's very similar to the "you can always switch the slow parts" point with Rails to move a part to Go. You can do it all in PostgreSQL and then when you actually reach a point where you've grown it into a bottleneck, move it out.
Postgres isn't SQL Server and it isn't Oracle and it isn't MySQL. It's Postgres. It's a tool that you choose because of it fits your needs, not because somebody told you it was a good database. You choose it as part of your stack. If you are using PostgreSQL because you wanted a dumb datastore then you chose the wrong database and should probably reavaluate your options. That's like getting a Lamborghini to make grocery runs.
When you do a 1-to-many join and return the same fields very many times, do the binary drivers optimize that or is it return many times? With JSON serialization (or serializing to arrays), you only get the one row.
> They are a part of the stack, yes, but they aren't what makes the application scale to billions of requests.
These are not just part of the stack, these are critical components within the stack.
They are the core components of Facebook. Normal people understand that the characteristics of Facebook's architecture is unique to just Facebook. They can get away with sharding/colocating data that nobody else can. The rest of us have a tonne of integrated data that requires complex joins (whether at the application or database layer).
We use stored procedures. Not by choice; this is a legacy we're stuck with. It is nothing but an unadulterated disaster of a technology regardless of what you use it for. I'm talking 45,000 stored procedures here. 2000 tables. TiB of data. 50,000 requests/second across SOAP/web/desktop etc. It's hell.
Problems with stored procedures:
1) Performance. More code running in the hard to scale-out black box. You're just hanging yourself with hardware and/or license costs in the long run.
2) Maintenance. All database objects are stateful i.e. they have to be loaded to work. The sheer complexity of managing that on top of the table state results in massive cost. Add to that tooling, version control costs as well. Have you tried merging a stored procedure in a language which has no compiler and very loose verification?
3) Orthoganality. Nothing inside the relational model matches the logical concepts in your application. Think of transactions, domain modelling etc.
4) Duplication. You still have to write something in the application to talk to every single one of those stored procedures and map it to and from parameters and back to collections.
5) Transaction scoping. Do you know how expensive it is to introduce distributed transactions? Well it's a ton more cash when EVERYTHING is inside that black box.
6) Lock in. Your stored procedures aren't portable. Good luck trying to shift vendor in the future when you hit a brick wall.
Now I know it's popular to bash on Rails and I wouldn't use it personally but there are people using the same model on top of other platforms, like us.
Sorry but databases are just a hole to put your shit in when you want it out of memory. If you start investing too much in all of the specific functionality you're hanging yourself.
You've got to be joking, right?
Data is an enterprise's single biggest asset. A robust, consistent and performant store is vital. SPs can be written as garbage like any other logic, but in the right hands they are a perfectly valid tool for providing useful access to complex data.
The key benefit is probably familiarity.
I'm not saying they're an invalid tool, merely just a single tool in a batbelt of a million solutions.
> 3) Orthoganality
You've introduced that by treating a relational store as a "hole to put your shit in". It's not fair to blame the database for that.
> 4) Duplication
Not with the project that is the topic of this thread you don't.
> 6) Lock in
As I mentioned in my original comment, you can be locked to your database or you can be locked to your ORM/DAO/ActiveRecord/DB client library or whatever it is you're using.
Not using database features isn't the key to heaven anymore than using them is the key to hell. I just meant to point out that in my experience they are underused massively.
Orthogonality: I haven't introduced anything here. Very rarely does any conceptual model of reality fit into the relational model. It's more imperative than that. Everything is usually crudely shoehorned into it because it's a compromise that people are barely willing to make or because they don't understand how to model a system properly.
Duplication: there is duplication in there. The versioning is very inadequate and API stability is the key to success on this. Plus also, this is a minor part of the application to consider. It's no different to issuing SQL. The protocol is different, that is all.
Lock in: There is no ORM lock-in past the platform. If you isolate everything properly i.e. use command-query-separation then this is a non issue. It's trivial to replace the ORM. You can even do it piecemeal. We've done it. I yanked out bastardisd ADO and EF out and stuck Nhibernate in. If you couple all your logic into the database, no banana. That luxury goes out of the window.
They may be underused, but when your vendor pulls a 26% price hike on half a million quid's worth of kit, can you afford to bend over and take it?
It's a tradeoff, but not one I'm willing to make on medium to large scale systems where there is a capital risk.
One of the main features of an ORM has been abstraction from the intrinsic properties of that database. ORM was a concept that was popularised by the original Obj-C/EOF/WebObjects back in the day which supported retrieving data from any database you pointed it at. And it fully supported you enhancing it's access layer with database specific features.
1. Switching your database is not easy with or without stored procedures because it will involve down time for the application while the data is migrated, then verifying that it works as expected in the new database with that ORM. You hope for the best, but it's always more complicated to switch a database.
2. The ORM tends to lock you into the application stack. Switching a part of your application from something like Rails to Go when you need to performance tune is significantly easier and more common than switching the entire database backing the whole system.
Beyond those two are the harsh realities of working with large datasets. As soon as a dataset it non-trivially small relying on the application to do core work on it becomes self destructive by adding network latency and in many cases object creation (check some Rails benchmarks on object creation costs). It becomes a big deal.
This is not to say that doing the bulk of work in the ORM is bad or that everything should be done in the database, it's a matter of balance. The only dangerous opinions on the matter are the "purist to the detriment of all else."
Verifying uniqueness, exclusion and maintaining data integrity should be the job of the database in most cases. That is what it's good at. Performing actual business logic on that data should not unless there is a significant performance based reason for it in most cases.
In Postgres the "stored procedure" thing is a little bit different because they're significantly more valuable thanks to the volume of functionality built into PG. Everything is basically a function in PG.
In PG, you can use functions to create indexes and when the function is used in a where clause that index will be used. You can use functions to create constraints, unique indexes and even notify outside process that are listening of changes in the database with pubsub.
PG is a heck of a lot more than just a "datastore" and that's why these discussions are important. If you want a generic dumb datastore...there are databases built for that. PG is built for a whole lot more than that.
Here's a very incomplete summary: http://www.brightball.com/postgresql/why-should-you-learn-po...
2) This is not a reasonable objection, I could replace "loaded" with "compiled" and your non-argument would make just as much sense. The alternative does not make the "complexity" go away, it just distributes across multiple languages in your application and database.
4) Another non-argument against stored procedures. For example, suppose I have a table "time_series(series_id INT, tstamp TIMESTAMP, val NUMERIC)". A common need would be to accumulate all points (tstamp, val) associated with a series_id. Following your logic, you either end up with tons of con the application side sending similar variations of a query that looks like "SELECT tstamp, val FROM time_series WHERE series_id = $x ORDER BY tstamp" or you create one application-level module that acts as an abstraction around a query like that. In the first case, you're doing massive duplication. In the second case, you've essentially made a stored procedure that is distributed across your database and application and all the issues you raised of having to write something to talk to it apply just the same.
5) Again, no. DBs are precisely to place to deal with issues like this as they have means for dealing with things like foreign tables. The application-level alternative just means re-inventing it all yourself and you're probably going to make a lot more mistakes and write a lot more code that way.
6) Non-argument (applies just as well to Rails, Python, Linux, etc.)
While we're on this subject, RDBMS are no better at storing data than any other technology out there. In fact when you start thinking abstractly like this, other tech such as Riak makes sense for a lot of workloads.
The only real benefits of RDBMS' are fixed schema, fungibility of staff, the ability to issue completely random queries and get a result in a reasonable amount of time and the proliferation of ORMs.
 Caveated on insane design decisions like MyISAM storage engine and MongoDB as a whole.
(One might argue, of course, that SQL in itself turned out to be a lousy interface protocol for relational data processing, and that it caused a lot of pain to begin with. But that's a different topic.)
I don't really have a firm opinion either way, but it's not as clear cut as you are making out.
1) Stored procedures aren't that much faster than issuing plain SQL over the connection. The main performance bottlenecks in a RDBMS are cache and IOPS. Regardless of where you execute those, they are all inside that black box after the query is parsed. You also get the added pain of cached query plans which fuck up performance when the query optimiser makes assumptions about table statistics and stores them with a stored procedure. (SQL Server and Oracle at least are guilty of this).
2) The only place I've had SQL injection attacks in the last few years is where people have used dynamic SQL inside stored procedures (sp_executesql) and not escaped them properly. Outside of that, both ORM and proper parameter binding make this entirely irrelevant.
It's completely clear cut IMHO.
That was once the case, but every major DB now caches execution plans for commonly run queries.
How does that help you vs. prepared statements in any typical language?
I've seen SQL statements in SPs that are concatenated (|| in oracle) to varchar fields from a table and I thought that would be just as vulnerable?
But don't let that get in your way of ignorantly generalising about web developers.
Here is interesting talk about database migrations and stored procedures and unit tests: http://www.pgcon.org/2013/schedule/events/615.en.html
Also, DB procedures are not easy to "debug" in the traditional way, but SQL client is basically the first REPL every programmer becomes familiar with. You can easily step stored procedure by running it's commands one by one, unless it's fancy Oracle forall loop with cursor or something (and the cursor select can still be selected as normal).
Also, databases tend to have more strong data types than programming languages in general so putting constraints in DB means, that bad data are not savable in the system.
Exactly, because those things are extremely important to how code gets shipped and delivers value to the business. Stored procedures become a huge risk to future development which ultimately means it's a risk to the businesses ability to deliver value. What happens when you need to change DB vendors because the business has been so successful that you've outgrown a relational database? You have to rewrite the ENTIRE MC portion of your MVC application. Why would someone ever do this?
Avoid state at all costs. Stored procedures are stateful. Schema and migrations is pain enough already.
Write me a check constraint that validates an email address being put in a varchar column and reports back a sensible message which can be bound to an entry field with metadata about the error.
Write me a constraint and key arrangement which is unique across two and three columns in separate groups.
No. You're wrong.
What do you mean by that? How is having a bunch of queries in a stored procedure more "stateful" than having the same queries in the application?
> Write me a check constraint that validates an email address being put in a varchar column and reports back a sensible message which can be bound to an entry field with metadata about the error.
Postgres gives you metadata about the error, though the error message will still be a generic "CHECK constraint violated" or some such.
> Write me a constraint and key arrangement which is unique across two and three columns in separate groups.
I'm not sure what you want to see based on that description, but surely you're not advocating enforcing unique constraints in the application?
If I have to load the stored procedure into the persistence engine then that step is required. This is no more stateful than queries in the application but it means that the relevant state in both the application and the database engine needs to be reloaded and constantly sychronised. Ergo, two times the work.
CHECK constraint violated is no good for humans. Prevention is better than cure here.
Why shouldn't I enforce unique constraints in the application?
1. Open a transaction
2. Get a user by name from the ORM.
3. Exists? Tell user that the username is already registered.
4. Doesn't exist? Save new User instance.
5. Commit transaction.
Steps 2 and 3 can be as arbitrarily complicated as you need them to be, are fully testable and cheap with anything that uses MVCC.
You should to both. For all the reasons you mention, it's often cleaner to just do it in the application especially when you can use a framework with a simple "validate_uniqueness" flag.
But, what you're describing is also the very definition of a race condition. It's the same reason you don't increment counters by retrieving them, adding 1 to it and then saving the number back to the database and instead pass in an increment command.
Check it in the application but let the database make sure it doesn't get violated in a race condition. There's a significant amount of either/or in this entire conversation (not just you, the whole thread) when the database absolutely can and should be leveraged for certain things.
It's extremism and purism where the problems get introduced (in both directions).
I mean currently my dataset is so small I don't need stored procedures, I barely do anything more than CRUD. Okay I have a bigger GROUP BY query but that is all, and at one point I load a HUGE dataset into my application memory (1000 rows) but that works REALLY REALLY fast in scala and I tried to create a stored procedure around it, but I failed, and the application code uses the dataset to generate a big calculation. Currently I just have a Map<String, Map<String, List<Row>> which is easy accessible and usable for my calculation. I mean I could've done similar with stored procedures but the performance gains are really low.
Preserving data integrity tends to be a much more worthy use case for database logic than retrieval display.
This tightly couples your database to your application. You can no longer guarantee that your database is reliable when used otherwise.
1. Insert new User instance
2. Unique constraint violation? Tell user that the username is already registered.
First problem is that the SQL standard provides no way to make this work portably on any standards-compliant database. So right there you are going to have to code to the database to one degree or another.
So, let's say you want to make this work in Postgres. Now, you'll need to be using Postgres 9.0 at least; otherwise your uniqueness constraint won't be a uniqueness constraint.
Try this, in any version of Postgres. Open up two psql sessions. In one, run a `create table unique (x text);`. Then run `begin isolation level repeatable read; select * from unique where x = 'foo';` in one of the sessions. Repeat those two commands in the other sessions.
Neither session sees 'foo'. So now both can go ahead and run `insert into unique values ('foo'); commit;`. Both transactions will succeed, and you can confirm that there are now two instances of 'foo' in the table.
In fact, `begin isolation level serializable` in PostgreSQL 9.0 or later is the minimum isolation level to make this work. And, you will need retry logic around the transaction in case of a serialization failure. (Perhaps your DB access layer or language would hide this latter detail from you, or perhaps not.)
In PostgreSQL 8.4 and before, serializable and repeatable read were equivalent, and both were still SQL standards compliant. In PostgreSQL 9.0, the repeatable read isolation level stayed the same, while the serializable isolation level was strengthened.
Unless you can accept a certain level of degraded accuracy by using a probabilistic construct such as a Bloom filter, by far the biggest cost of maintaining uniqueness is the index. And you'll need that index whether you use the database or the application to enforce uniqueness.
And, judiciously pushing computation onto a database can actually be cheaper for the database as well as its clients. This scenario is likely to be one of those situations.
Well, sure, an application should respond to DB errors by presenting appropriate messages on the UI, just like any other errors it encounters. You should only see "CHECK constraint violated" if you are bypassing the app and using the DB. Otherwise, you should see something nice provided by the app.
> Why shouldn't I enforce unique constraints in the application?
Because you should do it in the database whether or not you do it in the application, and then once you have, well, DRY.
Stored procedures are no more state than application code is.
What does "unique across two and three columns in separate groups" mean? I get that its something more complex than a simple multicolumn uniqueness constraint, but not what it is supposed to do.
I suspect that whatever it is can be done with PostgreSQL -- possibly using the (relatively) new exclusion constraints -- but I can't quite be sure without more clarity on what you mean.
Why? I have never had problem with any of these. SPs is just imperative code like any other imperative code.
In my experience, every stored procedure that is larger than 2-3 lines is a headache.
Of course, the reality is that people who use them reasonably do exist and are probably in the majority. You just rarely hear them talk about it because I suspect that they hold the same views about Stored Procedures, Constraints and any other DBMS feature as they do with any other software development tool ie. Use the right one for the job.
Sometimes, the logic has to be in database, because it is single point of truth and because many application servers are hard to synchronize with regards to "you get max 3 attempts at login" or "you have to have enough balance to do bank transfer".
Sometimes, lot of your logic is in database, because database can do a lot of things in really fast and practical way, like aggregation and reporting and various data exploration tasks.
Sometimes, the database is just dumb store of object oriented data.
It depends on the application.
You can blame MySQL 4.1 for that :(
Most people who call themselves "web developers" haven't even heard of PostgreSQL, or even if they've heard of it, have no use for it because their usual clients are stuck with MySQL-only web hosts who have only just managed to upgrade to PHP 5.3.
Ditto for any theme or plugin that tries to be compatible with all versions of PHP that WordPress itself supports.
IMHO the term "developer" should not be applied to those that can just ship but rather those who can also build.
It doesn't matter if they are web, desktop, nor low-systems developer actually
Most auto-called web developers are just "web masters"
No True Scotsman puts sugar on his porridge, and No True Developer just installs WordPress.
On the other hand, even Rails and Django encourage you to use the ORM whenever possible, so even a "developer" who builds apps on a modern framework is unlikely to be familiar with advanced SQL features.
Do people really consider ./configure && make && make install and its equivalents to be development now?
Last time I checked the WP install process was something like...
1. download zip, extract
2. change your some config file
3. upload whole folder using FTP
4. go to /install or something, and from there..
5. click, click, click, edit text, click, click, click ...
That was only if the web developer was in hardcore mode, otherwise it was just _log into cpanel to use one-click installer_
We don't need another flame war here.
And yes, most of the web is build on Wordpress/PHP - but you don't need to be a developer to install Wordpress.
I'm just stating what I believe to be a fact: that the majority of web developers in this world never think of PostgreSQL as an option. I don't care whether that's a logical thing for them to think. It's just a fact, whether I like it or not.
If you think I'm wrong about the facts, please feel free to open a phone book in any part of the world other than the Bay Area, call up a decent sample of people who self-identify as web developers, and find out what percentage of them have ever heard of, let alone used, PostgreSQL.
I'm just stating what I believe to be a fact...
It's just a fact, whether I like it or not.
(I happen to agree with your opinion, but the semantics here bug me.)
Anecdotal evidence: I've interacted with dozens of other people who call themselves web developers over the years, and most of them (outside of Silicon Valley) have never used PostgreSQL, nor any advanced features of SQL in any other RDBMS.
Objective evidence: the large market share of WordPress, Drupal, and other content management systems that don't use any advanced database features; as well as the large market share of frameworks such as Rails, Django, and Laravel that encourage developers to stick with the ORM and not care about advanced database features.
PostgREST is great because it lets you kickstart a CRUD application with ease.
I'm mainly a node.js developer nowadays and I'm using some frameworks to kickstart APIs for my clients - and then I jump in and add features.
What I really want is a solution to build a API server which deals with authentication, exposing my models through REST and other boring and repetitive stuff.
In this way I don't have to focus on everything, but just on the specific problem I'm solving.
I don't think there is a valid solution out there right now.
That's why I'm contributing to PostgREST and I hope to see even more features coming out of it (eg: better authentication, maybe with 3rd party logins).
1.) The sprocs become insanely complex because they have to be shard aware.
2.) You slowly start moving more of your code that was in sprocs to your application so now you've got two problems.
As for hiring, put out an ad for an engineer with pl/sql knowledge better yet put out an ad for someone who wants to learn and use pl/sql. Good luck finding enough of those people to get any significant work done.
Regardless of the other points people brought up...
Sharding a database with stored procedures and constraints as you advise is a nightmare because you now have a completely separate deployment process [deploying stored procedures, if you think this doesn't require a deployment process across a sharded infrastructure...I have no words].
Using an internal web framework is much, much easier than maintaining two separate deployment processes. Especially when one of those processes has to take down nodes to avoid some shards having different stored procedures than other shards.
APIs should be about encapsulating business logic. Databases should be about storing data in a reliable, predictable way.
We do. At least some of us, and honestly, it's not something I think about as being exceptional. I don't always use them, but I much prefer having a nice API of SPs to use rather than having to have custom SQL all over the place. DRY applies to writing queries just as much.
And what happens when you start applying complex business rules that needs to scale? So many questions about this approach...
Throw in ETAGs and you can do a lot of cool stuff to make things scale really well.
But sending SQL from the client is dangerous, and in this forum you'll get some interesting looks suggesting that.
More here in the value of services (Yegge's rant): https://plus.google.com/+RipRowan/posts/eVeouesvaVX
A database IS a service. It's just not a 'restful web service'. Making it one doesn't gain you any useful abstraction for SOA.
Quite unclear why you'd layer all the HTTP and marshalling overhead on backend service calls, other than to ensure you always have slow requests.
I do. Most apps we write (government stuff) are CRUD apps that mostly handle data from the user, and then apply some logic to that data.
You can write custom logic in stored procs no problem. It's even a lot faster than doing it in java/php/ruby/.net or whatever because there are a lot less layers between you and the data, and the procedural language you use inside the the db is explicitly made and optimized for this purpose.
"The argument that you don't want to write any code that locks you to a database is some stunning lack of awareness, as you decide to lock yourself into the tsunami of unpredictability that is web frameworks to ward off the evil of being locked into a 20 year database product built on some pretty sound theoretical foundations."
Does Oracle or Sybase own Postgres? Nope.
There's nobody with a legal hand on your throat.
I wouldn't use it a for a "normal" web app, for exactly the reasons you state.
I find it intriguing, but maybe I am just one generation behind and you were to say:
"Been there done that. This strong dependency on the database was really not a good idea in the long run because... "
In the old days, DB vendor independence was considered important, because the Free databases were not considered good enough for Enterprise use. It had to be Oracle or IBM. Even if you were starting out on MySQL, you wanted to retain portability for when you moved to Oracle. Hence building ORMs as an abstraction layer allowing vendor independence and more Enterprise-y Java.
There's also the "demarcation" issue. Traditionally databases were administered by change-averse DBAs who didn't want you stealing their CPU cycles to run your stored procedures, and were generally extremely averse to change.
This is probably just as significant as vendor independence or technical issues. But I remember it a little differently.
In the old days you'd have programmers who did the code and the DBAs who were tasked with making sure the data wasn't broken. They would write SPs, use constraints and triggers etc. Depending on the team programmers might do these as well or instead, but it was the DBAs domain ultimately.
Then programmers wrote ORMs and started doing all that stuff in code and somehow managed to seize control of data validation and semantics from the DBAs.
it does same kinda stuff + capable of loading Node.js modules, compatible with MongoLab's REST API and Firebase's real-time API
Wont this lock you in with very hard coupling between your db schema and public REST API?
As a result, the coupling to the schema does seem to be concerning as moving everything to SQL views appears to be a high amount of overhead if you're already successfully relying on an ORM.
From the applications side, it should be completely transparent to replace base tables with views.
From the database side, assuming you keep the views and the base tables in the same schema, for the initial transition, you just need to rename the base tables, and then create views with the names of the original tables referencing them; they'll be simple views, and so, in postgres, automatically updatable, so you won't even need to explicitly define update logic.
It depends on what the DB does, really - eshop app + db might be the first one, while operational data warehouse would be the second. Usually you end up with something in the middle - like the eshop app, that provides views with summarized data to be ETLed to reporting.
If you want to use Haskell (easy binary deploys, fast and strict type safety for cheaper maintenance, less bugs, easy refactoring) you could have a look at servant.
I'd say if the OPTIONS would return a JSON Schema (+ RAML/Swagger) instead of the json-fied DDL, it would be even more awesome. With a bit of code generation this would be super-quick to integrate in the frontend then.
If you are using this as a web server persistence backend, I would agree with the first, more or less accept the second and reject the third. HTTP + JSON serialisation are way slower for that kind of job.
If you are just exposing the database using only the Postgres, in that case is interesting, however, I have concerns about how more complex business logics would work with such a CRUD view.
CRUD over HTTP (or an "access API") should be a first step, not your end goal.
Last thing: is Denodo open source? It is not listed at "why use Denodo", so I guess not...
Second, I'm an author of a distributed database (VC backed, open-source), so I'd like to respond to some of opinions on databases voiced in this thread - particularly in the branched discussions. If you aren't interested in those responses, you can ignore the rest of my comment.
- "You'd have to pay me a million dollars a year to do web development." Don't worry, most webdev jobs are about a tenth of that. If inflation goes up even a little bit...
- "The problem is scaling your database", I can confirm that this is my experience as well. But there is a very specific reason for that. Most databases are designed to be Strongly Consistent (of the CAP Theorem) and thus use Master-Slave architecture. This ultimately requires having a centralized server to handle all your writes, and this becomes extraordinarily prone to failure. To solve this, I looked into Master-Master (or Peer-to-Peer / Decentralized) algorithms for my http://gunDB.io/ database. Point being, I'm siding with @3pt14159 in this thread.
- "Sorry but databases are just a hole to put your shit in when you want it out of memory", I write a database and... uh, I unfortunately kind of have to agree, probably at the cost of making fun of my own product. You see, the reason why is because most databases now a days are doing the same thing - they keep the active data set in memory and then have some fancy flush mechanism to a journal on disk and then do some cleanup/compression/reorganizing of the disk snapshot with some cool Fractal Tree or whatever. But it does not matter how well you optimize your Big O queries... if the data isn't in memory, it is going to be slow (to see why, zoom in on this photo http://i.imgur.com/X1Hi1.gif ). You just can't get the performance (or scale) without preloading things into RAM, so if your database doesn't do that... well what @batou said.
Overall, I urge you to listen to @3pt14159 and @batou. PostgreSQL is undeniably awesome, but please don't fanboy yourself into ignorance. Machines and systems have their limitations, and you can't get around them by throwing more black boxes at it - your app will still break and so will your fanboyness.
In fact, I think I love any musical reference in software :-)
Some of the best ideas and tools on HN are met with so much negativity it reminds me of Reddit, where the small percentage of people who get off on putting others down so they can feel good about themselves dominate the comments.
Good on you cdjk, this is exactly what I was looking for. Thank you!
I have made the mistake of spending too much time slavishly implementing some pattern only to figure out later that it was just serving my need to implement the pattern, versus just getting the job done with a simple procedural script.
"It provides a cleaner, more standards-compliant, faster API than you are likely to write from scratch."
"If you're used to servers written in interpreted languages (or named after precious gems), prepare to be pleasantly surprised by PostgREST performance."
And it would be great to have 3rd party logins working in Kong
I know it's tempting to do that, but spend some time thinking of your data and what do you want to expose.
I personally used some prefix for that, such as "service_" or "public_". All stored procedures (in PostgreSQL speak: "user-defined functions") that have this prefix are accessed from the client. Everything else is internal. Of course, it would be even nicer if the REST framework would enforce that convention.
This is especially nice with JSON aggregation and SUB SELECTs, where you can directly aggregate your objects and lists of sub objects within the DB query, and generate the whole JSON result directly in the DB.
Should add some header to say that it's JSON, or add a .json file extension for the main page data.
Very interesting project though.
The server sends `Content-Type: application/json` and provides no header related to caching. Browsers that do anything but fetching the resource again are not spec compliant.
Also, the only browser to ever look at the extension of a file in the URL was IE (https://msdn.microsoft.com/en-us/library/ms775147(v=vs.85).a...) and they have long since stopped doing that as all it was doing was cause security issues and screw with web developers.
And yes, the header seems correct, Checking, that's the API demo, the GUI is separate. Thus the confusion.
Yeah, headers are the right way to do it, but a different path is also the right way to do it, and extensions like .json can make that simpler in some cases. Or /api/ prefixes.
Accept: text/html, application/json;q=0.8
If a RESTful client can only accept JSON, it should send
Adding '.json', path specifiers etc is a kludge; it requires web server support to actually work properly (it requires the webserver to send the right MIMEtype in the header). The browsers ignore it (I think).
For example, occasionally you'll click on an image (I see it about once a year) and you get back a stream of weird unicode. This is because the webserver is returning the .jpg as text/html or whatever, and the browser is rendering it as such.
Likewise, when you go on https://raw.githubusercontent.com/resume/resume.github.com/m... you are presented with plaintext because it has returned as Content-type: text/plain, even though the data is actually HTML.
there's no specification that says it must be so.
Browsers shouldn't care about file extensions, ever, but some versions of IE did.
I'd forgotten about the days when we used to add something like &ft=.pdf to the end of querystrings so IE would recognise the file as PDF. I can't remember the other tricks, but there was a whole raft of things you'd do to force downloading of content, or not.