Also, the lack of author identity was frowned upon.
Lastly, the conjecture and attitude towards Microsoft lacks some substance.
Conclusion: The author is free to write whatever he likes, but take this resource with a pinch of salt.
I use both Postgres and MS SQL Server professionally and whilst philosophically I prefer Postgres, for practical reasons I truly prefer MS SQL Server, if only because of its excellent development tools.
The other is the barrier to entry. I was trying to get NAV2015 working for a demo using some data for a project, I spent at least 20 hours trying to figure out who to pay (Azure comes with it installed but no license -- why not just sell a bundled license into the hourly cost?) -- eventually gave up and set up OpenERP in like 30 minutes, spent 3 hours adding the feature I needed and was done. NAV is unquestionably better for this project, but I'm not going to enter into a support contract with a company just to get a license.
I'm guessing it comes down to virtual cores and such, right? Although I'm surprised how many stupid-simple mistakes people make just by not reading even the basics. (Wait MSDN includes several Office keys, so obviously they don't want us to buy Office licenses for administrative assistants.)
And no because you still get nailed -- it's not like your reseller is the one being audited, it's you.
You're supposed to get CALs for printers accessing MS servers.
Don't ever use an MS server for DHCP, every single client needs a CAL. How am I supposed to even predict that if we allow personal devices on the guest network?
It's beyond frustrating - and the products are great, they just make it impossible to actually ever be in compliance and you need to spend an insane amount of time just handling licenses. It's really the reason I avoid MS as much as possible, and I really love their dev tools (Visual Studio is beyond awesome, I love C#.net) its just too much hassle.
I'd be surprised if you get "nailed" if a reseller went over a scenario and licensed you. Worst case is if they can prove maliciousness and fine you. If the reseller calculated things and came to a reasonable (but lower) number I'd be surprised if that's gonna really hurt.
Agreed though that MS's licensing is terribly annoying. But it's a lower Total Cost of Ownership, right?!
So there was not any substancial critique and responce to the specific points he makes?
> Unless otherwise stated I am referring to PostgreSQL 9.3 and MS SQL Server 2014
Which makes his opinions timeless and gives context. A date on it would be welcome, but just from the software version you can figure out if it applies to you or your scenario.
I read this months ago, so just went over it very quickly this time, and the first thing I did was to look for a date at the top and bottom.
- The tools included like Management Studio are just great. This is totally next level to the Postgres tools.
- Using multiple CPU cores for a single query is really helpfull in my scenario.
- Easy continous backup to the cloud.
- Included Integration Services, Analysis services are also easy and enugh for my usage.
- Is running on linux, to it's cheaper. Even when we use SQL Server as Bizspark (for free now), the Azure Windows VM for it costs us much more, than VM with Linux. And of course when we want to cluster our DB, Postgres is even cheaper.
- Great JSON support. There are parts of our project where it's helpfull.
- Better configurability, like WAL, checkpoints etc. We have much better write performance on postgres than in sql server (probably just our case).
The other things really do not much difference. Both DB's can be extended, and extensions may be written in many languages. Both achieve great overall performance, both have strong community and a lot of documentation.
If you can expense the cost of the license you should definitely check it out.
Plus it's completely cross-platform and runs on Linux, Windows and OS X.
Which sometimes made me wonder if, with Postgres becoming more popular by the day, there would be a market for a management tool cheaper than Navicat.
One thing I like about pgAdmin3 is the ability to only execute the query you have highlighted and also the scratchpad. It's nice for exploratory querying. In Navicat, I like switching between various tabs where pgAdmin3 makes me switch between windows.
Both have some hilarious idiosyncrasies. When one locks up for no reason, I generally just alt-tab to the other.
Recently I was using pgAdmin3 and set my production database color to red. I happened to notice that the color didn't repaint back to green when I switched to my dev database. And Navicat seems to consistently hang whenever the network dies or burps mid-action, requiring a command-line kill.
So yeah, it'll have better INSERT performance, but the queries will be slower. There's really no way around that. A large data set on disk that's out of order will always be slower than the one that's in-order.
IMO MSSQL makes the right call for the vast majority of use-cases.
PostgreSQL has -no- materialized views (I stand corrected! Introduced in v9.3). No view update support. No partitioned view support. No sane backup/restore process. It's a great database if your primary concern is licensing cost. But if your primary concern is operational cost and even just multi-gigabyte data sets it's really frustratingly rudimentary compared to what MSSQL delivered over a decade ago.
But that's just me. It's free. And I'm thankful for that. I just find it really frustrating that PostgreSQL supports querying on JSON, but doesn't support backing up and restoring the database in binary format.
> No partitioned view support
> No sane backup/restore process
> doesn't support backing up and restoring the database in binary format
http://www.postgresql.org/docs/9.4/static/app-pgbasebackup.h... is half of what you want, but I expect you're wanting something I don't expect.
Which seems like a reasonable baseline expectation if you were to ask a lay-person what they'd imagine a database backup to be. pgbasebackup can't do that without some serious constraints (like, you only have a single database in your PGDATA).
Table clustering is mostly betting on "I think most access is going to happen using this columns" which might be good choice or it might not, but it's not something that should happen implicitly or by the only choice at all.
I think Markus (2) explains this much better than I do, so I'll just link his text.
(1): which is pretty stupid name, btw - Oracle call it "index organized tables" which is much clearer way to describe the concept.
Like materialized views, automatic update support for simple views was introduced in 9.3.
(this makes me realize that postgres can do some pretty cool ETL by itself - and you can express a lot of it in simple SQL terms with foreign data wrappers and materalized views. One on-demand refreshed materialized view can be able to fetch data from many different sources, transform it and provide local access)
MS SQL server, on the other hand, can't do materialized views that update on demand, which makes them hard to use on complex and large datasets, and it makes the database less predictable (inserting one row may be simple operation or it may change 10 tables, you'll never know).
Materialized views in MS SQL have rather bad support for aggregation functions and other computations - you can't compute average in them, for example, as AVG is not supported and doing SUM/COUNT is also not supported because of the division, etc.
I'd say that materialized views in MS SQL are more nice for caching or something like that, but you can't very well use them for analytics or ETL, like you usually see in Oracle, though Oracle can do both on-demand and automatic refresh, of course.
EDIT: the materialized views thing was one of my most amusing experiences with SQL Server. I tried to use materialized view instead of trigger for updating data dependent table in geographic application and I needed to compute average and sums for zoomed out map layers.
The way you add materialized view is by itself confusing - you add a view, then make index on that view, then select from it, but you have to use special keyword so the system uses the materialized view and not just the view.
I tried to use AVG but was told by the server "AVG is not supported, use SUM/COUNT". I tried SUM/COUNT but was told "COUNT is not supported, use COUNT_BIG". I tried SUM/COUNT_BIG and was finally told "using arithmetic in materialized view is not supported".
To this day, I have no idea why one error message suggested doing something that's impossible.
On Oracle, this is also useful because it works kind of like creating and dropping partition, instead of doing delete then insert, and that's better for the DB because of how Oracle deals with tablespace and blocks (and Oracle's truncate is non-transactional, I think), but that doesn't do much on SQL Server vs. postgres discussion :)
Well, you could just leave everything as a heap in MsSQL, someone might throw things at you, but you could do it.
Or you can pad the index to allow for inserts (or do some partitioning), and schedule some rebuilds to augment the fragmentation.
CSV support - if you do that much CSV extract/transform/load (or indeed, any kind of ETL work), use an ETL tool. SQL Server comes with SQL Server Integration Services for that kind of thing.
Ergonomics of dropping and creating tables and stored procedures - the author's example is probably the toughest way I can think of to drop a table. It's easier to check sys.all_objects (which will catch anything - functions, views, procs, etc).
Operating system choice - well, in 2015, if you're going to mention that, you should be thinking cloud services anyway. They're both available in the cloud - and at which point, who cares what OS it runs on?
Goes on and on. I'm a SQL Server guy, and if I was going to make a list of how to choose a relational database platform, here's what I'd probably list:
* Up-front cost (license, maintenance)
* Ease of finding staff, and their costs
* Ease of finding tooling, and its cost
* Ease of finding scalable patterns (proven ways to grow it)
I don't think either platform is a loser on those counts - it's hard to go wrong with either one.
Obviously Windows would be IIS, which means overall the OS choice is a nice one. The fact that you can choose which one to host with, which means you get something comfortable.
Have you seen Windows Core and PowerShell? An increasing number of admins are doing just that - it's how we work at Stack Overflow, for example.
SQL Server is probably the best case, since the UI is a wrapper around TSQL and they expose the scripts at any point.
Then there's fun stuff. Like Invoke-WebRequest going super slow because it has a terribly implemented progress bar. That's right: out of the box, you can't download files with the shipped downloader, because it shows progress (like curl) and increases transfer times by a couple orders of magnitude. Come on.
I just do SQL Server consulting for 'em from time to time. They're ridiculously sharp guys, so they don't need me much anymore.
Why? CSVs are portable and much easier to deal with (when exported correctly, which PostgreSQL does and MS SQL does NOT).
They were usually processed using Oracle external tables though.
Just finished a multi-terabyte data warehouse project where all data is loaded using SSIS and ControlM and it works great! The dev process was also easy and trouble-free.
Anyways, yes, it's very powerful and an excellent tool but I always have to describe it as awfulsome. SSISDB has made great improvements to the setup and promotion of objects which previously could be a pretty big headache (especially when you're not the one deploying etc). Functions for the various tasks/settings can be hidden and it can take a while to convince newcomers that the options are available, you just have to...poke around a bit. It comes with a substantial amount of controls.
The metadata/conversion of data can be frustrating but automation greater improves this with a quick controlling framework and the sys tables, it also makes moving data across hundreds of tables a breeze. It is very fast and very stable and I'm glad it's getting improvements and having focused development.
I'm so tired of this. Just because you don't already know Powershell and are too lazy to learn doesn't mean it doesn't exist. A know-nothing Windows user might as well say, "If I want to select and move arbitrary files on Windows, I can point and control-click in seconds. How would you do that in the Linux CLI?" In both instances, it comes across as ignorant to anyone who actually knows the ecosystem being derided.
It's been a while, but I am pretty sure all you have to do is put GO before/after the CREATE PROCEDURE. I'm absolutely positive there's some way around it, because I've run many, many such scripts on SQL Server without manual intervention.
EDIT: Yes, I just fired up a VM and ran this and got the expected results with no errors.
CREATE DATABASE HackerNews;
CREATE TABLE dbo.Test (id int IDENTITY(1, 1), name varchar(20));
INSERT INTO dbo.Test (name) VALUES ('Amezarak');
CREATE PROCEDURE dbo.sp_QueryTest
SELECT * FROM dbo.Test;
In my personal opinion, MSSQL (including the tooling around it) is awesome and possibly one of Microsoft's best products. I actually regret not getting to use it anymore since a) my current job doesn't use it and b) I'm not shelling out for a license for my side projects. Postgres is definitely my next pick, though, and both are miles ahead of MySQL. I understand that MySQL is "good enough" for most people, but it's always painful going back to it and inevitably remembering almost all my favorite features don't exist. I'm stuck with it on a side project and it's frustrating.
There's some other things like that in the post, some coming down to what seems like syntactic sugar and feels more like a difference that doesn't really need to be noted (like the types/dropping piece).
I work at a MSSQL shop, and all of us know and are convinced that PG is better. Most of us use PG for our side projects and some of the dev's don't even use windows that much, with some custom MSSQL plugins we've built for linux. However, the problem still exists, of how do you port a ton of Databases over to Postgres? We're a multi-tenancy shop, so close to zero downtime is very important, and it would get really complicated if we ran multiple production versions of our app, one with a PG adapter and one with a MSSQL adapter.
A cursory Google search will show that you aren't going to get a ton of help converting them, not to mention the overhead of switching 20 developers from MSSQL to PG overnight.
This website is however excellent at convincing people to use PG over MSSQL. Perhaps, given the direction that Microsoft is going, they'll open source MSSQL overnight and it will become something competitively similar to PG in the long run.
TempDB, at least last I looked, doesn't really scale too well.
Just an observation.
If you're already a licensed MSSQL customer, I'm not sure what advantages PostgreSQL could really have compared to it's slower performance and much higher operational costs.
MSSQL doesn't have a featured called materialized views, it has a feature called indexed views which happens to implement a fairly robust version of the general DB pattern called materialized view.
PG has less mature materialized views (which are called materialized views) starting from 9.3, with additional features in 9.4, and which can be expected to mature further over at least the 9.5 and 9.6 cycles, from what I've seen various places. Certainly, SQL server is ahead on materialized views, but its not an "SQL server has them, Postgres doesn't" kind of thing.
Personally, I saw materialized views that refresh on demand used much more than materialized views that refresh automatically, even though Oracle can be both.
When would anyone really need PG's version of this feature over creating a table from a query? Out of all the usages of "I need an indexed/materialized view", PG covers essentially none of them, at least that I can think of.
What specialized syntax for materialized views and related configuration provides is some combination of clarity of intent, developer convenience, performance improvements over the most naïve update implementations, and additional information for the planner to use to optimize queries.
Postgres current level of support for named-as-such MVs provides basically the first two benefits for certain situations (and a little bit of the third in certain situations via REFRESH MATERIALIZED VIEW CONCURRENTLY.)
The backup/restore procedure in PostgreSQL (last I looked, latest I've used is 9.2) is just statement generation. It's not a binary backup. So it's bound by INSERT performance.
Which is rage inducing when you have even just tens of millions of rows.
MSSQL'97 could do a backup/restore in a small fraction of the time. And if you just wanted to move data from Server A to Server B? You could link servers and just SELECT INTO.
CLUSTER instructs PostgreSQL to cluster the table specified by table_name based on the index specified by index_name. The index must already have been defined on table_name.
When a table is clustered, it is physically reordered based on the index information. Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered. That is, no attempt is made to store new or updated rows according to their index order. (If one wishes, one can periodically recluster by issuing the command again. Also, setting the table's FILLFACTOR storage parameter to less than 100% can aid in preserving cluster ordering during updates, since updated rows are kept on the same page if enough space is available there.)
When a table is clustered, PostgreSQL remembers which index it was clustered by. The form CLUSTER table_name reclusters the table using the same index as before. You can also use the CLUSTER or SET WITHOUT CLUSTER forms of ALTER TABLE to set the index to be used for future cluster operations, or to clear any previous setting.
In practice, you can recover many of the advantages of clustered indexes in PostgreSQL (along with the disadvantages) with a covering index eligible for https://wiki.postgresql.org/wiki/Index-only_scans.
What if I fat finger a table drop for Client A? With pgbasebackup the best I could do is set up an entirely new server and jump through some hoops. With MSSQL I could just restore the single client database to a new database, very quickly, and SELECT INTO the missing data from clientdb_a_backup to clientdb_a.
It's just a lot simpler, a lot quicker, and a lot more flexible.
PostgreSQL has better performance for some loads (as far as I know there is no clear winner in performance), better GIS support, writable CTEs, excellent built-in JSON and array support, an amazing CLI, and personally I think PostgreSQL has nicer built-in types and functions.
PostgreSQL supports the RETURNING clause, allowing UPDATE, INSERT and DELETE statements to return values from affected rows. This is elegant and useful. MS SQL Server has the OUTPUT clause, which requires a separate table variable definition to function. This is clunky and inconvenient and forces a programmer to create and maintain unnecessary boilerplate code.
So I have the equivalent of the following in one of my projects:
UPDATE sometable SET someField = @parameter
OUTPUT Inserted.field1, Inserted.field2
Between the overloading of "WITH" to seemingly every new feature they add, to the way semicolon terminators are only sometimes required (so I just automatically start all CTE declarations as ";WITH" to be sure it works), to the ways they break encapsulation by incorrectly scoping cursor names and limiting INSERT...EXEC so that it can't be nested, programming it at an advanced level is an acquired taste at best.
1) I asked about a specific thing in the article. I think the author is wrong.
2) If 1 holds true I think that the article might be questionable - at least I don't trust the rest now. I am no expert on All Things SQL Server and if I spot a flawed point as a random dev in something that was supposedly written by someone working with databases for a decade , maybe people that actually know a lot more spot .. more flaws.
1: I'm a dev for longer than that and certainly know my way around sql and databases, but focus matters. The author claims "I know a fair bit about these things" and considers databases his main area of expertise it seems.
But I just tested what you claimed here and it definitely works exactly the way the RETURNING clause would.
Don't get me wrong, I have 'political' problems with MSSQL, but those shouldn't be disguised as technical ones.
Well, not anymore. Now I just refuse to do shit. "We can't change it" turns into "we contacted the original developer and made him fix his broken shit" when I turn into a complete pain in the ass.
And I don't need a certification to understand the crazy MS licensing (I discovered recently that yes there is a MS certification program for their licenses as it's so byzantine).
For what it's worth, Atwood chose Postgres for his Discourse product. The FAQ says the product is "Uncompromisingly open source." And "There is only one version of Discourse--the awesome version. There's no super secret special paid commercial version with better or more complete features."
Actually, if you drop the constraints first, you don't have to worry about the table-drop order. I do that on both Postgres and SQL Server, because I use a tool that generates the specific scripts for me.
I don't think manually writing DDL scripts is any way to manage an RDBMS in the modern era, especially if you care about "repeatability and auditability" like he claims.
Queries, yes, but schema constructs, no. SQL is just a terrible language for it. I'm not saying you have to use a full ORM, but every SQL engine I've ever encountered will let you do a lot of nutty things and won't complain about it, or won't complain until you actually try to query it.
>This is an advantage for MS SQL Server whenever you're running a query which is CPU-bound and not IO-bound. In real-life data analytics this happens approximately once every three blue moons. On those very rare, very specific occasions when CPU power is truly the bottleneck, you almost certainly should be using something other than an RDBMS. RDBMSes are not for number crunching.
As a data analyst, the tools to be comparing shouldn't be RDBMSs.
>As I said in the banner and the intro, I am comparing these databases from the point of view of a data analyst, because I'm a data analyst and I use them for data analysis. I know about SSRS, SSAS, in-memory column stores and so on, but I haven't mentioned them because I don't use them (or equivalent features). Yes, this means this is not a comprehensive comparison of the two databases, and I never said it would be. It also means that if you care mostly about OLTP or data warehousing, you might not find this document very helpful.
As for this part, data warehousing, OLAP services, and reporting services (lower case on purpose here) are a very large sub-domain within data analytics. I am not saying that these are everything in analytics, but especially from an enterprise standpoint, these make up the bulk of it. From a tooling and full-stack standpoint, Microsoft is quite strong in this segment.
I hate how MSSQL has gone back on their word to let customers benefit from CPU enhancements. They mocked Oracle for charging by type and core... And now they do the same.
Also, multiple result sets was a sorely missed feature when porting stuff to PG. But record types made up for it.
Of course now, the dominating factor for a lot of people is "Will a hosting provider (Azure, AWS, Google) just run this for me, automatically giving me perfect backups and restore and HA?" SQL Azure, as I understand, not only does backups, but allows you to restore to arbitrary points in time. Sure it's just keeping txlogs, but that sounds hot when sold like that. For many cases, I can see ditching the privacy issues of "cloud" to get those features with zero capex or management overhead.
It does, but they take about hour to hour. Restore takes about that much time. Changing performance level of DB takes similar time.
I'm not sure hot quick Amazon or Google is, but I know lot of ops guys who are sorely disappointed by Azure slowness.
I tried Google Compute Engine on a whim and I'm totally in love (despite a deep distrust of Google). Everything is fast. VMs load in seconds. And it's simple - no inane UI, no crazy leftover bits from being a PaaS. No idiotic design for SSD. And as a kicker, it's half the price for compute. Oh, and SSH client right there in the portal? It's such a small thing but really made me happy.
It's just that Google doesn't do startup outreach and give us cash and court us. Unless you're in an "established" incubator or yc or something. Whereas MS is super friendly and does everything they can.
OTOH, when not on a Microsoft stack, it has been a constant source of hard to track down problems and bugs. I remember trying to use Microsoft's JDBC driver a few years ago in a Java web app and running into all kinds of nasty and unbelievable bugs in the actual driver they shipped.
This is pretty ridiculous and quite a bit insulting to the many people who do work for vendors. I work in a team that has a number of engineers supplied by vendors and they are generally fantastic. Highly qualified, more than happy to assist with tasks that aren't to do with their product and they really care about the overall project outcomes.
Open source has forced vendors to make sure that every project that uses their products are a success.
> On the other hand, commercial software is often designed by committee, written in cube farms and developed without proper guidance or inspiration
Again more nonsense. Not every open source project is some beacon of perfection and neither is every commercial product some poorly designed piece of junk. Anyone that believes otherwise is just being disingenuous.
Someone really needs to explain to me why PostgreSQL users in particular seem to always want to bash the competition in order to justify their technology choice. It's been going on for years against MySQL/Oracle first, then MongoDB/NoSQL and now SQL Server. It's odd.
I think it's less about PostgreSQL, and more about the particular products:
1. Oracle has always been a piece of crap from a technology standpoint. There are good reasons businesses use it, but they don't have to do with a robust, core product.
2. Early versions of MySQL were a joke. It's pretty good now, but for a long time, it was the butt of many jokes for good reason. It was very easy-to-use, and fast if you didn't need data consistency, but it didn't quite work right.
3. Stonebraker aside, I haven't heard much bashing of NoSQL. It's used in many places where it's the wrong tool for the job (JOINs are useful), but it's great where it is the right fit. It's just that MongoDB, in particular, isn't great if you have data integrity or performance requirements you care about.
4. I don't even know where to start on SQL Server. Seriously. The only reason to use that dog is if you're tied to a Microsoft-only shop.
There are lots of great technologies out there -- Cassendra, Google's Bigtable, memcached, modern versions MySQL, etc. -- which I've never heard people mocking. Conversely, users of most of those tend to make fun of the broken databases just as much as PostgreSQL users do. It's just that you see the PostgreSQL users doing more mocking simply because there are more of them out there.
Footnote: I'm developing on a MySQL+MongoDB stack right now. MySQL is great, but MongoDB is a bad joke.
"Open source products have support from people who've been told they need to 'contribute' to open source, and since they can't code, they just try to look at code and answer questions on a mailing list until they get an interview at Facebook".
(a) Not all vendors are the same, (b) Not all situations are the same, (c) Open source isn't exactly world renowned for porting back fixes to older releases.
Making broad generalisations is never helpful.
This caused a lot of BT employes a lot of pain you had to get a very senior manager to sign off on missing your y2k deadline which was the year before y2k - and you probably got dinged on your apr for it
Very smug and condescending statement. Feels like an insecurity on the author's part.
IMO, completely unsuitable for web apps where DBA is not sitting around 24/7.
Is PostgreSQL better at this?
I think it's better to skip that and interface with the database using something more powerful - let the database handle getting the data that you want the best way possible, then process it using something better suited to the job.
MySQL doesn't do a great job of protecting your data compared to Postgres, MSSQL, Oracle, etc.
I can do that to Postgres all day and the on-disk data integrity is preserved.
Out of those things, postgres has a DBA tool out of the box. Yeah... nice comparison.
I find it particularly funny that he cites Dunning-Kruger. DK effect is not meant for others. It is meant for self-evaluation.