Hacker News
[dupe] PostgreSQL vs. MS SQL Server (pg-versus-ms.com)
145 points by vince_refiti on April 30, 2015 | 135 comments

The last time this article came up the consensus was that the author was pretty biased towards Postgres and had little to no experience with actual MS SQL Server use.

Also, the lack of author identity was frowned upon.

Lastly, the conjecture and attitude towards Microsoft lacks some substance.

Conclusion: The author is free to write whatever he likes, but take this resource with a pinch of salt.

I use both Postgres and MS SQL Server professionally and whilst philosophically I prefer Postgres, for practical reasons I truly prefer MS SQL Server, if only because of its excellent development tools.

My biggest frustration with MS is their licensing -- it's infuriating. Doing cleanup during the audit took a ridiculous amount of resources, and every company gave us a different answer on what needed to be licensed, sometimes differing by thousands of dollars.

The other is the barrier to entry. I was trying to get NAV2015 working for a demo using some data for a project, I spent at least 20 hours trying to figure out who to pay (Azure comes with it installed but no license -- why not just sell a bundled license into the hourly cost?) -- eventually gave up and set up OpenERP in like 30 minutes, spent 3 hours adding the feature I needed and was done. NAV is unquestionably better for this project, but I'm not going to enter into a support contract with a company just to get a license.

That's neat though. Shop around for the cheapest reseller, let them make the mistakes or miscalculation, enjoy?

I'm guessing it comes down to virtual cores and such, right? Although I'm surprised how many stupid-simple mistakes people make just by not reading even the basics. (Wait MSDN includes several Office keys, so obviously they don't want us to buy Office licenses for administrative assistants.)

See, even you're a bit confused by MSDN -- you're not even supposed to use the MSDN license for your own everyday use, is what I've been told. So if you have Outlook for email and you're using an MSDN license, that's not correctly licensed. MSDN is ONLY for development. Also, it's one MSDN license per developer, apparently.

And no because you still get nailed -- it's not like your reseller is the one being audited, it's you.

You're supposed to get CALs for printers accessing MS servers.

Don't ever use an MS server for DHCP, every single client needs a CAL. How am I supposed to even predict that if we allow personal devices on the guest network?

It's beyond frustrating -- the products are great, but they make it impossible to ever actually be in compliance, and you need to spend an insane amount of time just handling licenses. It's really the reason I avoid MS as much as possible, and I really love their dev tools (Visual Studio is beyond awesome, I love C#/.NET); it's just too much hassle.

No I'm not confused - I was giving an example of stuff I've heard from clients. MSDN makes it really clear it's for dev not day-to-day ops. But people don't get that.

I'd be surprised if you get "nailed" if a reseller went over a scenario and licensed you. Worst case is if they can prove maliciousness and fine you. If the reseller calculated things and came to a reasonable (but lower) number I'd be surprised if that's gonna really hurt.

Agreed though that MS's licensing is terribly annoying. But it's a lower Total Cost of Ownership, right?!

Ah sorry, yes I was one of those people -- we had like half an office using MSDN licenses. Just didn't know better (they give us 10, we'll buy another MSDN license when we run out!).

>The last time this article came up the consensus was that the author was pretty biased towards Postgres and had little to no experience with actual MS SQL Server use. Also, the lack of author identity was frowned upon. Lastly, the conjecture and attitude towards Microsoft lacks some substance.

So there was no substantial critique or response to the specific points he makes?

There was ample breakdown of the article and lots of credible resources pointing towards places he was wrong.

And there's no publication date -- what was true in 2010 might not be true in 2015. How do we know which versions he is comparing? Both PostgreSQL and MS SQL make very significant progress every year.

He says upfront that he's comparing PostgreSQL 9.3 and MS SQL Server 2014

Actually he says in the first line of the third paragraph:

> Unless otherwise stated I am referring to PostgreSQL 9.3 and MS SQL Server 2014

Which makes his opinions timeless and gives context. A date on it would be welcome, but just from the software version you can figure out if it applies to you or your scenario.

Thanks, you're right.

I read this months ago, so just went over it very quickly this time, and the first thing I did was to look for a date at the top and bottom.

I've been a SQL Server user for many years, and for the last two years I've also been using PostgreSQL. Today I use both of them in my project, and as a developer/DBA I see pros and cons:

MS SQL:
- The included tools like Management Studio are just great. This is totally next level compared to the Postgres tools.
- Using multiple CPU cores for a single query is really helpful in my scenario.
- Easy continuous backup to the cloud.
- The included Integration Services and Analysis Services are also easy and enough for my usage.

Postgres:
- Runs on Linux, so it's cheaper. Even though we use SQL Server through BizSpark (for free now), the Azure Windows VM for it costs us much more than a VM with Linux. And of course when we want to cluster our DB, Postgres is even cheaper.
- Great JSON support. There are parts of our project where it's helpful.
- Better configurability, like WAL, checkpoints, etc. We get much better write performance on Postgres than on SQL Server (probably just our case).
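For anyone curious what that JSON support looks like in practice, here's a minimal sketch (table and column names are invented; the `jsonb` type and GIN indexing shown here arrived in 9.4, while 9.3 had plain `json`):

```sql
-- Store raw events as a JSON document and query inside it.
CREATE TABLE events (
    id      serial PRIMARY KEY,
    payload jsonb NOT NULL
);

INSERT INTO events (payload)
VALUES ('{"user": "alice", "action": "login", "meta": {"ip": "10.0.0.1"}}');

-- A GIN index lets containment queries avoid a full scan.
CREATE INDEX events_payload_idx ON events USING gin (payload);

-- ->> extracts a field as text; @> tests containment.
SELECT payload->>'user' AS username
FROM events
WHERE payload @> '{"action": "login"}';
```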

The other things really don't make much of a difference. Both DBs can be extended, and extensions may be written in many languages. Both achieve great overall performance, and both have a strong community and a lot of documentation.

The write performance comes down to lack of clustered indexes on PostgreSQL.

So yeah, it'll have better INSERT performance, but the queries will be slower. There's really no way around that. A large data set on disk that's out of order will always be slower than the one that's in-order.

IMO MSSQL makes the right call for the vast majority of use-cases.

PostgreSQL has -no- materialized views (I stand corrected! Introduced in v9.3). No view update support. No partitioned view support. No sane backup/restore process. It's a great database if your primary concern is licensing cost. But if your primary concern is operational cost and even just multi-gigabyte data sets it's really frustratingly rudimentary compared to what MSSQL delivered over a decade ago.

But that's just me. It's free. And I'm thankful for that. I just find it really frustrating that PostgreSQL supports querying on JSON, but doesn't support backing up and restoring the database in binary format.

> No view update support


> No partitioned view support

Inheritance (ish)

> No sane backup/restore process


> doesn't support backing up and restoring the database in binary format

http://www.postgresql.org/docs/9.4/static/app-pgbasebackup.h... is half of what you want, but I expect you're wanting something I don't expect.

I want to be able to say: BACKUP clientdb_a, get a clientdb_a.bak file, and be able to restore it without statement generation.

Which seems like a reasonable baseline expectation if you were to ask a layperson what they'd imagine a database backup to be. pg_basebackup can't do that without some serious constraints (like, you only have a single database in your PGDATA).
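For reference, the MSSQL workflow being described is roughly this (paths, database names, and logical file names are illustrative):

```sql
-- Back up a single database to one file...
BACKUP DATABASE clientdb_a
TO DISK = 'C:\backups\clientdb_a.bak'
WITH INIT;

-- ...and restore it elsewhere, optionally under a new name,
-- with no statement generation or INSERT replay involved.
RESTORE DATABASE clientdb_a_copy
FROM DISK = 'C:\backups\clientdb_a.bak'
WITH MOVE 'clientdb_a'     TO 'C:\data\clientdb_a_copy.mdf',
     MOVE 'clientdb_a_log' TO 'C:\data\clientdb_a_copy.ldf';
```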

You are correct about clustered indexes (1) if and only if the table clustering is selected well. Which is not always the case - especially with Azure SQL which doesn't allow for non-clustered tables or tables clustered by anything other than primary key.

Table clustering is mostly betting on "I think most access is going to happen using these columns", which might be a good choice or it might not, but it's not something that should happen implicitly or be the only choice available.

I think Markus (2) explains this much better than I do, so I'll just link his text.

(1): which is a pretty stupid name, btw -- Oracle calls it "index organized tables", which is a much clearer way to describe the concept.

(2): http://use-the-index-luke.com/sql/clustering/index-organized...
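To make the choice concrete: in T-SQL the clustering bet is explicit at index-creation time, and you can opt out of the PK default (table and column names are hypothetical):

```sql
-- Leave the table a heap by declaring the PK NONCLUSTERED...
CREATE TABLE dbo.Orders (
    order_id    int IDENTITY PRIMARY KEY NONCLUSTERED,
    customer_id int NOT NULL,
    placed_at   datetime2 NOT NULL
);

-- ...then cluster on the columns you bet most access will use,
-- rather than the surrogate key.
CREATE CLUSTERED INDEX CX_Orders_Customer
    ON dbo.Orders (customer_id, placed_at);
```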

> PostgreSQL has -no- materialized views (I stand corrected! Introduced in v9.3). No view update support.

Like materialized views, automatic update support for simple views was introduced in 9.3.

PG now has materialized views, but they appear mostly useless. They're the equivalent of creating a table from a query. They don't update on every change to the source data. 9.4, I think, added a way to force an update, but it just re-reads the entire source. To do it right you still have to use triggers.

Materialized views that refresh on demand are far from useless. In my experience they are often used on Oracle either as an "analytics cache", where you don't need the newest data and the select is very complex and slow, or as part of ETL processes to decouple data transformation from loading, merging or exporting.

(this makes me realize that Postgres can do some pretty cool ETL by itself -- you can express a lot of it in simple SQL terms with foreign data wrappers and materialized views. One on-demand refreshed materialized view can fetch data from many different sources, transform it and provide local access)

MS SQL Server, on the other hand, can't do materialized views that refresh on demand, which makes them hard to use on complex and large datasets, and it makes the database less predictable (inserting one row may be a simple operation or it may change 10 tables; you'll never know).

Materialized views in MS SQL have rather bad support for aggregation functions and other computations -- you can't compute an average in them, for example: AVG is not supported, and doing SUM/COUNT is also not supported because of the division, etc.

I'd say that materialized views in MS SQL are nicer for caching or something like that, but you can't really use them for analytics or ETL like you usually see in Oracle -- though Oracle can do both on-demand and automatic refresh, of course.

EDIT: the materialized views thing was one of my most amusing experiences with SQL Server. I tried to use materialized view instead of trigger for updating data dependent table in geographic application and I needed to compute average and sums for zoomed out map layers.

The way you add a materialized view is by itself confusing -- you add a view, then make an index on that view, then select from it, but you have to use a special keyword so the system uses the materialized view and not just the view.

I tried to use AVG but was told by the server "AVG is not supported, use SUM/COUNT". I tried SUM/COUNT but was told "COUNT is not supported, use COUNT_BIG". I tried SUM/COUNT_BIG and was finally told "using arithmetic in materialized view is not supported".

To this day, I have no idea why one error message suggested doing something that's impossible.
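For anyone who hasn't hit this, an indexed ("materialized") view in SQL Server looks roughly like the sketch below; the COUNT_BIG requirement and the NOEXPAND hint are the gotchas described above (all names are invented, loosely modeled on the map-tile scenario):

```sql
-- SCHEMABINDING is mandatory for an indexed view.
CREATE VIEW dbo.TileSums
WITH SCHEMABINDING
AS
SELECT tile_id,
       SUM(elevation) AS elevation_sum,  -- SUM is allowed...
       COUNT_BIG(*)   AS point_count     -- ...but COUNT must be COUNT_BIG,
                                         -- and AVG or SUM/COUNT are not allowed.
FROM dbo.Points
GROUP BY tile_id;

-- The unique clustered index is what actually materializes the view.
CREATE UNIQUE CLUSTERED INDEX CX_TileSums ON dbo.TileSums (tile_id);

-- On non-Enterprise editions you must hint NOEXPAND to read the
-- materialized data; the division has to happen outside the view.
SELECT tile_id, elevation_sum / point_count AS avg_elevation
FROM dbo.TileSums WITH (NOEXPAND);
```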

What's the difference between refresh on demand and just creating a table based on a SELECT? It seems like some minor syntactic sugar (which is great, I'm all for that), but not much of a feature.

You can read the materialized view while it is refreshing, you just get the old version of the data. You could do that by doing delete and insert as select in a transaction, but that's not practical for tens or hundreds of thousands of rows.

On Oracle, this is also useful because it works kind of like creating and dropping partition, instead of doing delete then insert, and that's better for the DB because of how Oracle deals with tablespace and blocks (and Oracle's truncate is non-transactional, I think), but that doesn't do much on SQL Server vs. postgres discussion :)

Or just insert into a new table then change the name in a transaction?
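In Postgres terms, that hand-rolled swap would look something like this (table names are illustrative; note the renames take brief exclusive locks, so readers block momentarily at commit):

```sql
BEGIN;
-- Rebuild the "view" contents off to the side.
CREATE TABLE report_new AS
SELECT customer_id, sum(amount) AS total
FROM orders
GROUP BY customer_id;

-- Swap it in: readers see either the old or the new table, never a half-state.
ALTER TABLE report RENAME TO report_old;
ALTER TABLE report_new RENAME TO report;
DROP TABLE report_old;
COMMIT;
```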

>The write performance comes down to lack of clustered indexes on PostgreSQL.

Well, you could just leave everything as a heap in MSSQL; someone might throw things at you, but you could do it.

Or you can pad the index to allow for inserts (or do some partitioning), and schedule some rebuilds to deal with the fragmentation.

I use covering indexes for most queries. It works great for reads, and write performance is still better in Postgres (on Linux).


That's not really anything to do with sorting. At least not covering indexes as I'm familiar with them.

The closest thing to SQL Server Management Studio I've found is Navicat. It's not as solid as Management Studio, and it's incredibly expensive, but it is much better than most of the other tools out there.

If you can expense the cost of the license you should definitely check it out.

I have found DbVisualizer [1] to be a superior tool and it supports almost every database in existence, Postgres and SQL Server included.

Plus it's completely cross-platform and runs on Linux, Windows and OS X.

[1]: http://www.dbvis.com

That's pretty slick, and it's much more competitively priced than Navicat.

If you are on Mac check out https://eggerapps.at/postico/

This looks pretty nice, thanks!

I've been a fan of Aqua Data Studio for years and it doesn't seem to get mentioned often. Cross platform and supports a myriad of dbs. http://www.aquafold.com/aquadatastudio.html

Agreed on Navicat. pgAdmin doesn't quite cut it just yet.

Which sometimes made me wonder if, with Postgres becoming more popular by the day, there would be a market for a management tool cheaper than Navicat.

PostgreSQL has such amazing documentation and CLI that I prefer to use it over pgAdmin.

This is why I prefer Postgres. Jumping into the CLI is easy and much quicker than mousing around some cluttered-up, feature-overloaded GUI.

Last I tried Navicat it didn't offer much over pgAdmin. The UI was more streamlined but that wasn't reason enough to switch to it. What features in navicat are must have and not found in pgAdmin3?

I use both Navicat and pgAdmin3 these days.

One thing I like about pgAdmin3 is the ability to only execute the query you have highlighted and also the scratchpad. It's nice for exploratory querying. In Navicat, I like switching between various tabs where pgAdmin3 makes me switch between windows.

Both have some hilarious idiosyncrasies. When one locks up for no reason, I generally just alt-tab to the other.

Recently I was using pgAdmin3 and set my production database color to red. I happened to notice that the color didn't repaint back to green when I switched to my dev database. And Navicat seems to consistently hang whenever the network dies or burps mid-action, requiring a command-line kill.

Idiosyncrasies. :)

AquaDataStudio was my choice at the time. Expensive, but worked on pretty much all the DBMS I had to use at the time (which included oracle, pg, teradata, mssql, mysql to name a few).

TOAD DBA Suite for SQL Server is nice too, very expensive though

Both T-SQL and pgsql miss stuff. But yesterday I wanted to turn a (customerid, key, value) table into a JSON object, one per customer. Simple one-liners in other systems, but a mess in T-SQL, requiring "FOR XML PATH" hackery. There's also an unsupported/deprecated/undocumented trick of updating a var inside a select (select @a = @a + column).
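The "FOR XML PATH" hackery referred to is the classic pre-2017 string-aggregation workaround. A sketch of the idea, hand-assembling the JSON (table and column names are invented, and it does no escaping of quotes in the values):

```sql
-- Concatenate each customer's key/value pairs into a JSON-ish string.
-- FOR XML PATH('') abuses XML serialization to do the aggregation;
-- STUFF(..., 1, 1, '') strips the leading comma.
SELECT c.customerid,
       '{' + STUFF(
           (SELECT ',"' + kv.[key] + '":"' + kv.[value] + '"'
            FROM dbo.CustomerKV kv
            WHERE kv.customerid = c.customerid
            FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)')
       , 1, 1, '') + '}' AS json_doc
FROM (SELECT DISTINCT customerid FROM dbo.CustomerKV) c;
```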

What I find sad is how much better SQL Server Management Studio is compared to the competition, and yet how completely effin' frustrating it is. If they spent 10% as much effort on SSMS as they do on Visual Studio, it would be ridiculously amazing.

Yeah, like the Outlook team, someone could learn how to make the UI continue to pump messages, while waiting for remote operations to complete...

EMS SQL Manager for PostgreSQL is much better than SQL Management Studio IMHO. Same goes for their MSSQL tool.

If I was going to pick a relational database system, I'm not sure these would be the criteria I'd use:

CSV support - if you do that much CSV extract/transform/load (or indeed, any kind of ETL work), use an ETL tool. SQL Server comes with SQL Server Integration Services for that kind of thing.

Ergonomics of dropping and creating tables and stored procedures - the author's example is probably the toughest way I can think of to drop a table. It's easier to check sys.all_objects (which will catch anything - functions, views, procs, etc).

Operating system choice - well, in 2015, if you're going to mention that, you should be thinking cloud services anyway. They're both available in the cloud - and at which point, who cares what OS it runs on?

Goes on and on. I'm a SQL Server guy, and if I was going to make a list of how to choose a relational database platform, here's what I'd probably list:

* Up-front cost (license, maintenance)

* Ease of finding staff, and their costs

* Ease of finding tooling, and its cost

* Ease of finding scalable patterns (proven ways to grow it)

I don't think either platform is a loser on those counts - it's hard to go wrong with either one.

I mean, you have a point on the OS; being in the cloud is a definite bonus. However, caring what OS it's running on comes down to something he was brief about in his article. For me, I can manage a Linux server no issue with just a bash prompt; I can't honestly say the same about a Windows Server. Other people might argue Apache -- and running that on a Windows server is kind of pointless? I don't know, just spit-balling that one.

Obviously Windows would be IIS, which means overall the OS choice is a nice one. The fact that you can choose which one to host with means you get something comfortable.

> For me, I can manage an Linux server no issue with just a bash prompt, I can't honestly say the same about a Windows Server.

Have you seen Windows Core and PowerShell? An increasing number of admins are doing just that - it's how we work at Stack Overflow, for example.

I've used PowerShell quite a bit actually -- I even wrote a few scripts a couple years ago to work with Active Directory. I still can't see myself managing a server from there. I know configurations and everything are possible through it, like the registry and such. However, config files are just so much more friendly to me, which is probably why I would pick Linux on that. Permissions are also a little more basic and harder to screw up in Linux, IMO.

It's try-able. But tell me how to setup and configure IIS ARR as a reverse proxy without using the GUI.

SQL Server is probably the best case, since the UI is a wrapper around TSQL and they expose the scripts at any point.

Then there's fun stuff. Like Invoke-WebRequest going super slow because it has a terribly implemented progress bar. That's right: out of the box, you can't download files with the shipped downloader, because it shows progress (like curl) and increases transfer times by a couple orders of magnitude. Come on.

Oh, do you work for SO Brent?

> Oh, do you work for SO Brent?

I just do SQL Server consulting for 'em from time to time. They're ridiculously sharp guys, so they don't need me much anymore.

Fully agree! This article is based on merits that I would hardly consider if making a choice between any relational engine. It is even more disturbing that the author claims to be a data analyst. CSV support... R U serious? Who in their right mind would spend hundreds of thousands of $$$ for SQL Server Enterprise Edition licences and regret it because PostgreSQL can do it better for free? I'm using both and they're both great, but not for or in spite of the reasons outlined here. The author needs to educate herself/himself better before publishing this nonsense.

> CSV support - if you do that much CSV extract/transform/load (or indeed, any kind of ETL work), use an ETL tool.

Why? CSVs are portable and much easier to deal with (when exported correctly, which PostgreSQL does and MS SQL does NOT).
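A sketch of what "exported correctly" means in Postgres -- COPY handles the quoting and escaping (file path and table are invented; server-side COPY to a file needs superuser, otherwise use psql's \copy for the client side):

```sql
-- Export with proper CSV quoting/escaping and a header row.
COPY (SELECT customerid, key, value FROM customer_kv)
TO '/tmp/customer_kv.csv'
WITH (FORMAT csv, HEADER true);

-- And load it back in the same way.
COPY customer_kv FROM '/tmp/customer_kv.csv' WITH (FORMAT csv, HEADER true);
```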

Yes. I have seen more than few 100k+ employees companies using CSV for data transfer between their internal systems.

They were usually processed using Oracle external tables though.

Can you drop a recommendation for your favorite startup-friendly ETL tool?

Pentaho if you have little or no money, or SSIS (comes with the SQL Server licence).

SSIS is the most frustrating, worthwhile bundled app I've seen. It's gotten significantly better but for what is essentially a giant XML writer they sure do like to hide all the options and configurations. I love the API though, so handy.

I don't think so. I've used it since it was DTS, not SSIS, and whilst Informatica is better at all things ETL, SSIS is great for what it can do out of the box. Besides, if it can't, you can always do it via a custom C# or VB extension.

Just finished a multi-terabyte data warehouse project where all data is loaded using SSIS and ControlM and it works great! The dev process was also easy and trouble-free.

I can't tell if I know you now...

Anyways, yes, it's very powerful and an excellent tool but I always have to describe it as awfulsome. SSISDB has made great improvements to the setup and promotion of objects which previously could be a pretty big headache (especially when you're not the one deploying etc). Functions for the various tasks/settings can be hidden and it can take a while to convince newcomers that the options are available, you just have to...poke around a bit. It comes with a substantial amount of controls.

The metadata/conversion of data can be frustrating, but automation greatly improves this with a quick controlling framework and the sys tables; it also makes moving data across hundreds of tables a breeze. It is very fast and very stable, and I'm glad it's getting improvements and having focused development.

I tried Pentaho, but our ETL tasks are being written by engineers (at least right now), and it seemed much clunkier than just writing ruby scripts. Is it worth it in the long run?

If you just need to move data from A to B, then no. If your transformations are complex and will grow over time, then yes -- the time and learning curve are justified, IMO.

The whole thing took about a minute to write and a second to run. It confirmed that some of his folders had a problem and told him which ones they were. How would you do this in Windows?

I'm so tired of this. Just because you don't already know Powershell and are too lazy to learn doesn't mean it doesn't exist. A know-nothing Windows user might as well say, "If I want to select and move arbitrary files on Windows, I can point and control-click in seconds. How would you do that in the Linux CLI?" In both instances, it comes across as ignorant to anyone who actually knows the ecosystem being derided.

I work in the Windows sphere at work and I really like Powershell. It has a ton of great features out of the box (especially for parallelism) as well as solid bread and butter operations. Although it's a bit verbose and not as mature as Bash it's essential to know for any Windows developer.

> In MS SQL Server, a CREATE PROCEDURE statement cannot appear halfway through a batch of SQL statements. There's no good reason for this, it's just an arbitrary limitation. It means that extra manual steps are often required to execute a large batch of SQL. Manual steps increase risk and reduce efficiency.

It's been a while, but I am pretty sure all you have to do is put GO before/after the CREATE PROCEDURE. I'm absolutely positive there's some way around it, because I've run many, many such scripts on SQL Server without manual intervention.

EDIT: Yes, I just fired up a VM and ran this and got the expected results with no errors.


CREATE TABLE dbo.Test (id int IDENTITY(1, 1), name varchar(20));

INSERT INTO dbo.Test (name) VALUES ('Amezarak');
GO

CREATE PROCEDURE dbo.sp_QueryTest AS SELECT * FROM dbo.Test;
GO

EXEC dbo.sp_QueryTest;

In my personal opinion, MSSQL (including the tooling around it) is awesome and possibly one of Microsoft's best products. I actually regret not getting to use it anymore since a) my current job doesn't use it and b) I'm not shelling out for a license for my side projects. Postgres is definitely my next pick, though, and both are miles ahead of MySQL. I understand that MySQL is "good enough" for most people, but it's always painful going back to it and inevitably remembering almost all my favorite features don't exist. I'm stuck with it on a side project and it's frustrating.

The GO statement is an artifact of SQL Server Management Studio. It's not an actual T-SQL construct. So if you are executing scripts in an automated way (like a continuous deployment scenario), you have to have some way to manually split up the batches.

GO is also recognized by sqlcmd, so you're (potentially) still good with automation. I did actually run the above script in SSMS, but I ran most of the ones at my old job automated.

You wouldn't perchance know a good online resource for recommendations on automating SQL Server deployments would you? Yes, you can google and piece things together, but I've yet to find a well written fairly comprehensive resource. (As the author notes, one would think perhaps MS would provide such documentation, but afaik they don't.)

It's all on MSDN, somewhere. TFA was incorrect that there isn't documentation available on SQL Server and T-SQL -- it's incredibly well documented; it's just MSDN's typical terrible discoverability.

Yes, you can absolutely do that in MSSQL as long as there's a batch separator (it's not implicit at a line break, a semicolon, or anything else). It's not uncommon to write large deployment scripts with proc creations/alters.

There are some other things like that in the post, some coming down to what seems like syntactic sugar, and they feel more like differences that don't really need to be noted (like the types/dropping piece).

CREATE PROCEDURE puts a schema lock on the procedure. Didn't realise it doesn't release it! SQL Server only allows one Sch-M lock on an object at a time.

GO creates a lot of headaches actually. If you are trying to do something in a transaction it breaks it

Why would you submit and terminate a batch halfway through a transaction? That doesn't really make a whole lot of sense, so I can't really take this criticism seriously.

This website is great. However, this doesn't actually touch on the real issue.

I work at an MSSQL shop, and all of us know and are convinced that PG is better. Most of us use PG for our side projects, and some of the devs don't even use Windows that much, with some custom MSSQL plugins we've built for Linux. However, the problem still exists of how to port a ton of databases over to Postgres. We're a multi-tenancy shop, so close-to-zero downtime is very important, and it would get really complicated if we ran multiple production versions of our app, one with a PG adapter and one with an MSSQL adapter.

A cursory Google search will show that you aren't going to get a ton of help converting them[0][1], not to mention the overhead of switching 20 developers from MSSQL to PG overnight.

This website is however excellent at convincing people to use PG over MSSQL. Perhaps, given the direction that Microsoft is going, they'll open source MSSQL overnight and it will become something competitively similar to PG in the long run.

[0] http://www.convert-in.com/mss2pgs.htm

[1] https://wiki.postgresql.org/wiki/Microsoft_SQL_Server_to_Pos...

PostgreSQL doesn't have clustered indexes, materialized views, partitioned views, or a sane backup/restore procedure.

If you're already a licensed MSSQL customer, I'm not sure what advantages PostgreSQL could really have compared to its slower performance and much higher operational costs.

Postgres absolutely has materialized views (I don't know what makes a backup / restore procedure "sane", but having had to do it for both SQL Server and Postgres I would definitely call Postgres's "saner").

It has a feature named materialized views, but it isn't close to being the real feature that MSSQL calls it.

> It has a feature named materialized views, but it isn't close to being the real feature that MSSQL calls it.

MSSQL doesn't have a featured called materialized views, it has a feature called indexed views which happens to implement a fairly robust version of the general DB pattern called materialized view.

PG has less mature materialized views (which are called materialized views) starting from 9.3, with additional features in 9.4, and which can be expected to mature further over at least the 9.5 and 9.6 cycles, from what I've seen in various places. Certainly, SQL Server is ahead on materialized views, but it's not an "SQL Server has them, Postgres doesn't" kind of thing.

SQL server can't do materialized views that refresh on demand, while postgres can't do materialized views that refresh automatically.

Personally, I saw materialized views that refresh on demand used much more than materialized views that refresh automatically, even though Oracle can do both.

Well, I can't think of a scenario where I want a manually refreshed materialized (or indexed) view, because I can already do that by creating a table from a SELECT statement. So I don't see the point or use of what PG has implemented. I asked in IRC and got the same answer. It's like saying a read-only table supports updates because you can drop it and rebuild it with the new data.

When would anyone really need PG's version of this feature over creating a table from a query? Out of all the usages of "I need an indexed/materialized view", PG covers essentially none of them, at least that I can think of.

Anything you can do with materialized/indexed views (including most conceivable update models) you can do with the right combination of creating tables with SELECT statements and the right triggers to rebuild and/or update the tables.

What specialized syntax for materialized views and related configuration provides is some combination of clarity of intent, developer convenience, performance improvements over the most naïve update implementations, and additional information for the planner to use to optimize queries.

Postgres's current level of support for named-as-such MVs provides basically the first two benefits for certain situations (and a little bit of the third in certain situations via REFRESH MATERIALIZED VIEW CONCURRENTLY).
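A sketch of that on-demand refresh as of 9.4 (view and table names are invented; CONCURRENTLY requires a unique index on the view):

```sql
CREATE MATERIALIZED VIEW sales_summary AS
SELECT customer_id, sum(amount) AS total
FROM orders
GROUP BY customer_id;

-- CONCURRENTLY (9.4+) lets readers keep using the old contents
-- while the refresh runs; it requires a unique index on the view.
CREATE UNIQUE INDEX sales_summary_idx ON sales_summary (customer_id);

REFRESH MATERIALIZED VIEW CONCURRENTLY sales_summary;
```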

re: Materialized Views: I stand corrected! Thanks. (CREATE MATERIALIZED VIEW was introduced in v9.3)

The backup/restore procedure in PostgreSQL (last I looked, latest I've used is 9.2) is just statement generation. It's not a binary backup. So it's bound by INSERT performance.

Which is rage inducing when you have even just tens of millions of rows.

MSSQL'97 could do a backup/restore in a small fraction of the time. And if you just wanted to move data from Server A to Server B? You could link servers and just SELECT INTO.

No clustered index? What about CLUSTER?


> CLUSTER instructs PostgreSQL to cluster the table specified by table_name based on the index specified by index_name. The index must already have been defined on table_name.

> When a table is clustered, it is physically reordered based on the index information. Clustering is a one-time operation: when the table is subsequently updated, the changes are not clustered. That is, no attempt is made to store new or updated rows according to their index order. (If one wishes, one can periodically recluster by issuing the command again. Also, setting the table's FILLFACTOR storage parameter to less than 100% can aid in preserving cluster ordering during updates, since updated rows are kept on the same page if enough space is available there.)

> When a table is clustered, PostgreSQL remembers which index it was clustered by. The form CLUSTER table_name reclusters the table using the same index as before. You can also use the CLUSTER or SET WITHOUT CLUSTER forms of ALTER TABLE to set the index to be used for future cluster operations, or to clear any previous setting.

Yes, you can manually re-order a table, with locking, but that's really not the same thing as enforcing it on INSERT/UPDATE.

Clustered index is something it would be nice for Postgres to have, but it would be a lot of work to implement and it's certainly not always a performance win. Looking up by that index is very fast, but looking up by a secondary index may have to traverse a second btree (it cannot point directly at data on disk because its location is tied to the clustered index). And of course there's quite a bit of overhead for transactions and modification operations.

In practice, you can recover many of the advantages of clustered indexes in PostgreSQL (along with the disadvantages) with a covering index eligible for https://wiki.postgresql.org/wiki/Index-only_scans.
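As a sketch (hypothetical table and column names): the idea is to put every column the query reads into the index, so Postgres can answer the query from the index alone without visiting the heap.

```sql
-- The index covers every column the query touches, so Postgres
-- can use an index-only scan (provided the visibility map is
-- reasonably up to date, e.g. after a VACUUM).
CREATE INDEX orders_covering_idx
    ON orders (customer_id, order_date, total);

-- This query can be satisfied from the index alone:
SELECT order_date, total
FROM orders
WHERE customer_id = 42;
```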

That's really just for setting up replication IME (and even then, that's pretty complex compared to MSSQL).

What if I fat finger a table drop for Client A? With pg_basebackup the best I could do is set up an entirely new server and jump through some hoops. With MSSQL I could just restore the single client database to a new database, very quickly, and SELECT INTO the missing data from clientdb_a_backup to clientdb_a.

It's just a lot simpler, a lot quicker, and a lot more flexible.

can you be more specific with the "just statement generation" allegation? it looks like you're talking about `pg_dump --format=plain`, which is the default but far from the only option.

I'm using that as shorthand, yeah. But in my experience pg_restore is no faster, apart from basic parallelization.
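For what it's worth, the non-plain formats look like this (database names are hypothetical, and both commands need a running server):

```shell
# Custom-format dump: compressed, with a table of contents that
# allows selective and parallel restore
pg_dump --format=custom --file=mydb.dump mydb

# Parallel restore with 4 jobs into an existing empty database.
# It still replays COPY/INSERT statements on the server side, so
# it remains bound by write throughput, as the parent comment says.
pg_restore --jobs=4 --dbname=mydb_restored mydb.dump
```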

Both databases have a bunch of advantages over each other. There is no obvious winner.

PostgreSQL has better performance for some loads (as far as I know there is no clear winner in performance), better GIS support, writable CTEs, excellent built-in JSON and array support, an amazing CLI, and personally I think PostgreSQL has nicer built-in types and functions.

One thing that PG does that SQL Server doesn't do by default is it is built with MVCC in mind. If you want MVCC on SQL Server, you need to use snapshot isolation, and for snapshot isolation SQL Server needs to use TempDB.

TempDB, at least last I looked, doesn't really scale too well.

Just an observation.
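For anyone curious, turning on row versioning looks roughly like this (the database name is hypothetical); the row versions it generates are what land in TempDB:

```sql
-- Allow transactions to request SNAPSHOT isolation explicitly
ALTER DATABASE SalesDb SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Optionally make plain READ COMMITTED use row versioning too,
-- so readers stop blocking writers by default
ALTER DATABASE SalesDb SET READ_COMMITTED_SNAPSHOT ON;
```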

Can someone explain this one to me?

PostgreSQL supports the RETURNING clause, allowing UPDATE, INSERT and DELETE statements to return values from affected rows. This is elegant and useful. MS SQL Server has the OUTPUT clause, which requires a separate table variable definition to function. This is clunky and inconvenient and forces a programmer to create and maintain unnecessary boilerplate code.

So I have the equivalent of the following in one of my projects:

    UPDATE sometable SET someField = @parameter
    OUTPUT Inserted.field1, Inserted.field2
    WHERE ...
Now, either I don't get the limitation the author describes or SQL Server can do that - returning information from the affected rows. Works with DELETE as well. We can argue that 'inserted' is a crappy name here, but..
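For comparison, the Postgres equivalent of that statement (same hypothetical names, with $1 standing in for the parameter) would be:

```sql
UPDATE sometable
SET someField = $1
WHERE ...
RETURNING field1, field2;
```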

As a long-time SQL Server (and C#/.Net) developer, I can say that while SQL Server is a great database engine, TSQL is perhaps the worst language ever invented.

Between the overloading of "WITH" to seemingly every new feature they add, to the way semicolon terminators are only sometimes required (so I just automatically start all CTE declarations as ";WITH" to be sure it works), to the ways they break encapsulation by incorrectly scoping cursor names and limiting INSERT...EXEC so that it can't be nested, programming it at an advanced level is an acquired taste at best.
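A concrete instance of the semicolon quirk (table and variable names are made up): a CTE must be the first statement after a terminator, so the defensive leading semicolon looks like this:

```sql
DECLARE @cutoff date = '2015-01-01'  -- no terminator here...

;WITH recent AS (                    -- ...so start the CTE with ';'
    SELECT id, amount
    FROM payments
    WHERE paid_on >= @cutoff
)
SELECT SUM(amount) AS total FROM recent;
```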

But that's not the point of this subthread. I wouldn't consider myself as a fan of SQL Server (or TSQL for that matter). I am not defending it.

1) I asked about a specific thing in the article. I think the author is wrong.

2) If 1 holds true I think that the article might be questionable - at least I don't trust the rest now. I am no expert on All Things SQL Server and if I spot a flawed point as a random dev in something that was supposedly written by someone working with databases for a decade [1], maybe people that actually know a lot more spot .. more flaws.

1: I'm a dev for longer than that and certainly know my way around sql and databases, but focus matters. The author claims "I know a fair bit about these things" and considers databases his main area of expertise it seems.

Honestly I don't know where he got the idea, but as someone who worked for five years as a SQL Server database admin/developer, I for some reason also thought you could only do OUTPUT Inserted.field1, ... INTO @TableVar

But I just tested what you claimed here and it definitely works exactly the way the RETURNING clause would.


As somebody working on postgres, I don't find the comparison to be done in a particularly fair, or helpful, way. MSSQL is a pretty good database and deserves to be fairly evaluated.

Don't get me wrong, I have 'political' problems with MSSQL, but those shouldn't be disguised as technical ones.

I get the feeling this guy hasn't had to deal with CSV in-the-wild. RFC4180 is a false prophet. I've never had a client give me CSV data that was actual, good CSV, thus I've always had to write a custom parser. Every. Single. Time.

Well, not anymore. Now I just refuse to do shit. "We can't change it" turns into "we contacted the original developer and made him fix his broken shit" when I turn into a complete pain in the ass.

Postgres is better than SQL Server because with Postgres I always have the "awesome edition" : http://blog.codinghorror.com/oh-you-wanted-awesome-edition/

And I don't need a certification to understand the crazy MS licensing (I discovered recently that yes there is a MS certification program for their licenses as it's so byzantine).

I remember Jeff Atwood talking about this in a Stackoverflow podcast. With Postgres or other free software, you not only don't have to lay out cash, you also don't have to spend time trying to understand arcane licensing rules. (And worrying that you might have violated one of them.) Whether the time (and time is money) saved balances out the cost of the software when the commercial product offers better functionality will vary with the situation.

For what it's worth, Atwood chose Postgres for his Discourse product. The FAQ says the product is "Uncompromisingly open source." And "There is only one version of Discourse--the awesome version. There's no super secret special paid commercial version with better or more complete features."

>> PostgreSQL supports DROP SCHEMA CASCADE, which drops a schema and all the database objects inside it. This is very, very important for a robust analytics delivery methodology, where tear-down-and-rebuild is the underlying principle of repeatable, auditable, collaborative analytics work.

Actually, if you drop the constraints first, you don't have to worry about the table-drop order. I do that on both Postgres and SQL Server, because I use a tool that generates the specific scripts for me.
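The two approaches, sketched with hypothetical names:

```sql
-- Postgres: one statement removes the schema and everything in it
DROP SCHEMA staging CASCADE;

-- Portable alternative: drop the FK constraints first, after which
-- the tables can be dropped in any order
ALTER TABLE staging.orders DROP CONSTRAINT orders_customer_fk;
DROP TABLE staging.orders;
DROP TABLE staging.customers;
```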

I don't think manually writing DDL scripts is any way to manage an RDBMS in the modern era, especially if you care about "repeatability and auditability" like he claims.

Queries, yes, but schema constructs, no. SQL is just a terrible language for it. I'm not saying you have to use a full ORM, but every SQL engine I've ever encountered will let you do a lot of nutty things and won't complain about it, or won't complain until you actually try to query it.

When developing on a Microsoft stack with Visual Studio, I have always found SQL Server to be the most painless DB system. You can't beat the Visual Studio integration and tools. Also, their SQL Server drivers for .NET do a bunch of transparent optimizations like automatic connection pooling behind the scenes so you don't have to worry about this kind of thing.

OTOH, when not on a Microsoft stack, it has been a constant source of hard to track down problems and bugs. I remember trying to use Microsoft's JDBC driver a few years ago in a Java web app and running into all kinds of nasty and unbelievable bugs in the actual driver they shipped.

Sigh. It's about as profound as an article on Linux vs. Windows -- which one is a better OS? I skimmed the article for some reason and it is full of questionable statements. One thing that jumped out at me is the part about using multiple cores to run a query. We are in 2015, ladies and gentlemen. An RDBMS must be able to parallelize queries. The argument about never being CPU bound is laughable. I am, as we speak, running a performance test on a 36 core box and I already know the result: I NEED MORE CORES. Analytics is moving into the SSD world, where data processing will be CPU bound once again.

It took a while, but he got there (emphasis mine):

>This is an advantage for MS SQL Server whenever you're running a query which is CPU-bound and not IO-bound. In real-life data analytics this happens approximately once every three blue moons. On those very rare, very specific occasions when CPU power is truly the bottleneck, you almost certainly should be using something other than an RDBMS. RDBMSes are not for number crunching.

As a data analyst, the tools to be comparing shouldn't be RDBMSs.

>As I said in the banner and the intro, I am comparing these databases from the point of view of a data analyst, because I'm a data analyst and I use them for data analysis. I know about SSRS, SSAS, in-memory column stores and so on, but I haven't mentioned them because I don't use them (or equivalent features). Yes, this means this is not a comprehensive comparison of the two databases, and I never said it would be. It also means that if you care mostly about OLTP or data warehousing, you might not find this document very helpful.

As for this part, data warehousing, OLAP services, and reporting services (lower case on purpose here) are a very large sub-domain within data analytics. I am not saying that these are everything in analytics, but especially from an enterprise standpoint, these make up the bulk of it. From a tooling and full-stack standpoint, Microsoft is quite strong in this segment.

HA and clustering. MSSQL makes this dead easy. Point and click (and it'll display the script for you to learn/reuse) and you're done. Async, sync, HA (with automatic failover), mirroring, several types of replication - and it just works and is easy. If PG ever ships with an out-of-the-box shared-nothing system, yee haw! Maybe they could package up DRBD and heartbeat into one easy script and monitoring system or something.

I hate how MSSQL has gone back on their word to let customers benefit from CPU enhancements. They mocked Oracle for charging by type and core... And now they do the same.

Also, multiple result sets was a sorely missed feature when porting stuff to PG. But record types made up for it.

Of course now, the dominating factor for a lot of people is "Will a hosting provider (Azure, AWS, Google) just run this for me, automatically giving me perfect backups and restore and HA?" SQL Azure, as I understand, not only does backups, but allows you to restore to arbitrary points in time. Sure it's just keeping txlogs, but that sounds hot when sold like that. For many cases, I can see ditching the privacy issues of "cloud" to get those features with zero capex or management overhead.

> SQL Azure, as I understand, not only does backups, but allows you to restore to arbitrary points in time.

It does, but it takes about an hour. A restore takes about that much time. Changing the performance level of a DB takes a similar amount of time.

I'm not sure how quick Amazon or Google are, but I know a lot of ops guys who are sorely disappointed by Azure's slowness.

True. Azure is slow slow slow. Even their new portal is just laggy (JS/layout overload). But the service, yikes! Create a new VM, wait minutes. Then find out it failed. Or try to add a port mapping, and wait a minute, unable to do another operation because one is in progress. It's infuriating. And the VMs take forever to start. And the SSD options are late, underperforming, and downright janky (they actually tell you to do RAID on the SSDs, since they cannot figure out how to scale storage for you).

I tried Google Compute Engine on a whim and I'm totally in love (despite a deep distrust of Google). Everything is fast. VMs load in seconds. And it's simple - no inane UI, no crazy leftover bits from being a PaaS. No idiotic design for SSD. And as a kicker, it's half the price for compute. Oh, and SSH client right there in the portal? It's such a small thing but really made me happy.

It's just that Google doesn't do startup outreach and give us cash and court us. Unless you're in an "established" incubator or yc or something. Whereas MS is super friendly and does everything they can.

> Commercial products have support from people who support it because they are paid to. They do the minimum amount necessary to satisfy the terms of the SLA.

This is pretty ridiculous and quite a bit insulting to the many people who do work for vendors. I work in a team that has a number of engineers supplied by vendors and they are generally fantastic. Highly qualified, more than happy to assist with tasks that aren't to do with their product and they really care about the overall project outcomes.

Open source has forced vendors to make sure that every project that uses their products is a success.

> On the other hand, commercial software is often designed by committee, written in cube farms and developed without proper guidance or inspiration

Again more nonsense. Not every open source project is some beacon of perfection and neither is every commercial product some poorly designed piece of junk. Anyone that believes otherwise is just being disingenuous.

Someone really needs to explain to me why PostgreSQL users in particular seem to always want to bash the competition in order to justify their technology choice. It's been going on for years against MySQL/Oracle first, then MongoDB/NoSQL and now SQL Server. It's odd.

Agreed. One could just as easily write something like

"Open source products have support from people who've been told they need to 'contribute' to open source, and since they can't code, they just try to look at code and answer questions on a mailing list until they get an interview at Facebook".

> Someone really needs to explain to me why PostgreSQL users in particular seem to always want to bash the competition in order to justify their technology choice. It's been going on for years against MySQL/Oracle first, then MongoDB/NoSQL and now SQL Server. It's odd.

I think it's less about PostgreSQL, and more about the particular products:

1. Oracle has always been a piece of crap from a technology standpoint. There are good reasons businesses use it, but they don't have to do with a robust, core product.

2. Early versions of MySQL were a joke. It's pretty good now, but for a long time, it was the butt of many jokes for good reason. It was very easy-to-use, and fast if you didn't need data consistency, but it didn't quite work right.

3. Stonebraker aside, I haven't heard much bashing of NoSQL. It's used in many places where it's the wrong tool for the job (JOINs are useful), but it's great where it is the right fit. It's just that MongoDB, in particular, isn't great if you have data integrity or performance requirements you care about.

4. I don't even know where to start on SQL Server. Seriously. The only reason to use that dog is if you're tied to a Microsoft-only shop.

There are lots of great technologies out there -- Cassandra, Google's Bigtable, memcached, modern versions of MySQL, etc. -- which I've never heard people mocking. Conversely, users of most of those tend to make fun of the broken databases just as much as PostgreSQL users do. It's just that you see the PostgreSQL users doing more mocking simply because there are more of them out there.

Footnote: I'm developing on a MySQL+MongoDB stack right now. MySQL is great, but MongoDB is a bad joke.

So Oracle backports fixes now, do they? I remember having trouble getting my Oracle-based system through Y2K back in '98/'99.

Not sure what your point is.

(a) Not all vendors are the same, (b) Not all situations are the same, (c) Open source isn't exactly world renowned for porting back fixes to older releases.

Making broad generalisations is never helpful.

I used to work for a MAJOR MAJOR Oracle customer and they would not backport Y2K fixes to relatively recent Oracle products, i.e. the version before the current one.

This caused a lot of BT employees a lot of pain: you had to get a very senior manager to sign off on missing your Y2K deadline, which was the year before Y2K - and you probably got dinged on your APR for it.

One of the major attractions of MS SQL Server 2014 is memory-optimized tables (the Hekaton engine), which removes locks completely from the database through snapshot versioning of rows. Does PostgreSQL have anything similar?

This is the second time this site has appeared on HN, and both times I've mis-read the title as "PostgreSQL vs MySQL", which I think would be a much more interesting comparison. If you're already working in an MS shop, then MS SQL will work best with what you have. If you're working in an open source environment, then you'll be better off taking advantage of what Postgres or MySQL have to offer. Postgres and MySQL are so different in terms of their features that comparing them would make an interesting article.

> I replied "well there are 1.5 billion Muslims and 1.2 billion Catholics. They can't all be right". Ergo, a billion people most certainly can be wrong. (In this particular case, 2.7 billion people are wrong.)

Very smug and condescending statement. Feels like an insecurity on the author's part.

After seeing the title I assumed the verdict would be that MS SQL is way better. If I wrote a comparison that's how the result would end up. I've used MS SQL from 1998 and PostgreSQL from 2001 and have found MS SQL much easier to use. Different strokes for different folks I guess.

It's a one-page propaganda site (from someone from Shepperton, GB). I'm a big fan of PostgreSQL but this isn't helpful.

I use MSSQL on a current project and really don't like its query optimizer. It relies too heavily on statistics, which means that slightly complex queries start behaving unpredictably really quickly, especially in the beginning while the database is growing.

IMO, completely unsuitable for web apps where DBA is not sitting around 24/7.

Is PostgreSQL better at this?

How would you suggest optimizing queries except on stats? Do you find instances where the stats are incorrect? If you're trying to do small things in a big table, have you used indexing to help target the small sets/fields you're looking at?

Not really. Postgres' planner is also statistics based. Yes that approach has problems, but the contrary, where you still use a plan that was appropriate for three rows with a couple million isn't good either. It's just a hard problem.

Does anybody use PL/pgSQL? I wonder how it compares in practice with Oracle PL/SQL.

As others have mentioned, they are similar but there are some gotchas. An example is hierarchical data: Postgres lacks START WITH/CONNECT BY. This means you have to write recursive queries using common table expressions (WITH ... AS). Not a dealbreaker, but it does mean some extra work.
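For example, an Oracle-style org-chart walk translates roughly like this (the employees table is hypothetical):

```sql
-- Oracle: SELECT ... FROM employees
--         START WITH manager_id IS NULL
--         CONNECT BY PRIOR id = manager_id;
-- Postgres equivalent with a recursive CTE:
WITH RECURSIVE org AS (
    SELECT id, name, manager_id, 1 AS depth
    FROM employees
    WHERE manager_id IS NULL           -- START WITH
    UNION ALL
    SELECT e.id, e.name, e.manager_id, org.depth + 1
    FROM employees e
    JOIN org ON e.manager_id = org.id  -- CONNECT BY PRIOR
)
SELECT * FROM org;
```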

It's quite close. I recall being impressed that I could reuse some Oracle PL/SQL on Postgres.

If you know PL/SQL, all you need is an example of the boilerplate that goes at the top and bottom of a PL/pgSQL procedure and you are good to go.
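Roughly, that wrapper looks like this (the function itself is just an illustration):

```sql
CREATE OR REPLACE FUNCTION add_tax(amount numeric)
RETURNS numeric AS $$
DECLARE
    rate numeric := 0.2;  -- local variable, PL/SQL-style
BEGIN
    RETURN amount * (1 + rate);
END;
$$ LANGUAGE plpgsql;
```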

I was expecting something more convincing, like benchmarks. PostgreSQL has a lot of nice features, and that's a good thing for development, but just for development.

Ah, but you can't publish benchmarks about any of the RDBMSs for adults (Microsoft, and I think Oracle too) as per their license agreements. So never expect these types of articles to have any substance.

I was looking for some earlier today, as I often do when one of these discussions comes up, and when I didn't find any, I suspected that might be the case. It's a shame, as it's another case where a company works to restrict user information, which makes the market less efficient.

So I guess this is when libertarian says: if only there was no state to help stupid corporations enforce these ridiculous restrictions on their customers!

I am inclined to think that you should use an RDBMS for doing what it does best - things like storing and retrieving data efficiently, facilitating the integrity of data while serving multiple users, and enforcing relational constraints. If you're serious about doing analytics and you try to do it in the db you are going to quickly find yourself hampered by the expressive limitations of SQL.

I think it's better to skip that and interface with the database using something more powerful - let the database handle getting the data that you want the best way possible, then process it using something better suited to the job.

I expect better from the front page of HN. I like PostgreSQL and MSSQL Server and, to a much lesser degree, MySQL and Oracle. But in my experience I use the tool that exists and makes sense for the project. Since most of my work is enterprise, it's MSSQL or Oracle. For personal projects I used PostgreSQL and got on quite well. None of the tools prevented me from getting the data I needed. There may be scaling issues or use cases that favor either platform. If so, pick the tool for the job and move on. If you feel compelled to, document your use case to help others out.

It's very click-baity, but to provide some anecdotal evidence I've never had corrupted data/tables in Postgres, but dealt with it very regularly in MySQL under almost identical environments.

MySQL doesn't do a great job of protecting your data compared to Postgres, MSSQL, Oracle, etc.

Go ahead, kill -9 your production MySQL servers. I dare you.

I can do that to Postgres all day and the on-disk data integrity is preserved.

It makes little sense to put it like that, since for MySQL it greatly depends on the storage engine used. I've found InnoDB to be quite robust, but I'm not sure how it compares to what Postgres uses.

Let me first state I'm a huge Postgres fan; I implement it whenever I have the option. It's a great database and I'm a firm believer in open source. However, if a company is willing to shell out the coin to run MSSQL, I'm not complaining. MSSQL is very robust, very fast, and very easy to administer. I hate giving MS their props, but they did a good job with it.

PostgreSQL has bad C# drivers, other than that I would choose it over MSSQL.

Huh... a person who compares databases with regard to analytics and doesn't mention OLAP. Doesn't mention ETL. Doesn't mention reporting. What kind of comparison is that? I often read reviews of databases which dismiss MSSQL completely, and that's fine when it's about building applications. HOWEVER, MSSQL comes with very feature-full ETL tools, very feature-full MOLAP and HOLAP, and a very feature-full reporting suite. It also comes with some data mining algorithms, which I'm not qualified to talk about, but I've heard the functionality is a pretty nice starting point. It also comes with an IDE for the people who do those things and a DBA tool for people who administer the databases.

Out of those things, postgres has a DBA tool out of the box. Yeah... nice comparison.

I find it particularly funny that he cites Dunning-Kruger. DK effect is not meant for others. It is meant for self-evaluation.
