SQL Server has some neat features, and yes, it's more expensive than a free tool (and yet wildly less expensive than Oracle, its main competitor). What Postgres doesn't have is the entire BI suite of tools that comes with SQL Server: Analysis Services, Integration Services, and Reporting Services. It also comes with MDM (Master Data Services), Data Quality Services, a complex event processor (StreamInsight), and so on. It's not just a database.
I like SQL Server, and I use it all the time--but again, I do BI. That being said, if I were creating a web application, I'd use Postgres in a heartbeat. SQL Server's defaults are aimed at different applications than Postgres's are.
That said, this post was intended for the web app crowd.
To answer your questions: SSRS pre-2012 didn't work for non-IE users; in 2012, it works like a charm, though. As for debugging in SSIS, if you're talking about script tasks, again, they fixed it in 2012. Debugging data flows is pretty easy, too--I've never really had trouble except with the script tasks.
As with any Microsoft product, there are a plethora of quirks that you just have to know about (or that a quick googling will turn up), but I've never felt my dev time was stunted because I was using SSIS instead of Informatica or Talend or something else. These tools all have their quirks, so none bothers me more than the next.
Anyway, I'm not here to fly the banner of SQL Server. Microsoft, sadly, doesn't pay me royalties. I use it, it's a decent tool that does what I need it to do, but I do work on too many different platforms to really care about one or the other. My point was to give context and other info about what SQL Server is.
It's a good list for web apps, though. Certainly things you want to think about when building a web application--especially pessimistic concurrency. I think a lot of web devs have no clue what concurrency models are. I cannot tell you the number of `with (nolock)` arguments I've seen over the years.
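For anyone who hasn't run into it, the hint in question looks like this (hypothetical table). It reads without taking shared locks, which means it can return rows from transactions that later roll back -- a dirty read:

```sql
-- Dirty read: no shared locks are taken, so uncommitted (and possibly
-- rolled-back) rows can show up in the results.
SELECT order_id, status
FROM dbo.orders WITH (NOLOCK);

-- The same effect for the whole session:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
```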
Am I mistaken?
Other than the way you load data in, what's the difference, though? It's just an implementation detail.
You can periodically build a read-only cube that holds pre-computed answers to any number a pivot table could ask of that star schema.
Then, when the multinational supermarket CEO queries for his company's total profit by product line over the past five years, he gets an instant answer.
The other way, the CEO must wait for weeks to get that answer from the RDBMS - or the cube must somehow be replicated inside the RDBMS using tons of materialized views.
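To make that concrete, here is a rough sketch (hypothetical star-schema tables) of the query the cube pre-computes at every grain ahead of time, instead of scanning the fact table on demand:

```sql
-- Total profit by product line over the past five years,
-- computed live against the star schema.
SELECT p.product_line,
       d.calendar_year,
       SUM(f.profit) AS total_profit
FROM   fact_sales  f
JOIN   dim_product p ON p.product_key = f.product_key
JOIN   dim_date    d ON d.date_key    = f.date_key
WHERE  d.calendar_year >= 2008
GROUP  BY p.product_line, d.calendar_year;
```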
I think that's Microsoft's advantage in the database market: their system has all the tools out-of-the-box.
And the licensing's really not that expensive - especially compared to the heart-attack-inducing prices coming out of Oracle, IBM, SAS, Teradata, Kx, etc.
SQL Server really is a great database, but is it worth the Microsoft tax? I've come to realize that it's just not. For my own projects, I am starting to go with Postgres. I still have doubts on the performance side, but from my understanding these have largely been addressed in the latest releases.
I think using Postgres on a Linux box and ASP.NET MVC as the web platform is a good compromise. Are you doing this, or have you entirely moved to Ruby on Rails (or something else)? I would love to get your thoughts on this.
I think your statement about Microsoft's reporting and BI offerings is harsh but isn't entirely off the mark. The tools are good for the simple cases, but for real BI, Microsoft can't stand up to Oracle's (and others') offerings. I am sure there are companies that base their entire BI on the Microsoft stack, but based on my experience, it's just not common. A lot of the systems that hook into the BI stack tend to be legacy or non-Windows data sources, so in a lot of companies, purely using Microsoft isn't viable.
It's also a dangerous thing to do, as Microsoft tends to use SharePoint as glue, and I would be extremely hesitant to bring SharePoint into an environment (unless it's for its out-of-the-box functionality).
Then again, you really only want to do this if you already have SharePoint, since that's the web presence (it's a module in SharePoint called PerformancePoint). But SharePoint has sold a metric shitton of licenses (I don't know why). You're right to be hesitant to bring SharePoint in, but for those who have, it works all right. Even for "real BI" (whatever that means).
The backend stuff works well, and there are a lot of tools that can pick it up--my personal favorite being Tableau. Also, PowerView in Excel 2013 is pretty nifty, and it's what I've been waiting on since Microsoft acquired ProClarity in 2006.
On the SharePoint side of things, while I believe MS did oversell its capabilities, the problem was (and could still be) that most places thought it was plug and play. It seriously wasn't. I don't know if things have gotten better, but documentation in those days was severely lacking. Add to that enterprises with the mindset that SharePoint was just another ASP.NET application and that their in-house devs could just pick it up without some sort of guidance.
There were a bunch of places where I was brought in to clean up the mess that happened because of those two things. I'm not sure if the landscape has changed or not, but SharePoint is a beast unto itself. I've personally seen it implemented well at 30k+ seat organizations, and the largest I've done was a 50k-seat org across 3 continents. It's not cheap to set up or customize, but when it is done correctly, it does work.
Others just got it in a licensing transaction... they signed up for the Core CAL, and took care of their Windows, Exchange, SharePoint and SCCM licensing, even if they didn't use all of them. If you owned any two products (Exchange/SMS, Exchange/SharePoint), you actually saved money in the process.
I also see that the term "real BI" is a bit meaningless :).
In terms of the companies using it, it looks like I might be a bit off, but it's probably somewhat dependent on the industry.
One company that I am following is http://www.domo.com/. The founder has a great blog (www.joshjames.com) and seems to be trying to shake up the BI space a bit. I would love to know their back office stack.
I'm more interested to know how MS people have integrated Postgres into their environments. Which ORMs work? What are the gotchas, etc.?
To be honest, we're shifting to Java. Our license fees are well into 7 figures for MS stuff, for no gain. This is the first step. Some subsystems are moving from ASP.NET Web Forms + MVC to Spring, Tomcat, Hibernate, and Postgres. Cost savings are huge, the staff available are orders of magnitude better, and reliability is higher because, to be honest, Windows has been a piece of shit to manage over the last 10 years.
Fortunately we've got a pretty well designed system so we can move it over piecemeal as and when we update major subsystem features.
I think that NodeJS + MongoDB has been the path of least resistance I've seen so far... though some of its limitations are really frustrating (non-indexed result limitations). I like Postgres at least as much as MS SQL Server. And the Entity Framework generators have been close to a blessing. That said, I'm much more a fan of the DB as dumb storage, and not a big fan of SSIS. So changing from one DB to another is mostly transparent when things are how I prefer them.
There are commercial providers, but npgsql (http://npgsql.projects.pgfoundry.org/) works well.
I understand that some people get turned off by the code generation aspect of it, but what in particular didn't you like about it?
The support it has for all the major databases (SQL Server, Oracle, Postgres) and others is really solid.
Also, there is a full LINQ provider that works well.
Edit: just noticed someone posted this below. Sorry.
The comment about "thrown together by an intern" seems suspect (though not as much as criticizing SQL Server for defaulting to data integrity--if ever there was a non-complaint, that is it). I've been using SQL Server since 6.5, and generally find the toolset to be excellent, and anything but "thrown together" (quite the contrary: the product moves far slower than I would prefer).
They both have their place. In SQL Server land the things that they limit to enterprise edition generally are functions that only really benefit data warehouses (partitioning and compression being two of the most obvious), and given that they know that the people who use data warehouses tend to have warehouses of cash, it comes with a cost.
Worth noting that SQL Server has had clustered indexes forever -- Postgres still has only a rather hacked variant (the one-shot CLUSTER command, which doesn't maintain the ordering) -- along with materialized views. Both are of tremendous value to web developers. Far more valuable than the default transaction isolation level.
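A minimal T-SQL sketch of both features (hypothetical table; indexed views are SQL Server's flavor of materialized views and carry restrictions such as SCHEMABINDING and a mandatory COUNT_BIG):

```sql
-- Clustered index: the table itself is stored in this key order.
CREATE CLUSTERED INDEX ix_orders_date ON dbo.orders (order_date);

-- Indexed view: materialized and kept up to date automatically
-- (assumes the amount column is NOT NULL).
CREATE VIEW dbo.v_customer_totals
WITH SCHEMABINDING AS
    SELECT customer_id,
           COUNT_BIG(*) AS row_count,    -- required in indexed views
           SUM(amount)  AS total_amount
    FROM   dbo.orders
    GROUP  BY customer_id;

CREATE UNIQUE CLUSTERED INDEX ix_v_customer_totals
    ON dbo.v_customer_totals (customer_id);
```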
It uses the standard report viewer but aims to handle multi-pick parameters and reports with lots of optional parameters more intelligently.
Redmond could throw column-level encryption into the lower tiers as a security measure, but oh no, that's "enterprisey".
They'll hit the rendered output, though, which in HTML 4.0's case is horrid (at least in 2008).
The main problem with SSIS and SSRS is that you just can't put them in version control or scale them up inexpensively. Plus, to be honest, the tooling is shitty and unreliable (says me, who spent all day fighting T-SQL debugger HRESULT crashes in SSMS whilst trying to reverse engineer a huge shitty black-box SQL sproc into something scalable and testable in Java). Ugh.
We run both massive SQL Server and Postgres installations.
I am looking at Postgres at the minute, and the biggest sticking point for me is mass data imports from our customers on a nightly basis.
I've been keeping an eye on Postgres since version 9, as it (reportedly) has the same features as SQL Server but for a lot less.
We currently pay £250 a month per processor license and this shoots up costs a lot when you have a cluster of servers. If it's unavoidable then we carry on paying but if there are free alternatives out there then I'll jump at the chance.
The rest of the DB is used for serving sites, no other BI work done really.
(Quoting the documentation for pg_restore):
> Run the most time-consuming parts of pg_restore — those which load data, create indexes, or create constraints — using multiple concurrent jobs. This option can dramatically reduce the time to restore a large database to a server running on a multi-processor machine.
> Each job is one process or one thread, depending on the operating system, and uses a separate connection to the server.
> The optimal value for this option depends on the hardware setup of the server, of the client, and of the network. Factors include the number of CPU cores and the disk setup. A good place to start is the number of CPU cores on the server, but values larger than that can also lead to faster restore times in many cases. Of course, values that are too high will lead to decreasing performance because of thrashing.
> Only the custom archive format is supported with this option. The input file must be a regular file (not, for example, a pipe). This option is ignored when emitting a script rather than connecting directly to a database server. Also, multiple jobs cannot be used together with the option --single-transaction.
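Putting that together, a restore of a custom-format dump on an eight-core server would be invoked as something like `pg_restore -j 8 -d mydb dumpfile.custom` (database and file names here are placeholders).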
Feature set is comparable. We tend to avoid specific platform features as they are a migration risk.
Ha - our SQL license fee is around £60k a machine one off per major drop. We're not dropping that again for 2012. No way. Not when we have 8 machines :)
We are slowly digging ourselves out of the hole with dependency injection in our code from an Entity point of view, and we want to go a similar route with our DB.
We are nowhere near yourselves but we have spent £18k on SQL licenses in the past 2 years. It's a lot of cash and I don't think we see the full benefit to be fair. Coupled with the abundance of other Microsoft licenses we pay out for, more than half our server costs are licenses.
I appreciate your comment; it's taken me a step in the right direction towards sorting our DBs out.
Side note for those who don't believe PostgreSQL can outperform Oracle: we needed a custom data type that we were aggregating over. The projected table size was over a billion rows (last I heard it had reached over 4 billion rows). As I recall, in Oracle you were limited to writing user-defined data types in SQL or Java (now you can use C). PostgreSQL, being open source, gave us the leeway to implement the custom data type in C. The query we needed took several minutes to run in Oracle due to the need to constantly serialize/deserialize the data over the aggregation. The PostgreSQL version returned in a few seconds.
We even hired a well-known Oracle performance consultant who did wonderful things to all of our queries ... except for this one custom data type which left him stumped.
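For the curious, wiring a C-implemented type into Postgres is done from SQL roughly like this (a sketch with hypothetical names; the actual parsing and serialization live in a compiled C shared library):

```sql
-- Shell type first, so the I/O functions can reference it.
CREATE TYPE mytype;

-- Input/output functions implemented in the C library 'mytype_lib'.
CREATE FUNCTION mytype_in(cstring) RETURNS mytype
    AS 'mytype_lib', 'mytype_in' LANGUAGE C IMMUTABLE STRICT;
CREATE FUNCTION mytype_out(mytype) RETURNS cstring
    AS 'mytype_lib', 'mytype_out' LANGUAGE C IMMUTABLE STRICT;

-- Complete the type; a fixed 16-byte internal representation here.
CREATE TYPE mytype (
    INPUT          = mytype_in,
    OUTPUT         = mytype_out,
    INTERNALLENGTH = 16
);
```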
We shot ourselves in the foot when we first implemented NHibernate by setting our transaction boundary at the wrong place. This caused all sorts of portability problems.
Choose Big Data or Integration downloads.
I have been taking a look at SQL Server Web Edition with a view to moving away from Standard (the edition that supports SSIS), so it's good I'm getting a few options for other DBs too.
Although the majority of our stack is Microsoft-based, I would love to get off the Microsoft train when it comes to the DB. It's a massive cost to our business.
Yes, quite a while ago. SQL Server has an outstanding SQL engine, but it had that a couple of versions ago, so they need a new unique selling point. This may be because a SQL engine is now a very mature technology. They always tweak it, but gains here are marginal.
I wouldn't call myself advanced in either; more like an average user from a DBA point of view, plus some more advanced BI things. One of the last tasks I did on SQL Server was creating a set of SSIS / SSRS packages to replace odd SQL queries scattered around dozens of utilities. I had chosen SSIS as the ETL tool because I had heard lots of positive things about it, and I must say it was really pleasant to use ... up to the point where it became tedious trying to deal with wrong export formats for columns. Not to mention that when I was writing code for Script Tasks, the code sometimes mysteriously disappeared, and that signing packages, setting up Agent tasks, and all the boilerplate to make that work was a task on its own. I was blaming my lack of knowledge about the whole stack at that point - you don't usually master a tool that big in a week.
Yes, PostgreSQL is harder to pick up; it's harder for your average, not IT-centric company to implement as a BI tool. Microsoft provides a ready-made, full-fledged and powerful behemoth which can solve most of your problems with a snap of the fingers. Unless there's some magic happening and the lack of total insight into the tools drives you furious. Postgres has its quirks - at least for me it was harder to pick up than other database engines - but I'm quite happy with all the additional features it provides and the additional understanding of the whole system I'm gaining as I learn.
5. SQL Server supports indexes on computed columns (see the sketch after this list)
6. CLR Integration allows you to write a lot of things in C# (or VB.NET) 
10. Unicode is there by default if you use nchar/nvarchar/ntext columns; there are no problems unless you choose to use char/varchar/text instead
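On point 5, a minimal sketch (hypothetical table) -- the computed column is persisted and then indexed like any other column:

```sql
CREATE TABLE dbo.order_lines (
    order_id   int   NOT NULL,
    qty        int   NOT NULL,
    unit_price money NOT NULL,
    -- Computed column, stored on disk so it can be indexed:
    line_total AS (qty * unit_price) PERSISTED
);

CREATE INDEX ix_order_lines_total ON dbo.order_lines (line_total);
```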
> 6. CLR Integration allows you to write a lot of things in C# (or VB.NET) 
You can also use a Mono plugin with Postgres, just like Perl, Python, C, C++, Lisp, Haskell, R... But that's not quite it; #6 was about changing the core functionality of the DBMS. For example, you can add new index types to Postgres with plugins, and people do add them. Once somebody wrote a plugin for manipulating SQL queries in what was at the time an object-oriented database called Postgres. That's how deep you can go.
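For illustration, loading one of those languages is a one-liner, and functions written in it sit right next to plain SQL (PL/Python here; the function is a made-up example):

```sql
-- Load the Python procedural language.
CREATE EXTENSION plpythonu;

-- A function whose body is ordinary Python.
CREATE FUNCTION word_count(t text) RETURNS int AS $$
    return len(t.split())
$$ LANGUAGE plpythonu;

SELECT word_count('free as in speech');  -- returns 4
```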
> 10. Unicode is there by default if you use nchar/nvarchar/ntext columns, there's no problems unless you choose to use char/varchar/text instead
Yeah, Unicode is used by default if you tell the DBMS to use Unicode... Thus, it's not the default. I'd agree that this is not a big problem.
False. Positively false. Because there is no default. Whenever you create a new column which happens to contain text, you need to choose a type for that column. There is no default column type. You need to be specific about how your data gets stored. That's why it's called a schema.
And when making that choice, for your column which will contain text, you must ask yourself two questions:
1. What sort of capacity is needed?
2. Are you living in the 1980s or in the year 2013? If your answer is the latter, you always choose Unicode unless you can argue for why it won't ever be needed. That's all.
In 2013 you don't have to argue for the need for Unicode. You have to argue for the right to use antiquated, primitive 8-bit text types. And then you choose (because you and no one else will have to choose that column type) Unicode text types by default.
Anything else is madness. If any tool you rely on to do this for you chooses non-Unicode by default it is broken and needs to be fixed.
There might be a million other things SQL Server should address, but this is not one of them. Nothing is wrong with text-support.
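To make the choice concrete in SQL Server terms (hypothetical table): the 8-bit types depend on the database's code page, the N-prefixed types store Unicode, and string literals need the N prefix as well:

```sql
CREATE TABLE dbo.people (
    name_legacy  varchar(100),   -- 8-bit, code-page dependent
    name_unicode nvarchar(100)   -- Unicode (UTF-16)
);

-- Without the N prefix the literal goes through the code page,
-- and characters outside it are silently mangled.
INSERT INTO dbo.people VALUES ('Łukasz', N'Łukasz');
```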
1. Read Committed is not that restrictive, to be honest. If you're short on memory it can be an issue, but even in a high-frequency system it should only lock at the row level (see the sketch after this list). Beyond that, why would you be pushing to read dirty data? That's an issue in and of itself.
8/9. Arrays? JSON? XML? It's not relational data. Either parse it or put it in a different system; no RDBMS is going to handle these very well, because they break the 'Relational' part. They might be accommodated, but trying to work with them is not a good idea in the vast majority of cases. I'm promoting SQL Server here and I still have to point out that it handles XML like crap too; try and index an XML blob.
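On the Read Committed point: when blocking under the default model does bite, SQL Server can switch to row versioning without touching application code, which removes most of the temptation to reach for `nolock`. A sketch (hypothetical database name):

```sql
-- Readers see the last committed version of each row instead of
-- blocking on writers; the semantics stay Read Committed.
ALTER DATABASE MyAppDb SET READ_COMMITTED_SNAPSHOT ON;
```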
To be fair, SQL Server has constructs for this, but they come in many varieties and are awful in nature. Understanding how they work, how they interact, and how they will optimize your queries is even worse.
And I guess maybe that was your point?
Handling XML in SQL was "reasonable" back when the alternative was full-fledged, triple-verbose DOM code in compiled languages. These days, though, you have better ways of "shredding" XML into usable data constructs, and the desire to do anything of that sort inside the DB is definitely approaching zero.
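For reference, this is roughly what working with XML in SQL Server looks like (hypothetical table) -- functional, but the XQuery-in-strings style is part of why people would rather shred the XML outside the database:

```sql
CREATE TABLE dbo.docs (
    id   int PRIMARY KEY,   -- clustered PK is required for XML indexes
    body xml NOT NULL
);

-- XML indexes exist, but they are large and have quirks of their own.
CREATE PRIMARY XML INDEX ix_docs_body ON dbo.docs (body);

-- Extracting a scalar with the value() method and an XQuery path:
SELECT body.value('(/order/@total)[1]', 'decimal(10,2)') AS order_total
FROM   dbo.docs;
```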
Is it because of HStore? Heroku using it? The Oracle buyout of Inno and then Mysql?
Mysql 3.x was quite awful (other than replication), but by 5.x it had started getting good, and now the forks are doing some fantastic work. So it surprises me.
Rather than actually doing that, we migrated all our data to PostgreSQL. The Postgres query planner works much better than the MySQL one. Since then we've started using some of the Postgres-specific features, like windowing, which have made complex queries we didn't think were going to be possible not only doable, but easy and fast to run.
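As one example of what windowing makes easy (hypothetical table), a running total per customer with no self-joins:

```sql
SELECT customer_id,
       order_date,
       amount,
       SUM(amount) OVER (PARTITION BY customer_id
                         ORDER BY order_date) AS running_total
FROM   orders;
```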
Postgres has a somewhat different philosophy on the devops side of things, but if you don't try to force it to be like MySQL, it'll be fine.
At this point I wouldn't ever use mysql again (for new projects where I have the choice).
The per-connection cost isn't much of an issue with a connection pool, and replication has improved a lot lately (and is clearly only going to get better).
Aside from ease of getting up and running, IMO there's very little reason to pick MySQL over Postgres nowadays.
Did you actually run into a practical problem here, or is this theoretical? Most people get along fine for a while, and when they have a problem, they use pgbouncer, which seems to solve it.
"extremely painful replication, largely offloaded to third party tools"
Do you have some more specific/constructive criticism here? A lot of work has gone into postgres replication over the past few years, and it's improved dramatically for some use cases. But there are many use cases for replication, so it's possible that you didn't see improvements for your use cases.
FWIW, I started out with MySQL, and then tried out Postgres because of the license (MySQL was not yet GPL). After MySQL was GPLed, I used both for a while, but found the postgres community more welcoming/informative, and found postgres much more pleasant to use. At one point around 2003 I basically threw mysql away completely because I was fed up trying to do simple date manipulation to make simple reports. Postgres made it so easy and it seemed so natural -- it's really hard to use mysql afterward without getting frustrated. I'm sure there are things like that in reverse as well; e.g. mysql has the ON DUPLICATE KEY clause, but to me it still didn't add up to a pleasant/productive experience.
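For what it's worth, the kind of date handling being described is one-liner territory in Postgres (hypothetical table):

```sql
-- Orders per month over the last year:
SELECT date_trunc('month', order_date) AS month,
       count(*)                        AS orders
FROM   orders
WHERE  order_date >= now() - interval '1 year'
GROUP  BY 1
ORDER  BY 1;
```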
IIRC, at 8.3, Postgres got an order-of-magnitude bump in speed. As MySQL gains credibility as a "real" db, Postgres is making itself fast.
Postgres has been developed steadily the last decade, and has been a serious competitor to major RDBMSes since version 8.1 (which added two-phase commit, table partitioning, index bitmap scan and shared row locking, among other things.)
I say public exposure because Postgres has clearly been popular among many developers for a long time. It's been beneath the radar for a lot of people, partly due to pervasive myths ("Postgres is slow"), partly due to the fact that, well, a product doesn't sell itself.
I think the reason Postgres has shown up on HN a lot is simply that people are finally discovering it and recognizing its technical brilliance, careful engineering and high performance, and it's snowballing. I myself have used Postgres for web apps since 2003, and I have evangelized among colleagues and acquaintances to the best of my abilities, so I am very happy about this progress.
I don't see why it should surprise you. While I'm sure MySQL 5.0 and its various forks have gotten better over the years, it's hard to argue that Postgres is not technically superior in pretty much every way. Even if you care solely about performance, which used to be MySQL's main selling point, Postgres overtook MySQL a long time ago.
> an even more expensive per-connection model than mysql
This is not a practical issue for most people, in my experience. At least nobody is really complaining about it. Rails apps happily pool connections, for example.
If you can't pool connections in the app, there are several excellent, mature connection-pooling proxies around (including a very cool one, pgPool-II, which can also load-balance and replicate at the same time).
> and extremely painful replication
Is it possible that you have not been following the developments? If you don't need per-table/per-database replication, then streaming replication is actually incredibly painless. It is by all accounts awesome. And it scales very well.
If you do need fine-grained replication, there are several mature third-party projects that are fairly painless to operate, including Bucardo and Londiste. (I would not count Slony as "painless", although it's probably the most comprehensive solution right now. Avoid it unless you are a bank.)
There is also a multimaster-replicating fork (PostgreSQL-XC) that is apparently working very well, and implements true distributed ACID replication, as opposed to the kind of "de facto multimaster replication" you set up in MySQL, where you have to make sure primary keys don't overlap between hosts.
Postgres has lots of advantages, but "figuring out how to implement it" is rarely an issue on SQL Server. SQL Server must be the easiest-to-use complete database system on the market.
In particular, incremental updates of the materialized view are not yet a feature. That said, a lot of hand-wringing has been going on on the lists about how to build on this work, so it looks like this whole subsystem will receive some attention.
People forget that this is true of every software system ever. No successful system ever starts with every feature it's going to have.
In 9.3 materialized views will only be refreshable by recomputing the entire view and they will be locked while they are being refreshed. To shorten the lock you would have to swap the view with ALTER MATERIALIZED VIEW.
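In SQL, the 9.3 behavior described above looks like this (hypothetical tables):

```sql
CREATE MATERIALIZED VIEW sales_by_product AS
    SELECT product_id, SUM(amount) AS total
    FROM   sales
    GROUP  BY product_id;

-- 9.3: recomputes the whole view and holds an exclusive lock on it
-- for the duration, so readers block until the refresh finishes.
REFRESH MATERIALIZED VIEW sales_by_product;
```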
Worth looking at.
Consider this documentation from Oracle, a very popular materialized view implementation:
"When changes are made to the master table or master materialized view using DML, an internal trigger records information about the affected rows in the materialized view log. This information includes the values of the primary key, rowid, or object id, or both, as well as the values of the other columns logged in the materialized view log. This is an internal AFTER ROW trigger that is automatically activated when you create a materialized view log for the target master table or master materialized view. "
But I still sigh every time I have to convert perfectly valid UTF-8 before a bulk insert...
PostgreSQL is free software.
And therefore, for example, no company can force you into their lock-in for their profit and your loss.
1 - At the start of your project's life, you'll want it because it's free as in beer. Except for ease of development (where Postgres is also great), nothing else matters.
2 - Later, when you need some performance, even if you have a usage pattern where Postgres is slower, you'll discover that the price of any of the other DBMSes is higher than just adding hardware. Thus, free as in beer wins again.
3 - If you get huge, you'll discover that free software is a big deal, and the limitations of the proprietary DBMS add a huge amount of drag. Free as in speech wins now.
If I get to a point where I actually need the features, that SQL Server or another commercial RDBMS has, then I have a good problem. It means my business is doing well and I may be in a position to consider a changeover.
I used to fret about features that frankly don't really matter when you are starting out. Coming from an Oracle and SQL Server background, I wouldn't touch MySQL, but Postgres hits the 80% mark quite easily.
* Microsoft SQL Server 2005 (the release version and service packs) and earlier versions of SQL Server are not supported on Windows Server 2012 or Windows 8.
* SQL Server 2008 only runs on Windows Server 2012 / Windows 8 if you install R2.
* Importing .bak files from SQL Server 2005 into 2012 Express will not work (not sure about the full version).
* Copying a database from SQL Server 2005 to 2012 Express in Management Studio will not work either.
* SQL Server 2012 will not run on Server 2003.
* SQL Server 2012 requires a minimum of 6 GB of available hard-disk space.
* SQL Server 2012 will not run on Server 2008, Vista, Win7 without a Service Pack
Why do most companies that scale out (and thus massively benefit from free-as-in-beer software) actually choose MySql over PostgreSQL? Because of the features? Nope...
Why do most companies that scale up (and thus massively benefit from expensive but highly performant software) actually choose MSSQL or Oracle over PostgreSQL? Because of the price tag of the Enterprise features? Nope...
One question: Does this work for bitwise operators?
Is your workload write-heavy? Is it more read-heavy than write-heavy? How much query throughput are you talking?
At the smaller scale, it doesn't really matter. At the larger scale, your app's characteristics determine how much (if any) you are disadvantaged over running a more performant Postgres+Linux combo.
I'm not even sure there is anybody out there capable of correctly measuring this. You may need to go with a quick, biased study anyway.
Also, why are you assuming Postgres is slower?
I'm sure the Postgres team has kept an eye on Windows, but I doubt it gets nearly as much attention as the *nix environments.
Or else Microsoft has successfully squatted on a common, generic term as the name of their product. Not a good thing, if you ask me.
In the "old days" I never heard of people referring to Oracle as a "SQL Server", nor DB2, etc.
Even now, people talk of Postgres as an "RDBMS", not "Postgres, a SQL server".
So I'd say that I don't particularly agree with your postulation.
Microsoft SQL Server is an offspring of Sybase SQL Server.