

Things in SQL Server which don't work as expected - dangoldin
http://tech.pro/tutorial/1419/10-things-in-sql-server-which-don-t-work-as-expected

======
moomin
I've worked with SQL Server for 20 years now, and you'd be amazed at how often
the issues listed here actually come up. My Damascus moment was using Postgres
for the first time. I kept discovering that it "just did the right thing" in a
way that SQL Server never seems to.

~~~
VladRussian2
SQL server by default creates tablespaces with non ANSI SQL NULL behaviour.
Exlicitly specifying ANSI SQL compatibility when creating tablespace makes it
"to do the right thing"

Edit: was doing it many years ago so forgot the details (or they changed :),
so just checked - it is "alter database" that can switch various "ansi_null"
default settings into ON from default OFF

~~~
chiph
The default in upcoming versions of SQL Server will use the ANSI NULL
behavior.

[http://msdn.microsoft.com/en-
us/library/ms188048.aspx](http://msdn.microsoft.com/en-
us/library/ms188048.aspx)

------
masklinn
> Oracle and PostgreSQL sort NULL values last by default. Also, those two
> RDBMS support an extension to ORDER BY clause: each expression accepts
> possible modifiers NULLS FIRST and NULLS LAST

The wording may make one think NULLS FIRST and NULLS LAST a is proprietary
feature of Oracle and Postgres. It's actually an optional feature of the SQL
standard (T611 "Elementary OLAP operations")

------
ams6110
If your expectations are learned in one RDBMS and you move to another, you
will always find things that "don't work as expected."

This is why it's a bad idea to assume you can treat RDBMSs as interchangeable
black boxes, unless you confine yourself to the most basic subset of standard
SQL.

~~~
eitland
That would be a good defence if he were talking about leaving out optional
parts.

He is giving examples of violating the standard.

~~~
sequence7
So, just out of curiosity, which other RDBMS doesn't violate the standard?

~~~
eitland
I never said that.

~~~
sequence7
I honestly think it's a very informative article, clearly SQL Server doesn't
adhere to the standard (whichever one you choose) but I'm not aware of any
RDBMS that does so I was curious as to whether there was one.

There's lots of amazing databases around, each has their own quirks and as the
gp says it's generally difficult to drop in replacements due to those quirks.
There's a few listed in the article for SQL Server I'm sure anyone could
easily find where others deviate from the spec too. But anyway getting back to
what you said:

> He is giving examples of violating the standard

That's absolutely true and it's something every other RDBMS does.

~~~
masklinn
> That's absolutely true and it's something every other RDBMS does.

To various degrees, and there's violating the standard (by adding non-standard
construct) and violating the standard (by failing to implement a feature
entirely, or implementing it with different syntax or semantics).

[http://troels.arvin.dk/db/rdbms/](http://troels.arvin.dk/db/rdbms/) is
generally my go-to page for that stuff as it lists standard solutions (with
reference and mention of the features used being optional if applicable) and
alternative db-specific solutions (exclusive or inclusive of the database
accepting the standard solution). Sadly it is not really up to date, and
obviously does not go through the whole standard.

------
ScottWhigham
Lots of good tidbits and examples in there - I really enjoyed reading that.
There are several not-quite-inaccuracies-but-not-really-how-you-would-think
moments in it but, overall, it's very well written. The pictures were annoying
after a while haha.

Just to clarify, this isn't "specific to SQL Server" \- SQL Server was
originally a straight-up "licensed by Microsoft" version of Sybase SQL Server
and thus it's early code base (things that defined the Transact-SQL language)
has almost all to do with Sybase and naught to do with Microsoft. Things like
the plus sign as a string concatenator, the way numbers are handled when
compared to strings - all of that was the way Sybase did it and, for whatever
reason, Microsoft didn't change it (we're talking way back in the late 80s and
early 90s). In fact, there's not a single thing in that article that has
changed or been implemented by Microsoft - nearly every one of the
nitpicks/topics referred to by OP refer to the original choices made by the
Sybase engineers. You could say, "Yes, but Microsoft hasn't changed this
despite having 20+ years with the source code + standards!" and that's a valid
point.[0]

1) TRIM - This has been asked for and discussed ad nauseum in the SQL Server
and Sybase community. It always circles around the idea of VARCHAR though -
the variable character type automatically trims trailing blanks thus, years
ago, developers said, "Ahhhh, screw it - no need for a TRIM() function! They
just should use the LTRIM() function to get rid of the leading blanks." It's
not correct IMO but that's always been the logic.

2) "This behavior violates the standard and is specific to SQL Server." \- I
hate this statement. Which standard? What section? This is a case of it being
easy to say something but, without backing it up with a citation, it implies
that I trust the OP. I dont' know OP - maybe he's right but maybe he's
interpreted "the standard" incorrectly, or is referring to an outdated
standard, or is comparing SQL Server 2000's interpretation of a later-written
standard. There's no way to know - it's just too ambiguous.

3) ORDER BY in subqueries - I laughed when I saw this:

SELECT * FROM (SELECT TOP 100 PERCENT* FROM mytable... ORDER BY id) q

 _In nested queries and inline (single-query) table valued functions, ORDER BY
is not supported unless TOP is used with the SELECT._

Kinda, sorta - it depends on which version of SQL Server you are referring to.
SQL 2012 - yes. SQL 2008 - yes. The "original" SQL Server would have done what
OP talks about - sorting w/ TOP - but, at some point, MSFT changed it to give
the behavior he talks about. I can't remember when - maybe SQL 2005 or, at
worst, 2008 - but this is an example where using specific version names would
be helpful.

[0]
[http://en.wikipedia.org/wiki/Microsoft_SQL_Server#History](http://en.wikipedia.org/wiki/Microsoft_SQL_Server#History)

------
Lendal
This was mostly well done and really makes me want to take a closer look at
Postgres. But a cheap shot came on the last one, "Readers and writers block
each other". Both locking and snapshots are implemented in SQL Server, and you
can pick your poison. Is Postgres or Oracle better? Is either method more
efficiently or intelligently implemented in either of those systems than in
SQL Server?

~~~
thristian
PostgreSQL is 100% pure MVCC, a little bit like the "SNAPSHOT" method
described in the article: updates and deletes always create new copy of a row,
and every row has a visibility associated with it: the oldest transaction and
the newest transaction that can see that row (if the row hasn't been deleted
by any currently active transactions, the 'newest transaction' ID is empty).

There are definitely some drawbacks to MVCC. For example, many people are
surprised how long "select count(*) from X" takes to run, because it has to do
a full table-scan to check which rows the current transaction can actually
see. You need to periodically 'vaccuum' tables, which updates a cached list of
which table rows are deleted, so they can be reused by future update (modern
Postgres will auto-vacuum, but it's another set of knobs for the DBA to
tweak). Plus, the storage overhead mentioned by the article (although that
really depends on the amount of churn in your particular application, and
regular vacuuming keeps it to a minimum).

~~~
gmac
In fact, the latest stable Postgres supports fast, index-only scans for count
queries: [https://wiki.postgresql.org/wiki/Index-
only_scans](https://wiki.postgresql.org/wiki/Index-only_scans).

------
derefr
Ever since I got exposed to Haskell, I've been wondering why RBDMSes--
notionally the things which should have the strongest type-systems of all--
have the moral equivalent of nullable pointers, rather than an Option sum-
type. There are many cases in the use of SQL queries where weird
inconsistencies with NULL could be solved by just distinguishing between
Nothing, Just Nothing, Just Just Nothing, etc.

~~~
adamconroy
The db designer can handle that, just add an fk to a table that defines the
values required for a particular problem domain.

In other words, don't succumb to the laziness of using nulls.

~~~
derefr
> don't succumb to the laziness of

No, see, that's what I was questioning. Why is using NULLs _easier_ , such
that it's the "lazy" thing you can "succumb" to? UI is fundamentally about
_incentives_ \--making good things easy to do, and bad things hard--and an
API, or a protocol, is a UI for programmers.

Sure, instead of a NULLable column, you can have a pair of columns, one with
an ENUM type to represent the sum-typeclass's type-instance, and the other to
represent a value of that type. But that sounds like you're working at the
wrong level of abstraction, fundamentally.

You're using an RBDMS because it lets you speak in terms of tuples, relations,
projections, and other Relational Algebra concepts, rather than in terms of
keys, values, and indexes. I can see doing the pair-of-columns thing in a
NoSQL store--but why did the designers of SQL let this implementation detail
infect my RDBMS?

It's kind of funny, actually. I see developers treating their RDBMS as a dumb
object store... but then implementing a "domain model" layer right beyond it,
basically creating a "model server" accessible over an API, which enforces the
_business domain_ constraints on the objects it sends and receives. In other
words, they've Greenspunned an RDBMS of their own in all but name, with its
own custom protocol and half-assed validation logic. I feel like a large
reason for this is that current RDBMSes just aren't up to representing the
types and constraints and relations that developers need to express, so they
put up their noses and say "business logic doesn't belong in the database!"
...and proceed to build a database for their business logic to reside in.

It's times like this when I almost get an urge to develop my own RDBMS that
actually keeps the Relational-Algebra abstraction clean. No concept of a
"table" or a "view" or an "index"; just relations between projections of an
infinite theoretical tuplespace, where tables are created to store data that
can't be inferred from an existing view, and garbage-collected behind the
scenes when their values are no longer referenced by any projection. Indexes
created gradually when a tracing-JIT-like query planner notices repeated
access patterns. Strong, structural, parametric type declarations on every
piece of data. Constraints that actually allow you to express things like "a
user-to-user 'friend' relationship must be symmetric", rather than just being
a weak declarative dress over triggers. Etc.

Would anyone be interested in such a beast?

~~~
adamconroy
You don't need a pair of columns to replace a nullable column. I was
addressing the second part of your comment not your questioning of allowing
nulls.

Nulls have their place I believe, but they are grossly overused. For example,
if you have a Person table, then something like a nullable DateDied is logical
and acceptable over a sub-type solution. But something like HairColor should
not be nullable to cater for bald people, it should non-nullable and reference
a table that has a 'none' value.

On another note, I don't think you can eliminate the null concept because they
are the implicit outcome of outer joins and therefore means of comparing and
aggregating nulls needs to built in.

~~~
derefr
> if you have a Person table, then something like a nullable DateDied is
> logical and acceptable over a sub-type solution.

But this is exactly _where_ NULLs are bad. "DateDied IS NULL" could mean many
different things, semantically:

\- This person hasn't died

\- We don't know if this person has died (the information was not provided
with the record when imported)

\- DateDied is not an applicable property for this subtype of Person (for
example, if you have an ImmortalPerson subtype.)

There are probably even more, if I wasn't just going off the top of my head.
When you do an outer join, you get _another_ level of NULLability--which would
perfectly map to another layer of "Nothing | Just something"\--but without
that, you get those NULLs mixed in with the other three types of NULLs above.
And then you insert all those into a table, export it, and import it somewhere
else... and nobody knows what any of the NULLs mean.

------
kabdib
This is why you want to use SQL databases quite conservatively. Nearly every
product I've seen that heavily leverages SQL has been an unportable mess.

... then you hit the performance wall, and say, "Well, just /one/ of these
little features won't hurt, right?" And a year later you're singing the blues.

Standards should be executable.

------
fegu
An enlightening and thorough article. Note that it is also not possible to
SUM() a bitcolumn. Why, I will never know.

~~~
adamconroy
Is someone who finds something enlightening qualified to say whether something
is thorough?

------
TomGullen
Good article. Another one that I find counter intuitive is that the sum of an
empty set returns null which goes against set theory (states it should return
0) but apparently this is by design

~~~
aeorgnoieang
Why shouldn't the "sum" of an empty set return 'NULL'? I could understand why
a 'count' of an empty set should be 0 (which it is in SQL Server).

~~~
eurleif

        sum(A) + sum(B) = sum(A ∪ B)
        sum(A) + sum({}) = sum(A ∪ {})
        sum(A) + sum({}) = sum(A)
        sum({}) = 0

~~~
dbaupp
(This assumes either A and B are multisets, or they are disjoint.)

~~~
kragen
Indeed, but in the context of minor changes in how things SQL should behave,
we can assume that everything is a multiset.

------
AlisdairO
As one of the comments notes, ORDER BY in subqueries is sometimes used for
implementing pagination. I only have a limited amount of experience with SQL
Server, but you ought to be able to accomplish the same using window
functions:

    
    
        SELECT blah FROM 
            (SELECT blah , ROW_NUMBER OVER(ORDER BY blah) num from mytable) as subq
        WHERE num < 10
    

Of course, it's anyone's guess how efficiently SQL Server will optimise that
:-).

~~~
masklinn
That is correct: [http://troels.arvin.dk/db/rdbms/#select-limit-
offset](http://troels.arvin.dk/db/rdbms/#select-limit-offset)

Although note that if you're one of the Cursed Few who still has to use SQL
Server 2000, you don't have ROW_NUMBER.

~~~
AlisdairO
That's a useful site, thanks!

------
adamconroy
That was quite dull and for the large part spurious. Order by on an in clause,
aggregating bits, multi column in clause, trim, ordering of nulls, using,..
All of these are of zero practical interest.

Also, all this talk of 'standards' is strange because there is no mention of
what standard is being referred to.

~~~
cafard
Dull perhaps, but spurious? As someone who writes SQL in a couple of dialects,
these are the sort of things that I need to keep in mind.

