
Our SQL interview questions - jitbit
http://www.jitbit.com/news/181-jitbits-sql-interview-questions/
======
edw519
I took a similar test, on-line while being watched. 4 sets of 10 multiple
choice questions: SQL, unix commands, vi, & HTML. Make your 10 choices, click
submit, get your score. It was kinda silly, but what the heck...

It was ridiculously easy and I got all 40 right without much thinking, as many
people here would also, I imagine.

Then I asked, "Why bother with this after reading my resume?"

They answered, "We have to do this. We've interviewed 52 programmers with
resumes similar to yours and no one else got them all right. In fact, the
highest score before you was 32."

Wow. Is this the state of our industry now?

~~~
T-hawk
It's the state of the _bottom_ of our industry. Those 52 programmers aren't a
representative set of the general population. Those 52 programmers are the
ones that can't catch on for any jobs so they keep applying over and over.
It's a sampling bias.

And by the way, this happens for most industries, not just technology.
McDonald's has the same problem. Their majority of applicants fail at tasks
like having the literacy to fill out an application form or getting out of bed
and showing up to work. This doesn't mean the majority of the population is
that deficient, just the majority of deadweight floating around the would-be
job pool.

More generally stated, the worse an applicant is for his desired job, the more
times his incompetence will attend interviews to be seen. Quality performers
in any business get hired quickly and don't stay in the interviewing pool. So
equivalently, any interview pool will consist mostly of bad candidates.

We need a name for this effect so that we can just quote it whenever this
topic comes up, like Dunning-Kruger. Anyone got a good suggestion? Joel
Spolsky was the first to set it out well and become widely read[1] , but
Spolsky's Law is already used for the Law of Leaky Abstractions.

[1] <http://www.joelonsoftware.com/items/2005/01/27.html>

~~~
stonemetal
After looking for a new job recently, I realized the converse is also true.
The worse a company is as a place to work the more often they need to hire.
Therefore the pool of available jobs mostly consists of jobs no one wants. Be
picky in who you hire and be picky in where you apply.

------
codegeek
"List employees (names) who have a bigger salary than their boss"

    
    
        SELECT e1.Name FROM Employees e1 LEFT OUTER JOIN     Employees e2
        ON (e1.BossID = e2.EmployeeID)
        WHERE e1.Salary > e2.Salary
    
    

"List departments that have less than 3 people in it"

    
    
        SELECT d.Name, COUNT(e.EmployeeID) FROM Department d LEFT   OUTER JOIN Employees e
        ON (d.DepartmentID = e.DepartmentID)
        GROUP BY d.Name HAVING COUNT(e.EmployeeID) < 3
    

"List all departments along with the total salary there"

    
    
        SELECT d.Name, SUM(e.Salary) FROM Department d INNER JOIN Employees e
        ON (d.DepartmentID = e.DepartmentID)
        GROUP BY d.Name
    
    

"List employees that don't have a boss in the same department"

    
    
        SELECT e1.Name FROM Employees e1 LEFT OUTER JOIN        Employees e2
        ON (e1.BossID = e2.EmployeeID)
        WHERE e1.DepartmentID <> e2.DepartmentID
    
    

"List all departments along with the number of people there"

    
    
        SELECT d.Name, COUNT(e.EmployeeID) FROM Department d 
        LEFT OUTER JOIN Employees e
        ON (d.DepartmentID = e.DepartmentID)
        GROUP BY d.Name

~~~
ohwp
Curious question: why do you always use table aliases? To keep your query
shorter? When I don't need an alias I just use the full table name for
readability:

    
    
      SELECT
        Department.Name,
        COUNT(Employees.EmployeeID)
      FROM Department
      JOIN Employees
        ON Employees.DepartmentID = Department.DepartmentID
      GROUP BY Department.Name
      HAVING COUNT(Employees.EmployeeID) < 3

~~~
codegeek
Good question. I always find it easier to have the aliases because sometimes,
table names are too long for me to remember. Also, tehre are times when we
join the same table by itself and at that point, I use x1, x2 etc. In general,
aliases always work while direct table names may not work for all cases. So i
just keep it simple.

~~~
jeremysmyth
There's also the cognitive overload of reading long identifiers, multiplied by
the naming conventions of some large corporate databases.

I'd rather see:

    
    
        EmployeeReferences eFrom JOIN EmployeeReferrals eTo
    

...and then see

    
    
        ON eFrom.ID = eTo.ID
        ...
        JOIN xyz
        ON eFrom.Source = ...
    

rather than have to read acres of EmployeeRe-something 4 or 5 times through an
8-table BI join.

Similarly, I've had to deal with (admittedly legacy) tablenames like
A12R18SALE and A12B14PROD. Aliases come in really handy there.

------
btilly
The question _List all departments along with the number of people there_ has
an answer using a correlated subquery, rather than a join.

I have a relevant story about that.

About 9 years ago now, another developer escalated a bug to me. Every time
they ran a complicated auto-generated query, they got logged out of Oracle. No
way! I tried it. Happened to me. Began trying to narrow it down. Ran out of
connections. Got a DBA to unwedge the machine. Began again. Ran out of
connections again. Got the same DBA to unwedge the machine. Received a lecture
about not opening so many connections, replied that I was tracking down a bug
and had no choice. Showed him the bug. He was astounded.

Not long after I finished tracking it down and sent them the fix. Showed it to
the DBA who verified that it had been reported already, and was fixed in the
next release.

The bug was that any time there was a correlated subquery with no records,
Oracle logged you out. My guess is that something, somewhere, followed a null
pointer. The obvious solution was to move to a left join. If I remember
correctly, the way it was being autogenerated made that hard. My solution was
to have a correlated subquery which was a left join on DUAL so that there was
always a record.

~~~
krsunny
Relevant story? You're hired!

------
aidos
These questions are _very_ simple, though I guess they cover a few of the core
concepts. Basic selects, joins, joining the same table twice, left joins and
group by.

I'm most worried by the comment _"(tricky - people often do an "inner join"
leaving out empty departments)"_. That's a basic question and if that's
considered "tricky" you've got a real problem on your hands.

Maybe if you're hiring for a junior position you could excuse someone not
knowing about left joins. If it were for a position that had any sort of focus
on db work I would pass on the candidate (caveat, when hiring juniors I look
for desire to learn above most everything else).

Obviously I'm getting old. "Back in my day" a basic understanding of SQL was
just part of the job. Didn't matter what you worked on - you should be able to
work with relational database. I'm concerned that the attitude of "I don't
need to know that - my ORM does that for me" has become the default outlook.
Over the last few years I've had to convince developers several times that the
complex aggregation they're writing in their script would be easiest solved by
using SQL. Unfortunately, increasingly it seems that newer developers aren't
even aware that these tools are available - or how to use them.

If nothing else, relational algebra is a wonderful and elegant subject that is
worth learning.

Darn kids, get off my lawn! :)

~~~
wmil
> That's a basic question and if that's considered "tricky" you've got a real
> problem on your hands.

If they aren't warned then it's reasonable to assume that every department has
employees. Otherwise why would it exist?

~~~
aidos
Fair point. Experience has taught me to internally question whether or not
each relationship should use a left join or not. The fact that it says "List
all departments" made me think it should be a left join. In fact, to me that
screams "left join".

Really, in a philosophical sort of way, it's the use of the query that
determines which join to use. They want ALL departments, you give them ALL
departments. Using a left join you protect yourself in the future - with a
performance hit. As you say, if there was a guarantee that the data didn't
contain empty departments you'd use an inner join.

Most importantly, a reasonable developer should know to ask the question - if
not of the examiner, at least to themselves. Just making an assumption is not
the right approach.

------
SkippyZA
I have just been on the job market looking for a senior PHP position. There
were so many companies that requested tests from me. Either in the form of
online tests, or tasks which I had to complete and return. While I do
understand the need for them (having had to hire other developers), some of
the requests were quite outrageous.

1 particular company basically wanted an entire application to be developed in
an evening, and I was giving strict instructions to focus on security and not
allowed to use external libraries. After submitting this elaborate task, I was
still criticized on using PDO (which is standard with PHP...).

IMHO, sometimes the lengths employees go through to find a developer are so
ridiculous that they actually drive away people.

------
bluedino
I recently took an SQL skill assessment test from one of the big 'testing'
sites. My first problem with the test was that it was a mix of Oracle and MS
SQL, when my resume said 'MySQL'. And there were questions such as 'What is
the MS SQL equivalent to the Oracle keyword xxxx?' Luckily I've used it enough
to not bomb that portion. To be expected with a recruiter...

Anyway, some of the other questions were pretty silly like "Which of the
following is a DDL command?", and many were SELECT statements with a syntax
error that you had to pick out, and probably the one question that made sense
was about the difference between WHERE and HAVING.

~~~
Gravityloss
If their recruiting is so incompetent, maybe the company is clueless otherwise
as well?

------
gfosco
This is a great way to start... A defined schema and very clear and simple
requirements, I could dash these off really quickly, and then we could discuss
optimization. So much better than asking me to define a join. I miss SQL.
[http://stackoverflow.com/search?tab=votes&q=user%3a35398...](http://stackoverflow.com/search?tab=votes&q=user%3a353988%20%5bsql%5d)

------
Tyrannosaurs
Seems to me to be a solid test, though I might include a small amount of
sample data to nudge them in the direction of some of the potential issues
(empty departments and so on) - maybe I'm just kind like that. ;-)

In interviews I've always been amazed how few people who claim to know SQL can
use GROUP BY, HAVING and aggregate functions (or depending on the question
self joins or sub queries that will allow them to achieve some of the same
things).

My normal question is to present them a table with a level of duplication ask
them to write something in SQL that identifies the duplications and something
that removes them which covers much of the same syntax.

~~~
jcampbell1
Removing duplicates is dependent on the underlying database because the
behavior of deleting while selecting varies. I know MySQL doesn't allow it in
most cases, so you need a temporary table. Identifying them is a fine
question.

~~~
VLM
You can get a list of dupes to eat without a temp table with something like:

select min(id), count(*) as dupecount from yer_table group by
some_hash_identifier_or_whatever having dupecount > 1

And then just iterate thru delete from yer_table where the id = the min(id) as
fetched above.

Or maybe your business logic is to keep the oldest record and zap the newest.
Or based on some column data rather than simply age.

Now the really interesting discussion is how often this happens (like once-
off, or every 10 seconds, or), and how scalable you need it to be. Are you
talking about 100 records or 100e6 records. Also literal duplicates as in
"two" works pretty well but not so good if there's 50 duplicates and you need
to delete 49 of them. Of course for 49 duplicates you could select the
identifier hash and the lone lowest ID and delete all entries with the same
identifier hash where the id isn't the lowest id for that hash...

------
megaman821
If you can only answer these using an ORM, then you really shouldn't put
knowledge of SQL on your resume.

I am not really into giving programming tests to interviewees, but if you
claim to have experience with something you should be able to answer simple
questions about it.

~~~
jfb
I wouldn't treat this sort of thing as dispositive, and if I were doing hard-
core SQL development, I'd dismiss it entirely and start the interview with
much hairier wizardry; but for a generic, gonna write some queries but mostly
live outside the database kind of role, the five minutes or so this sort of
test takes at the beginning of the interview gives me a strong indicator of
how to assess what the candidate actually knows, rather than what is
represented on their resume. It is a guide for the _actual_ meat of the
interview.

------
david927
These are nice, but you need simpler questions.

Hear me out: Take away two questions (the first four are enough anyway) and
add two to start out with that are much simpler. You will be shocked how many
people will fall out at that level.

Years ago, when I first started hiring, a friend told me about this and I
didn't believe him, but I tried it anyway. I was astounded how many people
were _completely bluffing_. It helps expedite the whole process.

------
lkrubner
This comment was very good:

"This is exactly why reliance upon ORMs has had a huge negative impact on
engineering. Most of these are easily solved with Group By, Having, and/or
other aggregate functions, but the ORMs have created this veil of complexity."

If ORMs really simplified the underlying complexity, so I didn't have to think
about it, then ORMs might be worth it, but I have never worked on a large
project where, at some point, I was wholly free of the underlying technology.
If its a project that I work on for a year or more, there is always some
moment when I need to drop down to SQL.

------
300bps
This is much more useful than the typical question I've received at some
companies - "On a scale of 1 to 10, how would you rate your SQL knowledge?"

~~~
mgkimsal
I've had this and have answered something like:

"Compared to most other devs I work with, 7 or sometimes 8. Compared to people
who do SQL for a living, possibly a 4, or a 5 if I'm feeling cocky. It all
depends on what the '10' really represents - the best of the best, or the best
of the people I'd be working with."

Well, something like that. That last bit - I've said it more tactfully in the
past.

~~~
jfb
This is the only way to answer this question.

~~~
mgkimsal
Thanks :)

Said wrongly, it implies that the company has crap people working at it, and
that's generally not a way to get hired. But most companies know they're not
getting the 'best of the best' when they hire - they're getting the best they
can afford in their geographic area during a certain time frame.

------
stiff
Somewhat surprisingly most web developers I know know very little SQL, having
picked it up exclusively by tinkering to get things done any way whatsoever,
even if clumsy or slow. In fact SQL might look deceptively simple at times, at
one point I read an ANSI SQL book so I already had some formal education in
SQL when I started doing webdev, but I only really learnt ANSI SQL at the
university in the databases course, and then I still had to do more learning
about many details of my DB server of choice (postgres), including things like
spatial queries and indexes, full-text search etc., you can get huge speed ups
and infrastructure simplifications by putting those kinds of things directly
in the DB.

Ask people about difference between LEFT JOIN and RIGHT JOIN, or using the
schema from the article, to select all attributes of employees with the
highest salary in their department in pure SQL and you will see how much or
how little people know, in fact many webdevs don't even understand JOINs at
all!

------
arry
Which book would you recommend for learning this stuff? I'm not very
interested in 800-p. gorillas; there surely must be something short, not too
theoretical, and to the point.

~~~
poxrud
I enjoyed SQL Antipatterns by Bill Karwin. Very easy to read and offers some
practical approaches to common issues.

~~~
qohen
Bill Karwin's "SQL Anti-Patterns Strike Back" presentation on Slideshare.com
presentation is worth checking out -- it's 250 slides long, covers 4 kinds of
anti-patterns (in queries, DB creation and the design of both DBs and
applications). And he goes through actual code examples.

Check it out here:

[http://www.slideshare.net/billkarwin/sql-antipatterns-
strike...](http://www.slideshare.net/billkarwin/sql-antipatterns-strike-back)

------
brixon
My first weed out question is asking them to describe a Left Outer Join. They
don't have to get it exactly right, I just want to see if they ever did
anything more than a two table inner join.

For a Web Developer the first weed out question is to tell me the difference
between a GET and an POST. Here all I really want then to know is that a GET
is what generally see in the URL and a POST is commonly what you see in HTML
Forms. I want to see if all they ever did was ASP.NET WYSIWYG web development
or if they actually know something about the internet.

These two questions can be done in a phone screen. The faster that you can
weed out people the cheaper the hiring process is.

~~~
jasonkester
Careful using that terminology. You'll get false negatives on people who still
think in terms of *= syntax, and will tell you that the words "left", "outer",
and "join" are all redundant and probably don't belong in SQL in the first
place.

I used to be one of those guys, but I'm much less grumpy about it these days
so I'd still pass your test. I have a sad suspicion that I'm on the
progressive end of the spectrum when it comes to guys who deeply understand
SQL.

~~~
ajuc
Yep, I've done a lot of PL/SQL programming, and we've always used (+)= syntax
and joins with all the tables after commas and all the joining conditions
after WHERE.

Now I work on different project and we use join syntax, but I could easily
imagine people that do joins all day, and not know JOIN .. ON .. syntax.

~~~
jacques_chester
I came to Oracle after it adopted the ANSI syntax, so that's what I use. So my
experience is the opposite of yours -- when I see the (+) I need to look up
the syntax to remember if it's left or right outer.

~~~
jfb
And then I ask, hey, where's my BOOLEAN? And then I drink.

~~~
jacques_chester
Oh god.

And the lack of a serial/autoincrement/identity type.

So. Many. Effing. Triggers.

And 32-character identifiers.

 _sigh_

~~~
tmzt
It's so much better to use a database that only allows one autoincrementing
value per record, or one TIMESTAMP and then only allows you to have either a
create timestamp or an update timestamp without writing a trigger.

I prefer the way Oracle does it, you may have to do more work but it's more
explicit and flexible that way.

~~~
jacques_chester
I don't. I prefer for the common case to be correctly and automatically
handled for me.

------
xanadohnt
If the position you're filling is directly dependent on more-than-average SQL
experience - creating a DB driver, an ORM, for ex. - then SQL-specific
questions are applicable. But, by and large, this type of specific-knowledge
testing is not very useful. I want to see a developer's general abilities at
problem solving and the source code to back it up. If you have solved complex
problems in C# - and can prove it - then you certainly as hell can solve
complex problems in Go despite not having any experience there yet. Sure, if
I'm trying to fill a Go position and someone has proof they're an excellent
developer _and_ it's in Go then they'll get top consideration.

Being able to write SQL queries from memory has little correlation to a
candidate's level of ability. Personally I consider myself a fairly strong
developer and it hasn't been only until the last year that I can now write
pretty complex joins from memory. And I've been developing for 20 years. Only
because of a recent project and the volume of queries I had to write did my
method change from using a graphical query writer to simply memorizing the
syntax I need. Indeed, this very type of adaptation is something I look for in
candidates.

~~~
pessimizer
These are _very_ simple, though. If you say that you know SQL (or have known
SQL, but are a bit rusty) and you can't at least make a strong showing on
these questions, you are lying.

This is not complex problem solving, this is problem solving. I'm not sure
what memorization has to do with any of this. SQL is a language with only
_maybe_ 10-15 important words. It's not like trying to get around Paris
without a phrasebook as a non-French speaker.

I've found that using graphical query writers (I only know of the one in
access) and ORMs are generally _harder_ than writing the SQL. ORMs are good
for keeping code db agnostic, though.

~~~
xanadohnt
It's called Query by Example and there are quite a few DB management GUIs that
have it. I found it pretty intuitive for setting up joins by right-clicking
join lines and setting the particular properties of that join. Of course now
that I have committed all types of joins to memory it's far faster to write
them simply by hand. But my lacking that memorization 10 months ago wasn't
really insite into my ability (or lack thereof) as a developer.

------
VLM
I like the little schema, it flows right into more advanced discussion about
how you'd deploy indexes based on the design and queries, how you'd expand the
schema in normalized form into supporting an office building seating
assignment for each employee, or even multi-sites for employees using a many-
many table.

One thing I didn't get was one comment on the article that a guy could
struggle thru this with phpmyadmin but not at the console. Maybe he was
kidding or trolling. I recently install phpmyadmin to fool around and I can't
imagine talking about using it, you'd have to click like fifty thousand times
just to implement just a simple query and it would probably take 15 minutes,
yet not reduce the cognitive load at all. How do you talk about GUIs in an
interview? "Click on the icon of the fornicating centipedes, then on the
cthulhu icon, then in the ribbon select the turtle crossing street sign" It
makes talking about regex's seem humane in comparison.

~~~
tmzt
I've switched to Chive DB (chive-project.com) precisely to get away from the
inanities of the PHPMyAdmin interface, it's much easier to just enter the SQL.
(Working on a Chromebook so not using a native application for this.)

------
kamaal
There is a very simple way of testing SQL knowledge. And you don't need any of
this online tests or white board programming stuff.

Build your self a small sqlite database. Nothing much, but sufficient enough
to test the candidates ability write queries. Give him a manual. No internet
connection and now give him problems(a few select queries, joins, inserts and
may be a few tests here and there to test how good the guy is in schema
design). If the guy can write queries after reading the documentation, then
hire him.

If he can't write queries, I mean practically on the computer and show you
results he is not of much use. Even if he can answer all your white board
answers.

This is applicable to any programming interview. If a person can read
documentation well and find his way to write a program to solve a problem such
a person makes a good hire.

------
acjohnson55
I could nail those with the Django ORM, but I'd struggle to write
syntactically correct SQL, not having done it in a while. But it says that
your test machine has MS-SQL; with the machine in front of me, I could
probably puzzle it out the join quirks with a couple minutes of trial and
error.

~~~
tomp
How would you solve the first one using Django ORM?

~~~
dalore
Employee.objects.filter(boss__salary__lte=F('salary'))

Find me employee objects which have a boss salary less than or equal to the
salary.

~~~
PeterisP
Out of curiosity, would the ORM map to the same SQL query? Or would it request
all employee-boss pairs and filter them outside of the DB?

There's a huge performance difference involved.

~~~
freework
Potentially huge performance difference.

For the 99% use case, the performance hit fo the ORM is not significant enough
to matter. Most projects have many tables, but only one table that actually
needs to have any speed optimizations. That one table can go in NoSQL and the
rest can be handled by a ORM.

------
pramodliv1
Nice set of questions.

I would also use some sample real world data to check if my queries scale well
instead of inserting 10 rows and running all queries on it.

[http://stackoverflow.com/questions/57068/good-databases-
with...](http://stackoverflow.com/questions/57068/good-databases-with-sample-
data)

------
jcampbell1
What is the preferred way to aggregate with nulls?

    
    
        SELECT  Departments.name, SUM(COALESCE(salary,0))
        FROM Departments 
        LEFT JOIN employees USING departmentID
        GROUP BY 1
    

The above is how I would solve the last one, but I often feel like I abuse
COALESCE.

~~~
zrail
Aggregates generally do the right thing with null without the coalesce.

~~~
jcampbell1
Thanks. It appears I need to stop overusing coalesce. I was told that sql NULL
means "A value that is not yet known", which nicely explains why 1+NULL, 1 <
NULL, 1 > NULL, 1 = NULL is always NULL. Now I know that AVG(test_scores)
produces the average of the known values automatically.

\- - -

I just did a test, and it appears the COALESCE _is_ needed in this case.
Running an aggregate where all values are null, results in NULL (the empty
department). You need to do something because the total salary of an empty
department is known to be zero.

------
mhd
A sample SQL file with dummy data would be much appreciated, I think. I guess
some of us would like to try their luck.

At least this should be quicker in an interview situation than the usual
"normalize this database" or "given this situation, define a database schema".

~~~
arb99
I made the tables quick to go through them. It could do with more data really
but its got a few rows.

<https://gist.github.com/abkr/5662615>

~~~
petepete
I had the same idea, but in an SQL Fiddle

<http://sqlfiddle.com/#!1/778bb/5>

~~~
mhd
Didn't even know that there was such a thing as SQL Fiddle. That's going to
come in handy, beyond this quiz, thanks.

------
numbsafari
I like this set of questions and the simplicity of it. I think I would only
add one or two simple questions about INSERT, UPDATE and DELETE statements.

Writing a bad SELECT doesn't tend to have the same ramifications as an
incorrect DELETE or UPDATE statement.

------
nahname
Anyone else bothered when primary keys are not given the name 'ID'?

~~~
jfb
I'm bothered by primary keys full stop.

~~~
nahname
What's wrong with identity?

~~~
VLM
A concrete example is some folks like the abstraction of a row as your primary
"thing" and some folks like the abstraction that the data defines a row and
rows don't really exist just the data.

Consider the hated multiple primary key situation where you've got a
autoincrementing prikey and a "real" key where you make an unique index off
"full name" or something. So which is the real conceptual primary key?
Shouldn't you use the full name as the "primary key"?

Problem: What if the business logic of what a distinct user is changes from
unique "full name" to unique "full name" and "telephone number". Whoops now
all your foreign keys need messing with, its just a bad scene. Ditto schema
changes like you finally change from ascii to utf8 or something, now all your
foreign keys need changing (well thats maybe a bad example unless your ascii
datatype enforces 7 bits or you're running into byte length vs character
length limits...) Or you change the length, which changes the truncation
perhaps, which changes your foreign keys. Also you can't just use a rule like
all foreign keys are BIGINT now some are CHAR(20) some are FLOAT who knows.

On the other hand lets say you implement just a prikey. Now you can have
multiple rows with the same data, because you never set up a UNIQUE INDEX.

Generally speaking if you KNOW absolutely KNOW that your schema will never
change, you should probably optimize it to not have multiple keys aka a
primary key and unique indexes, or data definition will never change. Very few
people can guarantee it so they're better off in the real world with imaginary
prikeys.

You can read a lot more about this in "SQL antipatterns" I think chapter 4 or
so, but always keep in mind that beyond noob level of being able to define the
overall issue, short term snapshots will occasionally (but not always)
conflict with longer term thinking.

~~~
dragonwriter
> Consider the hated multiple primary key situation where you've got a
> autoincrementing prikey and a "real" key where you make an unique index off
> "full name" or something. So which is the real conceptual primary key?

If the full name is a _real conceptual primary key_ , you shouldn't have
introduced an autoincrement key. If the uniqueness of the fullname is a
business rule but not a real conceptual restriction (a distinction which can
be hard to make, to be sure), then it makes sense to create the autoincrement
key -- and it is the only real primary key. (That is, the autoincrement key
represents the concept of identity which isn't present in any of the other
data.)

> On the other hand lets say you implement just a prikey. Now you can have
> multiple rows with the same data, because you never set up a UNIQUE INDEX.

No, you can't, because the "prikey", as you call it, _is_ data, and has
meaning -- specifically, it represents identity -- so rows which differ in it
do _not_ have "the same data".

~~~
VLM
The business concept of the "real conceptual primary key" can change, perhaps
dramatically, over time, as the business model changes. A real prikey never
changes.

~~~
dragonwriter
> The business concept of the "real conceptual primary key" can change,
> perhaps dramatically, over time, as the business model changes. A real
> prikey never changes.

This confuses two different concepts:

If the conceptual model changes, then, yes, the candidate keys (including
primary keys) of entities may change between the old model and the new model.
This can be a pragmatic difficulty in migrating between different conceptual
models, but that's a problem inherent in different conceptual models.

The _value_ of a well-chosen primary key of an entity within any given model
should not change, as the primary key should always be a value which
_identifies_ the entity such that a different primary key means a different
identity.

------
lucb1e
Liked them; not too hard but also not too easy. I'd have succeeded on the
interview if I had been given the chance to test them (and if I wasn't too
nervous about it I guess). Never had an interview with technical questions
like this before; are you commonly given a chance to test them?

My database and answers dump (Warning: spoilers!)
<http://pastebin.com/HGBpemHn>

------
slc
The second question is actually trickier than one might think.

The obvious answer - something like

select Name, MAX(Salary) from Employees group by DepartmentId

is wrong.

~~~
twistedpair
Call me foolish, but what about the following makes it undesirable? The
question didn't ask about a null case of a department with no employees.

\-- List employees who have the biggest salary in their departments SELECT
em.EmployeeID, em.departmentId, MAX(salary) as salary FROM employees em GROUP
BY em.departmentId

~~~
slc
In your example, for each row of the result set

* "em.departmentId" will contain one of the distinct values from the "departmentId" column

* "salary" will contain the maximum value of the "salary" column of the table rows whose "departmentId" equals "em.departmentId" of the given result set row.

* "em.EmployeeID" will contain the value of the "EmployeeID" column of one the table rows, whose "departmentID" equals "em.departmentId" of the given result set row, but it is UNDEFINED which one. It IS NOT quaranteed to be the one whose "salary" column equals "MAX(salary)".

See here for examples of how to achieve what is actually needed:
[http://dev.mysql.com/doc/refman/5.0/en/example-maximum-
colum...](http://dev.mysql.com/doc/refman/5.0/en/example-maximum-column-group-
row.html)

As I said, tricky, and, judging from the difficulty level of the other
questions, I suspect that the authors of the article have fallen for it
themselves.

~~~
twistedpair
Thanks for the clarification. 5/6 and dunce hat for me :)

------
pikewood
One initial step I use is to give a description of the problem, then have them
actually design the table structure themselves. I then have a printed copy of
the structure along with some sample data ready, so that I can refer to
specific values (and also to help them visualize the contents).

------
codeka
I find it kind of interesting that the first comment on the article says they
wouldn't be able to answer the questions by they use an ORM. I'm not really a
fan of ORMs, personally, because I don't think it's useful to try and "map" a
relational model onto an object-oriented one.

~~~
lucian1900
There are some really good SQL libraries out there that make it easy to
compose SQL and some of them also include an ORM built on top of the basic
abstraction.

SQLAlchemy is a great example of this, I always get precisely the SQL query I
would've written myself, except it's syntactically correct, easy to compose
and has a chance of being portable.

------
EGreg
It's an interesting debate. While I also feel that developers should know the
underlying SQL, however all that stuff like joins, indexes etc. are actually
very hard to scale beyond one machine. MySQL cluster does attempt to do it
automatically, but even it has limits, and places most stuff in memory. In
short, if I was looking for developers to do sharding, I would actually prefer
to AVOID queries with joins, non-pk lookups etc.

Having said that, I have discovered a heuristic over the years: that if you
are using an ORM, you probably don't want a relational database. You should
learn something like Riak and let it handle the distribution and provide all
the partitioning and availability for you. The CAP theorem shows that you
can't get it all, and most likely you want to use one of those data stores
instead of a relational one.

For regular sites that won't have millions of users constantly using it,
though, a relational db is fine.

~~~
acdha
“that if you are using an ORM, you probably don't want a relational database.
You should learn something like Riak and let it handle the distribution and
provide all the partitioning and availability for you”

These are not the same concept: a relational database makes sense when your
application relies on relations between records. If you need to do lots of
joins across many records, Riak is going to perform horribly because it's
designed for a different problem.

CAP says nothing whatsoever about whether you want a relational or non-
relational database, merely what tradeoffs you'll have to make to satisfy your
business needs.

Using an ORM doesn't factor into this discussion at all other than for
providing a convenient place to implement whatever system you devise to meet
those needs.

~~~
EGreg
As I said, it's a heuristic. If you find that you are telling your developers
to use your ORM, then you probably should have gone with a NoSQL database like
Riak. You can still do joins, etc. but it's in the context of things like map-
reduce, and it makes sure that you can scale despite the joins.

MySQL way: SELECT * FROM a JOIN b ON x WHERE y

NoSQL way: 1) SELECT * FROM a WHERE x 2) Perform join in app layer or stored
procedure.

Like it or not, when you scale you will lose one of the CAP, and NoSQL
databases do the hard task of delivering an eventually consistent data store
to you and letting you express yourself in the RIGHT context, which is not
SQL.

~~~
acdha
You're still conflating different parts of the stack: an ORM has nothing to do
with CAP.

> You can still do joins, etc. but it's in the context of things like map-
> reduce, and it makes sure that you can scale despite the joins.

Either of your examples are commonly implemented in SQL databases, too: this
is a routine MySQL optimization to avoid subselects and, amusingly, one which
an ORM makes significantly easier to implement:

SELECT * FROM a WHERE x; SELECT * FROM b WHERE pk IN (…list of IDs from first
query…);

Again, the SQL vs. NoSQL question is about your data model and access
patterns, not whether you use an ORM or magical thinking about CAP. The line
between the two has become quite blurry since there are things like MySQL-
backed key-value stores or Postgres extensions which allow it to handle
document-store workloads without losing performance or giving up the ability
to do flexible queries. This isn't a question of religion: it's just looking
at your business, assessing how well you know the access patterns (SQL systems
are generally more flexible) and performance requirements and picking the best
solution. Anyone claiming to have a right answer for everyone is wrong.

> Like it or not, when you scale you will lose one of the CAP, and NoSQL
> databases do the hard task of delivering an eventually consistent data store
> to you and letting you express yourself in the RIGHT context, which is not
> SQL.

You've now gone from wrong to very dangerously wrong: there is no scale which
is immune to CAP and NoSQL has no magic for avoiding this. Eventual
consistency is only appropriate for some problems and, as above, can be
implemented on either system. No matter what storage system you choose you're
still going to have to make careful decisions about business priorities and
test carefully.

------
yread
Much better than this question i got asked once:

"Which is faster: select from a table or from a view?"

~~~
clubhi
I don't see why that is a bad question. A reasonable answer is selecting from
a table. Of course it depends on many factors. I often ask questions like this
just to get the candidate to tell me why there is not an absolute answer.

~~~
chris_wot
Unless, of course, the view is a materialized view.

~~~
jfb
If a candidate can talk intelligently about materialized views, I think we're
past the "explain HAVING" stage of the interview.

------
wambotron
This is the perfect interview test. It's easy enough that a candidate can roll
through it relatively quickly, but deep enough to prove they have the
experience they claim to have. Kudos on having a smart interview test!

------
jcampbell1
It would be really interesting to see solutions for MongoDB. These questions
are designed to be easily solvable with SQL, but it would be interesting to
see how this can be solved with a totally different technology.

------
free
They seem to be quite simple. For what profiles are you asking these
questions?. I have used very similar questions for dev ops role.

~~~
arb99
Yeah they seem quite obvious really. Serious question: is this really the
types of questions asked for a dev job interview? (was still interesting to
see.)

>List all departments along with the number of people there (tricky - people
often do an "inner join" leaving out empty departments)

inner join seems the non obvious way to do it really IMO.

    
    
        select 
        departments.name as "department name",  
        (select sum(salary) from employees where employees.departmentid = departments.departmentid) as "department total salary" 
        from departments

~~~
jacques_chester
The empty department remark is the clue. They want all departments; if it's an
empty department they still want to see that it's an empty department.

An inner join will hide that row because there's no equality between a set
(departments) and an empty set (employees in that dept, of which there _are
none_ ). A correctly structured outer join will.

------
pessimizer
Coincidentally, this is also the list of questions that I need to ask anyone
who is recommending a particular NoSQL solution.

------
hgezim
Here's a SQLFiddle to try the questions out :)

<http://sqlfiddle.com/#!2/9a84e>

~~~
_pmf_
That's perfect!

------
jrockway
Plural table names? <irritated hipster sigh>

------
geekymartian
wait, they have somebody doing only SQL ?

~~~
brixon
There is someone that works indirectly for me and that is all they do. A lot
of data warehouse and DTL stuff, so SQL is their life.

------
LekkoscPiwa
<Blunt> I'm not a SQL Developer, but anytime I get questions for which Google
has answers to be found in 5 minutes or less, I'm quite hesitant to work
there.

I respect interviews that go along the lines of: "What would you do if..."
after giving a detailed description of their environment. But then again I'm a
tools/OSes admin, so maybe it makes more sense for my job description. But
anytime guys are too focused on third option you can give to the 'ls' command
and don't ask real world questions related to potential or existing issues
they have/had, I'm not really interested. I can use google, you know, if that
answers your question.

I like questions where the interviewer can see years of experience not the
amount of detail memorized from a manual.

I would imagine questions that are good to start with: how you as a developer
usually start a design of a database? How do you plan it? I got this question
once, it's really good. Can't answer that after reading sql book two days
earlier. In contrast to his question. </Blunt>

~~~
jbverschoor
I don't run a school. Anytime a web-developer does not know basic sql or other
basic knowledge, I refuse to pay salary, and even deduct the 1$ per minute.
That five minutes just cost you $10

~~~
unethical_ban
Not a big fan of on-the-job training, then?

Instead of paying a nominal price for people relearning some primitives for a
specialized area of IT/software dev, nowadays most companies want the new kids
to go to $80,000 worth of college for the same. That way, the employer gets to
skip training, and the student is less likely to leave because he or she has
too much damned debt to be mobile.

Maybe I'm projecting.

~~~
jbverschoor
Actually I am a big fan of it.

But I'm not a fan of people stating that see these kinds of tests as an
attack, or see it as something which is unnecessary. There are a lot of people
out there are unable to answer these questions, even with "google". And most
of them are either unwilling or unable to improve themselves.

I'd rather have a very eager "junior" who loves what he does, and works hard
to understand and develop him/herself than someone "senior" who knows a couple
of tricks and is too arrogant to do these kinds of tests.

You keep people if: * they can improve / if they learn * if the colleagues are
nice * and the product is interesting, aka they have meaning * management is
done reasonably * there's a future for them

this is why startups are popular (learn lots of things, meaningful, good
upside) as well as corporates (carreer path is flexible, good management,
etc).

