
The software development final exam: Algorithms and Data Structures - cperciva
http://www.daemonology.net/blog/2012-10-08-software-development-final-exam-part-1.html
======
mechanical_fish
_If you can't answer the majority of the questions...you're lucky enough to be
working within a narrow area where your deficit doesn't matter._

Where by "narrow" one presumably means "narrow in theoretical scope, but
extremely large in terms of number of people, number of customers, number of
paid hours spent working on things, and amount of impact on the world". Most
of the web was built by people who don't know what B-trees are.

How do these people succeed as much as they do? Because the folks who designed
PostgreSQL _do_ understand what B-trees are, and they have been very
successful at their goal: To encapsulate that knowledge in an abstracted
system that other people can learn to drive without entirely understanding how
it works.

This is not a problem with the web developers, nor is it a problem with this
exam, which at first glance looks reasonable – people with CS degrees should
be able to answer these questions. It is, perhaps, a problem with the concept
of "software developer": It's being asked to cover too big a range. If the
automotive industry were like computing, we'd use the phrase "auto mechanic"
to refer to an amateur with a Haynes guide, a Midas Muffler employee, a member
of a NASCAR pit crew, a Formula One driver, a mechanical engineer at Tesla
Motors, an organic chemist working at a tire company, and Kiichiro Toyoda.

~~~
jrabone
_Most of the web was built by people who don't know what B-trees are._

By number of sites, perhaps. By revenue, I don't think so. The big names that
dominated what the web is to most people certainly do know this stuff, at
least insofar as the practical implications, and they try not to hire people
that don't for development roles. That isn't to say that they don't also just
use Postgres (or Berkeley DB, or whatever) instead of writing their own.

 _If the automotive industry were like computing, we'd use the phrase "auto
mechanic" to refer to..._

I totally agree with this - the field could use a few more job titles (and a
little less grade inflation - SENIOR software engineer on 2 years experience?
Really?) Software Fitter would be a much better description for the skilled
assembly role that a lot of modern software boils down to. CRUD Technician is
going to need some PR work though!

~~~
kamaal
>>By number of sites, perhaps. By revenue, I don't think so.

>>Software Fitter would be a much better description for the skilled assembly
role that a lot of modern software boils down to. CRUD Technician is going to
need some PR work though!

I don't know why intelligent people with a little knowledge automatically
think they are going to be rich. This causes a lot of heart burn and pain to
people in later parts of their lives when they figure out things don't work
that way in the real world.

Regardless of whatever you may think about them (you may even think them
undeserving of all the money they earn, or think they are stupid), the fact
is that abstraction enables people to work with a lot of things without
knowing much about them in detail. In fact, abstractions are invented for that
very purpose: so that a million people don't have to waste their time, energy
and effort relearning things, when their resources can be better used learning
just the abstract details and building on top of them. This enables people to
build things quickly on what is already available.

Revenues, money et al are dictated by what the world considers valuable at any
time. You might know 1000 algorithms from the book, you might have studied
every other data structure that's out there. But if knowing how to write a SQL
query is what is valuable in the economy right now, guys doing that are likely
to be paid more than you.

This isn't surprising at all. As developers we enjoy a unique position in the
market. As for the money we get, compare it with any other industry: our
peers working there get paid far less. How is it possible that people who
work at the same level of difficulty as we do get paid less than us? Because
merely knowing things, and the difficulty of the tasks, don't qualify you for
better compensation.

------
js2
I have a CS degree from a respectable CS department. I got straight A's in my
major and never crammed for a CS test. I am sure I could have answered these
questions in 1997. Today, I can answer the first two questions. I think I can
get partial credit on the third. I believe I knew the fourth once. I don't
even remember what bipartite means [see edit below]. And that's with having
implemented a topological sort within the last few years.

I would like to believe that if I came across a problem where knowing this
material would help solve the problem, some part of my brain would activate
and put me in the position of "Oh, I know I don't know this anymore" and I'd
go look it up. Vs "not knowing what I don't know" and just bumbling along
ignorantly.

I guess this is a long-winded way of saying I'm glad to have a CS degree and I
think it puts me a leg up over software developers who haven't studied CS, but
I'm still not sure how applicable these questions are to most developers.

Edit: On most exams I've taken, the fifth question would have been asked like
this: _A bipartite graph is a graph whose vertices can be divided into two
disjoint sets U and V such that every edge connects a vertex in U to one in V;
i.e. U and V are each independent sets. Equivalently, a bipartite graph is a
graph that does not contain any odd-length cycles. Describe an algorithm that
determines if a graph is bipartite._
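
For readers who want a refresher, here is a minimal sketch of one standard answer: two-colour the graph with a breadth-first search and fail if an edge ever joins two vertices of the same colour. The function name `is_bipartite` and the adjacency-dict input format are assumptions made for illustration; nothing here comes from the exam itself.

```python
from collections import deque


def is_bipartite(adj):
    """Return True if the undirected graph can be two-coloured.

    adj is assumed to be a dict mapping each vertex to an iterable of its
    neighbours (a hypothetical input format chosen for this sketch).
    """
    color = {}
    for start in adj:
        if start in color:
            continue  # already handled as part of an earlier component
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in color:
                    color[v] = 1 - color[u]  # neighbours get the other colour
                    queue.append(v)
                elif color[v] == color[u]:
                    return False  # same colour on both ends: odd cycle
    return True


square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(is_bipartite(square), is_bipartite(triangle))
```

Each vertex and edge is visited a constant number of times, so the check runs in time linear in the size of the graph.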

~~~
rhizome
Frankly, asking the question either way is what I would call prejudicial. What
is a scenario in which bipartite graphs occur, and why not ask how _that_
would be dealt with?

~~~
Locke1689
Any n-dimensional grid (sometimes called a "Manhattan space") is a bipartite
graph. You may find it useful that no odd-length cycles can exist in such a
graph.

~~~
rhizome
What is a practical scenario for odd-length cycles in a grid?

~~~
Locke1689
Ah, sorry, the odd-length cycle was supposed to be a hint for Colin's problem,
not an actual use of bipartite graphs.

I find all the uses of bipartite graphs to be in maximal matching or similar
situations.

~~~
rhizome
Still..."scenario."

~~~
Locke1689
There are many reasons why you would want a maximal matching. In my case I was
working on a distributed system prototype that needed to match block requests
from my users to cache servers. This is a maximal matching from cache servers
that have these blocks available to users requesting blocks.
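
A rough sketch of that kind of matching, using the classic augmenting-path method (often called Kuhn's algorithm), which finds a maximum matching in a bipartite graph. The names `can_serve` and `max_bipartite_matching`, and the simplification that each cache server handles one request at a time, are assumptions for illustration and not details of the prototype described above.

```python
def max_bipartite_matching(can_serve):
    """Match users to servers so that as many users as possible are served.

    can_serve is assumed to map each user to the list of servers that hold
    the block that user wants. Returns a dict mapping server -> user.
    Assumption for this sketch: one request per server at a time.
    """
    match = {}  # server -> user currently assigned to it

    def try_assign(user, seen):
        # Look for a free server, or one whose current user can be moved.
        for server in can_serve[user]:
            if server in seen:
                continue
            seen.add(server)
            if server not in match or try_assign(match[server], seen):
                match[server] = user
                return True
        return False

    for user in can_serve:
        try_assign(user, set())
    return match


requests = {"u1": ["a", "b"], "u2": ["a"], "u3": ["b", "c"]}
print(max_bipartite_matching(requests))
```

Here `u1` is first given server `a`, then moved to `b` so that `u2`, who can only use `a`, is still served.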

------
amix
While I can answer some of these questions (and could answer them all a few
years ago while attending university) this kind of knowledge is simply not
something that I use on a daily basis. I also doubt that many other
developers use it unless they have very specialized jobs.

These kinds of questions should not be the only basis of the software
development final exam. In software development, problem solving, communication
with others, knowledge of the tools etc. are much more important than knowing
random facts about algorithms. And these skills are much harder to test
than asking the worst-case run time of quicksort.

~~~
topbanana
Yes, this is CompSci, not software engineering

~~~
peteretep
Then perhaps the author shouldn't be presenting them as practical issues, and
claim that people who don't know the answers either shouldn't be programmers,
or work in some tiny niche. You read the context article he linked to, right?

~~~
mquander
Just to get the facts straight, the linked article had this to say:

 _If you can't answer the majority of the questions on these four papers, and
you're working or intend to work as a software developer, you should ask
yourself why — most likely you're either you're missing something you really
should know, or you're lucky enough to be working within a narrow area where
your deficit doesn't matter._

The reader can judge for themselves whether this is claiming that "people who
don't know the answers shouldn't be programmers." I personally find that
characterization rather rude toward the author.

~~~
clinth
> or you're lucky enough to be working within a narrow area where your deficit
> doesn't matter

This implies that non-algorithmic work is a narrow area. In my experience,
it's the _vast_ majority.

~~~
hkarthik
I agree, non-algorithmic work is the vast majority of programming work today
because most of it is related to Application Programming.

The reason for this is that most Application Programming emphasizes the use
and sufficient knowledge of a framework, rather than a full understanding of
CS fundamentals.

While those with CS have a head start in understanding frameworks quickly and
utilizing them effectively, self-taught programmers can catch up rather
quickly if the framework is well written and geared towards them. And the vast
majority of frameworks trend towards this.

Application programmers generally need to know the ins and outs of a framework
and this knowledge often carries them further than the fundamentals.

All that being said, after 10 years of application programming, I'm slowly
trying to re-learn much of my CS material out of my own personal interest and
get off the framework treadmill. It's been a fun process to rediscover
fundamentals after learning them over 10 years ago.

But let's be realistic about it, most app programmers have no interest in CS
fundamentals or theory unless it has some direct context to their daily work.

~~~
rhizome
Just as an anecdote from a self-taught developer: a few years ago I started in
on GoF and got about 100 pages in before I came to the conclusion that a lot of
what I'd read so far and was attempting to incorporate into my knowledge base
is already a part of the languages I'm currently using.

I know there's a use for these details, I want to learn them, but I suspect
that many times the direct context is built-in, which is pretty much what you
say about learning frameworks, which may be possible to consider as dialects
of a language. In fact, by Googling I see that people have written about
language subsets in the context of Lisp, so full-circle I go.

------
modernerd
_"If you can't answer the majority of the questions on these four papers, and
you're working or intend to work as a software developer, you should ask
yourself why — most likely you're either you're missing something you really
should know, or you're lucky enough to be working within a narrow area where
your deficit doesn't matter."_ [1]

I'm a self-taught front end web developer who didn't have a traditional
computer science education. I'm looking to move into general software
development, and I can't answer any of the questions on the first paper
without cheating.

Can anyone recommend specific online courses or books for the four proposed
papers[2] to fill the gaps in my knowledge without spending three years at
Oxford?

[1]: From the original post at
<http://www.daemonology.net/blog/2012-10-08-software-development-final-exam.html>

[2]:

    
    
        1. Algorithms and Data Structures
        2. Computer Architecture and Operating Systems
        3. Mathematics
        4. Networking and Systems

~~~
cperciva
I haven't looked to see if every question is covered by these, but here are
some standard textbooks I like:

Algorithms and Data Structures:

<http://www.amazon.ca/Introduction-Algorithms-Thomas-H-Cormen/dp/0262033844>

<http://www.amazon.ca/Algorithms-4th-Robert-Sedgewick/dp/032157351X>

Computer Architecture and Operating Systems:

<http://www.amazon.ca/Computer-Architecture-Quantitative-John-Hennessy/dp/012383872X>

<http://www.amazon.ca/Design-Implementation-FreeBSD-Operating-System/dp/0201702452>

Mathematics:

<http://www.amazon.ca/Calculus-James-Stewart/dp/0495011606>

<http://www.amazon.ca/Introductory-Statistics-Prem-S-Mann/dp/0470444665>

<http://www.amazon.ca/Introduction-Mathematical-Cryptography-Jeffrey-Hoffstein/dp/0387779930>

Networking and Systems:

<http://www.amazon.ca/Computer-Networks-5th-Andrew-Tanenbaum/dp/0132126958>

<http://www.amazon.ca/Database-Management-Systems-Raghu-Ramakrishnan/dp/0072465638>

<http://www.amazon.ca/Distributed-Systems-Principles-Andrew-Tanenbaum/dp/0130888931>

~~~
modernerd
Thank you. I'll get reading!

Thanks also for proposing the test and posting the papers online. It's hard to
discover what level of knowledge is assumed of software developers if you've
never worked in a large team and haven't entered the field via graduate or
postgraduate study.

~~~
cperciva
Yes, although I didn't get around to mentioning it in my introductory blog
post, "self-taught" developers are part of why I'm doing this.

------
SCdF
I'm not going to put judgement on this statement, but here is (what I'm fairly
sure is) the truth: most people developing software (we're talking not in
Silicon Valley, or in the United States, but globally) need exactly none of
this.

My current day job is working for the company that maintains and manages NZ's
Company registry, as well as a dozen or so other registries over varying
subjects.

My previous day jobs were in healthcare and various genres of insurance.

I have a BSc in CS. I have used, and this is the salient bit, _absolutely
nothing of this complexity in my day jobs_. Ever. (personal projects are
another matter, but I don't get paid for those)

Briefly:

\- Big O notation has never been relevant: performance is always improved by
doing less IO, rethinking data structures or more complicated SQL queries,
throwing more metal at it and occasionally actual profiling which finds out
we're doing stupid things (not that those stupid things are ever Big O
related).

\- Quicksort, who cares? I just do sorts in SQL or run Java's .sort() command
(which does QS anyway), see above for perf concerns. I don't have to _know_
about it to use it.

\- Heapsort, who cares? Again, sort performance has never been a concern.

\- Never used graphs, the only "algorithm" I've ever had to professionally
write was a Luhn check and I probably should have used a library for that
anyway.
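
For anyone who hasn't met it, the Luhn check mentioned above is small enough to sketch in a few lines. This is an illustrative version, not the code from that job; the function name `luhn_valid` and the choice to ignore spaces and dashes are assumptions.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum.

    Non-digit characters such as spaces or dashes are ignored, which is an
    assumption of this sketch rather than part of the checksum itself.
    """
    digits = [int(ch) for ch in str(number) if ch in "0123456789"]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


print(luhn_valid("79927398713"))  # the usual textbook example of a valid number
```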

Again, I don't want to say whether or not this is a good or bad reality, but
the point is, the _vast_ majority of people writing code professionally are
basically writing the same app over and over again:

\- Build web page I can CRUD data with

\- Store data from that web form in a database

\- Modulo some bespoke business rules

\- Integrate with some 3rd party systems.

That's it.

~~~
tsycho
I used to think so too, and yet...

\- My current team created a project that lets you run complicated,
multi-dependency Java code in a non-blocking manner. Internally, the framework uses
topological sorting of a graph of nodes, where each node is a method with
potentially blocking code and the dependencies between the nodes are the edges
of the graph. The framework is able to set up callbacks automatically in the
right order given the above.

\- iOS6 auto-layouts internally use constraint solving on a set of linear
equations so that you can write code to set the layout of your UI elements in
terms of equations (so that you can basically just say that your button should
be at 40px left of center of its superview, and with a 100px padding from the
bottom).

\- I wrote code to improve the accuracy of models at my previous job by
employing various machine learning algorithms (thanks Andrew Ng and Coursera).
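
The first example (ordering callbacks by their dependencies) is a textbook use of topological sorting. A minimal sketch using Kahn's algorithm follows; it is written in Python rather than Java for brevity, and the task names and the `deps` format are made up for illustration, not taken from the framework described.

```python
from collections import deque


def topological_order(deps):
    """Return the tasks in an order where every task follows its dependencies.

    deps is assumed to map each task name to the list of tasks it depends on.
    Raises ValueError if the dependencies contain a cycle.
    """
    nodes = set(deps)
    for needs in deps.values():
        nodes.update(needs)

    indegree = {n: 0 for n in nodes}      # unmet dependencies per task
    dependents = {n: [] for n in nodes}   # reverse edges
    for task, needs in deps.items():
        for need in needs:
            indegree[task] += 1
            dependents[need].append(task)

    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for d in dependents[n]:
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)

    if len(order) != len(nodes):
        raise ValueError("dependency cycle")
    return order


deps = {
    "render": ["fetch_user", "fetch_posts"],
    "fetch_posts": ["fetch_user"],
    "fetch_user": [],
}
print(topological_order(deps))
```

A scheduler built on this could start every task in `ready` concurrently instead of popping them one at a time.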

These might be isolated examples, but my point is that some of the best code I
have seen uses a lot of math and CS concepts (sometimes in clever ways).

~~~
rhizome
Forgive me for asking from ignorance, but isn't the first point begging the
question of complicated, multi-dependency code?

------
barrkel
Heap operations? Bipartite graph? Implementing these is _vanishingly_ rare.
I'd wager no more than 0.1% of graduates who've seen these things ever have to
maintain code that implements them, never mind writing them green field. And
that comes from a guy who used to maintain a compiler.

You make a convincing case for your insularity IMHO :)

Now the didactic exercise of implementing a handful of data structures and
algorithms like these from scratch has a lot of value. And knowing they exist,
so you can look them up and find a library / implement if necessary, is also
valuable. But knowing them 10+ years after college? The details are simply
irrelevant; other things are more worth knowing off the top of your head.

~~~
ColinWright
It's not about being able to implement them, it's about understanding the
implications. Yes, I can do arithmetic by reaching for a calculator, but if I
had to use a calculator for every single blesséd piece of arithmetic, I would
find it impossible to do any kind of significant algebra.

Having the basics immediately to hand, without having to look them up, is
needed to move on to the next level. Knowing how to recognize a bipartite
graph means that when there's one around you're likely to think - hmm, I
wonder if that's bipartite? Recognizing the problem to be solved sometimes
requires that you have a deep and fundamental understanding of the types of
solutions that exist.

People seem constantly to say this - I can look it up on Google, why do I need
to know this? I constantly solve problems my colleagues can't (and _vice
versa_ ) because I have a deep familiarity with a range of techniques that
they can look up any time they want, but don't recognize "in the wild."

~~~
barrkel
There is a difference between knowing what a meal tastes like, and how to cook
it.

What you are arguing for is knowledge of what meals taste like (e.g. two
primary taste dimensions being space and time). I think this is really useful;
so I agree with you.

But what Colin (and I mean OP) is asking for is knowledge of how to cook two
specific meals, themselves rarely asked for.

I personally think CS education is _mostly useless_. Big-O notation and a
survey of the field (a taste test of the meals, if you will) are all you
really need bring along with you from discrete math, in my very humble
opinion.

Cooking some of the meals while learning will teach you the general skill of
following a recipe; a few more applied challenges will develop your
improvisation skills. But the more applied it gets, the less CS it is.

The field is too broad, IMHO, for deeper knowledge of only a handful of things
to be very useful. I know a lot about hash tables, for example, but that's
because they were very useful in my job, and critical for performance. It
would be a waste of time for me to know as much as I know about them now,
coming out of college, never mind 10+ years later. I wouldn't come down like a
ton of bricks on a recent graduate for not being able to name 3 different
collision resolution approaches off the top of his head, never mind someone
10+ years out; even for a job that required their use and impromptu
implementation, as mine did.
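
For context on what "collision resolution" refers to, here is a small sketch of two common approaches, separate chaining and linear probing (others, such as double hashing or cuckoo hashing, also exist). Both classes are deliberately simplified: fixed size, no resizing, no deletion. The class names are invented for this example.

```python
class ChainedMap:
    """Separate chaining: each bucket holds a list of (key, value) pairs."""

    def __init__(self, buckets=8):
        self.buckets = [[] for _ in range(buckets)]

    def put(self, key, value):
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite existing key
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self.buckets[hash(key) % len(self.buckets)]:
            if k == key:
                return v
        raise KeyError(key)


class ProbingMap:
    """Open addressing with linear probing: on a collision, try the next slot."""

    def __init__(self, slots=8):
        self.slots = [None] * slots

    def _find(self, key):
        # Index holding `key`, or the first empty slot on its probe path,
        # or None if the table is full and the key is absent.
        n = len(self.slots)
        start = hash(key) % n
        for step in range(n):
            i = (start + step) % n
            if self.slots[i] is None or self.slots[i][0] == key:
                return i
        return None

    def put(self, key, value):
        i = self._find(key)
        if i is None:
            raise RuntimeError("table full (this sketch never resizes)")
        self.slots[i] = (key, value)

    def get(self, key):
        i = self._find(key)
        if i is None or self.slots[i] is None:
            raise KeyError(key)
        return self.slots[i][1]


for cls in (ChainedMap, ProbingMap):
    m = cls()
    m.put(1, "a")
    m.put(9, "b")  # 1 and 9 land in the same slot when there are 8 slots
    print(cls.__name__, m.get(1), m.get(9))
```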

~~~
sukuriant
Have you considered that it's teaching you a way to think outside of your
normal range, so that you can comprehend and imagine more complex designs that
are easier to build and maintain?

------
plinkplonk
This is the interesting bit

"on each paper, at least four of the five questions relates directly to
material I have needed to know during my time working on Tarsnap, "

Most software jobs are far more distant from CS than Colin's. And more power
to him for working fulltime on something that leverages his knowledge. (spoken
as someone who spent a decade doing
glue-random-apis-together-and-bill-by-the-hour enterprise software scut work)

~~~
pjmlp
Yet, at least in the European countries I've lived in, you won't get through
HR without a CS degree, unless you're doing your own business.

~~~
arethuza
I'm in the UK, and I know people who are _excellent_ developers who don't have
degrees of any kind and who have no problems whatsoever getting employed.

Mind you, they have 10+ years of experience and can point to major successful
projects, so whether or not they have a degree is rather irrelevant.

~~~
pjmlp
In Portugal, Switzerland and Germany I never knew of the HR department
accepting any candidate without a degree in CS or a related field (e.g.
Electronics) in the companies I worked for.

~~~
icebraining
I'm employed in Portugal without a (completed) CS degree, and I had more than
one company to choose from. From what I could tell, the interviews were much
more important than any lines on my CV.

~~~
pjmlp
I'm actually Portuguese, and the only people I know who managed to do that were
guys and girls from my degree doing something on the side during the .com days
for some startups, back in the '90s.

Never saw that in the big companies, but it's been several years since I
worked there.

~~~
icebraining
Well, the only "big" company I interviewed for was Sybase, but they were
interested (although, I think they mentioned that my salary would be affected
by that).

I chose a smaller software company, though. I'd much rather earn less but have
a less enterprise-y work environment.

------
andyjohnson0
These questions have very little to do with software development as practiced
by 99% of developers. They relate to computer science, a branch of mathematics
with different concerns.

I've been developing software professionally for 23 years. I have a BSc in
Computer Science [1] and an MSc in a computing-related field. And I have never
needed to be able to answer questions like these to do my job. I have a vague
understanding of big-O notation, and equally vague knowledge of the
characteristics of various algorithms like heapsort, but that's all. If I need
a sort algorithm then the framework or standard library provides a selection,
all written by smarter people than me.

If you are developing a relational database or Google-scale distributed
platform then, yes, you probably need to know this stuff. But most developers
don't. I have to understand how to capture customer requirements, fit them
into systems that already exist, estimate and implement them quickly and
reliably, deal with the practical details of IIS or an RDBMS, maintain code,
etc. I wonder if the author needs to get out more.

[1] Badly named, I learned little real comp sci.

------
jemfinch
I'm seeing a lot of apologists for mediocrity here. They seem to be drawn out
by articles like this one as well as articles about interviewing.

Despite the sentiment here, Colin is right: this is knowledge that developers
ought to have. Not because they need it to do their day job, but because they
_can't avoid learning these topics_ if they truly love the field.

Sure, you can earn a pretty decent paycheck hacking web apps and "enterprise"
software without ever learning any of these things, and you may think, "This
stuff is useless for most everyone." A friend of mine replied to that
sentiment on Twitter: "If you manage a Taco Bell, I'm sure you feel a lot of
the stuff in an MBA is useless for managers too." You can successfully bring
in a paycheck programming without knowledge like this--just like you can
successfully manage a Taco Bell without having an MBA--but you should not
claim to excel in your field in either case.

Call me a snob if you will, but I don't want to work with mediocre programmers
whose knowledge takes them only as far as the nearest web framework any more
than I want to do business with the manager of my local Taco Bell. I want
people with passion for our field, and those people _can't avoid_ gaining the
kind of knowledge that Colin's test asks about.

~~~
columbo
> Call me a snob if you will, but I don't want to work with mediocre
> programmers

> I want people with passion for our field, and those people can't avoid
> gaining the kind of knowledge that Colin's test asks about.

Claiming to own the definition of passion is not snobbish but conceited.

Perhaps I consider programmers mediocre if they lack design experience and
cannot show me how to set up custom guides in Illustrator. "I want people with
passion for our field" I don't simply want programmers who end at the compiled
code, I want people who understand design, typography and color theory. A
programmer who cannot do these things is mediocre at best.

Or maybe it is Linux that is the bar. It doesn't make any sense for a
programmer to use Windows; only real programmers use Linux. They should be
able to install/configure openssh, tomcat, nginx, redis, mysql-server. They
need to know bash inside and out, be able to create permissions, user groups
and have a deep and thorough understanding of the kernel. If you are a Windows
programmer then I don't have time for you at my company where all we do all
day is build awesome.

Perhaps it is mathematics. Algebra, calculus, trig, geometry. Statistics,
analytics... I hope you have your TI-83 ready for this interview. As only the
most mediocre of programmers will barely be able to survive in this industry
without a MS in Applied Mathematics.

Hire who you want, base your interviews how you want. Just don't
assume to be on the upper bar and beneath you lies mediocrity. That's all.

~~~
jemfinch
Your post is largely a straw man and contains snark that certainly does not
raise the level of this conversation.

I'm talking about _passion for programming_. I'm not "claiming to own the
definition of passion," I'm saying that passion _for programming_ manifests in
knowing the answers to questions like Colin's. It simply does.

~~~
xentronium
> contains snark that certainly does not raise the level of this conversation

Calling people that don't agree with you mediocre or apologists of mediocrity
is just rude.

~~~
jemfinch
I did nothing of the sort.

------
minikomi
I'm self taught and I'm in two minds about this. In one way, these are
fundamental questions and, yeah, we should at least know why they're
important. However, I can't help feel the binary nature of the questions is a
bit like asking a Computer Scientist "Which flag do you set on an Android
Intent to make sure an activity only exists at the top of a history stack?"
straight off the bat... It favors those who have prepared specifically for the
question.

In any case, I guess I have a bit of reading to do..

------
kenster07
This is a less extreme way of saying that understanding Newtonian physics will
make one a superior shooter in basketball.

Understanding CS theory is a way of testing intelligence, sure. But the author
betrays his own lack of familiarity with the industry today in implying that
it is required knowledge to be an effective developer.

------
aidos
@cperciva - you're right, I can't answer those off the top of my head like I
could 15 years ago when I studied it.

I don't think that's a problem with the exam system though; I could probably
have answered them 12 years ago. It's just related to the field you end up in
and how much you need to lean on that knowledge in your day to day job.

As it stands, given a couple of hours of reading, I'd easily be able to
refresh my knowledge and answer the questions, primarily because I had a solid
formal training in it initially. Ability-wise, the me of my uni days could
_never_ do the job I'm doing now - well, not without another 15 years
experience :)

~~~
cperciva
As I wrote in my introductory post, I _have_ needed to know most of these
things in my day-to-day job. Of course, it depends on what sort of work you're
doing.

~~~
StavrosK
I don't need to know these things in my day-to-day job either. However, if I
_did_ need to know them, they'd be a Google search away, and I could find them
out in less than a minute (less than the time it takes me to get urllib2 to
send cookies, for example).

On the other hand, I imagine that someone who hasn't been formally trained in
complexity would have a harder time, because they wouldn't be aware of basic
concepts such as complexity, the O notation, etc. Therefore, asking whether
someone knows what O(n) means strikes me as a better performance test than
trivia like the difference between O(2^n) and O(3^n).

~~~
cperciva
_asking whether someone knows what O(n) means strikes me as a better
performance test than trivia like the difference between O(2^n) and O(3^n)_

So far that's turning out to be one of the most useful questions, actually...
but I wouldn't call it trivia.
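
For reference, and assuming the question is about whether O(2^n) and O(3^n) are the same class, the usual argument from the definition of big-O is short:

```latex
% Definition: f \in O(g) iff there exist c > 0 and n_0 such that
% f(n) \le c \, g(n) for all n \ge n_0.
\begin{align*}
2^n \le 1 \cdot 3^n \ \text{for all } n \ge 0
  &\implies 2^n \in O(3^n), \\
\frac{3^n}{2^n} = \left(\tfrac{3}{2}\right)^n \to \infty \ \text{as } n \to \infty
  &\implies \text{no constant } c \text{ gives } 3^n \le c \cdot 2^n \text{ for all large } n \\
  &\implies 3^n \notin O(2^n).
\end{align*}
```

So O(2^n) is strictly contained in O(3^n): unlike a constant factor in front of the function, a constant in the base of the exponent is not absorbed by the notation.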

~~~
StavrosK
Yeah, bad example there. The asymptotic running time of the heap operations,
or quicksort runtimes, for example, though, were more like trivia, even though
you can basically get them if you know the algorithms and reason about them a
bit.

I know the running time of quicksort offhand, but I'd still consult wikipedia
on the off chance I'm wrong.
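
The reasoning about quicksort can be checked with a few lines of code. The sketch below counts pivot comparisons for a deliberately naive quicksort that always picks the first element as pivot (an assumption of this example; real library sorts choose pivots more carefully). On already-sorted input every partition is maximally lopsided, which gives the n(n-1)/2 worst case.

```python
import random


def quicksort_comparisons(items):
    """Sort a copy of `items`; return (sorted_list, comparison_count).

    Uses the first element as pivot and counts one pivot comparison per
    remaining element at each level of recursion.
    """
    comparisons = 0

    def sort(xs):
        nonlocal comparisons
        if len(xs) <= 1:
            return xs
        pivot, rest = xs[0], xs[1:]
        comparisons += len(rest)
        smaller = [x for x in rest if x < pivot]
        larger = [x for x in rest if x >= pivot]
        return sort(smaller) + [pivot] + sort(larger)

    return sort(list(items)), comparisons


n = 100
_, worst = quicksort_comparisons(range(n))   # sorted input: worst case
shuffled = list(range(n))
random.Random(0).shuffle(shuffled)
_, typical = quicksort_comparisons(shuffled)
print(worst, n * (n - 1) // 2, typical)
```

The first two printed numbers are equal, and the third is much smaller, in line with the O(n log n) average case.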

------
jh3
I have a CS degree and I am employed as an Application Developer.

To me these questions are daunting. Am I the only one here who feels like they
have an inferior CS degree? Reading the comments in HN threads usually makes
me feel a bit dumb, but this post in particular makes me feel like a lot of
money has been thrown at a degree that did not teach me a lot about Computer
Science.

------
ww520
These questions test the students' familiarity with the basic CS topics taught
during the previous year. It's a test on the subject matter of the CS
courses. If you don't use some of this stuff over the years, you will forget
it. I forgot the properties of the bipartite graph and had to look
them up.

This reminds me of those trigonometry proofs we learnt in high school. I
remember I could go through those complicated trig proofs with ease back then.
Now, besides some basic sine/cosine stuff I use at work, I have completely
forgotten the rest of them since there was never a chance to use any of them.

Basic course-level knowledge is good to learn. The test just needs to be
relevant and in context.

------
samspot
A lot of these questions look great if you just finished school, but not so
great 10 years later. There are an almost unlimited number of things I can
focus on that will actually improve my performance on the job, and none of
them are memorizing the workings of B-trees just in case I need to use one
someday and the internet is down.

Computer Science degrees are about laying the foundations so that you can
understand things like B-trees when you need them.

------
quonn
First, since I took this class just a year ago, I can answer all of these
questions. Will I be able to do so in two or three years? Probably not. Some
things like heapsort already start to fade away in my mind. But if, even in 20
years, I just open Cormen et al. and take a brief look, I will remember.

If you have passed the class in university and now feel bad that you can't
answer these questions, don't worry. It does not matter. If you want to work
at Google, work through Cormen again before the interview.

Feel free to pick your favourite field like Compilers or Machine Learning and
put a basic question to the author of the quiz, when you meet him in person.
If he can't answer, pat yourself on the back and feel free to consider
yourself a better software developer.

------
henrik_w
I love clever algorithms, and really enjoyed algorithm and data structure
classes at university. However, despite having worked for over 20 years as a
programmer in several software-intense companies (big and small) in the
telecommunications and VoIP field, I have almost never had a need to know the
answers to these questions.

In fact, it was the second biggest surprise for me when starting out as a
programmer after university. I wrote this up in "Top 5 Surprises When Starting
Out as a Software Developer"
<http://henrikwarne.com/2012/08/22/top-5-surprises-when-starting-out-as-a-software-developer/>

~~~
opinali
Did you ever need to write an algorithm to process some large volume of data,
where you had to make choices related to performance, e.g. to avoid a
combinatorial explosion in some search problem whose trivial implementation
would be a full cartesian product of several dimensions of data (i.e. a bunch
of deeply nested loops)?

Speaking of cartesian products, did you ever write a huge database query that
you had to optimize through smart usage of join methods, indexing structures,
caching (e.g. materialized views), decisions like subqueries versus joins
versus procedural code, etc.?

Did you ever have to pick a library collection -- not implement your own
hashmap or tree, but just select the ideal implementation -- to allow optimal
searches in a given dataset?

If answer='Y' to any of these, then you needed that knowledge, and I guess you
do on a regular basis. The thing about CS theory is that it structures and
formalizes these problems so they can be precisely analyzed, used as a basis
for further inventions and improvements, and used to reliably predict the
result of some design choice. You have some empirical knowledge of things like
computational complexity, but having _formal_ knowledge would often allow you
to replace hunches with objective certainty, or arrive at the optimal answer
to some problems much more quickly and reliably.

At least that's the theory. ;-)

~~~
henrik_w
I absolutely agree that knowing basic CS is a good thing, and it will make you
a better developer. Also, we do internalize a lot of the concepts so they
become second nature. That being said, it surprised me how much code there is
where this doesn't matter.

As for the specific questions: It's definitely good to know big O notation, so
maybe I was a bit quick to dismiss the first question. However, a more
realistic/useful comparison would be between O(n^2) and O(n log n).

For picking a collection to use – definitely knowing when to use a hash map
versus a list. Asymptotic run-time behavior of Quicksort – no. Binary search
tree – that's the one example I mentioned in my blog post, so yes, useful.
Graphs – never needed in the applications I've worked with. Same for using a
heap, never seen/needed.

~~~
opinali
Graphs are an interesting case in this debate. I agree that you very rarely
run into real-world problems that look like a traveling salesman or bipartite
test etc. But the thing about graphs is that they are the foundation of ALL
data structures, and lots of algorithms. Lists, hashing, trees, arrays and
more -- just special cases of graphs. If you can describe anything at all with
a boxes-and-arrows diagram, it's a graph problem ;-) so if you know at least
the core concepts of graphs, including some fundamental techniques (e.g.
advantages of each representation such as adjacency matrix vs. node list vs.
edge list), then you have the tools to work through any data structure
problem; and any algorithm that's focused on searching or manipulation of
specific data structures. And this can take you very, very far.
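
A rough Python sketch of the three representations mentioned above, on a made-up four-node directed graph. The node numbering and variable names are illustrative assumptions, and "node list" is read here as an adjacency list:

```python
# Hypothetical directed graph: 0->1, 0->2, 1->2, 2->3
n = 4

# Edge list: compact, but finding a node's neighbours means scanning all edges.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]

# Adjacency matrix: O(1) "is there an edge u->v?" lookups, O(n^2) memory.
matrix = [[False] * n for _ in range(n)]
for u, v in edges:
    matrix[u][v] = True

# Adjacency list: memory proportional to nodes + edges, fast neighbour iteration.
adjacency = {u: [] for u in range(n)}
for u, v in edges:
    adjacency[u].append(v)

# A singly linked list seen as a graph: every node has at most one outgoing edge.
linked_list = {0: [1], 1: [2], 2: [3], 3: []}
```

Which one fits best depends on whether the code mostly asks "are u and v connected?" (matrix) or "who are u's neighbours?" (adjacency list).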

Once again, of course you can learn most of these foundational concepts more
empirically, or in a more "bottom-up" way by starting with the standard data
structures and algorithms that you use in practice, but gradually deducing
general concepts that unify them. But I wonder if this can ever result in the
same deep insight that some study of graph theory can give.

------
quilby
These questions are way too basic: the first 4 can be answered directly from
the definition of each term, and there is no thinking involved. The 5th
question can be answered with knowledge from 1 lesson in graph theory.

~~~
cperciva
I'm glad I'm not the only person who thinks these questions should be easy.
Alas, evidence so far is that they aren't.

~~~
nollidge
They're only easy if you know those definitions, which makes them trivia
questions.

------
hobbyist
I wanted to ask this question regarding big O notation. When we say f(n) =
O(g(n)), all we mean is that f(n) <= c * g(n) for some constant c and all
sufficiently large n. My question is: why do we have an equals sign, why is
f(n) equal to O(g(n))? They could have made up a new symbol to establish such
a relationship. The reason I think so is that I see equality as a transitive
relationship: if a = b and c = b, then a = c. This clearly doesn't hold when
we use f(n) = O(g(n)): n^2 = O(n^2) and n = O(n^2), so n = n^2 which is not
making any sense.

~~~
stonemetal
_why is f(n) equal to O(g(n)).. they could have made up a new symbol to
establish such a relationship..._

It is an equality relationship; why would they need a new symbol?

 _f(n) = O(g(n)) .. n^2 = O(n^2) and n = O(n^2), so n = n^2 which is not
making any sense_

You are right, it doesn't make any sense. Try n^2 = O(Selection_sort(n)),
O(Bubble_sort(n)) = O(Selection_sort(n)), n^2 = O(Bubble_sort(n)). Nice and
transitive, the way it should be.

~~~
crntaylor
It's not an equality relationship. If it were an equality relationship it would
be transitive. But as your parent points out, you have n=O(n^2), n^2=O(n^2)
but not n=n^2, so transitivity doesn't hold.

Another way to see that the relationship can't be equality: the left hand side
is a function, the right hand side is a set of functions. Two things from
different classes can't be equal.

The relationship is simply set membership.
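
Written out, using the usual textbook definition (the symbols c and n_0 are the standard ones, not taken from this thread), the set in question is roughly:

```latex
O(g) = \{\, f : \exists\, c > 0,\ \exists\, n_0,\ \forall\, n > n_0:\ |f(n)| \le c\,|g(n)| \,\}

n \in O(n^2), \qquad n^2 \in O(n^2), \qquad \text{but } n \ne n^2
```

Two different functions can belong to the same set, so nothing forces them to be equal to each other.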

~~~
stonemetal
Ah thanks, I misunderstood which part was being called into question.

------
DanWaterworth
I'm self taught and working as a developer. I found these questions almost
trivial, I'm genuinely surprised so many developers are having problems with
them.

~~~
buro9
Whilst I agree with you on being self-taught and knowing those answers I will
say that I finally went and got an MSc in Computer Science after 12 years
working professionally as a developer and being self-taught.

Part of the reason was to find out what I didn't know.

What I didn't know was basically some parts of set theory and quite a chunk of
automata theory.

I did find cperciva's questions to be trivial, but I wouldn't say that makes
them trivial. A TV quiz once said, "It's easy when you know the answer.".

Algorithms for someone else might be what automata theory was to me.

~~~
DanWaterworth
I agree, you'll only know the answer if you've taken the time to learn about
the specific concepts being tested.

I suppose what I'm surprised about is that a large number of developers
haven't taken the time to learn (or have since forgotten) the theory around
these particular concepts.

~~~
panda_person
Life isn't a perpetual data structures class.

------
jacques_chester
Colin -- it will probably shock you to learn that I, ostensibly wielding an
honours degree in computer science -- failed algorithms.

(Remember our talk about impostor syndrome? _In my case it's actually true_.)

------
hermannj314
I have no educational training in computer science, but I work as a software
developer. I could not answer any of those questions, literally 0 out of 5.

1\. Is O(2^n) equal to O(3^n)? Why? I have absolutely no idea what that means.

But I did have to answer this question: why does the application crash?
"Oh...that's because the developer that was pontificating yesterday about how
heap operations behave asymptotically forgot to check if the database
connection was open before he called the Save method. Apparently, such arcane
trivia is beneath him. I can fix it."

~~~
radagaisus
Of all the questions, this is the one you MUST know if you want me (or
anyone) to trust you with a piece of code. It takes one hour (on Wikipedia!)
to learn everything you'll need for day-to-day complexity assessment with Big
O notation.

They asked us these questions on our high school final exams, I'm sure you'll
manage.

~~~
hermannj314
> O(2^n) equal to O(3^n)

Yep. So I looked into it and it seems they are not equal. Every f(n) in the
set O(2^n) belongs to O(3^n), so O(2^n) is a subset of O(3^n). But we can see
from the trivial case f(n)=3^n that O(3^n) is not a subset of O(2^n): the
ratio 3^n / 2^n = (3/2)^n grows without bound, so there is no k for which
there exists an n0 such that n > n0 implies 3^n <= k*2^n. Therefore the two
sets are not equal.
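
The same argument as a short derivation, assuming the standard definition of O with constants c, k > 0 and threshold n_0:

```latex
\begin{align*}
f \in O(2^n) &\implies f(n) \le c \cdot 2^n \le c \cdot 3^n \ \text{ for } n > n_0
              \implies f \in O(3^n), \\
3^n \le k \cdot 2^n &\iff \left(\tfrac{3}{2}\right)^n \le k
              \iff n \le \log_{3/2} k .
\end{align*}
```

For any fixed k the second inequality fails as soon as n > log_{3/2} k, so no single constant works for all large n and 3^n is not in O(2^n).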

Great. I still don't see how I am more qualified to do absolutely anything in
life.

~~~
radagaisus
That's a great textbook summary, now for applications:

You got an array of your friends, an array of people who upvoted this post and
an array of people who replied to this post. Filter all your friends who
upvoted and replied to this post.

Now, before you write the code, how fast/slow will it run? For 1,000 friends?
1,000,000? Will the runtime grow extremely fast? Why?

Here's a Redis command that intersects the sets stored at the given keys:
<http://redis.io/commands/sinter>. Its complexity is "O(N*M) worst case
where N is the cardinality of the smallest set and M is the number of sets."
Do you understand why? Do you understand when it will be fast and when it will
be slow?
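
One strategy that is consistent with that O(N*M) bound, sketched in Python. This is not Redis source code; the function name and sample data are made up for illustration, and it assumes the inputs are hash-based sets with expected O(1) membership tests:

```python
def intersect_smallest_first(sets):
    """Intersect M sets by scanning the smallest one (N elements).

    Each of the N candidates is checked against at most M - 1 other sets,
    giving roughly N * M membership tests in the worst case.
    """
    if not sets:
        return set()
    ordered = sorted(sets, key=len)          # smallest set first
    smallest, others = ordered[0], ordered[1:]
    # all() stops at the first set that lacks the element.
    return {x for x in smallest if all(x in s for s in others)}


friends = {"ann", "bob", "cat", "dan"}
upvoted = {"bob", "cat", "eve"}
replied = {"cat", "bob", "zed", "ann"}
result = intersect_smallest_first([friends, upvoted, replied])
```

It is fast when the smallest set is tiny or most candidates are rejected early, and slowest when every element of the smallest set appears in every other set.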

~~~
hermannj314
I'll give this a shot.

The code I would write would take longer the more friends you have. Basically,
it would grow linearly with the size of all the sets and the number of
intersections found. Something like O(n * M) in the worst case. But I don't
get why later you say that n is the size of the smallest set.

So I don't see how you can beat that. I guess you could sort the lists, but
that takes O(n log n) over M lists, and then lookups would take O(log n).

To me it will be fastest when the algorithm terminates the quickest (the
smallest set contains nothing, the 2nd smallest set contains no intersecting
keys, anything like that), and slowest in the worst case (each set is a proper
subset of the next), getting progressively worse as the subsequent sets get
larger.

As for that algorithm running in O(N * M), I am not sure I understood that at
first, but I think I do now. If you hold all other set sizes constant and
only vary the size of the smallest set or the number of sets, you will
notice the running time follows O(N * M). If you increase the size of the
largest set by a factor of k, then the algorithm will take k times as long to
run (in the worst case), but if the running time function f(N,M) belongs to
O(N * M) then k * f(N,M) belongs to O(N * M) as well, so the O notation still
represents the running time complexity even if you increase the size of the
largest set.

So the running time will still increase with the largest set, but the
O-notation of the algorithm will still belong to the same representation.

~~~
DannyBee
With extra memory, and if you don't care about ordering, multiple set
intersection can be made O(N) where N = size of each set added together,
assuming no duplicates in any single set (which should be true, but it depends
on how loosely you are using the word set). You only have to walk each element
exactly once, and all other operations are (amortized) O(1).

For each element in all sets: Insert element into hash table. If it does not
exist, use key = element, value = 1. If it exists, increment counter. This is
O(1)

When done over all sets, it's O(N)

Walk hash table, the only elements that are in the intersection are those with
value = number of sets. This is O(N)

O(N) + O(N) = O(N). This only works if there are no duplicates within a single
set. It transforms set intersection into a counting problem over the universe
of elements.

You can optimize it as well. You never need to deal with elements that don't
exist in the hash table after the first set is processed, as they can never be
in the intersection. This does not change the asymptotic complexity, except it
means the hashtable will never resize after the first set. So it's
advantageous to choose the smallest set to process first if you can do so
easily. If you use dynamic perfect hashing or something similar, you can also
guarantee all operations after the first set will be O(1), rather than
amortized O(1).
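
A minimal Python sketch of the counting idea, using `collections.Counter` as the hash table. The function name is hypothetical, and the code assumes (as stated above) that no input collection contains duplicates; it leaves out the smallest-set-first optimization:

```python
from collections import Counter


def intersect_by_counting(collections_):
    """Count how many input collections each element appears in.

    Assumes no duplicates within any single collection, otherwise the
    counts would overstate membership.
    """
    counts = Counter()
    for items in collections_:
        for element in items:
            counts[element] += 1             # expected O(1) per element
    wanted = len(collections_)
    # Elements seen once per collection are in the intersection.
    return {e for e, c in counts.items() if c == wanted}


common = intersect_by_counting([[1, 2, 3], [2, 3, 4], [3, 2, 9]])
```

Both passes touch each element a constant number of times, which is where the O(N) total comes from.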

~~~
xentronium
That ("how would you implement the Ruby array intersection operator & and what
computational complexity does your implementation have") was actually one of
my interview warmup questions. In Ruby core it's done exactly like you
described, using a hash table, but I didn't know that until after the
interview. I proposed a slightly worse solution, sorting both inputs and
then walking them simultaneously, which was O(n*log(n)). Still got the job,
though :)
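
For comparison, the sort-then-walk approach might look something like this in Python (a sketch with a made-up function name, not the code from the interview):

```python
def intersect_sorted(a, b):
    """Intersect two lists by sorting both and walking them in lockstep.

    Sorting costs O(n log n); the merge-style walk afterwards is linear.
    """
    a, b = sorted(a), sorted(b)
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            # Equal values: record once, skipping repeats of the same value.
            if not result or result[-1] != a[i]:
                result.append(a[i])
            i += 1
            j += 1
    return result


shared = intersect_sorted([3, 1, 2, 2], [2, 3, 5, 2])
```

It needs no hash table and returns its result in sorted order, at the cost of the extra log factor.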

As a side note, I think that tasks like "implement a data structure with the
following operations and calculate the computational complexity of each
operation" are much better for an exam than trivia questions like "Name the heap
operations used by heapsort and the asymptotic running time of each.", with
all due respect to @cperciva.

------
sushantsharma
I see a common argument in the comments here. Many of us are saying this is
"Computer Science" and not "Software Development". In my opinion, knowledge of
the answers to these questions will make you a better software developer.
Of course, without knowing any of these, you can still develop software that
will work. But if you know answers to these questions (and many other similar
computer science questions), you can make smart decisions on how to implement
the software you are developing in a more efficient manner.

------
yaz
I knew the answers to all of these questions two years ago when I had just
graduated, but I've completely forgotten what heapsort is (despite having used
it a few times in programming contests) and cannot offhand remember what
"bipartite" means (googled, saw the wikipedia image and instantly remembered
maxflow and Ford–Fulkerson although I guess a DFS is good enough to determine
bipartite-ness).
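
For reference, a DFS-style two-colouring check could be sketched like this in Python. It assumes an undirected graph given as a dict that maps every node to a list of its neighbours; the function name and the sample graphs are made up:

```python
def is_bipartite(adjacency):
    """Try to 2-colour the graph; it is bipartite iff no edge joins equal colours."""
    color = {}
    for start in adjacency:                  # handle disconnected graphs too
        if start in color:
            continue
        color[start] = 0
        stack = [start]                      # explicit stack instead of recursion
        while stack:
            node = stack.pop()
            for neighbor in adjacency[node]:
                if neighbor not in color:
                    color[neighbor] = 1 - color[node]
                    stack.append(neighbor)
                elif color[neighbor] == color[node]:
                    return False             # edge inside one colour class
    return True


square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
```

Every node and edge is visited a constant number of times, so the check runs in time linear in the size of the graph.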

Why is the ability to retain this information at the top of my mind at all
times necessary for me to be a Software Developer?

------
staunch
If it was a simple form that sent an email you'd probably get 100x more
results. People are lazy.

~~~
cperciva
Given the rate at which I'm getting emails, I don't want 100x more results!

~~~
praptak
How many have you got already?

~~~
cperciva
16 in the first 1.5 hours.

~~~
sesqu
Given that you're only getting dozens of replies, why are you asking for the
institution? The only useful pattern I can think of would be something like
correlating against university ranking, but that doesn't sound too useful.

~~~
cperciva
I was thinking I'd start with a "places most of us have heard of" (Oxford,
Cambridge, Harvard, MIT, Stanford, etc.) vs. "everywhere else" comparison. I'm
at somewhere over 100 replies at this point, so I imagine I'll have enough
data for that sort of comparison to be meaningful.

------
Zikes
Carl Sagan said "If you wish to make an apple pie from scratch, you must first
invent the universe."

I'm a web developer. I use tools, languages, and frameworks built on many
Computer Science concepts that I don't fully understand, but I would sorely
hate to have to reinvent those wheels every time I need to create a new
application, or to even need to understand their most intricate inner workings
despite the fact that they have been abstracted away for my convenience.

Perhaps this makes me "mediocre" in the eyes of some, but I get my work done
and have never had any of my managers or customers tell me that what I have
delivered is "mediocre" in their eyes.

Several times I've come up against an issue that requires me to push the
limits of my skills and knowledge, and I do not shy away from learning, but it
would do me little good to learn many of the things that are taught in a
typical CS course. I consider myself passionate about my work, and I strive to
learn new things every day, but I don't consider myself to be at a
disadvantage, nor do I find my lack of knowledge in those areas to be a
deficiency.

I am proud of my accomplishments, and of the course I have taken in my career.
Perhaps my skills would be of little use at Tarsnap, but I have no shortage of
work at any number of other companies.

------
panda_person
I remember reading a thread someplace (I think here or at reddit) that
criticized the traditional software developer interview process. One person
made the suggestion that the reason why a lot of companies ask for more...bookish
knowledge that seems like it's from a college homework/final exam is that they
don't want to admit that what they do is relatively mundane, and doesn't
really require the daily application of the kind of thing you find on your
data structures or operating systems final exam. So, they add complexity to
their interview questions, as well as their development process, to
compensate.

Does anyone know if traditional, "real" engineering fields (civil, chemical,
mechanical, etc.) ask detailed, technical kinds of questions during interviews,
like software engineering is known for? I find it hard to believe civil
engineers get bombarded with all-day grill fests by other civil engineers
about technical questions in interviews, but I really don't know.

------
doesnt_know
This is so depressing. I just finished a degree in "Information Systems" with
the hopes of being a developer full-time and I can't answer any of these
questions.

I guess I should be mad at the institution, but that wouldn't be productive.
Where do I start now to start learning what I actually need to know? It seems
like there is so much I don't know.

~~~
artmageddon
This is a very good book:
<http://www.amazon.com/Introduction-Algorithms-Third-Edition-ebook/dp/B007CNRCAO/>

I will say though, it's really intense on theory, and it requires a good
background in math. A lot of people will recommend this book though. Also,
it's physically heavy :)

~~~
ericras
That book is considered the 'Bible' of this area but I wouldn't recommend it
to anyone who is looking to get started. It's like reading the dictionary.

------
baak
I feel like this test would be better if it were reasonably timed, and you
were allowed to look up the answers. That might be a little bit closer to what
I actually do (especially when I don't know what something is). Calling me
lucky to get by without knowing offhand the definition of a B-Tree is kind of
rude and also very wrong.

------
emperorcezar
This is a problem in our industry. These are not "Software Development"
questions. These are "Computer Science" questions.

------
andrewf
But you should Google _afterwards_ so you can feel like a right ponce for
confusing Big-Theta and Big-O.

------
stripe
Too bad the author does not care to ask about the more important parts of
developing software but instead delves into questions a web crawler could
answer. Creating elitism (by offensively downgrading any developer who cannot
answer these questions or simply has to look up an answer) benefits no one.
Developing software is so much more than details of an algorithm, of an
operating system or a network topology. It is about needs, about requirements,
users, customers, support, processes, technology and the list goes on and on.
Simply put, he is not asking questions for software developers but for
programmers.

~~~
MisterBastahrd
That isn't an exercise in elitism. It is an extremely easy test to anyone who
is actually trained in computer science.

If you can't answer the questions, then you should try to fill the gaps in
your knowledge rather than whining that it's elitist.

------
superasn
A little off topic but this domain hosts a hacker news daily updated archive
located here: <http://www.daemonology.net/hn-daily/>

Thanks cperciva for hosting it!

~~~
Erwin
I've settled for <http://hckrnews.com/> as the HN filtering site -- you can
switch between Top 10 / Top 20 and for extreme procrastination, "Top 50%".
With a Chrome extension you can also track the number of new comments since
you last read them, for the stories with interesting discussions.

------
opinali
The idea is good, but the questions not so much. First off, the number of
questions is too small for a subject as large as algorithms and data
structures; I'd rather have something in the 25-40 range. But that, of course,
is for objective questions whose answer is one line at most (not requiring
writing code or calculation).

And this approach is the second problem. A few questions of this kind are
good; for one, I like Q.1 (trivial if you know the subject, puzzling if you
don't as the proposition is just nonsensical as a mathematical problem). But
some other questions are tests of memorization, sometimes for not too relevant
information. For one thing, you ask to "name" heapsort operations. I have long
forgotten those names, BUT I know the fundamental ideas of heaps and heapsort,
and writing the full heapsort (even on paper) is a simple coding
exercise... but I still wouldn't remember those names.
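
For what it's worth, a paper-sized heapsort might look like the following Python sketch. The operation names used here ("sift down", "build heap") are one common textbook convention and not necessarily the names the exam expects:

```python
def sift_down(a, root, end):
    """Restore the max-heap property below `root` within a[:end]. O(log n)."""
    while True:
        child = 2 * root + 1                 # left child in array layout
        if child >= end:
            return
        if child + 1 < end and a[child + 1] > a[child]:
            child += 1                       # pick the larger child
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child


def heapsort(a):
    """Sort the list in place and return it. O(n log n) overall."""
    n = len(a)
    # Build heap: sift down every internal node, O(n) in total.
    for root in range(n // 2 - 1, -1, -1):
        sift_down(a, root, n)
    # Repeatedly move the current maximum to the end and shrink the heap.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
    return a


ordered = heapsort([5, 2, 9, 1, 5, 6])
```

The two phases map onto the two running times in the question: building the heap is linear, and each of the n extractions costs a logarithmic sift.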

Also, I would spend significant time writing this code (and also deducing the
complexity if I don't know it by heart), compared to somebody who just crammed
through textbooks. (And yes, it _is_ possible to memorize this kind of
information in a few days of cramming; if those final exams have schedules
known well in advance and they are spread at least 3-4 days from each other,
this pretty much cancels the advantages you mention.)

You seem to be dismissive of "practical programming ability"; I agree that
code as answer to written questions has issues -- coding problems are better
in a whiteboard test, interacting with the examiner. Or with careful
restrictions of language etc. But it seems wrong to have questions that
penalize (due to time restrictions) people who can deduce the answer with some
coding or calculation, and reward people who just memorized the subject... for
one thing, this memory won't last forever, unless you become a professor to
keep "practicing the theory" on a daily basis. But more important, the ability
to deduce the answer is very revealing of somebody's knowledge of the really
fundamental concepts. Show me a CS graduate who knows the bipartite graph
algorithm from memory, and this _may_ be just some guy who has study discipline
and a good memory but is clueless about the core ideas of graphs and will have
forgotten everything a couple of years after graduation. Now take somebody who
can sketch the code for bipartite testing, even when given a problem that does
not use the word "bipartite" or even the word "graph" -- so the student needs
to identify the proposed problem as a specific graph problem, and _then_
deduce the algorithm in order to solve the graph problem -- and that's
somebody I would hire, either for a software engineer position or for a
professor position.

------
EliRivers
Software engineering and/or programming is a vast, vast field. Anything you
can express logically, someone somewhere is expressing in software. This "or
you're lucky enough to be working within a narrow area where your deficit
doesn't matter" should read "or you're lucky enough to be working outside this
narrow area so your deficit doesn't matter".

~~~
cperciva
Note: What you're seeing today is just 1/4 of the test. The other three parts
cover different areas.

------
tomku
Question for the author: are these things that YOU knew before working on the
projects that required them?

~~~
icebraining
Pretty sure a PhD in Computer Science, like Dr. Colin Percival (the author),
has to know that stuff. Even a lowly undergraduate like I was had to learn it
by the second year.

~~~
randomdata
I was writing software professionally in high school. Second year undergrad
seems a little late in life to be learning this stuff, if it is critically
important.

------
GlennS
Why do people who like to ask O(N) questions have this obsession with sorting
algorithms?

~~~
cperciva
Because in the 1960s, sorting records was what computers spent most of their
time doing.

At this point, it's mostly a "because that's how we've always introduced
algorithms and complexity" thing.

~~~
sukuriant
It's still what computers often spend a lot of time doing. What do you think
your database engine is doing most of the time? Those indexed columns aren't
just there to smile warmly upon your data.

------
ataggart
I wonder about the utility of an exam that one can ace with five minutes of
googling.

~~~
alexkus
I use these kinds of questions in interviews. You either know the answers or
you don't. There's no looking at your 'phone in an interview (at least not one
with me being the interviewer).

~~~
ataggart
Fair enough. I take a different approach in my interviews: try to gauge how
useful the person will be to the team. We have internet access, and a library
with CLRS if someone needs to implement a bipartite graph test. It matters
more to me that they'd know why they'd need one.

~~~
alexkus
We use those kinds of approaches too. I never said the interview consists
solely of closed book tests, they're just one thing I find useful to help
gauge the suitability of an interviewee.

[EDIT] The thing I'm most looking out for is how a candidate answers when they
don't know the right answer.

------
jongraehl
I had to omit proofs to complete in 15 min.

This will obviously not be a representative sample of working programmers (or
even readers of this post). People who know they don't know are less likely to
respond.

------
tomrod
I think I could do the first and the fourth (economist by day, coder by
night). Answers to the others elude me: good tutorials and relative importance
to coding would be greatly appreciated!

------
njharman
That looks like a "Computer Science" final exam. Software dev != CS.

~~~
cperciva
It's a "Computer Science software developers should know" exam.

------
DannoHung
I have a question: These are pretty rudimentary questions and if you have a
background in the literature, you could brush up in about an hour or two on
most of this stuff (aside from memorizing the catalogue of data structures and
algorithms you might find in a text book [which makes me wonder why he's
focusing on _these_ questions instead of an understanding of induction, space
analysis for an arbitrary data structure, or general Big-O/Big-Omega
analysis]), but why don't we actually come across uses for that sort of basic
stuff all that often?

I mean, I don't know if what I do is all that specialist, but I spend a lot
more time working out far less interesting things about how domain knowledge
works than what sort of Data Structure or Algorithm is best for storing it and
using it.

