The software development final exam: Algorithms and Data Structures (daemonology.net)
221 points by cperciva on Oct 8, 2012 | 198 comments

If you can't answer the majority of the questions...you're lucky enough to be working within a narrow area where your deficit doesn't matter.

Where by "narrow" one presumably means "narrow in theoretical scope, but extremely large in terms of number of people, number of customers, number of paid hours spent working on things, and amount of impact on the world". Most of the web was built by people who don't know what B-trees are.

How do these people succeed as much as they do? Because the folks who designed PostgreSQL do understand what B-trees are, and they have been very successful at their goal: To encapsulate that knowledge in an abstracted system that other people can learn to drive without entirely understanding how it works.

This is not a problem with the web developers, nor is it a problem with this exam, which at first glance looks reasonable – people with CS degrees should be able to answer these questions. It is, perhaps, a problem with the concept of "software developer": It's being asked to cover too big a range. If the automotive industry were like computing, we'd use the phrase "auto mechanic" to refer to an amateur with a Haynes guide, a Midas Muffler employee, a member of a NASCAR pit crew, a Formula One driver, a mechanical engineer at Tesla Motors, an organic chemist working at a tire company, and Kiichiro Toyoda.

Most of the web was built by people who don't know what B-trees are.

By number of sites, perhaps. By revenue, I don't think so. The big names that dominated what the web is to most people certainly do know this stuff, at least insofar as the practical implications, and they try not to hire people that don't for development roles. That isn't to say that they don't also just use Postgres (or Berkeley DB, or whatever) instead of writing their own.

If the automotive industry were like computing, we'd use the phrase "auto mechanic" to refer to...

I totally agree with this - the field could use a few more job titles (and a little less grade inflation - SENIOR software engineer on 2 years' experience? Really?). Software Fitter would be a much better description for the skilled assembly role that a lot of modern software boils down to. CRUD Technician is going to need some PR work though!

>>By number of sites, perhaps. By revenue, I don't think so.

>>Software Fitter would be a much better description for the skilled assembly role that a lot of modern software boils down to. CRUD Technician is going to to need some PR work though!

I don't know why intelligent people with a little knowledge automatically think they are going to be rich. This causes a lot of heartburn and pain later in life, when they figure out things don't work that way in the real world.

Regardless of what you may think about them (you may even think them undeserving of all the money they earn, or think they are stupid), the fact is that abstraction enables people to work with a lot of things without knowing much about them in detail. In fact, abstractions are invented for that very purpose: so that a million people don't waste their time, energy and effort relearning things, when all those resources can be better utilized by knowing just the abstract details and building things on top of them. This enables people to build things quickly on what is already available.

Revenues, money et al are dictated by what the world considers valuable at any time. You might know 1000 algorithms from the book, you might have studied every other data structure that's out there. But if knowing how to write a SQL query is what is valuable in the economy right now, guys doing that are likely to be paid more than you.

This isn't surprising at all. As developers we enjoy a unique position in the market. As for the money we get, compare this with any other industry: our peers working there get paid far less. How is it possible that people who work at the same level of difficulty as we do get paid less than us? Because merely knowing things, and the difficulty of tasks, doesn't qualify you for better compensation.

I have to agree that this question set sits in between "useful CS topics" and "domain knowledge." Although a lot of people are building off of software libraries that use some heavy-duty CS knowledge, they need only be aware of the first-order implications - that there are different types of trees and sorts, that hashes, lists, and trees can expose similar interfaces while having different properties, that algorithm running time can be described with big-O notation. The fine differences are easy enough to research, and a responsible library author will document the expected runtime characteristics of their code.

On the other hand, something I'd expect to see on the final exam - and is already clearly absent based on the outline - is compiler design. If most of the world is doing web development and doing a lot of "slopping around of data between formats", parsing becomes important. Type systems become important. Code generation _may_ be important. And here, again, they don't have to be in-depth questions about the differences between LALR and packrat parsers, or describing the eight sides of the lambda cube. Simply demonstrating awareness of doing something other than mashing out a bad regex for every parsing problem would be an advance over the status quo.
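To make concrete what "something other than mashing out a bad regex" might look like, here is a minimal sketch of a recursive-descent parser for arithmetic expressions (the grammar, function names, and lack of error handling are all simplifications for illustration, not a production parser):

```python
import re

# Grammar (invented for this sketch):
#   expr   -> term   (('+'|'-') term)*
#   term   -> factor (('*'|'/') factor)*
#   factor -> NUMBER | '(' expr ')'

def tokenize(s):
    # Split the input into numbers, operators, and parentheses.
    return re.findall(r"\d+|[()+\-*/]", s)

def parse_expr(tokens, i=0):
    val, i = parse_term(tokens, i)
    while i < len(tokens) and tokens[i] in "+-":
        op, (rhs, i) = tokens[i], parse_term(tokens, i + 1)
        val = val + rhs if op == "+" else val - rhs
    return val, i

def parse_term(tokens, i):
    val, i = parse_factor(tokens, i)
    while i < len(tokens) and tokens[i] in "*/":
        op, (rhs, i) = tokens[i], parse_factor(tokens, i + 1)
        val = val * rhs if op == "*" else val / rhs
    return val, i

def parse_factor(tokens, i):
    if tokens[i] == "(":
        val, i = parse_expr(tokens, i + 1)
        return val, i + 1  # skip the closing ')'
    return int(tokens[i]), i + 1

def evaluate(s):
    val, _ = parse_expr(tokenize(s))
    return val
```

Twenty-odd lines, and it handles nesting and precedence correctly - something no single regex can do, because regular expressions cannot match balanced parentheses.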

I really wish the OP had linked to the explanation for the exams: http://www.daemonology.net/blog/2012-10-08-software-developm...

The author believes this test is very basic, and to follow your auto mechanic metaphor it would be like asking anyone from a Midas Muffler employee to Kiichiro Toyoda the difference between disc and drum brakes.

(Not that I agree with the author - to me, "basic knowledge" would be "What is big O notation?")

I really wish the OP had linked to the explanation for the exams

I did...

Oh, funny. I just noticed your name and realized you're the author. Sorry about that.

Your explanation is linked at the top of the test, but based on many of the comments here I don't think many people read it. It helped me put the test into perspective.

I have a CS degree from a respectable CS department. I got straight A's in my major and never crammed for a CS test. I am sure I could have answered these questions in 1997. Today, I can answer the first two questions. I think I can get partial credit on the third. I believe I knew the fourth once. I don't even remember what bipartite means [see edit below]. And that's with having implemented a topological sort within the last few years.

I would like to believe that if I came across a problem where knowing this material would help solve the problem, some part of my brain would activate and put me in the position of "Oh, I know I don't know this anymore" and I'd go look it up. Vs "not knowing what I don't know" and just bumbling along ignorantly.

I guess this is a long-winded way of saying I'm glad to have a CS degree and I think it puts me a leg up over software developers who haven't studied CS, but I'm still not sure how applicable these questions are to most developers.

Edit: On most exams I've taken, the fifth question would have been asked like this: A bipartite graph is a graph whose vertices can be divided into two disjoint sets U and V such that every edge connects a vertex in U to one in V; i.e. U and V are each independent sets. Equivalently, a bipartite graph is a graph that does not contain any odd-length cycles. Describe an algorithm that determines if a graph is bipartite.
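The odd-cycle characterization points straight at one standard answer: try to 2-color the graph with a breadth-first search, and fail on any edge whose endpoints receive the same color. A minimal sketch in Python, assuming the graph is given as an adjacency list:

```python
from collections import deque

def is_bipartite(adj):
    """adj: dict mapping each vertex to a list of its neighbors."""
    color = {}
    for start in adj:                       # loop handles disconnected graphs
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in color:          # first visit: opposite color
                    color[v] = 1 - color[u]
                    queue.append(v)
                elif color[v] == color[u]:  # same color on both ends: odd cycle
                    return False
    return True
```

This runs in O(V + E); a failed coloring is exactly a witness that some odd-length cycle exists.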

Frankly, asking the question either way is what I would call prejudicial. What is a scenario in which bipartite graphs occur, and why not ask how that would be dealt with?

Any n-dimensional grid (sometimes called a "Manhattan space") is a bipartite graph. You may find it useful that no odd-length cycles can exist in such a graph.

What is a practical scenario for odd-length cycles in a grid?

Ah, sorry, the odd-length cycle was supposed to be a hint for Colin's problem, not an actual use of bipartite graphs.

I find all the uses of bipartite graphs to be in maximal matching or similar situations.


There are many reasons why you would want a maximal matching. In my case I was working on a distributed system prototype that needed to match block requests from my users to cache servers. This is a maximal matching from cache servers that have the blocks available to users requesting those blocks.
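For what it's worth, the classic way to compute such a matching without heavy machinery is Kuhn's augmenting-path algorithm. A minimal sketch in Python (the `wants` structure and names are invented to mirror the cache-server scenario; this is the simple O(V*E) version, not the faster Hopcroft-Karp):

```python
def max_bipartite_matching(wants):
    """wants: dict mapping each user to the list of servers holding their block.
    Returns a dict {user: server} of maximum size."""
    match = {}  # server -> user currently assigned to it

    def try_assign(user, seen):
        for s in wants[user]:
            if s in seen:
                continue
            seen.add(s)
            # Take a free server, or evict its current user if they can move.
            if s not in match or try_assign(match[s], seen):
                match[s] = user
                return True
        return False

    for user in wants:
        try_assign(user, set())
    return {u: s for s, u in match.items()}
```

The key idea is the recursive eviction: a user already holding a server can be bumped to another server it also accepts, which is what makes the result maximum rather than merely greedy.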

While I can answer some of these questions (and could answer them all a few years ago while attending university) this kind of knowledge is simply not something that I use on a daily basis. I also doubt that many others developers use it unless they have very specialized jobs.

These kinds of questions should not be the only basis of the software development final exam. In software development, problem solving, communication with others, knowledge of the tools etc. are much more important than random facts about algorithms. And those skills are much harder to test than asking for the worst-case run time of quicksort.

I don't know these answers off the top of my head, but if I needed to know them in the real world, I could easily find them out in a couple of seconds or minutes with Google and Wikipedia.

The difference between me and someone who is unfamiliar with computer science (but is still good with Google) is that I can read the answers and their corresponding articles and understand what they mean, as opposed to the other person, who would just be able to recite the answers as trivia.

Actually, a few years into the industry, I would doubt anybody who claims to know these things by heart (unless that is their day job).

Seriously, do people remember all this math through all the years they are building software? Unless you spend a great deal of time revisiting these concepts every other month, there is no way this is all going to stay in your head. And if you actually are spending that amount of time learning this kind of math, I would wonder what kind of programmer you are. A programmer is supposed to build things in that time; gaining mountains of factual knowledge that doesn't have much value in the real world is not a very good use of it.

By the way, this algo/DS love looks very similar to tool religion. People focus too much on the tools used to solve problems, when they should actually be focused on the problems.

Those questions strike me as very basic knowledge. Why would I need to revise constantly to keep that in mind?

And yes, you are right, you can get away without knowing that stuff, and still do awesome work. And you can know all those things, and never get anything done.

They're the sort of thing I use on a daily^H^H^H^H^Hfrequent basis. They're not trivia questions or tests of memorized knowledge; they're questions whose answers can be figured out.

How do you "figure out" what a B-tree is? That's a trivia question.

Right, you need to know certain things. But the point of the question is not that you know what a B-tree is and have memorized a fact about B-trees. The point is the follow-up question, which requires you to understand the consequences of using different kinds of trees. That's not the sort of thing you memorize, it's the sort of thing you internalize. If you can't answer the second question, you probably wouldn't know enough about trees to be able to casually think about certain problems you might come across in "systems programming".

My school didn't teach B-trees, though they did teach 2-4 trees. I've never really been in a situation where I've needed to use a B-tree; and, I was an accomplished competition programmer in college.

That said, I looked up on the internet what they were and it was pretty easy to follow.

It would be very cool if there was a programming contest problem that required efficient disk access.

There are some problems that require efficient reading from disk, and some that require efficient writing --- though that's mostly just "buffer your outputs and don't use complex print methods, like System.out.printf".

Yes, this is CompSci, not software engineering

Then perhaps the author shouldn't be presenting them as practical issues, and claim that people who don't know the answers either shouldn't be programmers, or work in some tiny niche. You read the context article he linked to, right?

Just to get the facts straight, the linked article had this to say:

If you can't answer the majority of the questions on these four papers, and you're working or intend to work as a software developer, you should ask yourself why — most likely you're either missing something you really should know, or you're lucky enough to be working within a narrow area where your deficit doesn't matter.

The reader can judge for themselves whether this is claiming that "people who don't know the answers shouldn't be programmers." I personally find that characterization rather rude toward the author.

> or you're lucky enough to be working within a narrow area where your deficit doesn't matter

This implies that non-algorithmic work is a narrow area. In my experience, it's the vast majority.

I agree, non-algorithmic work is the vast majority of programming work today because most of it is related to Application Programming.

The reason for this is that most Application Programming emphasizes the use and sufficient knowledge of a framework, rather than a full understanding of CS fundamentals.

While those with CS have a head start in understanding frameworks quickly and utilizing them effectively, self-taught programmers can catch up rather quickly if the framework is well written and geared towards them. And the vast majority of frameworks trend towards this.

Application programmers generally need to know the ins and outs of a framework and this knowledge often carries them further than the fundamentals.

All that being said, after 10 years of application programming, I'm slowly trying to re-learn much of my CS material out of my own personal interest and get off the framework treadmill. It's been a fun process to rediscover fundamentals after learning them over 10 years ago.

But let's be realistic about it, most app programmers have no interest in CS fundamentals or theory unless it has some direct context to their daily work.

Just as an anecdote from a self-taught developer: a few years ago I started in on GoF and got about 100 pages in before I came to the conclusion that a lot of what I'd read so far, and was attempting to incorporate into my knowledge base, is already a part of the languages I'm currently using.

I know there's a use for these details and I want to learn them, but I suspect that many times the direct context is built in, which is pretty much what you say about learning frameworks, which might themselves be considered dialects of a language. In fact, by Googling I see that people have written about language subsets in the context of Lisp, so full circle I go.

You've only seen 1/4 of the exam so far. Sure, there's plenty of software development which doesn't involve fancy algorithms or data structures... avoiding computer architecture, operating systems, mathematics, networking, databases, and distributed systems as well is a bit harder.

I see that you're the author. Thanks for replying!

> Sure, there's plenty of software development which doesn't involve fancy algorithms or data structures... avoiding computer architecture, operating systems, mathematics, networking, databases, and distributed systems as well is a bit harder.

This is at odds with the statement in your article that jumped out at me:

"If you can't answer the majority of the questions on these four papers, and you're working or intend to work as a software developer, you should ask yourself why--"

I disagree, and furthermore, adding more topics means that your assertion is going to be less accurate.

Of course, it's better to know more, and I'm glad to have a computer science degree. However, I think that conflating having a computer science background and general software development is confirmation bias. I don't have data to back me up, but I believe that it's more common to simply not need to know what a bipartite graph is, or even a majority set of these topics. Given the theoretical grounding of the question in your first block, I'm guessing that parts 2-4 will be similarly structured.

Computer science is one slice of software development. Software development is a huge, deep field. A software development exam could just as easily contain an exam including the following topics: communicating effectively through email, distilling requirements from customers, writing meaningful commit messages, testing theory, or hundreds of other non-theoretical topics.

What I'm trying to say is that your exam is a good reminder of computer science at the root of many of the fields of knowledge in software, and is fun, but it's not representative of software development. In a given year, most software developers are going to spend more time puzzling through garden-variety implementation bugs than thinking about any two of the questions you list. Which is why they're not "lucky". :)

Yeah, but they are really important to know for interviews. Which of course may or may not have anything to do with the actual life of a developer, but that's life.

Whenever a prospective employer asks this type of CS trivia question, I lose interest in working in that environment, regardless of whether I know the answers. It shows a lack of the maturity and experience needed to understand what characteristics actually identify a good developer. This type of question might be useful for inexperienced candidates, but it is fairly counterproductive for experienced ones.

Last year I interviewed at a BigCorp which was all about these interviews. The first guy clearly told me that the project was sort of stuck and they needed to sprint to make it happen.

The interview was 99% math. The people who asked me questions all looked like smart alecs, and their questions all seemed designed to show me how intelligent they are, and how stupid I am.

By the end of the interview it was clear to me why they were stuck. Those people had absolutely no interest in building things. It was all about individual one-upmanship, and showing off the little math and puzzle tricks they had gathered from reading the internet over the years.

I guess it depends if you interpret:

> you should ask yourself why

As referring to "you can't answer the majority of the questions" or "you're working or intend to work as a software developer". I read it as the latter, but I guess it could also be read as the former.

The intention was "why don't you know these things", in the sense of "I really think you should go out and learn / re-learn these areas, because I think it's very likely that knowing them better will help you produce better code".

It depends on the country.

My degree in Portugal was a 5 year degree with lectures from computer science and software engineering.

"If you can't answer the majority of the questions on these four papers, and you're working or intend to work as a software developer, you should ask yourself why — most likely you're either missing something you really should know, or you're lucky enough to be working within a narrow area where your deficit doesn't matter."[1]

I'm a self-taught front end web developer who didn't have a traditional computer science education. I'm looking to move into general software development, and I can't answer any of the questions on the first paper without cheating.

Can anyone recommend specific online courses or books for the four proposed papers[2] to fill the gaps in my knowledge without spending three years at Oxford?

[1]: From the original post at http://www.daemonology.net/blog/2012-10-08-software-developm...


    1. Algorithms and Data Structures
    2. Computer Architecture and Operating Systems
    3. Mathematics
    4. Networking and Systems

I believe that

  Computer Systems: A Programmer's Perspective

is superior to the widely mandated

  Computer Architecture: A Quantitative Approach
for software engineers and programmers. The former has fewer hardcoded numbers than the latter and more timeless principles.

For what it's worth, Rosen's introductory text on discrete mathematics is pretty good as well. (This sort of math crops up all the time in formal CS, for those of you who might otherwise wonder wtf discrete has to do with anything.) http://www.amazon.com/s/ref=nb_sb_ss_i_0_13?url=search-alias... (The study guide that is available to go with it is worth the money, imho.)

The Graham, Knuth, Patashnik book on concrete mathematics (a bit of a play on words for continuous and discrete) is a quality work as well but written to a considerably higher level that may or may not be pleasant to contend with depending on the reader's background. http://www.amazon.com/Concrete-Mathematics-Foundation-Comput...

Thank you. I'll get reading!

Thanks also for proposing the test and posting the papers online. It's hard to discover what level of knowledge is assumed of software developers if you've never worked in a large team and haven't entered the field via graduate or postgraduate study.

Yes, although I didn't get around to mentioning it in my introductory blog post, "self-taught" developers are part of why I'm doing this.

> I haven't looked to see if every question is covered by these, but here's some standard textbooks I like:

The good thing is cperciva has suggested standard textbooks, as opposed to obscure texts unapproachable for a beginner.


I think a beginner might be confused by Sedgewick's terse coding style.

Computer Architecture and Operating Systems:

I haven't read either of the books. I would add http://bcs.wiley.com/he-bcs/Books?action=index&itemId=04... to the list. Like most OS books, you won't get a full picture of the OS from it, but you will learn the topics that will help you understand a real OS. Also, most non-recent books won't cover the flat memory model http://en.wikipedia.org/wiki/Flat_memory_model That is not to say knowing about segmentation is going to hurt you.


Mathematics:

I haven't read any of the books. From the TOC, the cryptography and statistics books look good. I would recommend skipping the cryptography maths for starters; if you are a beginner, an application-level understanding of cryptography is what you need.

Networking and Systems:

Tanenbaum's book loves history and theory. It's a nice book nevertheless. Once you are done with it, you should read Stevens' book on TCP/IP: http://www.kohala.com/start/tcpipiv1.html

If you find time, you should read the second volume as well: http://www.kohala.com/start/tcpipiv2.html

For databases, I read this book:


This is not a very interesting book, but it does cover the fundamentals very well. You will need a bit of will power to read it.

For algo/datastructures, CLRS (http://www.amazon.com/Introduction-Algorithms-Thomas-H-Corme...) is the gold standard.

For OS's, Tanenbaum (http://www.amazon.com/Modern-Operating-Systems-Andrew-Tanenb...) is popular.

'Math' is broad - if I can recommend only one book to cover all of Math I'd probably say 'The Road to Reality' (http://www.amazon.com/The-Road-Reality-Complete-Universe/dp/...). More practically (for the subset of math most programmers are likely to care about), you'll do fine with one good discrete math book and one linear algebra book. Throw in one each on Stats, Abstract Algebra, Calc (up to ~diffeq), and Real Analysis (in roughly that order) if you're a bit more ambitious ;-)

I just wanted to add that I felt Tanenbaum's Modern Operating Systems was a tad light on some of the core topics (especially when trying to get a good grip on the issues behind concurrency). I personally recommend the book by Silberschatz: http://www.amazon.com/Operating-System-Concepts-Abraham-Silb...

I also think for understanding how operating systems work, nothing beats writing your own! I learned most of the concepts by building a toy OS during the better part of my undergraduate studies. I highly recommend this for people who like coding and are afraid of jumping into the theory too quickly. For example, analyzing memory allocation algorithms is never as interesting as when you have to pick one for your own kernel!

Coursera has several algorithm classes. I took Design and Analysis of Algorithms I (https://www.coursera.org/course/algo) from Stanford last spring. I thought it was really good, reviewed it here: http://henrikwarne.com/2012/05/08/coursera-algorithms-course.... There is also another algorithm course from Princeton on Coursera.

Class central (http://www.class-central.com/) is a good resource for seeing what is about to start.

I'm also in the same boat and taking Udacity courses. I have noticed, however, as front-end development progresses and gets more complicated, that general CS knowledge will be of great use.

I took CS101 last month. The only new thing I learned was hashing. However, I think it was well worth it. Hashing is probably one of the most important concepts in CS. Also, the course was entertaining. It helped me review old code I wrote: I identified two cases where I can use hash functions to speed up code.
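For anyone wondering what that kind of speedup looks like, here is a minimal illustration with invented data: replacing a list membership test with a hash set turns an O(n*m) intersection into roughly O(n+m):

```python
def common_ids_slow(a, b):
    # O(len(a) * len(b)): each "in" scans the whole list b.
    return [x for x in a if x in b]

def common_ids_fast(a, b):
    # O(len(a) + len(b)): build a hash set once, then O(1) average lookups.
    b_set = set(b)
    return [x for x in a if x in b_set]

a = list(range(0, 10000, 2))
b = list(range(0, 10000, 3))
assert common_ids_slow(a, b) == common_ids_fast(a, b)  # same answer
```

Same answer either way; the difference only shows up as the inputs grow, which is exactly the big-O point.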

I'm taking the Software Testing, and Design of Computer Programs this month. If you are free (no job/freelancing/errands...), you can do each course in 7-10 days. Granted, you'll be fully committed to them. That is, you can* do these 3 CS years in just 3 months.

* depends on your mental abilities; but I think as a self-taught Front-End, you'll tackle it.

You can check the courses taught on coursera:

  - Algorithms part I
  - Algorithms, design and analysis. Part I
  - Computer Architecture
  - Calculus : single variable
  - pre-calculus
  - introduction to logic
  - introduction to computer networks

Highly recommend this free online course given by Robert Sedgewick from Princeton.


He covers all the questions in this, except the last one regarding graphs.

How do I access this course?

The website says: "Enrollment is Closed"

Try this link: https://www.coursera.org/course/algs4partI, if that doesn't work, just log in to coursera, then find Courses, then look for Sedgwick.

Damn, I didn't know I've been so lucky for the last 35 years. Or maybe life is different from exams?

I'm not going to pass judgement on this statement, but here is (what I'm fairly sure is) the truth: most people developing software (we're talking, not in Silicon Valley, or in the United States, but globally) need exactly none of this.

My current day job is working for the company that maintains and manages NZ's Company registry, as well as a dozen or so other registries over varying subjects.

My previous day jobs were in healthcare and various genres of insurance.

I have a BSc in CS. I have used, and this is the salient bit, absolutely nothing of this complexity in my day jobs. Ever. (personal projects are another matter, but I don't get paid for those)


- Big O notation has never been relevant: performance is always improved by doing less IO, rethinking data structures or more complicated SQL queries, throwing more metal at it and occasionally actual profiling which finds out we're doing stupid things (not that those stupid things are ever Big O related).

- Quicksort, who cares? I just do sorts in SQL or call Java's .sort() (which uses a tuned quicksort for primitives and a merge sort for objects anyway); see above for perf concerns. I don't have to know about it to use it.

- Heapsort, who cares? Again, sort performance has never been a concern.

- Never used graphs, the only "algorithm" I've ever had to professionally write was a Luhn check and I probably should have used a library for that anyway.

Again, I don't want to say whether or not this is a good or bad reality, but the point is, the vast majority of people writing code professionally are basically writing the same app over and over again:

- Build web page I can CRUD data with

- Store data from that web form in a database

- Modulo some bespoke business rules

- Integrate with some 3rd party systems.

That's it.

You're correct that most programmers don't have to write sort functions by hand but there are certainly a few issues.

Big O notation, or at least a Big O way of thinking, is definitely useful when doing things like optimising queries, data structures, etc.

For example, I've solved performance issues in the past with DBs that had poor indexing strategies (or no index at all). When I read the DB manual and it talks about different types of indexes and says "doing X will cause Y or doing Z will cause A", knowing something about Big O notation and related things means that I can immediately and intuitively understand what the key differences between Y and A would be.

Thinking about performance in terms of growth also helps me guess which parts of the application require the most optimisation. For example which parts have a roughly fixed n and which parts n will grow with time.
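That intuition can be made concrete with a toy model of what an index buys you (a sketch, not how a real database works internally):

```python
import bisect

def find_linear(rows, key):
    # What a full table scan does: O(n) comparisons.
    for i, row in enumerate(rows):
        if row == key:
            return i
    return -1

def find_indexed(sorted_rows, key):
    # What a B-tree-style index enables: O(log n) comparisons
    # (here simulated with binary search over a sorted list).
    i = bisect.bisect_left(sorted_rows, key)
    if i < len(sorted_rows) and sorted_rows[i] == key:
        return i
    return -1
```

For a million sorted rows, the indexed lookup needs about 20 comparisons while the scan may need a million; that gap is the "Y versus A" the manual is describing.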

Abstractions leak, especially when you are dealing with parallel processing, which will become increasingly important as time goes on. How you store and think about your data is going to have huge implications for just about everyone in software in the coming years, when having to scale sort operations etc. over hundreds of discrete CPU cores becomes the norm.

I think a degree in CS is not intended to teach you "what you will need to be a successful software developer today" but more "here are the tools to think about whatever the problems may be for software developers in 10 years time".

I agree that maybe there is going to be a gap between what is required for a CRUD implementer vs what is required for someone who designs DBMS systems for oracle and that employers want people qualified in the latter to perform the former.

OTOH I feel that this is something the free market will correct over time, and is already beginning to. I know a few programmers who are productively employed and not well versed in CS theory but are still paid well because they can bring other skills to the table.

I used to think so too, and yet...

- My current team created a project that lets you run complicated, multi-dependency Java code in a non-blocking manner. Internally, the framework uses topological sorting of a graph of nodes, where each node is a method with potentially blocking code and the dependencies between the nodes are the edges of the graph. The framework is able to set up callbacks automatically in the right order given the above.

- iOS6 auto-layouts internally use constraint solving on a set of linear equations so that you can write code to set the layout of your UI elements in terms of equations (so that you can basically just say that your button should be at 40px left of center of its superview, and with a 100px padding from the bottom).

- I wrote code to improve the accuracy of models at my previous job by employing various machine learning algorithms (thanks Andrew Ng and Coursera).

These might be isolated examples, but my point is that some of the best code I have seen uses a lot of math and CS concepts (sometimes in clever ways).
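The dependency-ordering trick in the first example is, at heart, a topological sort. A minimal sketch in Python using Kahn's algorithm (the task names and `deps` structure are invented for illustration):

```python
from collections import deque

def topo_order(deps):
    """deps: dict mapping each task to the tasks it depends on.
    Returns an order in which every task runs after its dependencies."""
    indegree = {t: len(d) for t, d in deps.items()}
    dependents = {t: [] for t in deps}
    for t, ds in deps.items():
        for d in ds:
            dependents[d].append(t)

    ready = deque(t for t, n in indegree.items() if n == 0)
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for nxt in dependents[t]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:   # all of nxt's dependencies satisfied
                ready.append(nxt)
    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order
```

A framework like the one described would fire each task's callback as it leaves the `ready` queue; the cycle check is what tells you the dependency graph is unsatisfiable.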

Forgive me from asking from ignorance, but isn't the first point begging the question of complicated, multi-dependency code?

The tricky thing about problems best solved by applying CS concepts is that they're never worded that way. By way of analogy, when I was in junior high or early high school (before I'd taken calculus, at any rate), I participated in a math contest. After the test I remarked, "That test didn't really have any calculus in it!" Only later, when I actually learned calculus, did I realize the calculus problems were never, "Apply the derivative and integral to this set of functions," but "These two quantities and their slopes are related to a third quantity by these rules. Write an equation for the third quantity."

Graph theory has also proven useful. I created a simple topological sort to ensure that objects in my dataflow-based visual programming system targeted at home automation are run in the correct order.

Heap operations? Bipartite graph? Implementing these is vanishingly rare. I'd wager no more than 0.1% of graduates who've seen these things ever have to maintain code that implements them, never mind writing them green field. And that comes from a guy who used to maintain a compiler.

You make a convincing case for your insularity IMHO :)

Now the didactic exercise of implementing a handful of data structures and algorithms like these from scratch has a lot of value. And knowing they exist, so you can look them up and find a library / implement if necessary, is also valuable. But knowing them 10+ years after college? The details are simply irrelevant; other things are more worth knowing off the top of your head.

It's not about being able to implement them, it's about understanding the implications. Yes, I can do arithmetic by reaching for a calculator, but if I had to use a calculator for every single blesséd piece of arithmetic, I would find it impossible to do any kind of significant algebra.

Having the basics immediately to hand, without having to look them up, is needed to move on to the next level. Knowing how to recognize a bipartite graph means that when there's one around you're likely to think: hmm, I wonder if that's bipartite? Recognizing the problem to be solved sometimes requires that you have a deep and fundamental understanding of the types of solutions that exist.

People seem constantly to say this - I can look it up on Google, why do I need to know this? I constantly solve problems my colleagues can't (and vice versa) because I have a deep familiarity with a range of techniques that they can look up any time they want, but don't recognize "in the wild."

There is a difference between knowing what a meal tastes like, and how to cook it.

What you are arguing for is knowledge of what meals taste like (e.g. two primary taste dimensions being space and time). I think this is really useful; so I agree with you.

But what Colin (and I mean OP) is asking for is knowledge of how to cook two specific meals, themselves rarely asked for.

I personally think CS education is mostly useless. Big-O notation and a survey of the field (a taste test of the meals, if you will) are all you really need to bring along with you from discrete math, in my very humble opinion.

Cooking some of the meals while learning will teach you the general skill of following a recipe; a few more applied challenges will develop your improvisation skills. But the more applied it gets, the less CS it is.

The field is too broad, IMHO, for deeper knowledge of only a handful of things to be very useful. I know a lot about hash tables, for example, but that's because they were very useful in my job, and critical for performance. It would be a waste of time for me to know as much as I know about them now, coming out of college, never mind 10+ years later. I wouldn't come down like a ton of bricks on a recent graduate for not being able to name 3 different collision resolution approaches off the top of his head, never mind someone 10+ years out; even for a job that required their use and impromptu implementation, as mine did.

Have you considered that it's teaching you a way to think outside of your normal range, so that you can comprehend and imagine more complex designs that are easier to build and maintain?

This is the interesting bit

"on each paper, at least four of the five questions relates directly to material I have needed to know during my time working on Tarsnap, "

Most software jobs are far more distant from CS than Colin's. And more power to him for working full-time on something that leverages his knowledge. (Spoken as someone who spent a decade doing glue-random-apis-together-and-bill-by-the-hour enterprise software scut work.)

Yet, at least in the European countries I've lived in, you won't get through HR without a CS degree, unless you're doing your own business.

I'm in the UK, I know people who are excellent developers who don't have degrees on any kind and who have no problems whatsoever getting employed.

Mind you, they have 10+ years of experience and can point to major successful projects, so whether or not they have a degree is rather irrelevant.

I'm in the UK too, as a CS student at university. I've got a GitHub account with around 10 projects, just side projects from when I had some time.

Even with that, and fairly high grades, it was still a slight challenge to get a placement. Said placement is just to give me an edge over fresh graduates: I'll have a year of industrial experience over them.

In Portugal, Switzerland and Germany, I never knew of the HR department accepting any candidate without a CS degree or one in a related field (e.g. Electronics), in the companies I worked for.

Personally I'd regard any company where the HR department gets to filter candidates on silly criteria like that as somewhere I'd never want to work.

I'm employed in Portugal without a (completed) CS degree, and I had more than one company to choose from. From what I could tell, the interviews were much more important than any lines on my CV.

I'm actually Portuguese. The only people I know who managed to do that were guys and girls from my degree doing something on the side for some startups during the .com days, back in the 90s.

Never saw that in the big companies, but it's been several years that I don't work there.

Well, the only "big" company I interviewed for was Sybase, but they were interested (although, I think they mentioned that my salary would be affected by that).

I chose a smaller software company, though. I'd much rather earn less but have a less enterprise-y work environment.

That's patent nonsense in Belgium as well. True, some companies do care, but I've often seen the opposite: they don't care at all what degree you have. They care whether you can pass _their_ test. Because obviously they're better at testing people than colleges are.

I think you've accurately identified why you could have those gaps. Not that you do have them, but hourly billing enterprise software "scut work" doesn't require a whole lot of CS. I wonder if the stuff on the other side of those APIs did?

'spent' == past tense. (and yes some of us have been doing this for a while ;) )

Working on large scale machine learning projects now. Couldn't be happier. And yes, I knew the answers to the questions without having to look them up. (though I suspect I'll get the Networking questions wrong.)

These questions have very little to do with software development as practiced by 99% of developers. They relate to computer science, a branch of mathematics with different concerns.

I've been developing software professionally for 23 years. I have a BSc in Computer Science [1] and an MSc in a computing-related field. And I have never needed to be able to answer questions like these to do my job. I have a vague understanding of big-O notation, and equally vague knowledge of the characteristics of various algorithms like heapsort, but that's all. If I need a sort algorithm then the framework or standard library provides a selection, all written by smarter people than me.

If you are developing a relational database or Google-scale distributed platform then, yes, you probably need to know this stuff. But most developers don't. I have to understand how to capture customer requirements, fit them into systems that already exist, estimate and implement them quickly and reliably, deal with the practical details of IIS or an RDBMS, maintain code, etc. I wonder if the author needs to get out more.

[1] Badly named, I learned little real comp sci.

I'm seeing a lot of apologists for mediocrity here. They seem to be drawn out by articles like this one as well as articles about interviewing.

Despite the sentiment here, Colin is right: this is knowledge that developers ought to have. Not because they need it to do their day job, but because they can't avoid learning these topics if they truly love the field.

Sure, you can earn a pretty decent paycheck hacking web apps and "enterprise" software without ever learning any of these things, and you may think, "This stuff is useless for most everyone." A friend of mine replied to that sentiment on Twitter: "If you manage a Taco Bell, I'm sure you feel a lot of the stuff in an MBA is useless for managers too." You can successfully bring in a paycheck programming without knowledge like this--just like you can successfully manage a Taco Bell without having an MBA--but you should not claim to excel in your field in either case.

Call me a snob if you will, but I don't want to work with mediocre programmers whose knowledge takes them only as far as the nearest web framework any more than I want to do business with the manager of my local Taco Bell. I want people with passion for our field, and those people can't avoid gaining the kind of knowledge that Colin's test asks about.

> Call me a snob if you will, but I don't want to work with mediocre programmers

> I want people with passion for our field, and those people can't avoid gaining the kind of knowledge that Colin's test asks about.

Claiming to own the definition of passion is not snobbish but conceited.

Perhaps I consider programmers mediocre if they lack design experience and cannot show me how to set up custom guides in Illustrator. "I want people with passion for our field." I don't simply want programmers who stop at the compiled code; I want people who understand design, typography and color theory. A programmer who cannot do these things is mediocre at best.

Or maybe it is linux that is the bar. It doesn't make any sense for a programmer to use windows, only real programmers use linux. They should be able to install/configure openssh, tomcat, nginx, redis, mysql-server. They need to know bash inside and out, be able to create permissions, user groups and have a deep and thorough understanding of the kernel. If you are a windows programmer then I don't have time for you at my company where all we do all day is build awesome.

Perhaps it is mathematics. Algebra, calculus, trig, geometry. Statistics, analytics... I hope you have your TI-83 ready for this interview. As only the most mediocre of programmers will barely be able to survive in this industry without a MS in Applied Mathematics.

Hire who you want, and base your interviews on whatever you want. Just don't presume to stand at the upper bar with mediocrity lying beneath you. That's all.

Your post is largely a straw man and contains snark that certainly does not raise the level of this conversation.

I'm talking about passion for programming. I'm not "claiming to own the definition of passion," I'm saying that passion for programming manifests in knowing the answers to questions like Colin's. It simply does.

> contains snark that certainly does not raise the level of this conversation

Calling people that don't agree with you mediocre or apologists of mediocrity is just rude.

I did nothing of the sort.

I agree with you.

However, there is a difference between having been taught something, knowing about it, and having the information available for instant recall. Take the B-tree question, for example. Three possible thought processes for this question (forgive me for my all-male cast):

Alan thinks, "Hmm. Well, they're both trees, but I don't really remember that difference. Maybe a BST is optimized for binary search, hence the name? This is a waste of time, I'll google it. Oh, of course, now I remember."

Bob thinks, "What!? What are trees!? Like outside? Oh no. Oh no oh no oh no. I remember what binary search is but how does it apply to a tree? Oh geez :("

Carl thinks, "A B-tree is a generalization of a BST, nodes in a B-tree can have more children than a BST"

Alan is probably a typical software developer with experience who might or might not have a degree but has been exposed to concepts. Bob is a college sophomore or someone who was never exposed to algorithms and doesn't have the background to even grapple with the question. If he googles for the answers, he'll be able to write them down but won't be able to relate them to other concepts or incorporate them into his mental model.

Carl just wrote a B-tree. Maybe Carl is a college senior, maybe a senior computer scientist at IBM.

In my opinion, you want people like both Alan and Carl. There are probably a lot more Alans than Carls, and hiring either is probably fine. There are probably a lot of Alans reading the OP; they probably won't be too upset about not knowing these off the top of their head, because they know they can refresh their memory on demand. Or maybe this is just my apology for my own mediocrity, because I had very few Carl-level responses to the OP!

The problem I see is differentiating Alan from Bob. It's harder. If you only accept the Carl responses to these questions, you'll draw only from a pool of smart people. If you start to allow Alan-level responses to be acceptable, you could also get a Bob. That would be bad.

As a mediocre programmer, I probably could have answered the questions at some point in time, but since they are not applicable to my day-to-day life, much of the information has faded away. This has prompted me to refresh my memory, but it is too late for this quiz, as per the requirements.

Assuming you've been in the business for a long time, and not just fresh CS graduate or something, what have you done to stay intimately familiar with the concepts that you do not utilize in any regular capacity?

I would say that a good developer must work 19 hours a day, have at least 20 projects on github and must be able to burn through tough deadlines every day.

I would say any developer worth his salt ought to have (as you would say it) all these skills.

I'm self taught and I'm in two minds about this. In one way, these are fundamental questions and, yeah, we should at least know why they're important. However, I can't help feel the binary nature of the questions is a bit like asking a Computer Scientist "Which flag do you set on an Android Intent to make sure an activity only exists at the top of a history stack?" straight off the bat... It favors those who have prepared specifically for the question.

In any case, I guess I have a bit of reading to do..

This is a less extreme way of saying that understanding Newtonian physics will make one a superior shooter in basketball.

Understanding CS theory is a way of testing intelligence, sure. But the author betrays his own lack of familiarity with the industry today in implying that it is required knowledge to be an effective developer.

@cperciva - you're right, I can't answer those off the top of my head like I could 15 years ago when I studied it.

I don't think that's a problem with the exam system though; I could probably have answered them 12 years ago. It's just related to the field you end up in and how much you need to lean on that knowledge in your day to day job.

As it stands, given a couple of hours of reading, I'd easily be able to refresh my knowledge and answer the questions, primarily because I had a solid formal training in it initially. Ability-wise, the me of my uni days could never do the job I'm doing now - well, not without another 15 years experience :)

As I wrote in my introductory post, I have needed to know most of these things in my day-to-day job. Of course, it depends on what sort of work you're doing.

I don't need to know these things in my day-to-day job either. However, if I did need to know them, they'd be a Google search away, and I could find them out in less than a minute (less than the time it takes me to get urllib2 to send cookies, for example).

On the other hand, I imagine that someone who hasn't been formally trained in complexity would have a harder time, because they wouldn't be aware of basic concepts such as complexity, the O notation, etc. Therefore, asking whether someone knows what O(n) means strikes me as a better performance test than trivia like the difference between O(2^n) and O(3^n).

The difference between O(2^n) and O(3^n) isn't trivia; if you can answer the question, then you have an understanding of what big-O means. Asking someone to recite the definition they learned of O(n) is easy; asking someone to think critically about the definition of big-O and how 2^n and 3^n relate shows a deeper understanding. I haven't memorized any trivia relating to these functions, but I convinced myself within a few seconds of the answer from the definition of big-O.
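
A quick numerical check of that reasoning (a toy demonstration, not a proof): if 3^n were in O(2^n), then the ratio 3^n / 2^n = 1.5^n would have to stay below some constant c, but it grows without bound.

```python
# If 3^n were in O(2^n), this ratio would be bounded by a constant.
# It is 1.5^n, which eventually exceeds any fixed c.
for n in (10, 20, 40, 80):
    print(n, (3 ** n) / (2 ** n))
```

So O(2^n) is a strict subset of O(3^n), which is exactly the kind of conclusion one reaches from the definition rather than from memorized trivia.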

asking whether someone knows what O(n) means strikes me as a better performance test than trivia like the difference between O(2^n) and O(3^n)

So far that's turning out to be one of the most useful questions, actually... but I wouldn't call it trivia.

Yeah, bad example there. The asymptotic running time of the heap operations, or quicksort runtimes, for example, though, were more like trivia, even though you can basically get them if you know the algorithms and reason about them a bit.

I know the running time of quicksort offhand, but I'd still consult wikipedia on the off chance I'm wrong.

> trivia like the difference between O(2^n) and O(3^n).

ignorance regarding basic complexity differences; this is the problem.

An interesting side project of this would be describing how the difference between O(2^n) and O(3^n) or knowing what the heap operations are have been needed in your job.

And, since I'm asking already, I'd also like to know why you needed to know the information "on-the-fly" and couldn't have gotten there almost as easily with a quick refresher.

For example, I don't remember all the heap operations by heart but, if I needed to select a data structure for a critical task, I would go fetch Cormen and Skiena (the books, not the people) and get a quick refresher.
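
For what it's worth, the refresher is often one import away. Python's heapq module, for instance, provides the classic min-heap operations, with push and pop in O(log n) and a peek at the minimum in O(1):

```python
import heapq

# Build a min-heap by pushing elements one at a time.
h = []
for x in [5, 1, 4, 2, 3]:
    heapq.heappush(h, x)   # O(log n) per push

print(h[0])              # 1: smallest element, read without removing it
print(heapq.heappop(h))  # 1: remove and return the smallest, O(log n)
print(heapq.heappop(h))  # 2: next smallest
```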

I have a CS degree and I am employed as an Application Developer.

To me these questions are daunting. Am I the only one here who feels like they have an inferior CS degree? Reading the comments in HN threads usually makes me feel a bit dumb, but this post in particular makes me feel like a lot of money has been thrown at a degree that did not teach me a lot about Computer Science.

These questions test the students' familiarity with the basic CS topics taught during the previous year. It's a test on the subject matter of the CS courses. If you don't use some of this stuff over the years, you will forget it. I forgot the properties of the bipartite graph and had to look them up.

This reminds me of those trigonometry proofs we learnt in high school. I remember I could go through those complicated trig proofs with ease back then. Now, besides some basic sine/cosine stuff needed for work, I have completely forgotten the rest, since there was never a chance to use any of them.

Basic course-level knowledge is good to learn; the test just needs to be relevant and in context.

A lot of these questions look great if you just finished school, but not so great 10 years later. There are an almost unlimited number of things I can focus on that will actually improve my performance on the job, and none of them are memorizing the workings of B-trees just in case I need to use one someday and the internet is down.

Computer Science degrees are about laying the foundations so that you can understand things like B-trees when you need them.

First, since I took this class just a year ago, I can answer all of these questions. Will I be able to do so in two or three years? Probably not. Some things like heapsort are already starting to fade from my mind. But if, even in 20 years, I just open Cormen et al. and take a brief look, I will remember.

If you have passed the class in university and now feel bad that you can't answer these questions, don't worry. It does not matter. If you want to work at Google, work through Cormen again before the interview.

Feel free to pick your favourite field like Compilers or Machine Learning and put a basic question to the author of the quiz, when you meet him in person. If he can't answer, pat yourself on the back and feel free to consider yourself a better software developer.

I love clever algorithms, and really enjoyed algorithm and data structure classes at university. However, despite having worked for over 20 years as a programmer in several software-intense companies (big and small) in the telecommunications and VoIP field, I have almost never had a need to know the answers to these questions.

In fact, it was the second biggest surprise for me when starting out as a programmer after university. I wrote this up in "Top 5 Surprises When Starting Out as a Software Developer" http://henrikwarne.com/2012/08/22/top-5-surprises-when-start...

Did you ever need to write an algorithm to process some large volume of data, where you had to make choices related to performance, e.g. to avoid a combinatorial explosion in some searching problem whose trivial implementation would be a full cartesian product of several dimensions of data (i.e. a bunch of deeply nested loops)?

Speaking of cartesian products, did you ever write a huge database query that you had to optimize through smart use of join methods, indexing structure, caching (e.g. materialized views), or decisions like subqueries versus joins versus procedural code?

Did you ever have to pick a library collection (not implement your own hashmap or tree, just select the ideal implementation) to allow optimal searches in a given dataset?

If answer='Y' to any of these, then you needed that knowledge, and I'd guess you need it on a regular basis. The thing about CS theory is that it structures and formalizes these problems so they can be precisely analyzed, used as a basis for further inventions and improvements, and relied on to predict the result of some design choice. You may have empirical knowledge of things like computational complexity, but formal knowledge often lets you replace hunches with objective certainty, or arrive at the optimal answer to some problems much more quickly and reliably.

At least that's the theory. ;-)

I absolutely agree that knowing basic CS is a good thing, and it will make you a better developer. Also, we do internalize a lot of the concepts so they become second nature. That being said, it surprised me how much code there is where this doesn't matter.

As for the specific questions: It's definitely good to know big O notation, so maybe I was a bit quick to dismiss the first question. However, a more realistic/useful comparison would be between O(n^2) and O(n log n).
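
A rough back-of-envelope count shows why that comparison matters (abstract operation counts, ignoring the constant factors real implementations have):

```python
import math

# Compare quadratic vs. linearithmic work at two input sizes.
for n in (1_000, 1_000_000):
    quadratic = n * n
    linearithmic = n * math.log2(n)
    print(f"n={n:>9,}: n^2={quadratic:.1e}  n log n={linearithmic:.1e}  "
          f"ratio={quadratic / linearithmic:,.0f}x")
```

At n = 1,000 the quadratic algorithm does roughly 100x more work; at n = 1,000,000, roughly 50,000x more, which is the difference between milliseconds and hours.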

For picking a collection to use – definitely knowing when to use a hash map versus a list. Asymptotic run-time behavior of Quicksort – no. Binary search tree – that's the one example I mentioned in my blog post, so yes, useful. Graphs – never needed in the applications I've worked with. Same for using a heap, never seen/needed.

Graphs are an interesting case in this debate. I agree that you very rarely run into real-world problems that look like a traveling salesman or bipartite test etc. But the thing about graphs is that they are the foundation of ALL data structures, and lots of algorithms. Lists, hashing, trees, arrays and more -- just special cases of graphs. If you can describe anything at all with a boxes-and-arrows diagram, it's a graph problem ;-) so if you know at least the core concepts of graphs, including some fundamental techniques (e.g. advantages of each representation such as adjacency matrix vs. node list vs. edge list), then you have the tools to work through any data structure problem; and any algorithm that's focused on searching or manipulation of specific data structures. And this can take you very, very far.
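
As a small illustration of the representation trade-off mentioned above (a made-up three-node graph; the matrix gives O(1) edge tests at O(V^2) space, the list gives O(V+E) space with O(degree) neighbour scans):

```python
# The same directed graph two ways: edges a->b, a->c, b->c.
nodes = ["a", "b", "c"]
index = {n: i for i, n in enumerate(nodes)}

# Adjacency list: compact, ideal for iterating over neighbours.
adj_list = {"a": ["b", "c"], "b": ["c"], "c": []}

# Adjacency matrix: constant-time edge lookup, quadratic space.
adj_matrix = [[0] * len(nodes) for _ in nodes]
for u, neighbours in adj_list.items():
    for v in neighbours:
        adj_matrix[index[u]][index[v]] = 1

print(adj_matrix[index["a"]][index["c"]])  # 1: edge a->c exists
print(adj_matrix[index["c"]][index["a"]])  # 0: no edge c->a
```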

Once again, of course you can learn most of these foundational concepts more empirically, or in a more "bottom-up" way by starting with the standard data structures and algorithms that you use in practice, but gradually deducing general concepts that unify them. But I wonder if this can ever result in the same deep insight that some study of graph theory can give.

These questions are way too basic: the first four can be answered directly from the definition of each term, with no thinking involved. The fifth can be answered with knowledge from one lesson in graph theory.

I'm glad I'm not the only person who thinks these questions should be easy. Alas, evidence so far is that they aren't.

They're only easy if you know those definitions, which makes them trivia questions.

The first two questions I knew the answer immediately because I've read about asymptotics and runtime analysis.

Questions 3 and 4 I simply don't have the background (I've never heard of a B-tree, and I've heard of heap sort but I don't know the algorithm).

Question 5 I'd not seen before, but an algorithm was quite obvious because I've taken a course on graph theory (of course, my answer could easily be wrong).

My tentative position is that these questions are easy if you've covered the relevant material before, and very hard to impossible if not.

Disclaimer 1: my degree was mathematics, not computer science, and I don't work as a software developer (though I do write code every day).

Disclaimer 2: one of the questions I "knew" the answer to, I got wrong!

So, my tentative position is that these questions are easy if you've covered the relevant material before, and very hard to impossible if not.

There's a lot of people who are very confidently giving wrong answers to questions, which weighs against that position.

That's interesting. I would expect you not to get many submissions from people who would expect themselves to do poorly on your exam. But, if it turns out that a sizable portion of people who expect to do well do not, that's indeed interesting.

Given the selection bias we would expect among respondents, have you thought about how you will interpret the responses?

Which is exactly the point why this type of question is useless. I will always reference other sources (Google, a book, a peer) if I am not sure, and I am not sure about anything that I am not using frequently. The real test is how many people would get it wrong using all the resources usually available in a working environment.

I guess I'll wait for my grading before saying any more :)

I wanted to ask this question regarding big O notation. When we say f(n) = O(g(n)), all we mean is that f(n) <= c*g(n) (with other constraints). My question is: why do we have an equals sign? Why is f(n) "equal to" O(g(n))? They could have made up a new symbol to establish such a relationship. The reason I ask is that I see equality as a transitive relationship: if a = b and c = b, then a = c. This clearly doesn't hold when we use f(n) = O(g(n)): n^2 = O(n^2) and n = O(n^2), so n = n^2, which doesn't make any sense.

Writing f(n) = O(g(n)) is actually bad notation perpetuated by lazy instructors. O(g(n)) is actually a set of functions, and the correct notation is f(n) ∊ O(g(n)).

(If the unicode breaks, that's the is-a-member-of symbol.)

I too was slightly unhappy with an equals sign being used like that, Wikipedia suggests that I was being a bit over-literal and it is considered acceptable:


why is f(n) equal to O(g(n)).. they could have made up a new symbol to establish such a relationship...

It is an equality relationship; why would they need a new symbol?

f(n) = O(g(n)) .. n^2 = O(n^2) and n = O(n^2), so n = n^2 which is not making any sense

You are right it doesn't make any sense. Try n^2 = O(Selection_sort(n)),O(Bubble_sort(n)) = O(Selection_sort(n)), n^2 = O(Bubble_sort(n)). Nice and transitive the way it should be.

It's not an equality relationship. If it was an equality relationship it would be transitive. But as your parent points out, you have n=O(n^2), n^2=O(n^2) but not n=n^2, so transitivity doesn't hold.

Another way to see that the relationship can't be equality: the left hand side is a function, the right hand side is a set of functions. Two things from different classes can't be equal.

The relationship is simply set membership.
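
Spelled out with the usual textbook definition (quantifier style varies between texts, but the idea is standard):

```latex
O(g) = \{\, f \mid \exists c > 0,\ \exists n_0,\ \forall n \ge n_0 :\ f(n) \le c \cdot g(n) \,\}
```

Both n and n^2 are members of the set O(n^2); shared membership in a set says nothing about the members being equal to each other.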

Ah thanks, I misunderstood which part was being called in to question.

Unfortunately this answer is incorrect and is a common misunderstanding of big-O. Selection sort is typically stated to be O(n^2), which is true, but the way that O works is that it's a bound from above. It means "the set of functions that asymptotically are guaranteed to not take more time than".

Selection sort is also O(n^3) and O(n!) and O(n^2 log log log log log n). The correct operator for saying that f(x) is O(n) is the set-membership operator ∊. O(n^2) is a set of functions that is a strict subset of the set of functions in O(n^3).

Selection sort is only big-theta(n^2), which indicates that both bounds are tight (e.g. for selection sort it is true that its asymptotic behavior satisfies both f(n) <= a*n^2 + c and f(n) >= b*n^2 + d for suitable constants).

It's originally because someone once had to work with ASCII. But I believe the reason it has stuck is because it's often used in contexts like f(n)=2^O(n), which is difficult to fix elegantly. There are a couple of other notations, but they've mostly fallen by the wayside.

I'm self taught and working as a developer. I found these questions almost trivial, I'm genuinely surprised so many developers are having problems with them.

Whilst I agree with you on being self-taught and knowing those answers I will say that I finally went and got an MSc in Computer Science after 12 years working professionally as a developer and being self-taught.

Part of the reason was to find out what I didn't know.

What I didn't know was basically some parts of set theory and quite a chunk of automata theory.

I did find cperciva's questions easy, but I wouldn't say that makes them trivial. As a TV quiz once put it: "It's easy when you know the answer."

Algorithms for someone else might be what automata theory was to me.

I agree, you'll only know the answer if you've taken the time to learn about the specific concepts being tested.

I suppose what I'm surprised about is that a large number of developers haven't taken the time to learn (or have since forgotten) the theory around these particular concepts.

Life isn't a perpetual data structures class.

I aced an Advanced Algorithms course at a major public university last semester, and I don't recall learning about B-trees. I usually have pretty good retention of things I learn, the rest of the "test" was easy enough, and I'd expect to at least think "that sounds familiar" about anything that had come up in a class. But even after I looked up B-trees (I grasped the concept fairly quickly), it didn't seem like anything I'd learned before.

I wonder if I legitimately just forgot about B-trees, or if my algorithms courses just didn't cover them.

Colin -- it will probably shock you to learn that I, ostensibly wielding an honours degree in computer science, failed algorithms.

(Remember our talk about impostor syndrome? In my case it's actually true.)

I have no educational training in computer science, but I work as a software developer. I could not answer any of those questions, literally 0 out of 5.

1. Is O(2^n) equal to O(3^n)? Why? I have absolutely no idea what that means.

But I did have to answer this question: why does the application crash? "Oh...that's because the developer that was pontificating yesterday about how heap operations behave asymptotically forgot to check if the database connection was open before he called the Save method. Apparently, such arcane trivia is beneath him. I can fix it."

> But I did have to answer this question: why does the application crash? "Oh...that's because the developer that was pontificating yesterday about how heap operations behave asymptotically forgot to check if the database connection was open before he called the Save method. Apparently, such arcane trivia is beneath him. I can fix it."

False dichotomy. You posit that the developer who understands asymptotic notation can't figure out why the application crashes, or how to write to databases. I don't see how understanding asymptotic notation negatively affects someone's programming ability.

> I don't see how understanding asymptotic notation negatively affects someone's programming ability

Why would it make them worse?* I was just arguing it doesn't necessarily make them better. Just because you are a bad developer doesn't mean you get better when you learn about asymptotic behavior or heapsort or O-notation. Lots of really bad developers know about those things. And lots of really bad developers don't know about them as well.

Knowledge of those items is neither sufficient nor necessary to be either a competent or incompetent developer (at least for reasonable definitions of both competent and developer).

* Ok - maybe too much knowledge has some odd 2nd order negative effects on productivity: analysis paralysis, over-confidence, unwillingness to ask for help, programming toward a theoretically pure solution rather than a pragmatic solution.

Of all the questions, this is the one you MUST know if you want me (or anyone) to trust you with a piece of code. It takes one hour (on Wikipedia!) to learn everything you'll need for day-to-day complexity assessment with Big O notation.

They asked us these questions on our high school final exams; I'm sure you'll manage.

Yeah, I mean, I'm also a self-taught, workaday programmer, and I agree with you. Understanding time complexity to a degree that you can communicate about it isn't something to shrug off. Don't be proud of ignorance.

The problem is that the perspective expressed by the GP comment is a pure expression of the "don't know what you don't know" adage. Sure, you can hammer out solutions to problems, but the more exposure you've got to more concepts in your field, 1. the bigger your toolkit and 2. the quicker you can start drawing conclusions about the problem.

I'm actually pretty shocked someone's confessing to not knowing what big O notation is and proud of it.

> O(2^n) equal to O(3^n)

Yep. So I looked into it, and it seems they are not equal. Every f(n) in the set O(2^n) belongs to O(3^n), so O(2^n) is a subset of O(3^n). But we can see from the trivial case f(n)=3^n that O(3^n) is not a subset of O(2^n), since 3^n > 2^n for all n as n approaches infinity; that is, there is no k such that there exists an n0 with 3^n < k*2^n for all n > n0. Therefore the two sets are not equal.

Great. I still don't see how I am more qualified to do absolutely anything in life.

> we can see from the trivial case f(n)=3^n that O(3^n) is not a subset O(2^n) since 3^n > 2^n for all n as n approaches infinity, or there is no k such that there exists an n0, such that all n > n0 implies 3^n < k*2^n, therefore the two sets are not equal.

For what it's worth, this isn't really acceptable as a proof. The first half of your argument would go through equally well for 3n and 2n in place of 3^n and 2^n even though O(2n) = O(3n). The second half is simply a re-statement of what it means to say that 3^n is not in O(2^n) and doesn't actually prove it. The pertinent fact is that there is no constant k such that k >= (3/2)^n for all sufficiently large n because (3/2)^n goes to infinity as n goes to infinity. Compare this to the situation with 2n and 3n where (3n/2n) = 3/2 is a constant and hence bounded. A less trivial example would be something like n^2 + n versus n^2 where the ratio 1 + 1/n isn't constant but is nevertheless bounded for n >= 1.
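The bounded-vs-unbounded ratio point is easy to see numerically. A quick Python illustration (not a proof, of course): (3/2)^n grows without bound, while 3n/2n stays at the constant 3/2.

```python
# Ratio 3^n / 2^n = (3/2)^n is unbounded, so no constant k can ever
# satisfy 3^n <= k * 2^n for all large n. Contrast with 3n / 2n, which
# is the constant 1.5, which is why O(2n) = O(3n) but O(2^n) != O(3^n).

for n in [1, 10, 50, 100]:
    exp_ratio = (3 / 2) ** n        # blows up as n grows
    lin_ratio = (3 * n) / (2 * n)   # always exactly 1.5
    print(n, exp_ratio, lin_ratio)
```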

That's a great textbook summary, now for applications:

You got an array of your friends, an array of people who upvoted this post and an array of people who replied to this post. Filter all your friends who upvoted and replied to this post.

Now, before you write the code, how fast/slow will it run? for 1,000 friends? 1,000,000? will the runtime grow extremely fast? why?

Here's a Redis command that intersects two keys - http://redis.io/commands/sinter - the complexity is "O(N*M) worst case where N is the cardinality of the smallest set and M is the number of sets." - do you understand why? do you understand when it will be fast and when will it be slow?
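A rough sketch of why that intersection is O(N*M): walk the smallest set (N elements) and test each element for membership in the other sets (M checks, each O(1) with a hash-backed set). This is illustrative Python, not Redis's actual C implementation:

```python
# Intersect several sets by walking the smallest one and checking each
# element against the others. N iterations times M membership tests.

def intersect(*sets):
    smallest = min(sets, key=len)                 # start from the smallest set
    others = [s for s in sets if s is not smallest]
    return {x for x in smallest                   # N iterations...
            if all(x in s for s in others)}       # ...times M O(1) lookups

friends = {"ann", "bob", "cat"}
upvoted = {"bob", "cat", "dan"}
replied = {"cat", "eve"}
print(intersect(friends, upvoted, replied))  # {'cat'}
```

Starting from the smallest set is why it's fast when that set is small or empty: the loop body simply runs fewer times.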

I'll give this a shot.

The code I would write would take longer the more friends you have. Basically, it would grow linearly with the size of all the sets and the number of intersections found. Something like O(n * M) in the worst case. But I don't get why later you say that n is the size of the smallest set.

So I don't see how you can beat that. I guess you could sort the lists, but that takes O(n log n) over M lists, and then lookups would take O(log n).

To me it will be fastest when the algorithm terminates the quickest: the smallest set contains nothing, the second-smallest set contains no intersecting keys, anything like that. It will be slowest in the worst case (each set is a proper subset of the next), and progressively worse as the subsequent sets get larger.

As for that algorithm running in O(N * M): I wasn't sure I understood it at first, but I think I do now. If you hold all other set sizes constant and vary only the size of the smallest set or the number of sets, the running time follows O(N * M). If instead you increase the size of the largest set by a factor of k, the algorithm will take k times as long to run (in the worst case); but if the running-time function f(N,M) belongs to O(N * M), then k * f(N,M) belongs to O(N * M) as well, so the O notation still represents the running-time complexity even if you increase the size of the largest set.

So the running time will still increase with the largest set, but the O-notation of the algorithm will still belong to the same representation.

With extra memory, and if you don't care about ordering, multiple-set intersection can be made O(N), where N = the sizes of all the sets added together, assuming no duplicates within any single set (which should be true, but it depends on how loosely you are using the word "set"). You only have to walk each element exactly once, and all other operations are (amortized) O(1).

For each element in all sets: Insert element into hash table. If it does not exist, use key = element, value = 1. If it exists, increment counter. This is O(1)

When done over all sets, it's O(N)

Walk hash table, the only elements that are in the intersection are those with value = number of sets. This is O(N)

O(N) + O(N) = O(N). This only works if there are no duplicates within a single set. It transforms set intersection into a counting problem over the universe of elements.

You can optimize it as well. You never need to deal with elements that don't exist in the hash table after the first set is processed, as they can never be in the intersection. This does not change the asymptotic complexity, except it means the hashtable will never resize after the first set. So it's advantageous to choose the smallest set to process first if you can do so easily. If you use dynamic perfect hashing or something similar, you can also guarantee all operations after the first set will be O(1), rather than amortized O(1).
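The counting approach described above might look like this in Python (illustrative names, and assuming each input set is duplicate-free, as noted):

```python
from collections import Counter

# Count how many sets each element appears in; the intersection is
# exactly the elements seen in all of them. One O(1) hash update per
# element gives O(N) total, N = combined size of all sets.

def intersect_by_counting(sets):
    counts = Counter()
    for s in sets:
        counts.update(s)                        # amortized O(1) per element
    k = len(sets)
    return {x for x, c in counts.items() if c == k}

print(intersect_by_counting([{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]))  # {3}
```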

That ("how would you implement ruby array intersection operator & and what computational complexity does your implementation have") was actually one of my interview warmup questions. In ruby core it's done exactly like you described, using a hash table, but I didn't know that until after the interview. I proposed a slightly worse solution, with sorting both inputs and then walking them simultaneously, which was O(n*log(n)). Still got the job, though :)
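The sort-and-walk alternative described above might be sketched like this in Python (a rough illustration, not Ruby core's implementation): sort both inputs, then advance two pointers in lockstep, which is O(n log n) for the sorts plus an O(n) merge walk.

```python
# Intersect two lists by sorting both and walking them simultaneously,
# advancing whichever pointer sits on the smaller element.

def intersect_sorted(a, b):
    a, b = sorted(a), sorted(b)   # O(n log n) dominates
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

print(intersect_sorted([3, 1, 2], [2, 4, 3]))  # [2, 3]
```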

As a side note, I think that tasks like "implement a data structure with the following operations and calculate the computational complexity of each operation" are much better for an exam than trivia questions like "Name the heap operations used by heapsort and the asymptotic running time of each.", with all due respect to @cperciva.

That's a very interesting solution, I would have never thought to try that.

Hmm, does all this imply that a blue-collar IT-workers' class is rising? When I was working during the summers at construction sites in my youth, I noted that everybody all the time complained about the impractical decisions the engineers and architects had made, and grumbling that they had to be there to fix them.

Would they have been happier if they wouldn't have been needed at all?

(disclaimer: I tend to change my shirt between white-collar and blue-collar [figuratively, of course] where I work)

(Shameless plug) I just wrote about this very thing on Friday: http://mattdeboard.net/2012/10/05/larry-the-software-guy/

Not my best writing ever but that round's already downrange. Plus there's an excellent response from Alex Feinberg in the comments.

Interesting analogy. I think it more strongly implies that for the type of applications most frequently written, general intelligence and experience trump theoretical knowledge. Honestly the education you get from a BS in comp sci (the level of knowledge the post addresses) isn't going to qualify you for much more than the guy who has 4 years of experience on his résumé. Much less in some circumstances actually. This is a strange field.

Meanwhile, you discover a long time down the road that the algorithm you thought was fine is actually incredibly slow when scaled to large inputs because you didn't have a good understanding of runtime analysis.

Or maybe, as you say, understanding more things makes you worse at coding. Who knows.

I guess I'm not buying that Big-O is a necessary requirement to understand how fast or slow an algorithm will be. It could be that someone understands and uses the process behind it without being at all familiar with the notation.

Some things you can look up quickly. Being functionally innumerate is not something you can fix with Google.

The tedious details of whatever god-forsaken CRUD framework is flavour-of-the-week with the cool kids probably WERE beneath him.

I think that question 1 is not well formulated (unless that is the answer he expects). Equality makes only very limited sense for asymptotic notation.

It is perfectly well formulated. The notation O(f(n)) denotes a set of functions. The question is asking if the two sets O(2^n) and O(3^n) are the same set.

That's right. Some people get confused due to the rather unfortunate convention of writing f(n) = O(n) instead of the correct f(n) ∈ O(n).

I think the OP is looking for a bit more of an answer than a simple Yes/No.

He is also asking "Why?", i.e., for a proof[1] to back up your Yes/No answer. Seems like a perfectly good question.

[1] Calculation, if you prefer.

It's a perfectly sensible question. Knowing that both O(2^n) and O(3^n) are sets, and using the definition of set equality, it asks whether "f in O(2^n)" implies "f in O(3^n)", and vice versa. Only one of the implications holds...

I see a common argument in the comments here. Many of us are saying this is "Computer Science" and not "Software Development". In my opinion, knowledge of the answers to these questions will make you a better software developer. Of course, without knowing any of these, you can still develop software that will work. But if you know the answers to these questions (and many other similar computer science questions), you can make smart decisions about how to implement the software you are developing more efficiently.

I knew the answers to all of these questions two years ago when I had just graduated, but I've completely forgotten what heapsort is (despite having used it a few times in programming contests) and cannot offhand remember what "bipartite" means (googled, saw the wikipedia image and instantly remembered maxflow and Ford–Fulkerson although I guess a DFS is good enough to determine bipartite-ness).

Why is the ability to retain this information at the top of my mind at all times necessary for me to be a Software Developer?
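For reference, the DFS/BFS two-colouring test alluded to above fits in a few lines of Python (adjacency-list representation assumed, names illustrative):

```python
from collections import deque

# BFS the graph, alternating colours across each edge. Finding an edge
# whose endpoints share a colour means there is an odd cycle, so the
# graph is not bipartite. Runs in O(V + E).

def is_bipartite(adj):
    colour = {}
    for start in adj:                      # handle disconnected components
        if start in colour:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in colour:
                    colour[v] = 1 - colour[u]
                    queue.append(v)
                elif colour[v] == colour[u]:
                    return False
    return True

print(is_bipartite({0: [1], 1: [0, 2], 2: [1]}))        # True  (a path)
print(is_bipartite({0: [1, 2], 1: [0, 2], 2: [0, 1]}))  # False (a triangle)
```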

If it was a simple form that sent an email you'd probably get 100x more results. People are lazy.

Given the rate at which I'm getting emails, I don't want 100x more results!

How many have you got already?

16 in the first 1.5 hours.

Given that you're only getting dozens of replies, why are you asking for institution? The only useful pattern I can think of would be something like correlating against university ranking, but that doesn't sound too useful.

I was thinking I'd start with a "places most of us have heard of" (Oxford, Cambridge, Harvard, MIT, Stanford, etc.) vs. "everywhere else" comparison. I'm at somewhere over 100 replies at this point, so I imagine I'll have enough data for that sort of comparison to be meaningful.

Carl Sagan said "If you wish to make an apple pie from scratch, you must first invent the universe."

I'm a web developer. I use tools, languages, and frameworks built on many Computer Science concepts that I don't fully understand, but I would sorely hate to have to reinvent those wheels every time I need to create a new application, or to even need to understand their most intricate inner workings despite the fact that they have been abstracted away for my convenience.

Perhaps this makes me "mediocre" in the eyes of some, but I get my work done and have never had any of my managers or customers tell me that what I have delivered is "mediocre" in their eyes.

Several times I've come up against an issue that requires me to push the limits of my skills and knowledge, and I do not shy away from learning, but it would do me little good to learn many of the things that are taught in a typical CS course. I consider myself passionate about my work, and I strive to learn new things every day, but I don't consider myself to be at a disadvantage, nor do I find my lack of knowledge in those areas to be a deficiency.

I am proud of my accomplishments, and of the course I have taken in my career. Perhaps my skills would be of little use at Tarsnap, but I have no shortage of work at any number of other companies.

I remember reading a thread someplace (I think here or at reddit) that criticized the traditional software developer interview process. One person suggested that the reason a lot of companies ask more...bookish knowledge that seems like it's from college homework or a final exam is that they don't want to admit that what they do is relatively mundane, and doesn't really require the daily application of the kind of thing you find on your data structures or operating systems final exam. So they add complexity to their interview questions, as well as their development process, to compensate.

Does anyone know if traditional, "real" engineering fields, (civil, chemical, mechanical, etc) ask detailed, technical kinds of questions during interviews, like software engineering is known for? I find it hard to believe civil engineers get bombarded with all day grill fests by other civil engineers about technical questions in interviews, but I really don't know.

This is so depressing. I just finished a degree in "Information Systems" with the hopes of being a developer full-time and I can't answer any of these questions.

I guess I should be mad at the institution, but that wouldn't be productive. Where do I start now to start learning what I actually need to know? It seems like there is so much I don't know.

Information Systems (or IT, or MIS, or CIS) is not Computer Science. I made the same mistake: I was a lazy student and didn't want to do a lot of math homework, so I picked IT as it sounded a lot more 'practical' and 'hands on'.

I've slowly learned basic CS concepts since then on my own and working with others, and either some day I'll have time to fill in the gaps or I'll just continue to work in the "narrow area where [my] deficit doesn't matter."

I think the author has it backwards though, -he- is the one that works in a narrow area where the deficit does matter.

This is a very good book: http://www.amazon.com/Introduction-Algorithms-Third-Edition-...

I will say though, it's really intense on theory, and it requires a good background in math. A lot of people will recommend this book though. Also, it's physically heavy :)

That book is considered the 'Bible' of this area but I wouldn't recommend it to anyone who is looking to get started. It's like reading the dictionary.

I feel like this test would be better if it were reasonably timed, and you were allowed to look up the answers. That might be a little bit closer to what I actually do (especially when I don't know what something is). Calling me lucky to get by without knowing offhand the definition of a B-Tree, is kind of rude and also very wrong.

This is a problem in our industry. These are not "Software Development" questions. These are "Computer Science" questions.

But you should Google afterwards so you can feel like a right ponce for confusing Big-Theta and Big-O.

Too bad the author does not care to ask about the more important parts of developing software but indulges in questions a web crawler could answer. Creating elitism (by offensively downgrading any developer who cannot answer these questions or simply has to look up an answer) benefits no one. Developing software is so much more than the details of an algorithm, an operating system, or a network topology. It is about needs, requirements, users, customers, support, processes, technology, and the list goes on and on. Simply put, he is not asking questions for software developers but for programmers.

That isn't an exercise in elitism. It is an extremely easy test for anyone who is actually trained in computer science.

If you can't answer the questions, then you should try to fill the gaps in your knowledge rather than whining that it's elitist.

A little off topic but this domain hosts a hacker news daily updated archive located here: http://www.daemonology.net/hn-daily/

Thanks cperciva for hosting it!

I've settled for http://hckrnews.com/ as the HN filtering site -- you can switch between Top 10 / Top 20 and for extreme procrastination, "Top 50%". With a Chrome extensions you can also track # of comments since you last time read them, for the stories with interesting discussions.

The idea is good, but the questions not so much. First off, the number of questions is too small for a subject as large as algorithms and data structures; I'd rather have something in the 25-40 range. But that, of course, is for objective questions whose answer is a one-line sentence max (not requiring writing code or calculation).

And this approach is the second problem. A few questions of this kind are good; for one, I like Q.1 (trivial if you know the subject, puzzling if you don't, as the proposition is just nonsensical as a mathematical problem). But some other questions are tests of memorization, sometimes of not very relevant information. For one thing, you ask to "name" heapsort operations. I have long forgotten those names, BUT I know the fundamental ideas of heaps and heapsort, and writing the full heapsort (even on paper) is a simple coding exercise... but I still wouldn't remember those names.

Also, I would spend significant time writing this code (and also deducing the complexity if I don't know it by heart), compared to somebody who just crammed through textbooks. (And yes, it is possible to memorize this kind of information in a few days of cramming; if those final exams have schedules known well in advance and they are spread at least 3-4 days from each other, this pretty much cancels the advantages you mention.)

You seem to be dismissive of "practical programming ability"; I agree that code as an answer to written questions has issues -- coding problems are better in a whiteboard test, interacting with the examiner, or with careful restrictions of language etc. But it seems wrong to have questions that penalize (due to time restrictions) people who can deduce the answer with some coding or calculation, and reward people who just memorized the subject... for one thing, this memory won't last forever, unless you become a professor and keep "practicing the theory" on a daily basis. But more importantly, the ability to deduce the answer is very revealing of somebody's knowledge of the really fundamental concepts. Show me a CS graduate who knows the bipartite graph algorithm from memory, and this may be just some guy who has study discipline and a good memory but is clueless about the core ideas of graphs, and will have forgotten everything a couple of years after graduation. Now take somebody who can sketch the code for bipartite testing, even when given a problem that does not use the word "bipartite" or even the word "graph" -- so the student needs to identify the proposed problem as a specific graph problem, and then deduce the algorithm in order to solve it -- and that's somebody I would hire, either for a software engineer position or for a professor position.

Software engineering and/or programming is a vast, vast field. Anything you can express logically, someone somewhere is expressing in software. This "or you're lucky enough to be working within a narrow area where your deficit doesn't matter" should read "or you're lucky enough to be working outside this narrow area so your deficit doesn't matter".

Note: What you're seeing today is just 1/4 of the test. The other three parts cover different areas.

Question for the author: are these things that YOU knew before working on the projects that required them?

Pretty sure someone with a PhD in Computer Science, like Dr. Colin Percival (the author), has to know that stuff. Even a lowly undergraduate like I was had to learn it by the second year.

I was writing software professionally in high school. Second year undergrad seems a little late in life to be learning this stuff, if it is critically important.


Why do people who like to ask O(N) questions have this obsession with sorting algorithms?

Because in the 1960s, sorting records was what computers spent most of their time doing.

At this point, it's mostly a "because that's how we've always introduced algorithms and complexity" thing.

It's still what computers often spend a lot of time doing. What do you think your database engine is doing most of the time? Those indexed columns aren't just there to smile warmly upon your data.

I wonder about the utility of an exam that one can ace with five minutes of googling.

I'm not going to be handing out degrees based on these. If you mean the utility of the specific questions, the point is that knowing these things is indicative of your general level of knowledge in certain areas.

>knowing these things is indicative of your general level of knowledge in certain areas.

Very possibly. But is knowing the same as remembering? Compare remembering the specific value of quicksort's worst case to knowing that some sort algorithms are better than others when, say, the input is likely to already be sorted, and that you should go crack open CLRS if you don't recall which.

If you can't remember quicksort's average vs. worst cases, odds are that the behaviour of sorting algorithms isn't sufficiently top-of-mind that you'll say "wait, this data might already be sorted! I might need to use a different algorithm here".

Quicksort is usually randomized, so it is not a matter of the order of the elements; it is a matter of very bad luck in selecting your pivots. So even though worst-case performance is very bad, it is very unlikely. But you have to understand why.

You probably will never implement quicksort, but it is useful to know how it works, as you may be able to use some of the ideas in your own algorithms. Quicksort is also beautiful, so it can be a good inspiration for writing good software.
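A minimal sketch of the randomised pivot choice being described (illustrative Python, not a production sort): because the pivot is drawn uniformly at random, the O(n^2) worst case depends on unlucky pivot draws rather than on any particular input ordering.

```python
import random

# Randomised quicksort: pick a pivot at random, partition around it,
# recurse on each side. Expected O(n log n); hitting the O(n^2) worst
# case requires consistently unlucky pivots, not a sorted input.

def quicksort(xs):
    if len(xs) <= 1:
        return xs
    pivot = random.choice(xs)
    less    = [x for x in xs if x < pivot]
    equal   = [x for x in xs if x == pivot]
    greater = [x for x in xs if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```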

> I wonder about the utility of an exam that one can ace with five minutes of googling.

I won't debate the utility of the exam, but the majority of the tests I have taken can be aced with five minutes of googling.

Likewise for myself. Though I suspect that it's because testing for memory is easier than testing for knowledge, especially when one can simply assume that the former accomplishes the latter.

Because any professional worth of its salt is supposed to be able to work with googling.

But the author specifically asked people not to "cheat".

Typo. I meant to say "without"

I use these kind of questions in interviews. You either know the answers or you don't. There's no looking at your 'phone in an interview (at least not one with me being the interviewer).

Why is the software development interview so radically different from the kinds of conditions a developer actually works under? I mean, I can't imagine someone on the job struggling with code, but only being able to look at a printout of it rather than in an ide/debugger, unable to look at a book, the internet, or ask someone for advice, and having a strict 30 minute or so time limit to figure out their problem.

Fair enough. I take a different approach in my interviews: try to gauge how useful the person will be to the team. We have internet access, and a library with CLRS if someone needs to implement a bipartite graph test. It matters more to me that they'd know why they'd need one.

We use those kinds of approaches too. I never said the interview consists solely of closed book tests, they're just one thing I find useful to help gauge the suitability of an interviewee.

[EDIT] The thing I'm most looking out for is how a candidate answers when they don't know the right answer.

I had to omit proofs to complete in 15 min.

This will obviously not be a representative sample of working programmers (or even readers of this post). People who know they don't know are less likely to respond.

I think I could do the first and the fourth (economist by day, coder by night). Answers to the others elude me: good tutorials and relative importance to coding would be greatly appreciated!

That looks like a "Computer Science" final exam. Software dev != CS.

It's a "Computer Science software developers should know" exam.

I have a question: these are pretty rudimentary questions, and if you have a background in the literature you could brush up on most of this stuff in an hour or two (aside from memorizing the catalogue of data structures and algorithms you might find in a textbook, which makes me wonder why he's focusing on these questions instead of an understanding of induction, space analysis for an arbitrary data structure, or general Big-O/Big-Omega analysis). So why don't we actually come across uses for that sort of basic stuff all that often?

I mean, I don't know if what I do is all that specialist, but I spend a lot more time working out far less interesting things about how domain knowledge works than what sort of Data Structure or Algorithm is best for storing it and using it.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact