Where by "narrow" one presumably means "narrow in theoretical scope, but extremely large in terms of number of people, number of customers, number of paid hours spent working on things, and amount of impact on the world". Most of the web was built by people who don't know what B-trees are.
How do these people succeed as much as they do? Because the folks who designed PostgreSQL do understand what B-trees are, and they have been very successful at their goal: To encapsulate that knowledge in an abstracted system that other people can learn to drive without entirely understanding how it works.
This is not a problem with the web developers, nor is it a problem with this exam, which at first glance looks reasonable – people with CS degrees should be able to answer these questions. It is, perhaps, a problem with the concept of "software developer": It's being asked to cover too big a range. If the automotive industry were like computing, we'd use the phrase "auto mechanic" to refer to an amateur with a Haynes guide, a Midas Muffler employee, a member of a NASCAR pit crew, a Formula One driver, a mechanical engineer at Tesla Motors, an organic chemist working at a tire company, and Kiichiro Toyoda.
By number of sites, perhaps. By revenue, I don't think so. The big names that dominated what the web is to most people certainly do know this stuff, at least insofar as the practical implications, and they try not to hire people that don't for development roles. That isn't to say that they don't also just use Postgres (or Berkeley DB, or whatever) instead of writing their own.
If the automotive industry were like computing, we'd use the phrase "auto mechanic" to refer to...
I totally agree with this - the field could use a few more job titles (and a little less grade inflation - SENIOR software engineer on 2 years experience? Really?) Software Fitter would be a much better description for the skilled assembly role that a lot of modern software boils down to. CRUD Technician is going to to need some PR work though!
>>Software Fitter would be a much better description for the skilled assembly role that a lot of modern software boils down to. CRUD Technician is going to to need some PR work though!
I don't know why intelligent people with a little knowledge automatically think they are going to be rich. This causes a lot of heart burn and pain to people in later parts of their lives when they figure out things don't work that way in the real world.
Regardless of whatever you may think about them. You may even think them undeserving of all the money they earn, you may think they are stupid. The fact is abstraction enables people to work with a lot of things without knowing much about them in detail. In fact abstractions are invented for that very purpose. So that a million people shouldn't waste their time, energy and effort to relearn things, when all their resources can be better utilized to know just the abstract details and build things on top of it. This enables people to build things quickly on what is already available.
Revenues, money et al are dictated by what the world considers valuable at any time. You might know 1000 algorithms from the book, you might have studied every other data structure that's out there. But if knowing how to write a SQL query is what is valuable in the economy right now, guys doing that are likely to be paid more than you.
This isn't surprising at all. As developers we enjoy a unique position in the market. As for the money we get, compare this with any other industry, our peers working there get paid well less. How is is possible that people who work at the same level of difficult as we do get paid less than us? Because- merely knowing things and difficulty of tasks doesn't qualify for a better compensation.
On the other hand, something I'd expect to see on the final exam - and is already clearly absent based on the outline - is compiler design. If most of the world is doing web development and doing a lot of "slopping around of data between formats", parsing becomes important. Type systems become important. Code generation _may_ be important. And here, again, they don't have to be in-depth questions about the differences between LALR and packrat parsers, or describing the eight sides of the lambda cube. Simply demonstrating awareness of doing something other than mashing out a bad regex for every parsing problem would be an advance over the status quo.
The author believes this test is very basic, and to follow your auto mechanic metaphor it would be like asking anyone from a Midas Muffler employee to Kiichiro Toyoda the difference between disc and drum brakes.
(Not that I agree with the author - to me, "basic knowledge" would be "What is big O notation?")
Your explanation is linked in the top of the test, but based on many of the comments here I don't think many people read it. It helped me put the test into perspective.
I would like to believe that if I came across a problem where knowing this material would help solve the problem, some part of my brain would activate and put me in the position of "Oh, I know I don't know this anymore" and I'd go look it up. Vs "not knowing what I don't know" and just bumbling along ignorantly.
I guess this is a long-winded way of saying I'm glad to have a CS degree and I think it puts me a leg up over software developers who haven't studied CS, but I'm still not sure how applicable these questions are to most developers.
Edit: On most exams I've taken, the fifth question would have been asked like this: A bipartite graph is a graph whose vertices can be divided into two disjoint sets U and V such that every edge connects a vertex in U to one in V; i.e. U and V are each independent sets. Equivalently, a bipartite graph is a graph that does not contain any odd-length cycles. Describe an algorithm that determines if a graph is bipartite.
I find all the uses of bipartite graphs to be in maximal matching or similar situations.
These kind of questions should not be the only basis of the software development final exam. In software development problem solving, communication with others, knowledge of the tools etc. are much more important than knowing random knowledge about algorithms. And these skills are much harder to test than asking the worst-case run time of quicksort.
The difference between me and someone who is unfamiliar with computer science (but is still good with Google) is that I can read the answers and their corresponding articles and understand what they mean, as opposed to the other person, who would just be able to recite the answers as trivia.
Seriously do people remember all this math all the years they are building software? Unless you spend a great deal of time revisiting these concepts every other month, there is no way this is all going to be in your head. And if you actually are spending that amount of time learning this kind of math, I would wonder what kind of a programmer you are. A programmer is supposed to build things in this time, gaining mountains of factual knowledge which doesn't have much value in the real world is not very good use of time.
By the way this algo/DS love looks very similar to tool religion. People focus too much on tools used to solve problem, while they should actually be focused on problems.
And yes, you are right, you can get away without knowing that stuff, and still do awesome work. And you can know all those things, and never get anything done.
That said, I looked up on the internet what they were and it was pretty easy to follow.
If you can't answer the majority of the questions on these four papers, and you're working or intend to work as a software developer, you should ask yourself why — most likely you're either you're missing something you really should know, or you're lucky enough to be working within a narrow area where your deficit doesn't matter.
The reader can judge for themselves whether this is claiming that "people who don't know the answers shouldn't be programmers." I personally find that characterization rather rude toward the author.
This implies that non-algorithmic work is a narrow area. In my experience, it's the vast majority.
The reason for this is that most Application Programming emphasizes the use and sufficient knowledge of a framework, rather than a full understanding of CS fundamentals.
While those with CS have a head start in understanding frameworks quickly and utilizing them effectively, self-taught programmers can catch up rather quickly if the framework is well written and geared towards them. And fast majority of frameworks trend towards this.
Application programmers generally need to know the ins and outs of a framework and this knowledge often carries them further than the fundamentals.
All that being said, after 10 years of application programming, I'm slowly trying to re-learn much of my CS material out of my own personal interest and get off the framework treadmill. It's been a fun process to rediscover fundamentals after learning them over 10 years ago.
But let's be realistic about it, most app programmers have no interest in CS fundamentals or theory unless it has some direct context to their daily work.
I know there's a use for these details, I want to learn them, but I suspect that many times the direct context is built-in, which is pretty much what you say about learning frameworks, which may be possible to consider as dialects of a language. In fact, by Googling I see that people have written about language subsets in the context of Lisp, so full-circle I go.
> Sure, there's plenty of software development which doesn't involve fancy algorithms or data structures... avoiding computer architecture, operating systems, mathematics, networking, databases, and distributed systems as well is a bit harder.
This is at odds with the statement in your article that jumped out at me:
"If you can't answer the majority of the questions on these four papers, and you're working or intend to work as a software developer, you should ask yourself why--"
I disagree, and furthermore, adding more topics means that your assertion is going to be less accurate.
Of course, it's better to know more, and I'm glad to have a computer science degree. However, I think that conflating having a computer science background and general software development is confirmation bias. I don't have data to back me up, but I believe that it's more common to simply not need to know what a bipartite graph is, or even a majority set of these topics. Given the theoretical grounding of the question in your first block, I'm guessing that parts 2-4 will be similarly structured.
Computer science is one slice of software development. Software development is a huge, deep field. A software development exam could just as easily contain an exam including the following topics: communicating effectively through email, distilling requirements from customers, writing meaningful commit messages, testing theory, or hundreds of other non-theoretical topics.
What I'm trying to say is that your exam is a good reminder of computer science at the root of many of the fields of knowledge in software, and is fun, but it's not representative of software development. In a given year, most software developers are going to spend more time puzzling through garden-variety implementation bugs than thinking about any two of the questions you list. Which is why they're not "lucky". :)
The interview was 99% math. The people who asked me question looked all smart alecs and their questions all seemed to indicate that they wanted to show me how intelligent they are, and how stupid I am.
By the end of the interview it was clear to me why they were stuck. Those people had absolutely no interest in building things. It was all about individual One-upmanship, and showing their little math and puzzle tricks they would have gathered by combined reading on the internet over the years.
> you should ask yourself why
As referring to "you can't answer the majority of the questions" or "you're working or intend to work as a software developer". I read it as the latter, but I guess it could also be read as the former.
My degree in Portugal was a 5 year degree with lectures from computer science and software engineering.
I'm a self-taught front end web developer who didn't have a traditional computer science education. I'm looking to move into general software development, and I can't answer any of the questions on the first paper without cheating.
Can anyone recommend specific online courses or books for the four proposed papers to fill the gaps in my knowledge without spending three years at Oxford?
: From the original post at http://www.daemonology.net/blog/2012-10-08-software-developm...
1. Algorithms and Data Structures
2. Computer Architecture and Operating Systems
4. Networking and Systems
Algorithms and Data Structures:
Computer Architecture and Operating Systems:
Networking and Systems:
Computer Systems: A Programmer's Perspective
is superior to the widely mandated
Computer Architecture: A Quantitative Approach
The Graham, Knuth, Patashnik book on concrete mathematics (a bit of a play on words for continuous and discrete) is a quality work as well but written to a considerably higher level that may or may not be pleasant to contend with depending on the reader's background.
Thanks also for proposing the test and posting the papers online. It's hard to discover what level of knowledge is assumed of software developers if you've never worked in a large team and haven't entered the field via graduate or postgraduate study.
The good thing is @cpervica has suggested standard textbooks, as opposed to suggesting obtuse texts unreachable to a beginner.
I think a beginner might be confused by Sedgewick's terse coding style.
I haven't read either of the books. I would add http://bcs.wiley.com/he-bcs/Books?action=index&itemId=04... to the list. Like most of the OS books, you won't get a full picture of the OS from this book, but you will know the topics which will help you understand a real OS. Also most of the non-recent book won't cover flat memory model http://en.wikipedia.org/wiki/Flat_memory_model That is not to say knowing about segmentation is going to hurt you.
I haven't read any of the books. From TOC, the cryptography and the statistics book looks good. I would recommend skipping cryptography maths for starters. If you are a beginner, an application level understanding of cryptography is what you need.
Tanenbaums books loves history and theory. It's a nice book nevertheless. Once you are done with it, you should read Stevens' book on TCP/IP http://www.kohala.com/start/tcpipiv1.html
If you find time, you should read the second volume as well:
For databases, I read this book:
This is not a very interesting book, but it does cover the fundamentals very well. You will need a bit of will power to read it.
For OS's, Tanenbaum (http://www.amazon.com/Modern-Operating-Systems-Andrew-Tanenb...) is popular.
'Math' is broad - if I can recommend only one book to cover all of Math I'd probably say 'The Road to Reality' (http://www.amazon.com/The-Road-Reality-Complete-Universe/dp/...). More practically (for the subset of math most programmers are likely to care about), you'll do fine with one good discrete math book and one linear algebra book. Throw in one each on Stats, Abstract Algebra, Calc (up to ~diffeq), and Real Analysis (in roughly that order) if you're a bit more ambitious ;-)
I also think for understanding how operating systems work, nothing beats writing your own! I learned most of the concepts by building a toy OS during the better part of my undergraduate studies. I highly recommend this for people who like coding and are afraid of jumping into the theory too quickly. For example, analyzing memory allocation algorithms is never as interesting as when you have to pick one for your own kernel!
Class central (http://www.class-central.com/) is a good resource for seeing what is about to start.
I took the CS101 last month. The only new thing I learned was Hashes. However, I think it was well worth it. Hashing is probably one of the most important concepts in CS. Also the course was entertaining. It helped me also review old code I wrote. I figured two cases where I can use hash functions to speed up code.
I'm taking the Software Testing, and Design of Computer Programs this month. If you are free (no job/freelancing/errands...), you can do each course in 7-10 days. Granted, you'll be fully committed to them. That is, you can* do these 3 CS years in just 3 months.
* depends on your mental abilities; but I think as a self-taught Front-End, you'll tackle it.
- Algorithms part I
- Algorithms, design and analysis. Part I
- Computer Architecture
- Calculus : single variable
- introduction to logic
- introduction to computer networks
He covers all the questions in this, except the last one regarding graphs.
The website says: "Enrollment is Closed"
My current day job is working for the company that maintains and manages NZ's Company registry, as well as a dozen or so other registries over varying subjects.
My previous day jobs were in healthcare and various genres of insurance.
I have a BSc in CS. I have used, and this is the salient bit, absolutely nothing of this complexity in my day jobs. Ever. (personal projects are another matter, but I don't get paid for those)
- Big O notation has never been relevant: performance is always improved by doing less IO, rethinking data structures or more complicated SQL queries, throwing more metal at it and occasionally actual profiling which finds out we're doing stupid things (not that those stupid things are ever Big O related).
- Quicksort, who cares? I just do sorts in SQL or run Java's .sort() command (which does QS anyway), see above for perf concerns. I don't have to know about it to use it.
- Heapsort, who cares? Again, sort performance has never been a concern.
- Never used graphs, the only "algorithm" I've ever had to professionally write was a Luhn check and I probably should have used a library for that anyway.
Again, I don't want to say whether or not this is a good or bad reality, but the point is, the vast majority of people writing code professionally are basically writing the same app over and over again:
- Build web page I can CRUD data with
- Store data from that web form in a database
- Modulo some bespoke business rules
- Integrate with some 3rd party systems.
Big O notation , or at least a Big O way of thinking is definitely useful when doing things like optimising queries and data structures etc.
For example I've solved performance issues in the past with DBs with poor indexing strategies (or no index at all). When I read the DB manual and it talks about different types of indexes and says "doing X will cause Y or doing Z will cause A". Knowing something about Big O notation and related things means that I can immediately intuitively understand what they key differences between Y and A would be.
Thinking about performance in terms of growth also helps me guess which parts of the application require the most optimisation. For example which parts have a roughly fixed n and which parts n will grow with time.
Abstractions leak, especially when you are dealing with parallel processing which will become increasingly important as time goes on. How you store and think about your data is going to have huge implications for just about everyone in software in the coming years when having to scale sort operations etc over hundreds of discreet CPU cores becomes the norm.
I think a degree in CS is not intended to teach you "what you will need to be a successful software developer today" but more "here are the tools to think about whatever the problems may be for software developers in 10 years time".
I agree that maybe there is going to be a gap between what is required for a CRUD implementer vs what is required for someone who designs DBMS systems for oracle and that employers want people qualified in the latter to perform the former.
OTOH I feel that this is something that the free market will correct over time and is already beginning to do so. I know a few programmers who are productively employed who are not well versed in CS theory but are still payed well because they can bring other skills to the table.
- My current team created a project that lets you run complicated, multi-dependency Java code in an non-blocking manner. Internally, the framework uses topological sorting of a graph of nodes, where each node is a method with potentially blocking code and the dependencies between the nodes are the edges of the graph. The framework is able to set up callbacks automatically in the right order given the above.
- iOS6 auto-layouts internally use constraint solving on a set of linear equations so that you can write code to set the layout of your UI elements in terms of equations (so that you can basically just say that your button should be at 40px left of center of its superview, and with a 100px padding from the bottom).
- I wrote code to improve the accuracy of models at my previous job by employing various machine learning algorithms (thanks Andrew Ng and Coursera).
These might be isolated examples, but my point is that some of the best code I have seen uses a lot of math and CS concepts (sometimes in clever ways).
Graph theory has also proven useful. I created a simple topological sort to ensure that objects in my dataflow-based visual programming system targeted at home automation are run in the correct order.
You make a convincing case for your insularity IMHO :)
Now the didactic exercise of implementing a handful of data structures and algorithms like these from scratch has a lot of value. And knowing they exist, so you can look them up and find a library / implement if necessary, is also valuable. But knowing them 10+ years after college? The details are simply irrelevant; other things are more worth knowing off the top of your head.
Having the basics immediately to hand, without having to look them up, is needed to move on to the next level. Knowing how to recognize a bipartite graph means that when there's one around you're likely to think - hmm, I wonder if that's bipartite? recognizing the problem to be solved sometimes requires that you have a deep and fundamental understanding of the types of solutions that exist.
People seem constantly to say this - I can look it up on Google, why do I need to know this? I constantly solve problems my colleagues can't (and vice versa) because I have a deep familiarity with a range of techniques that they can look up any time they want, but don't recognize "in the wild."
What you are arguing for is knowledge of what meals taste like (e.g. two primary taste dimensions being space and time). I think this is really useful; so I agree with you.
But what Colin (and I mean OP) is asking for is knowledge of how to cook two specific meals, themselves rarely asked for.
I personally think CS education is mostly useless. Big-O notation and a survey of the field (a taste test of the meals, if you will) are all you really need bring along with you from discrete math, in my very humble opinion.
Cooking some of the meals while learning will teach you the general skill of following a recipe; a few more applied challenges will develop your improvisation skills. But the more applied it gets, the less CS it is.
The field is too broad, IMHO, for deeper knowledge of only a handful of things to be very useful. I know a lot about hash tables, for example, but that's because they were very useful in my job, and critical for performance. It would be a waste of time for me to know as much as I know about them now, coming out of college, never mind 10+ years later. I wouldn't come down like a ton of bricks on a recent graduate for not being able to name 3 different collision resolution approaches off the top of his head, never mind someone 10+ years out; even for a job that required their use and impromptu implementation, as mine did.
"on each paper, at least four of the five questions relates directly to material I have needed to know during my time working on Tarsnap, "
Most software jobs are far more distant from CS that Colin's. And more power to him for working fulltime on something that leverages his knowledge. (spoken as someone who spent a decade doing glue-random-apis-together-and-bill -by-the-hour enterprise software scut work)
Mind you they have 10+ years experience and can point to major succesful projects - so whether or not they have a degree or not is rather irrelevant.
Even with that, and fairly high grades, it was still a slight challenge to get a placement. Said placement is just to give me an edge over fresh graduates, I'll have a year in industrial experience over them.
Never saw that in the big companies, but it's been several years that I don't work there.
I chose a smaller software company, though. I much rather earn less but have a less enterprise-y work environment.
Working on large scale machine learning projects now. Couldn't be happier. And yes, I knew the answers to the questions without having to look them up. (though I suspect I'll get the Networking questions wrong.)
I've been developing software professionally for 23 years. I have a BSc in Computer Science  and an MSc in a computing-related field. And I have never needed to be able to answer questions like these to do my job. I have a vague understanding of big-O notation, and equally vague knowledge of the characteristics of various algorithms like heapsort, but thats all. If I need a sort algorithm then the framework or standard library provides a selection, all written by smarter people than me.
If you are developing a relational database or Google-scale distributed platform then, yes, you probably need to know this stuff. But most developers don't. I have to understand how to capture customer requirements, fit them into systems that already exist, estimate and implement them quickly and reliably, deal with the practical details of IIS or an RDBMS, maintain code, etc. I wonder if the author needs to get out more.
 Badly named, I learned little real comp sci.
Despite the sentiment here, Colin is right: this is knowledge that developers ought to have. Not because they need it to do their day job, but because they can't avoid learning these topics if they truly love the field.
Sure, you can earn a pretty decent paycheck hacking web apps and "enterprise" software without ever learning any of these things, and you may think, "This stuff is useless for most everyone." A friend of mine replied to that sentiment on Twitter: "If you manage a Taco Bell, I'm sure you feel a lot of the stuff in an MBA is useless for managers too." You can successfully bring in a paycheck programming without knowledge like this--just like you can successfully manage a Taco Bell without having an MBA--but you should not claim to excel in your field in either case.
Call me a snob if you will, but I don't want to work with mediocre programmers whose knowledge takes them only as far as the nearest web framework any more than I want to do business with the manager of my local Taco Bell. I want people with passion for our field, and those people can't avoid gaining the kind of knowledge that Colin's test asks about.
> I want people with passion for our field, and those people can't avoid gaining the kind of knowledge that Colin's test asks about.
Claiming to own the definition of passion is not snobbish but conceited.
Perhaps I consider programmers mediocre if they lack design experience and cannot show me how to setup custom guides in illustrator. "I want people with passion for our field" I don't simply want programmers who end at the compiled code, I want people who understand design, typography and color theory. A programmer who cannot do these things is mediocre at best.
Or maybe it is linux that is the bar. It doesn't make any sense for a programmer to use windows, only real programmers use linux. They should be able to install/configure openssh, tomcat, nginx, redis, mysql-server. They need to know bash inside and out, be able to create permissions, user groups and have a deep and thorough understanding of the kernel. If you are a windows programmer then I don't have time for you at my company where all we do all day is build awesome.
Perhaps it is mathematics. Algebra, calculus, trig, geometry. Statistics, analytics... I hope you have your TI-83 ready for this interview. As only the most mediocre of programmers will barely be able to survive in this industry without a MS in Applied Mathematics.
Hire who you want, base your interviews how you want you want. Just don't assume to be on the upper bar and beneath you lies mediocrity. That's all.
I'm talking about passion for programming. I'm not "claiming to own the definition of passion," I'm saying that passion for programming manifests in knowing the answers to questions like Colin's. It simply does.
Calling people that don't agree with you mediocre or apologists of mediocrity is just rude.
However, there is a difference between having been taught something, knowing about it, and having the information available in instant recall. For example, the B-tree question. three possible thought processes for this question (forgive me for my all-male cast):
Alan thinks, "Hmm. Well, they're both trees, but I don't really remember that difference. Maybe a BST is optimized for binary search, hence the name? This is a waste of time, I'll google it. Oh, of course, now I remember."
Bob thinks, "What!? What are trees!? Like outside? Oh no. Oh no oh no oh no. I remember what binary search is but how does it apply to a tree? Oh geez :("
Carl thinks, "A B-tree is a generalization of a BST, nodes in a B-tree can have more children than a BST"
Alan is probably a typical software developer with experience who might or might not have a degree but has been exposed to concepts. Bob is a college sophomore or someone who was never exposed to algorithms and doesn't have the background to even grapple with the question. If he googles for the answers, he'll be able to write them down but won't be able to relate them to other concepts or incorporate them into his mental model.
Carl just wrote a B-tree. Maybe Carl is a college senior, maybe a senior computer scientist at IBM.
in my opinion, you want people like both Alan and Carl. There are probably a lot more Alans than Carls. Hiring either is probably fine. There are probably a lot of Alans reading the OP, they probably won't be too upset about not knowing these off the top of their head because they know that they can refresh their memory on demand. Or maybe this is just my apology for my own mediocrity, because I had very few Carl level responses to the OP!
the problem I recognize is differentiating Alan from Bob. It's harder. If you only take the Carl responses to these questions, you'll draw only from a pool of smart people. If you start to allow Alan-level responses to be acceptable, you could also get a Bob. That would be bad.
Assuming you've been in the business for a long time, and not just fresh CS graduate or something, what have you done to stay intimately familiar with the concepts that you do not utilize in any regular capacity?
I would say any developer worth his salt ought to have(as you would say it) all this skills.
In any case, I guess I have a bit of reading to do..
Understanding CS theory is a way of testing intelligence, sure. But the author betrays his own lack of familiarity with the industry today in implying that it is required knowledge to be an effective developer.
I don't think that's a problem with the exam system though; I could probably have answered them 12 years ago. It's just related to the field you end up in and how much you need to lean on that knowledge in your day to day job.
As it stands, given a couple of hours of reading, I'd easily be able to refresh my knowledge and answer the questions, primarily because I had a solid formal training in it initially. Ability-wise, the me of my uni days could never do the job I'm doing now - well, not without another 15 years experience :)
On the other hand, I imagine that someone who hasn't been formally trained in complexity would have a harder time, because they wouldn't be aware of basic concepts such as complexity, the O notation, etc. Therefore, asking whether someone knows what O(n) means strikes me as a better performance test than trivia like the difference between O(2^n) and O(3^n).
So far that's turning out to be one of the most useful questions, actually... but I wouldn't call it trivia.
I know the running time of quicksort offhand, but I'd still consult wikipedia on the off chance I'm wrong.
ignorance regarding basic complexity differences; this is the problem.
And, since I'm asking already, I'd also like to know why you needed to know the information "on-the-fly" and couldn't have gotten there almost as easily with a quick refresher.
For example, I don't remember all the heap operations by heart but, if I needed to select a data structure for a critical task, I would go fetch Cormen and Skiena (the books, not the people) and get a quick refresher.
To me these questions are daunting. Am I the only one here who feels like they have an inferior CS degree? Reading the comments in HN threads usually makes me feel a bit dumb, but this post in particular makes me feel like a lot of money has been thrown at a degree that did not teach me a lot about Computer Science.
This reminds me of those trigonometry proofs we learnt in high school. I remember I could go through those complicate trig proofs with ease back then. Now beside some basic sine/consine stuff due to work, I completely forgot about the rest of them since there wasn't a chance to use any of them.
Basic course level knowledge are good to learn. Just the test needs to be relevant and in context.
Computer Science degrees are about laying the foundations so that you can understand things like B-trees when you need them.
If you have passed the class in university and now feel bad that you can't answer these questions, don't worry. It does not matter. If you want to work at Google, work through Cormen again before the interview.
Feel free to pick your favourite field like Compilers or Machine Learning and put a basic question to the author of the quiz, when you meet him in person. If he can't answer, pat yourself on the back and feel free to consider yourself a better software developer.
In fact, it was the second biggest surprise for me when starting out as a programmer after university. I wrote this up in "Top 5 Surprises When Starting Out as a Software Developer" http://henrikwarne.com/2012/08/22/top-5-surprises-when-start...
Speaking of cartesian product, did you ever wrote a huge database query that you had to optimize through smart usage of join methods, indexing structure, caching (eg materialized views), decisions like subqueries versus joins versus procedural code etc.?
Did you ever had to pick a library collection -- not implement your own hashmap or tree, but just select the ideal implementation -- to allow optimal searches in a given dataset?
If answer='Y' to any of these, then you needed that knowledge, and I guess you do in a regular basis. The thing about CS theory is that it structures and formalizes these problems so they can be precisely analyzed; used as basis for further inventions and improvements; reliably predict the result of some design choice, etc. You have some empirical knowledge of things like computational complexity, but having formal knowledge would often allow to replace hunches with objective certainty, or arrive to the optimal answer to some problems much quicker and more reliably.
At least that's the theory. ;-)
As for the specific questions: It's definitely good to know big O notation, so maybe I was a bit quick to dismiss the first question. However, a more realistic/useful comparison would be between O(n^2) and O(n log n).
For picking a collection to use – definitely knowing when to use a hash map versus a list. Asymptotic run-time behavior of Quicksort – no.
Binary search tree – that's the one example I mentioned in my blog post, so yes, useful.
Graphs – never needed in the applications I've worked with. Same for using a heap, never seen/needed.
Once again, of course you can learn most of these foundational concepts more empirically, or in a more "bottom-up" way by starting with the standard data structures and algorithms that you use in practice, but gradually deducing general concepts that unify them. But I wonder if this can ever result in the same deep insight that some study of graph theory can give.
Questions 3 and 4 I simply don't have the background (I've never heard of a B-tree, and I've heard of heap sort but I don't know the algorithm).
Question 5 I'd not seen before, but an algorithm was quite obvious because I've taken a course on graph theory (of course, my answer could easily be wrong).
My tentative position is that these questions are easy if you've covered the relevant material before, and very hard to impossible if not.
Disclaimer 1: my degree was mathematics, not computer science, and I don't work as a software developer (though I do write code every day).
Disclaimer 2: one of the questions I "knew" the answer to, I got wrong!
There's a lot of people who are very confidently giving wrong answers to questions, which weighs against that position.
Given the selection bias we would expect among respondents, have you thought about how you will interpret the responses?
(If the unicode breaks, that's the is-a-member-of symbol.)
It is an equality relationship why would they need a new symbol?
f(n) = O(g(n)) .. n^2 = O(n^2) and n = O(n^2), so n = n^2 which is not making any sense
You are right it doesn't make any sense. Try n^2 = O(Selection_sort(n)),O(Bubble_sort(n)) = O(Selection_sort(n)), n^2 = O(Bubble_sort(n)). Nice and transitive the way it should be.
Another way to see that the relationship can't be equality: the left hand side is a function, the right hand side is a set of functions. Two things from different classes can't be equal.
The relationship is simply set membership.
Selection sort is also O(n^3) and O(n!) and O(n^2 log log log log log n). The correct operator for saying that f(x) is O(n) is the set exists operator ∊. O(n^2) is a set of functions that is a strict subset of the set of functions that are in O(n^3).
Selection sort is only big-theta(n^2), which indicates both the bounds are tight (eg for Selection sort it is true that it's asymptotic behavior is both f(n) <= an^2 + m AND f(n) >= bn^2 + n.
Part of the reason was to find out what I didn't know.
What I didn't know was basically some parts of set theory and quite a chunk of automata theory.
I did find cperciva's questions to be trivial, but I wouldn't say that makes them trivial. A TV quiz once said, "It's easy when you know the answer.".
Algorithms for someone else might be what automata theory was to me.
I suppose what I'm surprised about is that a large number of developers haven't taken the time to learn (or have since forgotten) the theory around these particular concepts.
I wonder if I legitimately just forgot about B-trees, or if my algorithms courses just didn't cover them.
(Remember our talk about impostor syndrome? In my case it's actually true.)
1. Is O(2^n) equal to O(3^n)? Why?
I have absolutely no idea what that means.
But I did have to answer this question: why does the application crash? "Oh...that's because the developer that was pontificating yesterday about how heap operations behave asymptotically forgot to check if the database connection was open before he called the Save method. Apparently, such arcane trivia is beneath him. I can fix it."
False dichotomy. You posit the developer who understands asymptotic notation can't figure out why does the application crash, or how to write to databases. I don't see how understanding asymptotic notation negatively affects someone's programming ability..
Why would it make them worse?* I was just arguing it doesn't necessarily make them better. Just because you are a bad developer doesn't mean you get better when you learn about asymptotic behavior or heapsort or O-notation. Lots of really bad developers know about those things. And lots of really bad developers don't know about them as well.
Knowledge of those items is neither sufficient nor necessary to be either a competent or incompetent developer (at least for reasonable definitions of both competent and developer).
* Ok - maybe too much knowledge has some odd 2nd order negative effects on productivity: analysis paralysis, over-confidence, unwillingness to ask for help, programming toward a theoretically pure solution rather than a pragmatic solution.
They asked us this questions on our high school final exams, I'm sure you'll manage.
The problem is that the perspective expressed by the GP comment is a pure expression of the "don't know what you don't know" adage. Sure, you can hammer out solutions to problems, but the more exposure you've got to more concepts in your field, 1. the bigger your toolkit and 2. the quicker you can start drawing conclusions about the problem.
I'm actually pretty shocked someone's confessing to not knowing what big O notation is and proud of it.
Yep. So I looked into it and it seems they are not equal. Every f(n) in the set O(2^n) belongs to O(3^n). While this means that O(2^n) is a subset of O(3^n), we can see from the trival case f(n)=3^n that O(3^n) is not a subset O(2^n) since 3^n > 2^n for all n as n approaches infinity, or there is no k such that is exists an n0, such that all n > n0 implies 3^n < k*2^n, therefore the two sets are not equal.
Great. I still don't see how I am more qualified to do absolutely anything in life.
For what it's worth, this isn't really acceptable as a proof. The first half of your argument would go through equally well for 3n and 2n in place of 3^n and 2^n even though O(2n) = O(3n). The second half is simply a re-statement of what it means to say that 3^n is not in O(2^n) and doesn't actually prove it. The pertinent fact is that there is no constant k such that k >= (3/2)^n for all sufficiently large n because (3/2)^n goes to infinity as n goes to infinity. Compare this to the situation with 2n and 3n where (3n/2n) = 3/2 is a constant and hence bounded. A less trivial example would be something like n^2 + n versus n^2 where the ratio 1 + 1/n isn't constant but is nevertheless bounded for n >= 1.
You got an array of your friends, an array of people who upvoted this post and an array of people who replied to this post. Filter all your friends who upvoted and replied to this post.
Now, before you write the code, how fast/slow will it run? for 1,000 friends? 1,000,000? will the runtime grow extremely fast? why?
Here's a Redis command that intersects two keys - http://redis.io/commands/sinter - the complexity is "O(N*M) worst case where N is the cardinality of the smallest set and M is the number of sets." - do you understand why? do you understand when it will be fast and when will it be slow?
The code I would write would take longer the more friends you have. Basically, it would grow linearly with the size of all the sets and the number of intersections found. Something like O(n * M) in the worst case. But I don't get why later you say that n is the size of the smallest set.
So I don't see how you can beat that. I guess you could sort the lists, but that take O(n log n) over M lists, and then lookups would take O(log n).
To me it will be fastest when the algorithm terminates the quickest - the smallest set contains nothing, the 2nd smallest set contains no intersecting keys, anything like that and will be the longest in the worst case (each set is a proper subset of the next) and progressively worse as the subsequent sets get larger.
As for that algorithm running in O(N * M), I am not sure I understood that at first, but I think I do now. If you hold all other set sizes constant, and can only vary the input of the smallest array or the number of sets and you will notice the running time follow O(N * M), but if you increase the size of the largest set by k, then the algorithm will take k times as long to run (in the worst case), but if the running time function f(N,M) belongs to O(N * M) then k * f(N,M) belongs to O(N * M) as well, so the O notation still represents the running time complexity even if you increase the size of the largest set.
So the running time will still increase with the largest set, but the O-notation of the algorithm will still belong to the same representation.
For each element in all sets:
Insert element into hash table. If it does not exist, use key = element, value = 1. If it exists, increment counter. This is O(1)
When done over all sets, it's O(N)
Walk hash table, the only elements that are in the intersection are those with value = number of sets. This is O(N)
O(N) + O(N) = O(N).
This only works if there are no duplicates within a single set. It transforms set intersection into a counting problem over the universe of elements.
You can optimize it as well. You never need to deal with elements that don't exist in the hash table after the first set is processed, as they can never be in the intersection. This does not change the asymptotic complexity, except it means the hashtable will never resize after the first set. So it's advantageous to choose the smallest set to process first if you can do so easily. If you use dynamic perfect hashing or something similar, you can also guarantee all operations after the first set will be O(1), rather than amortized O(1).
As a side note, I think that tasks like "implement a data structure with following operations and calculate computational complexity for each operation" are much better for exam than trivia questions like "Name the heap operations used by heapsort and the asymptotic running time of each.", with all due respect to @cperciva.
Would they have been happier if they wouldn't have been needed at all?
(disclaimer: I tend to change my shirt between white-collar and blue-collar [figuratively, of course] where I work)
Not my best writing ever but that round's already downrange. Plus there's an excellent response from Alex Feinberg in the comments.
Or maybe, as you say, understanding more things makes you worse at coding. Who knows.
The tedious details of whatever god-forsaken CRUD framework is flavour-of-the-week with the cool kids probably WERE beneath him.
 Calculation, if you prefer.
Why is the ability to retain this information at the top of my mind at all times necessary for me to be a Software Developer?
I'm a web developer. I use tools, languages, and frameworks built on many Computer Science concepts that I don't fully understand, but I would sorely hate to have to reinvent those wheels every time I need to create a new application, or to even need to understand their most intricate inner workings despite the fact that they have been abstracted away for my convenience.
Perhaps this makes me "mediocre" in the eyes of some, but I get my work done and have never had any of my managers or customers tell me that what I have delivered is "mediocre" in their eyes.
Several times I've come up against an issue that requires me to push the limits of my skills and knowledge, and I do not shy away from learning, but it would do me little good to learn many of the things that are taught in a typical CS course. I consider myself passionate about my work, and I strive to learn new things every day, but I don't consider myself to be at a disadvantage, nor do I find my lack of knowledge in those areas to be a deficiency.
I am proud of my accomplishments, and of the course I have taken in my career. Perhaps my skills would be of little use at Tarsnap, but I have no shortage of work at any number of other companies.
Does anyone know if traditional, "real" engineering fields, (civil, chemical, mechanical, etc) ask detailed, technical kinds of questions during interviews, like software engineering is known for? I find it hard to believe civil engineers get bombarded with all day grill fests by other civil engineers about technical questions in interviews, but I really don't know.
I guess I should be mad at the institution, but that wouldn't be productive. Where do I start now to start learning what I actually need to know? It seems like there is so much I don't know.
I've slowly learned basic CS concepts since then on my own and working with others, and either some day I'll have time to fill in the gaps or I'll just continue to work in the "narrow area where [my] deficit doesn't matter."
I think the author has it backwards though, -he- is the one that works in a narrow area where the deficit does matter.
I will say though, it's really intense on theory, and it requires a good background in math. A lot of people will recommend this book though. Also, it's physically heavy :)
If you can't answer the questions, then you should try to fill the gaps in your knowledge rather than whining that it's elitist.
Thanks cperciva for hosting it!
And this approach is the second problem. A few questions of this kind are good; for one, I like Q.1 (trivial if you know the subject, puzzling if you don't as the proposition is just nonsensical as a mathematical problem). But some other questions are tests of memorization, sometimes for not too relevant information. For one thing, you ask to "name" heapsort operations. I have long forgot those names, BUT I know the fundamental ideas of heaps and heapsort, and writing the full heapsort (even in paper) is a simple exercise of coding... but I wouldn't still remember those names.
Also, I would spend significant time writing this code (and also deducing the complexity if I don't know it by heart), compared to somebody who just crammed through textbooks. (And yes, it is possible to memorize this kind of information in a few days of cramming; if those final exams have schedules known well in advance and they are spread at least 3-4 days from each other, this pretty much cancels the advantages you mention.)
You seem to be dismissive of "practical programming ability"; I agree that code as answer to written questions has issues -- coding problems are better in a whiteboard test, interacting with the examiner. Or with careful restrictions of language etc. But it seems wrong to have questions that penalize (due to time restrictions) people who can deduce the answer with some coding or calculation, and reward people who just memorized the subject... for one thing, this memory won't last forever, unless you become a professor to keep "practicing the theory" on a daily basis. But more important, the ability to deduce the answer is very revealing of somebody's knowledge of the really fundamental concepts. Show me a CS graduate who knows by memory the bipartite graph algorithm, and this may be just some guy that has study discipline and good memory but is clueless about the core ideas of graphs and will have forgotten everything after a couple years of graduation. Now take somebody who can sketch the code for bipartite testing, even when given a problem that does not use the word "bipartite" or even the word "graph" -- so the student needs to identify the proposed problem as a specific graph problem, and then deduce the algorithm in order to solve the graph problem -- and that's somebody I would hire, either for a software engineer position or for a professor position.
At this point, it's mostly a "because that's how we've always introduced algorithms and complexity" thing.
Very possibly. But is knowing the same as remembering? Compare remembering the specific value of quicksort's worst case to knowing that some sort algorithms are better than others when, say, the input is likely to already be sorted, and that you should go crack open CLRS if you dont recall which.
You probably will never implement quicksort, but it is useful to know, how it works, as you may be able to use some of the ideas on your own algorithms. Quicksort is also beautiful, so it can be a good inspiration on writing good software.
I won't debate the utility of the exam, but the majority of the tests I have taken can be aced with five minutes of googling.
[EDIT] The thing I'm most looking out for is how a candidate answers when they don't know the right answer.
This will obviously not be a representative sample of working programmers (or even readers of this post). People who know they don't know are less likely to respond.
I mean, I don't know if what I do is all that specialist, but I spend a lot more time working out far less interesting things about how domain knowledge works than what sort of Data Structure or Algorithm is best for storing it and using it.