
The software development final exam: Mathematics - cperciva
http://www.daemonology.net/blog/2012-10-10-software-development-final-exam-part-3.html
======
kamaal
I have immense respect for Dr Colin Percival and his work. But isn't this
getting overboard?

Seriously, calculus? I understand Colin is a computer scientist who is also a
programmer by coincidence. But this is almost like 'I know it, so you must
too'. Most developers are likely never to deal with these things at all.

Programming today is so much about productivity, discovery and building things. If
you face these things along the way, by all means learn them. But spending the
next two years of your life learning calculus which you are likely never to
use in your career, only to find you've forgotten it 2
years down the line anyway, is the wrong way to spend your time.

As a programmer, what excites me most is a new challenge I've never faced
before, and the journey of hard work, discovery, failure and success that
follows from such an attempt. I don't mind failing while doing something even if
I don't know much about it. I'm more likely to learn by discovering and
reading new things than by spending a straight year learning it all from a book
without knowing where it will ever be used.

The only thing that excites me besides money is the joy of discovering new
things, and realizing that I might have solved a real-world problem that might
be helping someone. Trivia doesn't excite me anymore; I don't see how I have
changed anything around me merely by knowing more.

Life is really short; I know I have little time to make all the money I want
so that I can see the other parts of life. I know coding and math are
exciting, but they are among the many things that are exciting in life. Think
of it this way: you might have a favorite ice cream flavor, but unless you
taste other flavors how would you ever know if others are better? Or after
trying the other flavors you might just discover you have a new favorite
flavor!

~~~
irahul
> But isn't this getting overboard?

Naive Bayes, with a relevant dataset, does a very fine job of data
classification (sentiment analysis, spam detection...). Also, almost everyone
who isn't a liberal arts major would have come across Bayes' theorem in high
school or college.
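A minimal sketch of the kind of classifier irahul means: a multinomial naive Bayes with add-one (Laplace) smoothing, scoring each label by its log prior plus the summed log likelihoods of the words. The whitespace tokenizer, the names `train`/`classify`, and the four-message training set are hypothetical; a real spam filter needs far more data and better tokenization.

```python
import math
from collections import Counter


def train(docs):
    """docs: list of (text, label). Returns per-label word counts, label counts, vocabulary."""
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab


def classify(model, text):
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        counts = word_counts[label]
        total_words = sum(counts.values())
        # log prior + sum of log likelihoods; +1 smoothing handles unseen words
        score = math.log(n_docs / total_docs)
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training data
docs = [("win cash now", "spam"), ("cheap pills win big", "spam"),
        ("meeting notes attached", "ham"), ("lunch tomorrow with the team", "ham")]
model = train(docs)
print(classify(model, "win cheap cash"))         # spam
print(classify(model, "notes for the meeting"))  # ham
```

Working in log space avoids underflow when multiplying many small probabilities.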

The question about A/B testing is solving simple linear equations. I believe
anyone in 10th grade or above should be able to solve it.

Hashing and MACs are pretty much application-level security: they don't
require intimate knowledge of _how_ they work, but _what_ they are and _why_
you'd use one is something an application developer should know.
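To make the _what_ and _why_ concrete, here is a sketch using Python's standard `hashlib` and `hmac` modules. The message and key are made-up placeholders; the point is that a bare hash can be recomputed by anyone who tampers with the message, while a MAC tag can't be forged without the key.

```python
import hashlib
import hmac

message = b"amount=100&to=alice"

# A plain hash has no secret: anyone who alters the message can simply
# recompute the hash, so it only detects accidental corruption.
digest = hashlib.sha256(message).hexdigest()

# A MAC (here HMAC-SHA256) mixes in a secret key, so without the key an
# attacker cannot produce a valid tag for a tampered message.
key = b"hypothetical-shared-secret"
tag = hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(key, message, tag):
    expected = hmac.new(key, message, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time to avoid leaking timing information
    return hmac.compare_digest(expected, tag)


print(verify(key, message, tag))                    # True
print(verify(key, b"amount=9999&to=mallory", tag))  # False
```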

That leaves us with that harmonic progression thingy and padding. Padding, I
think, is there to avoid cryptanalysis (extrapolating from what you know is part of
knowing things :)).
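One concrete reason padding matters, sketched with deliberately tiny, insecure toy numbers (p = 61, q = 53): unpadded "textbook" RSA is deterministic, so equal plaintexts give equal ciphertexts, and it is malleable, since the product of two ciphertexts decrypts to the product of the plaintexts. Randomized padding schemes such as OAEP exist to break both properties; this sketch implements no padding at all, it only shows what goes wrong without it.

```python
# Toy textbook-RSA parameters: far too small to be secure, illustration only.
p, q = 61, 53
n = p * q                 # public modulus, 3233
phi = (p - 1) * (q - 1)   # 3120
e = 17                    # public exponent
d = pow(e, -1, phi)       # private exponent (modular inverse, Python 3.8+)


def encrypt(m):
    """Unpadded RSA: c = m^e mod n."""
    return pow(m, e, n)


def decrypt(c):
    """m = c^d mod n."""
    return pow(c, d, n)


# Deterministic: the same plaintext always gives the same ciphertext, so an
# eavesdropper can test guesses (e.g. "yes"/"no") by encrypting them.
assert encrypt(65) == encrypt(65)

# Malleable: multiplying ciphertexts multiplies the hidden plaintexts.
c = (encrypt(5) * encrypt(7)) % n
print(decrypt(c))  # 35
```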

Overall, I won't say he is going overboard with his questions. If anything, to
anyone familiar with the topics, his questions are pretty trivial. The only
thing under contention is if the topics he considers relevant are actually
relevant.

~~~
mattmanser
But what have they got to do with _software development_.

Zero. Zilch. Nothing. Absolutely sod all.

Having seen the questions over the past few days, recall the OP's claim from
his original post:

 _If you can't answer the majority of the questions on these four papers, and
you're working or intend to work as a software developer, you should ask
yourself why — most likely you're either you're missing something you really
should know, or you're lucky enough to be working within a narrow area where
your deficit doesn't matter_

So far almost none of the questions on any of the days have been the slightest
bit 'important' in software development.

~~~
irahul
> But what have they got to do with software development. Zero. Zilch.
> Nothing. Absolutely sod all.

I don't have all of his questions at hand, but from memory:

1\. Basic knowledge of statistics and probability is required for machine
learning.

2\. His question about zeroing a multi-dimensional array is to test whether you
understand the underlying memory model.

3\. Do we really need to discuss why you should know how cryptographic hashes
work?

4\. B-trees have better locality of reference and are the de facto data structure
for storage in the majority of cases. Granted, not many people do low-level
storage, but does that somehow make it irrelevant to software development?

5\. Mutexes, rw-locks, etc. are the building blocks of concurrent programs.
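For point 2, a hedged illustration (the exam's exact wording isn't reproduced here): assuming a C-style array declared as `T a[rows][cols]`, the elements sit contiguously in row-major order, so element (i, j) is at offset `i*cols + j`. The sketch just computes which offsets each loop order touches: row-major iteration walks memory sequentially (cache-friendly), while column-major iteration jumps `cols` elements per step. It also shows the whole array is one contiguous block (for a true 2-D array, not an array of row pointers), which is why it can be cleared in one pass.

```python
def row_major_offsets(rows, cols):
    # Outer loop over i, inner over j: offsets i*cols + j come out 0, 1, 2, ...
    return [i * cols + j for i in range(rows) for j in range(cols)]


def column_major_offsets(rows, cols):
    # Outer loop over j, inner over i: consecutive accesses are cols apart.
    return [i * cols + j for j in range(cols) for i in range(rows)]


def strides(offsets):
    # Distance between consecutive accesses, in elements.
    return [b - a for a, b in zip(offsets, offsets[1:])]


print(row_major_offsets(2, 3))     # [0, 1, 2, 3, 4, 5]
print(column_major_offsets(2, 3))  # [0, 3, 1, 4, 2, 5]
```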

I don't know why you assume that anything which isn't CRUD web development
has nothing to do with software development.

~~~
rufugee
So I've covered all of these topics at some point in my career. What I'd love
to see is someone like Coursera come out with a "what you should know as a
computer scientist/software developer" curriculum...not only for myself, but
for those who work for me. I would personally love to go back through a review
of the various mathematics which are interesting for computer work, but I'd
also LOVE to have a complete curriculum to put promising young developers
through. You can teach concepts, but you can't teach attitude/demeanor (at
least, not as easily).

I know Coursera has released bits of this sort of coursework, but I don't
think they've put it in a guided, ordered curriculum form with prerequisites.

If someone puts this together, I'll gladly pay for it.

------
tptacek
Weirdly, I do better on these questions than I did on "Operating Systems",
despite having ~15 years of systems programming (including some kernel work)
and having barely high school math.

I have never needed to know the specifics of MESI cache coherence (I've
benefited from knowing about cache coherence, but not in fine detail), and the
problems I've worked on professionally have lent themselves better to commit
protocols than to pthreads synchronization.

On the other hand, I do a lot of crypto pentesting, so hashes versus MACs, RSA
padding, and basic stat are straightforward.

I don't have Colin's academic background but if I was him I'd be disquieted by
the fact that someone like me comes out OK on his math and less well on his
arch/OS stuff.

~~~
cperciva
I'm not particularly surprised. The mathematics here is generally pretty basic
material which other things build on top of -- and as you say, your crypto
background helps with those questions.

I don't remember seeing any emails from you... did you submit your answers?

~~~
tptacek
Of course not. :)

~~~
nitrogen
In light of this (and the fact that many others are doing the same), I don't
think you can interpret your results as being truly representative of the
industry, cperciva, though I do look forward to reading your final summary.

------
delinka
What I'm not getting about these is the "don't cheat" bit. First, let's be
clear: he's decided the 'rules' for this exam include not researching online
or getting other assistance and just answering from memory. "Cheating" is
"breaking the rules."

I work in the Real World, where I don't write academic data structures every
day, so I don't have this stuff memorized (there's a joke in there that starts
"I've forgotten more than..."). I use APIs to get work done. When I need to be
more academic, I search online. The only place anyone's ever required to be so
'academic' about algorithms, data structures, math, etc. is in academia. Step
out into a job and you have an internet full of active developers and useful
forums at your disposal. Even if a company hired me to write an 'academic' code
library, I'd do the research, write the implementation, and (after sufficient
testing, profiling, etc.) forget about it.

~~~
alexkus
The more you "know" the less you have to go look for, and when you do have to
go look for stuff on the Internet (since you're not omniscient) then you'll
understand more of the pages you get back as results, and understand them
faster since there's less groundwork you need to cover.

An understanding of big-O notation is very important if you go searching for
sorting algorithms and land on a page with an overview of sorting
functions that lists bogosort as O(n!) and insertion sort as O(n^2), both
relatively easy to implement, alongside a relatively complicated iterative[1]
quicksort that's listed as O(n log n). Without other clues someone might just
implement the one that looks easiest; it works fine on their test set of
20 elements, but blows up later as their application/site grows and
slows to a crawl sorting through millions of rows with an insertion sort.
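A small sketch of this point: a plain insertion sort with a comparison counter added purely for illustration. On already-sorted input it does n - 1 comparisons, but on reversed input it does n(n - 1)/2, which is what bites once the 20-element test set turns into millions of rows.

```python
def insertion_sort(items):
    """Sorts a copy of items; returns (sorted_list, number_of_comparisons)."""
    a = list(items)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Shift larger elements right until key's position is found.
        while j >= 0:
            comparisons += 1
            if a[j] <= key:
                break
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a, comparisons


print(insertion_sort(range(20))[1])         # 19 comparisons (sorted input)
print(insertion_sort(range(20, 0, -1))[1])  # 190 comparisons (reversed input)
```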

Or, one of the most recent questions in the Mathematics part of the exam made
me snigger because someone I work with was asking what bignum library was
easiest to use. Rather than just answering their question (I prefer GMP
myself) I asked why; they wanted to do the equivalent of log( x^300 ) and so
they needed to compute x^300 first and then take the log of it; floating-point
was giving slightly inaccurate answers to what was expected when they took the
log of the intermediate result. It was even worse when they tried to do log(
x^9125 ) - or whatever exponent is used for daily interest on a 25 year
mortgage to account for leap days.
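The usual way around this particular problem doesn't need a bignum library at all: use the identity log(x^n) = n log(x) (for x > 0), so the huge intermediate is never formed. A hedged sketch with made-up function names; `math.log1p` is used for the compound-interest case because computing log(1 + r) directly loses precision when r is a tiny daily rate.

```python
import math


def log_power(x, n):
    """log(x**n) for x > 0, computed as n*log(x) so x**n is never formed."""
    return n * math.log(x)


def log_compound(rate, periods):
    """log((1 + rate)**periods); log1p stays accurate when rate is tiny."""
    return periods * math.log1p(rate)


# 20.0 ** 300 is about 2e390, beyond the largest double (~1.8e308), and
# Python raises OverflowError; the log form is fine:
print(log_power(20.0, 300))  # about 898.72
# Hypothetical 5% annual rate, compounded daily for 25 years:
print(log_compound(0.05 / 365, 25 * 365))
```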

To put it another way: without the knowledge you end up googling, and
implementing, the wrong thing. You may get the right answer/outcome in the end,
but it may be horribly inefficient. Or, worse, you may get an inaccurate
answer/outcome, and you may not even know it.

Hammering the insertion sort analogy home: you can profile and optimise an
insertion sort all you like, but it's never going to approach the speed of
quicksort on medium-to-large datasets. Huge datasets are another matter: someone
stuck with "quicksort is the best" is going to struggle when faced with
hundreds of millions of rows, where there are things less pessimal than
quicksort.

Knowledge of processor architectures, caching and how hash tables are
implemented means you know not to use a hash when a simple array will suffice;
but I've seen this done plenty of times.

Etc, etc, etc.

Out of the 15 questions so far there have only been two or three whose
answers wouldn't have helped me at some point in my development over the
last 15 years. And for those that haven't applied to me, I can see why they
would be important to other fields and other people.

It's one of the major reasons I decided to do a full Maths degree (part-time
via correspondence) to complement the Comp Sci degree I got ~15 years ago.
It's not just the obvious useful stuff like calculus, group and set theory,
graphs and networks, etc. Whilst I thought that the mechanics part of the
Maths degree wouldn't be very useful in my job I was surprised at how useful
and ubiquitous [non-]homogeneous second-order linear differential equations are
in modeling things that are relevant to my job (populations -> user base,
hysteresis, etc).

No doubt others find huge amounts of other knowledge in other subjects that is
directly applicable to IT; psychology, education, design to name but a few.

The right choice of algorithm, and implementation, in a critical part of your
code could mean the difference between needing 64GB of memory per server and
getting by with 16GB. It could mean needing 100 servers to handle your 1M users
instead of 10, and that difference could be the difference between making a
profit and making a loss. These could be the differences between your idea
succeeding or disappearing.

1\. Another common thing I do at work is replacing quicksort
implemented using recursion with an iterative version so that the stack
doesn't keep getting blown on large datasets. I like it when people encounter
this problem and find out that stacks are finitely sized and not as big as
they'd hoped. "But that means..." is a great thing to hear.
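A sketch of both sides of this sub-thread, assuming a Lomuto partition with the last element as pivot (a deliberately naive choice). `quicksort_recursive` recurses on both halves and reports its maximum depth, which on already-sorted input grows linearly with n, as gjm11 points out. `quicksort_iterative` keeps an explicit stack, pushes the larger partition and keeps working on the smaller one; that bounds the stack at about log2(n) entries even when the running time is still O(n^2). Recursing on the smaller side and looping on the larger gives the same bound without an explicit stack.

```python
def partition(a, lo, hi):
    """Lomuto partition of a[lo..hi] around a[hi]; returns the pivot's final index."""
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i


def quicksort_recursive(a, lo=0, hi=None, depth=1):
    """Naive recursion on both sides; returns the maximum recursion depth reached."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return depth
    p = partition(a, lo, hi)
    return max(quicksort_recursive(a, lo, p - 1, depth + 1),
               quicksort_recursive(a, p + 1, hi, depth + 1))


def quicksort_iterative(a):
    """In-place quicksort with an explicit stack; returns the peak stack size."""
    stack = [(0, len(a) - 1)]
    peak = len(stack)
    while stack:
        lo, hi = stack.pop()
        while lo < hi:
            p = partition(a, lo, hi)
            # Defer the larger side and continue with the smaller one, so each
            # pushed entry at least halves the range still being worked on.
            if p - lo < hi - p:
                stack.append((p + 1, hi))
                hi = p - 1
            else:
                stack.append((lo, p - 1))
                lo = p + 1
            peak = max(peak, len(stack))
    return peak


print(quicksort_recursive(list(range(300))))  # 300: depth grows with n
print(quicksort_iterative(list(range(300))))  # 1
```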

~~~
nadam
I am not an expert on this, but:

1\. I think the stack usage of Quicksort is O(log(n)). Which means it is
impossible to blow up the stack unless your dataset is bigger than the number
of atoms in the universe (but in that case how are you keeping it in memory?)

2\. For example in Java when you call Arrays.sort() this is called:

[http://cr.openjdk.java.net/~alanb/6905046/webrev/src/share/c...](http://cr.openjdk.java.net/~alanb/6905046/webrev/src/share/classes/java/util/DualPivotQuicksort.java.html)

(It apparently uses recursion)

I know quicksort, but not on this level: these guys researched lots of
different quicksort implementations and optimized the hell out of it. This
implementation is way longer than my naive quicksort
implementation would be and is called 'dual pivot quicksort', which I had not
heard of until now, even though I know how traditional quicksort works.

~~~
gjm11
1\. It's average-case versus worst-case again. If you implement quicksort
recursively in the obvious way, then when you get unlucky it takes order-n
stack space as well as order-n^2 time.

2\. Yup, it recurses, and it looks to me as if it's vulnerable to blowing out
the stack in the worst cases. However, it's quite a sophisticated
implementation and in practice you're probably only going to see the worst
cases if you feed it actively malicious input (see, e.g.,
<http://www.cs.dartmouth.edu/~doug/mdmspe.pdf> for a paper about doing this).

[EDITED to add: (1) A little discussion of that paper on HN is at
<http://news.ycombinator.com/item?id=1723305>. (2) I don't know whether
McIlroy's "killer adversary" would actually work against this dual-pivot
algorithm without some modification; it depends on how well its heuristic for
telling when the pivot is being found works against this algorithm. I'd
_guess_ that it does work.]

~~~
nadam
You and barkell are right: I forgot the case of actively malicious input.
(Other than that, the worst case cannot practically happen, I think, judging
from a glance at this algorithm's pivot selection.)

This could be solved with randomization though. (Only a few lines of code
need to be changed.)

~~~
gjm11
Actually, McIlroy's "killer adversary" will work just fine against typical
randomized quicksort implementations! (It effectively constructs the data on
the fly, guessing when the sorter is trying to find a pivot.)

In that earlier HN discussion there was an extended (and frankly rather
fruitless) debate between me and another poster about whether McIlroy's
adversary can rightly be said to break randomized quicksort; that was
basically terminological, and what is not in question is that if you take a
typical randomized-quicksort implementation and feed it McIlroy's evil
comparator, you will get order-n^2 performance.

------
_pius
Meh.

These questions are indeed trivial, but obviously biased towards security —
your area of expertise. What makes these questions anywhere close to a
compelling measure of a developer's mathematical knowledge? Seems pretty
presumptuous to me.

~~~
cperciva
_These questions are indeed trivial_

Software development doesn't need a huge amount of mathematics -- but some
basic calculus, probability, and statistics, are pretty essential. I debated
whether to include the cryptography questions in this section, but I wanted to
include them somewhere and this was the most natural arrangement. (I started
out by writing about 25 questions, then threw 5 of them out to get a more
balanced set, then tried to figure out how to organize them.)

~~~
Silhouette
_Software development doesn't need a huge amount of mathematics -- but some
basic calculus, probability, and statistics, are pretty essential._

Sorry, but that's just nonsense. The vast majority of professional software
developers could go through an entire career quite happily without any of the
above, and no-one would suffer for it.

Moreover, those who do programming work in mathematical or security-related
domains probably need a much deeper understanding than the kinds of basic
information you're testing for here.

~~~
tzs
You receive a call at 3 AM from someone in the marketing department, asking if
there is a problem with the web site because they haven't seen an order in 20
minutes. How do you know if you should get up and investigate or tell them a
20 minute gap at 3 AM is nothing to worry about but call back if it reaches 45
minutes, if you do not know anything about probability and statistics?
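One way to make the first example concrete, under a big simplifying assumption: treat 3 AM orders as a Poisson process with a rate estimated from past nights (the 3-per-hour rate below is hypothetical). Then the chance of a gap of at least t minutes is exp(-rate * t), and you can pick the gap that would happen by chance only, say, 1% of the time. Real traffic is burstier than Poisson, so this is a starting point, not an alerting policy.

```python
import math


def prob_gap_at_least(rate_per_minute, minutes):
    """P(no orders in a window of `minutes`) for a Poisson arrival process."""
    return math.exp(-rate_per_minute * minutes)


def alarm_threshold(rate_per_minute, false_alarm_prob):
    """Shortest gap whose chance under normal traffic is at most false_alarm_prob."""
    return -math.log(false_alarm_prob) / rate_per_minute


rate = 3 / 60  # hypothetical: 3 orders per hour at 3 AM
print(prob_gap_at_least(rate, 20))  # about 0.37: a 20-minute gap is routine
print(prob_gap_at_least(rate, 45))  # about 0.105
print(alarm_threshold(rate, 0.01))  # about 92 minutes
```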

You are thinking of including 30 days of toll free phone support with your
next product, and have calculated that for this to work out financially you
will need less than 1% of your customers to call support. How do you figure
out, from the record of bug discovery and severity in your testing group's bug
tracker, the probability that your product as it stands
is good enough to meet that requirement, and how do you determine how
confident you should be of that result, if you do not know anything about
probability and statistics?

~~~
Silhouette
_You receive a call at 3 AM from someone in the marketing department, asking
if there is a problem with the web site because they haven't seen an order in
20 minutes._

If you could plausibly go 20 minutes without receiving an order, this isn't a
question about statistics, it's a question about PR.

Moreover, the kind of place that doesn't take an order more than every few
minutes probably doesn't pay someone to be on call at 3am.

And what does the marketing department have to do with this? How is it that I am
running a web site that is important enough to wake me at 3am because someone
isn't sure if there's a problem, yet we don't have any operations people who
know the normal working of the system well enough to make the call on when
something is broken?

Like most contrived examples of how "important" statistics is in software
development, IME, yours makes no sense when considered in a realistic context.

 _You are thinking of including 30 days of toll free phone support with your
next product, and have calculated that for this to work out financially you
will need less than 1% of your customers to call support._

Again, in any realistic scenario, that little "to work out financially"
probably involves far more complicated considerations than some basic
probability distribution, and if this choice is being made by a random
programmer based on a simplistic model then you're doing it wrong.

------
cperciva
Apologies to people still waiting for me to grade their part 1 and part 2
answers -- it's a slower process than I hoped for; and I have a big stack
(1065 pages, in fact) of scholarship applications which I need to review
tonight, so I won't get any more marking done until tomorrow.

------
mstepniowski
Don't know whether it's good or bad, but these questions closely resemble the
ones I was asked during my final exam (Faculty of Mathematics,
Informatics and Mechanics at the University of Warsaw).

Of course there were a few nontrivial questions too, like the one about
fixed-point theorems.

------
nitrogen
I'm surprised by how many comments on cperciva's exam series boil down to
"I'll never use this!" Especially since this is Hacker News, almost ground
zero for intellectual self improvement. I can't claim to remember (or have
even learned, in some cases) much of this stuff, but decisions made without
any knowledge of e.g. calculus, probability, etc. will be very sub-optimal.

~~~
jasim
I'd love to know a few of those decisions that you have to make on a day-to-day
basis (or even once a month) as a software developer that need calculus
and probability.

~~~
nitrogen
Since much of my work involves audio and video, practically everything I do
can be thought of in mathematical terms. I'm not explicitly writing a function
that says "integrate this" or "differentiate that" (except for the Accumulator
and Differentiator plugins in my automation logic system[0]). I do write
things, both for fun and profit, that have patterns similar to integration and
differentiation.

Related to what I said in another comment[1], there's rarely a binary
influence from my (admittedly very limited) mathematical and theoretical
background, where a decision would go one way with it and another without.
Rather, my understanding of calculus and probability shapes the way I think
about and approach problems.

For example, when I see a problem that looks like one of gathering data and
accumulating a result, I can recognize that it looks similar to a numeric
integral, and apply what I know about integration to solving the problem.
Integrals and derivatives are also constantly in my mind when I think about
the physics of the world around me.
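A rough sketch of the "accumulate samples like a numeric integral" idea, for equally spaced samples: a running trapezoidal sum stands in for the integral and a backward difference for the derivative. These are generic textbook methods with hypothetical helper names, not the implementation of the Accumulator/Differentiator plugins mentioned above.

```python
def accumulate(samples, dt):
    """Running trapezoidal integral of equally spaced samples (starts at 0)."""
    total = 0.0
    out = [0.0]
    for prev, cur in zip(samples, samples[1:]):
        total += 0.5 * (prev + cur) * dt
        out.append(total)
    return out


def differentiate(samples, dt):
    """Backward differences (x[k] - x[k-1]) / dt, one per step."""
    return [(cur - prev) / dt for prev, cur in zip(samples, samples[1:])]


dt = 0.01
ts = [k * dt for k in range(101)]           # t from 0 to 1
area = accumulate([2 * t for t in ts], dt)  # integral of 2t is t^2
print(area[-1])                             # very close to 1.0
```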

Probability, especially Bayesian reasoning, also influences the way I perceive
and approach problems, both in software and in reality. Consciously thinking
about prior expectation combined with prior confidence, even if I'm not
thinking in numeric terms, helps me to understand how my mind works, know how
to communicate new, unintuitive ideas to others, and form more concrete
thoughts about my environment.

In summary, a very rudimentary understanding of calculus and probability
theory is necessary to understand and reason about (i.e. predict) the
physical world. Moreover, I have a general desire to learn all that I can and
integrate it into a web of concepts, which I reason about as a combination of
directed and undirected graphs, where concepts are vertices, relations between
concepts are undirected edges, and influencing concepts and prerequisite
knowledge are directed edges.

[0] <http://www.nitrogenlogic.com/docs/palace/> [1]
<https://news.ycombinator.com/item?id=4630006>

------
raverbashing
Well

Some think crypto is essential. Some think it's merely a tool and even though
having guidance in using these tools is important, no need to go to a lot of
details.

I could point out a lot of other mathematics that is important as well and that
99% of devs out there are oblivious to:

\- Newton-Raphson method

\- Gaussian elimination

\- Simplex method

\- Z transform

\- Bezier curves

\- Linear algebra in general

\- Galois fields
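As a taste of the first item, a sketch of Newton-Raphson applied to f(x) = x^2 - a, which gives the classic square-root iteration x <- (x + a/x)/2. Starting at or above sqrt(a) makes the iterates decrease monotonically; the tolerance and iteration cap are arbitrary choices.

```python
def newton_sqrt(a, tol=1e-12, max_iter=100):
    """Square root of a >= 0 via Newton-Raphson on f(x) = x*x - a."""
    if a < 0:
        raise ValueError("a must be non-negative")
    if a == 0:
        return 0.0
    x = max(a, 1.0)  # start at or above sqrt(a)
    for _ in range(max_iter):
        nxt = 0.5 * (x + a / x)  # x - f(x)/f'(x) simplifies to this
        if abs(nxt - x) <= tol * nxt:
            return nxt
        x = nxt
    return x


print(newton_sqrt(2.0))
```

Each step roughly doubles the number of correct digits once x is close to the root, which is why so few iterations are needed.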

------
jheriko
I see very few questions relevant to being able to ship software...
interesting for precisely that reason though.

------
jaimzob
:) This is surely just degenerating into parody now. I can't wait for part 4:
"Build a Turing Machine from scratch using paper-clips and sticky tape and
send it to me for grading. Remember to show your working and no looking it up
on Google."

~~~
irahul
Most of the questions are pretty trivial and relevant (maybe not to you, but
that doesn't make them irrelevant or a parody).

<http://news.ycombinator.com/item?id=4635946>

~~~
orangethirty
It is irrelevant because the real world is composed of CRUD apps in [Java, C#,
PHP], and you don't need to know most of the subjects the OP talks about in his
"tests." It is a parody, because the OP seems to not have worked a common
software position, where most time is spent building new UIs for marketing or
management, making complex and awkward joins, and making sure changes don't
break the spaghetti. It all sounds like academia talk, which is fine (and
valuable), but not a real sign of real world programming.

I will say that the tests have been fun to complete, and have helped me fill
in the gaps here and there. But as someone who is hiring programmers at this
very moment, I would not hire someone with such an approach to programming.
This person would (I assume from experience) write complex code all day to
show off his/her knowledge of advanced CS topics. Then not document it because
the code is just obvious to read. And finally quit after a month because the
job is not up to his/her standards or challenging. They would then write a
blog post ranting about how programming has turned into a circus or produce a
series of "tests" to show off their superior knowledge.

~~~
irahul
> It is irrelevant because the real world is composed of CRUD apps

> but not a real sign of real world programming.

<http://en.wikipedia.org/wiki/No_true_Scotsman>

Real world? As opposed to the OP's world which would be called what? The
Matrix?

> It is a parody, because the OP seems to not have worked a common software
> position,

And for some reason, only people who have worked in common software
positions can have opinions about software?

A good majority is making CRUD apps, yes, but the spam classification for
Gmail needs to be written, the binary data packing has to be done for Dropbox,
Facebook has to detect faces, that small startup doing a storage engine for
MySQL has to understand B-trees, automated translation has to understand n-gram
modelling, tarsnap has to make sure the data is secure, and so on and on.
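For the n-gram example, the core of the simplest model is just counting: estimate P(next word | previous word) as count(prev, next) / count(prev). A hedged sketch of an unsmoothed maximum-likelihood bigram model; real translation systems use longer n-grams, smoothing, and vastly more text.

```python
from collections import Counter


def bigram_model(tokens):
    """Unsmoothed maximum-likelihood estimates of P(next | prev)."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    prev_counts = Counter(tokens[:-1])
    return {pair: count / prev_counts[pair[0]] for pair, count in pair_counts.items()}


model = bigram_model("the cat sat on the mat".split())
print(model[("the", "cat")])  # 0.5
print(model[("cat", "sat")])  # 1.0
```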

> But as someone who is hiring programmers at this very moment, I would not
> hire someone with such an approach to programming.

Well, if all you are doing is writing CRUD apps, I don't see how someone like
OP is going to be even remotely a good fit. You need a mechanic, you hire a
mechanic; you don't go looking for someone who can design a V engine.

~~~
orangethirty
_Well, if all you are doing is writing CRUD apps, I don't see how someone like
OP is going to be even remotely a good fit. You need a mechanic, you hire a
mechanic; you don't go looking for someone who can design a V engine._

Funny that you wrote that. I used to be a mechanic, and have modified my share
of Porsche 911s. I have a couple that put out over 600 horsepower measured
at the wheels, almost double their factory output. I designed turbocharger
systems, intake manifolds (which was difficult because I didn't previously
know much about fluid dynamics and such), intercoolers, standalone fuel
injection systems, and aerodynamics with composite materials (mostly
fiberglass and carbon fiber). If you ask me right now about any of the
subjects I would draw a blank. Why? Because I learned how to do something back
then for a couple of projects, and then moved on. I did not need to know fluid
dynamics for my day job fixing cars. Neither did I need to know much math.
Just needed to know how to do the basics that the work required. Which were
mostly things learned in the field.

Same with programming. I started out writing programs that solved a problem,
then continued with that approach until I hit a wall due to a lack of
mathematical knowledge. I learned whatever I needed and moved on. Do I remember
most of the math I've had to learn? Not really. I don't use it every day. If I
have to use it again, I'll just go to my reference material and refresh my
memory.

Problem is that all these tests do is promote the idea that real-world
programming inside the matrix is about CS. It's not. Not knowing the answers to
the tests created by the OP does not make anyone a bad programmer. Hell, the
most productive programmer I know used to work with Visual Basic and
Excel/Access all day long. His code served thousands of users and he shipped
something out every week. When I asked him about big O notation he drew
a blank. But boy could he knock out software in a couple of days.

------
pja
This is fun. I'd forgotten how much of this stuff I used to know.

(ps, for Q3: if you know the formula for the Harmonic series then you can
answer this one very easily. If you don't, you're probably going to be a bit
stuck.)
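For reference, the partial sums of the harmonic series have no simple closed form, but H_n = 1 + 1/2 + ... + 1/n is well approximated by ln(n) + gamma + 1/(2n), where gamma is the Euler-Mascheroni constant (about 0.5772). A sketch comparing the direct sum with the approximation; Q3 itself isn't reproduced here, and this is just the standard asymptotic formula.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def harmonic(n):
    """Direct partial sum H_n = 1 + 1/2 + ... + 1/n (fsum for accuracy)."""
    return math.fsum(1.0 / k for k in range(1, n + 1))


def harmonic_approx(n):
    """Asymptotic approximation H_n ~ ln(n) + gamma + 1/(2n)."""
    return math.log(n) + EULER_GAMMA + 1.0 / (2 * n)


for n in (10, 1000):
    print(n, harmonic(n), harmonic_approx(n))
```

The error of the approximation shrinks roughly like 1/(12 n^2), so it is already within about 0.001 at n = 10.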

------
alexkus
Sent my answers in, reminds me how much I've actively avoided stats (I much
prefer pure Maths).

------
chuppo
I do not see how a web developer or DBA would feel better about themselves and
more confident in what they do if they knew the answers to these questions. It
isn't even related to the work a Java EE architect does.

~~~
cperciva
Since you say web developer... don't you think it's useful for web developers
to have some idea how much traffic is needed for A/B testing to produce
meaningful results?

~~~
pgsandstrom
Once you decide to do A/B-testing you only need to know enough about
probability to know that it is very hard to guess the amount of traffic
needed. That's when you google to remind yourself how to do the actual
calculations.
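For the record, the calculation usually looked up here is the normal-approximation sample-size formula for comparing two conversion rates. A sketch, assuming a two-sided test at significance alpha with the given power, equal traffic to both arms, and a sample size fixed in advance (no peeking); the 5% vs 6% rates are made up. It uses `statistics.NormalDist` from the standard library (Python 3.8+).

```python
import math
from statistics import NormalDist


def ab_sample_size(p_control, p_variant, alpha=0.05, power=0.8):
    """Visitors needed per arm for a two-sided two-proportion z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_variant) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_control * (1 - p_control)
                                      + p_variant * (1 - p_variant))) ** 2
    return math.ceil(numerator / (p_control - p_variant) ** 2)


# Hypothetical: detect a lift from a 5% to a 6% conversion rate
print(ab_sample_size(0.05, 0.06))   # a little over 8,000 per arm
print(ab_sample_size(0.05, 0.055))  # halving the lift roughly quadruples it
```

The required traffic scales with 1/(difference)^2, which is why small expected improvements need so much more traffic than intuition suggests.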

~~~
pja
Knowing that you don't know how to correctly calculate the traffic required to
reach a specific confidence for an A/B test makes you better than all those
people who don't even know that they don't know, and just wing it.

(edit: remove multiple calculates!)

