
Evidence That Computer Science Grades Are Not Bimodal [pdf] - moyix
http://www.cs.toronto.edu/~sme/papers/2016/icer_2016_bimodal.pdf
======
hyperion2010
Having worked with many of my fellow graduate students to try to teach CS
concepts to incoming students without any CS background, my conclusion is
that we simply don't know how to teach computer science. My suspicion is that
many people who learn CS do so in spite of attempts to teach them, not
because of instruction. Think about how long it took humans to figure out how
to teach reading to _everyone_. We had to break the old whole-word reading
method and switch over to phonics, and that was decades after we had hard
scientific evidence that phonics was the missing perspective needed to get
that last percentage of students to read. We are probably decades away from
understanding how to teach CS to everyone, let alone from getting those
techniques implemented.

~~~
GregBuchholz
>We had to break the old whole-word reading method and switch over to
phonics, and that was decades after we had hard scientific evidence that
phonics was the missing perspective needed to get that last percentage of
students to read.

[https://en.wikipedia.org/wiki/Phoenician_alphabet#Spread_of_...](https://en.wikipedia.org/wiki/Phoenician_alphabet#Spread_of_the_alphabet_and_its_social_effects)

~~~
taeric
I'm not clear on what the link is supposed to be getting me. Phonetics and
Phoenician are two different things. Right?

~~~
jdmichal
I agree that the post is severely underspecified. I think the point is the
last paragraph of the linked section:

> Phoenician had long-term effects on the social structures of the
> civilizations which came in contact with it. As mentioned above, the script
> was the first widespread phonetic script. Its simplicity not only allowed it
> to be used in multiple languages, but it also allowed the common people to
> learn how to write. This upset the long-standing status of writing systems
> only being learned and employed by members of the royal and religious
> hierarchies of society, who used writing as an instrument of power to
> control access to information by the larger population.[11] The appearance
> of Phoenician disintegrated many of these class divisions, although many
> Middle Eastern kingdoms such as Assyria, Babylonia and Adiabene would
> continue to use cuneiform for legal and liturgical matters well into the
> Common Era.

That is, Phoenician was the first widespread phonetic script. That, in turn,
had major effects on civilizations, as it was simple enough for commoners to
learn. _That_ is the rather tenuous link back to the original idea of
teaching reading based on phonetics.

Of course, there is a bit of a gap here, in that English phonetics are
notoriously unstandardized across the lexicon.

~~~
thaumasiotes
> Phoenician was the first wide-spread phonetic script. That, in turn, had
> major effects on civilizations as it was simple enough for commoners to
> learn.

The cuneiform of the time was, at heart, a syllabic script; while a phonetic
script is arguably simpler, it's not much simpler. The real problem with
cuneiform is not that it was syllabic instead of alphabetic, it's that you had
plenty of options other than syllabic representation, as well.

(And written Egyptian, the parent system of the Phoenician alphabet, was
already genuinely phonetic! It didn't have vowels, but neither did the
Phoenician script.)

All that said, that sort of shortcut-encrustation seems to develop naturally
within all writing systems. If I send someone "have a good day" and get back
"u 2", that's exactly the kind of shortcut that we blast ancient writing
systems for using. But that shortcut, in modern English, actually represents a
very high literacy rate, and use of the writing system by someone who can't be
bothered to observe its formal conventions. Under low-literacy conditions,
educated people tend to scrupulously observe whatever weird conventions their
writing system might have. That's how you know they're educated.

~~~
jdmichal
I'm assuming by Egyptian you mean Hieroglyphics. (They also used Hieratic and
later Demotic.) Hieroglyphics are a mixed phonetic-logographic system. Even on
the phonetic side, there are many ways to spell the same set of consonants,
thanks to the existence of digraphs and trigraphs and repetition. The vowels
probably differed, but those aren't written. The spelling for a particular
word was usually fixed, though.

Wikipedia has an excellent example using _nfr_ in its determinatives section:

[https://en.wikipedia.org/wiki/Egyptian_hieroglyphs#Determina...](https://en.wikipedia.org/wiki/Egyptian_hieroglyphs#Determinatives)

So, Hieroglyphics had phonetic components, but it was far from being a
phonetic abjad like Phoenician. (An abjad is an alphabet without vowel
markers.)

~~~
thaumasiotes
Hieroglyphics are a mixed phonetic-logographic system in the same way that
English is a mixed phonetic-logographic system, where the second person
pronoun can be represented by "you" or by the logograph u. How much is that
hurting our literacy rate?

~~~
jdmichal
"u" and "you" have exactly the same pronunciation /ju/. "2" and "two" and "to"
and "too" all have exactly the same pronunciation /tu/. That is, phonetically
there is no difference -- again, English has terribly inconsistent phonetics.
These shortcuts arose from texting applying downward pressure on the length of
words, through both maximum message size and ease of input.

Compare this to Hieroglyphics, where the glyph representative of a loaf of
bread could mean either an actual loaf of bread or the abstract phoneme /t/.
Specifically, a logogram represents an idea regardless of its pronunciation.
This is not what we see in "text-speak", where words were replaced with
shorter methods of achieving the same phonetics. Or, in other words, reading
the text "u 2" is /ju tu/, which is exactly the same as what you would read
with "you too". The matched phonetics are an integral part of this
replacement.

~~~
thaumasiotes
> Compare this to Hieroglyphics, where the glyph representative of a loaf of
> bread could mean either an actual loaf of bread or the abstract phoneme /t/.
> Specifically, a logogram represents an idea regardless of its pronunciation.
> This is not what we see in "text-speak"

"u" does not have the pronunciation /ju/, it has the _name_ /ju/.

And we specifically _don't_ see that Egyptian glyphs "represent an idea
regardless of its pronunciation". Look at the story of the decipherment:

> Champollion focussed on a cartouche containing just four hieroglyphs: the
> first two symbols were unknown, but the repeated pair at the end signified
> 's-s'. This meant that the cartouche represented ('?-?-s-s').

> Champollion wondered if the first hieroglyph in the cartouche, the disc,
> might represent the sun, and then he assumed its sound value to be that of
> the Coptic word for sun, 'ra'. This gave him the sequence ('ra-?-s-s'). Only
> one pharaonic name seemed to fit. Allowing for the omission of vowels and
> the unknown letter, surely this was Rameses.

(
[http://www.bbc.co.uk/history/ancient/egyptians/decipherment_...](http://www.bbc.co.uk/history/ancient/egyptians/decipherment_01.shtml)
)

That's the glyph "sun" being used because of the phonetic value of the word
"sun" coinciding with part of a name. The idea goes unused (at least, the
glyph is not marked with the logograph mark). That's precisely what we see in
text-speak.

(Side note: while the vowel of Coptic "sun" might be /a/, this is a rare case
where the ancient Egyptian vowel is known, and it is /i/. Fortunately, the
general omission of vowels makes this mistake irrelevant.)

> "2" and "two" and "to" and "too" all have exactly the same pronunciation
> /tu/

This is not correct. "to" is a clitic; its pronunciation differs from the
others, which means in particular that "to" and "2" have different
pronunciations. That doesn't stop "2" from substituting for "to" on occasion.

~~~
jdmichal
When you see an isolated letter "u" in English, you say /ju/. When you see the
word "you", you also say /ju/. This fact is what allows "u" to be a shorthand
or abbreviation for "you". Perhaps for you there is no distinction between a
shorthand and a logogram. But in that case, I don't see how you could avoid
also arguing that "you" is a logogram for the abstract idea of the second
person.

In reality, we distinguish logograms from alphabets by whether symbols
represent ideas or sounds. In the English example, "u" is being used as a
non-traditional phonetic digram /ju/. Your example from the Rosetta Stone was
a phonetic use of the symbol. There are also non-phonetic uses, where the
symbol represents the sun, regardless of its phonetics. This is why it's a
mixed system.

The Egyptians marked semivowels /j~i/ and /w~u/. Egyptologists use them as
vowels, because they have to use _something_ and that's as good a thing as
any. We have no idea how any ancient Egyptian word was realized.

Everyone I know pronounces all the variations of "to" the same. Maybe there is
a dialectal difference.

EDIT: I just had a thought. Let's use an English example where the shorthand
is not isolated. Let's look at "r8" for "rate". This is obviously _not_ a
logographic use of "8", as the "r" is still necessary for the meaning.
Instead, "8" is again being used as a non-traditional phonetic gram, the
trigram /eɪt/. The use of "u" and "2" is exactly the same; their use is as
phonetic grams and not logograms.

------
a_puppy
This is mostly unrelated to the main topic of the article, but this paragraph
caught my eye:

> For example, Padavic et al. [20] found that the "work-family" narrative in
> business is an example of a social defense: people will say that women leave
> the workplace because of "family", despite the large amount of evidence that
> women leave their jobs because of inadequate pay or opportunities for
> advancement [20], particularly when they see male co-workers promoted ahead
> of them. The "work-family" narrative is a more palatable explanation than
> confronting sexual discrimination in the workplace, and so the narrative
> continues.
>
> [20] I. Padavic and R. J. Ely. The work-family narrative as a social
> defense, 2013.

I tried to track down the work by Padavic et al. that Patitsas et al. are
citing, and found this:

[http://www.hbs.edu/faculty/conferences/2013-w50-research-sym...](http://www.hbs.edu/faculty/conferences/2013-w50-research-symposium/Documents/Gender_and_work_web_update2015.pdf)

However, this article by Padavic et al. is actually saying something different
from the way that Patitsas et al. summarized it. Padavic et al. appear to
actually be arguing that the work-family narrative is a social defense against
the uncomfortable truth of excessively long working hours for both genders at
the corporation, not against any uncomfortable truth about sexual
discrimination. Also, Padavic et al. don't seem to discuss sexual
discrimination in wages or promotions at all.

Did Patitsas et al. make an error in citing their sources? Am I missing
something?

~~~
danieltillett
Mis-citation occurs all the time. When I was a professional scientist, I could
trace a claim I was interested in back to its actual source less than 25% of
the time. About 50% of the time, the cited reference just cited another
reference, and so on until the chain broke. The other 25% of the time, the
cited reference said something completely different from what was being
claimed. Very frustrating.

~~~
tjl
During my Ph.D. I tried tracking down the reasoning behind a couple of sets of
assumptions made in non-uniform torsion of beams (e.g., twisting a beam with
one end fixed). It turns out that researchers just kept the same assumptions
from uniform torsion (twisting the beam with no ends fixed) without checking
whether they still held (they don't). All because nobody thought to check
whether the assumptions might be in conflict. The uniform torsion models go
back to the mid 1800s, while the non-uniform torsion models are from the early
to mid 20th century. So people have been working with wrong assumptions for
over 60 years.

------
mcguire
" _Are CS grades bimodal, or unimodal? To test this, we acquired the final
grades distributions for every undergraduate CS class at the University of
British Columbia (UBC), from 1996 to 2013. This represents 778 different
lecture sections, containing a total of 30,214 final grades (average class
size: 75)._ "

My understanding of the bimodal situation, if it exists, is that it primarily
applies to earlier classes---later classes only include those who did well, or
at least passed, the previous classes.

Ah, yes...

" _Of the 45 classes which were multimodal, 16 were 100-level classes (35%),
5 were 200-level (11%), 12 were 300-level (27%), and 12 were 400-level (27%).
For comparison, in the full set of 778 classes, 171 were 100-level (22%), 165
were 200-level (21%), 243 were 300-level (31%), and 199 were 400-level (26%)._
"

How about we take a closer look at those 100 level classes, hmmm?

~~~
nerdponx
I remember my real analysis professor explicitly telling us that he ran the
numbers on one of our exams and found bimodality. Real analysis is not what I
would call an earlier class. I'm not sure if we were an outlier or a
representative data point, however.

~~~
trendia
If there are two major topics in a test (say Topic 1 and Topic 2), then
bimodality could arise simply because

* some people studied both Topic 1 and Topic 2

* some people studied Topic 1 more than Topic 2

* some people studied Topic 2 more than Topic 1

This would cause bimodality even when inherent "genetic" skill is the same.
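A quick toy simulation shows how study allocation alone can split the score
distribution. All success rates here are invented for illustration, and it
additionally assumes the exam happens to weight Topic 1 more heavily than
students expected:

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical per-question success rates: a studied topic is answered
# correctly 90% of the time, a neglected one 30%, and splitting study
# time between both topics gives 70% on each.
P_STUDIED, P_NEGLECTED, P_SPLIT = 0.9, 0.3, 0.7
N_TOPIC1, N_TOPIC2 = 15, 5        # the exam leans heavily on Topic 1

def score(p1, p2):
    s = sum(random.random() < p1 for _ in range(N_TOPIC1))
    s += sum(random.random() < p2 for _ in range(N_TOPIC2))
    return s

scores = []
for _ in range(3000):
    style = random.choice(["both", "topic1", "topic2"])
    if style == "both":
        scores.append(score(P_SPLIT, P_SPLIT))
    elif style == "topic1":
        scores.append(score(P_STUDIED, P_NEGLECTED))
    else:
        scores.append(score(P_NEGLECTED, P_STUDIED))

# Crude text histogram: the Topic 2 specialists form a second, lower hump.
hist = Counter(scores)
for s in range(N_TOPIC1 + N_TOPIC2 + 1):
    print(f"{s:2d} {'#' * (hist.get(s, 0) // 10)}")
```

The Topic 2 specialists cluster around 9/20 while everyone else clusters
around 14-15/20, so the mixture shows two humps even though every simulated
student has the same underlying ability.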

~~~
elcritch
If this were turned into a more rigorous MCMC-style simulation and fed basic
societal trends for high school and early-stage college (courses taken,
interest, aptitude, etc.), you might be able to estimate the size of the
effect of these factors. Hmmm...

------
scott00
The whole premise of the survey part of this paper strikes me as very flawed.
The idea was "show some professors histograms we know are normal, and see if
they see bimodality that isn't there". The problem is, they didn't show them 6
random histograms of the normal distribution. For one, they didn't actually
show them normal distributions at all; they showed them normal distributions
capped at 100. That's going to give you a point mass at 100... making it a
bimodal distribution! Second, they didn't give 6 random examples of a capped
normal histogram; they gave 6 examples selected from a random set of
histograms in such a way that 4 of the 6 look bimodal. The survey participants
didn't see bimodality that wasn't there; they saw bimodality that was
purposefully generated by the histogram selection methodology!
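The point about capping is easy to check numerically. This sketch uses a
made-up mean and standard deviation, not the paper's parameters:

```python
import random

random.seed(1)

# Grades drawn from a normal distribution whose upper tail crosses 100,
# then clipped to the 0-100 range as a grading system would.
raw = [random.gauss(78, 12) for _ in range(10000)]
capped = [min(100.0, max(0.0, g)) for g in raw]

at_cap = sum(g == 100.0 for g in capped)
print(f"{at_cap} of {len(capped)} grades sit exactly at 100")
```

Roughly 3% of the mass piles up in a spike at exactly 100, which a bimodality
test (or an eyeballed histogram) can easily read as a second mode.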

------
nerdponx
Aren't final grades often "curved", i.e. re-graded according to rank? That
would explicitly destroy bimodality, making the final grade distribution
consist of non-iid order statistics.
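For instance, a strict rank-based curve (one common variant; not every course
curves this way) maps any raw distribution, bimodal or not, onto a fixed
grade ladder:

```python
import random
from collections import Counter

random.seed(2)

# A deliberately bimodal raw-score distribution: two clusters of students.
raw = [random.gauss(45, 8) for _ in range(150)] + \
      [random.gauss(85, 6) for _ in range(150)]

def curve_by_rank(scores, ladder="FDCBA"):
    """Assign letter grades purely by rank quantile, discarding the
    shape of the raw-score distribution entirely."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    grades = [None] * len(scores)
    for rank, i in enumerate(order):
        grades[i] = ladder[rank * len(ladder) // len(scores)]
    return grades

grades = curve_by_rank(raw)
print(Counter(grades))  # exactly 60 students per letter, by construction
```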

Also, it's commendable that they point out that they expect 5% false
positives, but frustrating that they don't go further and explicitly plan for
a multiple testing correction procedure. It seems that they don't need it to
fail to reject the null "meta-hypothesis", but still.

~~~
mcguire
The curving process should be linear, I think.

~~~
BeetleB
The curving process shouldn't exist.

If you make an exam that is difficult enough to need curving, you're getting a
poor measure of ability. This is because exams ask only a few questions, and
unreasonably _difficult_ exams result in low scores even from high-achieving
students. Low scores are very susceptible to noise (the delta between 50% and
60% is greater than between 85% and 95%).

If that doesn't convince you, take my argument to the obvious extreme. The
Putnam Competition in mathematics is tough. Sometimes over half of the people
score 0. Getting 1 question correct (out of 12) at times puts you in the top
20%.

Imagine I gave this as an exam to a math class of 20 students. One person
scores a 1. The rest score 0. Is it meaningful to curve this? I could correct
by giving partial credit: Some people get 0.25, others 0.5, and others 0.75,
so we now have 5 different grades. Should I just give A, B, C, D and F?

The lower the scores, the higher the effect of noise. It's a bad idea.
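A small simulation of this floor effect; the ability/difficulty model here is
invented purely for illustration:

```python
import random

random.seed(3)

def sim_scores(ability, difficulty, n_questions=12, n_students=2000):
    """Each question is answered correctly with a probability that
    shrinks as difficulty eats into ability (clipped to [0.01, 0.99])."""
    p = max(0.01, min(0.99, ability - difficulty))
    return [sum(random.random() < p for _ in range(n_questions))
            for _ in range(n_students)]

gaps = {}
for label, difficulty in [("moderate exam", 0.2), ("Putnam-hard exam", 0.9)]:
    bright = sim_scores(ability=0.95, difficulty=difficulty)
    average = sim_scores(ability=0.75, difficulty=difficulty)
    gaps[label] = sum(bright) / len(bright) - sum(average) / len(average)
    zeros = sum(s == 0 for s in bright) / len(bright)
    print(f"{label}: mean score gap {gaps[label]:.2f}, "
          f"{zeros:.0%} of bright students scored 0")
```

On the hard exam the two groups' mean scores nearly collapse together and
roughly half of even the bright students score 0, so the exam barely
separates the groups.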

~~~
smallnamespace
> Low scores are very susceptible to noise (the delta between 50% and 60% is
> greater than between 85% and 95%)

This seems untrue from information theory.

The entropy of a question is maximized when its probability of being answered
correctly is exactly 50% [1]. If your only goal is to have the _least_ amount
of measurement noise given a fixed number of questions, then you'll want each
question to be hard enough to filter 50% of the class out, and to minimize the
correlation between questions.

For example, 10 independent 50% questions reveal as much information as 16
independent 85% questions.
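That arithmetic checks out under the idealized independence assumption; a
two-line check:

```python
import math

def binary_entropy(p):
    """Bits of information revealed by one question that a student
    answers correctly with probability p."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(f"10 questions at 50%: {10 * binary_entropy(0.50):.2f} bits")
print(f"16 questions at 85%: {16 * binary_entropy(0.85):.2f} bits")
```

This prints 10.00 bits versus 9.76 bits, i.e. nearly identical totals. The
caveat is that real exam questions are far from independent.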

> Imagine I gave this as an exam to a math class of 20 students. One person
> scores a 1. The rest score 0. Is it meaningful to curve this?

You're looking at only 1 tail but ignoring the other. By symmetry, a question
that only 1 person gets correct tells you exactly as much as a question that
19 get correct. An exam with a 90% pass rate (before curving) is no better
than a 10% exam.

[1]
[https://en.wikipedia.org/wiki/Binary_entropy_function](https://en.wikipedia.org/wiki/Binary_entropy_function)

~~~
BeetleB
First, while you may have a point, you misunderstood my percentages. I did not
mean them to be the _pass_ rate, but rather the _score_.

> An exam with a 90% pass rate (before curving) is no better than a 10% exam.

I do not encourage a 90% pass rate, and I do not endorse exams where most of
the students score 90%. I'm saying an exam which allows for a large variation
in scores (anywhere from 10 to 100) yields more information. An exam where a
really brilliant student gets 50, with the next highest score being a 30 from
someone who is _only_ very smart, is less likely to yield useful information
about the majority of students, who will score 20 or less. A fairly bright
student and a fairly average student may both score a 20 on such a test - yet
the test failed to distinguish between them.

(BTW, I had an instructor whose exams were like this - I think I once had the
highest score at around 25-30 out of 100).

A less demanding, but not trivial, test will separate the average from the
brighter.

Regardless, why the need for a curve? Your grading system should not depend on
which students are present. It can lead to poor students getting a good grade
and smarter students getting a poorer grade - in different semesters, for the
very same tests.

~~~
aianus
> Regardless, why the need for a curve? Your grading system should not depend
> on which students are present. It can lead to poor students getting a good
> grade and smarter students getting a poorer grade - in different semesters,
> for the very same tests.

The distribution of skill between one 300-person calc class and the next is
going to vary much less than the difference in teaching styles and exam
difficulties. This is exactly why a curve is required -- so that your grade
reflects how well you perform relative to others doing the same thing instead
of some absolute level of competence that would vary from school to school,
prof to prof, semester to semester.

Consider an employer or graduate school admissions committee that needs to
decide who to interview. Looking at curved grades makes it easy to pick the
top X% of students, whereas looking at uncurved grades leaves a lot more to
chance (maybe a C was the highest grade in your section, but there was an
easier prof the following year where the highest grade was an A).

~~~
BeetleB
That works fine when you have classes with 100+ students. In the ones I
attended, it would range from 15 to 40 (the latter being considered high).
Lower numbers tend to be impacted more by noise.

> Consider an employer or graduate school admissions committee that needs to
> decide who to interview. Looking at curved grades makes it easy to pick the
> top X% of students

As an employer, I'm not interested in the candidate's ranking in the class.
I'm interested in their skills. One is often used as a proxy for the other,
but I don't use it that way.

As a student, I want feedback on how much knowledge I learned, not how I did
in comparison with the class. This was the original purpose of scoring tests.

Having gone through the PhD route, I know that "A" students who were always
focused on the metric of relative ranking, rather than knowledge acquired,
were eventually more likely to do a poor thesis or drop out, compared to "A"
students who were focused on acquiring knowledge.

This was most acute in students who came from top undergrad schools: a very
competitive background with heavy curving - and they would take their A as a
faulty indicator that they were "doing well". In grad school, even though the
courses are more challenging, most professors give A's and B's; only rarely
are C's given. The professors want to focus on learning and theses - grades
are a distraction. Suddenly these students were getting A's and thinking they
were doing well while not learning much. Their internal barometers were
measuring the wrong thing, so their research suffered.

~~~
aianus
> As a student, I want feedback on how much knowledge I learned, not how I did
> in comparison with the class. This was the original purpose of scoring
> tests.

My university automatically attached our grades to all internship
applications. It's pretty clear the purpose of grades (other than pass/fail)
is employer or grad school evaluation, not for student feedback. For better or
for worse.

~~~
BeetleB
Which is why in grad school many professors subvert this by never giving C's,
and only giving B's if you're fairly poor. They lost the battle for undergrad,
but grad school (at least in science/engineering) is still their domain.

------
yodsanklai
"We theorized that the perception of bimodal grades in CS is a social defense.
It is easier for the CS education community to believe that some students
“have it” and others do not than it is for the community to come to terms with
the shortfalls of our pedagogical approaches and assessment tools."

I didn't read the whole article, but I'm not sure I agree with the conclusion.
Even with a standard Gaussian distribution, one may still believe that some
students have it and some don't. Let's say there are consistently 25% of
students who are unable to pass their exams (for various reasons: lack of
interest, motivation, intelligence, discipline...); why would a "social
defense" be needed?

While I agree that we always should try to improve our pedagogical tools, I
don't believe that anyone can learn anything provided they have a good
teacher.

~~~
throwaway729
Read section 4 of the paper.

They construct data that _as a matter of mathematical fact_ is or is not
bimodal, and then ask people who do or don't believe in the "geek gene"
hypothesis to interpret the data.

Conclusion of the section: "We found a statistically significant relationship
between seeing-bimodality and participants’ responses to the questions
relating to the Geek Gene hypothesis"

So the "social defense mechanism" theory is definitely debatable, but
apparently people who believe in the "Geek Gene hypothesis" are more likely to
see bimodality where there is none.

> _I don't believe that anyone can learn anything provided they have a good
> teacher._

The point isn't that _everyone_ can learn _anything_. The point is just that
the distribution isn't as bimodal as people seem to think it is. Also worth
noting that the researchers posit _normal_ distributions, not _uniform_
distributions, as the alternative to bimodality.

~~~
Bartweiss
I'm less impressed by "wrong about rigged data" results than most people seem
to be. Yes, it implies that people bring assumptions to their answer, but is
that really shocking?

People with a high prior for a thing assume that new data is most likely to
conform to that prior. If you handed me a questionably-bimodal data set, my
judgement of its distribution would absolutely depend on what you told me the
data represented. Hopefully I'd answer right if I sat down and analyzed the
thing in depth, but if you simply go "what kind of distribution does this look
like?" then I'm going to include my outside expectations.

Yeah, there's a risk of bias here, largely from people not conserving
expected evidence (if you already assume the Geek Gene is true, unfavorable
data should count for more than favorable data). But saying "people used
experience to interpret new data!" doesn't seem like it's proving much.

~~~
ubernostrum
There's a "risk" of bias here? The point seems to be that the professors don't
have a good prior -- they have a bias and are confusing it for a prior, and
that's what was demonstrated by the experiment.

(that, and perhaps CS professors aren't particularly good at statistics)

~~~
Bartweiss
I disagree with the "bias not prior" claim.

The professors were shown constructed data, and primed to apply a prior to it
about grades. Prompting someone to use a prior inappropriately isn't the same
as revealing a bias.

I see what you're getting at (if they wrongly assessed these curves, why would
they be right about real data?), but there's a connective step that seems
missing. If you show me a histogram and ask if it's bimodal, I can't tell you
with certainty, so I have to guess. That's partly by shape, and partly by my
knowledge of what the data is. Misjudging an actively-misleading histogram
doesn't prove that I'm bad at assessing real data. (Since they just said 'yes'
or 'no', I wouldn't accuse them of being bad at statistics - they didn't
compute an answer.)

So _even if_ real CS grades are bimodal, I would have expected the 'priming'
result they found here. The professors were wrong, but this doesn't seem to
have much predictive value.

More importantly, though, this _whole study_ is bad. Section 3 is based on a
single university's final grades, with no discussion of whether assignment
grades were being curved before going into the final score (they are in at
least some universities). So it's entirely possible that they took _normalized
data_ and found that it had normal distributions.
several ways. "Assess this constructed data, maybe we lied about the origin"
isn't a direct analogue to "are your actual grades bimodal?". "Assess this
data _for bimodality_ " threatens a priming effect that's incomparable to
unprompted observations about grades. Priming effects are struggling to
replicate, and may simply not work the way these researchers expected at all.

The study does an admirable job of _acknowledging_ a lot of this, like the
researcher-priming risk, but it doesn't actually _remedy_ any of them. So I'm
not sure what to say except "any of these methods could be meaningless".

------
Animats
Programming tests amplify modality because they require success on multiple
items to succeed.

Consider teaching N skills. Success is binary, uncorrelated and everyone
randomly succeeds some high percentage P on each skill. Scoring is the number
of binary successes. You'll get a Gaussian distribution. That's most of
education.

Now suppose scoring is 1 if all N skills are learned. You'll get a bimodal
distribution. That's programming tested on whether the program runs.

~~~
aidenn0
There are two links discussing that in a blog post by the author of this
article [1] (second-to-last paragraph).

1: [http://patitsas.blogspot.ca/2016/01/cs-grades-probably-more-...](http://patitsas.blogspot.ca/2016/01/cs-grades-probably-more-normal-than-you.html)

~~~
Animats
Sadly, they're both paywalled.

------
makmanalp
I get especially mad when people suggest that there are some ideas that are
just special, that only special people can get. Pointers and recursion are at
the top of this list.

Take Joel Spolsky, a person whose ideas I read and respect:
[http://www.joelonsoftware.com/articles/GuerrillaInterviewing...](http://www.joelonsoftware.com/articles/GuerrillaInterviewing3.html)

> For some reason most people seem to be born without the part of the brain
> that understands pointers. Pointers require a complex form of
> doubly-indirected thinking that some people just can’t do

Really? Here's my 20 dollar challenge: give me a person and I can teach them
pointers. Pointers, referencing and dereferencing, null pointers, pointers to
pointers, the whole deal.

The problem is the pedagogy. Learning a new abstraction and a new way of
thinking takes _time_. It's easy to get it but not really /get/ it. You have
to work through a lot of examples, one on one with a person, with real-time
feedback, until they internalize it. Many, many simple examples at first, and
then examples of using it in a real-world context. Then more. If you already
know an abstraction, consciously or unconsciously, it's impossible to un-know
it, and the reaction when someone is struggling is "I explained it as simply
as I can and I don't know why they still don't get it". It's not like that.
It's like basketball: you can explain how to shoot, but really the student
needs to do it, and you comment while they're doing it.

~~~
ap22213
Absolutely - I'm sure that a bit of innate talent can improve one's rate of
learning, but 90% of the willing can do it. The best programmers that I've met
just loved the art and science of computing, had respect for their craft and
very high standards of quality, were undaunted or even motivated by new
challenges, but most importantly, put in the necessary time.

I doubt things have changed much since I went to college in the 90s, but back
then, the ones who did well at CS had either been hacking computers since the
Apple II or spent enormous amounts of time at the lab (or both). Also, when I
was at the Microsoft campus in the 90s, it was no coincidence that the most
expensive cars were the ones left in the parking garage the latest.

~~~
slmyers
> The best programmers that I've met just loved the art and science of
> computing, had respect for their craft and very high standards of quality,
> were undaunted or even motivated by new challenges, but most importantly,
> put in the necessary time.

This description doesn't conjure the image of a person unable to understand
pointers.

------
JoeAltmaier
A class of (maybe 30?) students is too small to produce a curve which is
unambiguously either normal or bimodal. It's always noisy. So if the CS profs,
with their thousands of hours of experience with students, chose the bimodal
interpretation, I for one tend to believe it.

------
k__
I don't know much about this, but I wasn't good at studying.

I'm totally living for development and everything to do with it, but most of
the formal stuff completely eluded me.

I failed every math lecture at least once.

Because of my projects and thesis I still finished with a 2.7 (on the German
grading scale), which isn't good but also not really bad. Maybe I would have
been one of those "college dropouts who followed their dream" if companies in
Germany didn't care so much about degrees.

Also I never had the impression our grades were bimodal.

Yes, about 40% dropped out, and yes, there were a few "eager beavers", but
most were simply okay enough to finish their degrees.

~~~
groundhogday1
I'm about the opposite of you. Not a good programmer, but I excelled at the
'formal stuff': automata theory, regular expressions, compilers, operating
systems, discrete math... these are all areas where I did well. Labs and
programming assignments held me back. Although I will say that I feel
infinitely more comfortable working with memory allocation in C than I do with
anything involving JavaScript.

I also don't think the overall grades were bimodal. I guess I was in the
second mode in programming-intensive courses and the first mode in
theory-heavy courses. I also feel that most of my peers were either the same
as me or the reverse, like you. Wouldn't that create a single-mode bell curve?

Of course, this is all purely anecdotal

------
toomanybeersies
Oh, useful anecdote time!

I was talking to a professor a while back about this, and he reckoned that
when the University of Canterbury (in NZ) switched from Java to Python for
their 100-level papers, the grades went from bimodal to unimodal. Not sure
why that is, and I don't remember him giving me a reason, but I guess it's
fair to assume that a lot of students were really struggling with the initial
learning curve of Java (boilerplate, static typing, more difficult array and
string manipulation) compared to Python, and either just not getting it, or
getting frustrated and giving up.

Now I didn't see any numbers to back up his claims, as it was a casual
conversation, but I'd be inclined to believe him that they were bimodal.

Anyway, the point I'm making is that the study here covers only one
university; different universities have different curricula and different
teaching styles, which could affect whether the distribution is bimodal or
not.

------
lifeisstillgood
I am not sure we should assume bimodality in programming means we are all
special snowflakes. That's a dangerous and self-serving interpretation.

It seems bimodality is correlated with "does it even compile" - the first
hump is those who can't get their code to compile, and the second is those who
can, who then spread out normally by ability.

I would conjecture that the first hump would be seen in any educational
environment where, for example, we took illiterates and made them write essays.
Those who had tried reading and writing in high school would have a better
chance of putting ink on paper.

We are just seeing an artifact of software not being taught early enough to
everyone.
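The mixture idea above can be sketched numerically (my own illustration, not from the thread; the 70/30 split and both score distributions are invented for the example): lump together a "never compiled" group clustered near zero with a normally distributed "it compiled" group, and the combined grades come out two-humped even though ability within the second group is perfectly normal.

```python
# Sketch: a point-mass-near-zero group plus a normal group yields a
# bimodal overall grade distribution.
import random

random.seed(1)

def simulated_grade(p_compile=0.7):
    # Hypothetical split: 70% get something compiling and spread out
    # normally; the other 30% cluster near zero with scraps of partial credit.
    if random.random() < p_compile:
        return min(100, max(0, random.gauss(75, 10)))
    return max(0, random.gauss(10, 5))

grades = [simulated_grade() for _ in range(10_000)]
low_hump = sum(g < 40 for g in grades) / len(grades)
print(f"{low_hump:.0%} of grades sit in the low hump")
```

Under these assumptions the low hump simply mirrors the fraction who never got a compile, not a second "kind" of student.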

Edit - how does my spelling corrector turn bimodality into bumps skirt...

------
htns
They don't consider participation or hours spent studying as explanations for
bimodality. This makes their method of determining belief in the "geek gene"
questionable, and could invert the moral story of their results: Perhaps
lecturers seeing low lecture attendance would like to believe their grade
distributions are bimodal, but the data shows the lecturers' impact is at best
a tiny factor among many, and grades are determined by factors out of the
lecturers' control, such as students' ability.

------
rcheu
I've never heard that computer science grades are bimodal, I thought it was
that performance in programming is bimodal?

It's always been relatively obvious to me that grades were not bimodal since I
could just click and see the grade distribution, which didn't appear to be
bimodal.

------
losteverything
"Your scores are bimodal!!!!" trying to get the "so what?"

Does that say something (e.g ,bad instructor _) or does it support something (
idea that 50% of students should not even be taking course, for example)

_ Organic Chem class was bimodal which led to dismissal of instructor.
Different instructor meant better learning and a different distribution.

------
moyix
Folks may also be interested in the accompanying blog post by the first
author:

[http://patitsas.blogspot.ca/2016/01/cs-grades-probably-more-...](http://patitsas.blogspot.ca/2016/01/cs-grades-probably-more-normal-than-you.html)

------
Bartweiss
So... This is a _bad_ study, top to bottom.

Section 3 is based off a single university's final grades.

- There's no discussion of whether assignment grades were being curved before
going into the final score (they are in at least some universities). So it's
entirely possible that they took normalized data and found that it had normal
distributions.

- Even if it's raw data, bimodality would appear per-assignment. Averaging
problem sets with papers with tests, and assignments of varying difficulty,
threatens to obscure any task-level bimodality.

- Finally, the paper acknowledges the risk of university-specific effects,
but can't actually adjust for them. So Section 3 may not generalize at all.

Section 4 is unconvincing in several ways.

- The participant selection was open solicitation from multiple forums, with
a high dropout rate after providing tests. That screams selection bias, but
isn't acknowledged.

- "Assess this constructed data based on an inaccurate origin" threatens to
carry over a prior of "all my real grades are actually bimodal" to the sample
data. This would produce the observed results regardless of whether professors
are misjudging normal data. This risk would be irrelevant if grades aren't
bimodal, but Section 3 didn't sell me on that.

- "Assess this data for bimodality" threatens to prime seeing bimodality for
any set of data. This renders it incomparable to unprompted observations about
grades.

Supporting Literature is a disaster. It's a hit parade of papers and topics
which have failed, or may yet fail, to replicate. The male/female candidates
paper has a sibling which found the _inverse_ result. The weapon-bias study
hasn't been found predictive outside the original study. The "brilliance-
requiring disciplines" literature is such a disaster that there are
multi-thousand-word essays breaking down why it's unconvincing.

I'd say that's not the author's fault, but this is dated September 2016. The
male/female candidates result _at least_ should have come with an
acknowledgement that other studies have found completely different outcomes.

_In summary:_ The data analysis here is fragile and unconvincing. The human
studies misuse unproven effects to draw unsupported conclusions. The
supporting literature grounds the preceding mess on other studies which were
suspect by the time the paper was written.

------
cdevs
I hated trying to read about pointers in C when I was about 15; a few years
later a different book put it more clearly, and it all snapped into place for me.

------
groundhogday1
So the assumption being made here is that students either get it or they don't,
based on genetic predisposition? I, for one, don't particularly subscribe to
the notion that every CS student needs to score an INTP on the Myers-Briggs
to be successful.

~~~
Practicality
ENTP makes a better developer anyway. :D

(In case anybody doesn't get the joke, ENTP is the "debater" type, so I am
picking a fight about personality type, since that is what the ENTP is
supposed to do, so self-referential humor:
[https://www.16personalities.com/entp-personality](https://www.16personalities.com/entp-personality))

