
The Average Student Does Not Exist - ibrahima
https://blog.gradescope.com/the-average-student-does-not-exist-bc885a818145
======
forgotpwtomain
Somehow, despite all the conversations around education in the US, the
education system still sucks. I went to one of the highest-funded (by amount
spent per child) public schools in my state, and as far as I am aware it was
far behind, in terms of curriculum strength, what my parents were taught in
the Soviet Union at the same age.

I mean we didn't read a classic American author till 6th or 7th grade! And if
I recall correctly there were still M&M's in math class in grade 4!

The US may have an education problem, but somehow the Soviet Union and China
did fine years ago without all the ed-tech snake oil.

~~~
bluGill
> in the US the education system still sucks

citation?

Education is a complex matter. There are many people with OPINIONS on what the
best way to teach is. These ideas are in conflict, and only rarely does anyone
study what really works. (Rarely, that is, compared to the number of opinions -
there could be a lot of studies that nobody knows about when they state their
opinion.)

Humans have a limited lifetime: you cannot teach all possible useful
knowledge/skills in a lifetime. I limited this to useful; there are a lot of
useless things that are fun to know anyway, and those who are interested need
to find time to learn them for fun. I didn't define useful either: is
Music/French/Algebra/Sports... useful? (I can make either argument for any
subject.)

Why is reading a classic American author important? Reading is important in an
abstract sense, but if you can understand written instructions it doesn't
matter what you happened to read to get that skill.

Likewise, what is wrong with using M&Ms for learning math? A concrete example
helps learning. (To be clear, this is one of the opinions I was ranting about
in the first paragraph - I don't know if I agree with it, but I understand it
well enough to repeat it.)

One constant in US popular culture is that our education system sucks compared
to X. We have done well over the years despite that (or maybe because of it?)

~~~
forgotpwtomain
> citation?

[http://www.businessinsider.com/pisa-worldwide-ranking-of-
mat...](http://www.businessinsider.com/pisa-worldwide-ranking-of-math-science-
reading-skills-2016-12)

> I didn't define useful either:

> but if you can understand written instructions it doesn't matter what you
> happened to read to get that skill.

Of course you are free to define _useful_ in a way that makes it impossible to
argue or to have a discussion. So let's stick to the way it is defined for the
purpose of, say, university admission.

> Why is reading a classic American author important?

Reading difficult work earlier develops higher reading comprehension faster.

> Likewise, what is wrong with using M&Ms for learning math?

I think if by 4th grade you still need concrete pieces to understand integers
or the denominators of a fraction or whatever they were supposed to represent,
that is a sign of a weak math education. In general, concrete examples are
antithetical to learning advanced math; they lead to the monkey-style ability
to solve problems that are similar to ones presented in textbooks, but not the
ability to reason effectively about unfamiliar problems.

~~~
bluGill
> Reading difficult work earlier develops higher reading comprehension faster.

What makes a classic American author better than a modern author who writes at
a high level? (Note that most popular authors don't write at a difficult
enough level, but out of the thousands of books published each year some will
be high enough - many authors of old did not write at a high enough level
either)

~~~
slaman
Classics are classics for a reason: they've stood the test of time and
scrutiny as literature of value.

Reading level of the material aside, I think it's more valuable to read The
Catcher in the Rye than The Hunger Games because of the subject matter and
impact on popular culture.

Classic literature is genre defining and gives you appreciation for the art of
novelization.

It's hard to gain an initial appreciation for reading if you don't enjoy the
reading you do, which is a good argument for the bestseller list, but it's
hard to gain any depth of appreciation without understanding its roots.

You might say you like hip-hop because Lil Yachty made your head bounce on the
radio, but without listening to N.W.A. you can't really say you understand it.

------
closed
To be honest, I like that this article tries to perform simple analyses, but I
find their rationale pretty confusing.

This kind of data is commonly modeled using item response theory (IRT). I
suspect that even in data generated by a unidimensional IRT model (which they
are arguing against), you might get the results they report, depending on the
level of measurement error in the model.

Measurement error is the key here, but is not considered in the article. That
+ setting an unjustified margin of 20% around the average is very strange. An
analogous situation would be criticizing a simple regression, by looking at
how many points fall X units above/below the fitted line, without explaining
your choice of X.

~~~
arjun810
Totally agree that this is not a fully rigorous analysis, and we do want to
dig deeper and try to extend some IRT models to these types of questions.

The main point of this post is to highlight that the most common metric of
student performance may not be that useful. Most of the time, students will
get their score, the average score, and sometimes a standard deviation as
well. As jimhefferon mentioned in a response to a different comment, the
conventional wisdom is that two students with the same grade know roughly the
same stuff, and that seems not to be true.

We're hoping to build some tools here to help instructors give students a
better experience by helping them cater to the different groups that are
present.

disclaimer: I'm one of the founders of Gradescope.

~~~
closed
I agree with your point, that the average likely misses important factors (and
think the tagging you guys are implementing looks really cool!).

However, I'd say that the issue is more than having a non-rigorous analysis.
It's the wrong analysis for the question your article tries to answer. In the
language often used in the analysis of tests, your analyses are essentially
examining reliability (how much do student's scores vary on different test
items due to "noise"), rather than validity (e.g. how many underlying skills
did we test). Or rather, they don't try to separate the two, so cannot make
clear conclusions.

I am definitely with you in terms of the goal of the article, and there is a
rich history in psychology examining your question (but they do not use the
analyses in the article for the reasons above).

------
jjaredsimpson
Does this article say anything more profound than, "If you roll 10 dice,
you'll expect a score of 35; however, any pair of rolls which sum to 35 is
unlikely to be similar"?

All the worst students will be very similar and all the best students will be
very similar because the number of available states is low. Average students
are all unique in their average-ness.

Am I missing some subtle statistical understanding that the toy example
doesn't capture?
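The toy example is easy to check empirically. Here's a quick sketch (mine, not
from the article) that finds two rolls of 10 dice summing to 35 and compares
them:

```python
import random

random.seed(0)

def roll_summing_to(target, n_dice=10):
    # Keep rolling n_dice six-sided dice until the sum hits the target.
    while True:
        roll = sorted(random.randint(1, 6) for _ in range(n_dice))
        if sum(roll) == target:
            return roll

a = roll_summing_to(35)
b = roll_summing_to(35)
print(a)
print(b)
print("same faces:", a == b)  # usually False: equal sums, different rolls
```

Two "average" rolls almost never share the same multiset of faces, which is
the whole point.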

~~~
jimhefferon
I think the article's contention is that on-the-ground teachers expect that
two people coming out of a high school Algebra II with C+'s are similar.
(Certainly that is my working hypothesis.) The article argues that it ain't
so.

~~~
jjaredsimpson
The sets of dice which have equal sums will often have different constituent
values.

~~~
jimhefferon
The article's contention is that on-the-ground teachers expect that two people
coming out of a high school Algebra II with C+'s are similar.

~~~
arjun810
This is exactly what we think is a fairly common attitude -- thanks for
stating it so clearly! It has ramifications both within a single class and
when you think about how prerequisite and dependent classes are structured.

~~~
jimhefferon
How do you think it could be done differently? Students need to be judged to
decide who moves ahead. That is, I have people in Calc I and I have to decide
who moves on to Calc II. I can't send the next instructor a poset of their
competence. I cannot require that everyone be competent at everything. I
wonder what your proposal is?

------
tpeo
>Out of 4,063 pilots, not a single one fell within the average 30 percent on
all 10 dimensions.

I wondered about a very similar problem some weeks ago. I was bothered about
the terms "ectomorph" and "mesomorph" because they seemed useless once you
considered height: the vast majority of "ectomorphs" seemed to be taller than
the average while the vast majority of "mesomorphs" seemed to be of average
height, so there's no point to these words. And so I wondered how shoulder
width would change given height (which seems to have some kind of
"decreasing returns"), and how the average measures would relate to actual
average build. I mean, is the "average guy" really the guy with the average
height and average shoulders? Because it's not as if the scale had just
changed, like doubling the size of a cube, but there seems to be some
deformation going on as well.

Anyway, I didn't get past the wondering phase at the time. But I think it's
too important a problem to be casually thrown in as part of a pitch. I don't
see an immediate reason why the average tuple should be the tuple of all
averages, because some of the variables might be "dislocated" and thus not
coincide with the averages of other variables. Some guy might be very close to
average height yet still somewhere in the left-tail when it comes to body
mass, shoulder width or any other measure. So there might be a typical
student, but I don't think this is the way to find him.

~~~
forgotpwtomain
As you say, these definitely aren't uncorrelated dimensions - otherwise we
would have expected roughly 0.68^10 × 4,063 ≈ 90 pilots within one stdev on
all 10 dimensions. So this simplified metaphor really isn't telling us
anything about how statistics apply to students.
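For what it's worth, the expected counts under an independence assumption are
easy to compute (a back-of-envelope sketch; 0.6827 is P(|Z| < 1) for a
standard normal, and independence is the assumption being tested):

```python
# Expected number of pilots passing an "all 10 dimensions" cut out of
# 4,063, assuming independent, normally distributed dimensions.
n_pilots = 4063
n_dims = 10

p_within_1sd = 0.6827   # P(within 1 stdev of the mean) per dimension
p_middle_30 = 0.30      # the article's "average 30 percent" band

print(n_pilots * p_within_1sd ** n_dims)  # roughly 90 pilots
print(n_pilots * p_middle_30 ** n_dims)   # roughly 0.02 pilots
```

Note the band matters a lot: for the narrower middle-30% band the expected
count is near zero even under full independence.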

------
connoredel
There is an analogy to clustering (an unsupervised learning technique) here.

Take the simple case of 2 dimensions (each observation is plotted in 2D space)
with possible values of 0-10. Let's say the extreme (far from average) space
is within 5% of the border. The total extreme area is (10x10)-(9x9) = 19 (i.e.
19%). Now add a 3rd dimension. The extreme "volume" in 3d space is now
(10x10x10)-(9x9x9) = 271 (i.e. 27%). You can see where this is trending. Add
enough dimensions, and every observation is now "extreme." They become so far
apart that each observation almost deserves its own cluster, and you lose any
idea of similarity.

Back to this particular article: when you _add_ (or average) all of the
dimensions -- like you do on an exam -- suddenly they are close again.
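The trend is quick to tabulate; a minimal sketch of the same shell
calculation, generalized to n dimensions:

```python
# Fraction of a [0, 10]^n cube lying within 0.5 of some face (the
# "extreme" shell above): 1 - 0.9^n, since each interior side is 9/10.
for n in (2, 3, 10, 50):
    print(n, round(1 - 0.9 ** n, 3))
# prints 0.19 for n=2 and 0.271 for n=3, matching the areas above
```

By n=50 essentially every point is "extreme."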

~~~
vlasev
Here's another look. If you have variables X_1, ..., X_n that are independent
and normally distributed, and you want someone to be within 1 standard
deviation of the mean in EACH dimension, then the probability of that
happening is about 0.68^n, which becomes really small for even moderate n.
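A quick Monte Carlo sketch of that figure, under the same independence
assumption (the function name is mine):

```python
import random

random.seed(1)

def frac_within_1sd_all(n_dims, trials=100_000):
    # Fraction of random points within 1 stdev of the mean on every
    # dimension, for independent standard normals.
    hits = sum(
        all(abs(random.gauss(0, 1)) < 1 for _ in range(n_dims))
        for _ in range(trials)
    )
    return hits / trials

for n in (1, 5, 10):
    print(n, frac_within_1sd_all(n), round(0.6827 ** n, 4))
```

The simulated fractions track 0.68^n closely: already by n=10 only about 2%
of points qualify as "average on everything."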

~~~
fats_tromino
This is the most succinct and clearest explanation of what's going on. I see
this discussed a lot when people talk about the curse of dimensionality.
Another very simple example is an n-hypercube with edge length 1/2 embedded in
the unit n-hypercube. As n increases, the volume of the unit hypercube stays
constant (1), whereas the volume of the smaller hypercube (2^-n) decreases
exponentially.

------
fnovd
A silly headline.

According to the article, the average _person_ doesn't exist, either. I don't
know many people that are 13% fluent in Mandarin, 13% fluent in English, 9%
fluent in Hindi... At the same time, having ~2 hands and ~10 fingers seems
about right. Some metrics work with averages, some don't.

~~~
RangerScience
Right, but number of hands and fingers doesn't form a bell curve in the first
place.

~~~
AstralStorm
Grades don't either; their composite is at most beta-distributed, and probably
not even that.

First of all, the range is finite: there is a minimum and a maximum. Second,
questions tend to be internally correlated. (After all, they correspond to
subjects.)

Third, students are not expected to be average, but to pass all the questions.

------
PotatoEngineer
This question of "what skills are students missing?" reminds me of the new
teaching methods they were trying out as I started high school. The new
teaching program centered around objectives. The idea was that each objective
was a skill that the student needed to learn, but the upshot was that you had
to score more than 70% on every single quiz to pass the class, and that you
could retake every quiz you failed, repeatedly.

The implementation varied between classes - in my World History class, there
were a large number of objectives, and each objective was met by a small quiz
that tested ~one skill. (There were a _lot_ of retaken quizzes in that class.)
In Biology, there were about 10 objectives for the entire semester, so you
could still pass while missing a few small skills, as long as those missing
skills were spread out among different units.

My high school used that "objectives" system less and less as I moved up the
grades - I assume that most teachers got tired of it pretty quickly and just
decided to make their usual teaching material "look like objectives" rather
than rebuild their curriculum in later years.

~~~
bitwize
This sounds like Outcome Based Education -- one of many American education
boondoggles. Good riddance to it.

~~~
PotatoEngineer
Outcomes! Right, that's what they were called. Thanks for naming it.

------
opportune
I don't like the way this headline is written to match the article. All they
showed is that students with similar average scores over multiple questions
differed in their scores on individual questions. That is kind of obvious.

------
timemachiner
This makes me wonder. What is the "best" way to teach computer science to
students? Universities are not trade schools (nor should they be), but it
seems apparent that CS graduates in general are unprepared entering the
workforce. The other extreme (bootcamps) seem to produce graduates that are
more "industry ready" but only at a superficial level. These graduates seem to
lack rigor/theory. Makes me wonder if there is a more optimal path for
training students.

~~~
maccard
Almost certainly an apprenticeship of some sort. You wouldn't hire a
mechanical engineer and expect him to do a mechanic's job straight out of
university, which is in essence what you're asking people with CS degrees to
do.

~~~
rockostrich
Internships/Co-ops basically serve this purpose already. As a mechanical
engineer, if you haven't had any internships during your undergrad then you
are going to have a very hard time finding a job after graduation. The same
goes for software in my experience.

~~~
maccard
If the problem is already solved, then why are people still saying it's not
(e.g. this post)?

I'm an engineering grad (as opposed to a CS grad). Most of the people who
graduated from our mech eng course studied thermodynamics, control systems,
fluid mechanics, acoustics.

Most of those people are now working jobs where they use those skills (or some
of them) day to day. A CS grad studies algorithms, discrete math, fuzzy logic,
compilers, possibly some networking/telecoms. And day to day, most CS grads
are writing CRUD apps/gluing APIs together.

------
pacaro
If my memory and understanding are correct, the way that Mathematics is graded
at Cambridge is interesting here.

Questions are scored _alpha_ for a completely correct solution, _beta_ if the
examinee demonstrated that they knew what they were doing but maybe made some
small mistake, and _gamma_ for a reasonable effort.

The bare minimum pass mark is one alpha.

~~~
digikata
That sounds very interesting, but I'm curious about the pass mark criteria. Is
there some larger number of betas and gammas that can also pass? Otherwise, it
seems like beta and gamma scoring gives some nice measures for use by
teachers/students, but if passing ultimately relies only on alphas, it's a lot
like any other math scoring.

It becomes a little like companies saying they value x & y, but take action
only aligned to z.

~~~
pacaro
Each problem in the exam is worth some number of points (20). I think the aim
is to ensure that you can't pass by mediocre performance across many problems:
a bare pass indicates that you have retained enough knowledge/ability to get
basically correct answers on two problems.

Explicitly the aim is to eliminate students who haven't deeply understood some
aspect of the curriculum, so accumulating lots of partial results is exactly
what they don't want.

It's worth noting that out-right failure is extremely rare and subject to an
appeals process etc. Partly this is because this is a set of exams at the end
of each year of instruction with no mechanism for a re-sit, so a student who
fails will not graduate (the system isn't totally barbaric, there are
mechanisms in place to handle health related concerns etc.).

------
QML
When I read that the data was collected from 1500 CS finals, my immediate
guess was that the class was CS61A.

---

I suspect that the shape of the distribution depends on the subjectiveness of
the test and on the grading: whether the questions are ones you either know or
you don't, and how much partial credit graders are willing to give.

------
bryanrasmussen
what about the 10X student?

~~~
suyash
That I'm not sure about but a 10X Engineer is for real :)

------
crimsonalucard
Given a large enough sample size, I'm sure you'll find such a student.
Additionally, you will have plenty of students above the average and plenty
below it. Performance below or above average matters because student
performance is ranked, while cockpit dimensions are not.

------
suyash
Average is just a statistical concept - in reality there is no average.

