
MIT and Harvard release de-identified learning data from open online courses - kercker
http://newsoffice.mit.edu/2014/mit-and-harvard-release-de-identified-learning-data-open-online-courses
======
minimaxir
Working up a few quick statistics:

\- Of the people that finished a course, male students had an average grade of
83.8%, female students had an average grade of 82.7%.

\- The correlation between the grade of students who finished a course and the
usage of supplemental materials is positive, but weak (0.16 for each variable,
except for forum posts, which are completely uncorrelated with grade).

\- The easiest course was HarvardX/CS50x/2012, with a perfect 100% average
grade from all students who finished the course [EDIT: this course has
pass/fail assignments]. The hardest was HarvardX/CB22x/2013_Spring, with an
average score of 73.3% (the class, unsurprisingly, is about Ancient Greek
Heroes)

\- The course with the highest completion rate (# completed / # registered) is
MITx/14.73x/2013_Spring at 7.48% (The Challenges of Global Poverty). The worst
completion rate is HarvardX/CS50x/2012 at 0.07% (Introduction to Computer
Science I, note that this is also the most registered course by a large
margin)

\- All classes have more male students than female students. The class with
the highest Male/Female ratio is MITx/2.01x/2013_Spring at 17:1 (Elements of
Structures). The class with the lowest ratio is HarvardX/PH278x/2013_Spring at
1.03:1 (Human Health and Global Environmental Change)

Let me know if you have any statistical ideas.
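
For anyone who wants to reproduce the per-course numbers above, here is a rough sketch using only the Python standard library. The column names (`course_id`, `gender`, `grade`, `certified`) and the choice to treat `certified == "1"` as "finished" are assumptions, not something confirmed in this thread, so check them against the CSV header first. The sample rows are made up.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows. The column names are ASSUMED, not taken from the thread.
SAMPLE = """course_id,gender,grade,certified
A,m,0.9,1
A,f,0.8,1
A,m,0,0
A,m,0,0
B,f,1,1
B,m,0,0
"""


def course_stats(rows):
    """Per-course completion rate and male/female ratio."""
    registered = defaultdict(int)
    completed = defaultdict(int)
    males = defaultdict(int)
    females = defaultdict(int)
    for row in rows:
        course = row["course_id"]
        registered[course] += 1
        # Assumption: "finished" means the certified flag is "1".
        if row["certified"] == "1":
            completed[course] += 1
        if row["gender"] == "m":
            males[course] += 1
        elif row["gender"] == "f":
            females[course] += 1
    stats = {}
    for course, n in registered.items():
        stats[course] = {
            "completion_rate": completed[course] / n,
            # None when a course has no female students in the data.
            "mf_ratio": males[course] / females[course] if females[course] else None,
        }
    return stats


def mean_grade_by_gender(rows):
    """Average grade among finishers, keyed by gender value."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if row["certified"] != "1":
            continue
        try:
            grade = float(row["grade"])
        except ValueError:
            continue  # skip blank or non-numeric grades
        totals[row["gender"]] += grade
        counts[row["gender"]] += 1
    return {g: totals[g] / counts[g] for g in counts}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(course_stats(rows))
print(mean_grade_by_gender(rows))
```

To run it on the real file, swap `io.StringIO(SAMPLE)` for an open file handle on the CSV.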

~~~
cpks
Regarding easiest vs. hardest, while I'd tend to agree with the conclusion,
the logic is just plain wrong. Final course scores depend more on grading
structure than anything.

Many of the best courses use mastery learning. You can keep trying until you
get it. This usually leads to 100% averages, actually quite well-deserved,
since all students learn everything.

Heroes was a course which encouraged different levels of participation. It's
probably the best MOOC ever run, but explicitly designed so you'd get out what
you put in, and to accommodate all levels. Many students chose to put less in.

~~~
minimaxir
The course has 1287 graduates, all with a perfect score, so I believe your
conclusion is correct. Edited OP.

------
danso
FYI, just in case you don't want to go through the (short) signup process to
see what's in the data.

This is what the first 10 lines of the CSV file look like:

[https://gist.github.com/dannguyen/4d372986b74ff087927a](https://gist.github.com/dannguyen/4d372986b74ff087927a)

Here's the output of `wc` (lines, words, bytes):

    641139 1092627 70165566 HMXPC13_DI_v2_5-14-14.csv

Here's a mirror of the file [9.6MB]:
[http://danwin-files.s3.amazonaws.com/data/nf/HMXPC13_DI_v2_5-14-14.zip](http://danwin-files.s3.amazonaws.com/data/nf/HMXPC13_DI_v2_5-14-14.zip)

------
svedlin
There's been a lot of discussion about the "completion rates" in online
courses. A study in January reported that "only 4 percent of people who
register for MOOCs actually finish them" but that "MOOCs still have
considerable impact" because "nearly two-thirds got at least something out of
the experience." [1]

One possible solution to this - instead of measuring progress against a single
completion date, split courses up into a continuous series of milestones or
smaller units. A student could cover 1 or more units.

While it's true some courses are considered prerequisites for others, the
requirements could be made more granular (e.g.: course B unit 4 requires
course A units 5 and 6). Discovering these dependencies could potentially be
automated by text data mining the course material.
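
As a rough illustration of unit-level prerequisites, the dependencies could be stored as a small graph and sorted topologically. This is only a sketch: the courses, unit numbers, and the helper names `study_order` and `ready_units` are made up for the example, and it assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Each unit is a (course, unit_number) pair. The mapping lists, for every
# unit, the units that must be finished first. All entries are hypothetical.
prerequisites = {
    ("B", 4): {("A", 5), ("A", 6)},
    ("A", 6): {("A", 5)},
}


def study_order(prereqs):
    """Return one valid order in which the units can be studied."""
    return list(TopologicalSorter(prereqs).static_order())


def ready_units(prereqs, finished):
    """Units not yet finished whose prerequisites are all in `finished`."""
    all_units = set(prereqs) | {u for deps in prereqs.values() for u in deps}
    return sorted(
        u for u in all_units - finished
        if prereqs.get(u, set()) <= finished
    )


print(study_order(prerequisites))
print(ready_units(prerequisites, {("A", 5)}))
```

A student's progress then becomes a set of finished units rather than a single pass/fail per course, and `ready_units` answers "what can I start next?".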

Courses are designed around a traditional college semester and reflect the
amount of material that can be reasonably covered in that time period.
However, that constraint shouldn't necessarily be the benchmark for all study
programs.

[1] [http://hechingerreport.org/content/harvard-mit-despite-low-completion-rates-moocs-work_14495/](http://hechingerreport.org/content/harvard-mit-despite-low-completion-rates-moocs-work_14495/)

------
contingencies
My problem with MOOCs is immediacy: I'm interested _now_ and I don't want to
delay that over _x_ weeks, beginning at _y_ point in the future. Hell, I often
don't know what country I'm going to be in, let alone whether I'll have free
time and an internet connection. So for my lifestyle, the relatively simple
mapping from traditional tertiary course formats across to MOOCs is a
fundamentally flawed one, though I believe they are improving these days by
offering access to all materials immediately. Another thing is downloads ... I
just want everything, please. I'm often offline, as I believe many developing
country learners may be. I don't want to have to register, log in, then
painfully click through everything bit by bit. I want bittorrent with early-
to-late material download priority. (Otherwise, maybe someone should start
developing a converted and open courseware format to share on PirateBoxes?
[http://piratebox.cc/](http://piratebox.cc/))

~~~
incision
I'd wager that you're in the small minority here.

I've gone through a number of MOOC courses and structure, schedule, TA
interaction and shared experience seem pretty important to most enthusiastic /
best performing course takers.

Some courses, like CS50, are already completely self-paced. Others run quite
frequently, restarting every month.

Being able to have it all in terms of TA/instructor support and an arbitrary
schedule seems to be the direction Udacity is going with subscriptions.

I do agree that being able to easily grab some/all course materials at once
would be a nice option. However, I've completed a majority of the work for my
courses offline, and only one had its coursework structured in a way that
necessitated a frequent connection to complete.

------
hershel
Does anybody have any idea what the completion rates are for students who paid
(for certification or something else)?

~~~
ghaff
The only numbers I've seen are from the Spring 2013 Gamification course. 68%
of the signature track students earned a verified certificate. (That's not
quite what you asked but it's probably the only meaningful way to measure
completion.) Other statistics associated with the class were fairly typical of
MOOCs, so I assume that's a reasonable ballpark for MOOCs generally.

~~~
cpks
I've done a few verified certs. There's a question of causality. I only get on
the verified track _after_ I'm relatively confident I'll finish the course.
Those are the same courses I would have gotten an honor code certificate in if
verified wasn't available.

