Coursera-dl – A script for downloading course material from coursera.org (github.com)
184 points by carlosgg on Aug 22, 2013 | 70 comments

Most of my free time these days is spent watching Coursera videos. It is an absolute revolution for those of us who love to learn. There is so much material though and I don't have enough time!

The quality and volume is truly incredible. There are at least 4 classes that I'm interested in at any given time, and that's on coursera alone (not including udacity, edx, etc.).

I wonder how much longer these services are going to stay free. I like to download all the content just to be sure that it's not taken away when they eventually go to paid models.

> I wonder how much longer these services are going to stay free. I like to download all the content just to be sure that it's not taken away when they eventually go to paid models.

I also download everything. I'd hate to want to reference a lecture later only to find it missing.

Last I heard, Coursera was still searching to find a revenue model. I've noticed their "resume" section expand, and you can now opt in (or is it out, by default?) to job hunter searches. Which makes me curious as to what Coursera's revenue-generating product is going to be. Is it the database of people with certified(-ish) skills, or will it be the content they provide?

I share your concerns about long term viability as well (and I download everything). I really want to see them succeed though...

I've been thinking that a viable model that would keep courses free might be to use a kickstarter crowdfunding model to launch a course offering. If I did it, I'd set it up so that the threshold was high enough to make sure that the course was profitable, then make the course available to contributors first (and they'd get certificates), after which it would be freely available to anybody who wants to take it. You'd have to work out how to do the advertising efficiently so that it didn't eat your time and profits, which is probably a huge problem with the idea. Personally, I'd happily contribute to something like that on a regular basis though.

Periodic relaunches could also be set up for certificate track stuff. You'd get the professor back in to answer questions and all that, but the video content could be left the same so it should be much less expensive to operate existing courses.

They're probably trying to find as many ways as possible to embed themselves into relevant industries before they aggressively monetize.

I bet the end goal (for all these MOOCs) is to become the de facto source for web based courses offered, for pay, through the current university system. You're right, they're also trying to become a recruitment tool for various industries, but they'll make the most money by switching on the revenue stream once universities become dependent on them and there's no turning back.

Why can't they charge the universities to put up their own courses instead? It'd be a good advertisement, like how everyone knows about Strang's linear algebra courses. The complexity of a course is limited by the number of people attending it, so things can only get so advanced anyway. They could use it as a platform for getting people into more advanced courses in physical classrooms.

My retired father had been following the archives for "Learn to program: The fundamentals", and after completing 3 weeks of work, the coursepage was suddenly taken down (https://class.coursera.org/programming1-001/class). It looks like they just started a new iteration of the course and decided to remove the old one without notice. Needless to say, this experience took away from his belief and excitement in online learning (something which he had recently discovered).

Same here. I think this is a great opportunity and many of us waste it — it's so much easier to post snarky comments on Facebook, after all.

On the practical side, once you download the videos, I can heartily recommend the SwiftPlayer iOS app — it lets you vary the speed at which the videos play, which is very practical for lectures. I watch easier sections (or ones that I just want to repeat) at 1.5-2.0x, going down to 1.25x or 1.0x when things get tougher. Easy FF/RW in 10 or 30-second increments and bookmarking are very useful as well.

I got so used to SwiftPlayer that I can't bear to watch lecture videos in the web browser anymore.

I've found Coursepad [1] is a really nice app for Coursera courses. It can automatically download the videos, keeps track of which ones you've watched, and even allows you to take notes while watching the video. Playback up to 1.8x as well.

[1] http://www.coursepad.org/

This is interesting because I find watching recorded lectures one of the most tedious and inefficient ways to learn about a given subject. I far prefer a good textbook for self-directed learning.

Is this just me or do others really find video superior?

It depends on the instructor. I'm working through a Quantum Course by Prof. Binney at Oxford. His moment to moment teaching is not great, but he'll intermittently and regularly just drop gold nuggets.

It's those "aha" moments where someone with a deep knowledge of the material clears up some misconception I've had for decades with a few sentences, or an off the cuff remark.

I can only watch videos on 1.5 to 2x speed now. Thank god YouTube added this feature as well or I'd go crazy.

Wuzza-fuzza what now? Where? How? What keys?

Here's one way:

1) Join the HTML5 trial http://www.youtube.com/html5

2) Open Chrome inspector, select the <video> element

3) In the Chrome console, type $0.playbackRate = 2.0 ($0 refers to the element you've selected in the inspector)

This also works for other sites that use the HTML5 video element, and allows you to go faster than 2x if you prefer.

YouTube's HTML5 interface also has a menu option (gear icon) which has a speed control, but the Chrome inspector trick works with any HTML5 site.

Quite handy - thanks for the tip!

Another way... download with youtube-dl and play back with VLC. I believe the square bracket keys change the speed.

I do agree with you. Maybe it's something the Internet is "doing to my brain" [1], but every day I feel less and less inclined to resort to video when I'm actively seeking information; I guess the "new" me can't stand watching something at its own pace (or, worse, focusing on one and only one thing at once). I got too used to reading, skimming and scanning at my own pace (and no - clicking randomly at a timeline or accelerating the playback is not the same).

I've noticed this before MOOCs, actually. Take web development tutorials, for instance. A few colleagues of mine loved video tutorials from (e.g) lynda.com; I found it utterly boring and inefficient.

[1] http://www.amazon.com/The-Shallows-Internet-Doing-Brains/dp/...

It's not just you, but I believe it's a subset of people. Though I'm not an educational researcher, I believe it has to do with your comprehension style -- people tend to gravitate to one style they like best.

The common one I've seen referenced is the VARK model: Visual, Auditory, Read/write, Kinesthetic[0]. Though as that wiki page states, there are several other theories.

[0] http://en.wikipedia.org/wiki/Learning_styles#Neil_Fleming.27...

This violates Coursera's terms of service. The relevant language:

"...as a condition of accessing the Sites, you agree not to...(c) use any high-volume, automated or electronic means to access the Sites (including without limitation, robots, spiders, scripts or web-scraping tools)"


I was actually going to write such a script myself as an exercise (I'm new to programming), but this language dissuaded me.

EDIT: Could someone please explain to me why this simple statement of fact would be downvoted?

This script would appear to be at odds with the language of that clause, but I wonder if it is truly against the spirit of Coursera.

I can see clear ways to automatically exploit the website for the purpose of scraping their content. However, I have had a legitimate use for this in the past and, instead of just doing the smart automatic thing, I did it manually.

My take is this: for personal use, a low-volume, automated tool would not violate the spirit of online education. The presence of that clause is probably intended to protect against exploitative uses.

I had a similar take, especially since we all know that terms of service are written very broadly. But on the other hand, I would really hate to mess up my Coursera account.

After posting my initial comment, I saw further down that HN user pamelafox is a former Coursera employee. I would love to know whether she or anyone else has insight into how Coursera views tools like this.

I upvoted it. :) Good catch! I did not know that! I really hope they mean that clause for people that would try to frivolously download 200 courses or something, just to have them...I figure that would tax their servers and cost money because of the bandwidth being consumed. Before I knew about coursera-dl, I tried to download one of the courses by hand, and it's BEYOND TEDIOUS. So far, I have downloaded 2 courses with coursera-dl, both of which I have taken and submitted lots of homework to. I think it's ok in that case...

If you use Firefox you can just use DownThemAll. To be honest DTA makes downloading anything and everything easier.

+1 for DTA

If DTA can't find the link to the content for some reason (it happened to me on youtube a while ago), you can use an addon like LiveHTTPHeaders to sniff HTTP requests and figure out the link for the video.

edit: I forgot to mention that DTA will lag the entire browser like hell and kill your battery, so only use it when plugged in. This happens on Windows + Linux, and idk about OSX.

You can also use developer tools and view the requests coming in, look at the obvious file type (usually flv/mp4) and grab the URL from there.

I just saw your edit. In case you ever stumble back across this, check out the beta versions of DTA. The beta versions have been greatly improved. DTA never locks up the browser for me. It's an incredible improvement.

Just tried it, and the performance is awesome! I do miss seeing the obnoxious "donate" banner on the top right though. Thanks for the tip.

Nice tip. I never knew DTA would work on youtube. I have always used youtube-dl.

DTA fails on download for many of the courses actually. Something about "login" keeps popping up.

I have never had a problem with DTA and I often use it with Firefox Aurora and NoScript+RequestPolicy+AdblockPlus (AKA extension developer's worst nightmare). What classes did DTA give you problems with? Were you signed up for the class?

I find it too bad that we can't simply create some mirror for the material and have them available indefinitely on a crowdfunded s3 or something similar. All of this because of the terms of use. It feels like duplication of "effort" to me.

Anyways, that's a great thing to be able to download simply.

Awesome, although I don't know why you couldn't have it automatically accept the 'honor code.' But then again, perhaps if you're using this instead of just rolling your own solution you might not really need/want that sort of automation in the first place.

Also, and more importantly, isn't it a bit strange that there needs to be tool like this at all? Is it still going on where Coursera pulls old course material off when the course is finished? If it is, can we have a discussion on that?

I have had mostly good success using wget and a cookie.txt file exported from chrome with https://chrome.google.com/webstore/detail/cookietxt-export/l...

But it's not so convenient. I'll have to try this in future.

Even better, Chrome's developer tools allow you to right click a request in the HAR and do a "Copy as cURL", which includes all the bells and whistles (headers, cookies), so you can throw it into a terminal and start hacking away.

This is awesome. But does anyone know how to do this with wget's `--save-cookies` option?

I remember crafting something similar along the lines of this (incorrect argument names almost assuredly):

curl $url_that_has_cookies -c -

That will save the cookies to stdout, and IIRC you can pipe them to curl. I've done similar stuff with wget along the lines of just saving the cookies and then loading them within a single command.
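For anyone who would rather reuse an exported cookies.txt from a script than from wget or curl, here is a minimal sketch in Python. It assumes the file is in the Netscape cookies.txt format (the format that wget's `--save-cookies` writes), and the helper name `opener_from_cookie_file` is made up for illustration; it is not part of coursera-dl.

```python
import http.cookiejar
import urllib.request


def opener_from_cookie_file(path):
    """Build a urllib opener that sends the cookies stored in a Netscape-format file.

    Hypothetical helper for illustration only.
    """
    jar = http.cookiejar.MozillaCookieJar(path)
    # Keep session cookies and already-expired entries too, since browser
    # exports often contain both.
    jar.load(ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar
```

Calling `opener.open(some_url)` would then send any cookies in the jar that match the URL's domain and path, which is roughly what loading the cookie file in wget does.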

This is not very new. This coursera-dl script has been around for like... a year or something? Anyway, I have been using it and it works quite well, but there are some classes where compatibility problems occur and not all material is downloaded at once. Now the next step is for someone to make a GUI for it.

Seems to work great over here, and the installation was pretty easy, too. You can even choose not to download certain types of files using the -n option. For example, if you have a large hard drive and a smaller one, you can download the whole course to the large HD:

coursera-dl -u username -p password -d pathToLargeHD course_name

and only download pdf lecture notes to the smaller one

coursera-dl -u username -p password -d pathToSmallHD -n mp4,pptx course_name
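To make the idea behind skipping file types concrete, here is a rough sketch of extension-based filtering in Python. This is only a guess at the general technique, not how coursera-dl actually implements `-n`; the function names `parse_ignored` and `keep_file` are hypothetical.

```python
import os


def parse_ignored(option_value):
    """Turn a comma-separated string like 'mp4,pptx' into a set of lowercase extensions."""
    return {
        ext.strip().lower().lstrip(".")
        for ext in option_value.split(",")
        if ext.strip()
    }


def keep_file(filename, ignored):
    """Return True if the file's extension is not in the ignored set."""
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    return ext not in ignored
```

With `ignored = parse_ignored("mp4,pptx")`, a file named `lecture1.pdf` would be kept while `lecture1.mp4` would be skipped.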

I tried that over here, worked great.

Some schools prefer students don't download course materials. I successfully downloaded Machine Learning and Algorithms courses from Stanford but could not download this one, it says "no downloadable content found":


After upgrading to latest version of script, I was able to download this one, too.

Anyone able to get scicomp-001 to work? I get this warning: "Warning: no downloadable content found for scicomp-001, did you accept the honour code?" (I have)

Probably University of Washington not allowing downloading of their material. Or you should just try it again a bit later.

Hmm, I'm not that familiar with BeautifulSoup, but it appears to not properly be parsing the page, rather than a download / availability issue. Specifically, there's no content in the soup object ~line 150 and thus the list of content to parse and download is just [].

Will keep probing, but if anybody has experienced something similar and has a handy solution, I'd greatly appreciate the tip! (And I'm sure the author would appreciate a PR)
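As an illustration of how an empty list can come out of the parsing step, here is a small sketch that pulls download links out of a page. It swaps in the standard library's html.parser instead of BeautifulSoup so it runs without extra installs, and it is not the script's actual code; `LinkCollector` and `find_downloads` are made-up names, and the default extensions are an assumption.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags whose URL ends in a wanted extension."""

    def __init__(self, extensions):
        super().__init__()
        self.extensions = tuple("." + e.lower() for e in extensions)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Drop any query string before checking the file extension.
        if href and href.split("?", 1)[0].lower().endswith(self.extensions):
            self.links.append(href)


def find_downloads(html, extensions=("mp4", "pdf")):
    """Return matching links in document order; an empty list means nothing matched."""
    parser = LinkCollector(extensions)
    parser.feed(html)
    return parser.links
```

If the fetched page is a login page or uses different markup than expected, a function like this returns `[]` even though the course has content, so checking what HTML actually came back is a reasonable first debugging step.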

I get the same thing, for humankind-001. Oh well.

Here is something similar for Udacity https://github.com/nzmsv/udacity-dl

but hey who needs that when there is https://www.udacity.com/wiki/downloads

I wrote the extension for Udacity before they offered downloads. I agree that it is pretty useless now. I should see if it even works still, thanks for reminding me :) I should probably put a note to that effect in the readme.

I wrote this thing in one evening because I prefer to watch these videos on a big screen. What surprised me was how popular this quick hack became. There were people using it to download videos and share them with others in countries that blocked YouTube. I think at its peak there were a few thousand installs.

Works great. Each Coursera class takes about 1GB of HD space. Will watch the videos on my BART commute.

Yup. Subway time is super-powers time.

Quick FYI, there might be a problem if one tries to run Firefox concurrently:


Doesn't seem to currently work for old courses: https://github.com/dgorissen/coursera-dl/issues/72

Anyone saved material from Jeff Leek's last Data Analysis class? it seemed like a good class. It was in Jan. 2013 and it's coming back in Oct. 2013...

lol, I wrote a little Javascript thingy to do that for me, but I'll try this for sure. I hate that they don't keep the course materials open indefinitely... and I don't get it either.

(I used to work at Coursera) It's actually up to the professors to decide - some of them leave them up, some close the classes. Sometimes they'll close because they want to improve the videos for next time or prevent super simple cheating in the next session, for example. We came up with proposals for how we could leave all classes open while also alleviating professor concerns, but I don't know how far those are along. There's a lot on their plate, as you can imagine!

Is there any way to determine which ones are going to be left up and which ones are going to be put down (and for that matter, whether or not a certain course will be re-run on Coursera the next time around)? I'm curious to know if the ongoing 'Startup Engineering' course ( https://www.coursera.org/course/startup ) will be put down or not.

Instructor here. We'll be doing a v2 of the course after a few months and will repost the materials at that time (we've learned a ton from the first MOOC run).

Awesome, thank you very much. I'm very happy with how the course has been going so far, and will have my siblings and friends take it when it runs again the next time around.

Startup Engineering is probably one of the most applicable classes I've taken. Thank you!

I did not find the time to follow this session, but I will follow the next one for sure :)

How long will the materials stay up ?

Professor Balaji. Your course is one of the best I've ever taken. Thank you so much!

As a former employee, do you have any insight into how Coursera views tools like this? As I wrote in a different comment, this appears to violate their TOS.

I presume their goal is to make $

Their only current source of revenue is through a signature track tier where you get a certificate that is verifiable with your identity. I doubt it will stay that way indefinitely, but for now the course material is provided without charge.

I definitely foresee a subscription-type model or a la carte via the signature track model. Either way, I think anyone would be happy to pay for this awesome education.

I think you'd be surprised. If you look at the stats, there are a lot of professional-level students (BS/Masters/PhD degree holders) as well as a lot of students from (broadly) the developing world. I'd submit that both those groups are probably pretty price sensitive, especially the latter, many of whom don't even have credit cards.

Would "adult ed" students be willing to pay some nominal amount--say $25-$50 for a course? Probably yes in some cases but remember the pricing discussions that take place here all the time. Getting from free to paid (in any amount) is a big barrier to get people over.

Arguably the case would be different if certifications actually meant something. After all, people and their companies spend lots of $$ on various software certifications. But that's a whole other topic.

It isn't a huge barrier if the courses are as legitimate as actual university courses.

As for the pricing question, the price of a can of Coca-Cola isn't the same in the U.S. as it is in Sudan. Once these MOOCs establish a brand, it will be easy for them to adjust their prices to maximize profits.

You can't imagine how similar the prices of such things are outside the US, even in countries such as Sudan. Most of the time, prices are even higher.


Another example: I live in an East European ex-USSR country, where the average school teacher's monthly wage is $200 (not a typo). But the prices of milk, meat, bread, McDonald's, coca/pepsi, ... are the same or more, compared to e.g. Switzerland.

Last time I checked, the meat was more expensive here than in San Diego.

So not everything is adjusted by income (in case of my country, housing and renting can be stated as significantly cheaper than in Switzerland).

>$200 (not a typo)

I genuinely wonder how they get by. (Especially with multiple family members to worry about.)

In the future, if you want people to be sure it's not a typo, you can just say the words "Two hundred dollars", then your meaning will be absolutely unambiguous.

> I genuinely wonder how they get by

Generally speaking, they don't. The teacher profession is treated as a hobby. Like they prefer to go to the School and teach rather than sitting a whole day at home (unemployment is the Big problem here, outside IT field). So you better have some other member of the family doing some other work.

People in such conditions (quite many), just buy less goodies. They don't have ipads/iphones, new cars or similar...

I just wish one didn't have to log in with an account. If they want to give this stuff away, great! ... but please why are you making me give you my info? Even fake info

They obviously want to give it away for your info. Even fake info =P
