Download Coursera videos in batch (github.com)
73 points by jplehmann on Mar 28, 2012 | 27 comments



http://coursera.org is creating some fantastic, free educational videos (algorithms, machine learning, natural language processing, SaaS).

This script allows one to batch download videos for a Coursera class. Given a class name and related cookie file, it scrapes the course listing page to get the week and class names, and then downloads the related videos into appropriately named files and directories.
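The scraping step described above can be sketched roughly as follows. This is only an illustration, not the script's actual code: it assumes (hypothetically) that each week appears as an `<h3>` heading followed by `<a href=...>` links for its lectures, and the class name `LectureListParser` is made up for this example. The real page markup may differ.

```python
from html.parser import HTMLParser


class LectureListParser(HTMLParser):
    """Collect (week_title, [(lecture_title, url), ...]) from a listing page.

    Assumed markup (hypothetical): an <h3> per week, then <a href> per lecture.
    """

    def __init__(self):
        super().__init__()
        self.weeks = []
        self._in_header = False
        self._link_href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_header = True
            self._text = []
        elif tag == "a" and self.weeks:
            # Links that appear before the first week heading are ignored.
            self._link_href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside a heading or a lecture link.
        if self._in_header or self._link_href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self._in_header:
            self.weeks.append(("".join(self._text).strip(), []))
            self._in_header = False
        elif tag == "a" and self._link_href is not None:
            title = "".join(self._text).strip()
            self.weeks[-1][1].append((title, self._link_href))
            self._link_href = None
```

Feeding the page HTML to `LectureListParser().feed(html)` leaves the week and lecture titles, in page order, in the `weeks` attribute; those titles can then be used to name directories and files.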

Why is this helpful? Before I was using wget, but I had the following problems:

  1. Video names have a number in them, but this does not correspond to the
     actual order.  Manually renaming them is a pain.
  2. Using names from the syllabus page provides more informative names.
  3. Using wget in a for loop picks up extra videos which are not posted/linked,
     and these are sometimes duplicates.

Naming is intentionally verbose, so that it will display and sort properly using MX Video on my Android phone.
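As a rough sketch of what sort-friendly, verbose naming can look like (the exact scheme below is an assumption for illustration, and `slugify` and `lecture_path` are hypothetical helper names, not functions from the script): zero-padded week and lecture numbers go first so that plain alphabetical sorting matches syllabus order, followed by the cleaned-up titles.

```python
import re
from pathlib import PurePosixPath


def slugify(title):
    """Replace runs of characters other than letters, digits, '.' or '-' with '_'."""
    cleaned = re.sub(r"[^A-Za-z0-9.-]+", "_", title.strip())
    return cleaned.strip("_")


def lecture_path(week_no, week_title, lecture_no, lecture_title, ext="mp4"):
    """Build '<NN>_<week>/<NN>_<MM>_<lecture>.<ext>' with zero-padded numbers."""
    week_dir = f"{week_no:02d}_{slugify(week_title)}"
    file_name = f"{week_no:02d}_{lecture_no:02d}_{slugify(lecture_title)}.{ext}"
    return PurePosixPath(week_dir) / file_name
```

Repeating the week number in the file name keeps files in order even in players that flatten directories into a single list.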

Inspired in part by youtube-dl (http://rg3.github.com/youtube-dl), with which I've downloaded many other good videos, such as those from Khan Academy.

Let me know if you like it.


Awesome! I was actually planning on writing such a script over the weekend. I haven't taken a look at this semester's courses, but I know that last semester the quizzes and tests were quite useful for someone with no previous practice in the subject at hand. As far as I can see, your script doesn't try to get all that, right?

In that case I'll still have a weekend project.


In the NLP class there are programming assignments with special formatting, headers, etc. I kind of want to write a script that uses NLP to snag the NLP class's programming instruction pages (as well as example code, etc.). Seems like that would be fun to do.


But in that case wouldn't you be looking to get the essence, the useful plain text of an HTML document? And wouldn't parsing with regular expressions or something similar be better for that than NLP? I haven't really done scraping and parsing of documents/text, so I'm not too sure.


It's possible, yeah, though I like the formatting, highlighting, borders, etc.; they group the different sections of the instructions together.

I see what you mean, though. It's not really full NLP either way; I just used that term in place of regular expressions because it was in the NLP class that I learned about them (the first homework is a phone and email scraper). Probably my fault for using the term loosely.
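For anyone who hasn't seen that kind of exercise, a regex-based phone and email scraper can be sketched like this. The patterns are deliberately simple assumptions for illustration (US-style phone numbers, plain unobfuscated addresses) and are not the course's reference solution; `find_emails` and `find_phones` are hypothetical names.

```python
import re

# Simplified patterns: they will miss obfuscated addresses ("jane at example dot edu")
# and non-US phone formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b")


def find_emails(text):
    """Return every substring that looks like an email address."""
    return EMAIL_RE.findall(text)


def find_phones(text):
    """Return every substring that looks like a 10-digit US phone number."""
    return PHONE_RE.findall(text)
```

Real pages hide addresses in many creative ways, which is where most of the work in such an assignment tends to go.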


Only support for videos right now.


Update -- now downloads all lecture materials on the videos page (pptx, pdf, etc).


Some shameless self-promotion: I wrote a Chrome extension for downloading Udacity videos (http://nzmsv.github.com/udacity-dl/). If there's any interest in a batch version I could look into it. Alternatively, feel free to write it and let me know :)


I use the DownThemAll Firefox extension, and to keep the videos in order I add a number to the renaming mask:

  *num*_*name*.*ext*
I like how jplehmann's tool can rename them using the titles on the page.


You can also check out my script over here: https://github.com/fvieira/coursera_resources_downloader. It has the advantage of not requiring a cookies file, since it can authenticate with your username and password. Otherwise, it does pretty much the same as jplehmann's script, although with some minor changes which you might or might not like.
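For comparison, the cookies-file approach can be sketched with Python's standard library alone. This assumes the file is in the Netscape `cookies.txt` format that browser export tools commonly produce (the thread does not say which format is expected), and `opener_from_cookie_file` is a hypothetical helper name.

```python
import http.cookiejar
import urllib.request


def opener_from_cookie_file(cookie_path):
    """Build a urllib opener that sends the cookies stored in cookie_path.

    Assumes a Netscape-format cookies.txt file.
    """
    jar = http.cookiejar.MozillaCookieJar(cookie_path)
    # Also load session cookies and cookies past their expiry date; whether
    # the server still accepts them is up to the server.
    jar.load(ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar
```

Requests made through the returned opener's `open()` method carry the stored cookies, so pages behind the login can be fetched without the script ever handling a password.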

By the way, congratulations on your script, jplehmann! Wish I had found yours before losing time doing mine...


I'm already using it. After some time I got a "connection forcibly closed by remote host" error. I can't access the Coursera website either, not sure why though. (Mayhaps a bunch of people suddenly using this script crashed their servers? Or they blocked us.)

It's back up, must have been a small glitch. Might I add that I love the fact the script picked up on the video I dropped earlier.


Right now, if you kill the script it will delete any file that is currently being downloaded, so that no partial files are left behind. I'm not sure if that happens for other failure conditions. I have added an issue for this: https://github.com/jplehmann/coursera/issues/1
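A minimal sketch of that kind of cleanup, assuming the download arrives as an iterable of byte chunks (`save_stream` is a hypothetical name, not the script's own function). Catching `BaseException` covers Ctrl-C as well as ordinary errors such as a dropped connection.

```python
import os


def save_stream(chunks, dest_path):
    """Write an iterable of byte chunks to dest_path; delete the file on any failure."""
    try:
        with open(dest_path, "wb") as out:
            for chunk in chunks:
                out.write(chunk)
    except BaseException:
        # BaseException also covers KeyboardInterrupt (Ctrl-C). The file is
        # already closed here because the with-block has exited.
        if os.path.exists(dest_path):
            os.remove(dest_path)
        raise
```

This does not help if the process is killed outright (for example with SIGKILL), since no Python code gets to run in that case.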


Going to file a bug on GitHub: for some reason the videos won't actually play (and their file size is slightly larger).


I'm not seeing that problem. If anyone else is having the same problem please reply with more information: https://github.com/jplehmann/coursera/issues/2


Nice! I actually found your project last week through Google but wrote my own in JS (https://gist.github.com/2225519) after struggling with the Python dependencies.

I think Coursera really needs to come out with a native solution and a standard way of numbering/organizing videos.


I have tried all of the projects mentioned in the comments here.

For me, the simplest and quickest was this bookmarklet:

https://github.com/christiangenco/Coursera-Video-Downloader-...


Another version that downloads Coursera videos, and also optionally downloads slides and subtitles:

https://github.com/LoganDing/Coursera.org-Downloader


Thank heavens! Er, I mean thank you (the OP) for this tool. I already wrote scripts that renamed files to something sane, but this will make my life so much easier.


That's great -- I am happy if just a couple of other people get use out of it as well.


Do you know which technologies are used to create interactive video lectures (subtitles + quizzes) like those in Coursera courses?


Thanks a lot. Does it skip already-downloaded videos on the next run?


It seems to.

Edit: Tested. Yeah, it does, and it won't skip a video that was only half downloaded.
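One common way to get this behavior, which is not necessarily how the script does it, is to download into a temporary `.part` file and rename it only once the download has finished. A sketch under that assumption, where `download_if_missing` and the `fetch` callback are hypothetical names:

```python
import os


def download_if_missing(dest_path, fetch):
    """Call fetch(tmp_path) only when dest_path does not exist yet.

    Returns True if a download happened, False if it was skipped.
    """
    if os.path.exists(dest_path):
        return False
    tmp_path = dest_path + ".part"
    fetch(tmp_path)  # may raise; dest_path is then never created
    os.replace(tmp_path, dest_path)  # rename only after a complete download
    return True
```

Because the final name only ever appears after a complete download, a half-downloaded video is never mistaken for a finished one on the next run.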


Thanks


Can someone create an iOS app for this, please?


This looks sweet. Thanks. I had been manually downloading the vids on my PC, but I hope this tool will now reduce the pain.


How does this work with in-video quizzes?


When you download videos, it doesn't include those.



