
Automagically Generated Summaries of MIT OCW Algorithms Lectures - riffer
http://swimwithoutgettingwet.com/mit_ocw_algorithms_lectures_summaries
======
Swizec
I'm really interested in how you did the automagic summarization. Is there
some sort of public algorithm, or did you make something yourself? If you
built it yourself, what did you build and how?

The reason I'm asking is that I was thinking of making a summarizer but
ended up realising I have no idea how to even begin making one.

~~~
riffer
Yes, I'm working on a blog post describing how this works.

It won't go into a lot of detail, but it should be a good overview. The truth
is that I still have lots of ideas on how to make it better, but it is also
the case that for every 8 things tried maybe 1 works. It comes back to your
point that part of the appeal of working on this class of problems is that it
is somewhat hard. In general, I would prefer to work on things that have
technical barriers to entry, rather than network effects.

~~~
mahmud
While we're waiting, throw us a bone. Name any libs you're using.

~~~
riffer
Although I'm not in love with the tone, I'm happy to answer the question.

The coding is all done in Python. I'm using NLTK, although the approach is
more statistical inference than linguistics. That's really the only library.
I'm very much a bottom-up guy, and am happy to re-invent the wheel because it
sometimes gives me a better understanding of the texture of the problem I'm up
against. For example, where I'm doing some linear algebra / graph traversal
math I'm using functions I've written rather than NumPy, or SciPy, or a
graph database, etc.
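
To give a flavor of what "statistical, graph-based, hand-rolled" can mean, here is a minimal sketch of one well-known family of techniques (TextRank-style sentence ranking). It is an illustration under assumptions, not the code behind the summaries: the function names, the regex sentence splitter (standing in for a real tokenizer such as NLTK's), the word-overlap similarity, and the damping value are all placeholders chosen for the example.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(a, b):
    """Shared-word count, normalized by the log of each sentence's vocabulary size."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if len(words_a) < 2 or len(words_b) < 2:
        return 0.0
    return len(words_a & words_b) / (math.log(len(words_a)) + math.log(len(words_b)))


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences by power iteration over a weighted similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    # weights[i][j] is the edge weight between sentence i and sentence j.
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbor j passes on a share of its score proportional to the edge weight.
            incoming = sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            updated.append((1 - damping) / n + damping * incoming)
        scores = updated
    return scores


def summarize(text, k=2):
    """Return the k highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    scores = rank_sentences(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(transcript, k=5)` on a lecture transcript would return the five sentences that share the most vocabulary with the rest of the text. Sentences that overlap with nothing else sink to the bottom.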

~~~
mahmud
Thanks riffer, I have searched low and high for a usable automatic text
summarization library, but everything was a high-dollar item for finance.

Also, the tone, if there was any, was friendly to neutral.

~~~
riffer
_the tone, if there was any, was friendly to neutral_

No hard feelings, sorry if I was reading too much into it.

_I have searched low and high for a usable automatic text summarization
library, but everything was a high-dollar item for finance_

Now you really have my attention. I'll shoot you an email.

------
astrofinch
Why don't you just read Introduction to Algorithms?

~~~
riffer
The concept is roughly that people shouldn't have to skim what software can
summarize. Let the machines do the work.

~~~
mahmud
I laud the initiative, but there is hardly any fluff in CLRS.

------
riffer
As promised here yesterday on the MIT Video lectures - Introduction to
Algorithms discussion thread -> <http://news.ycombinator.com/item?id=1751181>

------
Scott_MacGregor
This is nice, though some formatting would make it easier to read. In IE7x
there is almost no whitespace on the right margin. On my big monitor it is
close to wall to wall text.

~~~
riffer
Ouch, sorry about that, I'm not too proficient at this cross-browser stuff.
Let me see what I can do.

~~~
makmanalp
It'd also be nice if the summarizer separated the text into readable chunks,
like paragraphs as opposed to a single wall of text!

~~~
zbanks
Linebreaks would be great. Even if you just spaced them every 3 sentences, it
would improve readability _greatly_. It may seem totally arbitrary, but right
now the wall of text is rather daunting.

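Also, set something like body { max-width: 40em; margin: auto; } (the width
is just an example value) -- margin: auto only centers the text once the body
is narrower than the window.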
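
The every-3-sentences idea is only a few lines of Python. This is a rough sketch, assuming the summary is a single string and that a naive regex split on sentence-ending punctuation is good enough. The function name and the default of 3 are made up for the example.

```python
import re


def add_paragraph_breaks(text, sentences_per_paragraph=3):
    """Group sentences into fixed-size paragraphs separated by blank lines."""
    # Naive split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    paragraphs = [
        " ".join(sentences[i:i + sentences_per_paragraph])
        for i in range(0, len(sentences), sentences_per_paragraph)
    ]
    return "\n\n".join(paragraphs)
```

For HTML output, each paragraph could be wrapped in its own block element instead of being joined with blank lines.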

