

Node Summary – Naive summarization algorithm for Node.js - charlieirish
http://jbrooksuk.github.io/node-summary/

======
quarterto
Literally nothing about this is node-specific. It uses node-style
function(err, result) callbacks despite doing absolutely nothing asynchronous.
Hooray cargoculting.

I notice it was originally written in synchronous style, but rewritten using
callbacks. Callbacks don't magically make something CPU-bound asynchronous.
They just make it full of callbacks.
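
To make that concrete, here is a minimal sketch. The function `wordCount` is made up for illustration and is not taken from node-summary. It has a node-style callback, but the work is synchronous, so the callback fires before the call returns:

```javascript
// Hypothetical CPU-bound function with a node-style (err, result) callback.
// Nothing here is asynchronous: the callback runs before wordCount returns.
function wordCount(text, callback) {
  if (typeof text !== "string") {
    return callback(new Error("text must be a string"));
  }
  const count = text.split(/\s+/).filter(Boolean).length;
  callback(null, count);
}

// Record the order in which things happen.
const order = [];
wordCount("callbacks do not make this async", (err, count) => {
  order.push("callback: " + count);
});
order.push("after call");

console.log(order); // the callback entry comes first
```

If the work were really asynchronous, "after call" would be recorded before the callback entry.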

~~~
morganherlocker
I would point out that there are cases where sync callbacks do make sense
(granted, this does not look like one of them).

1. You want to add optional results. See underscore's each function, which
passes the element for the current iteration to the iterator, then an optional
index in addition, which you sometimes need.

2. There are many functional patterns that utilize callbacks where you want
to pass functions in as parameters (which could do anything, including
accessing the file system or waiting on an external call). Underscore also has
loads of examples where this is the case, despite being a sync library.

3. This is probably the least valid reason, but sometimes you do not know
going in whether or not a function you are writing will be async. If you
create a sync interface, it will need to be changed if the function needs to
be async in the future, or if it needs to pass additional results. In my
experience the latter is not terribly uncommon. The former is pretty unlikely
in library code, but I could see it happening in application code regularly.
Breaking APIs is not fun for anyone.

~~~
quarterto
Callbacks ≠ passing functions as arguments. A callback (in the node sense) is
a _single_ function passed as the _final_ argument, which will be called
_exactly once_ with an error or result.

The function passed to _.each is used to access the elements of an array. It
is called once for each element. It is used as a mechanism to pass a block of
code to _.each, _not_ to get a result from _.each.
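
A rough sketch of the difference, using the built-in `Array.prototype.forEach` as a stand-in for _.each so there is no dependency. `parseNumber` is a hypothetical function, used only for illustration:

```javascript
// Node-style callback: a single function, passed last, called exactly once
// with (err, result). parseNumber is a hypothetical example function.
function parseNumber(input, callback) {
  const n = Number(input);
  if (Number.isNaN(n)) return callback(new Error("not a number: " + input));
  callback(null, n);
}

let callbackCalls = 0;
let parsed = null;
parseNumber("42", (err, n) => {
  callbackCalls += 1;
  parsed = n;
});

// Iterator function: called once per element. It hands a block of code to
// forEach; forEach itself does not deliver a result through it.
let iteratorCalls = 0;
const returned = ["a", "b", "c"].forEach((element, index) => {
  iteratorCalls += 1;
});

console.log(callbackCalls, parsed, iteratorCalls, returned);
```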

------
ptwobrussell
Nice work. This looks similar to an algorithm developed in the 1950s at IBM by
H.P. Luhn. Luhn's algorithm basically finds the most "important" words (mostly
via frequency) and then ranks the "importance" of sentences according to the
co-occurrences of these "important" words. The summary of the document then
becomes the "top n" sentences, presented in the order they appear in the
document.
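
As a rough illustration of the idea described above, here is a simplified sketch. It is not Luhn's exact method: the stop-word list, the sentence splitter, and the scoring rule (a plain count of "important" words per sentence) are all assumptions chosen for brevity, and Luhn's original scoring is more involved.

```javascript
// Simplified Luhn-style summarizer. The stop-word list, sentence splitter
// and scoring rule are illustrative assumptions, not Luhn's exact method.
const STOP_WORDS = new Set(["the", "a", "an", "of", "and", "to", "in", "is", "it", "on"]);

function tokenize(text) {
  return text.toLowerCase().match(/[a-z']+/g) || [];
}

function summarize(text, topWords = 5, topSentences = 2) {
  const sentences = (text.match(/[^.!?]+[.!?]*/g) || [])
    .map((s) => s.trim())
    .filter(Boolean);

  // 1. Rank words by frequency, ignoring stop words.
  const freq = new Map();
  for (const word of tokenize(text)) {
    if (!STOP_WORDS.has(word)) freq.set(word, (freq.get(word) || 0) + 1);
  }
  const important = new Set(
    [...freq.entries()]
      .sort((a, b) => b[1] - a[1])
      .slice(0, topWords)
      .map(([word]) => word)
  );

  // 2. Score each sentence by how many important words it contains.
  const scored = sentences.map((sentence, index) => ({
    sentence,
    index,
    score: tokenize(sentence).filter((w) => important.has(w)).length,
  }));

  // 3. Keep the top n sentences, then restore document order.
  return scored
    .sort((a, b) => b.score - a.score)
    .slice(0, topSentences)
    .sort((a, b) => a.index - b.index)
    .map((s) => s.sentence);
}

const sample = "Cats chase mice. Dogs bark loudly. Cats and mice hate dogs. Birds sing.";
console.log(summarize(sample, 3, 2));
```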

An implementation of Luhn's algorithm (which accepts a URL or HTML input and
also handles relatively clean text extraction with boilerpipe) is available
here: [http://nbviewer.ipython.org/github/ptwobrussell/Mining-the-S...](http://nbviewer.ipython.org/github/ptwobrussell/Mining-the-Social-Web-2nd-Edition/blob/master/ipynb/Chapter%209%20-%20Twitter%20Cookbook.ipynb#Example-24.-Summarizing-link-targets)

I haven't kept up much with "state of the art" summarization techniques, but
this approach is pretty intuitive and works about as well as anything I can
imagine (without getting into machine-driven language generation, which is a
whole different topic).

~~~
jbrooksuk
Thanks!

------
shlomib
One thing: if you take a look at line 53 in my original code:
[https://gist.github.com/shlomibabluki/5473521#file-summary_t...](https://gist.github.com/shlomibabluki/5473521#file-summary_tool-py-L53)

You can change the inner loop to run only over for j in range(0, i) (instead
of range(0, n)) and then store the same score in both values[i][j] and
values[j][i].
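
A sketch of that change, written in JavaScript rather than the gist's Python. The `similarity` function is a stand-in assumed for illustration; any symmetric score works. The diagonal is simply left at 0 in this sketch.

```javascript
// Stand-in similarity: shared words divided by the average word count.
// This is an assumption for illustration; any symmetric score would do.
function similarity(a, b) {
  const wordsA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
  const wordsB = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
  if (wordsA.size + wordsB.size === 0) return 0;
  let shared = 0;
  for (const w of wordsA) if (wordsB.has(w)) shared += 1;
  return shared / ((wordsA.size + wordsB.size) / 2);
}

function scoreMatrix(sentences) {
  const n = sentences.length;
  const values = Array.from({ length: n }, () => new Array(n).fill(0));
  let calls = 0; // counts how many times similarity is evaluated
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < i; j++) { // j < i: each unordered pair is visited once
      const score = similarity(sentences[i], sentences[j]);
      calls += 1;
      values[i][j] = score; // store the score on both sides of the diagonal
      values[j][i] = score;
    }
  }
  return { values, calls };
}

const { values, calls } = scoreMatrix(["a b", "b c", "c d"]);
console.log(values, calls);
```

For n sentences this evaluates the score n(n-1)/2 times instead of n² times.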

~~~
jbrooksuk
Thanks for your original work! It was perfect to learn from, so thanks again!

I'll look at implementing your changes shortly, perhaps after a rewrite to
circumvent the blocking behaviour experienced with the callbacks.

------
PLenz
I've used this in a few production projects where I work - it does the job it
needs to do.

~~~
jbrooksuk
Amazing! I'm glad it's worked for you :)

------
runj__
It would be nice to see some example output.

~~~
jbrooksuk
I just added the example output, [https://github.com/jbrooksuk/node-summary#example](https://github.com/jbrooksuk/node-summary#example)

Thanks!

------
jbrooksuk
Hey, developer of node-summary here! Thanks for posting!

I was wondering why it's being starred again :)

