

Show HN: user generated TL;DRs with a smart scorer - quan
http://gistpoint.com/

======
quan
Hi HN,

Here's a little background about gistpoint. I was an FPGA developer working on
DSP systems at a big company in D.C. Last May, bored with my job and frustrated
with my career progress, I took the path most travelled and quit my job to
wander around the world. Along the way, I picked up ruby/rails and javascript
because I wanted to transition into web programming. The learning curve was
steep coming from the hardware world, and running from city to city made it
tough to maintain focus. Two months ago, after many false starts and months of
distraction, I finally decided to push it through to completion.

gistpoint is a web app that lets you search and submit summaries for online
articles. It's unique in that it has a scorer that rates how relevant a
summary is compared to its fulltext. This is done by analyzing several factors
such as information coverage, sentence grammar, word stemming, etc., so the
best summary rises to the top and is shown as the default. No signup/login is
needed to submit summaries, and quality is controlled without voting. When no
user-submitted summary is available, it extracts the most relevant passage
from the fulltext. In my experience, if I just naturally summarize an article
I just read, the scorer rates it higher than the extracted one.
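
As a rough illustration of the "information coverage" factor alone, here is a
minimal Python sketch. It is not gistpoint's actual scorer, which is not
described in detail here; the stop-word list, the crude suffix-stripping
stemmer, and every function name below are assumptions made up for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real scorer would use a fuller one.
STOP_WORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "in",
              "is", "it", "of", "on", "or", "that", "the", "to", "with"}


def crude_stem(word):
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def content_stems(text):
    """Lowercase, tokenize, drop stop words, and stem what is left."""
    words = re.findall(r"[a-z']+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


def coverage_score(summary, fulltext):
    """Share of the fulltext's stem occurrences whose stem is in the summary."""
    full_counts = Counter(content_stems(fulltext))
    total = sum(full_counts.values())
    if total == 0:
        return 0.0
    summary_stems = set(content_stems(summary))
    covered = sum(n for stem, n in full_counts.items() if stem in summary_stems)
    return covered / total
```

A summary that reuses the article's frequent terms scores closer to 1.0. On its
own, a measure like this would reward any text that repeats the right words,
which is presumably why it would need to be combined with other factors such as
grammar.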

Please let me know what you think. If anyone wants to know more about the
scoring algorithm, I'll be happy to write a post about it. I will definitely
open source the summary extraction and grammar gems on github. They're already
on rubygems <https://rubygems.org/profiles/50055> for you to download; I just
need to write the documentation.

~~~
18pfsmt
Very cool, nice work. It would be interesting if you could do this with
legislation, both proposed and passed. You'd probably have to add/modify
relevancy factors but it could be helpful to not have to read legalese and
still get the gist of laws.

------
kmfrk
Very interesting concept. I think it's necessary to find a way to gauge the
most popular articles rather than trying to tl;dr the entire internet.

For instance, you could look into some APIs for the most popular submissions
on Hacker News and reddit and present them as articles that need summaries.

You could also create a bookmarklet that users or authors can use to recommend
an article to be summarized, which will then appear in some kind of summary
queue on the main page. (Predefined) tags could also be included, so people
who are only interested in summarizing certain articles, can cherry-pick
these.

Another suggestion is to reward productive users and upvoted
summaries with a link to the writer's flattr profile. People can do this even
if the person doesn't have a flattr profile yet[1][2], and I think this might
increase the incentive even more.

If this takes off, you could create some browser extensions similar to the
Hacker News Sidebar[3], which displays a sidepanel on websites that have been
submitted to Hacker News.

[1]: <http://blog.flattr.net/2011/04/opening-the-floodgates>

[2]: <http://blog.flattr.net/2011/05/flattr-almost-anyone>

[3]:
[https://chrome.google.com/webstore/detail/hhedbplnihmkekhgma...](https://chrome.google.com/webstore/detail/hhedbplnihmkekhgma..).

~~~
quan
Great feedback! Actually, it already measures article popularity using the
twitter count API. The articles shown on the frontpage are sorted by twitter
link count.

If you install the bookmarklet and click on any page it will show you the
machine-extracted summary. You can click edit to submit your own summary.

Great idea to incentivize users to submit summaries by linking to their flattr
profile.

------
NathanKP
I like the concept, but I feel that your algorithm for generating an initial
summary needs a lot of work. For example, I was experimenting with the API:

The input page:

[http://bookflavor.com/dukan-diet-2-steps-lose-weight-2-steps...](http://bookflavor.com/dukan-diet-2-steps-lose-weight-2-steps-forever-pierre-dukan-0307887960)

The API results:

[http://gistpoint.com/get?u=http://bookflavor.com/dukan-diet-...](http://gistpoint.com/get?u=http://bookflavor.com/dukan-diet-2-steps-lose-weight-2-steps-forever-pierre-dukan-0307887960)

Rather than choosing a snippet of the main text, it chose a few words from one
of the menu items...

If you can fine-tune the algorithmic generation of snippets in the API, this
could be very useful. Keep up the good work!

~~~
ultrasaurus
Passing it through readability.com first improves it dramatically:
[http://gistpoint.com/get?u=https://www.readability.com/artic...](http://gistpoint.com/get?u=https://www.readability.com/articles/ocp6wcn9)

It would be nice if the API would take a parameter for the approximate number
of words to return, e.g. a 100-word vs. a 500-word summary.
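
One way such a length parameter could behave is to trim the summary at sentence
boundaries until a word budget is reached. The Python sketch below is only an
illustration: `max_words` and `trim_to_word_budget` are hypothetical names, and
nothing here reflects how gistpoint's API actually works.

```python
import re


def trim_to_word_budget(text, max_words):
    """Keep whole sentences from the start of text, up to about max_words."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept, used = [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Always keep the first sentence, even if it exceeds the budget,
        # so the result is never empty for non-empty input.
        if kept and used + n > max_words:
            break
        kept.append(sentence)
        used += n
    return " ".join(kept)
```

Cutting on sentence boundaries means the result can fall short of the requested
length, which fits the "approximate" wording of the request.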

~~~
NathanKP
Good idea... I'll check Readability's API then; perhaps using a combination
of both APIs will generate good snippets of the pages on my site.

------
dfranke
Apparently I can quickly produce extremely high-scoring gibberish using Emacs
M-x dissociated-press.

------
rexreed
User-generated Cliff's Notes for online articles? I can see some value there,
but only inasmuch as Cliff's Notes has value. I worry sometimes about the
intellectual shortcuts people are taking in not reading articles completely or
depending on others' feedback / scores / comments to determine the value of
the content. Since when was reading for comprehension a chore? I know people
are short on time, but is it wise to shortcut really understanding something?

~~~
quan
I understand your point. The way I see people using this is to determine,
before committing their time, whether the article fits their interests and is
worth reading. It's not to bypass reading articles altogether; it's to pick out
what they want to read.

------
runjake
I don't remember what tldr means. Someone told me it was something from
Reddit, but that's as much as I can recall.

Can we use words that the general HN audience would understand?

~~~
m_myers
Too Long; Didn't Read. It's a short way to say something is long (or that your
attention span is short).

When it is used at the start of a sentence, it is introducing a short summary
(ostensibly accurate, but sometimes a joke).

Someone on Quora says it actually dates back to Slashdot, not Reddit:
<http://www.quora.com/What-is-the-origin-of-TL-DR>

~~~
runjake
Thanks, this oughta cement it into my head this time. I don't remember it from
my Slashdot days, but those days were a long, long time ago.

------
brosephius
nice idea, but what's the incentive for me to submit a summary?

~~~
quan
Maybe I'll let people specify a link to their twitter profile.

------
dorkitude
Comic Sans?

~~~
quan
I knew the font police were going to catch this crime. I hope it's appropriate
for the post-it notes :)

~~~
NathanKP
In my personal opinion your use of Comic Sans is fine considering the context,
but you could placate the font connoisseurs and Comic Sans haters by using a
handwriting font face such as one of these:

[http://www.google.com/webfonts?subset=latin&category=han...](http://www.google.com/webfonts?subset=latin&category=handwriting)

