
Show HN: Pass a URL, get summarized content - meeper16
http://54.86.121.4/recommend/getSummary.html
======
peter_l_downs
Not summary, just sentence ranking and extraction. Still cool but not anything
new. Sweet side project though! For anyone wondering how this was done, I have
a similar project up at [http://bookshrink.com](http://bookshrink.com) (source
code at
[https://github.com/peterldowns/bookshrink](https://github.com/peterldowns/bookshrink)),
although I don't fetch article text.

~~~
mck-
I did a similar hack a while back, which summarizes a piece of text in a
single sentence. I'm that lazy.

[https://github.com/mck-/oneliner](https://github.com/mck-/oneliner)

~~~
gravypod
I'm writing a book right now. Would you mind if I used your program to make
the title and the chapter titles?

------
despinozist
You should render the output as actual JSONAPI
([http://jsonapi.org](http://jsonapi.org)):

    
    
        {
          "links": {
            "self": "...",
            "prev": "...",
            "next": "..."
          },
          "data": [],
          "included": []
        }
    

So that we can discover the API beyond the form. Use
[http://www.iana.org/assignments/link-relations/link-
relation...](http://www.iana.org/assignments/link-relations/link-
relations.xhtml) as the starting point for link relations ideas.
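
A rough sketch of what that could look like for this service, assuming each returned sentence becomes one resource object. The `sentences` type name, the attribute names and the `to_jsonapi` helper are made up for illustration; only the top-level `links`/`data` members and the `type`/`id`/`attributes` layout come from the JSON API format:

```python
import json


def to_jsonapi(self_url, sentences):
    """Wrap ranked sentences in a JSON API style top-level document."""
    return {
        "links": {"self": self_url},
        "data": [
            {
                "type": "sentences",  # hypothetical resource type
                "id": str(rank),  # JSON API ids are strings
                "attributes": {"rank": rank, "text": text},
            }
            for rank, text in enumerate(sentences, start=1)
        ],
    }


doc = to_jsonapi("http://example.com/summary", ["first sentence", "second sentence"])
print(json.dumps(doc, indent=2))
```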

~~~
wyldfire
I was pretty quick to knee jerk ask myself "Why is this any better than any
other schema?" (I was not convinced that "API discovery" was, by itself, a
good enough case).

Then I read the very practical first sentence of the jsonapi page: "If you've
ever argued with your team about the way your JSON responses should be
formatted, JSON API can be your anti-bikeshedding tool." That alone is
probably huge. May not mean much for individual projects, but it's good enough
for me to bookmark for the future.

~~~
fishnchips
Can't help but think of [https://xkcd.com/927/](https://xkcd.com/927/) ;)

Not sure if standards like this can prevent bikeshedding. You can always
bikeshed about the need to stick to any particular standard. One
counterexample to what I'm saying may be Go's single standard formatting
with gofmt, but that was introduced very early on and became a part of the
culture. Too late for that with JSON APIs.

------
ComputerGuru
I'm not sure I am seeing the same wow-factor results from the service that
everyone else is raving about.

I submitted this link [0] which was on the HN homepage a couple of days ago
and the results that I got back were either the least important bits or
in some ways implied the _opposite_ of the article, so either the writing was
really bad or the algorithm needs some work.

Submitting a "simpler" less-ranty article [1] was even less successful,
leading to paraphrases of less-important sentences as the results.

Then I submitted the BBC article from this morning about Philae [3] and
received much, much better results. I think it works best on articles that
have single sentences that clearly sum up the gist of the post as a single,
hard fact and doesn't work with anything that works towards logical
conclusions or tries to build an argument. Which makes sense, because this
isn't an AI and can't actually deduce anything.

0: [https://neosmart.net/blog/2016/on-the-growing-intentional-
us...](https://neosmart.net/blog/2016/on-the-growing-intentional-uselessness-
of-google-search-results/)

1: [https://neosmart.net/blog/2016/when-is-the-2016-retina-
macbo...](https://neosmart.net/blog/2016/when-is-the-2016-retina-macbook-pro-
coming-out/)

3: [http://www.bbc.com/news/science-
environment-35559503](http://www.bbc.com/news/science-environment-35559503)

~~~
detaro
> _I'm not sure I am seeing the same wow-factor results from the service
> that everyone else is raving about._

Um... where is someone raving about the result? Most of the comments seem
neutral to negative to me?

~~~
lpage
> _Most of the comments seem neutral to negative to me?_

Shameless plug, thanks to HackerMoods [1] I can quantify that statement: 0.85
neutral, 0.08 positive, 0.07 negative. The average Show HN is 0.17 positive
and 0.04 negative, so your assessment is in line with the numbers.

[1]:
[https://news.ycombinator.com/item?id=11188633](https://news.ycombinator.com/item?id=11188633)

~~~
Shamiq
I'm color blind, and the charts you use are unintelligible to me.

~~~
lpage
Sorry about that. Design isn't my wheelhouse, but I updated it to what Google
tells me is a colorblind-friendly palette. I would definitely appreciate it if
you could take a look and let me know how it is.

------
xlayn
I would venture that it works by assigning an information weight to words, the
number of non-repeating words and the way they are related, and then filtering
top down.
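
A minimal sketch of that guess, assuming plain word-frequency scoring. The service's actual method is not described anywhere in this thread, and `rank_sentences` and the stopword list are invented for illustration:

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the service's).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that", "on", "for", "was"}


def rank_sentences(text, top_n=5):
    """Return the top_n sentences, scored by average word frequency."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [
        [w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOPWORDS]
        for s in sentences
    ]
    # Weight of a word = how often it occurs in the whole text.
    weights = Counter(w for words in tokenized for w in words)

    def score(words):
        # Count each distinct word once, so repetition inside a sentence
        # does not inflate its score.
        distinct = set(words)
        return sum(weights[w] for w in distinct) / len(distinct) if distinct else 0.0

    ranked = sorted(zip(sentences, tokenized), key=lambda pair: score(pair[1]), reverse=True)
    return [sentence for sentence, _ in ranked[:top_n]]


print(rank_sentences("Cats chase mice. Cats sleep. Dogs bark loudly.", top_n=1))
```

Scoring like this favors sentences that reuse the text's most common words, which would be consistent with it picking odd sentences from a long narrative.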

I did try it with a link I particularly like

[http://multivax.com/last_question.html](http://multivax.com/last_question.html)

with the following response.

{"1":"nor could anyone for the day had long since passed zee prime knew when
any man had any part of the making of a universal ac","2":"zee prime's
mentality was guided into the dim sea of galaxies and one in particular
enlarged into stars","3":"he gave no further thought to dee sub wun whose body
might be waiting on a galaxy a trillion light-years away or on the star next
to zee prime's own","4":"the universal ac said man's original star has gone
nova","5":"the universal ac interrupted zee prime's wandering thoughts not
with words but with guidance"}}

------
ShinyCyril
Hmm didn't have much luck with: [https://mikeanthonywild.com/stopping-
blocking-threads-in-pyt...](https://mikeanthonywild.com/stopping-blocking-
threads-in-python-using-gevent-sort-of.html)

    
    
        { 
            "1":"betterthreads provides an enhanced replacement for the an enhanced replacement for the python this isn't actually a true thread instead it uses gevent to",
            "2":"the widely-accepted solution is to set a timeout on our blocking functions so we can periodically check a which we set from the main thread to indicate we want the child thread to stop",
            "3":"if the thread is still alive the when the *timeout* argument is not present or ``none`` the operation will block until the thread terminates",
            "4":"`runtimeerror` if an attempt is made to join the current thread as that would cause a deadlock",
            "5":"`join` a thread before it has been started and attempts to do so raises the same exception"
        }
    

That said, I think summarising that particular article would require a certain
level of domain expertise, something which a general bot couldn't provide.

------
phdsummary
The summary of that guy's PhD summary [http://jxyzabc.blogspot.com/2016/02/my-
phd-abridged.html](http://jxyzabc.blogspot.com/2016/02/my-phd-abridged.html)

{"1":"for various reasons i also spend a lot of weekends in new york and make
more friends with people working on data and journalism","2":"my friend jean-
baptiste who reads it asks why my blog is so good but my paper drafts are so
bad","3":"at the beginning of this year i start telling people that i wish i
had more female friends since i realize that there are many fewer women around
me than before","4":"to keep myself from thinking about my uncertain future
all the time i start a cybersecurity accelerator cybersecurity factory with my
friend frank wang with the goal of helping research-minded people start
companies","5":"i am too lazy to make many friends so i spend my free time
reading cooking doing yoga and running"}

------
Animats
Summarization used to be a feature in Microsoft Word through Word 2007, and it
did a decent job. That feature was taken out in Word 2010.[1]

[1] [https://support.office.com/en-US/article/Automatically-
summa...](https://support.office.com/en-US/article/Automatically-summarize-a-
document-B43F20AE-EC4B-41CC-B40A-753EED6D7424)

~~~
skewart
I didn't know that. Any idea why it was taken out?

~~~
lallysingh
I suspect that Office has to garbage collect features once in a while.
Otherwise the maintenance cost would be (more?) horrible.

------
fiatjaf
[http://52.90.112.133/recommend/app/getSummary?query=http%3A%...](http://52.90.112.133/recommend/app/getSummary?query=http%3A%2F%2Ftomwoods.com%2Fpodcast%2Fep-597-can-
the-private-sector-protect-against-crime-this-case-study-will-blow-your-
mind%2F&getSummary=getSummary)

------
_RPM
An array would be a better choice of structure for the sentences than
hard-coding the indexes ("1", "2", ...).
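
For example, assuming the response shape posted elsewhere in this thread (`{"summarized_text": {"1": ...}}`), a client has to sort the keys numerically to recover the order, while an array would carry the order for free. `to_list` is a hypothetical helper, not part of the service:

```python
import json


def to_list(response_body):
    """Convert the keyed "summarized_text" object into an ordered list."""
    keyed = json.loads(response_body)["summarized_text"]
    # Keys are strings, so sort them as integers ("10" must come after "2").
    return [keyed[k] for k in sorted(keyed, key=int)]


body = '{"summarized_text": {"2": "second", "10": "tenth", "1": "first"}}'
sentences = to_list(body)
# The same data as an array needs no key handling on the client.
as_array = json.dumps({"summarized_text": sentences})
print(as_array)
```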

~~~
brudgers
I see your point and don't disagree.

Thinking about why someone might make the choice to use text, text is more in
keeping with *nix philosophy. Not that I'm saying it's better, but grep is
pretty lightweight and a lot of people use the command line and/or languages
other than JavaScript. YMMV.

~~~
_RPM
I'm not sure I understand. JSON is text. JSON provides an array as part of the
grammar.

------
microcolonel
Good quality summaries, but it seems it caches pages based on their base URL,
and throws away the query parameters.

Some blogs use query parameters to distinguish between articles, so it makes
it kinda useless if you want to do more than one article.
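
If that is what is happening, a fix could be to key the cache on the URL including its query string. A small sketch, assuming the cache is keyed by a string; `cache_key` and the example.com URLs are hypothetical, since the backend has not been shared:

```python
from urllib.parse import urlsplit, urlunsplit


def cache_key(url):
    """Build a cache key that keeps the query string but drops the fragment."""
    parts = urlsplit(url)
    # Lowercase only the scheme and host; path and query can be case-sensitive.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, ""))


# Two articles that differ only by query parameter get distinct keys.
print(cache_key("http://Example.com/blog?p=1"))
print(cache_key("http://Example.com/blog?p=2"))
```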

------
andreygrehov
This is off-topic, but I'd like to mention it.

I absolutely love the fact that the OP did not get a domain name for this
demo. This is an interesting "technique" I haven't seen for quite a while.
People tend to own and renew tens of domain names, which are just sitting
there for a "just in case" moment. This is a great example of how things can
really be simplified - spin up an instance, make a demo, shut the instance
down.

~~~
developer2
Good luck with the link still being usable in a month or a year. There's a
reason we use domain names for sharing. Not only because they are friendly to
read and remember, but also because IPs are typically far more transient than
domain names.

The IPs behind my projects have changed dozens of times over the years (new
server, changing hosting provider, adding a load balancer, etc.). A simple DNS
change allows the same domain name to follow the project.

I'm actually surprised HN permits links to IP addresses. While links posted
here are not guaranteed to point to the same content in the future anyway, it
is more likely that an IP address will change before the project is taken down
entirely. Search engine posterity and all.

------
shloub
« URL's must start with "http://" » «
[https://medium.com/@darrenrovell/all-journalists-need-to-
be-...](https://medium.com/@darrenrovell/all-journalists-need-to-be-data-
driven-6dfc73e420d5#.mz8vd1myq) »
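
A check that also lets https links through could be as small as this. It is only a sketch; `is_fetchable` is a made-up name and I have no idea how the form actually validates its input:

```python
from urllib.parse import urlsplit


def is_fetchable(url):
    """Accept both http and https URLs that name a host."""
    parts = urlsplit(url.strip())
    return parts.scheme.lower() in {"http", "https"} and bool(parts.netloc)


print(is_fetchable("https://medium.com/some-article"))
print(is_fetchable("ftp://example.com/file"))
```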

------
hluska
This is an exact duplicate (even posted by the same person) of a link
submitted 11 hours ago.

[https://news.ycombinator.com/item?id=11190008](https://news.ycombinator.com/item?id=11190008)

Edit - it doesn't work for me either, or maybe it is just very slow?

~~~
meeper16
Yes, it is. I sent it out too late last night and thought more people might
want to see this in the morning.

~~~
gus_massa
I think this is ok here. From the FAQ:
[https://news.ycombinator.com/newsfaq.html](https://news.ycombinator.com/newsfaq.html)

> _Are reposts ok?_

> _If a story has had significant attention in the last year or so, we kill
> reposts as duplicates. If not, a_ small _number of reposts is ok._

> _Please don't delete and repost the same story, though. Accounts that do
> that eventually lose submission privileges._

------
mohaps
Any details about the backend/implementation?

Shameless plugs for two similar projects (both open source) I did a while back:

1) Algorithmic Summarizer:
[https://github.com/mohaps/tldrzr](https://github.com/mohaps/tldrzr)

2) Readability Clone / Article Body Extractor with summary, significant image
and text:
[https://github.com/mohaps/xtractor](https://github.com/mohaps/xtractor)

Both are deployed on Heroku and the URLs are in the GitHub readme files.

------
h1fra
Hmm, not quite sure what I was expecting, but the results were not great :(

But I could see the use of this kind of service.

Also, it does not work with accented characters.

------
an_ko
I'd like more details. How does it work?

------
LinkPlug
What is it built with? (Stack, Foss etc)

------
Mark_B
Fun with Lorem Ipsum:

[http://52.90.112.133/recommend/app/getSummary?query=http%3A%...](http://52.90.112.133/recommend/app/getSummary?query=http%3A%2F%2Fwww.lipsum.com%2Ffeed%2Fhtml&getSummary=getSummary)

~~~
meeper16
It seems to be multi-lingual

------
jack9
[http://www.slashdot.org](http://www.slashdot.org) and
[https://news.ycombinator.com](https://news.ycombinator.com)

{"summarized_text": {}}

------
franze
if you submit
[http://54.86.121.4/recommend/getSummary.html](http://54.86.121.4/recommend/getSummary.html)
to
[http://54.86.121.4/recommend/getSummary.html](http://54.86.121.4/recommend/getSummary.html)
you get

    
    
      {"summarized_text": {"1":"insert any block of text or single url url's must start with biomimic@gmail.com"}}
    

which is of course completely wrong

------
tuananh
urgh: empty

[http://54.86.121.4/recommend/app/getSummary?query=http%3A%2F...](http://54.86.121.4/recommend/app/getSummary?query=http%3A%2F%2Fbongdaso.com%2FThua-
Man-
City%252c-Klopp-v%25C4%2583ng-t%25E1%25BB%25A5c-lo%25E1%25BA%25A1n-x%25E1%25BA%25A1-_Art_160672.aspx&getSummary=getSummary)

------
LinkPlug
What are some alternatives to this?

~~~
jbeda
Check out [https://algorithmia.com/](https://algorithmia.com/). Stuff like
this plus a bunch more. Real business model so you can have more confidence in
it.

------
gkumartvm
WordPress site URLs are not working!!

------
orliesaurus
Not really working as expected :(

------
dang
Url changed from
[http://52.90.112.133/recommend/getSummary.html](http://52.90.112.133/recommend/getSummary.html)
by submitter's request.

