I am delighted to share my new project "TLDR This" with you guys.
Problem: There's so much content out there but too little time to read it. All too often there's a long article that we're interested in, but we don't really feel like scouring it for the relevant information.
Hence, I created "TLDR This" to help you navigate relevant content quickly and easily, without having to read the whole thing.
Steps to get cracking -
1. Copy and paste either the URL or the text of the article you'd like summarized.
2. Press the "Process Text" button, and you are good to go.
TLDR This also comes with a Chrome extension, allowing you to summarize any webpage at the click of a button.
How to use the Chrome extension?
Just click the “tl;dr” button in Chrome's toolbar on a webpage which you'd like summarized and within a few seconds, you'll get the < 5 sentence summary right there.
Please let me know if you have any feedback/suggestions.
Thanks a lot
Each section subtitle is displayed as a sentence. Could you maybe use ML to generate a one-sentence summary for each section? I don't think there is an ML model that could extract the takeaways from each section and summarize them correctly, but that was my original expectation when I first saw it.
There are plenty of abstractive and extractive methods for text summarization. In fact, that's what I'm working on right now: developing a news article summarization system for longer articles.
If you're interested in learning more, there's an excellent survey paper on ArXiv: https://arxiv.org/abs/1812.02303
The disadvantage of using only token-based attention is that it still doesn't capture enough context for large spans of text (> 1000 tokens), and I'm actually having this issue now in my work. That's where things like applying attention to chunks of text are helpful.
I'm still trying to fully understand attention mechanisms, so I hope this comment made sense.
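To make the chunking idea above a bit more concrete, here is a minimal sketch of just the splitting step: cutting a long token sequence into overlapping windows. It does not implement attention itself, and the `size` and `stride` defaults are assumptions for illustration, not values from the comment.

```python
def chunk_tokens(tokens, size=512, stride=384):
    """Split a token list into overlapping windows of at most `size` tokens.

    Consecutive windows start `stride` tokens apart, so they overlap by
    `size - stride` tokens when stride < size. Both defaults are illustrative.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks
```

Each window could then be encoded on its own, with a second stage combining the per-chunk results.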
As a long-time language learner I can tell you that the most significant words in a text are the least common ones (that's why learning a language from lists of common words does not work out of the box). In the sort of process you mention, the words that carry the most meaning for the idea being communicated can easily get less weight than more common tokens in the same text, and this depends heavily on the author's communication style.
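The intuition that rarer words carry more meaning is commonly formalized as inverse document frequency (IDF). The sketch below is only an illustration of that weighting, assuming a tiny made-up corpus; it is not a claim about what TLDR This does internally.

```python
import math
import re
from collections import Counter


def idf_weights(documents):
    """Return log(N / df) for every word, where df is the number of documents containing it."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each word once per document, regardless of how often it repeats.
        doc_freq.update(set(re.findall(r"\w+", doc.lower())))
    return {word: math.log(n_docs / count) for word, count in doc_freq.items()}
```

A word that appears in every document gets a weight of zero, while a word that appears in only one document gets the highest weight.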
For me, making a summary this way, treating language as a sum of interconnected tokens, sounds like trying to replicate a painting by measuring brush stroke directions and pigmentation (vectors of values) rather than trying to understand the abstract concept or idea behind it (a vase with flowers, fruits) and recreating it (using the skill extracted with the vectors to recreate the abstract idea that was encoded behind them).
I think I might end up reading the paper you mentioned.
I tried it on a Wikipedia article and it didn't work too well.
Firefox uses the exact same extension format, and except for non-trivial cases, “porting” an extension to Firefox involves literally no code-changes.
Consider publishing your extension for Firefox too. You just need a Mozilla-account to publish it on addons.mozilla.org. That’s pretty much it.
That's a very good suggestion. I didn't know it was that easy to build an extension for Firefox. I will publish it on the Mozilla store as soon as possible.
Thank you very much for taking the time to use the app and providing your feedback.
Check out their Github repo here - https://github.com/mozilla/readability
>You need at least one <p> tag around the text, you want to see in Reader View and at least 516 characters in 7 words inside the text.
Source - https://stackoverflow.com/questions/30661650/how-does-firefo...
I have some experience in this field:
- This is text-extraction, NOT text-generation
- The TextRank algorithm is fine so far, but it does not write a "summary"; instead it ranks the components of a text according to some "metrics" (simply put)
- Using this approach will still leave you open to copyright claims from copyright owners
- Which stuff is summarized ("put to the final output") is not always clear to me in your implementation. I tried it on some newspaper & blog articles; on some it worked well, on others it didn't.
Funny thing is, I'm currently working on something similar with a slightly different twist - I will post it here when finished, then we can go into a battle :-)
- Yes, you are totally right. TextRank is an extractive method.
- Ah right. But we aren't storing any information on our servers, just showing selected sentences to the user.
- We select the top 5 sentences which have the highest relevancy to the article. I am not an expert in this field so not too sure if that's the best way. Just started with NLP a few days back and wanted to test it out by developing a small application.
Yes, it works on quite a few articles, but there are also some articles where it fails to give accurate results.
Ah nice, I would like to hear more about what you are working on. Let me know if I could contribute to it in some way.
Thank you again for your feedback.
If this is in reference to the copyright comment, it doesn't matter -- you're still transmitting/redistributing the content, which is what matters. One way to get around this is to ship the code and have the code execute on the user's machine (i.e. what you're presumably doing with the extension).
Plus moving it to the client side would free up whatever resources they are currently using to feed summary info to us.
Edit: How about the backend just returns pointers to the text (word #x till word #y) and the js just (re)assembles it?
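A rough sketch of that pointer idea, assuming half-open word ranges and whitespace tokenization (both are my assumptions, as are the function names). It is written in Python here for brevity, although the client half would be JavaScript in the extension.

```python
def to_word_spans(article, selected_sentences):
    """Server side (hypothetical): locate each selected sentence as a (start, end) word range."""
    words = article.split()
    spans = []
    for sentence in selected_sentences:
        target = sentence.split()
        for start in range(len(words) - len(target) + 1):
            if words[start:start + len(target)] == target:
                spans.append((start, start + len(target)))  # end index is exclusive
                break
    return spans


def reassemble(article, spans):
    """Client side (hypothetical): rebuild the summary from the locally loaded page text."""
    words = article.split()
    return [" ".join(words[start:end]) for start, end in spans]
```

Only the index pairs would cross the wire, and this only works if both sides split the text into words in exactly the same way.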
you're still transmitting/redistributing
So they provide a service that includes storing your content in its entirety.
Has this ever been tested in court?
They are showing a small extract for context OR a summary specified by the publisher.
That’s completely legit and fair use.
Because there is plenty of precedent for this in available APIs and I've never heard of a case claiming this.
Though, just by copying & summarizing with your current implementation, there would be NO ONE to sue you, since you are just grabbing it and displaying it in the browser (sure, depending on the jurisdiction, one may rate this simple step already as some type of copyright issue)
In reality, this will not happen. (Except in North Korea ;-)
My comment regarding copyright was really about grabbing, summarizing and re-distributing it on another webpage, like a news aggregator.
I'm pretty sure this would be covered by fair use.
When you Google a newspaper article you get a verbatim snippet, same concept.
You don’t need any seed, and can generate summaries (section 3.6).
GPT-2 is the model to learn about if you’re interested in NLP.
I am sorry to hear it didn't work on the article you tried.
I have personally tried it on a number of articles and it seems to provide good results. Also, I have received feedback from a number of users who say that it worked for them. But yes there are some articles where it fails to give accurate results since the "article summarisation" technology is still in its development stage.
We will try to improve it in the next version.
CX_DB8 also supports word and sentence level extraction. I plan on adding paragraph extraction in a future update.
If your idea had a company behind it, I would invest in it. I wish it were a product that generated a decent enough executive summary, so I could decide whether to invest more time in reading the whole article or just move on, and filter out all the BS floating around the internet.
HOWEVER the current status is far from that expectation; try to input the "burnout" entry from the HN top page and see the output (or any other article). I am afraid the concept will stay science fiction for a very long time, since the amount of intelligence required to make executive summaries is sometimes not even met by fresh university graduates.
http://www.gutenberg.org/cache/epub/1777/pg1777.html > https://i.imgur.com/pRX5eeP.png
so I looked for something that was just the play http://shakespeare.mit.edu/romeo_juliet/full.html >
and there are only a handful character entrances
We seem to be quite a way off from summarizing 'anything'.
Use state-of-the-art pseudonymization and anonymization methods to secure your data in real-time.
I think it's a good summary of what we do, but I wouldn't call this summarization, as I would think that implies synthesizing new content from the existing one; it's more of a highlighting service, I'd say. For other pages it doesn't work so well, so maybe I just got lucky.
It is mainly designed for articles so it wouldn't give any useful info when used on landing pages etc.
There are two kinds of article summarization: extractive and abstractive. We use the extractive approach, wherein the top n relevant sentences from the article are selected. If the document is too short, it wouldn't select many sentences.
Thank you very much for taking the time to use the app and provide your feedback.
Tried it on this article to test it out; it only featured her personal life and none of her accomplishments. Love the idea.
In my case, the summarization was manual - I wanted to see if people liked the service, before building the summarization engine.
I also created a Chrome extension to be able to summarize with just one click from HN itself.
That led to what I believe is the harder problem: in order to provide a useful text summary of articles on any topic, you need people who understand the article content.
To make a summary service which is useful, we would need to have “summarizers” who can not only understand a myriad of topics but also possess a deeper understanding of the article domain so the summary is a faithful “distillation” of the original. As we all know, this is not easy, unless you have some expertise in that particular field.
This sort of reminds me of on demand translating services. In a past life I had the need to use translators for technical and legal documents. There were services that offered translators who specialize in certain areas, such as law or technical software writing, for example.
We are glad that you liked it.
We also have a Chrome extension which does exactly what you describe - https://chrome.google.com/webstore/detail/tldr-this-free-aut...
Unfortunately, this selected almost only the gallery credits lines (By (name) with (person one, person two))
How can I tell automated tools that this HTML is not important for extraction?
Hopefully you'll see my paper about this in EMNLP 2019 System Demos...
>What to Submit On-Topic: Anything that good hackers would find interesting.
which isn't bad really.
Yes, if the webpage doesn't have that much content then it would return the most relevant line.
One important feature, to help improve accuracy, is to display 2 big buttons at the bottom, a thumbs up and a thumbs down, meaning "this TL;DR is reflective of the actual content" or "this TL;DR is not reflective of the actual content".
That's a very good suggestion. We will try to incorporate it in our next update.
That's a very good suggestion.
By tweet bot, you mean when someone tweets an article link to the bot, it would respond back with the summary of that article, right?
I'm writing a Rust library in which, for archival purposes, I want to immutably refer to source content. So in short, I need to download it and store it immutably. Yet I don't want to grab all the HTML, UI images, ads, etc. - I just want the content. I've found Outline.com amazing, but the tool I'm writing is "distributed", so I don't want to depend on a service.
Anyone familiar with local tooling for these types of services? TLDR, Outline, etc?
There is also a JS library called readability which is what is used by Firefox's reader mode.
The autotldr bot has been active for a number of years on reddit, summarizing articles quite well.
The goal is the same.
TLDR This uses a different algorithm. It is a side project I developed to test out NLP.
Although there are some web apps out there that allow you to summarize an article, there really isn't any useful Chrome extension that does the job well. TLDR This comes with an accompanying Chrome extension as well.
Any high level info on how it works?
> Thanks a lot. Sure, first we create a list of all the individual sentences in the article. Then, we use the TextRank algorithm to rank each of the text sentences. Finally, we select the top 5 most representative sentences from the article.
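For anyone curious, the three steps described above can be sketched roughly as follows. This is not the actual TLDR This code: the regex sentence splitter, the word-overlap similarity (taken from the original TextRank paper), and the `damping` and `iterations` defaults are all assumptions on my part.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def similarity(a, b):
    """Shared-word count normalized by the log lengths of both sentences."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    denom = math.log(len(wa) or 1) + math.log(len(wb) or 1)
    return len(wa & wb) / denom if denom > 0 else 0.0


def textrank_summary(text, top_n=5, damping=0.85, iterations=50):
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= top_n:
        return sentences  # nothing to rank away
    # Step 2a: weighted graph with one node per sentence.
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    # Step 2b: PageRank-style score propagation for a fixed number of rounds.
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    # Step 3: keep the top_n sentences, returned in their original order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(top)]
```

Calling `textrank_summary(article_text)` returns up to five sentences copied verbatim from the input, which is why the result reads as a set of highlights rather than newly written text.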
I will try to port it to Safari.