
Show HN: HNText – Top stories daily cleaned for readability - bckmn
http://www.hntext.com#2014-10-14
======
jawns
Three criticisms, in descending order of importance:

1) The length of these excerpts goes a bit beyond what is typically considered
fair use. You're pretty much reprinting the entire articles. So you may run
into copyright infringement claims.

2) It looks like you're scraping the content, and your scraper isn't up to the
task. One of the excerpts reads: "Your email has been sent. An error has
occured and your email has not been sent. Please try again. • You can't enter
more than 20 emails. • You must enter the verification code below to send. •
Invalid entry: Please type the verification code again."

3) Borrowing the multicolumn text layout predominant in print is great when
the columns all fit on the screen, but I found myself having to constantly
scroll up-down-up-down-up-down to read these.

So, in sum, you're only presenting the top stories successfully some of the
time; when you are successful, you're infringing on copyrights; and the layout
is not actually cleaned for readability.

------
m90
Interesting idea, but
[http://www.hntext.com/#543c2fd8ba03530200000008](http://www.hntext.com/#543c2fd8ba03530200000008)
has "broken" special characters.

~~~
Meekro
I'm having this problem as well. I took a screenshot in case the OP sees it
rendered normally and doesn't know what we're talking about:
[http://i.imgur.com/0GlHrlc.png](http://i.imgur.com/0GlHrlc.png)

Aside from that, looks like an interesting service, though!
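
For what it's worth, symptoms like this are often mojibake: UTF-8 bytes that
got decoded as a single-byte charset somewhere in the scraping pipeline. That
is only a guess about the cause here; nothing in the thread confirms it. A
minimal Python sketch of how the damage can sometimes be reversed, where
`fix_mojibake` is a made-up helper name and Latin-1 is an assumed wrong
charset:

```python
def fix_mojibake(text: str) -> str:
    """Try to undo UTF-8 text that was mistakenly decoded as Latin-1.

    Assumption: the garbling came from a UTF-8 -> Latin-1 mix-up. If the
    round trip fails, the text is returned unchanged.
    """
    try:
        # Recover the original bytes, then decode them the way they
        # should have been decoded in the first place.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of mojibake (or already clean); leave it alone.
        return text


if __name__ == "__main__":
    garbled = "café".encode("utf-8").decode("latin-1")
    print(repr(garbled), "->", repr(fix_mojibake(garbled)))
```

The more robust fix is on the scraping side: decode each page using the
charset it declares instead of repairing the text afterwards.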

------
btbuildem
Speaking of readability, have you heard of
[https://www.readability.com/](https://www.readability.com/) ? Works wonders
on all but the most stubborn webpages.

~~~
eli
[http://www.diffbot.com/](http://www.diffbot.com/) offers this service as an
API

------
JacobJans
This is absolutely horrible and shows complete lack of respect for copyright.
It should be taken down immediately.

~~~
Houshalter
This isn't qualitatively different from Readability or numerous other
extensions. The only difference is that he is doing it server side.

~~~
declan
> The only difference is he is doing it server side

Ah, but from a legal perspective, this makes all the difference. It is, I'm
sorry to say, copyright infringement and goes beyond what fair use permits.

~~~
Houshalter
Yes, I understand that; I said it isn't _qualitatively_ different. If the web
page somehow distributed JavaScript that would do the same thing client side,
the website would appear exactly the same, yet would be perfectly legal.

The comment I was replying to wasn't arguing that it was illegal, but that it
was actually wrong and "lacking respect". I was just pointing out that he
should have the same opinion towards things like Readability.

Also, I think Readability did do it server side, and so do other things like
RSS readers.

~~~
declan
Without taking a position on Readability, there's more to the calculation than
client side vs. server side. For more see my post here:
[https://news.ycombinator.com/item?id=8455648](https://news.ycombinator.com/item?id=8455648)

------
JustThrowinAway
My first thought was how these articles were in no way more readable than the
originals.

Does anyone prefer a multi-column layout?

~~~
singlow
Multicolumn is fine as long as the vertical height is less than the browser
window. For long text this is really hard to make work in the browser unless
you take over the pagination with JavaScript. This is not a good example of
using multicolumn text.

------
shmapf
For those wondering about the usefulness of this: it should be great for those
who commute on the tube or somewhere else where there is no phone signal. This
lets you load a single lightweight page before boarding, and then browse HN
easily. It would be ideal if it were sent out in an email, though, so you
don't have to remember to load the page every time.

------
andremendes
[http://getpocket.com](http://getpocket.com) does this "cleanse for readability"
to saved links, but I think they got closer to readable than you did. If
you can avoid copyright infringement and get rid of this multi-column
layout, I'm sure it would be better.

------
dotsam
The generated podcast is also presumably a violation of copyright (i.e. it
creates a derivative work).

------
Xeoncross
Neat idea. I'm sure you are going beyond fair use, but it is a neat little
mashup of scrapers and text-to-speech.

I won't ever use it though. I have a bookmark for cleaning up pages.

------
kolev
I think a script that does this, but puts the stories in your Pocket with an
HN label, would be even better - you keep track of what's read and what's not
as a bonus.

------
lziz
Cleaned for readability, eh? -
[http://i.imgur.com/faqQrQi.png](http://i.imgur.com/faqQrQi.png)

