
Show HN: txtify.it – Easily convert web articles to plain text - k1m
https://txtify.it
======
DrScump
Nice!

Suggestion: allow larger line widths to be specified (for fewer, longer lines
in large paragraphs).

~~~
k1m
Thanks! I'll try to add that in the next update.
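
For illustration, a configurable width could be sketched with Python's standard `textwrap` module. This is only a sketch, not txtify.it's actual code: the `wrap_paragraphs` helper and its `width` parameter are made-up names, and the default of 70 matches the current output width.

```python
import textwrap


def wrap_paragraphs(text: str, width: int = 70) -> str:
    """Re-wrap each blank-line-separated paragraph to at most `width` columns."""
    paragraphs = text.split("\n\n")
    # Collapse existing line breaks inside a paragraph before re-wrapping it.
    wrapped = [textwrap.fill(" ".join(p.split()), width=width) for p in paragraphs]
    return "\n\n".join(wrapped)
```

Calling `wrap_paragraphs(text, width=100)` would give fewer, longer lines for the same paragraphs.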

------
ksk
Very Cool! I like the simple no-nonsense UI. What is your goal with the
project?

~~~
k1m
Thanks! :) No big goals with this. Thought it'd be a useful tool, especially
considering how big and bloated web sites are getting these days. So we're
going to try to keep it around as it is. It's a simple wrapper around our
Full-Text RSS product, which, as of the latest release, can output plain text
like this. Demo of that available here:
[http://95.216.144.183](http://95.216.144.183)

------
DecayingOrganic
Just a little feedback, there's a small problem with The Economist. It just
includes "get our daily newsletter" thingy. Also cool project!

~~~
k1m
Have you got a URL I can try? This one worked:

[https://txtify.it/www.economist.com/news/finance-and-economi...](https://txtify.it/www.economist.com/news/finance-and-economics/21741563-fourth-our-series-professions-shortcomings-economists-focus-too)

~~~
DecayingOrganic
Here it is:
[https://txtify.it/https://www.economist.com/news/leaders/217...](https://txtify.it/https://www.economist.com/news/leaders/21741547-regulators-should-squash-plans-big-telecoms-merger-america-t-mobile-and-sprint-plan)

It shows up in your example as well. Maybe you don't get bothered by it?
:)

Please note: the bug is intermittent, it doesn't show up every time.

Here's the part: "GET OUR DAILY NEWSLETTER Upgrade your inbox and get our
Daily Dispatch and Editor's Picks."

~~~
k1m
I thought you weren't getting the content text at all. :)

Should be fixed now. We now strip that out:
[https://github.com/fivefilters/ftr-site-config/blob/master/e...](https://github.com/fivefilters/ftr-site-config/blob/master/economist.com.txt)

But I don't consider that a big problem - lots of sites include things like
that in their content. It's annoying, but there's no easy way to identify them all
accurately and remove them without custom rules.
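
To illustrate what a custom rule amounts to, here is a simplified Python sketch. The linked site config is the real mechanism and works on the page's HTML; this stand-in works on already-extracted paragraphs for brevity, and `STRIP_RULES` and `strip_boilerplate` are hypothetical names.

```python
import re
from typing import List

# Hypothetical per-site rules: host -> patterns marking paragraphs to drop.
STRIP_RULES = {
    "economist.com": [re.compile(r"^GET OUR DAILY NEWSLETTER\b", re.IGNORECASE)],
}


def strip_boilerplate(host: str, paragraphs: List[str]) -> List[str]:
    """Drop paragraphs that match any rule registered for `host`."""
    if host.startswith("www."):
        host = host[4:]
    rules = STRIP_RULES.get(host, [])
    # Hosts without rules pass through unchanged.
    return [p for p in paragraphs if not any(r.search(p) for r in rules)]
```

The point of the sketch is that each rule has to be written per site, which is why this doesn't scale to every newsletter blurb out there.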

~~~
DecayingOrganic
Oh, it's certainly not a big problem, I just wanted to let you know :)

Liked the project, by the way!

~~~
k1m
Thanks! :)

------
thrownaway954
The example should rotate between what people have previously converted. Right
now it is static.

~~~
k1m
That would bother people. Say you convert something that's private or not
public yet. Not sure you'd want the next person to see it. I don't like it
when sites do it.

------
masukomi
Am I right in thinking the source code will not be available for this?

~~~
k1m
Any particular aspect of the code you're interested in?

The article detection, extraction and conversion happens with our Full-Text
RSS product which we sell (see link for demo in another comment). So the code
here is mostly sending an HTTP request to that and asking for plain text
output. The HTML to text conversion in Full-Text RSS uses the Html2Text
library found here
[https://github.com/mtibben/html2text](https://github.com/mtibben/html2text)

------
z1mm32m4n
I’ve frequently needed to access a plain text version of a site in a bash
script or from the command line. For this, I use the command

    w3m -dump <url>

which dumps the website’s text content to stdout.

~~~
k1m
One of the things we do with txtify.it is try to find the content block in the
HTML first, then convert only that from HTML to plain text. So in situations
where the input URL is a news article or blog post, the output should be
cleaner.
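
As a rough sketch of that two-step idea (not the actual Full-Text RSS logic, which isn't shown here), the Python below assumes the article body sits inside an `<article>` element, which many pages don't provide. It keeps only the text inside that element and wraps it to 70 columns. `ArticleText` and `article_to_text` are illustrative names.

```python
import re
import textwrap
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "blockquote"}


class ArticleText(HTMLParser):
    """Collect text found inside <article> elements, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.in_article = 0   # > 0 while inside <article>
        self.in_skipped = 0   # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.in_article += 1
        elif tag in ("script", "style"):
            self.in_skipped += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.in_article:
            self.in_article -= 1
        elif tag in ("script", "style") and self.in_skipped:
            self.in_skipped -= 1
        elif tag in BLOCK_TAGS and self.in_article:
            self.chunks.append("\n\n")  # paragraph break after block elements

    def handle_data(self, data):
        if self.in_article and not self.in_skipped:
            # Collapse whitespace so "\n\n" only ever means a paragraph break.
            self.chunks.append(re.sub(r"\s+", " ", data))


def article_to_text(html: str, width: int = 70) -> str:
    """Step 1: isolate the content block. Step 2: convert it to wrapped text."""
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    paragraphs = "".join(parser.chunks).split("\n\n")
    wrapped = [textwrap.fill(" ".join(p.split()), width) for p in paragraphs]
    return "\n\n".join(w for w in wrapped if w)
```

Navigation, footers and anything else outside the content block never reach the text conversion, which is the difference from dumping the whole page.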

~~~
RunningRabbit
Where is your privacy policy?

~~~
k1m
We'll have one up soon. But essentially no user accounts, and no logging on
our servers for this service. Requested articles are cached for a few hours
then deleted.
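
A cache like that can be sketched in a few lines of Python. This is an assumption-laden illustration, not our server code: `ExpiringCache` is a made-up name, and the 3-hour default is just a stand-in for "a few hours".

```python
import time


class ExpiringCache:
    """In-memory cache whose entries are deleted once `ttl_seconds` have passed."""

    def __init__(self, ttl_seconds: float = 3 * 60 * 60, clock=time.monotonic):
        self.ttl = ttl_seconds   # assumed value; "a few hours" in the text above
        self.clock = clock       # injectable so expiry can be tested
        self._store = {}         # url -> (expiry time, text)

    def put(self, url: str, text: str) -> None:
        self._store[url] = (self.clock() + self.ttl, text)

    def get(self, url: str):
        entry = self._store.get(url)
        if entry is None:
            return None
        expires_at, text = entry
        if self.clock() >= expires_at:
            del self._store[url]  # expired: delete rather than serve stale text
            return None
        return text
```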

~~~
RunningRabbit
I count this as a proper privacy statement already :)

------
grblovrflowerrr
Hey this looks really cool. I think the typography could be improved though,
check out
[http://bettermotherfuckingwebsite.com/](http://bettermotherfuckingwebsite.com/)
for some inspiration/guidelines on how to accomplish that.

~~~
k1m
Thanks! :) The output is literally plain text (Content-Type: text/plain).

But, I realised it's possible to send a stylesheet in the HTTP header. Chrome
seems to ignore it, but Firefox uses it:

      Link: <https://txtify.it/dark.css>; rel="stylesheet"
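
For anyone who wants to try this, here is a minimal sketch using Python's built-in `http.server`. It is not the txtify.it server code: `plain_text_headers` and `PlainTextHandler` are illustrative names, and the `charset=utf-8` parameter is an assumption added for the example.

```python
from http.server import BaseHTTPRequestHandler

STYLESHEET_URL = "https://txtify.it/dark.css"  # URL from the header shown above


def plain_text_headers(stylesheet_url: str) -> list:
    """Headers for a plain-text response that also points at a stylesheet."""
    return [
        ("Content-Type", "text/plain; charset=utf-8"),  # charset is an assumption
        ("Link", f'<{stylesheet_url}>; rel="stylesheet"'),
    ]


class PlainTextHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = "Hello, plain text.\n".encode("utf-8")
        self.send_response(200)
        for name, value in plain_text_headers(STYLESHEET_URL):
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To serve it locally, pass the handler to `http.server.HTTPServer(("127.0.0.1", 8000), PlainTextHandler)` and call `serve_forever()`, then compare how Firefox and Chrome render the page.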

~~~
Boulth
Very cool! Could you also add viewport css so that it looks better on mobile
Firefox? (it already looks nice)

~~~
k1m
Thanks! Tried adding this to dark.css

      @viewport {
        width: device-width, initial-scale=1;
      }
    

But it didn't work in mobile Firefox.

Seems Firefox and Chrome on mobile handle plain text differently. Firefox
shows the text quite small (smaller than if it tried to fit it to device
width). Chrome on Android, on the other hand, doesn't seem to try to fit the
content at all, showing it at fixed scale which causes additional wrapping to
the already 70-char wrapped text we output.

Happy to experiment more with CSS, but ultimately this is simply supposed to
be plain text output, so optimal readability isn't really the goal here -
although it would be nice to improve it. :)

