

From tl;dr to Techcrunch: My Rumble app's story - jeremymcanally
http://omgbloglol.com/post/1373695996/from-tl-dr-to-techcrunch-my-rumble-apps-story

======
atomical
<http://omgbloglol.com/post/1373695996/from-tl-dr-to-techcrunch-my-rumble-apps-story>

->

<http://tldr.it/summaries/7744>

The Short, Medium, and Long summaries all show the same short, insignificant piece of text.

~~~
jeremymcanally
Yup, incredibly ironic actually. :)

I know why that's happening, but I unfortunately can't fix it due to Rails
Rumble rules. I'll be overhauling the content extraction over the next little
while using what I've learned from stuff you guys have submitted. Example: Did
you know a lot of sites don't actually use <p> tags? Or that some sites have a
metric ton of JS (many times more than the HTML in the <body>!)? It's awesome!
:(
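
One way around pages that have no `<p>` tags and more script than markup is to ignore `<script>`/`<style>` contents entirely and score every block-level container by how many words of visible text it holds. Each block's words are credited in full to its parent and by half to its grandparent, loosely in the spirit of Arc90's Readability. The sketch below uses Python's standard `html.parser`. It is not tldr.it's actual extractor, and `DensityExtractor`, `extract_main_text`, and the tag sets are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser

# Contents of these tags are never reader-visible article text.
SKIP_TAGS = {"script", "style", "noscript"}
# Containers that may hold the article; <p> is only one option among many.
BLOCK_TAGS = {"article", "blockquote", "div", "li", "main", "p", "section", "td"}


class Block:
    """One open or finished block-level element."""

    def __init__(self, tag):
        self.tag = tag
        self.own = []     # text directly inside this block
        self.all = []     # text anywhere inside it, nested blocks included
        self.score = 0.0  # own words + children's words + half of grandchildren's


class DensityExtractor(HTMLParser):
    """Hypothetical extractor: picks the container holding the most visible text."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside <script>/<style>/<noscript>
        self.stack = []      # currently open blocks, outermost first
        self.closed = []     # finished blocks

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.stack.append(Block(tag))

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS and any(b.tag == tag for b in self.stack):
            # Close unclosed children implicitly (e.g. a <p> with no </p>).
            while self.stack[-1].tag != tag:
                self.close_top()
            self.close_top()

    def handle_data(self, data):
        if self.skip_depth or not self.stack:
            return  # script/style text, or text outside any block
        self.stack[-1].own.append(data)
        for block in self.stack:
            block.all.append(data)

    def close_top(self):
        block = self.stack.pop()
        words = len(" ".join(block.own).split())
        block.score += words
        # Credit the parent fully and the grandparent by half.
        if self.stack:
            self.stack[-1].score += words
        if len(self.stack) > 1:
            self.stack[-2].score += words / 2
        self.closed.append(block)


def extract_main_text(html):
    parser = DensityExtractor()
    parser.feed(html)
    parser.close()
    while parser.stack:  # tolerate blocks that were never closed
        parser.close_top()
    if not parser.closed:
        return ""
    best = max(parser.closed, key=lambda b: b.score)
    return " ".join(" ".join(best.all).split())
```

On a page whose story is a single `<div>` of text broken up with `<br>`, this returns that div's text rather than the navigation links around it. Real pages need more than this (link-density penalties, class/id hints, comment sections), so treat it as a starting point rather than a solution.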

~~~
atomical
I have been working on a similar problem and was surprised to find that there
isn't a solution that can extract text perfectly (or is there?).

~~~
jeremymcanally
Unfortunately there isn't anything near perfect as far as I've found.

There are a lot of clever things you can do, but markup is incredibly hard to
parse (in the "What's actually in here?" sense) and it's even harder to
determine the original author's intent for elements. HTML5 sites make this
much easier, but that's like 0.00000001% of pages on the Internet at this
point. :/
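
HTML5's sectioning elements are why those pages are easier: when a site wraps its story in `<article>` or `<main>`, an extractor can read that container directly instead of guessing. Below is a minimal sketch using Python's standard `html.parser`, assuming the page marks up its content that way. `extract_semantic_text` is a hypothetical helper; it returns `None` so a caller can fall back to a heuristic when neither element is present.

```python
from html.parser import HTMLParser


class SemanticTextCollector(HTMLParser):
    """Collects visible text inside HTML5 <article> or <main> elements."""

    CONTAINERS = {"article", "main"}
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.depth = 0       # how many <article>/<main> elements we are inside
        self.skip_depth = 0  # > 0 while inside <script>/<style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.CONTAINERS:
            self.depth += 1
        elif tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.CONTAINERS and self.depth:
            self.depth -= 1
        elif tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only text inside a semantic container and outside scripts.
        if self.depth and not self.skip_depth:
            self.chunks.append(data)


def extract_semantic_text(html):
    """Return the text of <article>/<main>, or None if no such text is found."""
    collector = SemanticTextCollector()
    collector.feed(html)
    collector.close()
    text = " ".join(" ".join(collector.chunks).split())
    return text or None
```

If a page uses several `<article>` elements (one per comment, say), this concatenates them all, which is one reason semantic markup helps but doesn't settle the problem by itself.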

------
marknutter
I made a similar site a year or so ago called synopit.com, but it relied on
user-generated content rather than algorithmic summarizers. I had some good
traction early on and was covered by ReadWriteWeb and a few other decent
publications. A buddy and I were generating most of the content, usually
summaries of popular stories on Reddit or Digg, which we would link to in the
comments. That generated a good amount of traffic, but we kind of got busy and
then let the idea fall by the wayside.

I still think user-generated content is the way to go. A good example is this
summary posted in the comments on the Ars Technica review of Windows Phone 7:
<http://news.ycombinator.com/item?id=1818782>. That kind of summary just
can't be programmatically generated (yet, anyway). I'm glad tldr.it is
having such early success, it kind of validates the market for me and may
inspire me to pursue my idea further.

