
From a quick test, it seems to treat almost every bit of content on a page equally, even elements which are clearly smaller and next to an image.

Might I recommend taking CSS styles into account? Large text is usually a headline, <strong> text is usually important, and darker greys generally suggest a side comment. It would be much easier if everybody used <aside> and <h1>, but even in 2013 that's too high an expectation.
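A rough sketch of the tag-based half of that suggestion, using only Python's standard-library `html.parser`. The weights in `TAG_WEIGHTS` and the names `WeightedTextParser` and `weighted_chunks` are hypothetical and made up for illustration. This does not evaluate CSS: font sizes and grey text colors would need a rendering engine or a stylesheet parser, which is out of scope here.

```python
from html.parser import HTMLParser

# Hypothetical weights: headline/emphasis tags boost a chunk, <aside> demotes it.
TAG_WEIGHTS = {"h1": 3.0, "h2": 2.5, "h3": 2.0, "strong": 1.5, "b": 1.5, "aside": 0.5}
# Void elements never get a closing tag, so don't push them onto the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input", "wbr", "source"}


class WeightedTextParser(HTMLParser):
    """Collect (text, weight) pairs; weight is the product of enclosing tag weights."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.chunks = []  # (text, weight) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating sloppy, unclosed markup.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        weight = 1.0
        for tag in self.stack:
            weight *= TAG_WEIGHTS.get(tag, 1.0)
        self.chunks.append((text, weight))


def weighted_chunks(html):
    parser = WeightedTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.chunks


if __name__ == "__main__":
    sample = "<h1>Big news</h1><p>Body with <strong>key fact</strong>.</p><aside>Side note</aside>"
    for text, weight in weighted_chunks(sample):
        print(weight, text)
```

A summarizer could then multiply each sentence's score by the weight of the chunk it came from, instead of working only on plain text that has already been stripped of its markup.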



You are right, I'm not taking HTML tags into account. That's because I extract the text beforehand using Python Goose, so only the plain text, without any HTML tags, is fed into the algorithm.


Try https://github.com/visualrevenue/reporter :) I'm looking at your service now and it is really massively awesome. Can I ask whether you are considering monetizing it, or going the venture path (boo)? I ask because I'm curious about the viability of using your service/library in a long-term project.


He's monetizing it as an API here https://www.mashape.com/mojojolo/textteaser



