

Show HN: Hacker News with categories and popularity - federkasten
http://newshack.io/
Do you want to know the topics of the news or the popularity of posts?
"Hacker News Hack" helps you find news by category and by importance or popularity.

It tags news with 9 categories, using natural language processing and supervised learning to categorize the news.

It also calculates the rate at which comments accumulate on each post as a popularity measure over a chosen term: 1 hour, 1 day, 1 week, 1 month, or 1 year.
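
One way to read the "speed of comments" measure, as a minimal sketch: count the comments that fall inside the chosen term and divide by the term's length in hours. The function name, the comments-per-hour unit, and the 30-day month and 365-day year are assumptions for illustration, not details taken from the site.

```python
from datetime import datetime, timedelta

# Terms named in the post. The month and year lengths are assumptions.
TERMS = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),
    "year": timedelta(days=365),
}


def comment_speed(comment_times, term, now):
    """Return comments per hour posted within the last `term` before `now`."""
    window = TERMS[term]
    recent = [t for t in comment_times if now - window <= t <= now]
    return len(recent) / (window.total_seconds() / 3600)


# Hypothetical post with comments 5, 20, 50, 90 and 600 minutes old.
now = datetime(2014, 2, 10, 12, 0)
times = [now - timedelta(minutes=m) for m in (5, 20, 50, 90, 600)]
hourly = comment_speed(times, "hour", now)  # 3 comments in the last hour
daily = comment_speed(times, "day", now)    # 5 comments spread over 24 hours
```

The same post can rank high for the 1 hour term and low for the 1 day term, which is why the term has to be chosen explicitly.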
======
jasonkester
It's a sad commentary on the current demographics here that there is no
"entrepreneur" category at all. And that the next closest thing, "business"
has only 3 articles, none of which are related to running a business.

I'd noticed myself finding less and less to click on on the front page over
the last year, but this has really driven home why.

~~~
ScottWhigham
I'm not sure that the "business" category means what you or I expected it to
mean. It's certainly nothing to do with being an entrepreneur - as I write
this there are four articles:

DEA redacts tactic that's more secret than parallel construction

Jamaican Bobsledders Ride Dogecoin Into Olympics

Swiss police: Screen in Tesla cars is too large

Ask HN: What will you pay for?

============

I guess maybe that last one could be somehow related but IDK. Seems that
important things are missing from the categories (like entrepreneurship or
running a business, which are not the same thing). In another comment on this
thread, it was mentioned that these are auto-tagged - maybe they can train it
a bit better. Those are, to me, off-topic for 2008 HN but right in the
wheelhouse for 2014 HN.

~~~
davidw
> Those are, to me, off-topic for 2008 HN but right in the wheelhouse for 2014
> HN.

The guidelines haven't actually changed any since then.

~~~
ScottWhigham
haha yes, but would they have made the front page in 2008? I think you know
the answer to that one!

------
sharjeel
Could you please add a category "All but NSA"?

~~~
collypops
Isn't that just news.ycombinator.com?

~~~
xerophtye
Nope, I see NSA news every other day on HN and it is getting rather annoying

~~~
Houshalter
Supposedly there is a filter already on HN for NSA articles, each vote only
counts as 1/3 of a vote if the title contains the string "nsa", severely
penalizing those articles.
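
A minimal sketch of the rule as described in this comment, not of HN's actual ranking code, which is not shown anywhere in this thread. The function names and the case-insensitive match are assumptions.

```python
def vote_weight_divisor(title, penalized="nsa", divisor=3):
    """Return the divisor applied to votes for this title (1 means no penalty)."""
    # Assumption: the match is a plain case-insensitive substring test.
    return divisor if penalized in title.lower() else 1


def weighted_votes(title, votes):
    """Effective vote count after the rumoured title penalty."""
    return votes / vote_weight_divisor(title)


penalized_score = weighted_votes("NSA collects phone records", 90)
normal_score = weighted_votes("Show HN: Hacker News with categories", 90)
# A naive substring test also catches unrelated words such as "unsafe".
accidental_score = weighted_votes("Unsafe Rust patterns", 90)
```

If the rule really were a bare substring match, the last case shows how unrelated titles would be penalized as well.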

~~~
tempodox
If that's true, I understand why I don't understand the HN sort order.

------
ColinWright
It would be useful if each article could have its tag displayed. That way,
when looking at "all", I could start to get an idea of what the tags mean to
the system. At the moment I'm reasonably often finding things tagged other
than I would expect. Showing tags next to items would help me recalibrate.

Other than that - good job. Thank you.

~~~
federkasten
Thanks.

You're right. That's a nice idea! I will add that feature shortly.

------
d_luaz
Tagging of links (mostly hn), without the NLP.
[http://gcdc2013-hackerio.appspot.com/](http://gcdc2013-hackerio.appspot.com/)

~~~
dclara
How does it differ from online bookmarking services such as del.icio.us?

~~~
d_luaz
1) Tech/HN centric links and tags/categories 2) Besides bookmarking and
tagging, it also supports lists (collaboratively editable lists on topics such
as SEO tips)

[http://blog.gcdc2013-hackerio.appspot.com/what-is-hackerio](http://blog.gcdc2013-hackerio.appspot.com/what-is-hackerio)

~~~
dclara
You are doing something very similar to what I've done for two years. But I'm
running a serious business and startup company with patent pending solutions.

Another similarity between us is that when we posted our projects as Show HN,
we both got 0 comments from the HN community.

So the question is why they are not more welcomed than OP's NLP solution, even
if we provide more tools to save and share?

Do you have any idea? If you are interested in further discussion, I can be
reached at danmark.clara _at_ yahoo.com.

~~~
d_luaz
Because NLP is more interesting? We suck at community building? Not enough
content? No HN luck? Perhaps it's neither interesting nor good enough yet :)

~~~
dclara
Now the new search engine algorithms are gearing towards NLP led by Google's
Hummingbird, see the article here:

[http://www.wired.com/insights/2014/02/search-today-beyond-op...](http://www.wired.com/insights/2014/02/search-today-beyond-optimizing-semantic-web/)

The direction is right because we all need to improve the web towards Semantic
Web which is more meaningful. However, the first round of Semantic Web effort
is dead: [http://bit.ly/JP3KQO](http://bit.ly/JP3KQO)

Now it's the second round. Machine intelligence is still not comparable
with human intelligence. So I believe in the approach we are applying:
organize the web with human interaction and automate it with some tooling.

I don't think we suck. It takes time for others to dive into this world. So
I'll continue to make this effort and push forward. The problems with yours
and mine are also pretty common: the UI is not appealing, and data is lacking.

I know you are running it for fun. But if there is any chance for us to work
together, I think that would be best, because you understand this area and
solution so well. Check out my failed Kickstarter project here
[http://kck.st/JNqv8z](http://kck.st/JNqv8z) and give my existing
website a try: [http://bingobo.com](http://bingobo.com). I'd appreciate your
feedback.

------
ewoodrich
Very nice, would it be possible to have a larger selection of categories, and
the option to exclude certain categories for a more fine grained front page?

Also, I believe you haven't escaped the time field properly server-side, I see
"item.time_str" where I imagine the item time should be dynamically displayed.

------
GigabyteCoin
Pretty neat!

FYI: I just noticed your time calculations aren't working. Search for
"item.time_str" on your indexes.

~~~
federkasten
Thanks. I will fix it shortly.

------
joaomsa
Very nice. How do you determine categories?

~~~
federkasten
Using natural language processing and supervised learning, it automatically
tags news with 9 categories.

~~~
xux
That's really interesting. Will you open-source the NLP algorithm?

~~~
federkasten
I am planning to open source it in several months. (Our code is not yet
well commented or well structured...)

The implementation and algorithm details are as follows.

The categorizing process is written in Python.

Using nltk, it builds a corpus with a TF-IDF model from HN topics and
comments. It then generates classifiers from this corpus with the SVM
algorithm, using scipy and numpy.

FYI, the web interface is written in Clojure and ClojureScript.
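
The pipeline described above (TF-IDF features, then an SVM classifier) can be sketched roughly as follows. This is not the author's code, which had not been released at the time of this thread: it uses only the Python standard library instead of nltk, scipy, and numpy, trains a single two-class linear SVM without a bias term by subgradient descent on the regularized hinge loss, and uses four made-up titles as training data. Covering 9 categories would need something like one classifier per category, which is also an assumption.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(doc, idf):
    """L2-normalized sparse TF-IDF vector; terms outside the corpus are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def score(x, w):
    return sum(w.get(t, 0.0) * v for t, v in x.items())


def train_linear_svm(vectors, labels, epochs=50, lr=0.1, lam=0.01):
    """Subgradient descent on the L2-regularized hinge loss (no bias term)."""
    w = {}
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            violated = y * score(x, w) < 1  # inside the margin or misclassified
            for t in w:
                w[t] *= 1 - lr * lam  # shrink step from the L2 penalty
            if violated:
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + lr * y * v
    return w


def predict(text, idf, w):
    return 1 if score(tfidf(text, idf), w) >= 0 else -1


# Hypothetical hand-labelled titles: +1 = "web tech", -1 = "business".
titles = [
    "jquery plugin for responsive web design",
    "css tricks for web developers",
    "startup raises funding from investors",
    "investors value startup at a billion",
]
labels = [1, 1, -1, -1]

idf = fit_idf(titles)
w = train_linear_svm([tfidf(t, idf) for t in titles], labels)
web_guess = predict("new css framework for the web", idf, w)
business_guess = predict("investors back another startup", idf, w)
```

With only four training titles the classifier is a toy. The point is the shape of the pipeline: text to weighted term vector, then a linear decision per category.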

~~~
hnriot
presumably you've trained it with hand annotated content, or bootstrapped from
a few choice hn searches (like ?q=jquery will give you a web tech category)

~~~
federkasten
Yes. You are right.

I trained the classifiers with hand annotations (about 1000 items or so).

------
harryf
Very useful! Would be good if you could adjust the time-range with respect to
the category i.e. for a popular category, 1 day of news is great but something
like jokes - currently there's just an empty page -
[http://newshack.io/by-categories?category=9&term=day](http://newshack.io/by-categories?category=9&term=day) -
here a month is more interesting -
[http://newshack.io/by-categories?category=9&term=month](http://newshack.io/by-categories?category=9&term=month)
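
The suggestion could be sketched as a fallback over the site's terms: start at the preferred term and widen it until the category has enough items. The function name, the `min_items` threshold, and the item counts are hypothetical.

```python
# Terms offered by the site, from narrowest to widest.
TERM_ORDER = ["hour", "day", "week", "month", "year"]


def pick_term(counts_by_term, preferred="day", min_items=1):
    """Return the narrowest term at or after `preferred` with enough items."""
    start = TERM_ORDER.index(preferred)
    for term in TERM_ORDER[start:]:
        if counts_by_term.get(term, 0) >= min_items:
            return term
    return TERM_ORDER[-1]  # nothing qualifies, so fall back to the widest term


# Hypothetical item counts for a sparse and a busy category.
jokes_term = pick_term({"hour": 0, "day": 0, "week": 0, "month": 4, "year": 30})
busy_term = pick_term({"hour": 3, "day": 25, "week": 140})
```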

------
j45
It would be useful to have a category for startups. I find that's the content
I most look for here.

------
herokusaki
Out of all third party UIs for Hacker News I've seen (and have since forgotten
the URL for) you probably have the best domain name. It's clever and
memorable.

------
dclara
This is interesting. But it's not quite in sync with the news on the front
page. What do you sort by? If it works well, I'd like to use it.

~~~
federkasten
Thanks.

It crawls HN topics once an hour, so news in our application can lag by up to
one hour.

~~~
jkarneges
If you're hitting HN's access thresholds, you might consider incrementally
parsing data from [http://hnstream.com](http://hnstream.com)

That may help with analyzing comment content at least. Not so much with news
rankings though.

~~~
federkasten
Thanks. hnstream is really cool.

You're right. I have limited the crawl frequency because of HN's access
thresholds.

hnstream will help me. I will try it.

------
11185d
This is great! I would also love the ability to collapse entire threads; I
never understood why that doesn't exist.

------
tempodox
I never got the original HN sorting algorithm. Looks quite arbitrary. So, any
alternative is likely to be an improvement.

------
minikomi
Just a thought - how about having the categories "toggle" rather than
"select"?
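
A small sketch of the difference: "select" replaces the active category, while "toggle" flips one category in or out of a set so several can be active at once. The item structure and the rule that an empty selection shows everything are assumptions.

```python
def toggle(active, category):
    """Return a new set with `category` flipped in or out of `active`."""
    return active ^ {category}


def visible(items, active):
    """Items whose category is active; an empty selection shows everything."""
    return [item for item in items if not active or item["category"] in active]


# Hypothetical items using category names mentioned in this thread.
items = [
    {"title": "Ask HN: What will you pay for?", "category": "business"},
    {"title": "A very serious article about C", "category": "jokes"},
]

active = set()
active = toggle(active, "business")  # {"business"}
active = toggle(active, "jokes")     # {"business", "jokes"}
active = toggle(active, "business")  # {"jokes"}
shown = visible(items, active)
```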

~~~
federkasten
It's a nice idea. I will try it.

------
dhimes
I like this a lot. BUG: item.time_str isn't evaluating when I select a
category.

------
federkasten
Updated.

1. fix some bugs (e.g. "item.time_str")

2. support responsive design (currently optimized for iPhone)

------
level09
Looks like a lot of data has not been indexed yet. The concept is extremely
useful, and I would like to see psychology as one of the categories; it's my
favourite type of content on HN.

------
tehaaron
Looks awful on mobile; would be great if you could make it responsive.

~~~
federkasten
Yes. I will support responsive design in the near future.

------
TrainedMonkey
Awesome! Too bad jokes category does not have a lot of content, pity.

~~~
NAFV_P
I think something needs to be done about that.

But, humour is almost universally subjective. There are some aspects of Obama-
Care that I thought were highly amusing, especially the "500 million lines of
code" gag.

Here is a very serious article about the C programming language.

[http://uncyclopedia.wikia.com/wiki/C_programming_language](http://uncyclopedia.wikia.com/wiki/C_programming_language)

~~~
deevus
"Lisp - C for parentheses who want to meet other parentheses."

------
cogware
I find the font uncomfortably small - make the styling more like HN?

~~~
federkasten
Thanks. I will support responsive web design. That may solve it.

------
tomkinson
It is very useful, although I do love simplicity!

------
federkasten
Updated.

* fix some stylesheet issues (support Chrome on Android)

------
acd
Brilliant work thanks!

------
monsterix
This is super useful!

I have been forcing myself off HN lately. One thing that I miss the most due
to this is the SHOW:HNs of really cool products and hacks that hit the front
page. This gives me a great month-to-month access to those great stories at
one place.

Bookmarked!

~~~
duck
_I have been forcing myself off HN lately_

You might want to check out my newsletter, which includes the best "show hn"
links each week - [http://hackernewsletter.com](http://hackernewsletter.com).
It also helps with the HN addiction.

~~~
TeMPOraL
I've been subscribed to it for some time and while it didn't cure my HN
addiction, it lowered it a bit, and it also helps me find interesting articles
I missed during "more work, less procrastination" periods. I just want to say,
thank you for your work!

