Show HN: Hacker News with categories and popularity (newshack.io)
149 points by federkasten on Feb 6, 2014 | 65 comments



It's a sad commentary on the current demographics here that there is no "entrepreneur" category at all, and that the next closest thing, "business", has only 3 articles, none of which are related to running a business.

I'd noticed myself finding less and less to click on on the front page over the last year, but this has really driven home why.


Well, it seems you can't have so much success and preserve original community values. We've seen that over and over on various websites.

If you haven't seen it yet, try http://hackermonthly.com/. I'm not a subscriber, but the free issues I've seen had more business stories in them. You can download the current issue from http://hackermonthly.com/issue-45.html


I'm not sure that the "business" category means what you or I expected it to mean. It's certainly nothing to do with being an entrepreneur - as I write this there are four articles:

DEA redacts tactic that's more secret than parallel construction

Jamaican Bobsledders Ride Dogecoin Into Olympics

Swiss police: Screen in Tesla cars is too large

Ask HN: What will you pay for?

============

I guess maybe that last one could be somehow related but IDK. Seems that important things are missing from the categories (like entrepreneurship or running a business, which are not the same thing). In another comment on this thread, it was mentioned that these are auto-tagged - maybe they can train it a bit better. Those are, to me, off-topic for 2008 HN but right in the wheelhouse for 2014 HN.


> Those are, to me, off-topic for 2008 HN but right in the wheelhouse for 2014 HN.

The guidelines haven't actually changed any since then.


haha yes, but would they have made the front page in 2008? I think you know the answer to that one!


May I ask, how do you use HN these days?


Could you please add a category "All but NSA"


You could also call it "all but the single most important issue facing the technology and computing industries today," but your proposal is more concise.


Isn't that just news.ycombinator.com?


Nope, I see NSA news every other day on HN, and it is getting rather annoying.


Supposedly there is already a filter on HN for NSA articles: each vote only counts as 1/3 of a vote if the title contains the string "nsa", which severely penalizes those articles.
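
For illustration only, here is a minimal sketch of the rumored rule. HN's actual ranking code does not appear in this thread, so the function name `effective_votes`, the constant `NSA_PENALTY`, and the case-insensitive match are all assumptions.

```python
from fractions import Fraction

# Assumed weight, taken from the rumor above; not confirmed HN behavior.
NSA_PENALTY = Fraction(1, 3)


def effective_votes(title: str, votes: int) -> Fraction:
    """Return the vote count after the rumored title-based penalty."""
    # Plain substring test on the lowercased title, as the rumor describes.
    if "nsa" in title.lower():
        return votes * NSA_PENALTY
    return Fraction(votes)
```

If the filter really were a plain substring match, it would also penalize unrelated titles: "unsafe", for example, contains "nsa".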


If that's true, I understand why I don't understand the HN sort order.


I think that's the joke. :)


It would be useful if each article could have its tag displayed. That way, when looking at "all", I could start to get an idea of what the tags mean to the system. At the moment I'm reasonably often finding things tagged differently than I would expect. Showing tags next to items would help me recalibrate.

Other than that - good job. Thank you.


Thanks.

You're right. That's a nice idea! I will add that feature shortly.


Tagging of links (mostly hn), without the NLP. http://gcdc2013-hackerio.appspot.com/


How is this different from online bookmarking services such as del.icio.us?


1) Tech/HN-centric links and tags/categories. 2) Besides bookmarking and tagging, it also supports lists (for example, a collaboratively editable list on a topic such as SEO tips).

http://blog.gcdc2013-hackerio.appspot.com/what-is-hackerio


You are doing something very similar to what I've been doing for two years. But I'm running a serious business and startup company with patent-pending solutions.

Another thing we have in common is that when we each posted our projects as Show HN, we both got 0 comments from the HN community.

So the question is why they are less welcome than OP's NLP solution, even though we provide more tools to save and share.

Do you have any idea? If you are interested in further discussion, I can be reached at danmark.clara _at_ yahoo.com.


Because NLP is more interesting? We suck at community building? Not enough content? No HN luck? Perhaps it's neither interesting nor good enough yet :)


New search engine algorithms are now gearing towards NLP, led by Google's Hummingbird. See the article here:

http://www.wired.com/insights/2014/02/search-today-beyond-op...

The direction is right, because we all need to move the web towards the Semantic Web, which is more meaningful. However, the first round of the Semantic Web effort is dead: http://bit.ly/JP3KQO

Now it's the second round. Machine intelligence is still not comparable with human intelligence. So I believe in the approach we are applying: organize the web through human interaction and automate it with some tooling.

I don't think we suck. It takes time for others to dive into this world. So I'll continue to make this effort and push forward. The problems with your project and mine are also pretty common: the UI is not appealing, and there is a lack of data.

I know you are running it for fun. But if there is any chance for us to work together, I think that would be best, because you understand this area and solution so well. Check out my failed Kickstarter project here http://kck.st/JNqv8z and give my existing website a try: http://bingobo.com. I'd appreciate your feedback.


Very nice, would it be possible to have a larger selection of categories, and the option to exclude certain categories for a more fine grained front page?

Also, I believe you haven't escaped the time field properly server-side; I see "item.time_str" where I imagine the item time should be dynamically displayed.


Pretty neat!

FYI: I just noticed your time calculations aren't working. Search for "item.time_str" on your indexes.


Thanks. I will fix it shortly.


Very nice. How do you determine categories?


Using natural language processing and supervised learning, it automatically tags news items with 9 categories.


Should http://news.ycombinator.com/item?id=7183076 be categorized as Web Technology then?


I wonder if you clustered HN articles to find categories automatically. That may not be useful, but I'd be interested in seeing how HN articles are naturally categorized.


That's really interesting. Will you open-source the NLP algorithm?


I am planning to open source it in several months. (Our code is not well commented or well structured yet...)

The details of our implementation and algorithm are as follows.

The categorizing process is written in Python.

Using nltk, it builds a corpus with a TF-IDF model from HN topics and comments. It then generates classifiers from this corpus with the SVM algorithm, using scipy and numpy.

FYI, its web interface is written in Clojure and ClojureScript.
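
The author's code is not published in this thread, so the following is only a rough sketch of the general idea (TF-IDF vectors from text, then a supervised classifier), written with the Python standard library alone. Note the swap: instead of an SVM it uses a much simpler nearest-centroid classifier with cosine similarity. The function names, the training titles, and the category labels "web" and "security" are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency per term over tokenized docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(tokens, idf):
    """L2-normalized TF-IDF vector as a dict; terms unseen in training are dropped."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def train(labeled):
    """labeled: list of (text, category). Returns (idf, per-category centroid sums)."""
    docs = [tokenize(text) for text, _ in labeled]
    idf = fit_idf(docs)
    centroids = defaultdict(Counter)
    for tokens, (_, label) in zip(docs, labeled):
        centroids[label].update(tfidf(tokens, idf))  # add vectors component-wise
    return idf, centroids


def classify(text, idf, centroids):
    """Return the category whose centroid has the highest cosine similarity."""
    vec = tfidf(tokenize(text), idf)

    def score(label):
        c = centroids[label]
        norm = math.sqrt(sum(v * v for v in c.values()))
        return sum(v * c[t] for t, v in vec.items()) / norm if norm else 0.0

    return max(centroids, key=score)


# Hypothetical hand-annotated examples (not real training data from newshack.io).
TRAIN = [
    ("jQuery plugin for responsive CSS layouts", "web"),
    ("Building a REST API with JavaScript and HTML5", "web"),
    ("NSA surveillance program revealed in leaked documents", "security"),
    ("Encryption flaw exposes passwords to surveillance", "security"),
]
idf, centroids = train(TRAIN)
print(classify("New JavaScript CSS framework", idf, centroids))
```

With only four training titles this is a toy; a linear SVM trained on the same TF-IDF vectors would replace `train` and `classify` while keeping the vectorizing step unchanged.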


presumably you've trained it with hand annotated content, or bootstrapped from a few choice hn searches (like ?q=jquery will give you a web tech category)


Yes. You are right.

I trained the classifiers with hand annotations (about 1000 items or so).


Very useful! It would be good if you could adjust the time range per category. For a popular category, 1 day of news is great, but for something like jokes there's currently just an empty page - http://newshack.io/by-categories?category=9&term=day - whereas a month is more interesting - http://newshack.io/by-categories?category=9&term=month
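
One possible way to do this, as a sketch only: pick the shortest time range that still has enough items. The `day` and `month` terms come from the URLs above; the `year` term, the `pick_term` name, and the `min_items` threshold are assumptions.

```python
# Terms ordered from shortest to longest window.
# "day" and "month" appear in the URLs above; "year" is assumed.
TERMS = ["day", "month", "year"]


def pick_term(counts, min_items=10):
    """Return the shortest term whose item count reaches min_items.

    counts maps a term name to the number of items in that window.
    Falls back to the longest term when no window has enough items.
    """
    for term in TERMS:
        if counts.get(term, 0) >= min_items:
            return term
    return TERMS[-1]
```

For a sparse category with counts like `{"day": 0, "month": 12, "year": 88}` this selects `month`, while a busy category keeps the one-day view.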


It would be useful to have a category for startups. I find that's the content I most look for here.


Out of all third party UIs for Hacker News I've seen (and have since forgotten the URL for) you probably have the best domain name. It's clever and memorable.


This is interesting. But it's not quite in sync with the news on the front page. What do you sort by? If it works well, I'd like to use it.


Thanks.

It crawls HN topics once an hour, so news in our application can be delayed by up to one hour.


If you're hitting HN's access thresholds, you might consider incrementally parsing data from http://hnstream.com

That may help with analyzing comment content at least. Not so much with news rankings though.


Thanks. hnstream is really cool.

You're right. I have limited the crawling frequency because of HN's access thresholds.

hnstream will help me. I will try it.


This is great! I would also love the ability to collapse entire threads. I never understood why that doesn't exist.


I never got the original HN sorting algorithm. Looks quite arbitrary. So, any alternative is likely to be an improvement.


Just a thought - how about having the categories "toggle" rather than "select"?


It's a nice idea. I will try it.


I like this a lot. BUG: item.time_str isn't evaluating when I select a category.


Updated.

1. fixed some bugs (e.g. "item.time_str")

2. added responsive design support (currently optimized for iPhone)


It looks like a lot of data has not been indexed yet. The concept is extremely useful, and I would like to see psychology as one of the categories; that is my favourite type of content on HN.


Looks awful on mobile. It would be great if you could make it responsive.


Yes. I will support responsive design in the near future.


Awesome! Too bad jokes category does not have a lot of content, pity.


I think something needs to be done about that.

But, humour is almost universally subjective. There are some aspects of Obama-Care that I thought were highly amusing, especially the "500 million lines of code" gag.

Here is a very serious article about the C programming language.

http://uncyclopedia.wikia.com/wiki/C_programming_language


"Lisp - C for parentheses who want to meet other parentheses."


Too bad this isn't a site to find jokes - and yet more and more people want it to be like reddit.


I think only certain types of humour are suitable for this site, geared more to the interests of HN readers.

Having a giggle does not have to be a frivolous activity.


View the jokes over a year. There are 88 results. Not negligible IMO.


wow that's a joke every 4 days! Who knew we were having so much fun.


I find the font uncomfortably small - make the styling more like HN?


Thanks. I will support responsive web design. That may solve it.


It is very useful, although I do love simplicity!


Updated.

* fixed some stylesheet issues (added support for Chrome on Android)


Brilliant work thanks!


This is super useful!

I have been forcing myself off HN lately. One thing that I miss the most due to this is the SHOW:HNs of really cool products and hacks that hit the front page. This gives me great month-to-month access to those great stories in one place.

Bookmarked!


> I have been forcing myself off HN lately

You might want to check out my newsletter, which includes the best "show hn" links each week - http://hackernewsletter.com. It also helps with the HN addiction.


I've been subscribed to it for some time and while it didn't cure my HN addiction, it lowered it a bit, and it also helps me find interesting articles I missed during "more work, less procrastination" periods. I just want to say, thank you for your work!





