It's a sad commentary on the current demographics here that there is no "entrepreneur" category at all, and that the next closest thing, "business", has only 3 articles, none of which are related to running a business.
I'd noticed myself finding less and less to click on on the front page over the last year, but this has really driven home why.
I'm not sure that the "business" category means what you or I expected it to mean. It's certainly nothing to do with being an entrepreneur - as I write this there are four articles:
DEA redacts tactic that's more secret than parallel construction
Jamaican Bobsledders Ride Dogecoin Into Olympics
Swiss police: Screen in Tesla cars is too large
Ask HN: What will you pay for?
============
I guess maybe that last one could be somehow related but IDK. Seems that important things are missing from the categories (like entrepreneurship or running a business, which are not the same thing). In another comment on this thread, it was mentioned that these are auto-tagged - maybe they can train it a bit better. Those are, to me, off-topic for 2008 HN but right in the wheelhouse for 2014 HN.
You could also call it "all but the single most important issue facing the technology and computing industries today," but your proposal is more concise.
Supposedly there is already a filter on HN for NSA articles: each vote counts as only 1/3 of a vote if the title contains the string "nsa", which severely penalizes those articles.
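To make the rumoured rule concrete, here is a minimal sketch of what such a title penalty could look like. This is not HN's actual ranking code; the function name, the constants, and the case-insensitive match are all assumptions made for illustration.

```python
# Hypothetical sketch of the rumoured penalty, NOT HN's real ranking code.
PENALTY_DIVISOR = 3          # assumption: "1/3 of a vote", as the rumour describes
PENALIZED_SUBSTRING = "nsa"  # assumption: matched case-insensitively


def effective_votes(title, raw_votes):
    """Return the vote count after applying the rumoured title penalty."""
    if PENALIZED_SUBSTRING in title.lower():
        return raw_votes / PENALTY_DIVISOR
    return float(raw_votes)
```

Note that a plain substring match like this would also penalize unrelated titles such as "Unsafe Rust patterns", since "unsafe" contains "nsa".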
It would be useful if each article could have its tag displayed. That way, when looking at "all", I could start to get an idea of what the tags mean to the system. At the moment I'm reasonably often finding things tagged other than I would expect. Showing tags next to items would help me recalibrate.
1) Tech/HN centric links and tags/categories
2) Besides bookmarking and tagging, it also supports lists (a collaboratively editable list on a topic such as SEO tips)
You are doing something very similar to what I've done for two years. But I'm running a serious business and startup company with patent pending solutions.
Another similarity between us is that when we showed our projects to HN, we both got 0 comments from the HN community.
So the question is why they are not more welcomed than OP's NLP solution, even if we provide more tools to save and share?
Do you have any idea? If you are interested in further discussion, I can be reached at danmark.clara _at_ yahoo.com.
The direction is right because we all need to improve the web towards the Semantic Web, which is more meaningful. However, the first round of the Semantic Web effort is dead: http://bit.ly/JP3KQO
Now it's the second round. Machine intelligence is still not comparable with human intelligence. So what I believe in is the approach we are applying: organize the web with human interaction and automate it with some facilities.
I don't think we suck. It takes time for others to dive into this world. So I'll continue to make this effort and push forward. The problems with yours and mine are also pretty common: the UI is not appealing and there is a lack of data.
I know you are running it for fun. But if there is any chance for us to work together, I think that would be best, because you understand this area and solution so well. Check out my failed Kickstarter project here http://kck.st/JNqv8z and give my existing website a try: http://bingobo.com. I'd appreciate your feedback.
Very nice, would it be possible to have a larger selection of categories, and the option to exclude certain categories for a more fine grained front page?
Also, I believe you haven't escaped the time field properly server-side; I see "item.time_str" where I imagine the item time should be dynamically displayed.
I wonder if you clustered HN articles to find categories automatically. That may not be useful, but I'd be interested in seeing how HN articles are naturally categorized.
I am planning to open source it in several months.
(Our code is not well commented or well structured yet...)
Our implementation and algorithm details are as follows.
The categorizing process is written in Python.
Using nltk, it builds a corpus with a TF-IDF model from HN topics and comments.
It then generates classifiers from this corpus with the SVM algorithm, using scipy and numpy.
FYI, its web interface is written in Clojure and ClojureScript.
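For readers who want a feel for the pipeline described above, here is a rough standard-library sketch of TF-IDF weighting followed by classification. It is not the site's code: a nearest-centroid classifier with cosine similarity stands in for the SVM, nltk/scipy/numpy are not used, and the training titles, labels, and function names are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical hand-labelled titles; the real system trains on HN topics and comments.
TRAIN_TITLES = [
    ("jQuery plugin for responsive tables", "web"),
    ("Writing faster JavaScript in the browser", "web"),
    ("How we raised our seed round", "business"),
    ("Pricing your startup product", "business"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for each term."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Return an L2-normalised sparse TF-IDF vector (term -> weight)."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def train_centroids(examples):
    """Sum the TF-IDF vectors of each label into one centroid per label."""
    docs = [tokenize(title) for title, _ in examples]
    idf = fit_idf(docs)
    sums = defaultdict(lambda: defaultdict(float))
    for tokens, (_, label) in zip(docs, examples):
        for term, weight in tfidf_vector(tokens, idf).items():
            sums[label][term] += weight
    return idf, {label: dict(vec) for label, vec in sums.items()}


def classify(title, idf, centroids):
    """Pick the label whose centroid has the highest cosine similarity."""
    vec = tfidf_vector(tokenize(title), idf)

    def score(centroid):
        norm = math.sqrt(sum(w * w for w in centroid.values()))
        dot = sum(w * centroid.get(t, 0.0) for t, w in vec.items())
        return dot / norm if norm else 0.0

    return max(centroids, key=lambda label: score(centroids[label]))
```

Usage would be along the lines of `idf, centroids = train_centroids(TRAIN_TITLES)` followed by `classify("A tiny JavaScript plugin", idf, centroids)`. A real SVM would learn a separating boundary instead of comparing against per-category averages, so treat this only as an illustration of the overall flow.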
presumably you've trained it with hand annotated content, or bootstrapped from a few choice hn searches (like ?q=jquery will give you a web tech category)
Out of all third party UIs for Hacker News I've seen (and have since forgotten the URL for) you probably have the best domain name. It's clever and memorable.
Looks like a lot of data has not been indexed yet. The concept is extremely useful, and I would like to see psychology as one of the categories; those articles are my favourite type of content on HN.
But, humour is almost universally subjective. There are some aspects of Obama-Care that I thought were highly amusing, especially the "500 million lines of code" gag.
Here is a very serious article about the C programming language.
I have been forcing myself off HN lately. The thing I miss most as a result is the Show HNs of really cool products and hacks that hit the front page. This gives me great month-to-month access to those stories in one place.
You might want to check out my newsletter, which includes the best "show hn" links each week - http://hackernewsletter.com. It also helps with the HN addiction.
I've been subscribed to it for some time and while it didn't cure my HN addiction, it lowered it a bit, and it also helps me find interesting articles I missed during "more work, less procrastination" periods. I just want to say, thank you for your work!