Hey Jerod. I own the domain HNTrends.com, which I used for a similar project about two years back [1][2]. If you plan on keeping this project up, I'd be happy to transfer the domain name over to you. Shoot me an email: matt@leandesigns.com.
Don't forget, though, that these trends include the user's own comments and submissions in addition to mentions of them in other users' comments.
suking, this is truly awesome. It immediately joined my bookmark folder of favorite HN tools. I'd be curious, though, whether it would be possible to filter out usernames when counting how many times a word has appeared on the site. Or would that be too much of a challenge?
I was surprised to learn that the search API returns hits on usernames when they appear outside of the post/comment content.
I'm looking now to see if there's an easy way to filter around it. I've also been emailing with one of the ThriftDB guys and I'll ask him as well.
If not, it'd be very difficult to filter those out because I'm not currently retrieving all the matching items (nor would I want to), just the number of hits.
Yes, this is another HN resource that I forgot to mention. In fact, I just discovered it last week and decided to sign up. I really enjoyed it - keep up the good work!
How is it normalized? Every single thing I type in has an upward trend, which to me just suggests that interest in Hacker News has increased over the last few years... and I don't really need a chart to tell me that, since I live in Silicon Valley :)
Came to say exactly this. The chart seems to start (by default, anyway) at the beginning of 2007. I'm guessing the user base, and thus the activity on HN, has increased considerably since then. So yes, it would be good to be able to re-normalise in some way. Normalising by the number of registered users would probably be the simplest; the number of comments posted, the number of submissions, or even the number of words would all be interesting too.
Btw - my suggestions got me thinking - I assumed this is looking at posts and comments. Whether it covers just submission titles or includes comments will affect the best way to normalise.
In that case I would do it by the number of submissions (posts) + the number of comments for each period (3 months, I think, looking at your graphs?). I would normalise all quarters to the last complete quarter.
So if, say, Q1 2011 had 1m posts + comments and Q1 2007 had 10k, multiply the count for Q1 2007 by 100 (rough sketch below).
It would be truly great to have this normalisation optional, e.g. an option to normalise by comment + post count and maybe also by number of users (could be an interesting comparison). And perhaps have them superimposed on each other!
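To make the multiplier idea concrete, here's a rough back-of-the-envelope sketch; the quarter totals are the made-up numbers from the example above, and the activity map is just a stand-in for however the posts + comments counts end up being stored:

    // Back-of-the-envelope version of the multiplier idea. `activity` maps each
    // quarter to its total posts + comments; these are the illustrative numbers
    // from the example above, not real HN figures.
    const activity: Record<string, number> = {
      "2007-Q1": 10_000,
      "2011-Q1": 1_000_000,
    };

    // Scale a quarter up to the last complete quarter's activity level,
    // e.g. 1,000,000 / 10,000 = 100x for Q1 2007.
    const multiplier = (quarter: string, reference = "2011-Q1"): number =>
      activity[reference] / activity[quarter];

    const normalizedCount = (rawHits: number, quarter: string): number =>
      rawHits * multiplier(quarter);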
Thanks, I will definitely implement the multiplier as you suggested. I like this better than showing it as a % of the total.
I'm storing the total hits so I will do the calculations on the client-side, which means I just might be able to add some toggles for the normalization schemes ;)
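If it helps, a client-side toggle could look something like this; the scheme names and the QuarterTotals shape are purely illustrative guesses, not how the app actually stores its data:

    // Loose sketch of a client-side normalization toggle, assuming the raw hit
    // totals and some per-quarter aggregates are available in the browser.
    type Scheme = "raw" | "perPost" | "perUser";

    interface QuarterTotals {
      posts: number;
      comments: number;
      users: number;
    }

    function normalize(hits: number, totals: QuarterTotals, scheme: Scheme): number {
      switch (scheme) {
        case "perPost":
          return hits / (totals.posts + totals.comments); // share of all activity that quarter
        case "perUser":
          return hits / totals.users; // mentions per registered user
        default:
          return hits; // raw counts, as the charts show today
      }
    }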
Thinking a little bit more about this, there are of course lots of submissions that don't get voted up or commented on - spam, off-topic, not interesting, etc. Maybe you should only use posts that have at least one upvote or one comment, just to filter out the crap.
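The filter itself would be tiny; here's a loose sketch, assuming each item exposes something like an upvote count and a comment count:

    // Hypothetical item shape - just an assumption about what's available per submission.
    interface Item {
      upvotes: number;
      comments: number;
    }

    // Keep only submissions that earned at least one upvote or one comment,
    // so spam and ignored posts don't inflate the per-quarter totals.
    const countable = (items: Item[]): Item[] =>
      items.filter((item) => item.upvotes >= 1 || item.comments >= 1);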
A first version of this has been implemented. You can't toggle the normalizations just yet, but I figured I should get the fix out there as soon as I could.
Check the "Learn More" section where I give you props for the help :)
Coincidentally, I was also working on an HN Trends app a couple weeks ago (though I got distracted and haven't finished), and mine is normalized (by number of comments). For example: http://hn-trends.heroku.com/trends?q=scala%2Cclojure
Your app looks a lot better, though! (And my dataset ends around Sept. of last year.)
Divide by the weighted average change in the number of hits for the 100 most common English words? Basically, add up all the uses of "and," "or," "a," "the," etc. in each period. If Q3 2009 has 4X the sum of those words as Q1 2007, then divide all Q3 2009 results by 4.
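A rough sketch of that baseline idea, assuming a hypothetical hitCount(term, quarter) lookup and an abbreviated stand-in for the common-word list:

    // STOPWORDS is an abbreviated stand-in for the 100 most common English words;
    // hitCount(term, quarter) is a hypothetical lookup of total hits per quarter.
    const STOPWORDS = ["the", "and", "a", "or", "of", "to", "in", "is"]; // ...and so on

    // Sum the hits for every common word in a quarter to get its baseline volume.
    const baseline = (
      quarter: string,
      hitCount: (term: string, quarter: string) => number
    ): number => STOPWORDS.reduce((sum, word) => sum + hitCount(word, quarter), 0);

    // If Q3 2009's baseline is 4x Q1 2007's, this divides Q3 2009's raw count by 4.
    const normalized = (
      rawHits: number,
      quarter: string,
      referenceQuarter: string,
      hitCount: (term: string, quarter: string) => number
    ): number =>
      rawHits / (baseline(quarter, hitCount) / baseline(referenceQuarter, hitCount));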
Good point. I wasn't really thinking about normalization as I was more interested in comparing trends on multiple terms, but I don't think it'd be too much work to change the algorithm to account for HN's growth.
The trends on multiple terms are actually easier to read after normalization anyway. Right now the power-law growth of HN as a whole is making it hard to read the difference in growth between, say, FB and Twitter. It's easier for our brains to process if you take the ln() of them - which is definitely the fastest way to fix the charts - or normalize in some other way.
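For what it's worth, the ln() transform is a one-liner to apply before charting; in this sketch the +1 is my own guard against quarters with zero hits:

    // Math.log is the natural log; the +1 avoids ln(0) for empty quarters.
    const logScale = (counts: number[]): number[] =>
      counts.map((c) => Math.log(c + 1));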
There are similar metrics for job openings. It seems plausible that Python is becoming an all-purpose language while the focus for Rails / Ruby is on web apps. If true, this would be reminiscent of Perl v. PHP.
Had the same idea myself and bought hnstats.com (although, I admit, I looked for hntrend and hntrends.com first)... not enough time in the world to do all of these things. Great job with this.
I searched for my own username expecting a flatline, but there were a few hits in the last few years. Flattering, but I'm fairly certain that hackers have not been talking about me. ;)
Perhaps you should only query titles and text, not usernames?
edit: for example, searching for pg would include his own submissions as well as mentions and Ask PG subs.
Pretty cool visualisations. Just out of curiosity, why did you choose Highcharts? Did you look at jQuery Flot and choose Highcharts over Flot? Or were you unaware of Flot altogether?
Is it possible to somehow normalize the numbers by the total number of posts/comments, so that trends across the years do not mostly reflect the user base/activity growth at HN?
Can you normalize it against total text volume? The number of mentions always goes up over time as HN grows; the % of posts mentioning a term would be a better measure of mindshare.
Thanks so much! Hmm, good idea on % of posts, though I believe most of the value is in comparing multiple terms, in which case the raw number of mentions isn't as important as the relationship between the trend lines.
For example, the first thing I searched for was "Haskell", curious to see if it had gained or lost popularity in the past couple of years. The graph has an upward trend, but without knowing what the trend for total volume on HN is, that information is useless.
[1] http://news.ycombinator.com/item?id=810112
[2] http://www.mattmazur.com/category/hntrends/