Reddit Insight (redditinsight.com)
51 points by gdi2290 on July 14, 2013 | hide | past | favorite | 54 comments



So, two big design issues: if you scroll down from the wheel of terms you go wobbly, and clicking the down arrow below that last wheel is also wobbly. If you use one of the tabs to go to a graph, you create a 'back loop' where the back button takes you to the start, then back again takes you back to the graph, then back to the start, then back to the graph ....

The data is interesting but I expect to see a bit of analysis in projects like this. Without it the project becomes a sort of data Rorschach test where the viewer projects their perceptions into it.


We are aware of a lot of the issues on the home page and plan to completely redesign it soon. However which graph are you talking about? If it's the post tracking we will definitely look into it.


I had clicked on the rightmost tab (labeled Graphs), which shows the Subreddit graph, and then got stuck in the back-page loop. Although I just did it again and it didn't get stuck in a loop, so you may have fixed that one already.


Thanks for your feedback. At the moment I'm not able to reproduce your issue, but I'll add it to our GitHub issues.


I can't even read this on my iPhone. All of the text is forced into a tiny column so small that it wraps after nearly every word. A lot of the text is pushed right off the right edge of the page.

Please don't alter the viewport, you're doing it totally wrong.


We updated the landing page for better mobile support, although we can't say the same for some of our d3 visualizations.


> "Dogs" occurs more times in titles in r/aww than "cats" or "kittens"

That can be a fact.

> Despite their internet popularity cats are not submitted nearly as many times on this cuddly SubReddit.

Or dogs are rare enough that they're worth naming, because cats are default. Seriously, run whatever calculations you want, but be careful about what conclusions you draw from the numbers.


Yeah, there are probably more cat pictures just in the "If I fits, I sits" category than dogs altogether.


Good point. We like to stir the pot a bit though.


So you're knowingly misrepresenting the data?


Nobody skews cat data on the internet and gets away with it.


What do you mean by "stir the pot"? The purpose of this site seems to be data-driven analysis of reddit. If the conclusions being drawn aren't rooted in that data, I don't see the point.


By purposefully lying?


Nothing like a bunch of comments tearing apart a website built by people who, presumably, learned to code a few months ago and threw it together over the course of a few weeks. Yeah, it has a few issues, but it's significantly better than the first things I ever built and sent off to the world.

Keep it classy, HN.


Examples of "tearing apart"? At its worst, the feedback is terse. If all one wants are warm and fuzzies, one wouldn't submit here.


Constructive feedback rather than warm fuzzies is suggested.


What's the point in submitting this to HN if not to get feedback?


To get mindful feedback.


It's fundamentally horrible. The whole design approach is horrible. It's the contemporary equivalent of GeoCities, just using more recent but equally mindless design cliches.

In fact, it's so horrible it makes PowerPoint look good.


Your criticism is fundamentally horrible. The whole rhetorical approach is horrible. It's the contemporary equivalent of a flame on USENET, just using more recent but equally mindless rhetorical cliches.

In fact, it's so horrible it makes YouTube comments look good.


Rhetoric goes back centuries so it's unlikely I've come up with anything new. Otherwise, it contains genuine criticism that is rarely found in YouTube comments. The site is a patchwork of trendy design cliches that make the information less accessible than it would be in, for example, PowerPoint. And what we know about the mindless use of design cliches (see GeoCities) is that they date badly.

In sum, I think my complaint has more to it than your parodic reply.

There is in fact a more sensible response above: "We are aware of a lot of the issues on the home page and plan to completely redesign it soon."


Genuine criticism would simply explain the design cliches you didn't like and move on. Calling something horrible is not effective criticism. You're a horrible person! How are you meant to grow out of that? Comparisons to GeoCities and PowerPoint without explaining yourself are not doing much either.

The point of criticism is that you give something actually useful to the person you are criticizing. Unless your goal is simply to put down the other party. It's a cliche to just dismiss something out of hand, the same way I dismissed your arguments without justifying myself. The reason I replied like that was to demonstrate how fill-in-the-blanks it was. You cannot do that with real criticism. For example, this critique of your rhetorical cliches could not be applied to the website in question.


I was being brief. The average HN user is easily smart enough to get the point. And at least it addressed the point at issue, which made it more of a contribution than your content-free response.

By the way, you're also ignoring the original context. Anybody trying to offer helpful criticism surely wouldn't have done that, would they?


Sorry, I genuinely don't understand. How am I ignoring the original context (in my followup reply)?

I'll agree that my response was content-free. It was even aggressively hostile, and I could have been kinder about it. I'm sure that it was not fun to have someone flagrantly mocking your words.


My original was a response to https://news.ycombinator.com/item?id=6039938 and the context was that the site was being criticized.

> I'm sure that it was not fun to have someone flagrantly mocking your words.

Seriously? You must have missed Usenet in the 1980s.


Oh I see. Yeah, I never really considered it. I like to think that we live in a more... enlightened age than when Usenet was around, and that was a genuine apology.


None needed, really. I didn't take offense at your comment, and I didn't think it was unfair. I'm not carrying any grudges ;-)

(Belated response because I've just spent 24 hours travelling, and my brain is still something like a wet dishrag....)


(For parties that wanted to downvote this comment more than once: you can do so by clicking "link", then clicking "flag".)


From your profile: "Cofounder of a dev training program called Hack Reactor"

Hack Reactor is where this website originated, and you're behaving this way? You've been an HNer for over 2,000 days. You should know better.


Site was borderline unusable with Chrome on android. Text was overlapping, animations to display new text were being triggered after I scrolled past them and in some cases the text just flew in one side and right off the other side of the screen before I even got a glimpse of it.


We updated the landing page to better reflect your suggestions.


> r/Technology leads all SubReddits, with an average of 2,027 Karma per Post

Is this saying that all posts in r/technology get an average of 2027 karma?


My prior on that is extremely low, so my guess is that's like, of the posts that make front-page? Or of the top posts. What this really reflects, therefore, is something more like how up-votey people are.


Our data was largely from posts that were above 0 Karma, which I think is reflected in the analysis that we did.


Average is probably the wrong characterization of the distribution of karma per post.
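To make that concrete: karma per post is heavy-tailed, so a single viral post drags the mean far away from what a typical post earns. A toy sketch with made-up numbers:

```python
from statistics import mean, median

# Hypothetical karma scores: nine ordinary posts plus one viral outlier.
karma = [5, 12, 3, 8, 20, 7, 4, 9500, 6, 11]

print(mean(karma))    # 957.6 -- pulled up by the one 9500-karma post
print(median(karma))  # 7.5   -- much closer to a "typical" post
```

A median (or a percentile breakdown) would characterize "Karma per Post" far better than the mean for this kind of distribution.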


This is great and all, but if there is one tool I'd like to see, it's a personal data exporter.

I want to get all the posts and messages I have ever posted, going back many years, and beyond the 1000 cutoff in their comment history feeds. I'd like to run some keyword analysis on my own data, search it, access it however I see fit. As it stands now, there is no way to retrieve it.
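For anyone who wants to try anyway: Reddit's listing endpoints page through results with an `after` cursor, and stop serving new pages at roughly 1,000 items per listing, which is exactly the cutoff described above. A sketch of the pagination loop (the `fetch_page` callback and `make_fake_pages` helper are hypothetical stand-ins for real HTTP calls to e.g. a user's comments.json listing):

```python
def fetch_listing(fetch_page, cap=1000):
    """Walk a Reddit-style listing via its `after` cursor.

    `fetch_page(after)` is a hypothetical callback returning
    (items, next_after); Reddit listings stop at roughly `cap` items.
    """
    items, after = [], None
    while len(items) < cap:
        batch, after = fetch_page(after)
        items.extend(batch)
        if not batch or after is None:  # listing exhausted early
            break
    return items[:cap]

# Fake backend simulating 1,200 stored comments served 100 at a time.
def make_fake_pages(total, page_size=100):
    data = list(range(total))
    def fetch_page(after):
        start = 0 if after is None else after
        batch = data[start:start + page_size]
        nxt = start + page_size if start + page_size < total else None
        return batch, nxt
    return fetch_page

print(len(fetch_listing(make_fake_pages(1200))))  # 1000 -- the cap bites
```

No matter how the loop is written, everything past the cap is simply unreachable through the public listing API, which is why a true exporter would need Reddit's cooperation.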


Thanks for your feedback, but Reddit's API doesn't provide what you're asking for.


How much of "the Reddit" did you actually use to calculate these stats? Is this based on a one-time snapshot from the API (which limits you to 1,000 posts per type of query), or is it a longitudinal crawl (and how was the data produced in that sample)?


One-time snapshot. Given more time on this project, we would love to do a larger analysis.


You should know this site isn't usable/viewable on iPhone Safari or Chrome.


Sorry about that; it's our initial prototype.


For how long did you collect data? During previous crawls, I've found that one of my spider bots can scrape through about 11.5k submissions and 51k comments per day (while observing Reddit's API access rules).


Side note: everyone on our team is available for hire.


What is your contact info?


email us at TeamReddit@gdi2290.com


This looks awesome. Next step is to make it "real-time."

I worked for an analytics company and got to build some pretty awesome visualizations using D3 and one of the problems I always ran into was that while the visualizations are cool, you rarely get any actionable information from the charts. I feel like this would be a lot better if at the end, there was some call to action.


Real-time analytics is tricky due to API limits (unless you can accept a "real time" of minutes/hours per update).

Example: Twitter's search API is limited to 15 queries of 100 Tweets every 15 minutes. Do you query 100 Tweets every minute, or 1500 Tweets every 15 minutes?
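One simple answer is to spread the budget evenly: 15 calls per 900-second window works out to one call per minute, which smooths out "real time" into a steady trickle instead of a burst followed by a 14-minute gap. A minimal pacing-helper sketch (the class and its names are ours, not any library's):

```python
class RateBudget:
    """Pace requests to `calls` per `window` seconds by spacing them
    evenly, e.g. 15 search calls per 900 s => one call every 60 s."""

    def __init__(self, calls, window):
        self.interval = window / calls
        self.next_ok = 0.0  # earliest time the next call may go out

    def wait_time(self, now):
        """Return seconds to sleep before the next request is allowed,
        and reserve that slot."""
        delay = max(0.0, self.next_ok - now)
        self.next_ok = max(now, self.next_ok) + self.interval
        return delay

budget = RateBudget(calls=15, window=900)
print(budget.wait_time(0.0))  # 0.0  -- first call goes immediately
print(budget.wait_time(0.0))  # 60.0 -- next must wait a minute
```

Bursting (1,500 Tweets every 15 minutes) maximizes freshness right after each burst; the even spread keeps worst-case staleness down to one interval. Which is better depends on what the dashboard promises.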


One other challenge is that some of the presentation (especially portions of the front page) isn't data-driven. It can be tricky to figure out which information to highlight, what the clustering topics should be, etc.


I've got an experimental, realtime comment search engine over at commentfindder.com. Perhaps some useful visualizations could be created off of it. Do you have any ideas for what kind of actionable chart would be interesting?


We should do one for Hacker News too.


It's much harder to get data from Hacker News, as HN doesn't have a public API (and HNSearch's API limits make it infeasible).


>(and HNSearch's API limits make it infeasible)

I searched around and couldn't find information on API limits. Are these documented, or just discovered by running into them?

The Hacker News karma tracker posted this past January seems to be able to get the information that would be necessary for writing something like this.


HNSearch has a limitation of 1000 maximum results returned (see [1]).

You can take a snapshot of the Front Page / New Posts every so often a la [2], but it's limited per the robots.txt. It's possible, but more time-intensive (as in the linked post).

[1]: http://api.thriftdb.com/api.hnsearch.com/items/_search?q=med...

[2]: http://mayank.lahiri.me/writing/hackernews/index.html
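The snapshot approach works because repeated snapshots accumulate well past any single query's 1,000-result cap; the only bookkeeping needed is deduplication by item id. A sketch (the dict-based records are illustrative, not any real API's schema):

```python
def accumulate(snapshots):
    """Merge repeated front-page snapshots, keyed by item id.
    Later snapshots overwrite earlier ones, so each item keeps its
    most recently observed score."""
    seen = {}
    for snap in snapshots:
        for item in snap:
            seen[item["id"]] = item
    return seen

day1 = [{"id": 1, "score": 10}, {"id": 2, "score": 40}]
day2 = [{"id": 2, "score": 90}, {"id": 3, "score": 5}]
crawl = accumulate([day1, day2])
print(len(crawl))         # 3 unique items across both snapshots
print(crawl[2]["score"])  # 90 -- the latest observation wins
```

Over weeks of polling this yields a longitudinal dataset no single API query could return, at the cost of crawl time and the polling-interval blind spots.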


Thanks for the information (including the links).

Based on other comments on this post, it looks like HNSearch and Reddit APIs have the same limitations, so doing something like this should be feasible for Hacker News. Also, another post indicates that the data for Reddit came from single 1000-limit queries, so the time-intensive method shouldn't be needed to produce equivalent results (although it could provide more accurate ones).

Finally, for those looking for something like this for Hacker News right now, check out these sites:

http://www.hntrends.com/, http://hntrends.jerodsanto.net/, Hacker News hiring trends (https://github.com/adamw523/hackernewshires)



