StackOverflow and Github Visualized As Cities
Ekisto is an interactive network visualization of three online communities: StackOverflow, Github and Friendfeed. A graph layout algorithm arranges users in 2D space based on their similarity. Cosine similarity is computed based on the users' network (Friendfeed), collaborate, watch, fork and follow relationships (Github), or based on the tags of posts contributed by users (StackOverflow). The height of each user represents the normalized value of the user's Pagerank (Github, Friendfeed) or their reputation points (StackOverflow).
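For readers curious how the two quantities in that description could be computed, here is a minimal sketch. It is not Ekisto's actual code. The tag-count dictionaries, the function names, and the choice of min-max scaling for the "normalized value" are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts.

    Keys are features (for example StackOverflow tags), values are counts.
    Returns 0.0 when either vector is empty or all zeros.
    """
    shared = set(a) & set(b)
    dot = sum(a[key] * b[key] for key in shared)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def normalize_heights(scores):
    """Min-max scale raw scores (Pagerank or reputation) into [0, 1].

    Assumption: the source only says "normalized"; min-max scaling is one
    common choice, not necessarily the one Ekisto uses.
    """
    lowest = min(scores.values())
    highest = max(scores.values())
    span = highest - lowest
    if span == 0:
        return {user: 0.0 for user in scores}
    return {user: (score - lowest) / span for user, score in scores.items()}


# Hypothetical users and tag counts, purely for illustration.
alice = {"python": 12, "django": 4}
bob = {"python": 3, "numpy": 9}
print(cosine_similarity(alice, bob))
print(normalize_heights({"alice": 1500, "bob": 300, "carol": 900}))
```

Users whose tag vectors point in a similar direction get a similarity close to 1, which a layout algorithm can then use to place them near each other.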

Sharing this because I thought it's really awesome - published Dec 1 by Alex Dragulescu.

So many white men. (just an observation, not commentary)

White male shaming: Now moving on to grander things like selfless community pursuits such as contributing to open source (just an observation, not commentary)

I'm a white male, very active on Stack Overflow, and working to become more active on Github. Still, I find acadien's observation very interesting. We should aim at trying to find the real causes of the situation, instead of being offended, just because someone pointed out something that clearly is a fact.

Suggesting that white males are more selfless is a bullshit answer. Maybe they (we) are the only ones who have the right combination of being educated and having enough free time? Maybe this has something to do with gender stereotypes? Or gender roles, with women usually having many more additional non-work-related responsibilities? I don't know. But it's worth asking.

> Suggesting that white males are more selfless is a bullshit answer.

I think parent was saying white male people shouldn't be shamed for participating in a "selfless" activity even if they're over-represented in that activity. I don't think parent was implying white-male people are over-represented in that activity because they're more selfless.

It may be worth asking why, but I don't think it's an actual problem, and especially not a problem that needs action to make the distribution of races/genders more fair. White men aren't preventing people from getting into open source software. Just the opposite actually - a few white men have made it extremely easy for people to get into OSS.

Being a 19-year-old black software engineer, from a "current reality" POV, I can say that acadien's observation is definitely spot-on. However, I can also say that it seems things are getting much much better. Just a little under 3 years ago, I'd attend a conference with 300, 500, or even 500+ attendees, and would EASILY count the number of black folks on a pair of hands (yes) and the ladies on 2-3 pairs of hands. This isn't necessarily the case anymore (it's still kind of a problem though). Also, a majority of the "white men" programmers I know in the industry are very welcoming and helpful in this sense, so I wouldn't blame "white men" since that's just too general, and quite frankly, wrong, for the most part. Now that that's out of the way, THERE ARE those "white men" who are not so welcoming and aren't afraid to show it either. I couldn't even tell you the number of times I've been offended by such folks. There's even one conference I went to where I was literally stopped on my way to grab lunch and randomly requested to go to security to be searched while EVERYONE ELSE was enjoying the sessions and meals. I was searched, I went to the bathroom and cried for a while (yes, I admit it), and immediately headed home afterwards. Which conference ... well, I digress. The point I'm trying to make here is that we shouldn't be blaming folks in a general sense, but instead find those folks who do have such behavior and try to help them change.

Great post. I think the main cause is economic / structural, but the underlying problem may well be exacerbated by bad behaviour of individuals. code4lib have an anti-harassment policy to deal with the behavioural problems (https://github.com/code4lib/antiharassment-policy). I've never been to their conferences so couldn't say how helpful it has been for them. As for dealing with the structural issues, that's a bigger problem without any clear answers that I'm aware of.

Hi Jonathan. I would love to know which conference this was.

Yeah, it seems we are the suckers who keep giving away all the free code and free help.

LOL. I'm "black", but I had to upvote this comment.

Yeah.... I think I browsed through the SO top users list a year or so ago and at least at the time, I was the only one with a visibly female name in the top 100. The next one I found was around position 500-something.

Numbers don't lie. I wonder where all the great devs from the largest outsourcing countries are (just a rhetorical question, not a personal opinion).

It seems to me that lots of devs are from Europe, so that can be one explanation for them being predominantly white...

In case someone else wants to find github user ids, you can use this webapp: http://caius.github.io/github_id/

For Github you have to enter the username. For Stackoverflow you have to enter the numerical userid which can be found in the url of your profile page.
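If you would rather not pick the number out by hand, here is a small sketch that pulls the numeric id out of a StackOverflow profile URL. It assumes the usual `/users/<id>/<display-name>` path shape. The function name and the example URL are made up for illustration.

```python
import re
from urllib.parse import urlparse


def stackoverflow_user_id(profile_url):
    """Return the numeric user id from a profile URL.

    Assumes the path looks like /users/<digits> or /users/<digits>/<name>.
    Raises ValueError for anything else.
    """
    path = urlparse(profile_url).path
    match = re.match(r"^/users/(\d+)(?:/|$)", path)
    if match is None:
        raise ValueError("not a recognizable profile URL: " + profile_url)
    return int(match.group(1))


# Hypothetical profile URL, not a real account reference.
print(stackoverflow_user_id("https://stackoverflow.com/users/12345/example-user"))
```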

The Github API rate limiting made it impossible to keep up with the user/repository growth so I stopped in March 2012. Only users that were public at the date of the crawl, and are part of the largest connected component of the network will be shown.
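For anyone wondering what "largest connected component" means in practice, here is a rough sketch using a breadth-first search. It is not the code behind Ekisto. It treats every relationship as an undirected edge, which is an assumption, and the user names are invented.

```python
from collections import deque


def largest_connected_component(edges):
    """Return the set of nodes in the biggest connected group.

    `edges` is an iterable of (user_a, user_b) pairs, treated as undirected.
    Users that appear in no edge are not represented at all.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        if len(component) > len(best):
            best = component
    return best


# Hypothetical follow relationships: two separate groups of users.
follows = [("ann", "ben"), ("ben", "cat"), ("dan", "eve")]
print(sorted(largest_connected_component(follows)))
```

A user who only interacts with a small isolated group, like dan and eve in this toy data, would fall outside the largest component and so would not appear on the map.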

I am the designer of Ekisto. Questions and feedback are welcome :)

Hey! Firstly, good to see some Romanian software news that's not linked to organized crime. I love your country! I was lucky enough to cycle around it for a month a few years ago and really loved the place.

Random suggestion: mapping a user metric such as dominant programming language to inform the building style. So maybe shell code would be more oldschool (Chinese/Japanese/Korean style wooden pagoda type look), perl would be hacky (like lashings of bamboo), enterprisey languages would be shiny skyscraper windows, and obscure stuff would be mud brick or straw bale or something!

The spatial proximity domain seems a little wasted with this mapping. Perhaps it could be improved by using direct links informed by real clones/contributions.

I hope Github gives you a data feed!

For my thesis I had some similar ideas, but they were more abstract, sculpture-like. The user volume was a cylinder, with varying radii (mapped to activity) at different heights (time). The mappings you are talking about are very subjective, but it's good brainstorming. Refining and iterating over such ideas brings innovation.

You are spot on that Github has very rich relationships of collaboration and sharing that are not all expressed by the data I chose. I wrote a longer post about my goals and motivation with Ekisto that addresses that concern: http://processq.tumblr.com/post/69098066993/ekisto-design

I also write about future work which will improve the current version. I think some tagging functionality either precomputed or user-contributed will help a lot of newcomers understand the map better. Stackoverflow veterans immediately recognize the clusters and the avatars.

Thanks for the overview on the algorithms / approach used on your Tumblr - looking forward to your post on the visualization pipeline

I think it's a really well done visualisation. Perhaps you could improve it by making it easier to select a user and see their data. For example, instead of having to click on the query tool, then clicking on the user, then clicking on the link to their profile, you could just click on their profile picture and get a little box showing their top stats. Minor quibble though, great work!

Yes, great feedback. StackOverflow does something similar: it displays a badge when you mouse over a user link. Very helpful for seeing the top tags where the user posts.

Thank you! Silly me only read the instructions for the stackoverflow data and assumed it was the same. This does explain why I couldn't find myself. I had assumed I just wasn't important enough to show up.

Well... now that github knows you are making something interesting they might just email you and tell you they will lift this rate limit.

This is really cool. But what exactly am I looking at? Would be awesome if this was clickable somehow to see the projects/account linked to it.

On the left hand side there is a (Q)uery tool. Click it, and then you can click on any stack to find out who it is.

There is a shortcut: Press Q + click. Pressing Q turns on query mode for one click. I only included that in the mouseover tooltips, fearing abuse. The server seems to handle it well, so query away.

The info overlays are also draggable, so one can make sense of their surroundings. See how this user mapped "his village": https://twitter.com/mc_sabourin/status/409592532452913152

This is really cool but makes my eyes feel funny. Something with the frame rate, feels as if the screen is shaking. But nevertheless amazing :)

I felt the same way - I think it's partially frame rate, but also scrolling on a 2d plane in a "3d" world.

Also the tiling effect might have something to do with it. There are multiple levels of zoom that get loaded and scaled as you zoom in.

Github's top 20 of 2012 only includes two humans. Whoops, watir is a group, so there is only one: defunkt.

Would really love to see this with up to date data.

Edit: Just saw the author explain why that won't happen.

This is really, really cool. Never seen this before. Nicely done!

I'm under Jon Skeet's toe

