
Analysis of Stack Overflow's Survey of 10K developers - glaugh
https://www.statwing.com/demos/dev-survey
======
arasmussen
This is some really interesting data. I could spend hours on this site given
some interesting things to look at. I really wish SO had included salary in
their survey.

Some feedback since I see you are the cofounder of statwing:

1\. I'd really love to be able to share one specific stat/relation with a
friend. I found the relationship between "Career / Job Satisfaction and Number
of Employees at Company" to be very interesting, and wanted to share just that
one with a friend. Either include a share button on each relation, or update
the url to reflect exactly what I'm looking at so that if I send it to someone
it'll take them to the same page.

2\. Your front page looks like it was written for data producers. Write it
instead for consumers. I would be much more likely to come to your site again
and again if it showed something similar to Quora: a list of most popular (by
views, upvotes, recency, or ideally a combination of all three) stats. Stuff
that a lot of people have found interesting and I will probably find
interesting so I can go look for myself.

Edit: I just realized statwing is pretty much only for privately analyzing
your data. That's too bad. I could see it being a really entertaining site for
public data.

~~~
glaugh
Hey, thanks for the feedback.

#1 resonates a lot. If you upload your own data, then we make it really easy
to share a few specific analyses (sort of like how I shared a few analyses via
this link). But we don't currently have that ability to let folks share
analyses of a dataset that they didn't upload themselves. Sounds like we
should, we're wasting a good opportunity for you to tell your friend about us.

#2 is also interesting. Obviously from a marketing perspective it's useful for
us to get people sharing public datasets. Perhaps there's a way to walk the
balance between analysis of private datasets and sharing of public ones.

Thanks again.

~~~
bcbrown
Really fascinating website. I agree with arasmussen's feedback. I'd also add
that while spending the past five minutes playing around with the website, I'm
consistently hitting Describe when I mean Relate. I wonder whether, if you
looked at your logs, you'd find a high occurrence of Describe requests
followed by Relate requests within 1-2 seconds.
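
A rough sketch of that log check in Python. The log format here (timestamp in seconds, session id, action name) is purely hypothetical, since I have no idea what your logs actually look like:

```python
from collections import defaultdict

def count_quick_corrections(events, max_gap=2.0):
    """Count Describe requests followed by a Relate request in the same
    session within max_gap seconds.

    `events` is an iterable of (timestamp_seconds, session_id, action) tuples.
    """
    by_session = defaultdict(list)
    for ts, session, action in events:
        by_session[session].append((ts, action))

    corrections = 0
    for session_events in by_session.values():
        session_events.sort()  # order each session's requests by timestamp
        # Compare every request with the one immediately after it.
        for (t1, a1), (t2, a2) in zip(session_events, session_events[1:]):
            if a1 == "Describe" and a2 == "Relate" and t2 - t1 <= max_gap:
                corrections += 1
    return corrections

# Made-up sample log, not real data
sample_log = [
    (0.0, "s1", "Describe"),
    (1.4, "s1", "Relate"),    # quick correction: counted
    (30.0, "s1", "Describe"),
    (45.0, "s1", "Relate"),   # 15 seconds later: not counted
    (5.0, "s2", "Relate"),
    (6.0, "s2", "Describe"),  # wrong order: not counted
]
print(count_quick_corrections(sample_log))
```

A high count relative to total Describe requests would suggest misclicks rather than deliberate use.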

~~~
glaugh
Is that because of misclicking, the order of the buttons, or that it just
feels like "Describe" should be the way to get the result you're looking for?
(Or something else?).

Thanks for the feedback.

~~~
bcbrown
I think it's the order. The primary action I want is the one that connects two
columns, but it's second in the order. I click on the first button without
thinking what word is on it.

I'm not saying you should switch the order, though. In some way, it makes
sense that you should first Describe a column before Relating it to something
else. But I think some usability testing would be useful.

I also didn't immediately understand that the different actions would 'stack
up' on the right-hand pane, showing everything you've looked at so far. That's
really useful, and the pop-up tips were good too, although I had to restrain
my instinct to automatically close/turn them off.

It would be nice if, in the Describe summaries, you could sort/order the data.
E.g., I'd like to sort the Job Title by Count, instead of alphabetic order of
Job Title.

After clearing the data, I wanted to relate three different things together.
It wasn't clear how to do that; the Relate button was disabled. I tried
Describe, then figured out that Relate To Each (which is clearly labeled!) was
the right option. I think I would have preferred not splitting the Relate
button into two, Relate and Relate To Each.

It's a neat site, and I hope you continue providing demos.

~~~
glaugh
Awesome. Really appreciate that feedback. Thanks a bunch.

Edit: Also, we do actually allow you to sort the descriptive by Count,
alphabetical, or manually chosen. There should be a "Sort" button to the upper
right of the table.

~~~
bcbrown
The bar chart is sortable, but the table isn't, as far as I can tell.

Also the default sorting for compensation isn't that good. It should be in
numerical order, but it's in some sort of alphabetical order.
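
For what it's worth, here is roughly what I mean, as a small Python sketch. The range labels are invented for illustration (I'm not copying the survey's exact wording), and `lower_bound` is just a helper name I made up:

```python
import re

def lower_bound(label):
    """Return the first number in a range label as an int, or -1 if there is none."""
    match = re.search(r"\d[\d,]*", label)
    if match is None:
        return -1  # non-numeric answers sort before all ranges
    return int(match.group().replace(",", ""))

# Hypothetical compensation labels
labels = [
    "$100,000 - $120,000",
    "$20,000 - $40,000",
    "$80,000 - $100,000",
    "$40,000 - $60,000",
    "Rather not say",
]

print(sorted(labels))                   # string order: "$100,000..." comes first
print(sorted(labels, key=lower_bound))  # numeric order by the low end of each range
```

Sorting on the parsed lower bound puts "$100,000 - $120,000" after "$80,000 - $100,000", which a plain string sort does not.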

------
chrisaycock
From playing around with the data, _Stack Overflow reputation_ was not
strongly correlated with _compensation_. Instead, _experience_ (and naturally
_age_ ) exhibited a stronger relationship with getting paid.

~~~
GFischer
It's pretty obvious, but country is strongly correlated with compensation.

If we want to get paid, we have to move to (or get a job in) the U.S. or
Australia (didn't know Australia was that well-paying).

~~~
icelancer
If you adjust for COL, it's not close. The U.S. dominates. AUS has very high
minimum wages, which drive up costs. Look at Steam games in AUD vs. USD.

~~~
podperson
You might want to consider some other comparator; I like the cost of a combo
meal at McDonald's (The Economist uses the price of a Big Mac). In my
experience, consumer electronics are cheaper in the US but good produce is
cheaper in Australia.

~~~
rdouble
When I was in Geelong bananas were $12/kg, and in Atlanta they were around
$0.50/lb. Other produce seemed quite expensive, also.

~~~
joshschreuder
Was this sometime after the Queensland flooding? Bananas from a major
supermarket are now between $3 and $6 per kg depending on the kind, but it did
spike for a period due to the flooding damaging crops I believe.

~~~
rdouble
Yes, it was around that time.

~~~
tacticus
Yeah that destroyed a very large portion of the banana farms in .au and the
banana benders up in qld have lobbied well enough to restrict imports

------
Confusion
Properly interpreting this data is hard. For instance, the relation between
'Desktop OS' and 'Compensation' suggests Windows 8 users earn significantly
more, on average, than Linux users. However, if you dig a bit further, you
find that Linux use is much higher in countries where 'Compensation' is much
lower across the board. So the relationship between 'Desktop OS' and
'Compensation' is a proxy for the relationship between 'Country' and
'Compensation', which makes it much less surprising.

~~~
glaugh
Agreed. Right now the best way to handle that is to filter your analysis in a
way that accounts for confounding variables. So in the example you gave, you'd
run the Desktop OS vs. Compensation analysis then filter for Country == USA
only.

That's a pretty rough way to deal with that issue. Ideally you'd run a
regression that more subtly takes country into account, without losing data.
Unfortunately that's not possible in Statwing (currently...).
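
To make the filtering idea concrete, here's a tiny Python sketch with invented numbers. This is not the survey data and not how Statwing works internally; the rows and the `mean_by_os` helper are made up for illustration:

```python
from collections import defaultdict
from statistics import mean

# Made-up rows: (country, desktop_os, compensation in USD)
rows = [
    ("USA", "Windows 8", 95_000), ("USA", "Windows 8", 105_000),
    ("USA", "Windows 8", 100_000), ("USA", "Linux", 102_000),
    ("India", "Windows 8", 18_000), ("India", "Linux", 20_000),
    ("India", "Linux", 22_000), ("India", "Linux", 21_000),
]

def mean_by_os(data):
    """Average compensation per desktop OS for the given rows."""
    groups = defaultdict(list)
    for _country, os_name, pay in data:
        groups[os_name].append(pay)
    return {os_name: mean(pays) for os_name, pays in groups.items()}

pooled = mean_by_os(rows)                                    # all countries mixed
usa_only = mean_by_os([r for r in rows if r[0] == "USA"])    # filter: Country == USA
india_only = mean_by_os([r for r in rows if r[0] == "India"])

print("pooled:", pooled)
print("USA:   ", usa_only)
print("India: ", india_only)
```

In this toy data the pooled averages favor Windows 8, while within each country Linux comes out slightly ahead. The real survey may show a gap that only shrinks rather than flips, but the mechanism is the same.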

------
henryboston
"There is a very weak but statistically significant relationship between Owns:
iPhone and Using: Node.js"

Hipster Hackers.

~~~
ritchiea
But it's actually the devs that don't own iphones that are more likely to have
used node.

Since I'm here, I'll speak up as someone who has an iphone and uses node.

~~~
cheapsteak
Actually, the

    'percentage of iPhone owners using node' out of 'total iPhone owners'

is greater than

    'percentage of non-iPhone owners using node' out of 'total non-iPhone owners'

by around 1%

But since there are more than 3 times as many non-iPhone owners as iPhone
owners, a Node user is more likely to not own an iPhone.
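
Here's the arithmetic in Python with round, invented counts (I'm not quoting the survey's real figures, only picking numbers that match "around 1%" and "more than 3 times"):

```python
# Hypothetical counts, chosen only to illustrate the two directions of the comparison
iphone_owners = 2000
other_owners = 7000          # 3.5 times as many non-iPhone owners
node_and_iphone = 160        # 8% of iPhone owners
node_and_other = 490         # 7% of non-iPhone owners

# Share of each group that uses Node
p_node_given_iphone = node_and_iphone / iphone_owners
p_node_given_other = node_and_other / other_owners

# Share of Node users who do not own an iPhone
p_other_given_node = node_and_other / (node_and_iphone + node_and_other)

print(f"P(node | iPhone)    = {p_node_given_iphone:.3f}")
print(f"P(node | no iPhone) = {p_node_given_other:.3f}")
print(f"P(no iPhone | node) = {p_other_given_node:.3f}")
```

With these numbers iPhone owners are slightly more likely to use Node, yet roughly three out of four Node users still don't own an iPhone, simply because that group is so much larger.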

------
asalazar
Ok here's an analysis on job satisfaction that I found interesting.

1\. very weak relationship to high compensation. Not completely surprising and
in-line with much of the research.

2\. Here's a hard one. No significance of consumer device (tablet, gaming
console, etc) EXCEPT for Apple devices (albeit a weak one). All the fanboys
are probably nodding their heads but I challenge you to explain it rationally.

3\. The importance of work/life balance seems to show in the data with weak
relationships to job satisfaction but it's mostly around hours spent working
and commuting. Why is it only a weak relationship?

4\. Surprisingly, life at work seems to be less related to job satisfaction,
with only very weak relationships to things like quality of office space,
bureaucracy, quality of workstation, # of meetings, opportunities to work on
new tech, and opportunities for growth. I wonder if perhaps the data is skewed
by the averages. I wonder what the data would look like if you separated tier
1 developers from everyone else.

~~~
podperson
Windows 8 devs were as happy as Mac devs, which was interesting. No-one else
was close. Also note that Mac devs tend to work in smaller shops, which is
correlated to happiness.

Game devs were the happiest devs. So - work for a small mac game dev.

------
eel
I wish there was a way to filter by full-time vs part-time/intern. Filtering
by country USA, a plurality of developers under 25 are making under $40,000. I
am assuming that a good portion of them are interns.

------
JonnyB
People in the advertising industry who think their job is very important:

53.1%

~~~
hcarvalhoalves
That can't be serious.

------
asalazar
More experienced developers spend more time refactoring code? Does that mean
that they're working on crappier code, established code bases, or higher end
applications where tech debt really matters?

Conversely, are the younger guys spending less time because they don't know
any better, working on small projects where tech debt isn't a priority, or
working on completely new code?

My guess is it's a reflection of the type of work being done.

~~~
ctide
The more less-experienced developers you work with, the more time you spend
refactoring the code they write.

------
silverlake
What does "non-negotiable" mean? It's positioned between "don't care" and "not
very important".

~~~
glaugh
Edit: This issue has been fixed.

Original comment: Sorry, that's a presentation issue on our end.
Non-negotiable is the highest value, and if you Relate one of the variables with
that as a response option to any other variable, you'd see everything in the
correct order.

We currently have a small issue where that order isn't being correctly
displayed by default if you do a Describe on one of those variables. We're
working on fixing that. In the meantime you can manually Sort --> Manual and
that chart will display appropriately.

Thanks for the feedback, much appreciated.

------
tel
I think I asked this last time I saw a Statwing analysis, but are you guys
accounting for multiple comparisons anywhere? The actual p-values are quite
robust here, but I can generate some pretty spurious analysis by just running
every comparison at once.

~~~
glaugh
We still don't account for them (except in the context of ANOVA post hocs).

The thinking is that (1) we're similar to other stats tools in that it's
incumbent on the user to account for that, (2) a 'practical' version of
accounting for multiple comparisons is to just be aware of them (as per your
"p-values are quite robust here" comment), and (3) eventually this will be a
really cool opportunity for us to stand out, and we do plan on eventually
accounting for them--we just haven't really been able to prioritize it at this
point.

Thanks for the feedback, we're very happy to have that comment brought up
quite a bit, it's definitely really important, especially given the goal of
democratizing data analysis.

~~~
tel
I think I just always bring it up since your tool moves directly to data
mining operations and I know my presentation of so many results will be highly
likely to have false positives.
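
To put a rough number on that, here's a small Python sketch. It assumes the tests are independent (which survey variables usually aren't), the p-values are invented, and Bonferroni is just one simple correction I'm using for illustration, not anything Statwing does:

```python
def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive across independent tests
    when every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_significant(p_values, alpha=0.05):
    """Flag each p-value against the Bonferroni threshold alpha / number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

# Hypothetical p-values from running several comparisons at once
p_values = [0.0001, 0.004, 0.03, 0.04, 0.2]

print(f"{familywise_error_rate(20):.2f}")  # 20 comparisons at alpha = 0.05
print(bonferroni_significant(p_values))
```

With 20 comparisons the chance of at least one spurious "significant" result is already well over one half, and after correction only the two smallest p-values above still pass.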

------
JHof
I would expect age to be somewhat skewed toward 20-somethings, but not so
strongly. Makes me wonder if this is an accurate picture of developers in the
US.

~~~
sageikosa
Those with more experience (and/or older) spend less time on StackOverflow,
and/or less time answering surveys.

~~~
rkuykendall-com
This probably has more to do with the environment that these developers
learned to code in. Stack Overflow was a huge part of my journey as a
programmer, but their journey probably included more reference books and
colleagues.

~~~
VLM
That would assume that developers who stop learning remain employed as
developers.

~~~
jhgaylor
This assumes that Stack Overflow is the only way to continue learning as a
developer.

------
asalazar
This just in-- there is a strong and significant relationship between age and
years of experience. Who knew?!

Also, job satisfaction and new feature dev!

------
matt_heimer
5k+ using jQuery but only 3k+ using JavaScript?

~~~
as_if
Maybe about 2.000 people don't know jQuery IS JavaScript...

~~~
msoad
I'm sure there are more than two ;)

~~~
fuzzythinker
The decimal is used as a comma in a few places.

~~~
petsos
In quite a lot actually:
[http://en.wikipedia.org/wiki/Decimal_mark#Countries_using_Ar...](http://en.wikipedia.org/wiki/Decimal_mark#Countries_using_Arabic_numerals_with_decimal_comma)

------
alainbryden
This is disappointing for me: C# developers tend to make less. Then again,
young people tend to make less (definitely causation) and C# developers tend
to be young, so maybe there's no causation there. Or maybe there is and that's
why older people get out of C# development!

