The overriding problem is that you are trying to present conclusions based on 300 responses out of 10,000 people surveyed but never mention the possibility of measurement error. Even ignoring intentionally incorrect responses (and these respondents do appear "cranky"), how many might have inverted the 1-to-4 scale? How many might have been one row off?
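To put a rough number on the miskeying concern, here is a minimal sketch of how scale inversion and one-row-off errors would shift an observed response distribution. The four-point scale, the "true" distribution, and the error rates are all hypothetical, chosen only for illustration. The point is that when most people sit at one end of the scale, even a small inversion rate puts a noticeable share at the other end.

```python
def observed_distribution(true_probs, invert_rate=0.0, shift_rate=0.0):
    """Expected observed shares on a k-point scale under two error modes.

    invert_rate: hypothetical chance a respondent reads the scale backwards.
    shift_rate:  hypothetical chance a respondent marks an adjacent row.
    Neither rate is an estimate from any real survey.
    """
    k = len(true_probs)
    keep = 1.0 - invert_rate - shift_rate
    observed = [0.0] * k
    for i, p in enumerate(true_probs):
        observed[i] += p * keep                 # answered as intended
        observed[k - 1 - i] += p * invert_rate  # scale read backwards
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < k]
        for j in neighbors:                     # one row off, split evenly
            observed[j] += p * shift_rate / len(neighbors)
    return observed


# Hypothetical: nobody truly answers "too many" (index 0); 60% want many more.
true_probs = [0.0, 0.1, 0.3, 0.6]
print(observed_distribution(true_probs, invert_rate=0.05))
```

With a 5% inversion rate, about 3% of answers land on "too many" even though, in this made-up example, nobody meant it.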
Breaking this 3% of respondents down further by age, region, and income is very prone to overfitting. Are these conclusions consistent across the multiple years of the survey? How many "over 74" tree-haters were there each year? Are they more or less likely to miskey an answer?
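The small-subgroup worry can be made concrete with a confidence interval on each subgroup's proportion. Below is a minimal sketch using the Wilson score interval, a standard choice for proportions with small counts. The subgroup sizes are hypothetical, not the survey's actual breakdown.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical: the same 3% rate observed in a large and a small group.
for successes, n in [(300, 10_000), (3, 100)]:
    low, high = wilson_interval(successes, n)
    print(f"{successes}/{n}: {low:.3f} to {high:.3f}")
```

The same 3% point estimate carries a much wider interval at n = 100 than at n = 10,000, which is why slicing an already small group by several demographics calls for caution.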
How do you feel about the current number of trees in your city? Too many? About right? Not enough? You probably didn’t answer “Too many.”
Why presume that everyone wants more trees? Is it hard to believe that there might be a point where a neighborhood could have too many, and that different people might have different thresholds for this? Your rhetorical question presumes a strange form of cultural diversity bounded by moral certitude.
I've met many Americans raised in the rural West who feel hemmed in by trees. I've met Australian ranchers who consider them to be weeds that steal ground water that could otherwise grow grass. There are numerous accidents caused by obscured street signs, and cities remove trees all the time. I know allergy sufferers everywhere who have very strong opinions on which trees they like and which they despise. And while I love fruit trees, many see them mostly as food for rats.
Going one step further, even if you think the study is completely accurate and the demographic numbers are large enough to be significant, the post misses what seems to me the obvious question: do the responses correlate with the number of trees in each area? Do the areas with fewer trees have more people who want more? Do the areas with the most have the most who are satisfied? If not, why not?
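If per-area tree counts were available, the check could start as simply as a correlation, across areas, between tree cover and the share of respondents who want more trees. This minimal sketch uses hypothetical per-area figures (the names `tree_cover_pct` and `want_more_pct` are illustrative only); a negative coefficient would mean areas with more trees have fewer people asking for more.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-area numbers, for illustration only.
tree_cover_pct = [8, 15, 22, 30, 41]   # share of each area under canopy
want_more_pct = [71, 66, 58, 52, 40]   # share of respondents saying "not enough"
print(round(pearson(tree_cover_pct, want_more_pct), 3))
```

With only a handful of areas, the coefficient would itself need an interval or significance test before drawing conclusions.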
Sorry for the vehemence. I'm reading lots of articles lately that misuse statistics, and I don't think this post puts your company in a good light.
For readability's sake we did not present the statistical side of the results, but they were unambiguous for everything we mentioned in the post. Perhaps in the future we should present those results in footnotes to assure inquiring minds that we addressed valid concerns like yours (e.g., overfitting).
Love the examples of valid reasons folks don't like trees. Almost any of them seem like viable hypotheses to explore for why folks fall in the 3% (though obviously we'd need very different data from that which we have handy).
To address the specific comment about trees per neighborhood: unfortunately, we didn't have granular enough data on respondents' locations to analyze that. Agreed that there's likely something there.
Thanks again for the comments.
Late follow-up edit: Thanks to your suggestions, we added some clarifying footnotes to the post.
1. Phone survey results were the same as written survey results, so the issue isn't measurement error.
2. All findings were backed by statistical testing, with p-values almost always below 0.0001.
3. Tests were replicated year over year. We didn't add a separate note about overfitting, but it doesn't seem like a big concern with 300 data points and just a few variables of particular interest.
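One way to back up a claim like "phone results were the same as written results" is a two-proportion z-test on the share answering "too many" in each mode. A minimal sketch follows; the counts are hypothetical placeholders, not the survey's figures.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


# Hypothetical counts of "too many" answers by survey mode.
z, p = two_proportion_z_test(x1=30, n1=1_000, x2=270, n2=9_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A large p-value here says the two modes are consistent with each other; it speaks to errors that differ between modes rather than to errors both modes share.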
It wasn't that hard to stick this into the post as footnotes; in the future we'll do better with that. Thanks for the help.
The statistical testing is tricky, though. Did you go in with the thesis that there might be a correlation between age and ideal number of trees, or did you first pin down "too many trees" and then search the data for correlations? Or, even more problematically, did you search for cross-correlations across all columns and then discover the age-tree link? If either of the latter two, your threshold for "significance" needs to be adjusted accordingly.
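To make the concern concrete: if m independent tests are each run at level alpha and every null hypothesis is true, the chance of at least one false positive is 1 - (1 - alpha)^m. The sketch below computes that and checks it by simulation, relying on the fact that p-values are uniformly distributed under a true null when the test statistic is continuous. Twenty tests and alpha = 0.05 are illustrative choices.

```python
import random


def family_wise_error(alpha, m):
    """P(at least one p < alpha) across m independent tests, all nulls true."""
    return 1 - (1 - alpha) ** m


def simulate_family_wise_error(alpha, m, trials=20_000, seed=0):
    """Monte Carlo check: under the null, each p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = sum(
        1 for _ in range(trials)
        if min(rng.random() for _ in range(m)) < alpha
    )
    return hits / trials


print(round(family_wise_error(0.05, 20), 2))   # about 0.64
print(simulate_family_wise_error(0.05, 20))    # should land close to that
```

In other words, scanning 20 unrelated columns at the usual 0.05 level turns up at least one "significant" link roughly two times out of three.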
We take a fairly pragmatic approach to the multiple comparisons problem you're highlighting. If p-values are < 0.00001 we don't really worry unless we're doing a crazy number of analyses. And when we do get borderline p-values (like 0.01) and we're doing multiple comparisons, we'll mention that there may or may not be something actually happening there (as we did in our post about Baltimore parking tickets: http://blog.statwing.com/baltimore-parking-tickets-revisited...).
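The pragmatic rule has a simple Bonferroni justification: multiplying a p-value of 0.00001 by the number of tests keeps it below 0.05 as long as fewer than 5,000 tests are run. For borderline results, a slightly less conservative option is the Holm-Bonferroni step-down adjustment. Here is a minimal sketch; the example p-values are made up.

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, etc.
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical p-values from four comparisons.
raw = [0.00001, 0.01, 0.03, 0.2]
print(holm_adjust(raw))
```

The tiny p-value stays tiny after adjustment, while the borderline 0.03 is pushed to 0.06, above the usual 0.05 cutoff.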
For what it's worth, I think ceph_'s comment and link below (http://online.wsj.com/article/SB116165781554501615.html) about the relationship between trees and gentrification is the best available guess as to what's ultimately driving these attitudes. But yeah, it's probably a combination of a lot of things, many of which aren't actually about trees per se.
Actually I didn't even see any inferential statistics in the post. He just reported observed values and drew his conclusions. At least running something like a chi-squared test could give me an idea of how unlikely it would be to observe those different proportions among the different groups just by pure chance.
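As an illustration of the suggested check, here is a minimal sketch of Pearson's chi-squared test of independence between age group and answer, using only the standard library. In practice `scipy.stats.chi2_contingency` does the same job, though it applies Yates' continuity correction to 2x2 tables by default. The counts in the table are invented for illustration, not taken from the survey.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared distribution (integer df >= 1).

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), with y = x / 2,
    starting from Q(1/2, y) = erfc(sqrt(y)) or Q(1, y) = exp(-y).
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
        a += 1.0
    return q


def chi_squared_independence(table):
    """Pearson chi-squared test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)


# Hypothetical counts: rows are age groups; columns are
# "too many", "about right", "not enough".
table = [
    [10, 120, 370],
    [25, 180, 295],
    [40, 160, 300],
]
stat, df, p = chi_squared_independence(table)
print(f"chi2 = {stat:.1f}, df = {df}, p = {p:.2g}")
```

A small p-value says the answer distribution differs across age groups more than chance alone would readily produce; it does not say which group drives the difference.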