

Visualizing San Francisco Home Price Ranges In D3.js - shashashasha
http://trends.truliablog.com/vis/pricerange-sf/

======
ericson578
I like how powerful the d3 framework is, great use for it! I think the popup
when you mouse over the ranges on the graph is distracting, might be better to
have a fixed area that gets updated instead of the floating popup.

------
golike
There's also an associated blog post that describes the methodology and gives
some commentary on our findings:
http://trends.truliablog.com/2012/04/home-price-range-index-sf/

------
fennecfoxen
Well of course 94107 is going to have an odd range of prices. You're lumping
together the crunchy parts of downtown urban SOMA with the rarefied air of
Potrero Hill. You can _see_ how it's a distended blob, and if it were a
Congressional district, you'd worry about gerrymandering. Alas, ZIP codes
aren't always the best way to understand a place, though I'm sure it's ever-
so-convenient...

~~~
wtvanhest
Just curious, can you think of better ways to do it with available data?

~~~
tgrass
One could get the lat/lon coordinates from the address of each data point and
grid the city in any way you want.
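A minimal sketch of that gridding idea, assuming each listing has already been geocoded to a lat/lon pair. The listings, the cell size, and all function names below are invented for illustration and are not taken from the visualization's source:

```javascript
// Hypothetical sketch: bucket geocoded listings into a regular lat/lon grid
// and compute the median price per cell. All data here is invented.

// Map a coordinate to the id of the grid cell that contains it.
function cellKey(lat, lon, cellDeg) {
  const row = Math.floor(lat / cellDeg);
  const col = Math.floor(lon / cellDeg);
  return row + ":" + col;
}

// Median of a non-empty array of numbers.
function median(values) {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Group listing prices by cell, then reduce each group to its median.
function medianPriceByCell(listings, cellDeg) {
  const cells = new Map();
  for (const l of listings) {
    const key = cellKey(l.lat, l.lon, cellDeg);
    if (!cells.has(key)) cells.set(key, []);
    cells.get(key).push(l.price);
  }
  const result = new Map();
  for (const [key, prices] of cells) result.set(key, median(prices));
  return result;
}

// Invented sample listings: two close together, one farther away.
const listings = [
  { lat: 37.7765, lon: -122.3945, price: 700000 },
  { lat: 37.7768, lon: -122.3941, price: 900000 },
  { lat: 37.7605, lon: -122.4010, price: 1200000 },
];

const byCell = medianPriceByCell(listings, 0.005);
console.log(byCell);
```

The cell size (0.005 degrees here) is an arbitrary choice; a smaller value gives finer cells but fewer listings in each.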

One method to define neighborhood extents could be cultural instead of
geographic by crowd-sourcing the delineation. If one can ignore the desire to
draw a strict boundary at a major street, you'll find certain 'neighborhoods'
spilling out of their original blocks. Perhaps adjacent areas with distinct
histories start to define themselves by the same architectural historical
context. Or a once strong border between ethnic communities might disappear.
Gentrification might creep across a river closing the gap.

These could all be fuzzy-mapped by residents.

~~~
fennecfoxen
That would be better than trying to categorize things solely based on the
historical convenience of the US Postal Service.

Myself, I might approach the problem the other way - trying to infer the
existence of a neighborhood-like structure in terms of similar homes which are
close to each other (apply clustering algorithms), and then measuring to what
extent these structures overlap. This technique would hopefully place the 4th-
and-King-at-Caltrain high-rise apartments and condos apart from the 19th-and-
Arkansas single-family and converted-single-family homes.

~~~
wtvanhest
Another idea could be to just price individual homes based on last sale, then
map that, then use an algorithm to guess neighborhoods based on price per
square foot?

IDK. But it is interesting nonetheless.

------
tgrass
Y scale in Log?

I understand wanting to fit it in the screen space. Is Log not generally
reserved for large differences in magnitude and for power laws?

(I don't have an intuitive understanding of logs, so this is a genuine
question).

~~~
golike
We actually started with a linear scale, but ended up switching to a log
scale. Scaling linearly says, "how much more expensive are the high end homes
than the low end (subtraction)?" With the log scale we're saying, "how many
times more expensive is the high end than the low end (multiplication)?"

Scaling linearly allows big values to skew the results. Imagine a neighborhood
where the high end is $20m and the low end $10m, giving us an absolute
difference of $10m. Another neighborhood has a $1m high end and a $500K low
end, for an absolute difference of $500K. Scaled linearly, the $10m range in
the first neighborhood would appear to be much much bigger than the $500K
range of the second.

But if we use a log scale, instead of asking what the absolute difference is,
we're asking how much more expensive the high end is relative to the low end.
Using our two example neighborhoods, both would result in a 2x difference, and
thus both would have the same range of prices.

It's easy to point out where all the most expensive homes are, and scaling
linearly does just that. But looking at the relative differences in prices
provides a much more useful way of comparing different neighborhoods (or even
cities if we look nationwide), because it accounts for the natural variances
in prices in different areas.
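The two example neighborhoods above can be worked through in a few lines of plain JavaScript. This is only a sketch of the arithmetic; the function names are made up here and are not from the visualization's code:

```javascript
// Hypothetical helpers: measure a price range as an absolute difference
// (what a linear scale shows) and as a ratio (what a log scale shows).
function linearSpread(low, high) {
  return high - low;
}

function logSpread(low, high) {
  // On a log axis, the distance between two values depends only on their ratio.
  return Math.log10(high / low);
}

// The two neighborhoods from the comment above.
const pricey = { low: 10e6, high: 20e6 };
const modest = { low: 500e3, high: 1e6 };

console.log(linearSpread(pricey.low, pricey.high)); // 10000000
console.log(linearSpread(modest.low, modest.high)); // 500000

// Both ranges are a 2x ratio, so both print the same value, about 0.301.
console.log(logSpread(pricey.low, pricey.high));
console.log(logSpread(modest.low, modest.high));
```

On the linear measure the first range is 20 times the second; on the log measure they are identical.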

~~~
tgrass
Thank you very much for the thorough reply.

I don't suppose you have an image of the linear version?

~~~
golike
Sure do. <http://cl.ly/3L223H1f2v1u3t0I3C2s>

~~~
tgrass
Many thanks.

I prefer the linear story (not for the range-index though): it shows how
consistently extreme the high end is from the median relative to the distance
the low end is from the median (from a dense concentration of homes just above
the median?).

------
lowglow
Can we get the prices adjusted for square footage?

------
lrs
Extremely attractive presentation of interesting data - thanks for this. As
for where to go next, why not automate this methodology and do the entire
world? :)

~~~
golike
We can do it for the entire US. Stay tuned.

~~~
NonEUCitizen
Please do it for Silicon Valley (South Bay)...

------
zackzackzack
I've been working with d3.js often lately for some freelance work. Out of
curiosity, I viewed the source of Trulia's script and have some comments and
observations:

1.) To convert numbers, they use +number, where number is a string. That is
some excellent shorthand that relies on automatic type conversion in
javascript. Not sure what the speed is like, but I imagine it doesn't slow
things down much.

2.) They forgot that they had jQuery loaded in already. You can see this by
looking at the lines where they define w and h. They use d3 to get the height
and width in an awkward way. $("#vis").width() would have done the same for
less thought.

3.) They are using some tools to check for or automatically create clean
javascript. Not a semi-colon or tab out of line. Probably jslint, because they
use a forEach at some point and I don't believe coffeescript uses forEach in
its output. Coffeescript is probably worth the time to learn then: all those
function(d){return d.a} become just (d)-> d.a.

Depending on your style, this could be useful as well: getter =
function(attr){ return function(d){ return d[attr]; } }

ds.attr("x",getter("x"))

4.) I am embarrassed I didn't know d3.svg.axis existed. That saves time and
mistakes.
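Points 1 and 3 above can be tried out without d3. A small sketch in plain JavaScript, with invented sample data standing in for whatever is bound to the selection:

```javascript
// Unary plus coerces a numeric string to a number (NaN if it cannot parse).
const price = +"850000";

// Accessor factory: returns a function that reads one property from a datum.
function getter(attr) {
  return function (d) {
    return d[attr];
  };
}

// Invented sample data, for illustration only.
const data = [{ x: 10, y: 4 }, { x: 25, y: 9 }];
const xs = data.map(getter("x"));

console.log(typeof price); // number
console.log(price + 1); // 850001
console.log(xs); // [ 10, 25 ]
```

With d3 the same helper would be passed straight to attr, as in ds.attr("x", getter("x")), because d3 calls the function with the datum as its first argument.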

All in all, well done. Learned a good amount from reading through that.
Thanks!

~~~
golike
Thanks for the code compliments Zack. As I've gotten more experienced with D3,
I've been trying to use it exclusively, even for jQuery type things. In this
case I'm just using jQuery for old browser fallback, since D3 doesn't go to
great lengths to support old browser quirks.

And btw, no code cleanup tools in use, just my OCD. ;) I haven't used
Coffeescript yet. Will have to check it out.

------
xn
The y-axis scale distorts perception of the relative differences.

------
solsenNet
Data!

... but, not really sure what _meaning_ I am supposed to take away from this.

"Zip codes contain heterogeneous housing units that have a spread of prices"
??

------
jws
How about something like a 33% grey to 75% grey gradient left to right under
the zip code names which corresponds to the shading of the unselected map
regions? That way I could see spatial trends on the map that would track your
ordering preference. As it is I have to go dowsing and try to mentally
integrate over time.

The 33% still leaves room for the selected zip code to leap out.

------
aliston
Very interesting data, and I know that the point of the graphic was to show
spread, but honestly the prices looked a little low to me... not that I could
afford it anyway, but the average home in Pac Heights at "only" 850k? Average
price in the Sunset was the same as the Mission?

~~~
alanlewis
1) The data is grouped by zip codes, not by neighborhood (notice that "Pacific
Heights" spans multiple zip codes.) Better would be to have data derived from
neighborhood polygons that delineate clear neighborhood boundaries 2) The
prices are from listing prices (homes for sale), which could obviously vary
from zip code to zip code and 3) You're relying on the accuracy of Trulia's
data, which... well I won't get into that.

------
Travis
I've been wanting to do something fairly similar. Specifically, I would love
to know how you got the zip code overlay data and built your map with zip code
overlays. It looks like a deceptively complicated thing. Any information you
can share on how you did that part?

------
glogla
Pretty!

I wonder whether this framework can make traditional displays of such data
(i.e. box plots with outliers) as pretty.

