
The Problem with Twitter Maps - lambtron
http://www.languagejones.com/blog-1/2014/12/24/the-problem-with-twitter-maps
======
datahipster
> Spatial statistics aren't the same as regular statistics

I've always been frustrated by the gap between statistics and spatial
statistics. For example, some of the methodologies for conducting hot-spot
analysis are somewhat misleading, especially to uninformed geospatial
analysts. Esri [0] implements this by first aggregating the data spatially,
then calculating z-scores based on Gaussian assumptions, then generating a
corresponding "p-value" to extract "statistically significant areas" that are
coined "hot spots". At that point, an analyst typically color-codes those
p-values, showing regions with low p-values as "extreme" areas of interest.
I'm really curious whether there's any empirical or anecdotal research that
validates this methodology.
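A rough sketch of that aggregate-then-z-score workflow, with toy counts I made
up (the actual Esri tool uses the Getis-Ord Gi* statistic with spatial
weights, which this deliberately omits):

```python
from statistics import mean, stdev
from math import erf, sqrt

# Hypothetical aggregated tweet counts per grid cell (toy data).
cell_counts = [3, 5, 4, 6, 2, 40, 38, 5, 4, 3, 6, 5]

mu = mean(cell_counts)
sigma = stdev(cell_counts)

def z_score(x):
    return (x - mu) / sigma

def p_value(z):
    # Two-sided p-value under a standard-normal (Gaussian) assumption.
    return 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))

# "Hot spots": cells whose count is significantly above the global mean.
hot_spots = [i for i, c in enumerate(cell_counts)
             if z_score(c) > 0 and p_value(z_score(c)) < 0.05]
print(hot_spots)  # cells 5 and 6 (counts 40 and 38) stand out
```

Notice that everything hinges on the grid cells being comparable draws from
one Gaussian, which is exactly the assumption being questioned.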

There are some attempts to normalize sampled data. Location Quotient [1] (and
Standardized Location Quotient), for example, compares a local measure to a
global measure. However, this too rests on Gaussian assumptions and doesn't
properly account for variance in the data.
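For illustration, a minimal Location Quotient in the BEA sense, applied to
made-up tweet counts (the "topic" vs. "total" split and the region names are
mine, not from the source):

```python
# Hypothetical counts: tweets about a topic vs. all tweets, per region.
regions = {
    "A": {"topic": 30, "total": 1000},
    "B": {"topic": 5,  "total": 1000},
}

global_topic = sum(r["topic"] for r in regions.values())  # 35
global_total = sum(r["total"] for r in regions.values())  # 2000

def location_quotient(r):
    # Ratio of the local share to the global share; LQ > 1 means the
    # region is over-represented relative to the map as a whole.
    local_share = r["topic"] / r["total"]
    global_share = global_topic / global_total
    return local_share / global_share

for name, r in regions.items():
    print(name, round(location_quotient(r), 2))  # A 1.71, B 0.29
```

Note that a region with 3 topic tweets out of 100 gets the same LQ as one
with 30 out of 1000, which is the variance problem mentioned above.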

I would definitely love to see a hierarchical Bayesian spatial model that
incorporates a geospatial prior (such as the overall density of tweets),
letting you solve for the posterior over cluster centers. Has anyone seen
this done before?

[0]
[http://resources.arcgis.com/en/help/main/10.1/index.html#//0...](http://resources.arcgis.com/en/help/main/10.1/index.html#//005p00000010000000)

[1]
[http://www.bea.gov/faq/index.cfm?faq_id=478](http://www.bea.gov/faq/index.cfm?faq_id=478)

