

What SimpleGeo could have been - danielrhodes
http://blog.danrhodes.com/what-simplegeo-could-have-been

======
dsl
_"it's likely that whatever Urban Airship is going to do with the service will
most likely revolve around location-based ads, rather than SimpleGeo's
original vision."_

The original vision for the company was to make mobile games...

SimpleGeo's biggest problem was a lack of any vision. They sucked in a bunch
of data from outside sources (Yahoo WoE, census data, weather APIs,
factual.com, etc) and got a bunch of engineers to just work on whatever
interested them.

The products they did manage to produce were intended for an audience
technical enough that they could source the original data and build it
themselves if they really needed to. When you play that game, you have to
price requests an order of magnitude or two below what they did.

------
nosignal
Nobody's going to make any money _just_ by aggregating & providing access to
data points. Geo isn't as hard as everyone thinks it is. PostGIS absolutely
does intersects with arbitrary polygons, for example.

Where people will make the money is by providing new, interpolated data
layers; it's one thing to say "get the weather at location X!" and another to
say "show me the best intersection to get a cab" or "show me where the closest
taco truck is likely to be given today's weather & past behaviour". I
understand SimpleGeo was trying to provide access to the data to enable
people to answer these questions themselves; why not just sell them the
answers?

------
newhouseb
> I don't know of any freely available database where you can provide an
> arbitrary polygon of geographical coordinates and it will return the points
> within that polygon.

MySQL geo extensions do this, explicitly. See:

[http://dev.mysql.com/doc/refman/4.1/en/relations-on-geometry-mbr.html](http://dev.mysql.com/doc/refman/4.1/en/relations-on-geometry-mbr.html)

I'll give you that you can only really do MBR (minimum bounding rectangle)
queries, but it's easy enough to extend this to arbitrary polygons (and
functions to do so are google-able).
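One way to get from MBR queries to arbitrary polygons is a two-step query: use the rectangle test as a cheap prefilter, then run an exact point-in-polygon check on whatever survives. A minimal Python sketch of that idea, assuming lon/lat can be treated as flat x/y (fine for small polygons, not for ones crossing the antimeridian). In MySQL the prefilter would be the indexed MBR query; here it's just a list comprehension, and all function names are hypothetical. The exact test is standard ray casting, not MySQL's own code.

```python
from typing import List, Tuple

Point = Tuple[float, float]               # (x, y), e.g. (lon, lat) treated as planar
Box = Tuple[float, float, float, float]   # (min_x, min_y, max_x, max_y)

def mbr(polygon: List[Point]) -> Box:
    """Minimum bounding rectangle of the polygon's vertices."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    return min(xs), min(ys), max(xs), max(ys)

def in_mbr(pt: Point, box: Box) -> bool:
    min_x, min_y, max_x, max_y = box
    return min_x <= pt[0] <= max_x and min_y <= pt[1] <= max_y

def in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how many edges a ray going right from pt crosses."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed
        # (this also guarantees y1 != y2, so the division below is safe).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def points_in_polygon(points: List[Point], polygon: List[Point]) -> List[Point]:
    box = mbr(polygon)
    candidates = [p for p in points if in_mbr(p, box)]       # cheap MBR prefilter
    return [p for p in candidates if in_polygon(p, polygon)]  # exact refinement
```

For an L-shaped polygon, a point in the empty corner of the bounding box passes the MBR step but is rejected by the exact test, which is exactly the gap the second step closes.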

I've always felt that all the worry about Geo being hard was mostly FUD. You
can easily build a highly location aware application on a grid system without
even using any true geospatial index and indeed one of the largest local sites
does (see my profile)! Shapefiles aren't really all that horrible either, since
TIGER (i.e. US Census data) has gotten a lot better in the past couple of
years. You can relatively easily hack together a geocoding service in a few
days, provided you have enough caffeine.
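For a sense of what "a grid system without a true geospatial index" can look like, here's a minimal Python sketch. It assumes a fixed cell size of 0.01 degrees (an arbitrary choice) and uses an in-memory dict where a real site would have an ordinary indexed column holding the cell key. A "nearby" lookup just reads the query cell and its 8 neighbours; `GridIndex` and `cell_of` are hypothetical names.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

CELL_DEG = 0.01  # assumed cell size in degrees (roughly 1 km of latitude)

def cell_of(lat: float, lon: float) -> Tuple[int, int]:
    """Integer (row, col) of the grid cell containing the point."""
    return math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG)

class GridIndex:
    def __init__(self) -> None:
        # cell key -> list of (name, lat, lon); stands in for a table indexed on the key
        self.cells: Dict[Tuple[int, int], List[Tuple[str, float, float]]] = defaultdict(list)

    def add(self, name: str, lat: float, lon: float) -> None:
        self.cells[cell_of(lat, lon)].append((name, lat, lon))

    def nearby(self, lat: float, lon: float) -> List[str]:
        """Names of items in the query cell and its 8 neighbouring cells."""
        ci, cj = cell_of(lat, lon)
        found = []
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                # .get() avoids creating empty entries in the defaultdict
                for name, _, _ in self.cells.get((ci + di, cj + dj), []):
                    found.append(name)
        return found
```

The results are only candidates: you'd still compute real distances to rank them, and cells get narrower in longitude toward the poles, so a fixed degree size is a simplification.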

~~~
nosignal
I guess, to nitpick, he means "any currently existing, freely accessible POI
databases which support arbitrary polygon selection". E.g. most services which
support "geospatial queries" (Flickr, Yelp, etc.) only support MBR or simple
geometry queries rather than true polygon intersects. I misinterpreted that
too, but it's also true (to my knowledge, anyway). He's not saying it's
impossible to set one up for yourself using PostGIS etc.; if anything, that's
his point.

I agree with you that it's simple enough to set one up for yourself that the
value offered by a service is minimal.

~~~
newhouseb
Well he lists MongoDB and PostGIS as DBs that don't allow polygon selection,
but your alternate interpretation is more truthful.

Yelp does do arbitrary polygon intersections on the backend for neighborhood
queries, but I guess this isn't exposed in any API.

------
petedoyle
"Most location databases (such as PostgreSQL and Mongo) let you query based on
all points within a certain radius, but any features requiring more advanced
querying are hard to come by. For example, I don't know of any freely
available database where you can provide an arbitrary polygon of geographical
coordinates and it will return the points within that polygon."

PostGIS (an extension for PostgreSQL) does this really well.

The most notable thing about SimpleGeo (to me) was that they were able to
build PostGIS-like features on top of Cassandra. That's certainly attractive
from a scalability perspective.

I was really excited to try SimpleGeo out, but decided it was just easier to
use PostGIS, and less risky since I don't have much NoSQL experience. It also
wasn't clear to me whether you could store NON-geographic data in SimpleGeo
(such as user accounts). A (very) brief look at their docs seemed to suggest
you couldn't, and using multiple datastores seemed messy to me...

------
zacwitte
I have to disagree with most of this except the data problem. Building out a
cloud database and API specific for storing and querying geo data is not
enough of a value proposition. It's just not that hard of a problem to roll
your own, which is what most serious applications would do. They're the ones
with money. In fact, in some cases using SimpleGeo is a limitation because it
doesn't have the robustness of full databases. The part I will agree with you
on is the data problem. There's a huge amount of great geo data out there with
no clean repository. I kept telling them to focus on data and sell that, not
the service. Oh well. Someone (factual?) will get it instead.

------
captain-asshat
While certainly not free, SQL Server 2008 introduced some pretty nice spatial
data tools. It can find points within a radius, capture points within a
polygon, compute intersections/unions, and quite a bit more. It also has full
support for true spatial indexes, which makes dealing with large amounts of
data very fast.
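To make "find points within a radius" concrete outside any database, here's a brute-force Python sketch using the haversine great-circle formula. It assumes a spherical Earth with a mean radius of 6371 km; a real spatial database may use an ellipsoidal model, so its distances can differ slightly. Points are hypothetical `(name, lat, lon)` tuples, and scanning every point like this is exactly the cost a spatial index exists to avoid.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation (assumption)

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def within_radius(points: List[Tuple[str, float, float]],
                  lat: float, lon: float, radius_km: float) -> List[str]:
    """Names of points whose great-circle distance from (lat, lon) is <= radius_km."""
    return [name for name, plat, plon in points
            if haversine_km(lat, lon, plat, plon) <= radius_km]
```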

I've just completed a project using these tools quite extensively, and found
them a really nice way of handling all my geo data. Being T-SQL, there's not
much to learn to make use of it either.

------
gord
Geo-data is effectively 'read-only', it seems to me..

There are things local people know about, which still diffuse by word of mouth
- and I still see paper notices posted on lamp-posts and at supermarket
boards. This is why I did an experiment and built lokenote.com

I haven't nailed it with lokenote, but it hints there's something there... I'm
aware this is a first, imperfect approximation of the kind of tool that will
enable people to annotate locations. Sometimes you need to build these
experiments to see what works or doesn't.

I don't think the storing/retrieving of geo-data is that hard a problem; you
can roll your own nested squares approach, or reuse what's there now, e.g.
Mongo 2d indexes.
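One common reading of a "nested squares approach" is a quadtree key: repeatedly split the lat/lon rectangle into four quadrants and record which one the point falls in, one digit per level. Points sharing a key prefix lie in the same square, so a prefix match (or a plain range query on an ordinary string index) finds items in the same area. This Python sketch subdivides raw latitude/longitude rather than map tiles, and `quadkey` plus its digit scheme (0 = NW, 1 = NE, 2 = SW, 3 = SE) are assumptions for illustration.

```python
def quadkey(lat: float, lon: float, depth: int) -> str:
    """Quadrant digits 0-3, coarsest level first, for a point in [-90,90] x [-180,180]."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    digits = []
    for _ in range(depth):
        lat_mid = (lat_lo + lat_hi) / 2
        lon_mid = (lon_lo + lon_hi) / 2
        d = 0
        if lon >= lon_mid:   # east half
            d += 1
            lon_lo = lon_mid
        else:                # west half
            lon_hi = lon_mid
        if lat < lat_mid:    # south half
            d += 2
            lat_hi = lat_mid
        else:                # north half
            lat_lo = lat_mid
        digits.append(str(d))
    return "".join(digits)
```

Two nearby points can still fall on opposite sides of a square boundary and share only a short prefix, so a real query would also check the neighbouring squares.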

Rather, I think the problem is making a nice way to integrate the location
dimension into our tools/web apps more seamlessly. E.g. a dating chat app might
just favour partners close to your location without being told what postcodes
to look in. I see this as a 'too many knobs' type problem ( a bit like those
search forms you see that have options for 10 different dimensions to filter
on, which are better replaced by a single text search field, with hidden
smarts. )

So how do we bring location to the people, so it's useful, effective,
unintrusive and read-write? I don't think that question has been answered.

[edit readability]

~~~
buro9
I'm not sure I'd regard it as read only.

Whilst it's true a lot of data appears to be read only (of the kind, what is
where?), even geological features change given enough time (or little time if
we're talking water features in a world of global warming).

Given that what we're discussing is mostly man-made features, such as physical
buildings and the businesses that occupy them... these are open to change.
Buildings get built, demolished, and far more frequently modified.

And then there are problems like "Where can I park?". Most parking in London
is street parking rather than car parks, and parking restrictions change
frequently enough that it isn't static data.

Once you concede time is a dimension that affects spatial information, you
open yourself up to far more interesting possibilities: What is happening
nearby? Where are my friends? There's traffic up the road, what's causing it
and should I detour?

This is massively changing data. It could be expressed as a read-only stream
of 'events' that 'occur' at different places (check-ins, tracking data)... but
the data is refreshed so frequently that storing as a read-only audit trail of
events just disguises the fact that people will perceive the information as
permanently changing.

I think the hardest part of dealing with geospatial data is ensuring that it
is fresh enough to be valuable data. And that is to work from the point of
view that everything changes constantly.

~~~
gord
yes.. I didn't mean to say location information is time-static.. I want there
to be better tools / web UI to supply current location information. So the
person in the street can flag something, and other people can make use of
that.

By 'read-only' I mean it's not easy for people to supply location information
that's current. I put an expiry date tumbler into the Lokenote app, e.g. street
party for a few hours, blocked drain for a few days, building site for a few
weeks.
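A minimal sketch of how notes with an expiry tumbler might be modelled, assuming each note just stores its creation time plus a lifetime and that reads filter out anything already expired. The `Note` class and `live_notes` function are hypothetical; the durations loosely mirror the street party / blocked drain / building site examples.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

@dataclass
class Note:
    text: str
    lat: float
    lon: float
    created: datetime
    lifetime: timedelta  # what the expiry tumbler would set

    @property
    def expires(self) -> datetime:
        return self.created + self.lifetime

def live_notes(notes: List[Note], now: datetime) -> List[Note]:
    """Keep only notes that haven't expired yet at time `now`."""
    return [n for n in notes if n.expires > now]
```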

Because of the overhead at the moment, you'd probably only enter location
information that is going to be there for a long time... but if it were easier
to share, we'd see a more dynamic and relevant picture over smaller time
scales.

Wikipedia, etc., have made an effort to standardise so that wiki articles can
be tagged to a location, but I think there's a whole mass of less formal,
really useful information that's not being captured or used. Ants leave
pheromones to signal other ants; we currently leave paper notes on
lampposts. There should be a better way.

------
nikcub
Does anybody know which database was powering SimpleGeo, the tech/stats etc.
behind it and if they wrote any of it themselves?

~~~
krisw
I seem to remember that they were using a customized version of Cassandra to
store the geodata

------
krisw
This is disappointing. We have a webapp and (small) GPU datacenter that can
extract tens of billions of polygons per day from remote sensing imagery
(gis.incogna.com). I felt quite happy to see SimpleGeo working on a
decentralized approach to the storage component.

------
jwallaceparker
I agree completely. I hope this article has legs.

