

Ask YC: Expressive languages for spatial nodes - robmnl

What do you consider flexible languages for spatial maths? I have a set of
nodes in a space; they are connected to each other in a graph and have
different bond strengths. I need to navigate through the nodes quickly: for
instance, grab a node, then grab the nodes with a certain bond strength that
are adjacent to it within one or two dimensions/nodes.

Each node has attribute/value pairs, and I want to be able to query nodes by
those attributes and values.

AWS SimpleDB is too slow.

I am an amateur Python programmer. It is the most recent language I've learnt,
and this is my third week using it.

Before that I came from Ruby, which I abandoned because of its speed.

I'm pretty good at jQuery/JavaScript, which I spend time in, and (I almost
don't want to mention this) PHP. My SQL is amateurish, of course (I learned
joins only 4 months ago), but I really can't make a case for SQL being
suitable for my needs. I have studied "SQL for Smarties" by Joe Celko.

I have Ruby, Erlang, Smalltalk, Lisp, R(?), or Io(?) in mind. Of this group I
only know Ruby; I don't have much of a clue about the others. But I'd learn in
a heartbeat if you can make a good case for your language of choice for my
problem set.
======
bayareaguy
MonetDB could be useful here.

MonetDB[1] is good for applications where you load the data once and then run
lots of joins over binary relations, assuming your application works well with
an RM/T[2] style schema.

In RM/T you have one entity relation that holds your entity ids and one
property relation for each attribute. All your queries become many-way joins
as they must traverse all the property relations. This is something you would
never do with an ordinary RDBMS but is exactly what MonetDB's
binary-association table scheme is designed to handle[3].
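
To make the shape of such a schema concrete, here is a minimal sketch in
Python. It uses the standard library's in-memory SQLite as a stand-in for
MonetDB (so it says nothing about MonetDB's performance), and the table names
`entity`, `prop_color` and `prop_weight` are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for MonetDB here; only the schema shape matters.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY);
    -- One binary relation (entity id, value) per attribute.
    CREATE TABLE prop_color  (id INTEGER REFERENCES entity(id), value TEXT);
    CREATE TABLE prop_weight (id INTEGER REFERENCES entity(id), value REAL);
""")
conn.executemany("INSERT INTO entity VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO prop_color VALUES (?, ?)",
                 [(1, "red"), (2, "blue"), (3, "red")])
# Node 3 has no weight: a missing attribute is simply a missing row.
conn.executemany("INSERT INTO prop_weight VALUES (?, ?)", [(1, 0.5), (2, 2.0)])


def red_nodes_with_weight(conn):
    """Join the entity relation with two property relations."""
    rows = conn.execute("""
        SELECT e.id, w.value
        FROM entity e
        JOIN prop_color  c ON c.id = e.id
        JOIN prop_weight w ON w.id = e.id
        WHERE c.value = 'red'
        ORDER BY e.id
    """)
    return rows.fetchall()


result = red_nodes_with_weight(conn)
```

Every extra attribute in a query adds one more join, which is the many-way
join pattern described above.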

With MonetDB, you use SQL to define your schema in the familiar way. At the
physical level MonetDB stores your data in an RM/T-style representation.

You may also want to see if Proximity[4][5]'s QGraph is suitable for your
application. It uses MonetDB in its implementation. The tutorial[6] includes
an example of how to use it with Jython.

[1] <http://monetdb.cwi.nl>

[2] <http://en.wikipedia.org/wiki/Relational_Model/Tasmania>

[3] <http://monetdb.cwi.nl/projects/monetdb/Assets/monetdb_lecture.pdf>

[4] <http://kdl.cs.umass.edu/proximity>

[5] <http://kdl.cs.umass.edu/proximity/about.html>

[6] <http://kdl.cs.umass.edu/software/documentation/tutorial/ch06s08.html>

~~~
bayareaguy
For spatial processing, you may find this MonetDB application a lot more
interesting:

Sloan Digital Sky Survey / SkyServer provides public access to SDSS for
astronomers, students, and wide public. A project to make a map of a large
part of the Universe: 230 million object images, 1 million spectra, 4TB
catalog data, 9TB images

<http://www.win.tue.nl/~tcalders/presentations/ivanova.ppt>

------
mooneater
If this is perf. critical and your data easily fits in RAM, then your problem
does not sound exotic enough to resort to obscure or special-purpose
languages. You should be able to whip up efficient data structures for this in
python without much trouble. In fact depending on your data set size and query
specifics, it may be acceptably fast in any of the listed languages (including
Ruby!) assuming you have the right data structures. You really won't know until
you have tried it. I would suggest you start in whatever language you know
best.

(If it doesn't fit in RAM, I would consider it a much harder problem and I can
imagine resorting to an at least partially SQL-based solution.)

Btw, for a very similar problem, I used C++ w/STL for pure performance. It
worked well. But I'm migrating it to Python to shorten dev time. I figure that
in the time it would take to implement a single approach in C++ w/STL, I can
try a variety of approaches and algo optimizations in Python. And for algos on
complicated data sets, you want to really understand your data set. That means
playing with it a lot, which is easier in Python.

The data structures you use often have a much bigger impact on performance
than the language.
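
As a rough illustration of how little code the in-RAM version might need, here
is a sketch of a weighted adjacency map with a "within two hops" query. The
`Graph` class, its method names and the sample bond strengths are all made up
for the example.

```python
class Graph:
    """Undirected graph stored as node -> {neighbor: bond strength}."""

    def __init__(self):
        self.adj = {}

    def add_edge(self, a, b, strength):
        self.adj.setdefault(a, {})[b] = strength
        self.adj.setdefault(b, {})[a] = strength

    def within_two_hops(self, start, min_strength):
        """Nodes reachable from start in 1 or 2 hops, following only
        bonds whose strength is at least min_strength."""
        found = set()
        for n1, s1 in self.adj.get(start, {}).items():
            if s1 < min_strength:
                continue
            found.add(n1)
            for n2, s2 in self.adj[n1].items():
                if s2 >= min_strength and n2 != start:
                    found.add(n2)
        return found


g = Graph()
g.add_edge("a", "b", 0.9)
g.add_edge("b", "c", 0.8)
g.add_edge("a", "d", 0.2)   # too weak for the query below
g.add_edge("c", "f", 0.9)   # three hops from "a"
strong_nearby = g.within_two_hops("a", 0.5)
```

Each lookup is a couple of dictionary accesses per neighbor, so the cost
depends on how many bonds a node has, not on the size of the whole graph.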

~~~
mooneater
Tip for doing this kind of thing, that works in any language:

In some cases, using more than one type of data structure at the same time
will allow you to get the best of both worlds in terms of performance. For
example, a spatial index _plus_ a hash table as an index, over the same data,
would allow efficient lookups on spatial predicates as well as attributes,
over the same set of nodes.
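
A minimal sketch of that combination, assuming 2D coordinates and hashable
attribute values. A uniform grid plays the role of the spatial index here (an
R-tree or k-d tree would be the heavier-duty choice), and `NodeStore` and its
methods are names invented for the example.

```python
from collections import defaultdict


class NodeStore:
    """Same nodes indexed twice: by grid cell and by (attribute, value)."""

    def __init__(self, cell_size=1.0):
        self.cell_size = cell_size
        self.nodes = {}                   # node id -> (x, y, attrs)
        self.grid = defaultdict(set)      # (cell x, cell y) -> node ids
        self.by_attr = defaultdict(set)   # (attribute, value) -> node ids

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def add(self, node_id, x, y, **attrs):
        self.nodes[node_id] = (x, y, attrs)
        self.grid[self._cell(x, y)].add(node_id)
        for key, value in attrs.items():
            self.by_attr[(key, value)].add(node_id)

    def in_box(self, x0, y0, x1, y1):
        """Spatial lookup: only the grid cells overlapping the box are scanned."""
        cx0, cy0 = self._cell(x0, y0)
        cx1, cy1 = self._cell(x1, y1)
        hits = set()
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for node_id in self.grid.get((cx, cy), ()):
                    x, y, _ = self.nodes[node_id]
                    if x0 <= x <= x1 and y0 <= y <= y1:
                        hits.add(node_id)
        return hits

    def with_attr(self, key, value):
        """Attribute lookup: a single hash-table access."""
        return set(self.by_attr.get((key, value), ()))

    def query(self, box, key, value):
        """Combine both indexes by intersecting their results."""
        return self.in_box(*box) & self.with_attr(key, value)


store = NodeStore(cell_size=1.0)
store.add(1, 0.5, 0.5, kind="a")
store.add(2, 0.7, 1.2, kind="b")
store.add(3, 5.0, 5.0, kind="a")
nearby_a = store.query((0, 0, 2, 2), "kind", "a")
```

The price is that every insert (and any delete or update you add) has to keep
both indexes in sync.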

A simple insight maybe, but not one I discovered without a lot of thought.
Typically in university they teach you to use a single data structure type at
a time, but really that's like teaching kids the alphabet, without the concept
of combining the letters into meaningful words.

~~~
robmnl
Brilliant, thanks. Yes, RAM is a good option for me.

I figure 2 GB will last me 4 months. Then I can go up to 4, 8, 16, ...

Pretty good.

~~~
robmnl
mooneater, do you have any suggestions on how to actually store and persist
such a data structure in RAM?

Would I use something like a wsgi webserver, that runs as a daemon, and
occasionally dumps the data structure to disk?

Or is something like memcached better?

My preferred way of storing into RAM would be of course only storing changes,
instead of storing the whole tree into RAM on every write.

~~~
mooneater
Depends how often you are getting new data. If it's not too often, you could
write directly to SQL. Otherwise writing in batches like you suggest could
work too.
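
One way the batched variant could look, as a sketch only: keep the live data
in a dict, append changes to a JSON-lines file in batches, and replay that
file on startup. `JournaledStore`, the file name and the batch size are all
hypothetical, and a flat file stands in for whatever storage you end up using.

```python
import json
import os
import tempfile


class JournaledStore:
    """Dict of node id -> attributes, persisted as an append-only change log."""

    def __init__(self, path, batch_size=100):
        self.path = path
        self.batch_size = batch_size
        self.data = {}
        self.pending = []          # changes not yet written to disk
        if os.path.exists(path):
            # Replay the log: later entries overwrite earlier ones.
            with open(path) as f:
                for line in f:
                    node_id, attrs = json.loads(line)
                    self.data[node_id] = attrs

    def set(self, node_id, attrs):
        self.data[node_id] = attrs
        self.pending.append((node_id, attrs))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Append all pending changes to the log in one write pass."""
        if not self.pending:
            return
        with open(self.path, "a") as f:
            for change in self.pending:
                f.write(json.dumps(change) + "\n")
        self.pending.clear()


path = os.path.join(tempfile.mkdtemp(), "nodes.jsonl")
store = JournaledStore(path, batch_size=2)
store.set("n1", {"strength": 0.9})
store.set("n2", {"strength": 0.4})   # second change triggers a flush
store.set("n1", {"strength": 0.7})   # still pending until flush()
store.flush()
reloaded = JournaledStore(path)      # simulates a restart
```

Anything still in `pending` is lost if the process dies, so the batch size is
a trade-off between write overhead and how much data you can afford to lose.
The log also grows without bound unless you occasionally rewrite it from the
current state.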

After looking at your post again, and depending on data volume, you might be
ok with a purely SQL solution too (i.e., finding nodes 1 or 2 steps away on a
graph is easy in SQL if your adjacency data is all in the db).
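
For instance, assuming an `edge(src, dst, strength)` table with each bond
stored in both directions (the table, column names and sample data are
invented here), the one-or-two-steps query is a self-join. The sketch uses
Python's built-in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src TEXT, dst TEXT, strength REAL)")
bonds = [("a", "b", 0.9), ("b", "c", 0.8), ("a", "d", 0.2), ("c", "f", 0.9)]
# Store every undirected bond as two directed rows.
conn.executemany("INSERT INTO edge VALUES (?, ?, ?)",
                 bonds + [(dst, src, s) for src, dst, s in bonds])


def neighbors_within_two(conn, start, min_strength):
    """Nodes 1 or 2 steps from start over bonds of at least min_strength."""
    rows = conn.execute("""
        SELECT dst FROM edge
        WHERE src = :start AND strength >= :s
        UNION
        SELECT e2.dst
        FROM edge e1 JOIN edge e2 ON e2.src = e1.dst
        WHERE e1.src = :start AND e1.strength >= :s
          AND e2.strength >= :s AND e2.dst <> :start
    """, {"start": start, "s": min_strength})
    return sorted(row[0] for row in rows)


nearby = neighbors_within_two(conn, "a", 0.5)
```

With an index on `edge(src)` each step is an index lookup. Going beyond a
fixed number of steps is where plain joins start to get awkward.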

~~~
robmnl
Yes, that's true. But I still have a requirement for a schemaless db, so SQL
doesn't cut it.

~~~
bayareaguy
By schema-less, do you mean that it's possible that every new record in the
database could have a few attribute-values that no other record has, or that
you simply don't know what your attributes are at this time but at some point
you will know them all?

------
giardini
Prolog. For info see "P-99: Ninety-Nine Prolog Problems" at
<https://prof.ti.bfh.ch/hew1/informatik3/prolog/p-99/>

Scroll down to the section on Graphs (after problem 73).

~~~
robmnl
Appreciated, definitely clean code to express nodes and relationships.

Is there a modern implementation that's well supported by a community?

~~~
giardini
comp.lang.prolog is the Prolog newsgroup. The comp.lang.prolog FAQ contains
information about the various implementations:
<http://www.logic.at/prolog/faq/>

SICStus Prolog ( <http://www.sics.se/isl/sicstuswww/site/index.html> ) is a
widely used commercial implementation. There are others.

FOSS alternatives include:

- Ciao-Prolog, which is notable for its various extensions (breadth-first
search, fuzzy logic, WWW programming interface, concurrency, persistence, and
more) <http://www.clip.dia.fi.upm.es/Software/Ciao/>

- SWI-Prolog <http://www.swi-prolog.org/>

------
bayareaguy
AllegroGraph[1] sounds like it could be suitable as well.

[1] <http://agraph.franz.com/allegrograph>

------
robmnl
Your advice is deeply appreciated, thank you everyone who has posted so far.

