

The Neo4j REST API. My notebook. - timf
http://plindenbaum.blogspot.com/2011/07/neo4j-rest-api-my-notebook.html

======
espeed
In addition to the Neo4j REST server, you can also connect to Neo4j through
Rexster (<http://neo4j-user-list.438527.n3.nabble.com/Neo-Rexster-being-ready-to-test-td792156.html>),
an open-source REST server optimized for
recommendations and paging (binary bindings are also in the works).

Rexster is part of the TinkerPop stack (<http://www.tinkerpop.com/>).
TinkerPop is a developer group founded by Marko Rodriguez and Peter Neubauer,
and Peter is also a founder and the COO of Neo4j.

Here's the basic Rexster API
(<https://github.com/tinkerpop/rexster/wiki/Basic-REST-API>). In addition,
Rexster has tight integration with Gremlin
(<https://github.com/tinkerpop/gremlin/wiki>), the graph query language, which
also came out of the TinkerPop group.
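To make the Basic REST API concrete, here's a rough Python sketch of what talking to Rexster might look like. The host, port, and graph name ("neo4jsample") are assumptions on my part; the endpoint path follows the `/graphs/{graph}/vertices/{id}` pattern documented on the wiki linked above.

```python
# Sketch only: assumes a Rexster server on localhost:8182 serving a
# graph named "neo4jsample" -- adjust for your setup.
import json
import urllib.request

REXSTER = "http://localhost:8182"

def vertex_url(graph, vertex_id):
    # GET /graphs/{graph}/vertices/{id} returns the vertex as JSON.
    return "%s/graphs/%s/vertices/%s" % (REXSTER, graph, vertex_id)

def get_vertex(graph, vertex_id):
    # Issue the GET and decode Rexster's JSON response envelope.
    with urllib.request.urlopen(vertex_url(graph, vertex_id)) as resp:
        return json.load(resp)

print(vertex_url("neo4jsample", 1))
# http://localhost:8182/graphs/neo4jsample/vertices/1
```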

Gremlin makes traversals easy and allows you to do stuff like calculate
PageRank in 2 lines. Here's a short screencast that will give you a flavor for
what you can do with Gremlin (<http://www.youtube.com/watch?v=5wpTtEBK4-E>).

There is a Python persistence framework in the works called Bulbs that
connects to Neo4j through Rexster, and there is also a Web toolkit called
Bulbflow (<http://bulbflow.com>) that is based on Bulbs and Flask. Both are
due to be released next week.

If you are interested in the emerging graph landscape, I encourage you to join
the discussions going on in the Gremlin User group
(<https://groups.google.com/forum/#!forum/gremlin-users>) -- it's not just for
Gremlin stuff (anything graph-related goes).

------
snprbob86
A Neo4j REST API seems strange to me. I could see, as it evolves, it being
pretty useful for some use cases. Especially with these maturing graph query
languages.

However, it just seems counterproductive in light of the many lost benefits
of the embedded approach. I quite like the software transactional domain
objects wrapped around nodes and relationships.

I think it would make more sense to develop a distinct distributed graph
database, rather than force Neo4j to support both embedded and server use
cases.

~~~
espeed
Even though it's over REST, you can still have "transactional domain objects
wrapped around nodes and relationships." Neo4j is working on providing RESTful
transactions for the Neo4j REST server
(<http://www.mail-archive.com/user@lists.neo4j.org/msg08253.html>), and you can already do this
with Rexster.

Here's a Bulbs (Python) example of what a domain object looks like wrapped
around a node:

    
    
      from bulbs.model import Node, Relationship
      from bulbs.datatype import Property, Integer, String

      class Person(Node):

          element_type = "person"

          name = Property(String, nullable=False)
          age  = Property(Integer)

      james = Person(name="James", age=34)
      julie = Person(name="Julie", age=28)

      Relationship.create(james, "knows", julie)
    

As I said in another thread, Bulbs connects to Neo4j through the Rexster REST
server, and binary bindings are in the works.

But with Bulbs and Rexster you are not limited to just Neo4j -- you can
connect to any Blueprints-enabled graph database
(<https://github.com/tinkerpop/blueprints/wiki/>). In addition to Neo4j, this
also includes TinkerGraph, Dex, OrientDB, OpenRDF Sail, and others.
InfiniteGraph (<http://www.infinitegraph.com/>) has a Blueprints
implementation under development (see
<http://blog.infinitegraph.com/2011/02/04/infinitegraph-announces-release-1-1-with-new-indexing-options-and-improved-performance/>).

~~~
snprbob86
With an embedded graph database, you can treat your data as if it were all in
memory. If things are not in memory, they will be loaded from disk on demand,
and then kept in a cache, just like virtual memory. This enables you to create
persistent representations of real data structures, such as linked lists,
trees, etc. You can traverse over the data, executing code in a full
programming language, with your real domain model objects.

You simply can't work the same way once network latency is involved. This is
doubly true with automatically generated queries, such as those produced by an
Active Record pattern.

Let's say that you want to traverse a linked list and apply a complex logical
filter dependent on related data. You have to...

(A) Craft a very precise query -- This can be difficult if the query depends
on business rules which you'd need to implement in both your main language and
in your query language.

(B) Query for too much and throw away what you don't want -- In the SQL world,
this is the 'OMG JOIN EVERYTHING' approach. It can work when datasets are
small and queries infrequent, but does not scale well at all. It also requires
you to do the extra upfront work of declaring all the information you might
possibly ever need in a particular situation.

(C) Query for too little and issue additional queries as you realize you need
more info -- This is sort of the default for rich client libraries. Naive ORMs
routinely produce N+1 nightmare perf bugs in many web applications.

With an embedded graph database, you just do (A) in your standard programming
language, using your real domain models, without duplicating any effort, using
algorithms taught to freshman college students in their first C++ class. And
everything just kinda works.
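For the sake of illustration, here's what (A) looks like in-process. Plain Python objects stand in for persisted nodes (no real Neo4j API is assumed); the point is that the filter is ordinary code, and with an embedded store the traversal looks identical while nodes fault in from disk behind the scenes.

```python
# Illustrative only: plain Python objects stand in for persisted nodes.
class Node:
    def __init__(self, name, age, next=None):
        self.name = name
        self.age = age
        self.next = next

def traverse(head, predicate):
    # Walk the linked list, applying an arbitrary in-language filter --
    # no query language, no business rules duplicated in two places.
    node = head
    while node is not None:
        if predicate(node):
            yield node
        node = node.next

head = Node("Julie", 28, Node("James", 34, Node("Ann", 19)))
over_30 = [n.name for n in traverse(head, lambda n: n.age > 30)]
print(over_30)  # ['James']
```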

~~~
espeed
If you don't require a server environment, having direct access to data that
resides in memory or on your local machine will obviously be faster. But,
whether this is an option depends on your use case -- if Neo4j is backing your
load-balanced Web servers, this isn't really an option, but you could create a
local mirror of the database for running queries (similar to OLTP vs OLAP
systems).

Gremlin is really just Groovy with domain-specific constructs. You can write
multi-line Gremlin/Groovy scripts that apply complex filters and send them to
the external server. You can also use any JVM language to create the
equivalent of stored procedures on the server to pull data from multiple
sources.
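As a sketch of that pattern from Python: ship the whole multi-line Gremlin (Groovy) script to the server so the filtering runs next to the data, rather than traversing over the wire hop by hop. The endpoint path and `script` parameter name here are assumptions modeled on Rexster's Gremlin extension; adjust for your server.

```python
# Sketch: send a multi-line Gremlin script to a remote server in one
# request. Endpoint path and parameter name are assumed, not official.
import urllib.parse

SCRIPT = """
// runs server-side, close to the data
g.v(1).outE('knows').inV.filter{ it.age > 30 }.name
""".strip()

def gremlin_url(base, graph, script):
    # URL-encode the script so newlines and braces survive the trip.
    query = urllib.parse.urlencode({"script": script})
    return "%s/graphs/%s/tp/gremlin?%s" % (base, graph, query)

url = gremlin_url("http://localhost:8182", "neo4jsample", SCRIPT)
print(url.split("?")[0])
# http://localhost:8182/graphs/neo4jsample/tp/gremlin
```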

Another approach would be to query the external server and use TinkerGraph to
keep a local in-memory cache, lazy-loading it as needed.
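A toy version of that lazy-loading idea, to show the shape of it: a local cache that only fetches a vertex from the remote store on a miss. Here `fetch_remote` is a hypothetical stand-in for a real REST call (or a TinkerGraph load); nothing below is an actual TinkerPop API.

```python
# Toy lazy-loading cache: fetch from the remote store only on a miss.
class LazyGraphCache:
    def __init__(self, fetch_remote):
        self._fetch = fetch_remote   # stand-in for a real REST call
        self._cache = {}

    def vertex(self, vertex_id):
        # Hit the remote store only the first time a vertex is asked for.
        if vertex_id not in self._cache:
            self._cache[vertex_id] = self._fetch(vertex_id)
        return self._cache[vertex_id]

calls = []
def fake_fetch(vid):
    calls.append(vid)
    return {"id": vid, "name": "vertex-%d" % vid}

cache = LazyGraphCache(fake_fetch)
cache.vertex(1)
cache.vertex(1)          # second lookup served locally; no new fetch
print(len(calls))        # 1
```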

