

Show HN: GraPHP, a PHP graph DB web framework - mikeland86
https://github.com/mikeland86/graphp

======
pushtheenvelope
I’m really excited this was finally shared. A little backstory. This framework
is the manifestation of ideas that Mike developed at Mixtent, a startup that
produced three products in 2 years and eventually got acquired by Facebook.

The biggest advantage this framework has over more traditional ones like RoR
or Django is the ability to model product ideas as graph abstractions
directly in code. This lets product engineers rapidly prototype ideas (no
need to interact with the DB) and jump into features built by other engineers
(the node-edge API is standardized).

While the first product Mixtent built used more traditional django-style
models, it resulted in features that became hard to manage over time. Each
model had its own DB table and making changes was painful. The next two were
built using a similar graph framework on top of CodeIgniter, and the benefits
to prototyping speed and ease-of-understanding were visibly felt by all
engineers (including myself).

~~~
mikeland86
:)

------
emehrkay
Wait, so this is a graph-like layer on top of an SQL database? I don't quite
understand this claim:

"DB API is designed for fast performance. No implicit joins or other magic,
but expressive enough for nice readable code."

When you have a database with node, edge, and node_data (EAV) tables.

What am I missing? How would I get a node and its properties including edges,
other nodes, etc. without joining other tables or flat-out magic?

~~~
mikeland86
What I meant by that is the joins happen explicitly in code (as opposed to
implicitly in queries). I got burned using another framework where my lack of
knowledge of how the ORM managed DB queries led to really inefficient stuff:
basically really bad joins that I could have avoided had I known what the
framework would do.

When I say no magic, I mean that you can understand what data is being loaded
at every step of the way, so the mistake I mentioned previously is harder to
make.

Also, I wanted to avoid query joins to make sharding easy (if it's ever
needed), but that adds an extra round trip, so the developer can still write
their own joins if necessary (giving up a bit of flexibility in the process).
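
The "explicit joins in code" pattern can be sketched roughly like this, using
Python and SQLite as stand-ins for graphp's PHP/MySQL stack (the table and
column names here are guesses based on this thread, not graphp's actual
schema). Two separate queries replace one SQL JOIN, so every round trip is
visible to the developer:

```python
import sqlite3

# Minimal node/edge schema in the spirit of the tables discussed above
# (names are assumptions, not graphp's real schema).
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE node (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE edge (from_node_id INTEGER, to_node_id INTEGER,
                   type TEXT, updated INTEGER);
""")
db.execute("INSERT INTO node VALUES (1, 'user'), (2, 'user'), (3, 'post')")
db.executemany("INSERT INTO edge VALUES (?, ?, ?, ?)",
               [(1, 2, 'friend', 100), (1, 3, 'authored', 200)])

# Step 1: load the edge rows for node 1 -- one explicit query.
to_ids = [row[0] for row in db.execute(
    "SELECT to_node_id FROM edge WHERE from_node_id = ?", (1,))]

# Step 2: load the connected nodes by id -- a second explicit query,
# instead of an implicit JOIN hidden inside an ORM.
placeholders = ",".join("?" * len(to_ids))
nodes = db.execute(
    "SELECT id, type FROM node WHERE id IN (%s)" % placeholders,
    to_ids).fetchall()
```

The extra round trip is the cost mentioned above; the benefit is that each
query is simple, index-friendly, and easy to route to a shard.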

~~~
emehrkay
Thanks for replying. Have you tried this at scale? What is the performance
like when you have thousands of records? Have you looked into traversals? I
ask because I tried to build a product using MySQL that should have used a
proper graph DB and ran into a lot of issues. Now that I'm a bit wiser, I
think SQL may be able to do similar things, but I haven't tried.

edit: the model layer is the most impressive part about this. You should
consider making it a stand-alone package.

~~~
mikeland86
In general it scaled pretty well if you avoided loading tens of thousands of
edges in a single call. A similar system was used on an app that tried to
estimate connection strength between people using sent emails as a signal. At
its peak the node table had tens of millions of rows, with some of the nodes
(users) having thousands of edges each. The main pitfalls are:

* Loading too many edges (10K+) and their associated nodes will be slow.

* Traversing nodes in a meaningful way can be difficult.

To solve these, the schema has the following indices:

On edge table: (`from_node_id`,`type`,`updated`)

On node_data table: (`type`,`data`(128))

Since edges rarely change, the first index allows you to paginate over edges
by using updated as the order. As long as you request a reasonable number of
edges, things should work OK. The second index is needed to get a node given
some data, but a secondary use is sorting. By precomputing some score and
saving it in node_data you can traverse nodes in that order (this is not
currently built but is simple to do in SQL).
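
Both uses of those indices can be sketched like this (Python/SQLite standing
in for MySQL, with invented data): keyset pagination resumes from the last
`updated` value seen, and a zero-padded score string in node_data doubles as
a sort key.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE edge (from_node_id INTEGER, type TEXT,
                   to_node_id INTEGER, updated INTEGER);
CREATE INDEX edge_idx ON edge (from_node_id, type, updated);
CREATE TABLE node_data (node_id INTEGER, type TEXT, data TEXT);
CREATE INDEX node_data_idx ON node_data (type, data);
""")
# Six edges out of node 1, with increasing `updated` timestamps.
db.executemany("INSERT INTO edge VALUES (1, 'friend', ?, ?)",
               [(n, n * 10) for n in range(2, 8)])

# Keyset pagination over edges: the WHERE clause matches the
# (from_node_id, type, updated) index exactly, so each page is cheap.
page1 = db.execute(
    "SELECT to_node_id, updated FROM edge "
    "WHERE from_node_id = ? AND type = ? AND updated > ? "
    "ORDER BY updated LIMIT 3", (1, 'friend', 0)).fetchall()
last_updated = page1[-1][1]  # resume the next page after this value
page2 = db.execute(
    "SELECT to_node_id, updated FROM edge "
    "WHERE from_node_id = ? AND type = ? AND updated > ? "
    "ORDER BY updated LIMIT 3", (1, 'friend', last_updated)).fetchall()

# Traversal by precomputed score: the score is stored zero-padded so
# that ordering the TEXT `data` column matches numeric order.
db.executemany("INSERT INTO node_data VALUES (?, 'score', ?)",
               [(2, '0900'), (3, '0100'), (4, '0500')])
ranked = db.execute(
    "SELECT node_id FROM node_data WHERE type = 'score' "
    "ORDER BY data DESC").fetchall()
```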

All this being said, the schema is pretty index heavy so if MySQL is forced to
kick some of those out of memory it may lead to a bad time.

Thanks for the kind words on the model, I never thought about it being
separate, but it makes 100% sense.

------
apinstein
I don't understand; what does a graph DB library have to do with an MVC
framework? Why not just write a nice graph schema / model library?

~~~
amirouche
Based on the example code: nothing. In theory you could map URLs to the
graph, but here it looks like an HTTP framework that uses MVC, where the
Model maps to a graph database (backed by MySQL).

------
vog
Recently I had some related ideas, although from a completely different
background. Here is what I would have done differently:

1) Implement other storages. Although databases are a natural part of web
applications, direct storage in files and/or directories may be justified.
Also, for most applications the full dataset fits easily in RAM on modern
systems.

2) Keep full history, or at least provide for the possibility. Adding history
features to classic database models becomes cumbersome quickly, but for a
simple schema it could be provided directly by the framework. This makes for
a great audit log when something goes wrong. It should probably be possible
to disable if unwanted and/or if storage size is an issue.

Regarding the different background: while most people reach for graph
databases because they don't want to enforce a certain schema, my desire is
the exact opposite. I want more constraints and as much data integrity as
possible, more than can easily be achieved even in PostgreSQL with
user-defined functions, such as constraints across foreign keys. So a
separate checker is needed, and I believe a graph structure with plain links
provides a good, simple base on which to define a constraints framework.
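
As a toy illustration of that checker idea (the node types, edge type, and
constraint here are entirely invented), a cross-link constraint over plain
node/edge data might look like:

```python
# Invented example: every 'employee' node must have exactly one
# 'works_at' edge pointing at a 'company' node.
nodes = {1: 'company', 2: 'employee', 3: 'employee'}
edges = [(2, 'works_at', 1)]  # (from_node, edge_type, to_node)

def check_works_at(nodes, edges):
    """Return the ids of employee nodes violating the constraint."""
    errors = []
    for node_id, node_type in nodes.items():
        if node_type != 'employee':
            continue
        targets = [to for (frm, etype, to) in edges
                   if frm == node_id and etype == 'works_at']
        # Violation: not exactly one edge, or it points at a non-company.
        if len(targets) != 1 or nodes.get(targets[0]) != 'company':
            errors.append(node_id)
    return errors

violations = check_works_at(nodes, edges)
```

Because the data is just nodes and typed links, the checker stays generic:
each constraint is a small function over the same two structures.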

~~~
mikeland86
1) I like the RAM idea and in general more storage adapters. I hadn't really
thought of local storage, but it makes sense for super simple prototyping.

2) I want to add DB profiling at some point (right now Libphutil does this for
me at a query level, but I want to do it at a graph abstraction level).
History could presumably be similar to a permanent profiler.

I think constraints vs flexibility is always a tradeoff. The main benefit of
this model is very rapid prototyping where design decisions can be changed or
reversed with minimum effort.

~~~
vog
_> I think constraints vs flexibility is always a tradeoff._

Well, that depends on the application. In the applications I have in mind, I
have to make tradeoffs in the opposite direction: I know that some additional
constraints would make total sense. But is it worth implementing them,
considering how hard they are to express in the database?

So I have to trade off missing constraints against the ease/feasibility of
implementation. Or I can check some constraints only in the application
layer, but sometimes that's even more cumbersome than triggers etc. in the
database.

------
jakejake
I love seeing new, interesting work done with PHP. I'm particularly interested
in ORMs so I'll definitely give this a try. Thanks for sharing.

------
jwatte
The Readme talks about a bank account. How would a transactional update look?
If I want to transfer $20 from user A to user B, the following four things
need to happen transactionally:

- make sure user A has the necessary funds (reserve)

- put money into user B's account (debit)

- take money from user A's account (credit)

- commit

At the same time, nobody else must make an operation where they see partial
results, and we can't let two operations reserve the same funds in parallel.

I don't see how this use case is supported, but that may just be because I
don't know where to look?

~~~
mikeland86
Yeah, bank account may have been a bad example, but it's still solvable:

You can use MySQL's SELECT FOR UPDATE (not currently implemented in graphp) to
do this transaction safely.

Alternatively, within the framework you can store the reserved amount as a
timestamped node and connect it to the user. Then you query for reserve nodes
and process them in order.
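
The SELECT FOR UPDATE approach would look roughly like this sketch (Python
with SQLite; SQLite has no SELECT ... FOR UPDATE, so BEGIN IMMEDIATE stands
in for the row lock, and the account schema is invented for illustration):

```python
import sqlite3

# isolation_level=None: manage transactions by hand with BEGIN/COMMIT.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("CREATE TABLE account (user TEXT PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO account VALUES ('A', 50), ('B', 10)")

def transfer(db, src, dst, amount):
    """Move `amount` from src to dst atomically, or leave both untouched."""
    try:
        # BEGIN IMMEDIATE takes the write lock up front; in MySQL the
        # analogous step is SELECT ... FOR UPDATE on the source row.
        db.execute("BEGIN IMMEDIATE")
        (balance,) = db.execute(
            "SELECT balance FROM account WHERE user = ?", (src,)).fetchone()
        if balance < amount:   # reserve: verify funds while holding the lock
            raise ValueError("insufficient funds")
        db.execute("UPDATE account SET balance = balance + ? WHERE user = ?",
                   (amount, dst))   # put money into the receiving account
        db.execute("UPDATE account SET balance = balance - ? WHERE user = ?",
                   (amount, src))   # take money from the sending account
        db.execute("COMMIT")        # all steps become visible at once
        return True
    except Exception:
        db.execute("ROLLBACK")      # nobody ever sees a partial transfer
        return False

ok = transfer(db, 'A', 'B', 20)    # succeeds
bad = transfer(db, 'A', 'B', 100)  # insufficient funds: rolled back
```

The lock taken before the balance check is what prevents two operations from
reserving the same funds in parallel, which is the concern raised above.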

------
maxdemarzi
Would be really nice to see this framework used with an actual graph database
in addition to MySQL.

~~~
mikeland86
I agree. I started with MySQL because it is what I know best, but it should be
fairly simple to create new DB adapters.

------
marknadal
Wow, I am glad that graph stuff is really starting to take off. For all the
JavaScript people out there, I wrote
[http://github.com/amark/gun](http://github.com/amark/gun).

Great work, I really hope you help usher in the era of graphs!

