Glad to see this post trending on HN. I'm around if you want to ask any questions, and will share whatever I can about my experience with graphs and the decisions at Google.
Since leaving Google, I've built Dgraph. It is open source, designed for horizontal scalability and ACID transactions, and provides incredible performance for both reads and writes. And of course, it solves the join-depth problem as explained in the post. So, do check it out!
And do remember to give us a star on GitHub. Your love helps other open source users give Dgraph a real shot in companies with leadership struggles similar to those I experienced at Google.
I suspect your project would have been really powerful had you gotten the support you needed, but without a (6) or a (7) next to your name it's really hard to convince people of that. I know a number of PAs that would benefit now from structuring their problems in a graph store with arbitrary-depth joins and transactions. I work on one of those and cringe at some of the solutions we've made.
We've forgotten what it's like to need novel solutions to performance problems. Instead, we have services we just throw more RAM at. Ganpati takes over 128GB of RAM now (it might even be more); I suspect a solution like Dgraph could solve its problems much more efficiently, not to mention more scalably.
Good on you for taking your ideas to market. I'm excited to see how your solution evolves.
As Google grows, there are many T7s around. So you really need a T8 or above to actually get something out the door when it comes to impacting web search.
P.S. Dgraph is Apache-licensed and written in Go with Protobuf/gRPC. Google might have an appetite for that; worth pushing internally. https://github.com/dgraph-io/dgraph
I think that's rather uncharitable. I work in T.I. and while I can't namedrop a number of internal projects I can assure you there's a lot of deep innovation happening in our corner. Are they poised to get Spanner-esque adoption across the whole company? Unlikely. But it's unfair to knock the infrastructure and assume we're way past some innovation renaissance.
I know that it does not involve storage, but TensorFlow is a purely infrastructure project. And it's certainly newer than the Freebase story.
I.e., if all S/P/O triples were stored in sorted order such that a given range of them lived on one machine, you would no longer have to do broadcasts and could instead issue queries to just the machines holding that range of the keyspace. Since most "scalable" systems use hash partitioning, they have to broadcast, whereas Dgraph uses something more like range partitioning.
Or is this too simplistic of an explanation?
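To make my mental model concrete, here's a toy sketch (nothing to do with Dgraph's internals; the shard counts and split points are made up). With hash partitioning, keys sharing a prefix scatter across shards, so a query over one predicate must broadcast; with range partitioning, they land on one shard:

```go
package main

import "fmt"

const numShards = 4

// hashShard spreads keys evenly. Good for point lookups, but keys
// sharing a prefix (e.g. one predicate) can land on every shard,
// forcing a broadcast for any query over that predicate.
func hashShard(key string) int {
	h := 0
	for _, c := range key {
		h = h*31 + int(c)
	}
	if h < 0 {
		h = -h
	}
	return h % numShards
}

// splitPoints defines contiguous key ranges per shard (illustrative).
var splitPoints = []string{"g", "n", "t"}

// rangeShard routes a key to the shard owning its slice of the
// keyspace, so all keys for one predicate hit a single shard.
func rangeShard(key string) int {
	for i, split := range splitPoints {
		if key < split {
			return i
		}
	}
	return len(splitPoints)
}

func main() {
	// All "eats:*" keys fall below split point "g", so they
	// range-shard to shard 0; their hash shards vary.
	for _, k := range []string{"eats:alice", "eats:bob", "eats:carol"} {
		fmt.Printf("%q: hash->shard %d, range->shard %d\n", k, hashShard(k), rangeShard(k))
	}
}
```

The takeaway: a range-partitioned store can answer "everything under this predicate" by talking to one shard instead of all of them.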
Dgraph does these:
1. It shards by predicate so a full join can be executed by one server.
2. Storing each SPO as a record (or row) would still be slow because, in graphs, a single SP can result in thousands or millions of objects; reading them would involve a lot of iteration and multiple disk seeks. So, Dgraph stores them in a posting-list format, using SP -> list of O as a single key-value pair.
3. That can then result in values which are really large. So, Dgraph converts all nodes into 64-bit integers.
4. Intersections are really important in graphs. So, posting lists store all ints in sorted order, to allow quick intersections between multiple posting lists.
5. Add indices, then replication, HA, transactions, MVCC, correctness, linearizable reads, and ...
voila! 3 years later, you have Dgraph!
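To illustrate point 4, here's a minimal sketch of intersecting two sorted posting lists (the uids are made up and this isn't Dgraph's actual code): keeping the uids sorted lets you intersect in one linear merge pass, with no hashing or random seeks.

```go
package main

import "fmt"

// intersectSorted merges two ascending posting lists of uids
// in O(m+n), advancing whichever cursor points at the smaller uid.
func intersectSorted(a, b []uint64) []uint64 {
	var out []uint64
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default: // present in both lists
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

func main() {
	// Hypothetical posting lists: SP -> sorted list of O (as uids).
	livesInSF := []uint64{11, 23, 42, 57, 90}  // <lives_in, SF>
	eatsSushi := []uint64{23, 42, 77, 90, 101} // <eats, sushi>
	fmt.Println(intersectSorted(livesInSF, eatsSushi)) // → [23 42 90]
}
```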
P.S. I intend to find some time and write a paper about the unique design of Dgraph. There's a lot of original research involved in building it.
Anyway, good luck with Dgraph! It looks very useful.
Do you know what sort of graph solution Blaze uses to handle two orders of magnitude more code, i.e. manage hundreds of millions of lines of code? I always assumed it was a distributed graph database, but your article seems to indicate something else.
I think if Blaze is using some graph solution, it must be custom built.
Some time before I started working at my current employer, I had a chance to talk to an architect on an EdTech knowledge graph project who advised giving up on these types of efforts due to the complexity of the technology, the politics of the organization, the skills gaps in the engineering corps, and the expectations of customers.
So when I ignored that advice and started this project, I did have a chance to look at Dgraph, but ultimately avoided it to help relax some of those political and skills constraints. We ended up building a graph serving system out of Cayley processors. But as the skills gap closes with experience on Cayley, and the political forces wane as revenues increase, using Dgraph may become viable -- it would help us solve more problems and simplify certain operational headaches.
Thanks for writing this experience up. It's encouraging to learn that even giants like Google struggle due to internal factors in this space.
It surprises me how many large companies do not have a ‘knowledge graph strategy’ while everyone is on board with machine learning (which is what I currently do, managing a machine learning team). I would argue that a high capacity, low query latency Knowledge Graph should be core infrastructure for most large companies and knowledge graphs and machine learning are complementary.
Once Dgraph was built, we engaged Kyle and had Jepsen testing done on it. In fact, Dgraph is the first graph database to be Jepsen tested. http://jepsen.io/analyses/dgraph-1-0-2 (all pending issues are now resolved).
Dgraph is now being used as a primary DB in many companies in production (including Fortune 500), which to me is an incredible milestone.
The design is also well suited to just building apps (it's scalable, with a flexible schema), given the early choices to use a modified GraphQL as the query language and JSON as the response format. So, we see even Fortune 500 companies using Dgraph to build mobile apps.
Most open source users run the simplest 2-node cluster, but we routinely see enterprise customers run a 6-node cluster (High Availability) or a 12-node cluster (HA + sharding). Given synchronous replication, query throughput can scale out linearly as you add more replicas/machines (each replica can reply without the issues of typical eventual-consistency models; Dgraph provides linearizable reads).
On the write side, Dgraph can sustain XXX,XXX records/sec in the live path (and millions in the offline path). See my recent commit: https://github.com/dgraph-io/dgraph/commit/b7189935e6ec93aec...
Some recent public usage mentions of Dgraph:
Throw in probability and vagueness (history examples: this happened sometime in the 1950s / we're only 50% certain that Henry VII was the father of this child) and it becomes a whole lot more complicated; yet what can be inferred increases in usefulness.
My inspiration is http://commons.pelagios.org/ and the digital humanities field.
Feel free to use it! :-)
I looked at a number of other graph dbs, and landed on dgraph for a few reasons.
a) I like the query language. It felt natural to me. That's just my opinion though. Other languages felt cumbersome, and I didn't feel like I was operating on a graph.
b) I needed write performance. I'd talked to colleagues about their experiences (I have some friends at graphy companies) and every time I asked "what about X graphdb? could it handle this write load?" the answer was "no, definitely not".
I chose Dgraph and it's been working like a champ. Support via Slack has been solid, they're focusing on the right things (performance, correctness), and I have a lot of confidence in the future of the project.
In fact this year, we plan to align GraphQL+- even more with the official GraphQL spec. Stay tuned!
Re: write perf, it just got a whole lot better with 3 changes I made recently, and there's still a lot of room for optimization: https://github.com/dgraph-io/dgraph/commit/b7189935e6ec93aec...
And that is one hell of an improvement. Nice work.
My one nit-pick is that they say distributed joins are hard, but they don't really justify it beyond the assumption that other engineers come from a traditional DB background. It's not a hard problem if you try to solve it on your own: you do the join on the client, incrementally (or debounced if necessary) as the data is streamed. This works very well at scale in production (TB/day) for our system.
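A rough sketch of what I mean by an incremental client-side join (all names are illustrative, not from our actual system): buffer rows from each stream keyed by the join key, and emit matched pairs the moment the other side arrives, rather than waiting for either stream to finish.

```go
package main

import "fmt"

// Row is one record from either stream, keyed by a join key.
type Row struct {
	Key string
	Val string
}

// StreamJoiner does an incremental symmetric hash join: each
// incoming row is matched against everything seen so far on the
// other side, so results stream out as data streams in.
type StreamJoiner struct {
	left, right map[string][]string
}

func NewStreamJoiner() *StreamJoiner {
	return &StreamJoiner{
		left:  map[string][]string{},
		right: map[string][]string{},
	}
}

// Push adds a row from one side ("L" or "R") and returns any
// newly completed (left, right) join pairs.
func (s *StreamJoiner) Push(side string, r Row) [][2]string {
	mine, other := s.left, s.right
	if side == "R" {
		mine, other = s.right, s.left
	}
	mine[r.Key] = append(mine[r.Key], r.Val)
	var out [][2]string
	for _, v := range other[r.Key] {
		if side == "R" {
			out = append(out, [2]string{v, r.Val})
		} else {
			out = append(out, [2]string{r.Val, v})
		}
	}
	return out
}

func main() {
	j := NewStreamJoiner()
	// Left stream trickles in first; nothing to emit yet.
	j.Push("L", Row{"sf", "alice"})
	j.Push("L", Row{"sf", "bob"})
	// A matching right-side row completes two pairs at once.
	fmt.Println(j.Push("R", Row{"sf", "sushi-lover"})) // → [[alice sushi-lover] [bob sushi-lover]]
}
```

Debouncing is just a matter of batching calls to Push; the join logic doesn't change.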
>If you are sneering at the memory requirement, note that this was back in 2010. Most Google servers were maxed at 32GB.
As I recall, Nehalem EP (launched 2008 or 2009?) could handle in excess of 100GB per socket? Not cheap necessarily, but definitely still counted as "commodity hardware" for that era. I say this recalling that even my mid-tier workstation from then could handle 48GB (in that wonky era of triple-channel RAM), though I only had it loaded with 12GB. Then again I could see, if said servers in 2010 were at the end of a purchasing cycle from 2007 or so, that they were "maxed" at 32GB?
Anyway, my from-memory nitpick doesn't detract from the article's ultimate point, though: distribution was an obvious need that would only become more pressing.
With the relative stagnation in per-core performance since about 2010, I could easily see something built then being expected to last to 2025, power consumption improvements notwithstanding.
The only way to avoid joins is to specialize your data to a particular vertical. Your flat tables are then built to serve, say, movie data, and can avoid some joins.
But if you're building something spanning multiple verticals, like Knowledge Graph is -- housing movie data, celebrity data, music data, events, weather, flights, etc. -- then building flat tables specific to each vertical's properties is almost impossible.
Now, with the graph serving system in place, all they need to do is slap a new UI on top for each vertical, while the backend stays the same. Of course, there's real effort involved in building a new UI for a vertical, but it's a lot smaller than building a whole stack for each one.
Not to mention, just the movie vertical itself has many "types" of data. Movies, Actors, Directors, Producers, Cinematographers, etc. -- all of these have different properties. By the time one is done flattening all of these into relational tables, they've built a custom graph solution -- which is what happens repeatedly in companies.
Dgraph's write perf seems to be considerably better -- I haven't benchmarked formally, just going off of discussions I've had with others.
When Neptune was launched, I wrote another article critiquing its design. Worth a read: https://blog.dgraph.io/post/neptune/
In general, that approach doesn't lead to good performance, as I explained in the blog post.
Just followed your link and saw this comment from one of the original authors in Jan: https://github.com/blazegraph/database/issues/86#issuecommen...
Since Amazon is effectively selling machine time to run Neptune with no cost for the DB, wouldn't it be excellent if the original author and team continued to contribute to Blazegraph? I'm not sure if the linked note is a potential nod in that direction or not. (We have customers that also want on-prem deployments, which you obviously can't do with Neptune.)
Plug: some people are working to fork/restart Blazegraph development, which is the database Amazon took to make Neptune. Its GPU features are missing, as they were originally kept proprietary. If you have any interest and know-how, this could be an exciting project to contribute to!
Imagine building a search engine in an RDBMS without special data structures.
> Say, you want to know [people in SF who eat sushi]....If an application external to a database was executing this, it would do one query to execute the first step. Then execute multiple queries (one query for each result), to figure out what each person eats, picking only those who eat sushi.
A query like that in SQL could also suffer from "a fan-out problem" and could get slow. It's often faster to put subqueries in the From clause than the Select. It's certainly faster than an app taking the rows of one query and sending new queries for each row, as many developers do. For example:
    -- Subquery in the Select clause (one correlated lookup per row):
    select p.name,
           (select max(v.visit_date)
            from visits v
            where v.person = p.id
           ) as last_visit
    from people p
    where p.born < '1960-01-01'

    -- Subquery in the From clause (aggregate once, then join):
    select p.name, v.last_visit
    from people p
    join (select person, max(visit_date) as last_visit
          from visits
          group by person
         ) v on p.id = v.person
    where p.born < '1960-01-01'
Likewise with this San Francisco sushi query: I was thinking that if it were SQL, I would (1) get all people in San Francisco, (2) get all people who like sushi, and then (3) join them to find their intersection. Lo and behold, I then read that it is the same solution in this humongous graph database:
> The concepts involved in Dgraph design were novel and solved the join-depth problem.... The first call would find all people who live in SF. The second call would send this list of people and intersect with all the people who eat sushi.
For example, Stack Overflow denormalizes their data to avoid doing joins: https://archive.org/download/stackexchange
We also get to hear benchmark conclusions from users who tried out Neo4j, Janus, Datastax DSE Graph, etc. They typically find Dgraph to be the best performing one.
There's a comparison chart here:
As I've shamelessly advertised elsewhere in these comments, Blazegraph is in the early stages of a fork and/or reboot, as Amazon's acquisition severely hampered progress on the open source project.