
Last thing I heard was that graph DBs don't scale well.

Is there any information about this?

I wanted to build a system with tagged content and thought about using a graph DB. (Soft-)realtime queries etc.

I thought so too, but then I saw for example this [1]. They use Titan (multi-machine setup) to scale out to 64 billion vertices. Very impressive.

[1] http://thinkaurelius.com/2013/05/13/educating-the-planet-wit...

It depends what you mean by 'well'.

Neo4j offers master-slave replication for efficient scaling of reads. Horizontal scaling of graph databases often involves partitioning, which is a hard problem and an active area of research.

I would say this however:

- If your data and query workload are a natural fit for the graph model, then the speedup you get offsets a huge amount of the advantages offered by horizontal write scalability in other DBMSs.

- A single Neo4j instance can store and query a very great deal of data indeed (in personal testing I have imported low 100s of millions of nodes, and I am given to understand it can go much further still). For many use cases this is sufficient.

Well, this sounds nice. This number of nodes is more than sufficient for my needs. The problem would probably be the reads: "give me everything that is tagged X, Y and Z", "give me everything that is tagged A, B, X and G" etc.

But I will look into Neo4j, thanks :)

Obviously I don't know the specifics of the data you are going to be modelling, but I would suggest thinking of many tags or 'properties' as part of the topology of the graph.

For instance (warning: contrived example ahead), if you wanted to say "Give me all people that live in Germany", then Germany would be a node (and Lives_In a relationship) rather than a property on each individual person node.

Graph databases are optimised for thinking about data in this way. So you might start your query at the node with the label Country and the name property Germany, then return everything connected to it by a Lives_In relationship. This obviously considers far fewer nodes than if you loop through all nodes with the label Person.
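
To make that a bit more concrete for the tag case, here is a minimal sketch in plain Python. It is not Neo4j code and says nothing about how Neo4j stores things internally; TinyGraph, tag and tagged_with_all are names I made up for illustration. The idea is only that each tag is its own node, so "everything tagged X, Y and Z" starts at those three tag nodes and intersects their neighbours instead of scanning every item.

```python
from collections import defaultdict


class TinyGraph:
    """Toy in-memory graph (hypothetical): tags are nodes, items link to them."""

    def __init__(self):
        # tag name -> set of item ids connected to that tag node
        self.items_by_tag = defaultdict(set)

    def tag(self, item, *tags):
        """Connect an item to each of the given tag nodes."""
        for t in tags:
            self.items_by_tag[t].add(item)

    def tagged_with_all(self, *tags):
        """Return the items connected to every one of the given tag nodes."""
        if not tags:
            return set()
        # Look up each tag node's neighbours; unknown tags have none.
        neighbour_sets = [self.items_by_tag.get(t, set()) for t in tags]
        # Start from the smallest neighbourhood so the intersection stays cheap.
        neighbour_sets.sort(key=len)
        result = set(neighbour_sets[0])
        for s in neighbour_sets[1:]:
            result &= s
        return result


g = TinyGraph()
g.tag("post1", "X", "Y", "Z")
g.tag("post2", "X", "Y")
g.tag("post3", "A", "B", "X", "G")

print(g.tagged_with_all("X", "Y", "Z"))       # {'post1'}
print(g.tagged_with_all("A", "B", "X", "G"))  # {'post3'}
```

The work done depends on how many items hang off the queried tags, not on the total number of items, which is the same reason for starting at the Germany node rather than at every Person.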

Yes, I was thinking about doing it like this.

This will probably be the most flexible way.

I hear this about relational databases too. I have yet to see an example, though.
