A plug: if you are looking for TPC-style application-level benchmarks for database systems, check out the LDBC Social Network Benchmark [1]. It has workloads for both OLTP and OLAP systems. We designed both of these to prevent many of the common benchmarking mistakes. To ensure that implementations follow the specification and their results are reproducible, we have a rigorous auditing process (similar to TPC's benchmarks) [2].
LDBC is really great; it is, and should be, the standard unbiased benchmark to look at. The point with https://memgraph.com/benchgraph is to be more tilted towards some specific workloads, easy to extend and run, etc. Time will tell to what extent that can be achieved, but it's definitely a place to look for some interesting workloads and results. Plus the whole thing is super early and it's going to evolve!
Again, definitely take a look at the LDBC benchmarks; they are in general the most relevant.
While this seems to be a pretty egregious example of a vendor benchmark misleading through cherry-picked unrealistic results, I'm not sure I share the author's pessimism about how these kinds of stunts will hold back the graph database market.
Why? Simple: pretty much any benchmark I've seen of anything, ever, was similar nonsense -- give people numbers to game and they'll do so, enthusiastically. Even supposedly gold-standard benchmarks like the TechEmpower framework benchmarks quickly devolve into "application server handling HTTP requests by responding with predefined strings", which is as fast as it's utterly useless in most people's version of the real world.
The only way to get usable benchmark data is to run your own workloads in your own environment: everything else is pretty much noise.
Yes, running a benchmark on your own data is the only way. I've taken a look at both benchmarks (the one from OP and the one from Memgraph). They seem like different types of benchmarks with different approaches. But I still find it interesting that, although the numbers in OP's are not so much in favor of Memgraph, it turns out that Memgraph is faster than Neo4j in a large number of the benchmark queries. So yes, it all comes down to the type of benchmark and data that you use.
I've also noticed (from OP's tweet https://twitter.com/maxdemarzi/status/1613075177704677376) that he used the Enterprise version of Neo4j, but it doesn't say which Memgraph version was used. I don't have experience with these two databases, but usually the enterprise versions are somewhat better than the community ones.
Memgraph compared the freely available open source editions of both databases. Neo4j Enterprise seems to have more performance optimizations compared to the Community Edition.
I'd assume it's what the disclaimer at the top was about: that the code is in Python which he's not familiar enough with. The license bit on not having the right to integrate it if you're building a competing product might not be decisive but a good enough reason to not invest more time into running the code.
I mean it should come as no surprise that an in-memory graph DB outperforms one that stores data on a hard disk, even an NVMe SSD.
I would also add that the primary sell for Memgraph seems to be “fast enough that it can process data as it comes in via a stream, and present it to the user in a reasonable timeframe”. Anyone facing this use-case would want to use Memgraph regardless of how much faster it is than Neo4j.
That's their claim, but who knows. The article shows:
* Memgraph's benchmark only shows SQL-style WHERE-clause queries, not graph ones
* (nor streaming ones)
* The existing memgraph numbers are questionable, and if the competitor tuned, who knows
* The Memgraph team refuses to use community-defined graph benchmarks for these articles, so we won't know
* Memgraph uses weird patterns, like doing bulk loads as a query stream of atomic singleton creations instead of batching (CSV, Arrow, ...). So even if the workload were graph/streaming, a proper benchmark would show other tools going way faster, because the relevant task would be handled by CSV/Arrow/etc. bulk loaders or some other form of micro/macro batching
It's not just this article but the others too. It's frustrating to watch the memgraph leaders take their VC money and dump it into a big negative campaign lying about basically anyone in the community. They even spend money punching down at academics doing OSS. I haven't been this annoyed at a seemingly real tech company in a long time.
DISCLAIMER: I'm a cofounder and the CTO at Memgraph.
The workload and software used to benchmark are public on Github, which means they can be validated and tested. Memgraph as a company is committed to improving Memgraph and benchmarking further. That's why, in addition to other reasons, we raised funding. We have made no false statements and our findings are replicable. Everything, Memgraph source code + benchmark methodology, is public.
Benchmarks are always workload dependent and we always encourage people to test on their workload. The workload in the benchmark closely resembles the ones our customers have most often (mixed highly concurrent read/write with real-time analytics), and we perform well on it. Our default Snapshot Isolation consistency level further enables a vast class of applications to be built on top of our system which would simply break due to the weak consistency guarantees of legacy graph databases. That's precisely the reason why our customers choose us. You should always test on your workload because your mileage may vary and Memgraph might not be the right fit for you.
The main reason Memgraph performs that much better is that Neo4j Community Edition 5.0 limits, for anybody, how it uses available resources. On the other side, Memgraph Community (the closest equivalent offering; it's not 100% the same, but no two systems are) does not restrict the performance of our public offering, and that's something we want to highlight as just one of Memgraph's competitive advantages. So all of this is about comparing offerings rather than the underlying tech. Even if you take Neo4j Enterprise (which Max did, on completely different hardware, which is... "creative"), Memgraph has an advantage.
Fair enough. I didn’t realize they were being so shady with the benchmarks.
Isn’t Neo4j written in Java and Memgraph written in C++ (with lots of Python extensibility)? By that alone I would think Memgraph would be more performant most of the time, unless Memgraph is poorly-written/optimized vs Neo4j, which is very possible.
I work on the “R&D” team for my company so we spend a lot of time researching and building PoC apps. I did one with Memgraph a few months ago after concluding it ought to outperform Neo4j, however I did not build the app with Neo4j to do a side by side comparison of performance. Both support Cypher so I wasn’t attached to one or the other, but I’ve always liked the idea of using in-memory stuff (like RAMDisk) to achieve extreme performance, and I figured at worst Memgraph would be “as fast” as Neo4j… that is 100% an assumption though and assumes that Memgraph is well-written. It sounds like it’s not though.
Totally agree with doing your own benchmark, and when performance matters, work with someone who knows the systems
I'm not a neo4j expert, and am not paid to write this. That said, their GDS subengine from the last couple of years appears to be distributed in-memory, essentially a view, and their year-over-year improvements there have been substantial. There might be no difference at the checkbox level. Likewise, when we did billion-scale work here with a variety of common queries, we found that the existence of basic features like indexes quickly changed what was fast vs slow. Historically, C++ vs Java is often < 2X of a difference, so when we're talking parallel & distributed hardware with tricky query planners & data representations... I have many questions beyond the language. If they were targeting something like FPGAs, I might feel differently.
> Even supposedly gold-standard benchmarks like the TechEmpower framework benchmarks quickly devolve into "application server handling HTTP requests by responding with predefined strings", which is as fast as it's utterly useless in most people's version of the real world.
It sets an upper bound on a server's performance given that page generation completes instantly. Sure it won't reflect real world performance, but in this case the benchmark should be read as "higher requests per second = lower resource footprint for the server".
Engineering is about being able to understand what a benchmark or measure truly means, and what useable information it contains.
"Egregious example" is unfair because there was, and will continue to be, a huge effort in comparing different options. Of course it's biased. Every single benchmark is biased towards something, but take a look at the specifications and what has actually been compared.
100% agree that everyone should run their own benchmark; that's just not possible from the vendor's perspective, for every single usage. Public benchmarks are something to look into if you want cheap info on how different vendors might do for you. The huge "but" is that they're just a guide, not the only piece of info you should base your decision on.
Author here to clear up a few questions: I did not run any benchmarks for Memgraph, just Neo4j on my machine, and compared them to their numbers on their machine. My 8 faster cores vs. their 12 slower cores, so not apples to apples, but close enough to make the point that Memgraph is not 120x faster than Neo4j. I used to work at Neo4j, then at AWS for Neptune; I work on my own graph database http://ragedb.com/, and work for another database company https://relational.ai/
Let me (try to) be your hero, Marzi. (Insert favorite reference to famous cheezy pop music song, if you like.)
Couldn't you use GraphBLAS algorithms, like they do in RedisGraph (which supports Cypher, btw) to fix that problem with "death star" queries?
Those algorithms are based on linear algebra and matrix operations on sparse matrices (which are like compressed bitmaps on speed, re: https://github.com/RoaringBitmap/RoaringBitmap ). The insight is that the adjacency list of a property-graph is actually a matrix, and then you can use linear algebra on it. But it may require the DB is built bottom up with matrices in mind from the start (instead of linked lists like Neo4j does). Maybe your double array approach in RageDB could be made to fit..
I think you'll find this presentation on GraphBLAS positively mind-blowing, especially from this moment: https://youtu.be/xnez6tloNSQ?t=1531
Such math-based algorithms seem perfect to optimally answer unbounded (death) star queries like “How are you connected to your neighbors and what are they?”
That way, for such queries one doesn't have to traverse the graph database as a discovery process through what each node "knows about", but could view and operate on the database from a God-like perspective, similar to table operations in relational databases.
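For intuition, here's the linear-algebra view the GraphBLAS talk is about, sketched in plain numpy. This is only a hedged illustration: real GraphBLAS implementations use sparse matrices and custom semirings, and the toy graph below is made up.

```python
import numpy as np

# Toy undirected graph 0-1, 1-2, 1-3, 3-4 as a dense adjacency matrix.
# GraphBLAS would use a sparse representation; dense numpy is only a
# stand-in to show the algebraic idea.
n = 5
A = np.zeros((n, n), dtype=int)
for u, v in [(0, 1), (1, 2), (1, 3), (3, 4)]:
    A[u, v] = A[v, u] = 1

# One matrix-vector product expands a BFS frontier by one hop:
frontier = np.zeros(n, dtype=int)
frontier[0] = 1                    # start at node 0
one_hop = (A @ frontier) > 0       # nodes reachable in exactly one step
two_hop = (A @ one_hop) > 0        # one step further (may revisit the start)

print(np.flatnonzero(one_hop))     # neighbors of node 0
print(np.flatnonzero(two_hop))     # neighbors-of-neighbors
```

An unbounded "death star" query then becomes a handful of matrix products over the whole graph at once, rather than a node-by-node discovery walk.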
Benchmarks are generally useless unless they test real-world scenarios. The Databricks data warehouse record cost $5,190,345 USD to run over a period of 3 years. If I spend that amount of money, I will get fired.
Such benchmarks also ignore the engineering expertise an organisation has. Do you need to be an expert to fine-tune 6000 parameters, or can you tune the system to an acceptable standard by reading a few blogs?
Some people pointed out the actual query only cost $242. My counter-argument is that this appears to be based on buying reserved instances from AWS for 3 years. In real life this query would also run daily, or at least you would need several iterations to get the results you want.
The costs also include a super low budget laptop ($279). It is more than fine for running the query; however, you wouldn't use it as a development machine. This shows these results have been heavily massaged.
Not to mention that engineering expertise is just the potential. You then also need the time and the willingness to actually do that kind of tedious and potentially slow moving work instead of all the other things on your list. And as we all know, the list of things that can be improved in any system typically grows over time.
The 'out of the box' or naive and un-optimized performance of something is the baseline. And with something as huge and self-contained as a database you want the happy path to be fine in terms of performance.
I was curious about it, so I tried to figure out where you got this number from. It looks like your source is https://www.tpc.org/results/individual_results/databricks/da..., but you interpreted it wrong. The number you quoted is the projected 3-year ownership of the system configuration that was used to run the test, so the actual cost is a small fraction of the number you quoted.
It is worth noting the compute cost appears to be based on purchasing reserved instances from AWS. The price of on demand instances is much higher.
The laptop is also very low budget. I am sure it is fine for running the final query; however, it's unlikely you could use it as a development machine.
That number should be "The total 3-year price of the entire Priced Configuration must be reported, including: hardware, software, and maintenance charges", so they just took the cost of the hardware used for benchmark, and extended it to 3 years.
Yes, the final run, establishing the record, cost $242. I would love to know what the total compute cost for this project was. In real-world situations, you run this query daily, or at least multiple times to fine-tune it. The point still stands that I can't afford to run on this type of hardware, as it is too expensive, nor do I have such heavy workloads, so these results are not relevant to me.
Yes, but that's one of the best things about the cloud: your cluster doesn't run when you don't need it, plus you can take advantage of spot instances, autoscaling, etc. And you won't run a TPC test every half hour.
Thanks for digging and sharing, I enjoyed your snark.
> They decided to provide the data not in a CSV file like a normal human being would, but instead in a giant cypher file performing individual transactions for each node and each relationship created. Not batches of transactions… but rather painful, individual, one at a time transactions one point 8 million times. So instead of the import taking 2 minutes, it takes hours.
Yeahhh, I noticed this too when I looked at the repo when their blog was posted a couple weeks back. Running a transaction for each object will of course be very slow, and real production code will (hopefully) not do this.
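To make the cost of per-object transactions concrete, here's a hedged sketch with SQLite standing in for any transactional store (the table, row count, and shape are made up; the real imports were Cypher against Memgraph/Neo4j, but the principle is the same):

```python
import sqlite3
import time

def load(rows, batch):
    """Insert rows either one transaction per row, or all in one batch."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT)")
    t0 = time.perf_counter()
    if batch:
        with con:  # a single transaction for the whole batch
            con.executemany("INSERT INTO node VALUES (?, ?)", rows)
    else:
        for r in rows:
            with con:  # one commit per row, like the singleton-Cypher import
                con.execute("INSERT INTO node VALUES (?, ?)", r)
    count = con.execute("SELECT COUNT(*) FROM node").fetchone()[0]
    con.close()
    return count, time.perf_counter() - t0

rows = [(i, f"n{i}") for i in range(10_000)]
n_single, t_single = load(rows, batch=False)
n_batch, t_batch = load(rows, batch=True)
print(f"per-row: {t_single:.3f}s, batched: {t_batch:.3f}s")
```

Even in-memory, where commits are cheapest, the batched load wins by a wide margin; against a durable on-disk store (an fsync per commit), per-row transactions turn a minutes-long import into hours, which is the complaint above.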
> Those are not “graphy” queries at all, why are they in a graph database benchmark? Ok, whatever.
I’m definitely interested in seeing more realistic scenarios of actual “graphy” queries with batched transactions comparing the two. Oh, and comparing against Neptune would be cool too since that supposedly uses openCypher now (which I hear is kinda close to neo4j cypher?).
This is true regarding the transactions and cypherl. All data is loaded as cypherl transactions because Memgraph can handle a large volume of transactions. mgBench was designed to run in our in-house CI/CD, and it is still tightly coupled with Memgraph. That is the reason we are still running everything in transactions.
We did open an issue where we plan to improve things, adding CSV support for faster imports being one of them. https://github.com/memgraph/memgraph/issues/689 Feel free to suggest things, some things Max suggested we will add.
Agree on the more complex queries, and different vendors.
He does "disclose" in related blog post [1]: "I don’t work for Neo4j anymore, why am I here defending them? Well… that and the fact that I still have a dinghy load of vested shares I have to sell so I can buy a place in the Villages and begin a new life as a golf cart driving day drinker."
This seems like a series of posts on benchmarking results from different vendors, so if he "disclosed" it once I don't think there is a need for another one.
Author here: I did not write that I was proud of not knowing Python. I just wrote that I don't know Python. The thought of trying to understand 2k lines of it looking to see where Memgraph 'cheated' to make their product look good and the other bad was beyond my current capabilities.
So what you're saying is that he is opinionated about popular languages just like every other developer in the world. You don't think you can trust him to write databases because he doesn't like a language that he would never use to write a database.
Weird. Linus Torvalds also hates one of the most popular languages in the world: C++. He must not know anything about operating systems...
The author is discussing facts and providing replicable results. Disclosing their "conflict of interest" more clearly would be laudable, but even if they lied to us and pretended to be an independent journalist, that might sway our opinion of their character, but it would have no effect on the veracity of their writing.
What they say can be classified in three categories: Objectively right, objectively wrong, or subjective claims. Their conflict of interest only affects our evaluation of subjective claims.
Things that can be assessed as right are right even if they were said by Vladimir Putin; things that can be assessed as wrong are wrong even if they were said by Florence Nightingale. It is an ad hominem appeal to motive to suggest otherwise.
The author disclosed it in the first line of the article, albeit in a joke about Death Row Records. Looks like someone needs to brush up on their West Coast rap discography.
I get that this is trying to point out that Neo4j shouldn't be that far behind, but why are the i7/Gatling test numbers being directly compared to Memgraph's g6 test results? The conclusion is a bit premature without the other half of the test... What performance does Memgraph have on the newer, single-socket hardware?
Yeah that was strange, it's my understanding that you can't compare benchmarks between different machines, especially if they're not 1:1 identical hardware.
If you're referring to this line, then it struck me as very odd.
> Instead of 112 queries per second, I get 531q/s. Instead of a p99 latency of 94.49ms, I get 28ms with a min, mean, p50, p75 and p95 of 14ms to 18ms. Alright, what about query 2? Same story.
> It looks like Neo4j is faster than Memgraph in the Aggregate queries by about 3 times. Memgraph is faster than Neo4j for the queries they selected by about 2-3x except...
Unless that's meant to be a joke? Maybe they were dunking on the "bullshit" benchmark with a worse comparison.
Oh, then I'm not sure what you mean. That line makes sense - the final benchmarks were performed on the author's machine. That's where the conclusion comes from.
In theory, the spread between two benchmarked programs is not going to be hugely different between machines unless one is taking advantage of the hardware of one of the machines where the other doesn't (e.g. new syscall mechanisms such as io_uring, SIMD support, multithreading in some cases, etc).
2-3x is a much more reasonable spread than 100x. If they really did have 100x speedups, then the culprit may be the fact they're using obscenely old hardware, which would be disingenuous given that people are not typically running graph databases on such infrastructure anymore.
> the final benchmarks were performed on the author's machine
No, the author didn't run any benchmarks for Memgraph AFAICT, only for Neo4j. The numbers for Memgraph at the end are from the old benchmark, so on the old hardware.
I think that part is still fine, because there he's only saying that he got different results for the same test on his hardware, which might help to set a baseline. Really weird is the table after "Let's see the breakdown". It's not super clearly labeled so I'm not fully sure which data is which, but it looks like he's comparing Neo4j on his machine to Memgraph on their older hardware; that would be very silly.
Looking at the source for the benchmark, that also seems to hold.
Author is just stating the differences between the benchmarketing hardware and his own. Not comparing new hardware and one DB with old hardware and other DB.
But those are needless comparisons that make no sense to even mention. Of course the performance profile is different. That was the original point, if I understand correctly.
EDIT: GP followed up, I did not in fact understand correctly.
Having re-read it, I now can't decide. It would be a little silly if the author is comparing completely different devices, so I'm going to stick to that interpretation.
I wouldn’t say the benchmarks put out by graph databases are bullshit. But there is a need for a standardisation of how they’re produced.
The main problem is that when you’re comparing two products you’re bound to be comparing apples to oranges. Every product solves a slightly or majorly different challenge.
So when you run n tests on two different products some tests are bound to perform better on one product and some on the other. Misleading marketing comes into the picture if you only publish the ones that went your way or just partial results.
But that’s why if you believe in your own product and want benchmarks you hire a reputable third party to do them on their own accord.
I use Neo4j to create a CMDB that pulls in data from Active Directory, file shares, the CrowdStrike API, the Okta API, Windows services, processes, and TCP ports, vCenter, Cisco CDP, ARP tables, routing tables, and MAC address tables from routers and switches. PowerShell get-foo commands combined with ConvertTo-Json make it very easy to import data from Windows.
A possible query would be match (host:ESXihost)-[:running]->(vm:WindowsVM)-[:running]->(:Process {name:$processName}) return vm,host
I feel graph databases work very well to document the myriad dependencies in a enterprise IT stack and to integrate siloed data.
That's an interesting idea. Having done CMDB stuff in a previous life and also used Neo4J in my last job, I appreciate that one. I don't know whether you'd gain much vs using Postgres with JSON fields, but I bet the ergonomics are better, and if you do need a big relationally recursive query then it'd work well.
Graph databases are very cheap to traverse relationships between things, but slower to do per-item-type operations. So finding your friends of friends of friends is cheap, but finding the mean age of everyone in the database is slow.
They are useful specifically in the intelligence field, like at the NSA (no wonder they have so much graph stuff open-sourced). Let me share one obvious use case: you have data on a lot of people, like call data records, Facebook friends lists, Twitter follower/following lists, and potentially a lot of other data as well. Now you have two targets, person A and person B. With graph databases it is a trivial one-liner to find how these two people are linked; they could be linked directly, or they could have 5 people between them. Doing the same in SQL with a recursive CTE is a major PIA and takes a lot of time (see "degrees of Kevin Bacon" using a graph database). There are very niche companies making big bucks just selling libraries/software to plot these graphs, and most of their customers are government agencies with a lot of funds.
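For the curious, the recursive-CTE version is doable but verbose; below is a hedged sketch in SQLite (the names, edge list, and 6-hop cap are all made up for illustration). A graph database collapses this whole query into something like `shortestPath((a)-[*]-(b))`.

```python
import sqlite3

# Hypothetical "who knows whom" edge list; a BFS as a recursive CTE is
# the SQL equivalent of a graph database's one-line shortest-path query.
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"),
         ("dave", "eve"), ("alice", "frank")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE knows (a TEXT, b TEXT)")
con.executemany("INSERT INTO knows VALUES (?, ?)",
                edges + [(b, a) for a, b in edges])  # store both directions

# Each recursion step expands the frontier by one hop; UNION deduplicates
# and the depth bound guarantees termination.
row = con.execute("""
    WITH RECURSIVE hop(person, depth) AS (
        SELECT 'alice', 0
        UNION
        SELECT k.b, h.depth + 1
        FROM hop h JOIN knows k ON k.a = h.person
        WHERE h.depth < 6
    )
    SELECT MIN(depth) FROM hop WHERE person = 'eve'
""").fetchone()
print(row[0])  # degrees of separation between alice and eve
```

This works for a toy edge list, but on millions of edges the repeated self-joins are exactly the "takes a lot of time" part the comment above complains about.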
A more technical use case that I liked was a system that can analyse the configuration of resources across an entire network and find a "path" from a normal user account to full admin privileges.
Something like: "Helpdesk user A can reset the password of a service account that can write to a file share that contains a script that is run on logon by every user including the full admin, allowing user A to trigger an action in the context of an admin B, making them equivalent to an admin."
You map out "things" on the network like file shares, security groups, accounts, etc... with links between them, and then ask for the shortest path from A to B.
Just as Facebook uses MyRocks (a RocksDB-based KV storage engine) underneath MySQL. It's definitely turtles all the way down.
But where do you draw the line? Is your Ruby on Rails CRUD app that exchanges JSON documents a document database? Fundamentally, what's the difference between said Rails app and TAO, aside from one being centred around documents and the other graphs?
I'll give you an example of a graph database use case.
The police have a ton of data lying around, and the consensus in the industry is that the 80/20 rule applies to criminals as well ie: 20% of the population takes up 80% of the police resources. You could probably also posit that 20% of that 20% are "peak criminals."
Anyway, they would like to track interactions of "things."
Say a car is involved in an incident. They normally track the make, model, plate, and color of the car - on paper. There's a lot of other info they track: who owns that car? Who's in the car? Where is the car? Where does the owner live? Where do the occupants live? What other incidents has that car been involved in? Given the addresses of the people involved, who else is known to be around them?
All this relationship information can give someone a better understanding of the relationship between criminal elements in an area. If a car is being used in lots of crimes, it's easier to find out using a DB than some cop going "I recognize that car." If lots of people are being picked up and all live in a 2 block area, it'll be easier to see that if it's in a DB than a cop recognizing that fact from multiple incident reports.
I actually tried doing this in SQL, and it's super slow because you have to iterate over your tables over and over. With the graph database this becomes, well, substantially easier if you model it correctly.
This product, BTW, is known as CopLink by IBM.
As an aside, fusion centers have this problem too but worse, because they're supposed to coordinate information between different police departments in a region...all of whom don't particularly give a shit.
They are a joy to use for ontologies or certain types of metadata.
At my last job we had a bunch of entity categories, and each of those had a huge number of individual entity types. When an entity was picked up through the data pipeline we'd query the graph db and convert that entity into whatever the "base" entity is for the category.
It also allowed us to easily query for strange connections or one off transformations that our customers frequently had without worrying about having a more rigorous and structured RDBMS schema for relatively uncommon queries.
Finally it made using algorithms like PageRank in our data science pipeline an absolute breeze.
I loved it, but we never used it as our primary database (postgres & athena in this case)
my feeling is that graph databases face an uphill battle for mass adoption, not because their architects or vendors are doing anything wrong, but because of some intrinsic aspects of information exchange in most current situations and use cases
* information tends to be private and/or commercially sensitive, this severs the links that graph dbs are good at representing (and made the "node focused" SQL approach the ubiquitous model that it currently is)
* objects in typical schemas have many more attributes than relations. while you could model things RDF style where everything is a relation, it is not the most intuitive for people
* when the previous constraint does not apply (e.g. data from a centralized social network), it is typically not too hard to emulate an adequate graph structure on a Pareto 20/80 basis using an RDBMS
so graph dbs end up being optimal only for a niche of situations, and probably without the impact that the people / investors involved in their development would be happy with
on the other hand the genie is out of the bottle and the decades-long SQL monoculture seems to be coming to an end. but maybe what results is a relational database+ type thingy [0] rather than two disconnected paradigms
I don’t know, they’re the fastest growing database segment by far according to DBEngines.
I saw very niche use cases 10 years ago but have seen more and more common use cases recently. Knowledge Graphs in particular have a lot of use. I’d expect to continue to see explosive growth in graph over the next 10 years.
As ubiquitous as SQL or document databases? Probably not. But very likely more and more use cases will be uncovered.
> but maybe what results is a relational database+ type thingy
This is already called the "object-relational" model. It was invented by Postgres in the 1980s. The relational model / SQL absorbs the best parts of alternative systems and gets better over time. SQL:2023 is adding support for graph queries (SQL/PGQ).
I find projects like apache/age [0] very promising in this direction. But I wouldn't call graph db's a passing fad. A more appropriate description might be "too important to be left alone, yet not important enough to form a second type of mass market database engine".
On a tangent, what graph database would people recommend in 2023? In particular, I would like something that's linked in, like SQLite, rather than a full-blown service like MySQL etc.
If you only need a few graph queries then you could just use SQLite, it’s capable of doing it (I have done it before). But writing graph queries in SQL is painful, so I wouldn’t do it if you need more than a handful.
I also really like Neo4J, but the community version is very crippled. Still though, I've worked around it and do hot backups with ZFS and run multiple database processes to support multiple customers.
For simple cases, you can get pretty far storing relations 6 times in SQLite, or any old key/value store. (a-b-son, b-a-father, son-a-b, father-b-a, a-son-b, b-father-a)
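One way to read the parent's six orderings is as the classic triple-store trick of indexing every (subject, predicate, object) edge under all permutations. A hedged sketch with a plain dict standing in for the key/value store (the names and key scheme are made up):

```python
from collections import defaultdict

# A dict standing in for any key/value store: each (subject, predicate,
# object) triple is indexed under all six orderings, so every lookup
# pattern becomes a direct key access instead of a scan.
kv = defaultdict(set)

def add_triple(s, p, o):
    for key, val in [(("spo", s, p), o), (("sop", s, o), p),
                     (("pso", p, s), o), (("pos", p, o), s),
                     (("osp", o, s), p), (("ops", o, p), s)]:
        kv[key].add(val)

add_triple("a", "father", "b")   # a is the father of b
add_triple("b", "son", "a")      # the inverse edge, stored explicitly

# Any access pattern is now a cheap lookup:
print(kv[("spo", "a", "father")])  # whom is a the father of?  -> {'b'}
print(kv[("pos", "son", "a")])     # who is a son of a?        -> {'b'}
```

The cost is 6x write amplification, which is why this only scales "pretty far" for simple cases, as the parent says.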
What about https://arcadedb.com ? Open Source, Apache 2, Free for any usage. It supports SQL but also Cypher and Gremlin (and something of MongoDB and Redis query languages)
There are lots of what I would call "grey" marketing/sales type articles like this across virtually every saas business, it's how they get people onto their site.
Unfortunately, an article that overstates benefits without any caveats is not illegal, so it will carry on.
Many of us would like a disclaimer, e.g. "I work for the company", but also a much more bounded discussion: "this performance test works for this particular scenario", perhaps "Please note, your scenario might be very different", and especially "Please contact me if you think I have missed something out".
I have worked for a business where you felt compelled to amplify the good and not talk about the bad but the world keeps spinning...
Completely useless tangent: the word "benchmark" comes from a mark that surveyors would make in rock so that they could place a leveling rod for surveying. Benchmarks are made relative to other benchmarks so that surveying can be done relative to the height of one known fundamental benchmark.
It could be argued that it isn't really a benchmark unless you can accurately calculate the result based off of a common fundamental benchmark.
The real problem with these kinds of "benchmarks" is that either the company doesn't have anyone on staff that's calling "bullshit" on it or the marketing people don't care that it's bullshit.
Either one is a bad sign if they're going to be a vendor. At that point how can you trust their SLAs and/or their presales team?
In this thread I've seen comments from Memgraph's CTO and founder (mbuda), CEO and founder (dtomicevic), and one of the developers who worked on mgBench (mapleeman). To me it seems that they are addressing all of the questions in the comments section.
One of my favorite things is "the thing that sounds obvious when I say it, but you didn't think of it before". Here's one related to benchmarking: For A to be 120x better than B in a comparable task, that has to mean that B is leaving that much performance on the table in the first place.
Now, let's combine this with one of the persistent tendencies of developers to take one specific benchmark as indicative of the overall performance, which is often preyed on by benchmarkers trying to sell things.
Is it really plausible that neo4j takes 120x longer than it needs to on all operations? A dedicated graph database that has been tuned and optimized for that task for quite a while now?
I'm not quite going to rate that a 0 probability, but it's definitely a very big claim. While the probability is not 0, it is comfortably below both "someone's gaming the numbers" and "the benchmark is not as comparable as claimed".

There's a faint chance the latter may match a production use case; for instance, certain comparisons of NoSQL DBs and SQL DBs are "not fair" in that they won't be doing remotely the same things for the queries, and the performance landscape is very complicated, with one side winning handily for some tasks and the other side winning handily for others. If your use case falls into one of those big wins, you may not care about the "fairness".

But it's still a pretty big chunk of probability mass that it's just plain not comparable. How many times have we seen a ludicrous benchmarking claim of relative superiority, just for the losing side to pop up and say something to the effect of "Hey, did you consider adding the correct index to the data? Oh look, if you do that we win by a factor of 4."
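The "missing index" scenario is easy to reproduce with any relational engine. Here is a toy sketch using SQLite; the table and index names are made up for illustration, and this is not tied to any vendor benchmark in the thread:

```python
import sqlite3

# Toy illustration: the same point lookup is a full-table SCAN without an
# index and an index SEARCH with one, which can easily swing a "benchmark"
# result by a large factor on bigger tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(10_000)])

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN describes the access strategy.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

query = "SELECT name FROM person WHERE id = 1234"
plan_before = plan(query)   # e.g. "SCAN person"
conn.execute("CREATE INDEX idx_person_id ON person(id)")
plan_after = plan(query)    # e.g. "SEARCH person USING INDEX idx_person_id ..."
```

Same engine, same data, same query; the only difference is whether the benchmarker bothered to create the index.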
Tell me you're 1.2x or 1.5x faster or something, or that your clever compression means I can remove 1/3rd of my systems or something. Keep it in the range of plausible.
While I'm sure this won't affect the marketing of this company any, ludicrously large claims of 10x+ speed improvements actually turn me off, not attract me. You'd better have some sort of super compelling reason why you somehow managed to be that fast over your competitor, like, "we're the first to successfully leverage GPUs" or something like that. Otherwise I'm going to guess "Actually, you have an O(log n log log n) algorithm over their O(log n log n) algorithm and you cranked the data set up to the ludicrous sizes it takes to get an arbitrarily large X factor improvement over your competition" or something like that.
(Always gotta love people comparing two completely different O(...) algorithms against each other and declaring one is X times faster than the other. This is another major source of "10,000x faster!"... yeah, O(n log n) is "10,000x faster!" than O(n^2), sure. It's also 100,000 times faster, 10 times faster, and a billionkajillion times faster, all at the same time.)
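To make that concrete: the "speedup" of an O(n log n) algorithm over an O(n^2) one is the ratio n^2 / (n log n) = n / log n, which you can inflate to any figure you like by choosing n. A toy sketch (the input sizes are arbitrary):

```python
import math

# The asymptotic "speedup" of O(n log n) over O(n^2) is n / log n,
# so the quoted factor depends entirely on the input size you pick.
def speedup(n):
    return (n * n) / (n * math.log2(n))

# ~100x at a thousand elements, tens of millions of x at a billion.
ratios = {n: speedup(n) for n in (10**3, 10**6, 10**9)}
```

Any of those numbers is "the" speedup, which is exactly why constant-factor comparisons across different complexity classes are meaningless.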
I am really stunned by this story. It made me check the MemGraph benchmarks section. Don't get me wrong, it may be 10-100x faster than Neo4J in even the most basic operations. Moreover, given the quality of Neo4J, it is hard not to be that much quicker. Even Postgres and MySQL are better at storing graphs than Neo4J.
---
Disclosure: I have worked on Graph Algorithms, Graph Databases, and Database Engines for years, and we are now preparing a commercial solution based on UKV [1]. I don't know anyone at MemGraph or Neo4J. Never used the first. As for the second, I am not a fan.
---
Aside from licensing, there are 3 primary complaints. I will address them individually, and I am open to a discussion.
A. Using Python for benchmarks instead of Gatling. I don't entirely agree with this. Python still has the fastest-growing programming community while already being one of the 2 most popular languages. Gatling, however, I had never heard of. Choosing between the two, I would pick Python. But neither works if you want to design a high-performance benchmark for a fast system: to avoid automatic memory management and expensive runtimes, you have to implement those in C, C++, Rust, or another systems-programming language. We have seen it too many times that the benchmark itself performs worse than the system it is trying to evaluate [2].
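One cheap sanity check for this failure mode, regardless of the harness language, is to push a no-op through the same measurement path as real queries; if its cost is comparable to the system's expected latency (microseconds, for an in-memory database), the harness is polluting the numbers. A rough sketch, with an arbitrary iteration count:

```python
import time

# Hypothetical harness sanity check: time a no-op "query" through the same
# measurement loop used for real queries. The per-call overhead measured
# here is a floor below which the harness cannot distinguish systems.
def measure(fn, iterations=100_000):
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    return (time.perf_counter() - start) / iterations  # seconds per call

noop_overhead_s = measure(lambda: None)
print(f"per-call harness overhead: {noop_overhead_s * 1e9:.0f} ns")
```

In CPython this floor is typically some hundreds of nanoseconds per call, which is fine for millisecond-scale queries and fatal for microsecond-scale ones.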
B. Using hardware from 2010 [3] and weird datasets [4]. This shocked me. When I looked at the charts [5] and the benchmarking section, it seemed highly professional and good-looking. I wouldn't expect less from a startup with $20M in VC funding. But the devil is in the details. I would never have expected anyone benchmarking a new DBMS to use now 13-year-old CPUs and an unknown dataset. Assuming current developer salaries, hiring people to design a DBMS and then evaluating it on a $1000 machine is just financially irresponsible. We buy expensive servers; they cost like sports cars, or even apartments in poorer countries. They are hard to maintain, but they are essential to quality work. It is sad to see companies taking such shortcuts. But to play devil's advocate, there is no one graph benchmark or dataset that everyone agrees on. So I imagine people experimenting with multiple real datasets of different sizes, or generating them systematically using one of the random-generator algorithms. In UKV, we have used Twitter data to construct both document and graph collections. In the past, we have also used `ci-patent`, `bio-mouse-gene`, `human-Jung2015-M87102575`, and hundreds of other public datasets from the Network Repository and SNAP [6]. There are datasets of every shape and size, reaching around 1 billion edges, in case someone is searching for data. For us, the next step is the reconstruction of the Web from the 300 TB CommonCrawl dataset [7]. There is no such graph benchmark in existence, but it is the biggest public dataset we could find.
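For anyone who prefers generated data, a uniform G(n, m) random edge list is the simplest starting point, though unlike the real datasets above it lacks the skewed degree distributions of actual networks. A minimal sketch; the sizes, seed, and function name are arbitrary:

```python
import random

# Minimal G(n, m) generator of the kind one might use to produce benchmark
# graphs of controlled size. Real benchmarks would also want realistic
# (power-law) degree distributions, which uniform sampling does not give.
def random_edge_list(num_vertices, num_edges, seed=42):
    rng = random.Random(seed)
    edges = set()
    while len(edges) < num_edges:
        u = rng.randrange(num_vertices)
        v = rng.randrange(num_vertices)
        if u != v:
            # canonical (min, max) order deduplicates undirected edges
            edges.add((min(u, v), max(u, v)))
    return sorted(edges)

graph = random_edge_list(1_000, 5_000)
```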
C. Running queries a different number of times for various engines. This can be justified, and it is how current benchmarks are done: you track not just the mean execution time but also the variability, so if at some point the results converge, you can abort before hitting the expected iteration count to save time.
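That convergence-based early stopping can be sketched roughly like this; the tolerance and iteration caps are illustrative, not taken from any particular harness:

```python
import statistics

# Keep re-running the query until the relative standard deviation of the
# latency samples drops below a tolerance, or a hard iteration cap is hit.
# Different engines naturally end up with different iteration counts.
def run_until_stable(run_query, tolerance=0.05, min_iters=10, max_iters=1000):
    samples = []
    for _ in range(max_iters):
        samples.append(run_query())
        if len(samples) >= min_iters:
            mean = statistics.fmean(samples)
            rel_sd = statistics.stdev(samples) / mean
            if rel_sd < tolerance:
                break  # results converged; stop early
    return statistics.fmean(samples), len(samples)

# A perfectly stable fake "query" converges at the minimum iteration count.
mean_latency, iters = run_until_stable(lambda: 1.0)
```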
---
LDBC [8] seems like a good contender for a potential industry standard, but it feels incomplete. Its "Business Intelligence workload" and "Interactive workload" categories exclude any real "Graph Analytics". Running an All-Pairs Shortest Paths algorithm on a large external-memory graph could have been a much more interesting integrated benchmark. Similarly, one could run large-scale community detection or personalized recommendations on graphs and evaluate the overall cost/performance. That, however, poses another big challenge: almost all algorithm implementations for those problems are vertex-centric. They scale poorly with large sparse graphs, which demand edge-centric algorithms, so a new implementation has to be written from scratch. We will try to allocate more resources towards that in 2023 and invite anyone curious to join.
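As a tiny illustration of what an edge-centric reformulation looks like, connected components can be computed in a single streaming pass over the edge list with union-find, with no per-vertex iteration over adjacency lists (a toy sketch, assuming vertex IDs 0..n-1):

```python
# Edge-centric connected components: stream the edge list once through a
# union-find structure instead of traversing per-vertex neighborhoods.
def connected_components(num_vertices, edges):
    parent = list(range(num_vertices))

    def find(x):
        # path halving keeps the trees shallow
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:          # a single pass over the edges
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return len({find(v) for v in range(num_vertices)})

# Toy graph with two components: {0, 1, 2} and {3, 4}.
components = connected_components(5, [(0, 1), (1, 2), (3, 4)])
```

The edge list can live in external memory and be consumed sequentially, which is exactly what large sparse graphs need.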
A. If you take a deeper look, the benchmarking part is implemented in C++ (client + benchmark session management + measurements); Python is just a layer on top to orchestrate everything, but it's not on the critical path at all.
B. Yes, the old hardware is 100% a problem; that's why the benchgraph web is extensible with more hardware options. Stay tuned for that, it's going to come soon!
This focuses on untyped, unattributed graphs and includes algorithms that are often formulated following a vertex-centric programming model (BFS, PageRank, community detection, connected components, etc.). The SNB Business Intelligence workload already covers a few of these algorithms (BFS, weighted shortest paths), and in the future it may incorporate more.
We plan to run a Graphalytics competition in the spring (on data sets with up to tens of billions of edges) - let me know if anyone is interested in participating in this.
[1] https://ldbcouncil.org/docs/presentations/ldbc-snb-2022-11.p...
[2] https://ldbcouncil.org/benchmarks/snb/