Ask HN: What Is Going on with Neo4j?
102 points by dirkdiggler2 on Dec 9, 2022 | 103 comments
Neo4j Inc just laid off 10% of its employees.

The negative reviews on Glassdoor are piling up, and they seem to share common themes - and it is not pretty.

There are accusations that management tried to lie to early investors about revenue sources.

Does anyone have any idea or insight into what is going on with them?




Part of our team uses Neo4j. It's a giant pain in the ass. The amount of time we've spent on the phone with their support trying to unravel bugs that they caused is insane. We don't get that with a paid product like MSSQL. Hell, Postgres isn't that bad and it's FREE!

The cost of Neo4j also went up with their new model. (see https://neo4j.com/blog/open-core-licensing-model-neo4j-enter...)

And they did that thing where they closed their source, which was nasty.

Then there's the separation of OnGDB, which we looked at, but that didn't go well either. One day they deleted all of their packages. All gone. Thank God we had caches, but it took them a while to come back online. In hindsight, it was because Neo4j had sued them. I understand that, but it caused a LOT of headaches.

I feel that Graph databases are one of those things like Document databases. You probably don't need it...


> I feel that Graph databases are one of those things like Document databases. You probably don't need it...

I got a really good chuckle out of that.


I am curious about your insights on why Neo4j is still used despite all the pain it has caused. What would be a better approach to solving the same problem with a more conventional DBMS like Postgres? Thanks!


Spicy take: graph databases are the blockchain of the database world. Sounds sexy, scales badly, has very few if any use cases that traditional databases can't handle better.


I don't understand where this comes from. If the problem warrants the use of a graph data model, property graph databases provide an efficient solution for it. Graph databases also excel at discovering distant relationships between loosely coupled data entities and deriving previously unknown facts about the data that would otherwise be too cumbersome to unravel using document or relational database queries. Graph databases also make it easy, when one knows the answer to a question, to work backwards to the one or more original questions that could have yielded that answer; it is somewhat niche, although incredibly useful in knowledge graph scenarios.

Just like a document database is not a good fit for a data model with inherent relations between data entities (simple or complex; the reverse is also true), a graph database is not a butt plug for every butt. Every problem requires an appropriately fitting solution for it.


> If the problem warrants the use of a graph data model

I think this is the crux of the problem. I once worked on MegaBank's peer-to-peer payment app, where somebody had figured out that people sending money to each other form a directed graph, so they should use a graph DB to store it. And when Azure's sales team convinced them that CosmosDB could handle relational data and graphs and documents, they bought it hook, line and sinker.

Needless to say, this was a terrible idea: an RDBMS could have handled it just fine, and because everything else was stored in an RDBMS (which, despite the marketing fluff, is internally quite different from CosmosDB), doing any kind of join was now a huge pain in the ass. As a cherry on top, they were now locked into CosmosDB, which has completely incomprehensible ("request units per second") but very, very high pricing, particularly for graphs. Whee!


Oh. Payments, wanks (and Megawanks) and CosmosDB – an unholy trinity, bless them all. I think I know what the Megawank was up to.

Since Apple (not entirely sure about Google/Android) denies anyone direct access to the NFC hardware in an iDevice and presents a (mostly) anonymised unique payment token to the wank instead, reconstituting a people-connection graph by tracking fund movement across card accounts poses a challenge. Tracking fund movement across conventional wank accounts is easier.

But, if the graph data model is devised correctly, it is still possible to incrementally build it out into a rich graph outlining social and material world connections for a given customer either for product placement or nefarious purposes (wanks do sell the transaction history and more to external parties such as Equifax without obtaining the customer's consent). Akin to a Facebook social graph. A gradual graph build-out is, in fact, a great feature of graph databases – a fluid «schema» (for the lack of a better term) that can evolve incrementally in place as new facts about the data become known, without causing a disruption to a production system. If the overall design is sound.
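To make the «fluid schema» point concrete, here is a minimal sketch using the official Neo4j Python driver. The labels, relationship types and properties are invented for illustration, not taken from any real wank's model:

    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

    with driver.session() as session:
        # Day 1: record a payment between a customer and a merchant.
        session.run(
            "MERGE (c:Customer {id: $cid}) "
            "MERGE (m:Merchant {id: $mid}) "
            "MERGE (c)-[:PAID {amount: $amount}]->(m)",
            cid="cust-42", mid="merch-7", amount=19.99,
        )
        # Months later: a brand-new fact type is layered onto existing nodes,
        # with no migration step and no disruption to what is already there.
        session.run(
            "MATCH (a:Customer {id: $a}), (b:Customer {id: $b}) "
            "MERGE (a)-[:SHARES_DEVICE]->(b)",
            a="cust-42", b="cust-99",
        )

    driver.close()

The second statement is the incremental build-out described above: a new relationship type simply appears in the graph as the new fact becomes known.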

The problem is that traditional wanks are not well positioned when it comes to technology-related matters due to technology… not being their core competency. Quite the opposite: they see tech as a liability, and such projects are driven by financially competent, somewhat business-competent but entirely technically incompetent folks. Therefore such projects nearly always fail, technology is blamed in the end, and the CEO/CFO draws appropriate (typically, inappropriate) conclusions. So the Megawank in question likely tried to shoehorn a poorly designed graph data model (more likely, an existing relational model) into an A.M.A.Z.I.N.G! multimodel! graph! CosmosDB database whilst being clueless about what they were trying to do. Of course, Microsoft sales people, unwittingly slouching nearby, were singing mellifluous songs of joy and delight, reciting telltale stories of CosmosDB. Profit.

Neobanks, on the other hand, are driven entirely by technologists, and they can pull off such a feat far more easily.


I have worked at a Maps company - with the core business model being, literally, building maps and adding traffic services. The sort of thing that's given as the Graph 101 example.

I can assure you, nobody used any Graph database to achieve any of it.


> […] nobody used any […]

With all due respect I don't know what to make out of absolutist, generalised statements such as this one.


GP made a statement about the entirety of the team that they were familiar with. It's the same way that I can tell you with decent certainty that nobody's using MongoDB at my place since `kubectl get pods -A | grep -c mongo` prints 0.

EDIT: Okay, joke's on me there. It turns out the automated frontend tests use a Mongo for some reason. :)


It was more of an observation by looking at most production systems, reading tons of docs, talking to a LOT of people who built the legacy - the stuff that's on your phone and in your car right now - and building a few modern systems over the course of ~2 years.

So a bit more than `kubectl get pods -A` :)


> Graph databases also excel at discovering distant relationships between loosely coupled data entities and deriving previously unknown facts about the data that would be otherwise too cumbersome to unravel using document or relational database queries.

You can do all of this in the relational model, now that support for recursive CTEs enables arbitrary traversal queries. Even seamless inference of "additional" data points (often given as a unique selling point of "semantic" solutions!) is just a view, plus indexes on the underlying query if you want it to be fast.
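As a rough sketch of what that looks like (SQLite here, with a hypothetical single edge table), transitive reachability can be exposed as an ordinary view:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE edges (src TEXT, dst TEXT);
        INSERT INTO edges VALUES ('a','b'), ('b','c'), ('c','d');

        -- "Inferred" facts as a view: everything reachable from each node.
        CREATE VIEW reachable AS
        WITH RECURSIVE r(src, dst) AS (
            SELECT src, dst FROM edges
            UNION
            SELECT r.src, e.dst FROM r JOIN edges e ON e.src = r.dst
        )
        SELECT src, dst FROM r;
    """)

    print(con.execute("SELECT dst FROM reachable WHERE src = 'a'").fetchall())
    # [('b',), ('c',), ('d',)] -- order may vary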


> You can do all of this in the relational model […]

That… depends. Right now I am dealing with the customer's 6NF relational model, which I had up until now thought was purely theoretical and not a naturally occurring phenomenon (my previous record was 4NF some years back). ERDs for core data entities span several screens across and are, in fact, a thing of beauty, but… It is difficult to reason about data that has grown historically over the last 20 years. The incoming data model is a document data model, therefore dependencies between data entities in such a highly normalised model need to be analysed first. A graph database turns out to be a good fit for relationship analysis in highly normalised relational data models as well, since normalised entities effortlessly map onto graph nodes and relations onto graph edges. The document data model can be thrown into the mix on top of the relational model, with new relationships being incrementally added, linking the relational and document data model entities together. Ad-hoc queries are also much simpler in the graph DB, as less interesting relations can simply be ignored for a moment. In the end, graph nodes and/or edges can be enriched with extra useful properties, giving one nearly complete data model migration mappings.
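For a flavour of the mapping (networkx below as a quick in-memory stand-in for a graph DB; the table names are made up): tables become nodes, foreign-key dependencies become edges, and the dependency structure of the schema can then be queried directly:

    import networkx as nx

    # Hypothetical (child_table, parent_table) foreign-key pairs.
    fks = [
        ("order_line", "orders"), ("orders", "customer"),
        ("order_line", "product"), ("product", "supplier"),
    ]
    g = nx.DiGraph(fks)

    # Everything the order_line entity ultimately depends on:
    print(nx.descendants(g, "order_line"))
    # {'orders', 'customer', 'product', 'supplier'}

    # How two entities are connected, ignoring relations we do not care about:
    print(nx.shortest_path(g, "order_line", "customer"))
    # ['order_line', 'orders', 'customer']

The same idea carries over to a property graph database, where the nodes and edges can then be enriched with the migration mapping properties mentioned above.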

There is nothing wrong with the relational model, and it is an appropriate fit for many use cases. However, if one has never gone beyond joining 3 tables in a single query for a web app, or has never encountered an extremely normalised data model, it is difficult to see where the relational model falls short. Relational models are also rigid and do not accommodate changes easily, whereas graph models allow new relationships to be added incrementally as the data model evolves. Not to mention that most relational models do not venture past 2NF, and the dataset is typically an entangled mess of organic or historical growth.


Modern databases give you tools to evolve and refactor a relational schema over time. Views can be such a tool, Postgres also has transactional DDL changes.


Tooling has nothing to do with schema evolution; normal forms (4NF, 5NF, …) do. The trouble is that almost no one does that. Relying on tooling alone is either self-delusion or a lack of experience. Usually both.


It adds complexity to a narrow use case.

If it weren't narrow, Neo4j wouldn't need to lay off staff.

Your examples do not refute this.


> It adds complexity to a narrow use case.

It also simplifies away unnecessary complexity in many cases, and I have witnessed both. Just like one should not use an expensive Zeiss microscope as a hammer substitute to drive nails into a concrete wall, one perhaps ought not to stick a graph database everywhere it does not belong. Engineering (including software) is about selecting the appropriate tooling for each job.

> If it weren't narrow, Neo4j wouldn't need to lay off staff.

I fail to see how the two are related. If a company struggles with the execution of their incumbent business model, perhaps it is not necessarily related to the product (may or may not be though)?


It doesn't seem that spicy. Graph databases are a niche thing. While I can't think of a problem I've had to deal with that would require one, I'm sure there are some problems where they make sense. In those cases you could just charge people whatever you feel like. That does come with expectations, and from the comments it seems like Neo4j hasn't been able to deliver. Anyway, pricing: for a niche product, you don't have a $64.80/month offering; bump that to $1000 for a developer version and just stick "Contact Sales" on everything else. If you need a graph database (or any specialized database) then pricing is almost irrelevant.


I have only encountered one problem that was best solved with a graph style query and I did that in sqlite.

So what would be nice is a better graph query language that can compile to SQL.


What was the use case? I agree, a lot of the stuff that is supposedly magic in graph DBs isn't that hard to do in SQL.


Nodes with specific properties within a certain number of connections from a specific node, only going through nodes with specific other properties, ordered from fewest connections to most.

I was looking for profitable trade routes in EVE Online.
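Roughly that shape of query in SQLite, as a sketch (the tables and the starting system are invented here, not my actual schema):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE systems (name TEXT PRIMARY KEY, high_sec INTEGER, has_market INTEGER);
        CREATE TABLE gates (src TEXT, dst TEXT);
    """)

    rows = con.execute("""
        WITH RECURSIVE route(system, depth) AS (
            SELECT 'Jita', 0
            UNION
            SELECT g.dst, route.depth + 1
            FROM route
            JOIN gates g   ON g.src = route.system
            JOIN systems s ON s.name = g.dst
            WHERE route.depth < 5      -- max number of jumps
              AND s.high_sec = 1       -- only pass through nodes with this property
        )
        SELECT r.system, MIN(r.depth) AS jumps
        FROM route r JOIN systems s ON s.name = r.system
        WHERE s.has_market = 1         -- nodes with the property I'm after
        GROUP BY r.system
        ORDER BY jumps                 -- fewest connections first
    """).fetchall()

It works, though the Cypher equivalent would admittedly be shorter to write.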


Such as multiple inheritance?


SQL is getting standard syntax extensions to make graph-like queries a bit more intuitive (Property Graph Query). But you're absolutely correct, Postgres with the right schema is a very serviceable graph database already.


But Postgres also scales (out) badly. You need external tools (and it's non-obvious which of the external tools to choose) to achieve even a basic active/passive setup, let alone a more complicated one with read replicas.


The Postgres docs describe how to set up read replicas and HA. They're admirably clear and high quality, more so than many FLOSS projects. You can also set up sharding to scale out further.


I imagine that you have to implement the logic about relationships and queries yourself and spread the information across at least two relational tables. I'd hope for graph databases to do that for you. Is that not the case?


In infosec it is useful for things like BloodHound, for example.


We're in the process of migrating off Neo4j/OngDB to Postgres. Happy with how it's going so far.


Interesting, can you share more details? Is your data graph-like or more like tabular but you chose Neo4j anyway? What kind of functions from Postgres are you using to emulate Neo4j features?


Had the exact opposite experience with N4j.

Easy to operate, scale and run. We started in 2014 and in 2018 did a large scale enterprise rollout with a large customer. The performance test we put it through loaded millions of nodes and millions more edges with non trivial data and scaled to 800 concurrent users (could have been even more but for the fact that the web servers we had for this test scenario started to max out since the system was scaled for 200 concurrent and we were basically stress testing it at this point).

In the early days, there were a few edge cases of query incompatibility between versions that we caught with unit tests, but otherwise very stable, easy to operate, and easy to use. Cypher is one of my favorite query languages.

Very surprised that people had issues with it.


800 concurrent users and millions of nodes sounds quite small. I would expect 800 users to fit on a single Postgres or even SQLite box, if you’re read-heavy.

> Very surprised that people had issues with it.

They may have several orders of magnitude more data & users than you


> I would expect 800 users to fit on a single Postgres or even SQLite box

You can serve thousands of concurrent users on less-than-laptop resources -- we do. If you get a big dedicated server you can serve more concurrent users than you'll likely ever have customers. Modern relational databases are just very good.

StackOverflow's stack (https://stackexchange.com/performance) is 4 huge Postgres servers. They probably cost less than our AWS bill for a couple months, which makes me jealous. Postgres scales.


AFAIK they use MSSQL, not Postgres. And they only use one (another is a replicated hot spare) for Stack Overflow.


> StackOverflow's stack (https://stackexchange.com/performance) is 4 huge Postgres servers. They probably cost less than our AWS bill for a couple months, which makes me jealous. Postgres scales.

Yeah, because everything is actually served from their caches, since they're extremely read-heavy.



I guarantee that whatever traffic is getting past their cache is still 100x higher than 90% of startups. People really do try to run before they can walk.


Our customer's use case only specified 200 concurrent so the hardware was set up accordingly.

We stressed it to 800 and N4j was fine with tons of headroom, while the web servers started to give out (high CPU and mem).

Customer use case would never see 800 concurrent so it was a non issue in our case.


TLDR: your size is not size


You're kind of missing the point. All performance tests are relative to the expected load and the infrastructure designed to serve that load. No, it's not Twitter, because it wasn't a public-facing web app. The point being that in an infra designed to serve 200 concurrent users in an enterprise scenario, Neo4j could have served multiples of that on the specified infra OR could have been deployed on smaller infra. We found it quite efficient and easy to tune.


Could it be because many people are evaluating the community version of the database? The enterprise/Aura (cloud) edition is definitely designed to be more scalable.


There's no such thing as "scale" with community edition.

There's no clustering. There's no monitoring. The only user is admin.

But for the low, low, yearly price of half a million dollars you can get the very basics required to run reliable production workloads.


Could be.

We only used the community version for dev, but even that could scale quite well for a single node on SSDs.

But we also spent a lot of time learning how to map our use case to it.


My app is serving thousands of concurrent users, with tables of billions of rows, on Postgres powered by just 4 CPUs and 20GB of RAM.


We just got done doing the exact same thing, i.e. Neo4j to MySQL, a few weeks ago. The whole process took around 5 months.


Too many of these "Company X laid off y%, what's going on?" threads.

In general, a healthy emerging technology workforce should likely have ~ 20% turnover annually to stay fresh and modern. That means an average outside knowledge age of five years, which is quite long.

Some percentage of that 20% should be voluntary. If everyone stays 5 years before moving on, that's 20% turnover, with people leaving in 2 years balanced by people staying eight years, a long time in software years. Some percentage really should be so-called desired attrition, helping people find a better place.

It's unlikely all hires are great fits -- it would be impressive if only 1 in 10 would be a better fit somewhere else -- so it's unlikely that 10% is as indicative of problems as you worry. For various reasons, most firms are incapable of grappling with that day to day, so it takes adverse externalities to push them to encourage the fit and upskilling mobility that should be normal.

If a firm can learn to help people find better fits and bring in current outside skills as a regular everyday part of business (rather than once a year layoffs), the firm will be much healthier.

// Finally, consider Postgres. ;-)


When it comes to graph databases, my favorite is still ArangoDB, definitely worth checking out if you are worried and looking for alternatives.

https://www.arangodb.com


Thank you! (ArangoDB employee here...)


I think memgraph.com is probably what's going on with Neo4j.


I work closely with Neo4j and Memgraph.

What is happening to Neo4j is not directly related to Memgraph. Neo4j raised a lot of cash and their investors now have a lot of expectations; this puts their sales under a lot of pressure and has pushed them to raise prices.

On the other hand, Memgraph is cheap and aims at being compatible with Neo4j from an API point of view (even though they don't share any tech background: Neo4j is Java, Memgraph is C++).

Memgraph can be a good replacement for Neo4j, but it is not yet popular enough to be a threat to Neo4j in the short term.


Interesting. You have any experience with Memgraph?

Did Neo4j remove the enterprise edition source code? Maybe Memgraph was the inspiration for that too hmm


Yes they closed the enterprise source code.

They first tried adding the commons clause, then when that did not accomplish what they wanted, they closed the enterprise source code.

The kicker is that this was not because of behemoth like Amazon or Oracle, it was because of a single individual… (guess who?)

Good read to get a better picture:

https://sfconservancy.org/blog/2022/mar/30/neo4j-v-purethink...


Thanks for the link, interesting to read.

"The issue as to whether the clause can be removed is still pending" -- that was in Mars, is it still pending now in December?

My guess was that indeed it could have been you :-) (which it was) since I couldn't think of anyone else.

Annoying behavior by Neo4j, and also interesting that AGPL didn't work for them


The negative Glassdoor reviews seem to be mostly, if not almost entirely, from sales staff. Not sure what that means, but it's a bit weird to see such a skewed ratio!


If a company is pivoting from growth (previous priority) to financial sustainability (current priority), sales is the first area to get the axe.


Their sales had a steep uphill battle to fight...

The company was totally inflexible with their very outdated licensing model, and it constantly lost them potential customers.


Any relational database can be represented as a graph, which means any fresh hot dev is going to want to port the entire company over to a 'graph' database. That's just common sense, right? It's a slam dunk when pitched to management.


I was leaning towards Neo4j as it had support for graph algorithms such as WCC, Louvain, etc.

What would be a good alternative for it?


Blazegraph / amazon neptune (they’re the same thing)

Or check out TinkerPop and the databases that support it.


They are not the same thing. The BlazeGraph team was acqui-hired into the AWS Neptune team, but Neptune is not based on BlazeGraph tech.


Oh, I misunderstood that then. I thought it was a plain servicification.


Not really, they don't support graph algorithms.


Neo4j was a terrible experience as a developer. It crashed constantly, and local dev required me to finagle around with Java SDK versions and packages. I'm not gonna mess about with that. Their managed offering was equally as shitty, and their own Go SDK was so poor that I ended up scrapping it altogether.

We landed on running RedisGraph atop Redis, and got it up and running in 45 minutes. Zero downtime. Zero complaints. Awesome.


I am curious: was your company using the free version or the commercial product?

I have enjoyed using the free version of Neo4J on my laptop but never at scale.


Judging by what they said: "Their managed offering [1] was equally as shitty ..." the gp was referring to the commercial product.

1: https://neo4j.com/cloud/platform/


Funny I just had a recruiter reach out about a 6mo CTH position at Neo4j. Never took it seriously but wonder how that squares with these layoffs.


Graph alone does not make a selling feature list. You need to bring more to the long-term enterprise problem space; otherwise it is just one more specialized database for one or two projects, with annoying sales and contract negotiations. And each time that contract comes up for renewal, the project will be looked at and investigated for migration to a more common stack.

At the same time they seem to have put out quite a bit of marketing to developers, but it is hard to see them pitching solutions for the "enterprise" problems. Compare this to the RDF graph players, who seem more focused on playing well with all the other parts of the existing infrastructure, e.g. virtual graphs on SQL DBs, etc. (Personal bias towards RDF, so take that into account.)

In the end we will see if the $500 million investment in market share materializes as a sound long-term investment.


They certainly want to be charging enterprise database pricing when they don't really have enterprise solutions.

The way they build and pitch their product is straight out of the 1990s Oracle playbook.


This post is several years old, but it shows what their prices were back when Neo4j Enterprise was available either commercially or under the AGPL.

As soon as end users learned they were getting the same software either way, guess what they picked?

(This is my blog post - iGov Inc is just me)

https://blog.igovsol.com/2018/01/10/Neo4j-Commercial-Prices....


I use Neo4j in a side project; they did a bait-and-switch with the licensing model, and I was not happy about that.


What does Neo4j have that networkX or other free graph Python packages don't? I tried using the Neo4j API from Dataiku and it scaled so poorly compared to networkX, which was such a breeze and interacted seamlessly with other Python objects in the code.
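For anyone who has not used it, a tiny sketch of the "plays nicely with other Python objects" point (toy data):

    import networkx as nx

    g = nx.Graph()
    g.add_edge("alice", "bob", weight=3)
    g.add_edge("bob", "carol", weight=1)
    g.add_edge("alice", "dave", weight=7)

    # Results come back as plain lists, dicts and sets, ready for the rest of the code.
    print(nx.shortest_path(g, "alice", "carol", weight="weight"))  # ['alice', 'bob', 'carol']
    print(list(nx.connected_components(g)))                        # connected components
    print(dict(g.degree()))                                        # {'alice': 2, 'bob': 2, ...}

It is all in-memory and single-process, though, so it is a different beast from a persistent, multi-user database.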


I touched on the negative Glassdoor reviews in this thread. Do you see a common pattern/theme? https://twitter.com/jmsuhy/status/1600093327130431488?s=61&t...


I once wrote a project (that is still in production) that used a graph database. It was definitely a graph workflow and I thought using a graph database was the right solution. I apologize to whoever has to maintain that system now, and if they have not already replaced it with SQLite or PostgreSQL, I hope they will.


About 7-8 years ago it was almost impossible to avoid Neo4j in tech circles, certainly in London. They must have had the most incredible funding - the PR was nonstop and wall-to-wall.


I have been involved with graph databases and always thought Neo4j was a joke; it was based on an execution model which is not scalable at all. I heard story after story from people who tried it for projects which were just too big, and wanted to ask, "what did you think would happen?"


What about the execution model is not scalable? I've got a side project using Neo4j; maybe I can avoid a major mistake here?


You can't shard it. I think maybe you can now, but last time I looked you could only scale vertically, not horizontally.


I think sharding graph models is actually a difficult problem.


Any graph DBs that do scale?


Not really, but that depends on your definition of “scale”. To make one that scales well, you’ll need to solve a few difficult computer science problems that conventional database architectures don’t need to consider and therefore don’t address. These are problems like graph cutting, multi-attribute search without secondary indexes, cache-less I/O schedulers, and a couple others. Scalable solutions to all of these problems exist independently but I’ve never seen any graph database implementations that even attempt to address most of these, never mind all of them, and you kind of need to remove these bottlenecks.

As long as most graph databases are just a layer of graph syntactic sugar sprinkled on top of a conventional database architecture, they won’t scale.


Horizontally scaling graph DBs is incredibly hard.


What's your thought on AWS Neptune?

From the marketing page below:

"Scale your graphs with unlimited vertices and edges, and more than 100,000 queries per second for the most demanding applications. Storage scaling of up to 128Tib per cluster and read scaling with up to 15 replicas per cluster."

https://aws.amazon.com/neptune/


That isn't real horizontal scaling; they are just doing a single vertically scaled writer and read-only replicas, just like they do with RDS. It's probably using the same infrastructure.


1,000,000 events per second with https://Quine.io

https://www.thatdot.com/blog/scaling-quine-streaming-graph-t...

Full disclosure: I work on this project.


When I first saw the benchmark result I was pleasantly surprised by the performance, you rarely see that on a single large server but it is achievable if the implementation properly does all the hard bits.

Then I saw that it required 140(!) servers to achieve that result and now I’m wondering what all that hardware is actually doing. On a per-server basis, that is very low throughput, even for graphs. Efficiency that low will make it uneconomical for most graph applications.


That’s not an optimized benchmark, just a demonstration using a real customer workload. Throughput depends on the workload but goes up to about 20,000 events per second per machine. All this is while simultaneously querying the graph and streaming out 20,000+ events per second. All that includes durable storage.

Price it out against Neptune instead and Quine is much less than 1% of the cost.


On a previous project, we reached the limit of 32 billion unique IDs (Neo4j 3.2, if I remember correctly). And had to wait for the next version so we could add more data.

I left the project. But, as far as I know, there are still planes maintained and authorized to fly in the sky, so I suppose the DB is still up in production.


And I am pretty sure the number of IDs in the DB has skyrocketed since that time.


JanusGraph may or may not fit your needs, though there's quite a learning curve even figuring out how to set it up.


Tao


So what graph database did you end up with?


It's been a long strange trip.

I don't think a single product is going to satisfy everyone and that's a problem with the category. If by "graph database" you mean you want to do completely random access workloads you are doomed to bad performance.

There was a time when I was regularly handling around 2³² triples with Openlink Virtuoso in cloud environments in an unchanging knowledge base. I was building that knowledge base with specialized tools that involved

   * map/reduce processing
   * lots of data compression
   * specialized in-memory computation steps
   * approximation algorithms that dramatically speed things up (there was a calculation that would have taken a century to do exactly, which we could get very close to in 20 minutes)
Another product I've had a huge amount of fun with is ArangoDB, particularly for applications work on my own account. When I flip a Sengled switch in my house and it lights up a Hue lamp, ArangoDB is a part of it. I am working on a smart RSS reader right now which puts a real UI in front of something like

https://ontology2.com/essays/ClassifyingHackerNewsArticles/

and using ArangoDB for that. I haven't built apps for customers with it, but I did do some research projects where we used it to work with big biomedical ontologies like MeSH, and it held up pretty well.

I came to the conclusion that it wasn't scalable to throw everything into one big graph, particularly if you were interested in inference, and went through many variations of what to do about it. One concept was a "graph database construction set" that would help build multi-paradigm data pipelines like the ones described above. Since one big graph didn't make sense, I got interested in systems that work with lots of little graphs.

I got serious and paired up with a "non-technical cofounder" and we tried to pitch something that works like one of those "boxes-and-lines" data analysis tools, like Alteryx. Tools like that ordinarily pass relational rows along the lines, but that makes the data pipelines a bear to maintain, because people have to set up joins such that what seems like a local operation that could be done in one part of the pipeline requires you to scatter boxes and lines all across a big computation.

I built a prototype that used small RDF graphs like little JSON documents, defined a lot of the algebra over those graphs, and used stream processing methods to do batch jobs. It wasn't super high performance, but coding for it was straightforward, it was reliable, and it always got the right answers.

I had a falling out with my co-founder, but we talked to a lot of people and found that database and processing pipeline people were skeptical about what we were doing in two ways. One was that the industry was giving up even on row-oriented processing and moving towards column-oriented processing, and people in the know didn't want to fund anything different. (Learned a lot about that; I sometimes drive people crazy with "you could reorganize that calculation and speed it up more than 10x" and they are like "no way", ...) Also I found out that database people really don't like the idea of unioning a large number of systems with separate indexes; they kinda tune out and don't listen until the conversation moves on.

(There is a "disruptive technology" situation in that vendors think their customers demand the utmost performance possible but I think there are people out there who would be more productive with a slower product that is easier and more flexible to code for.)

I reached the end of my rope and got back to working ordinary jobs. I wound up working at a place which was working on something that was similar to what I had worked on but I spent most of my time on a machine learning training system that sat alongside the "stream processing engine". I think I was the only person other than the CEO and CTO who claimed to understand the vision of the company in all-hands meetings. We did a pivot and they put me on the stream processing engine and I found out that they didn't know what algebra it worked on and that it didn't get the right answers all the time.

Back in those days I got on a standards committee involved with the semantics of financial messaging, and I have been working on that for years. Over time I've gotten close to a complete theory of how to turn messages (say XML Schema, JSON, ...) and other data structures into RDF structures. After I'd given up, I met somebody who actually knows how to do interesting things with OWL, I got schooled pretty intensively, and now we are thinking about how to model messages as messages (e.g. "this is an element, that is an attribute, these are in this exact order...") and how to model the content of messages ("this is a price, that is a security"), and I'm expecting to open source some of this in the next few months.

These days I am thinking about what a useful OWL-like product would look like with the advantage that after my time in the wilderness I understand the problem.


Fun read above - very descriptive and interesting... thanks for sharing!

OWL? RDF? Were you an RPI graduate perhaps? (I wasn't, but I did visit them once as part of a research project.)

At the end of the day, triple stores (or quad stores with provenance) never quite worked as well as simple property graphs (at least for the problems I was solving). I was never really looking for inference, more like extracting specific sub-graphs, or "colored" graphs, so property attribution was much simpler. Ended up fitting it into a simple relational model, and performance was quite good; even better than the "best" NoSQL solutions out there at the time.

And, triple stores just seem to require SO MANY relations! RDF vs Property Graphs feels like XML vs JSON. They both can get the job done, but one just feels "easier" than the other.


Where are you based? Id love to hear more about this set of adventures over tea or coffee sometime


Check my profile and send me an email.


This is interesting. What is OWL?


I think he's referring to Web Ontology Language; IIRC it is a kind of schema for relations in graphs. It was a big part of the Semantic Web surge from 10+ years ago.


RDF is the JSON of graphs. OWL is the JSON Schema of RDF.
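A minimal sketch of that analogy with rdflib (the namespace and terms below are made up for illustration): the plain triples are the "JSON" part, the OWL/RDFS declarations are the "schema" part.

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import OWL, RDF, RDFS

    EX = Namespace("http://example.org/")
    g = Graph()

    # "Schema" layer: declare a class and a property (OWL/RDFS).
    g.add((EX.Security, RDF.type, OWL.Class))
    g.add((EX.price, RDF.type, OWL.DatatypeProperty))
    g.add((EX.price, RDFS.domain, EX.Security))

    # "Data" layer: plain RDF triples about an instance.
    g.add((EX.acme_bond, RDF.type, EX.Security))
    g.add((EX.acme_bond, EX.price, Literal(101.25)))

    print(g.serialize(format="turtle"))

An OWL reasoner can then use the schema layer to infer additional facts about the data layer, which is roughly where the JSON Schema comparison stops being exact.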


Asking “what did you think would happen” is arrogant. They probably did not know as much as you do.


If you pick a technology to deploy within a company, you need to own the outcomes of that choice.

“I didn’t know” isn’t an acceptable excuse. When faced with unknowns, it’s your job to anticipate, mitigate, uncover problems, etc.

No one can know everything. That’s a fact. But these issues with Neo4j aren’t exactly hard to find. There are loads of folks who have talked about their negative experiences with it. Setting up a proof of concept would confirm them.


Why did you ask? Are you a customer? Or an (ex)-employee?


>R is extremely slow at a lot of tasks, for one thing, even more than Python.

Base R is quite slow. R + data.table is faster than Python + Pandas in a benchmark that I did recently.

For a 1 million row CSV file, Read + Sort + self-Join + Write took on a Windows box:

Base R: 47.56s

Python + Pandas: 6.44s

R + data.table: 2.99s

More details at:

https://www.easydatatransform.com/data_wrangling_etl_tools.h...
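For reference, the pandas side of such a benchmark is roughly this shape (a sketch only; the file and column names are made up, not the actual benchmark code):

    import pandas as pd

    df = pd.read_csv("data_1m_rows.csv")                  # Read
    df = df.sort_values("key")                            # Sort
    joined = df.merge(df, on="key", suffixes=("", "_r"))  # self-Join
    joined.to_csv("out.csv", index=False)                 # Write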


I think you've commented on the wrong thread. :)


Oops. That was meant to be a comment on the 'Every modeler is supposed to be a great Python programmer' thread.



