Knowledge Graphs (arxiv.org)
160 points by mindcrime 3 months ago | 28 comments

I’m currently taking a category theory course based on Fong & Spivak’s textbook An Invitation to Applied Category Theory [1] and this reminds me of chapter 3 which introduces functors and natural transformations by way of using them to describe schemas, database instances and migrations.

[1] https://arxiv.org/abs/1803.05316
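The schema-as-category idea from that chapter can be sketched in a few lines of Python: a schema lists objects (tables) and morphisms (foreign keys), and an instance assigns a set to each object and a function to each morphism. All names below are illustrative, not from the book's running example.

```python
# Sketch of "database instance as functor" in the spirit of Fong & Spivak
# ch. 3. A schema is a small category; an instance sends each object to a
# set and each morphism f: A -> B to a function between those sets.

schema = {
    "objects": ["Employee", "Department"],
    "morphisms": {"worksIn": ("Employee", "Department")},
}

# An instance: sets for objects, functions (encoded as dicts) for morphisms.
instance = {
    "Employee": {"alice", "bob"},
    "Department": {"eng", "sales"},
    "worksIn": {"alice": "eng", "bob": "sales"},
}

def is_valid_instance(schema, inst):
    """Check the functor laws on objects: each morphism f: A -> B must be
    a total function from inst[A] into inst[B]."""
    for name, (src, tgt) in schema["morphisms"].items():
        f = inst[name]
        if set(f) != inst[src] or not set(f.values()) <= inst[tgt]:
            return False
    return True

print(is_valid_instance(schema, instance))  # True
```

A schema migration is then a functor between schema categories, and pulling instances back along it is what gives the data-migration story.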

I absolutely love that book. Category theory has changed the way I think about so many things. Maybe the most prominent way category theory has influenced my thinking is that I always ask "what is the dual situation?" now, often leading to productive insights.

Yes. This (Spivak's ideas on using categories to define and reason about database stuff) was also my first association.

Haven’t read the paper yet, but a great list of authors! I have worked on Knowledge Graphs at two companies, and I am fairly opinionated about them. I think that large fast KGs (i.e., no OWL reasoning) are a good tool for organizing different data sources, but I have lost my interest in KGs for Knowledge Representation (in the AI sense). Just a personal opinion, lots of people in the field would disagree with me.

Mark would you say more about use of KGs for organizing data sources, or give an example situation? Thanks in advance.

Not Mark, but I'm one of the original leads on Google's Knowledge Graph.

KGs are great for data integration because they are naturally composable. If you have two data sources represented as graphs, and their node identifiers are in the same range (ie, they have been reconciled), and their edge labels are mapped to the same schema, then you can combine them just by taking the union of all their triples.

This works across different vertical domains -- for instance, you can create a joint film + music KG just by unioning all the triples from a film KG and a music KG, as long as you've mapped the node identifiers to a common namespace (eg Wikipedia). You can then do cross-vertical queries like soundtracks for Will Smith movies, etc.

Thanks for having made the SemWeb/LinkedData a reality for people.

>I have lost my interest in KGs for Knowledge Representation (in the AI sense)

Hi! Can you say more about that please?

What about embedding KGs? That could be useful?

Still many bridges too far to make it clear what connection these data models have to what the rest of programmers do. I continue to believe there's a huge amount of overlap, that these are not really different topics, but that there's some large gulf we have yet to see ourselves across. Nice to see a survey, a broad examination, but still no real ground gained that I can see on the core thing: explaining how these semantic systems are just a re-expression of already familiar programming language systems (or otherwise unified).

There's a part of me that wonders if this was roughly how people originally responded to EF Codd's 1970 paper.

You read it, and it all seems very abstract, and it's hard to understand why any of it would matter much to a programmer.

But look at things the other way around: assume you're a late 20th or early 21st century developer who grew up with the relational model, and now you have to work with some 1960s-era data modeling technology such as COBOL. You'd probably find that certain things that were easy in a relational database are suddenly really, really hard with this other tech stack. And probably, when you try to explain this to colleagues who'd never worked in anything but COBOL, they would respond with something like, "It's not hard, all you have to do is [really fiddly, error-prone, and difficult-to-maintain design pattern]." It can be difficult to distinguish between difficulty that's inherent to the problem, and difficulty that's only due to limitations of the tools that you know.

This is not by way of trying to imply that knowledge graphs are like the relational model in that they're definitely better than previous tech. But, assuming for the sake of argument that they do do some things better, perhaps they are like the relational model in that you need to spend a certain amount of hands-on time with them before their value becomes readily apparent.

> perhaps they are like the relational model in that you need to spend a certain amount of hands-on time with them before their value becomes readily apparent.

... except they cannot possibly be like the relational model in that way, because the relational model isn't like that. The value of relational databases becomes readily apparent when you spend five minutes fiddling with an appropriate small-but-nontrivial relational database; you do not need to spend a [nontrivial] amount of hands-on time with them before their value becomes readily apparent.

As someone who teaches python+sqlite in an intro to cs course, I can't disagree with this statement more.

SQL is amazing but the mental model beyond inner joins is not at all obvious. Things like outer joins (and when you need them), aggregation, partitioning etc... I am sure I've missed a few things.

But SQL is absolutely amazing.

You have to understand the data and the relations, what's not in the data that you'd expect to be etc...
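A concrete instance of the "beyond inner joins" point, using the same python+sqlite pairing mentioned above (table and column names are made up for illustration): an inner join silently drops students with no submissions, while a left outer join keeps them with NULLs.

```python
# Why outer joins are non-obvious: the inner join loses rows you may
# still care about; the LEFT OUTER JOIN preserves them.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE students(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE submissions(student_id INTEGER, grade INTEGER);
    INSERT INTO students VALUES (1, 'Ada'), (2, 'Ben');
    INSERT INTO submissions VALUES (1, 95);   -- Ben never submitted
""")

inner = con.execute("""
    SELECT s.name, sub.grade FROM students s
    JOIN submissions sub ON sub.student_id = s.id
""").fetchall()

outer = con.execute("""
    SELECT s.name, sub.grade FROM students s
    LEFT OUTER JOIN submissions sub ON sub.student_id = s.id
""").fetchall()

print(inner)          # [('Ada', 95)]  -- Ben is gone
print(sorted(outer))  # [('Ada', 95), ('Ben', None)]
```

Knowing *when* you need the outer variant (here: "show everyone, including non-submitters") is exactly the part that requires understanding the data, not just the syntax.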

>It can be difficult to distinguish between difficulty that's inherent to the problem, and difficulty that's only due to limitations of the tools that you know.

This distinction does not exist. Any difficulty apparently inherent to the problem can be addressed by new tools. We do not do this all the time because some issues are one-offs, and designing a tool (or any automation) for them would be time-consuming.

Taking these concepts and trying to tie them back into what programmers do, so that the experience of building a knowledge graph database is not alien is essential if this is going to become mainstream tech.

We[1] started building with OWL, the web ontology language, to represent the shape of the graph. This made sense because OWL is a very rich language for describing graphs. However it also has drawbacks. It is very hard - and alien to common experience - for developers to read OWL. It was not built to describe schemata but rather ontologies (to describe what could be represented, rather than what must be represented). It also had no concept of a document, and as we were trying to build a document-oriented knowledge graph, we had to graft one onto it, which became a source of confusion for our users.

Eventually - with much pain and time - we decided to simplify the interface, make the concept of the document more central, make the primary interaction method be through JSON documents and create a schema language that looks like the JSON you hope to build (and feels more like one you might write in a programming language).

It is early days for the relaunched version (and we had to swallow the frustration of such a deep breaking change), but it certainly feels like regular programmers are now able to quickly build knowledge graphs. The combination of graph, schema, and document is powerful.

[1] https://github.com/terminusdb/terminusdb
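To make the "schema that looks like the JSON you hope to build" idea concrete, here is a purely illustrative sketch - this is *not* TerminusDB's actual schema language, just the general shape of the approach, with a toy conformance check:

```python
# Illustrative only: a schema whose shape mirrors a valid document,
# with type names in value position instead of data.

schema = {
    "@type": "Person",       # hypothetical class name
    "name": "string",
    "friends": ["Person"],   # a list of references to other documents
}

doc = {
    "@type": "Person",
    "name": "Alice",
    "friends": [],
}

def conforms(doc, schema):
    """Toy structural check: same keys, 'string' fields hold strings,
    list-shaped fields hold lists."""
    if set(doc) != set(schema):
        return False
    for key, spec in schema.items():
        if spec == "string" and not isinstance(doc[key], str):
            return False
        if isinstance(spec, list) and not isinstance(doc[key], list):
            return False
    return True

print(conforms(doc, schema))  # True
```

The point of the shape-mirroring design is that a developer can read the schema the same way they read example data, rather than learning a separate ontology formalism first.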

It's a tough operational problem for knowledge graphs to know where updating starts and ends.

For instance you might put a "record" into a knowledge graph that describes some topic, and then later decide to delete it. The "record" consists of not just the

   ?s ?p ?o
patterns such that

   ?s == https://www.wikidata.org/wiki/Q108937326
but also other patterns that involve "blank nodes" that are necessary to make statements that go beyond what a single triple can express.
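A minimal sketch of that deletion problem, with illustrative predicate names: collecting the full "record" means following blank-node objects transitively, not just matching triples whose subject is the topic's IRI.

```python
# The "record" for a subject includes triples reachable through blank
# nodes ("_:" identifiers), so deletion must traverse, not just filter.

TOPIC = "https://www.wikidata.org/wiki/Q108937326"

kg = {
    (TOPIC, "name", "Example Topic"),
    (TOPIC, "award", "_:b1"),                 # _:b1 is a blank node
    ("_:b1", "awardTitle", "Best Paper"),
    ("_:b1", "year", "2021"),
    ("other:subject", "name", "Unrelated"),
}

def record_of(kg, subject):
    """Collect all triples for `subject`, following blank-node objects."""
    frontier, record = {subject}, set()
    while frontier:
        s = frontier.pop()
        for triple in kg:
            if triple[0] == s and triple not in record:
                record.add(triple)
                if triple[2].startswith("_:"):  # recurse into blank nodes
                    frontier.add(triple[2])
    return record

kg -= record_of(kg, TOPIC)  # delete the whole record
print(kg)  # only the unrelated triple remains
```

The awkward part operationally is that nothing in the plain triple store marks where one "record" ends, so this boundary has to be imposed by convention (as rows and documents impose it elsewhere).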

You can reconstitute the relational database without "rows" (e.g. turn a relational database into a graph and do OWL inference on that graph, or run SQL queries against a database with a columnar organization) but the row concept, like the document concept in document-oriented databases provides a boundary for updating records that (mostly) works even in the absence of transactional semantics.

Many of the older approaches to implementing transactions were row-centric, although newer MVCC approaches apply just fine to graph systems.

We have an immutable chain structure in TerminusDB which allows for straightforward uncoordinated multi-read access or multi-version concurrency control. This approach also makes branching simple to implement. Any number of new layers can point to the same former parent layer.
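The layer-chain idea can be sketched as follows - this is a toy model in the spirit of the description above, not TerminusDB's implementation: each immutable layer records additions and deletions against a parent, readers materialize a snapshot by walking the chain, and branching is just two layers sharing a parent.

```python
# Toy immutable layer chain: each layer is append-only state; a "write"
# creates a new layer, so concurrent readers of old layers need no locks.

class Layer:
    def __init__(self, parent=None, added=(), deleted=()):
        self.parent = parent
        self.added = set(added)
        self.deleted = set(deleted)

    def triples(self):
        """Materialize this layer's snapshot by folding the chain."""
        base = self.parent.triples() if self.parent else set()
        return (base - self.deleted) | self.added

base = Layer(added={("a", "knows", "b")})
main = Layer(parent=base, added={("b", "knows", "c")})
branch = Layer(parent=base, deleted={("a", "knows", "b")})  # shares parent

print(main.triples())    # {('a', 'knows', 'b'), ('b', 'knows', 'c')}
print(branch.triples())  # set()
```

Because layers are never mutated in place, `main` and `branch` can diverge from `base` independently, which is the property that makes uncoordinated multi-reader access cheap.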

You might like this white paper (but for reasons above you will have to overlook some of the OWL information): https://github.com/terminusdb/terminusdb/blob/dev/docs/white...

> tie them back into what programmers do

While the technology is built on the back of what programmers do, there is nothing inherent to knowledge graphs that implies that building them is a programmer's task. It's very possible that that task and responsibility falls to someone else, and programmers are left building the interface and access portal to a tool used by a different specialization.

Why do programmers want to do everything?

Mostly the ability to think in abstractions and imagine what might be technically possible! Certainly not exclusive to programmers but more density there.

Because other people don't want to. Like Barbie said in the 1980s, "Math is Hard".

Agreed. Programmers don't like math either. And because of that, this task that is specialized business domain system building is likely to be given to specialists -- not programmers. It lives in the data science/business/logistics analysts space.

The whole point of this type of systems analysis is to be able to lift and shift the task from a group of people who can but also don't want to do it to a smaller group of people who have chosen and specialized to do it.

My experience is that programmers (if you can generalise them like this) breathe a sigh of relief when I as an ontologist (someone who builds knowledge graphs and makes sure that they are logically sound) tell them that I will work with domain experts to get their knowledge into a structured form that the programmers then may consume programmatically.

Looks well written. May be good brain training.

Very valuable resource, good starting point for anybody looking for an overview of this evolving and promising field.

One thing missing (from my perspective) is some sort of informed assessment / lessons learned about the (relatively) limited adoption of such methods in practice, and the role of blind spots or academic biases in this respect.

Knowledge Graphs are more and more adopted in practice. There are some nice in-use papers about it, for example one with Pinterest (https://arxiv.org/pdf/1907.02106.pdf).

Can this be related to Wolfram's graph approach for (trying to) represent fundamental physics?
