Why Nasa Converted Its Lessons-Learned Database into a Knowledge Graph (nuclino.com)
292 points by zzaner on Feb 6, 2019 | 79 comments

The tech story by NASA's chief knowledge architect is more detailed on https://linkurio.us/blog/how-nasa-experiments-with-knowledge... and https://neo4j.com/blog/nasa-critical-data-knowledge-graph/, with a presentation video on https://www.youtube.com/watch?v=vwJyU9vsfmU

Disclaimer: Linkurious CEO here, the tool used to explore the Neo4j graph database used at NASA.

Since Linkurious is for the enterprise, what would be your recommendations for a personal knowledge database for individual users?

Not your parent, but I've recently toyed with TiddlyWiki, which shows promise, but requires heavy configuration.

There is also an extension to it called TiddlyMap which displays several of the properties mentioned in this article (edges with properties, etc), but again, requires configuration to get just so.

If you're game to do some tinkering, I've found it to be hackable to some very deep levels. Another nicety is that it's all just a single HTML file, so it's madly portable (I can use the same site on my phone and laptop).

All this being said, there is a growing list of features that I would like to see in TiddlyWiki that I'm not sure I can hack in myself, so I suppose I, too, am looking for the "one true knowledge management" solution.

Just start with Apache Jena: it is standards-based, with RDF as the exchange format and SPARQL as the query language. Other solutions may use proprietary stuff for better vendor lock-in; it is completely up to you whether you want that. But with Apache Jena you can change later to other KG databases. Also, Apache Jena is easy to work with, since it includes Fuseki, so you can start using it directly as a web API.

https://jena.apache.org/ https://jena.apache.org/documentation/fuseki2/

Once you need "big data" for your personal Knowledge Graph, you can use other RDF stores, without vendor lock-in.
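To make the RDF/SPARQL idea concrete, here is a minimal sketch in plain Python. It is not Apache Jena's or Fuseki's API; the `TRIPLES` data, the `note:`/`ex:`/`tool:` prefixes, and the `match` helper are all made up for illustration. It only shows the core idea: facts are (subject, predicate, object) triples, and a query is a pattern with variables left open.

```python
from typing import Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical facts about personal notes, stored as (subject, predicate, object).
TRIPLES = [
    ("note:jena", "rdf:type", "ex:Note"),
    ("note:jena", "ex:mentions", "tool:Fuseki"),
    ("note:jena", "ex:mentions", "tool:SPARQL"),
    ("note:tiddly", "rdf:type", "ex:Note"),
    ("note:tiddly", "ex:mentions", "tool:TiddlyMap"),
]


def match(
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None plays the role of a variable."""
    for triple in TRIPLES:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# Roughly what a SPARQL query like
#   SELECT ?tool WHERE { note:jena ex:mentions ?tool }
# asks for, under the made-up vocabulary above.
tools = [obj for _, _, obj in match(s="note:jena", p="ex:mentions")]
print(tools)
```

Because the data is just triples, moving to a different RDF store later means exporting and re-importing the same facts, which is the portability argument made above.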

There is TheBrain, which has a free-of-charge personal offering, as well as paid service tiers.

Jerry Michalski is among the more notable users.



I've been working on building a personal knowledge database tool recently, feel free to shoot me an email at antimatter15@gmail.com if you'd like to be one of the first to try it out.

Due to the number of crawlers on this site, I recommend (if it's not too late) you edit your post to use the format

address at domain dot com :)

PS: Sent you an email. :)

If you are after a graph database for personal use, Segrada (Segrada.org) is a nice open source UI on top of OrientDB.

Otherwise, see Marviel's and DredMorbius's suggestions; both are worth checking out.

Segrada looks very interesting, thank you very much!

It depends on your needs, maybe try the SaaS app https://kumu.io/ or https://graphcommons.com/

"Chief Knowledge Officer", cool title.

I know a guy whose title is "President of Intelligence".

It's hard to beat a NASA job title, "Planetary Protection Officer". [1]

[1] https://sma.nasa.gov/sma-disciplines/planetary-protection

a.k.a. a librarian.

Using a librarian to manage knowledge in an organized organization sounds like a no-brainer. They are trained for just that stuff!

I found your product this week when searching for a neo4j visualization tool but I couldn’t try it on anything other than an example database. Is there any way to try/use it as a researcher?

Amazing, thanks for sharing!

I was hyped to read a cool article about NASA and tech, but this just reads like an advertisement for software that I've never heard of.

Was about to post the same thing, multiple submissions by the same person for the same product over the last 2 months.

First, there would be nothing wrong with it.

Second, I have no idea what you're talking about. The submitter seems to have exactly six submissions so far, and not a single other one about this. [Edit: Obviously mistaken on that point]

I don't know whose submission history you're looking at, but zzaner's history shows 4 of 6 links pointing to nuclino.com and a 5th is a link to Nuclino for iOS. Only the 6th seems to have nothing to do with Nuclino.

“Content Marketing” articles are proliferating on HN currently...

This is the main problem when downvoting articles is not possible.

You can flag them if you like https://news.ycombinator.com/item?id=12173809

Yea, they definitely got me too. I was all set to nerd out when my "sales alert" went off...

I spent a bulk of my programming career modelling business processes in a graph database with strong schema, lifecycle control (state machines) and formal change control (revisioning).

I was always blown away with how easy it was to turn around a very stable and useful system where the customers could actually understand the data model and refactoring was easy to reason through.

Graph databases FTW.
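For readers wondering what "strong schema, lifecycle control and revisioning" might look like on top of a graph, here is a minimal sketch. It is not how any particular product works; the node types, edge types, states, and class names are all hypothetical, chosen only to show the three ideas side by side.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical schema: which edge types may connect which node types.
ALLOWED_EDGES: Set[Tuple[str, str, str]] = {
    ("Part", "USED_IN", "Assembly"),
    ("Document", "DESCRIBES", "Part"),
}

# Hypothetical lifecycle: the state machine every node follows.
TRANSITIONS: Dict[str, Set[str]] = {
    "draft": {"review"},
    "review": {"draft", "released"},
    "released": {"obsolete"},
    "obsolete": set(),
}


@dataclass
class Node:
    node_type: str
    name: str
    state: str = "draft"
    revisions: List[dict] = field(default_factory=list)

    def promote(self, new_state: str) -> None:
        """Move along the lifecycle, rejecting transitions the state machine forbids."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"{self.state} -> {new_state} is not allowed")
        self.state = new_state

    def revise(self, **attrs) -> int:
        """Record a new revision snapshot and return its 1-based number."""
        self.revisions.append(dict(attrs))
        return len(self.revisions)


class Graph:
    def __init__(self) -> None:
        self.edges: List[Tuple[Node, str, Node]] = []

    def connect(self, src: Node, edge_type: str, dst: Node) -> None:
        """Add an edge only if the schema allows this combination of types."""
        if (src.node_type, edge_type, dst.node_type) not in ALLOWED_EDGES:
            raise TypeError(f"{src.node_type}-[{edge_type}]->{dst.node_type} violates schema")
        self.edges.append((src, edge_type, dst))


bolt = Node("Part", "bolt-m8")
frame = Node("Assembly", "frame")
g = Graph()
g.connect(bolt, "USED_IN", frame)
bolt.revise(material="steel")
bolt.promote("review")
bolt.promote("released")
print(bolt.state, bolt.revisions)
```

The point of the sketch is that the rules live in two small tables, which is one plausible reason customers can follow such a data model and refactoring stays easy to reason about.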

What tools did you use for this?

It wasn't commercially available, unfortunately, when I last checked. You have to buy business software and licences to get access to the DB kernel and APIs. The main product is called Enovia (formerly eMatrix, by Matrix One), from Dassault Systemes in France.

Would be sweet to have a similar system in FOSS middleware on top of Neo4J or OrientDB.

[update] Dassault Systemes purchased Matrix One about 10 years ago. They still use their graph-based DB kernel in many of their products. From my understanding, this DB kernel was written in the early 90's and targeted PDM (Product Data Management). They now target a broader category of PLM (Product Lifecycle Management). Again, there was no way to purchase their graph DB kernel last time I checked, which is a shame because it's so awesome to develop with :(

[more info] Some notable customers of Enovia (eMatrix): Boeing, Honeywell, Tesla, Bombardier, Airbus.

Disclaimer: I don't presently work for Dassault. I just really like their DB kernel ;)


Grakn (Grakn.ai) offers a strongly typed graph database that is open source :) (disclaimer: work there)

Took a look. Interesting project. The abstraction model available for declaring relationships/vertices was hard to grasp at first glance, but as I thought about it further, I can see some interesting benefits for a pattern-matching grammar. eMatrix did not have abstraction for relationship/vertex types but did for object/edge types.

Thanks. I'll take a look :)

I didn't know the LLIS was available online.

The first one I managed to click on was related to a fire in an employee's car: https://llis.nasa.gov/lesson/943

"Employee Falls Down Steep Ramp": https://llis.nasa.gov/lesson/21803

That's as thorough as documentation gets https://llis.nasa.gov/lesson/589

That’s been known for hundreds of years about linseed oil, and is one of the reasons it’s no longer used to waterproof clothing. I wonder what the circumstances were and what it was being used for.

It's still used very extensively in the surface treatment and sealing of woods, and also in steel finishing processes for rust prevention.

> Spontaneous heating occurs when a self-ignition combustible reacts with sufficient oxygen

Certainly something NASA employees need to be aware of.

I'm unable to find the lessons mentioned in the article about uprighting systems. Any ideas?

So one thing I still don't understand is whether Neo4J, a pure graph database, is better than using something like AgensGraph[0] or Cayley[1], which use a pre-existing DB engine for their persistence layer. If yes, what are the advantages? Is it something that totally depends on the use case? If it is, what criteria should be used to make a decision?

[0]:https://github.com/bitnine-oss/agensgraph [1]:https://github.com/cayleygraph/cayley

There are pros and cons to going "graph native" versus building on an existing DB.

Graph native: you can optimize for exactly the types of queries that you want graph databases to answer: shortest path, path finding, etc. Relational databases / document databases are (generally) very poor at those types of queries because those are not the types of queries people want to run on those databases. In a "graph native" database, everything down to the storage on disk can be optimized to perform graph algorithms.

Existing DB: there are years, sometimes decades, of engineering that go into databases (I'm thinking of PostgreSQL and Cassandra, both of which have graph "layers" available). A lot of the engineering work is non-graph specific: ACID, how to handle transactions, distributed computing, WAL, replication.

Why re-engineer all of those just to perform graph operations more quickly?

Also, I can send you a good paper by the founder of DGraph Labs if you're really curious.
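For anyone who wants to see what the "shortest path" class of query mentioned above boils down to, here is a minimal sketch: a breadth-first search over an in-memory adjacency list. The `GRAPH` data and names are made up, and this says nothing about how Neo4j or any other engine implements it; it only illustrates the access pattern (follow a node's neighbours directly, hop by hop) that a graph-native store can optimize for.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical directed "knows" graph as an adjacency list.
GRAPH: Dict[str, List[str]] = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": ["frank"],
    "frank": [],
}


def shortest_path(graph: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Return one shortest path (fewest hops) from start to goal, or None."""
    if start == goal:
        return [start]
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt in parents:
                continue  # already reached by a path that is at least as short
            parents[nxt] = node
            if nxt == goal:
                # Walk the parent links back to the start, then reverse.
                path = [nxt]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


path = shortest_path(GRAPH, "alice", "frank")
print(path)
```

In a relational layout each hop would typically be another join against an edge table, which is roughly why deep traversals are where the two approaches diverge.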

I would love to read the DGraph paper.

Indexing and search specialized to graph operations is a thing; I have no experience with those projects, but I'm familiar with some workarounds in Postgres. Basically, the deeper the graph searches, the more the performance drops for relational DBs. This is a seriously studied topic, so refer to research for more details.

If you want to do a real deep dive into the architectural differences of graph databases, the book "Designing Data-Intensive Applications" by Martin Kleppmann is a great resource. https://www.oreilly.com/library/view/designing-data-intensiv...

Thanks for that book recommendation.

The real NASA knowledge graph, with actual technical detail... https://www.stardog.com/categories/nasa/

Yes, and thanks. This bit is really important, I see too many people who don’t understand the difference between a graph database and a knowledge graph.

“So how did we build this thing with the smart folks at NASA as partners and customers? The key takeaway here is that a Knowledge Graph platform is a Knowledge Toolkit plus a Graph Database, and all of those components are critical at NASA.

Doing this with a plain graph database isn’t going to work unless you want to do all the heavy lifting of AI, knowledge representation, machine learning, and automated reasoning yourself, from scratch. I’ll wait while you decide…didn’t think so.”

I wonder if there is an enterprise app that does this for you?

I can think of plenty of examples at my work where spidering a website and displaying it in a graph would be really cool.

Our wiki would be one for sure.

To answer your question, yes, there is. The combination of tools used by NASA is Neo4j (database) + Linkurious (enterprise graph exploration tool).

Links: https://neo4j.com/ https://linkurio.us/

More info about this use case here: https://linkurio.us/blog/how-nasa-experiments-with-knowledge...

The screenshot in the article is from Linkurious (without any mention in the article, which is strange).

Spoiler: Linkurious co-founder here.

True, I've been using mind mapping tools but it's not the same.

Nuclino (https://www.nuclino.com/) looks promising, trying it out now.

New account + gratuitous namedrop of the product whose corporate blog is linked in the submission.

This seems to be an advertisement albeit a strange one. They make it clear that NASA used Neo4J rather than Nuclino. Neo4J is a true graph database, but I didn't find anything on the Nuclino website that suggests what Nuclino really is or what technology it uses.

Nuclino is a tool to write documentation, and the only thing "graph" about it, from my understanding as a user, is that you can link to different documents within Nuclino, which then generates a graph. Nuclino visualises this graph so the user can explore the documentation.

In my experience this exploring thing kinda only makes sense when you want to document doing/trying the same thing again (which NASA probably is). If you are just documenting how to connect to a database, set something up or similar, it, to me, falls pretty flat. Maybe I'm using it wrong...

No idea what they use under the hood.

Source: Use it where I work

Wikipedia says it's written in Javascript. If true, that's kind of horrifying. https://en.wikipedia.org/wiki/Nuclino

Why so? From their site, it looks like they are not really selling a graph DB but a webapp. Lots of applications are written in Node.js.

What I am looking for is a nice way (graph) where I can connect all kinds of events/people/commits/bugs/tickets and jump between them.

Currently I am putting links on GitHub PR descriptions so I know in my deployment GitHub repo, Who releases What, When and in Which cluster (where)

The PRs contain links to Jira tickets.

So all in all, if you “sprinkle” enough links on GitHub and Jira, I can essentially click through them and answer the questions: how did that end up here? What changed? Where is the bug?

But I feel like this set of links referencing GitHub, Jira, PRs, Commits, Error Reports would be really fitting in some kind of graph
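A rough sketch of what putting those links in a graph could look like, assuming nothing about GitHub's or Jira's APIs: the identifiers (`error:1234`, `pr:412`, `jira:OPS-17`, ...) and relation names are invented, and the links are typed by hand here rather than crawled. The traversal answers "how did that end up here?" by following links outward from an error report.

```python
from collections import defaultdict, deque
from typing import Dict, List, Tuple

Link = Tuple[str, str, str]  # (source, relation, target)

# Hypothetical links between error reports, releases, PRs, tickets and commits.
LINKS: List[Link] = [
    ("error:1234", "SEEN_IN", "release:2019-02-06"),
    ("release:2019-02-06", "DEPLOYED_BY", "deploy-pr:88"),
    ("deploy-pr:88", "INCLUDES", "pr:412"),
    ("pr:412", "FIXES", "jira:OPS-17"),
    ("pr:412", "CONTAINS", "commit:abc123"),
]


def build_index(links: List[Link]) -> Dict[str, List[Tuple[str, str]]]:
    """Group outgoing links by source so each hop is a dictionary lookup."""
    index: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for src, rel, dst in links:
        index[src].append((rel, dst))
    return index


def trace(index: Dict[str, List[Tuple[str, str]]], start: str) -> List[Link]:
    """Breadth-first walk from start, returning every hop that was followed."""
    seen = {start}
    queue = deque([start])
    hops: List[Link] = []
    while queue:
        node = queue.popleft()
        for rel, dst in index.get(node, []):
            hops.append((node, rel, dst))
            if dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return hops


index = build_index(LINKS)
hops = trace(index, "error:1234")
for src, rel, dst in hops:
    print(f"{src} -[{rel}]-> {dst}")
```

The manual clicking described above is the same walk done by hand; the graph just makes it a single lookup per hop.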

This kind of reminds me of the FMEA and its web structure, which is very useful.

It does share the big weakness of all the other such databases though: it is very hard to convince people to use it, especially to add and maintain content.

Does anybody here have a 'canonical' application or example in mind that shows me what neo4j can do that matches my intuitive understanding better than the 'regular' RDBMS?

That can be non-obvious, so fair. We (graphistry) get pulled into a lot of investigative scenarios -- account takeover (web logs), malware/phishing analysis (host/network logs & feeds), AML, claims fraud, etc. I found the problems being solved to be some combination of: awkward to express with SQL, too slow to run in a RDBMS, or hard to visually explore relationships/correlations.


=== Shortest Paths

1a. Referral: "Who on our team connected to which leadership at Apple?"


1b. Supply Chain, AML, entanglements...: "How are these companies related, even if 5 companies away, and across all sorts of relationship types?"


=== Neighborhood (incl. multi-hop):

2a: 360 context on a security/fraud/ops incident:


+ (hacked:Computer[ip=""])-[Login]->(u:User)-[e:Alert]-(metadata:)

2b: fraud rings:


+ (fraudster:clientIP)-[x:http[method="POST"]]-(p:Page)

2c: Journeys (customer, patient, ...)


=== Whole system optimization / compute:

Personalized pagerank, supplychain optimization, business process mining, ...


The above can be extended, such as by adding in compute (correlation, influence scores, ...). That feeds into viz / recommendations / decision making.

or: Not all uses of graph are end-to-end. We often get used with a graph DB to improve understanding of it (our viz scales 100-1000X over the tools here via GPUs)... but folks may instead plug their graph DB into a tabular frontend. Or use us with a tabular system like Splunk/Spark/Elastic. So the above can be hard to write in Splunk/SQL, or slow to run, or hard to visually understand.

This may not be a canonical application of a data model, but expressing graph queries using "Cypher", the graph query language invented by Neo4j, is very intuitive to my mind. I find the use of ASCII art to help visualize the relationships welcome.

For example, say we have a graph of movies, actors, reviewers, producers, etc. Here's a Cypher query that returns the names of people who reviewed movies and the actors in these movies:

  MATCH (r:Person)-[rev:REVIEWED]->(m:Movie)<-[:ACTED_IN]-(a:Person)
  RETURN DISTINCT r.name AS Reviewer, m.title AS Title, 
                  m.released AS Year, rev.rating AS Rating,
                  collect(a.name) AS Actors

Another example: you want to know which actors acted in movies in the decade starting with the year 1990:

  MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
  WHERE 1990 <= m.released <= 1999
  RETURN m.released, collect(m.title) as titles, collect(p.name) as actors
  ORDER BY m.released
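If the pattern syntax is unfamiliar, the second query can be read as a filter plus a group-by. Here is a rough plain-Python equivalent over a tiny in-memory sample; the `MOVIES` and `ACTED_IN` structures and the function name are made up for this sketch and have nothing to do with how Neo4j stores or executes anything.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sample data standing in for the movie graph.
MOVIES: Dict[str, int] = {"The Matrix": 1999, "Apollo 13": 1995, "Cast Away": 2000}
ACTED_IN: List[Tuple[str, str]] = [
    ("Keanu Reeves", "The Matrix"),
    ("Carrie-Anne Moss", "The Matrix"),
    ("Tom Hanks", "Apollo 13"),
    ("Tom Hanks", "Cast Away"),
]


def nineties_by_year(
    movies: Dict[str, int], acted_in: List[Tuple[str, str]]
) -> List[Tuple[int, List[str], List[str]]]:
    rows: Dict[int, Dict[str, List[str]]] = defaultdict(lambda: {"titles": [], "actors": []})
    for person, title in acted_in:               # MATCH (p)-[:ACTED_IN]->(m)
        year = movies[title]
        if 1990 <= year <= 1999:                 # WHERE 1990 <= m.released <= 1999
            rows[year]["titles"].append(title)   # collect(m.title)
            rows[year]["actors"].append(person)  # collect(p.name)
    # ORDER BY m.released
    return [(year, rows[year]["titles"], rows[year]["actors"]) for year in sorted(rows)]


result = nineties_by_year(MOVIES, ACTED_IN)
for year, titles, actors in result:
    print(year, titles, actors)
```

As in the Cypher version, there is one collected entry per matched (person, movie) pair, so a title appears once for each of its actors in this sketch.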

I'm new to this "knowledge" space, but I've stumbled upon structr.com. It's open source, and you can use it as an extremely flexible content management system for business processes and stuff like that. Check out their website; I can't tell you much more, as it's still on my todo list.

RDBMS are developed with joins in mind, but also die of the complexity resulting from these joins (both from the developer's perspective and in terms of resources).

Now imagine your join becoming the primary perspective from which to look at your data. Then you'd see that credit card transactions (who buys what when?) or maps are better represented as a graph. I know for example TomTom uses neo4j to validate map edits in production.

Say you have transactions which follow a complex supply chain... Sure you can reconstruct the path taken using recursive SQL, but you're also joining lots and lots of things together at runtime.

In a graph database, you've effectively taken your 'join' penalty at the point of ingestion and you have an expressive query syntax to describe the pattern you're trying to match.
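To illustrate the contrast, here is a small sketch using Python's built-in `sqlite3`. The `shipment` table and its rows are invented, and the chain is assumed to be acyclic with one outgoing hop per node. The first half reconstructs the path with a recursive SQL query, joining at query time; the second half builds the "who ships to whom" lookup once at load time and then just walks it, which is a loose stand-in for paying the join cost at ingestion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipment (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO shipment VALUES (?, ?)",
    [
        ("mine", "smelter"),
        ("smelter", "factory"),
        ("factory", "warehouse"),
        ("warehouse", "store"),
    ],
)

# Recursive SQL: every step of the recursion joins the chain so far
# against the shipment table again.
QUERY = """
WITH RECURSIVE chain(node, depth) AS (
    SELECT ?, 0
    UNION ALL
    SELECT s.dst, c.depth + 1
    FROM chain AS c
    JOIN shipment AS s ON s.src = c.node
)
SELECT node FROM chain ORDER BY depth
"""
path = [row[0] for row in conn.execute(QUERY, ("mine",))]
print(path)

# Graph-style: resolve the relationships once, then follow them directly.
next_hop = dict(conn.execute("SELECT src, dst FROM shipment"))
walked = ["mine"]
while walked[-1] in next_hop:
    walked.append(next_hop[walked[-1]])
print(walked)
```

Both halves produce the same path; the difference is only where the work of connecting rows happens.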

One problem I've seen in start-ups as they scale isn't the lack of good documentation but the lack of information organization and hierarchy. The cost you pay is repeated experiments/trials and generally slower development. The best way I've found to overcome this is to just talk to people and construct an information map/hierarchy as a mental model. Obviously, this process can't scale with the business. I wonder if this tool would be useful for software/product dev in start-up environments?

Has anybody ever seen a knowledge database for a large organization that actually works? I always see these efforts but usually they turn out pretty useless because nobody keeps them up to date.

Does Wikipedia count?

Wikipedia is pretty phenomenal but I was thinking more about a company. I have never seen a company that has the culture of continuously contributing to these efforts. They all fizzle out quickly.

In the two companies I've worked for, I was/am one of the main people that try to update documentation. I should ask if my previous company still uses the wiki that I kind of revived (it was full of useful info, but most people preferred to bother others, and if they searched the wiki before, they would not update the wiki with the newly learned info). By making it a useful resource, I felt like more people would start to use it, but my six months of employment were probably too short to really make that a reality.

I'm working in an even smaller company now - previously it was about 40 employees, now it's a handful. There are docs for the important things so there is no single point of failure, but very few day to day things are written down (like whom to report to that you're ill). As we grow, it's slowly becoming worth it to document that (single source of info instead of either having to bother the big boss or having different sources) and I'm looking at options to organise it. Organising it topic-based (graph(-like)) is an interesting alternative to the standard info dump with a search feature (wiki).

Trying out Nuclino just now and putting some items into it, I additionally noticed that having a separate system from your actual knowledge database can also be useful: info pages are on the wiki, custom tools are in different git repositories, project info might be in some task manager... If you have a separate system (such as a graph) that just points you to the right URL (wiki/task manager) or folder within a git repository, the system can outlast any of the individual products being used. Then again, having a layer of indirection makes it more time-consuming to use when you know that your info page is going to be in the wiki. I guess it will have to be very quick to call up and integrated nicely to make it worth it for others to use.

The problem with separate systems is that when you want to look up something you don’t know where to look unless it fits exactly into certain categories. Right now we use OneNote within the team for everything. It’s not perfect but at least I know where to look.

Having a system that organises the knowledge/info available should of course be comprehensive. If not everything is in there, there is no point having it.

But it's a good point you make. Now that I write it like this, it makes it more clear for myself how the system would work. It wouldn't just be another separate system, it would be the index for all the systems and whenever someone writes a new page where-ever, they should be required to link it into this graph (or whatever form it takes).

Also make sure that it’s really easy to figure out where to put new information and then it should be easy to put the information there. From my experience even the slightest friction or confusion will make people stop using the system.

Not very convincing. If the differentiator was the correlation of data in a more meaningful way, it doesn't matter if you display the correlating data in a list or a graph...

Is there any way to view the knowledge graph in the new design? Lots of other people are linking the database itself, but I can't actually find a link with the new design...

This reads like SCP but with NASA and sometimes more scary.

Never heard of this in my four years there. Hrm.

It's not like space-travel knowledge-bases are rocket science ... oh, wait.

Really rather not read a bunch of content marketing on this site. Could we stick to news?
