
KnowledgeNet: A Benchmark for Knowledge Base Population
https://blog.diffbot.com/knowledgenet-a-benchmark-for-knowledge-base-population/
======
miket
When people think about using computers for Natural Language Processing, they
often think about end tasks like classification, translation, and question
answering, and about models like BERT that capture the statistical
regularities of text. However, these tasks measure only indirectly how much of
the text's meaning the system has understood, are largely unexplainable black
boxes, and require reams of training data.

NLP is now good enough that we can explicitly measure how well a system reads
text in terms of what knowledge it extracts from it. This task is called
Knowledge Base Population, and we've released KnowledgeNet, the first
reproducible dataset for measuring it, along with an open-source
state-of-the-art baseline.
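
To make the task concrete, here's a toy sketch (in Python) of what Knowledge
Base Population produces. The example sentence, property names, and structure
are illustrative only, not KnowledgeNet's actual annotation schema:

```python
# Toy illustration of Knowledge Base Population (KBP): read raw text,
# emit structured facts. Property names here are made up for the
# example; they are not KnowledgeNet's actual property set.

text = "Ada Lovelace was born in London in 1815."

# A KBP system turns the sentence above into facts like these; in a
# full system, subjects and objects would also be linked to knowledge
# base identifiers (e.g. Wikidata entities).
facts = [
    ("Ada Lovelace", "PLACE_OF_BIRTH", "London"),
    ("Ada Lovelace", "DATE_OF_BIRTH", "1815"),
]

for subject, prop, obj in facts:
    print(f"({subject}, {prop}, {obj})")
```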

Direct link to the GitHub repo:
[https://github.com/diffbot/knowledge-net](https://github.com/diffbot/knowledge-net)

EMNLP paper:
[https://www.aclweb.org/anthology/D19-1069.pdf](https://www.aclweb.org/anthology/D19-1069.pdf)

------
g82918
Mostly an article pushing their benchmark and paper:
[https://www.aclweb.org/anthology/D19-1069.pdf](https://www.aclweb.org/anthology/D19-1069.pdf).
In the paper they compare existing benchmarks against criteria of their own
devising, to show that their benchmark is the only one that features the
things they say are important. All the others are somehow deficient by this
totally objective metric they created.

------
bhl
Reminds me of a submission from a year ago on auto-generating a knowledge base
from articles on the web [1]. I think it'd be neat if Q&A nets and other
techniques sufficed to the point where we would prefer using "knowledge
engines" over search engines, like a generalized Wolfram Alpha.

[1] [https://primer.ai/blog/quicksilver](https://primer.ai/blog/quicksilver)

------
nl
_State-of-the-art models (using BERT) are far from achieving human performance
(0.504 vs 0.822)._

This is moderately surprising.

In question answering (QA) style tasks (SQuAD, SQuAD 2.0) we see
state-of-the-art models approach human performance. QA is similar to KBP in
the sense that the answers are usually extracted from text in a similar way.

I'd imagine there is potential for fairly rapid improvement on this (Knowledge
Base Population) task.
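
For context, those numbers are F1 scores over extracted facts. Here's a
minimal sketch of fact-level F1 under exact-match scoring; the official
KnowledgeNet evaluation is more nuanced (e.g. about text spans and entity
links), so treat this only as the general shape of the metric:

```python
# Fact-level F1 sketch: exact match between predicted and gold
# (subject, property, object) triples. Illustrative only; the real
# evaluation handles span and entity-link matching more carefully.

def fact_f1(predicted: set, gold: set) -> float:
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)        # facts we got right
    precision = tp / len(predicted)   # right / everything we predicted
    recall = tp / len(gold)           # right / everything that is true
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = {("Ada Lovelace", "PLACE_OF_BIRTH", "London"),
        ("Ada Lovelace", "DATE_OF_BIRTH", "1815")}
pred = {("Ada Lovelace", "PLACE_OF_BIRTH", "London"),
        ("Ada Lovelace", "DATE_OF_BIRTH", "1816")}

print(fact_f1(pred, gold))  # 1 TP, 1 FP, 1 FN -> P = R = 0.5, F1 = 0.5
```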

~~~
g82918
As long as we haven't reached AGI, I feel like this is true of any new
benchmark. BERT wasn't trained or designed for the task; give some smart folks
a few months and they'll beat it. The bigger question is what we would like an
AI to be able to do. Is this benchmark a good one? Is there maybe a better
choice of questions to get the type of NLP we want?

~~~
nl
_Is this benchmark a good one?_

Any benchmark which reflects a task that humans do is a good one, unless it
has specific weaknesses that a computer exploits.

I'd use models written for this in my work, so I find it useful.

_I feel like this is true of any new benchmark... give some smart folks a few
months and they'll beat it._

In NLP this used not to be the case: five years ago we were stuck at a local
maximum.

And that undervalues this task: it bridges the gap between unstructured and
structured data. In many ways it is the holy grail for many tasks.

------
sdan
Amazing! I love using Diffbot, and although I'm not too deep into the NLP
space yet, extracting the relations from the text itself is a pretty important
task.

