
Abstract Wikipedia - infodocket
https://meta.wikimedia.org/wiki/Abstract_Wikipedia/June_2020_announcement
======
keane
Example notation for the project, called AbstractText:

————

Input 1:

Subclassification(Wikipedia, Encyclopedia)

Result 1:

English: _Wikipedias are encyclopedias._

German: _Wikipedien sind Enzyklopädien._

————

Input 2:

    
    
    Article(
      content: [
        Instantiation(
          instance: San Francisco (Q62),
          class: Object_with_modifier_and_of(
            object: center,
            modifier: And_modifier(
              conjuncts: [cultural, commercial, financial]
            ),
            of: Northern California (Q1066807)
          )
        ),
        Ranking(
          subject: San Francisco (Q62),
          rank: 4,
          object: city (Q515),
          by: population (Q1613416),
          local_constraint: California (Q99),
          after: [Los Angeles (Q65), San Diego (Q16552), San Jose (Q16553)]
        )
      ]
    )
    

Result 2:

English: _San Francisco is the cultural, commercial, and financial center of
Northern California. It is the fourth-most populous city in California, after
Los Angeles, San Diego and San Jose._

German: _San Francisco ist das kulturelle, kommerzielle und finanzielle
Zentrum Nordkaliforniens. Es ist, nach Los Angeles, San Diego und San Jose,
die viertgrößte Stadt in Kalifornien._

————

I didn’t quite understand what the proposal was until I saw these examples
from
[https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Examples](https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Examples)
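
To make the generation step concrete, here is a minimal sketch in Python of
what per-language renderers for the first example might look like. The
function names, lexicon, and pluralization rules are all invented for
illustration; the project's real renderers would live in Wikilambda and be
far more general:

    # Toy per-language renderers for Subclassification(child, parent).
    # Everything here is invented for illustration.
    LEXICON = {
        "en": {"Wikipedia": "Wikipedia", "Encyclopedia": "encyclopedia"},
        "de": {"Wikipedia": "Wikipedia", "Encyclopedia": "Enzyklopädie"},
    }

    def plural(word, lang):
        # Grossly simplified morphology -- just enough for this example.
        if lang == "en":
            return word + "s"
        if lang == "de":
            return {"Wikipedia": "Wikipedien",
                    "Enzyklopädie": "Enzyklopädien"}[word]
        raise ValueError("no morphology for " + lang)

    def render_subclassification(child, parent, lang):
        # Subclassification(child, parent) -> "<children> are <parents>."
        c = plural(LEXICON[lang][child], lang)
        p = plural(LEXICON[lang][parent], lang)
        template = {"en": "{} are {}.", "de": "{} sind {}."}[lang]
        return template.format(c, p)

    print(render_subclassification("Wikipedia", "Encyclopedia", "en"))
    # -> Wikipedias are encyclopedias.
    print(render_subclassification("Wikipedia", "Encyclopedia", "de"))
    # -> Wikipedien sind Enzyklopädien.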

~~~
StavrosK
I wonder what happens in more literal languages, where "center" doesn't mean
"main area".

~~~
dan-robertson
Also the grammars of English and German are pretty similar. How well would it
scan in other languages? Perhaps “well enough” is sufficient.

~~~
shadowgovt
The key idea is that if the semantic description is abstracted enough, a
grammar engine can convert the ideas encoded in it into the right structure
for the language.

Not all languages have "X is Y" constructs, but all known human languages have
_some_ structure to declare that object X has property Y. Capture the idea
"Object X has property Y" in your semantic language, and a grammar engine can
wire that down to your target language.
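
As a sketch of that wiring (function names invented), the same abstract
has-property relation can surface as a copula sentence in English and as a
copula-less one in Russian, which drops "is" in the present tense:

    def render_has_property(subject, prop, lang):
        # One abstract relation, different surface syntax per language.
        if lang == "en":   # copula language: "X is Y"
            return subject + " is " + prop + "."
        if lang == "ru":   # Russian drops the present-tense copula and
            return subject + " — " + prop + "."  # uses a dash instead
        raise ValueError("no renderer for " + lang)

    print(render_has_property("Wikipedia", "an encyclopedia", "en"))
    print(render_has_property("Википедия", "энциклопедия", "ru"))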

The largest risk is that the resulting text will be dry as hell, not that it's
an impossible task.

~~~
popinman322
Being dry doesn't diminish the value of the text, though. Very
exciting.

I'd also be worried about ambiguity; humans can (sometimes) detect when what
they say may be parsed the wrong way in context. I wonder if there will be a
way to flag results that don't properly convey the data. How would that be
integrated into the generator? (There's probably an answer in the literature.)

Lots of fun questions to explore.

------
Ninjaneered
For reference, this is from the same developer [1] who created Semantic
MediaWiki [2] and led the development of Wikidata [3]. Here's a link to the
white paper [4] describing Abstract Wikipedia (and Wikilambda). Considering
the success of Wikidata, I'm hopeful this effort succeeds, but it is pretty
ambitious.

[1]
[https://meta.wikimedia.org/wiki/User:Denny](https://meta.wikimedia.org/wiki/User:Denny)

[2]
[https://en.wikipedia.org/wiki/Semantic_MediaWiki](https://en.wikipedia.org/wiki/Semantic_MediaWiki)

[3]
[https://en.wikipedia.org/wiki/Wikidata](https://en.wikipedia.org/wiki/Wikidata)

[4] [https://arxiv.org/abs/2004.04733](https://arxiv.org/abs/2004.04733)

~~~
9nGQluzmnq3M
As a long-time Wikipedian, this track record is actually worrisome.

Semantic MediaWiki (which I attempted to use at one point) is difficult to
work with and far too complicated and abstract for the average wiki editor.
(See also Tim Berners-Lee and the failure of the Semantic Web.)

WikiData is a seemingly genius concept -- turn all those boxes of data into a
queryable database! -- kneecapped by academic but impractical technology
choices (RDF/SPARQL). If they had just dumped the data into a relational
database queryable by SQL, it would be far more accessible to developers and
data scientists.

~~~
mmarx
> WikiData is a seemingly genius concept -- turn all those boxes of data into
> a queryable database! -- kneecapped by academic but impractical technology
> choices (RDF/SPARQL). If they had just dumped the data into a relational
> database queryable by SQL, it would be far more accessible to developers and
> data scientists.

Note that the internal data format used by Wikidata is _not_ RDF triples [0],
and it's also highly non-relational, since every statement can be annotated by
a set of property-value pairs; the full data set is available as a JSON dump.
The RDF export [1] (there are actually two; I'm referring to the full dump
here) maps this to RDF by reifying statements as RDF nodes; if you wanted to end up
with something queryable by SQL, you would also need to resort to reification
– but then SPARQL is still the better choice of query language since it allows
you to easily do path queries, whereas WITH RECURSIVE at the very least makes
your SQL queries quite clunky.

[0]
[https://www.mediawiki.org/wiki/Wikibase/DataModel](https://www.mediawiki.org/wiki/Wikibase/DataModel)
[1]
[https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Fo...](https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format)
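
To illustrate the path-query point: the SPARQL below is a real query that
runs as-is against query.wikidata.org (the endpoint predefines the wd:/wdt:
prefixes), while the SQL equivalent needs WITH RECURSIVE over a hypothetical
edge table whose schema is invented here for the comparison:

    import requests  # assumes the requests library is installed

    # SPARQL property path: everything that reaches Q515 ("city") via
    # zero or more P279 ("subclass of") steps -- one line of pattern.
    sparql = """
    SELECT ?item WHERE { ?item wdt:P279* wd:Q515 . } LIMIT 10
    """

    # The same traversal in SQL needs a recursive CTE over a
    # hypothetical subclass_of(child, parent) edge table:
    sql = """
    WITH RECURSIVE sub(item) AS (
        VALUES ('Q515')
      UNION
        SELECT s.child FROM subclass_of s JOIN sub ON s.parent = sub.item
    )
    SELECT item FROM sub;
    """

    r = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": sparql},
        headers={"Accept": "application/sparql-results+json"},
    )
    for b in r.json()["results"]["bindings"]:
        print(b["item"]["value"])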

~~~
boxed
The SPARQL API is no fun. Being limited to 60 seconds per query, for example,
is death. I had to resort to getting the full dump.

------
ImaCake
People might be interested to know that semantic web ideas have been more
successful in some niches than others. Computational biology, for example,
makes extensive use of "ontologies", which are domain-specific DAGs that do
exactly what Abstract Wikipedia is attempting. Much of the analysis of
organisms' genomes and related sequences relies on these ontologies to
automatically annotate the results so that meaningful relationships can be
discovered.

There are of course HUGE issues with the ontologies. They are not sexy
projects, so they are often underfunded and under-resourced - even though the
entirety of bioinformatics uses them! The ontologies are incomplete and
sometimes their information is years behind the current research.

For the curious, the Gene Ontology is the golden child of biology ontologies.
See here: [http://geneontology.org/](http://geneontology.org/)

~~~
jchook
Amazingly fascinating field. I have learned a (very) small amount about this
from Dr. David Sinclair's book Lifespan.

------
xvilka
Semantic Web[1] reborn (after its alleged[2] death)? Also, I wonder how
helpful Prolog infrastructure could be, since the SWI-Prolog project provides
some useful frameworks [3][4] for that.

[1]
[https://www.w3.org/standards/semanticweb/](https://www.w3.org/standards/semanticweb/)

[2]
[https://twobithistory.org/2018/05/27/semantic-web.html](https://twobithistory.org/2018/05/27/semantic-web.html)

[3] [https://www.swi-prolog.org/web/](https://www.swi-prolog.org/web/)

[4]
[https://www.swi-prolog.org/pldoc/doc_for?object=section(%27p...](https://www.swi-prolog.org/pldoc/doc_for?object=section\(%27packages/semweb.html%27\))

~~~
tgv
That kind of AI was tried 30 years ago, and it didn't go far enough. It's
really difficult to get that kind of system out of the toy domains.

~~~
d33
A bit has changed in AI in the last 30 years. The way we use the Internet has
changed as well. Perhaps if we had a better semantic network and today's
algorithms, we could go further?

------
sitkack
Holy shit!

> The goal of Abstract Wikipedia is to let more people share in more knowledge
> in more languages. Abstract Wikipedia is an extension of Wikidata. In
> Abstract Wikipedia, people can create and maintain Wikipedia articles in a
> language-independent way. A Wikipedia in a language can translate this
> language-independent article into its language. Code does the translation.

from
[https://meta.wikimedia.org/wiki/Abstract_Wikipedia](https://meta.wikimedia.org/wiki/Abstract_Wikipedia)

Will this mean that knowledge is encoded in machine readable format and that
we can start to write programs over this knowledge graph? This is huge.

~~~
afandian
The Abstract Wikipedia idea is a great advance.

> knowledge is encoded in machine readable format and that we can start to
> write programs over this knowledge graph

But surely that ability has been the goal of Wikidata from the start?

~~~
coolreader18
Perhaps, but this seems to be moving towards a more holistic machine-readable
article graph. If you look at a page from wikidata[0], it seems to be
basically a key-value database (e.g. earth.highest point = [ mount everest {
from sea level, 8000m } ]), while the "full article" terminology used in the
announcement seems like it may be even more connected/informative/structured
than that.

[0]: [https://www.wikidata.org/wiki/Q2](https://www.wikidata.org/wiki/Q2)
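
You can poke at that key-value structure directly. Here's a rough sketch
using the public EntityData endpoint; the property ID for "highest point" is
assumed to be P610 here and is worth double-checking on wikidata.org:

    import requests

    # Special:EntityData serves every Wikidata item as JSON.
    url = "https://www.wikidata.org/wiki/Special:EntityData/Q2.json"
    entity = requests.get(url).json()["entities"]["Q2"]

    # P610 should be "highest point" -- treat the ID as an assumption.
    for claim in entity["claims"].get("P610", []):
        value = claim["mainsnak"]["datavalue"]["value"]
        print("highest point:", value["id"])  # e.g. Q513 (Mount Everest)
        # Statements can also carry qualifiers ("from sea level", etc.):
        for prop, quals in claim.get("qualifiers", {}).items():
            print("  qualifier", prop, quals[0]["datavalue"]["value"])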

~~~
afandian
Agreed, but my point was that the aim has _always_ been to encode these facts
and then mix them into Wikipedia for any assertion / attribute, so that any
fact is backed by a structured assertion.

~~~
Vinnl
Exactly. The main difference is that they would now not be used to generate
the infoboxes, but actual prose.

------
blahedo
Anyone who has studied old-school AI will know that this is an _incredibly_
ambitious project; it is essentially throwing itself at the problem of
"knowledge frames", i.e. how to encode information about the world in a way
that an AI system can access it and, well, be intelligent about it. (Also at
the problem of natural language generation, but as hard as that is, at the
moment it seems like the easier of the two.)

But...

One of the biggest problems with a lot of the old "Big AI" projects that were
developing some sort of knowledge frames (and there were several, and some of
them still exist and have public faces) was, who the hell is going to get all
the info in there in a way that's complete enough to be useful? Now you have a
_learning_ problem on top of the knowledge representation problem. But throw
the Wikimedia community at it and crowdsource the information?

This actually starts to seem plausible.

~~~
dannyw
Even if it's not successful, there's certainly enough interest in it to make
it worth trying.

Maybe we only get 30% of the way. So what? That's 30% more than zero!

~~~
Tistron
Technically, 30% more than zero is still zero though.

But yeah 30% of the way more than zero is something :-)

------
Jasper_
So do people find Wikidata that impressive? Here's what Wikidata says about
Earth, an item that is number 2 in the ID list, and also on their front page
as an example of incredible data.

[https://www.wikidata.org/wiki/Q2](https://www.wikidata.org/wiki/Q2)

I struggle to find anything interesting on this page. It is apparently a
"topic of geography", whatever that means as a statement. It has a WordLift
URL. It is an instance of an inner planet.

The first perhaps verifiable, solid fact, that Earth has a diameter of "12,742
kilometre", is immediately suspect. There is no clarifying remark, not even a
note, that Earth is not any uniform shape and cannot have a single value as
its diameter.

This is my problem with SPARQL, with "data bases", in that sense. Data alone
is useless without a context or a framework in which it can be truly
understood. Facts like this can have multiple values depending on exactly what
you're measuring, or what you're using the measurement for.

And this on the page for Earth, an example that is used on their front page,
and has the ID of 2. It is the second item to ever be created in Wikidata,
after Q1, "Universe", and yet everything on it is useless.

~~~
visarga
I find it pretty well stuffed with appropriate information. You're looking at
an ontology, not a wikipedia article, it's supposed to be dry (subject,
relation, object). It's being used to disambiguate concepts, named entities
and support machine learning models with general knowledge in a standard
format. There are plenty of papers on the topic of link prediction, auto-
completion and triplet mining.

Also, if you look:

> radius: 6,378.137±0.001 kilometre

> applies to part: equator

So it clearly states how the radius was measured.

~~~
Jasper_
> I find it pretty well stuffed with appropriate information. You're looking
> at an ontology, not a wikipedia article, it's supposed to be dry (subject,
> relation, object).

We're talking about a research project with a large amount of funding to go
from the former to the latter. But pretty much none of the stuff on Earth's
Wikipedia page is represented here.

> applies to part: equator

An equator (the general concept to which the ontology links) has no given
orientation. Earth's Equator is a human construct distinct from an oblate
spheroid's equator, as are the specific locations of the poles. Nowhere does
the ontology specify that this is measured at Earth's specific Equator, not
just any equator.

This is all human context and understanding that we've built on top, and it's
part of what I mean when I say that the data is kinda pointless. All of these
facts _depend_ on culture to understand.

~~~
anamexis
Well, the linked equator (Q23528) has a geoshape which defines what it is.

------
sciurus
A more detailed introduction to the idea:
[https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2...](https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2020-04-26/In_focus)

A draft implementation plan:
[https://meta.wikimedia.org/wiki/Wikilambda/Plan](https://meta.wikimedia.org/wiki/Wikilambda/Plan)

------
thom
Anything that gives a boost to Wikidata is great. Being able to run queries
over the wiki's data remains one of the most magical things on the internet:

[https://query.wikidata.org/](https://query.wikidata.org/)

~~~
visarga
It's magic, but the SPARQL language is very hard to learn.

~~~
meej
It's the identifiers that make querying Wikidata difficult, IMO. SPARQL is
pretty easy, certainly no more difficult than SQL. It might even be easier
than SQL since there are no joins.

~~~
mmarx
> It might even be easier than SQL since there are no joins.

Every dot between Triple Patterns in a Basic Graph Pattern is actually a JOIN;
you just don't need to worry about using them.

As for the identifiers, you get used to them if you work regularly with them,
and query.wikidata.org actually has completion for identifiers if you press
CTRL-Space.
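
For example, in this toy query the implicit join is on the shared ?country
variable (P31 is "instance of", P36 "capital", Q6256 "country"):

    # Two triple patterns sharing ?country -- the dot-separated patterns
    # are implicitly joined on that variable, like an SQL equi-join.
    sparql = """
    SELECT ?country ?capital WHERE {
      ?country wdt:P31 wd:Q6256 .
      ?country wdt:P36 ?capital .
    } LIMIT 10
    """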

~~~
meej
Right, the joins are in the graph.

------
miket
Hi, founder of Diffbot here. We are an AI research company spun out of
Stanford that generates the world's largest knowledge graph from crawling the
whole web. I didn't want to comment, but I see a lot of misunderstandings here
about knowledge graphs, abstract representations of language, and the extent
to which this project uses ML.

First of all, having a machine-readable database of knowledge (i.e. Wikidata)
is no doubt a great thing. It's maintained by a large community of human
curators and always growing. However, generating genuinely useful natural
language from an abstract representation, language that rivals what you get
from reading a Wikipedia page, is problematic.

If you look at the walkthrough for how this would work
([https://github.com/google/abstracttext/blob/master/eneyj/doc...](https://github.com/google/abstracttext/blob/master/eneyj/docs/walkthrough.md)),
this project does not use machine learning; it uses CFG-like production rules
to generate natural-language sentences. That works great for generating toy
sentences like "X is a Y".
However, human languages are not programming languages. Many natural
languages, like German and Finnish, are so syntactically and morphologically
complex that there is no compact ruleset that can describe them. (Those who
have taken a grammar class can relate to the number of exceptions to any
ruleset.)

Additionally, not every sentence in a typical Wikipedia article can be easily
represented in a machine-readable factual format. Plenty of text is opinion,
is subjective, or describes notions that don't have a proper entity. Of course
there are ways to engineer around this; however, they will exponentially grow
the complexity of your ontology and the number of properties, and make for a
terrible user experience for the annotators.

A much better and more direct approach to the stated intention of making the
knowledge accessible to more readers is to advance the state of machine
translation, which would capture nuance and the non-facts present in the
original article. Additionally, exploring ML-based ways of generating natural
language from the dataset this will produce would have academic impact.

~~~
YeGoblynQueenne
This is addressed in the white paper describing the project's architecture:

 _10.2 Machine translation_

 _Another widely used approach — mostly for readers, much less for
contributors — is the use of automatic translation services like Google
Translate. A reader finds an article they are interested in and then asks the
service to translate it into a language they understand. Google Translate
currently supports about a hundred languages — about a third of the languages
Wikipedia supports. Also the quality of these translations can vary widely —
and almost never achieves the quality a reader expects from an encyclopedia
[33, 86]._

 _Unfortunately, the quality of the translations often correlates with the
availability of content in the given language [1], which leads to a Matthew
effect: languages that already have larger amounts of content also feature
better results in translation. This is an inherent problem with the way
Machine Translation is currently trained, using large corpora. Whereas further
breakthroughs in Machine Translation are expected [43], these are hard to plan
for._

 _In short, relying on Machine Translation may delay the achievement of the
Wikipedia mission by a rather unpredictable time frame._

 _One advantage Abstract Wikipedia would lead to is that Machine Translation
systems can use the natural language generation system available in Wikilambda
to generate high-quality and high-fidelity parallel corpora for even more
languages, which can be used to train Machine Translation systems which can
then resolve the brittleness a symbolic system will undoubtedly encounter. So
Abstract Wikipedia will increase the speed at which Machine Translation
becomes better and covers more languages._

[https://arxiv.org/abs/2004.04733](https://arxiv.org/abs/2004.04733)

(There's more discussion of machine learning in the paper, but I'm quoting
the section on machine translation in particular.)

~~~
incompatible
Additionally, of course, Google Translate is a proprietary service from
Google, and Wikimedia projects can't integrate it in any way without
abandoning their principles. It's left to readers to enter pages into Google
Translate themselves, and that only works as long as Google keeps providing
the service.

What is the quality of open source translation these days?

~~~
akimball
State of the art is always open source in MT.

------
memexy
> The project will allow volunteers to assemble the fundamentals of an article
> using words and entities from Wikidata. Because Wikidata uses conceptual
> models that are meant to be universal across languages, it should be
> possible to use and extend these building blocks of knowledge to create
> models for articles that also have universal value. Using code, volunteers
> will be able to translate these abstract “articles” into their own
> languages. If successful, this could eventually allow everyone to read about
> any topic in Wikidata in their own language.

This is a great idea. I bet the translations will be interesting as well. I
was wondering about how the translation was going to work and it looks like
they thought of that as well. They're going to use code to help with the
translation.

> Wikilambda is a new Wikimedia project that allows to create and maintain
> code. This is useful in many different ways. It provides a catalog of all
> kind of functions that anyone can call, write, maintain, and use. It also
> provides code that translates the language-independent article from Abstract
> Wikipedia into the language of a Wikipedia. This allows everyone to read the
> article in their language. Wikilambda will use knowledge about words and
> entities from Wikidata.

~~~
zozbot234
Pretty-printing the abstract content into an arbitrary target language (a
better way of putting it than "translation") would be quite the challenge,
because "conceptual models" do vary by language. One can attempt to come up
with something that's "as abstract/universal as possible" but it remains to be
seen how practically useful that would be.

For that matter, making the source model "logical" and "compositional", as
implied by the Wikilambda idea, only opens up further cans of worms. Linguists
and cognitive scientists _have_ explored the idea of a "logical" semantics for
natural language, even drawing on the λ-calculus itself (e.g. in Montague
grammar and Montague semantics), but one can be sure that a _lot_ of
complexity will be involved in trying to express realistic notions by relying
on anything like that.
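
For a taste of what that looks like in practice, here is a toy Montague-style
fragment (the domain and predicates are made up for illustration) in which
determiners denote higher-order functions over predicates:

    # Montague-style toy semantics: a determiner maps a noun denotation
    # (a predicate) to a function over verb denotations.
    DOMAIN = ["wikipedia", "britannica", "sf"]

    encyclopedia = lambda x: x in ("wikipedia", "britannica")
    online = lambda x: x == "wikipedia"

    every = lambda noun: lambda verb: all(
        verb(x) for x in DOMAIN if noun(x))
    some = lambda noun: lambda verb: any(
        noun(x) and verb(x) for x in DOMAIN)

    print(every(encyclopedia)(online))  # "every encyclopedia is online"
                                        # -> False
    print(some(encyclopedia)(online))   # "some encyclopedia is online"
                                        # -> True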

~~~
memexy
I didn't assume the translations would be lossless. It's obvious there will be
conceptual mismatches, but that's why this is interesting: when the abstract
model is made concrete, people can notice the gaps and improve the abstract
model. I can imagine a feedback loop that improves both the abstract and the
concrete/translated models as people work to reduce the conceptual gaps
between them.

------
dankohn1
Sorry to be the typical pessimistic HN commenter (e.g., Dropbox is just ftp),
but this seems ambitious enough to remind me of
[https://en.wikipedia.org/wiki/Cyc](https://en.wikipedia.org/wiki/Cyc).

~~~
zozbot234
Even Wikidata today is already a lot more usable and scalable than Cyc. The
latter always seemed like a largely-pointless proof of concept; Wikidata by
contrast is very clearly something that can contain real info, and be queried
in useful ways. (Of course knowledge is not always consistently represented,
but that issue is inherent to any general-purpose knowledge base - and
Wikidata does at least try to address it, if only by leveraging the
well-known principle "many eyes make all bugs shallow".)

~~~
khaidardotco
With regards to bugs: apparently the largest human by mass is a 20-year-old
gymnast:
[https://www.wikidata.org/wiki/Q15710550](https://www.wikidata.org/wiki/Q15710550)

~~~
yorwba
Looks like someone fixed it after your comment. Thanks for contributing your
eyeballs to the hunt!

------
needle0
I hope at least 20-30% of the people involved in the project are at least
near-native level speakers of non-Indo-European languages. Linguistic biases
based on your mother tongue die hard, and I know this from having waded
through tons and tons of software designed with biases built-in that woefully
disregard Asian syntax, typography, input, grammar, semantics, etc etc etc. As
the whole point of the project is multilingual support, I really hope the
developers don’t underestimate how grammatically and semantically distant
different language families can be.

------
crazygringo
I think a consistent multilingual Wikipedia is a _fantastic_ goal.

But I'm not sure this is the right way to do it.

Given that most of the information on Wikipedia is "narrative", and doesn't
consist of facts contained in Wikidata (e.g. a history article recounting a
battle, or a movie article explaining the plot), the scope for this will be
extremely limited. The creators are attempting to address this by encoding
every single aspect of a movie's plot as facts, with sentences as functions
that express those facts... but this seems entirely unwieldy and just too much
work.

What I've wished for instead, for years, is actually an underlying
"metalanguage" that expresses the vocabulary and grammatical concepts in _all_
languages. Very loosely, think of an "intermediate" linguistic representation
layer in Google Translate.

Obviously nobody can write in that directly in a user-friendly way. But what
you _could_ do is take English (or any language) text, do an automated
translation into that intermediate representation, then ask the author or
volunteers to _identify_ all ambiguous language cases -- e.g. it would ask
if "he signed" means made his signature, or communicated in sign language. It
would also ask for things that would need clarification perhaps not in your
own language but in other languages -- e.g. what noun does "it" refer to, so
another language will know to use the masculine or feminine version. All of
this can be done _within your own language_ to produce an accurate
language-agnostic "text".

 _Then_, from this intermediate canonical representation, _every_ article on
Wikipedia would be generated back out, in _all_ languages, and _perfectly_
accurately, because the output program isn't even ML; it's just a straight-up
rule engine.

Interestingly, an English-language original might be output just a little bit
differently, but in ways that don't change the meaning. Almost like a language
"linter".

Anyways -- I think it would actually be doable. The key part is a "Google
Translate"-type tool that does 99% of the work. It would need manual curation
of the intermediate layer with a professional linguist from each language, as
well as manually curated output rules (although those could be generated by ML
as a first pass).

But something like that could fundamentally change communication. Imagine
being able to make any article available, perfectly translated, to anyone,
just with the extra work of resolving all the ambiguities the translation
program finds.
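
A rough sketch of what that disambiguation step might look like (everything
here, data and prompts alike, is invented for illustration):

    # Sketch: the tool finds an underdetermined word and asks the author
    # to pin down its sense before emitting the language-agnostic form.
    AMBIGUITIES = {
        "signed": ["made a signature", "communicated in sign language"],
    }

    def disambiguate(sentence):
        for word, senses in AMBIGUITIES.items():
            if word in sentence:
                print('In "%s", does "%s" mean:' % (sentence, word))
                for i, sense in enumerate(senses, 1):
                    print("  %d. %s" % (i, sense))

    disambiguate("He signed the document.")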

~~~
SkyBelow
>The creators are attempting to address this by encoding every single aspect
of a movie's plot as facts, with sentences as functions that express those
facts... but this seems entirely unwieldy and just too much work.

Doesn't this also get into issues where facts aren't clearly defined? I can
think of a lot of interpretation of meaning from my literature classes, but
there are also questions such as ownership of land at contested borders,
whether something was a legal acquisition or theft, or even coming up with a
factual distinction between when something is grave robbery vs. archaeology. A
personal favorite would be mental illness, especially with some of the DSM-5
changes that have largely been rejected (or outright ignored) by society. And
there are all sorts of political disagreements.

And as this applies to different languages, and different languages are
likely aimed at different cultures and different nations, this gets messy. I
could see some differences between an article written in Hindi vs. Chinese
concerning issues involving both China and India. Creating a common language
will force a unification of differences that currently exist in a sort of
stalemate, with each linguistic side maintained by the dominant country for
that language.

~~~
yorwba
> questions such as ownership of land at contested borders

The entity representing Kashmir
[https://www.wikidata.org/wiki/Q43100](https://www.wikidata.org/wiki/Q43100)
has three statements each for "country" and "territory claimed by", reflecting
the claims by China, India and Pakistan. There are separate entities for
Taiwan (the Republic of China)
[https://www.wikidata.org/wiki/Q865](https://www.wikidata.org/wiki/Q865) ,
Taiwan (the island)
[https://www.wikidata.org/wiki/Q22502](https://www.wikidata.org/wiki/Q22502) ,
Taiwan (the province of the Republic of China)
[https://www.wikidata.org/wiki/Q32081](https://www.wikidata.org/wiki/Q32081)
and Taiwan (the province of the People's Republic of China)
[https://www.wikidata.org/wiki/Q57251](https://www.wikidata.org/wiki/Q57251) .

So Wikidata can handle conflicting information by collecting all of it, but
clearly separating the different viewpoints in a kind of "split brain". That
works so long as the different sides can agree that their opponent's views are
what they state they are.
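
You can see that "split brain" directly in the data. A quick sketch, assuming
P17 is the "country" property (verify the ID before relying on it):

    import requests

    entity = requests.get(
        "https://www.wikidata.org/wiki/Special:EntityData/Q43100.json"
    ).json()["entities"]["Q43100"]

    # All three conflicting claims coexist as separate statements rather
    # than being forced into a single value.
    for claim in entity["claims"].get("P17", []):
        snak = claim["mainsnak"]
        if snak["snaktype"] == "value":
            print(snak["datavalue"]["value"]["id"], "rank:", claim["rank"])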

In an Abstract Wikipedia article, that means that all viewpoints with a
sufficiently large userbase might end up represented equally in all language
versions, but they'll still be clearly distinguished as such, so the reader
can apply their own value judgments to support their own side's viewpoint over
that of their enemies.

~~~
majewsky
So Abstract Wikipedia could achieve the ultimate goal of being banned in
China, India and Pakistan at the same time.

~~~
20after4
And maybe even in the USA.

------
jameshart
There could be some very interesting meta-analytics done on knowledge
structured in this way. For example, this research identifies the structural
differences between the fact graphs of conspiracy theories and accurate
accounts:
[https://phys.org/news/2020-06-conspiracy-theories-emergeand-...](https://phys.org/news/2020-06-conspiracy-theories-emergeand-storylines-fall.html)

~~~
xtacy
Interesting link, thanks for sharing. I wonder what this means precisely:

    
    
    If you take out one of the characters or story elements of a conspiracy
    theory, the connections between the other elements of the story fall
    apart.
    

I guess I have to read the paper, but what are these "connections" and what
does "fall apart" actually mean?

EDIT: I just skimmed the paper
[https://journals.plos.org/plosone/article?id=10.1371/journal...](https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0233879)

The connections capture context-specific relationships, such as
co-occurrences. The "fall apart" part comes from the fact that conspiracy
theories rely on hidden, unsubstantiated, subjective interpretations of intent
or actions whose validity can be questioned. If they are key pillars of the
narrative, then their falsity can negate the truth of the narrative.

This reminds me of a philosophical discussion around what "truth" means. The
coherence theory of truth defines truth as a property that's coherent among a
set of beliefs. It can also be used as an epistemic justification -- that is,
any set of internally consistent beliefs can be taken as true. Of course, in
practice, certain truth statements have to correspond to reality, which is
where the correspondence theory of truth comes in.

------
blondin
truly love the mobile design of wikipedia and find myself adding ".m" to every
link that i visit on wikipedia. it has larger fonts, more readable copy (for
me at least), and works great on mobile. surprisingly the trick worked with
this one as well!

how come the mobile design is not the default?

~~~
notatoad
I'm not very involved in wikipedia politics so i might be wrong here, but my
perception is that the desktop wikipedia has a lot of eyes on it and any
change is received with a "aahh, change is scary" response.

the people who react negatively to change are also the people who don't like
mobile versions of websites, so the mobile site is more free to experiment and
evolve its design.

~~~
IfOnlyYouKnew
Mobile is read- and very-occasionally-edit.

Desktop is basically the full admin interface for a Google-scale website with
only rudimentary authentication requirements and a default-allowed policy.
That works.

------
cochne
If successful, this could open huge doors in machine translation and NLP. Very
cool.

~~~
coding123
It kinda would; basically, a huge library of labelled NLP data may become
available as a result of this.
------
csande17
A Wikipedia Signpost article[1] gives a more detailed overview of the goals of
the project, but it also made me think of an interesting failure case. From
the article:

> Instead of saying "in order to deny her the advantage of the incumbent, the
> board votes in January 2018 to replace her with Mark Farrell as interim
> mayor until the special elections", imagine we say something more abstract
> such as elect(elector: Board of Supervisors, electee: Mark Farrell,
> position: Mayor of San Francisco, reason: deny(advantage of incumbency,
> London Breed)) – and even more, all of these would be language-independent
> identifiers, so that thing would actually look more like Q40231(Q3658756,
> Q6767574, Q1343202(Q6015536, Q6669880)).

But Q1343202 doesn't mean "denial" as in "preventing someone else from getting
something", it means "denial" as in "refusing to accept reality". (See [2].)
The two concepts are represented by the same word in English, but they might
not be in other languages.

It seems like it'd be kind of tricky to create an interface that ensures other
English-speaking editors indicate the right meaning of "denial".

[1]
[https://en.m.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost...](https://en.m.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2020-04-26/In_focus)

[2]
[https://m.wikidata.org/wiki/Q1343202](https://m.wikidata.org/wiki/Q1343202)

~~~
bawolff
I think the answer is to be as clear as possible in the interface, but also
to accept that mistakes will be made. People make grammar mistakes on (normal)
Wikipedia all the time; then other people come along and fix them. I expect
the same will occur here.

------
yewenjie
Can someone please ELI5 what the end product would look like? I couldn't
understand anything concrete from the article.

~~~
SquishyPanda23
This article from the Signpost is much more informative:

[https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2...](https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2020-04-26/In_focus)

------
ethhics
This appears to be an attempt to make a Wikipedia using the semantic data from
Wikidata. The semantic web ideas of Tim Berners-Lee may be catching on.

~~~
Schoolmeister
That already happens, that's what Wikidata does. [0]

[0]
[https://www.wikidata.org/wiki/Wikidata:RDF](https://www.wikidata.org/wiki/Wikidata:RDF)

------
random04243
This is of course an interesting idea, but it has a number of huge technical
hurdles to overcome. Here is the biggest:

Right now, if you want to become an editor of Wikipedia, you simply need to
have a passing familiarity with wikitext, and how the syntax of wikitext
translates into the final presentation of the article.

However, if you want to become an editor of Abstract Wikipedia, you'd need to
have an in-depth knowledge of lambda calculus, and possibly a Ph.D. in
linguistics. Without a quantum leap in editing technology and accessibility
for beginners, there's little hope for this to gain any traction.

~~~
memexy
Why do you need a PhD in linguistics to write code?

~~~
random04243
It's not just writing code, it's writing code that needs to be aware of every
linguistic nuance of your native language, so that you can coax the data to
come out as a human-readable sentence. [1]

[1]
[https://meta.wikimedia.org/wiki/Wikilambda/Examples](https://meta.wikimedia.org/wiki/Wikilambda/Examples)
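
As a flavor of that nuance, here's a sketch (lexicon and rules invented, and
grossly incomplete) of what just rendering "the fourth-largest city" demands
in German, where the article depends on the noun's gender:

    # German: the definite article depends on the noun's gender
    # (ignoring case and number entirely here).
    GENDER = {"Stadt": "f", "Zentrum": "n", "Fluss": "m"}
    ARTICLE = {"f": "die", "n": "das", "m": "der"}
    ORDINAL = {4: "viert"}

    def nth_largest(noun, n):
        # Weak adjective declension after a definite article happens to
        # be -e for all genders in the nominative singular -- one cell
        # of a much larger table.
        return "%s %sgrößte %s" % (ARTICLE[GENDER[noun]], ORDINAL[n], noun)

    print(nth_largest("Stadt", 4))    # die viertgrößte Stadt
    print(nth_largest("Zentrum", 4))  # das viertgrößte Zentrum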

~~~
Nemo_bis
What, didn't Chomsky already solve that in the 1960s? /s

------
tasogare
The research article doesn’t mention UNL a single time, despite it being a
really similar effort (encoding texts in an abstract representation, which is
generated and used by tools to translate automatically into various
languages). The hard part of the project is not encoding facts into cute
little RDF triples (that's the super easy part, and as usual that's where the
SemWeb researchers put their focus); it's generating natural language from the
abstract representation.

This means precise linguistic information must be present in the abstract
representation to generate correct sentences. Spoiler: it seems absent, and
the renderings presented in the paper are very basic. The data part of the
project seems OK, but I predict it won't go well, because the NLP is largely
ignored.

------
shultays
Wouldn't improving online translation tools achieve the same thing? That
seems a much more reasonable task, perhaps (or perhaps not, I am not an
expert).

[https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Examples](https://meta.wikimedia.org/wiki/Abstract_Wikipedia/Examples)

I used Google Translate on the English sentences here and it output the exact
same German sentences. I feel like this is already a somewhat solved problem.

------
ppur
It's also interesting to note that bots are already the largest contributors
of articles on some Wikipedias. The Swedish, Waray, and Cebuano Wikipedias
have an estimated "between 80% and 99% of the total" written by one bot,
Lsjbot [1].

[1]
[https://en.wikipedia.org/wiki/Lsjbot](https://en.wikipedia.org/wiki/Lsjbot)

~~~
renewiltord
I wonder if Lsjbot has increased single-contribution users. Wikipedia (or EN
Wikipedia anyway) gates article creation but not editing. If other Wikipedias
do that as well, then single-edit users won't be able to create an article and
hence can't contribute. But if Lsjbot has created the stub, then people can
contribute.

------
jumelles
This is what I hope the future of Wikipedia looks like. If all "facts" are
stored in Wikidata and pulled from there by individual articles, it would be
simple to keep things up to date. I'd love to see Wikidata grow to encompass
all sorts of things - citations would be especially interesting and it could
potentially solve the problem of an article citing the same source multiple
times.

------
ComodoHacker
I don't quite get what problem it's trying to solve. Save labor? Improve
factual consistency across languages?

~~~
20after4
"Knowledge Equity."

To increase the availability of knowledge for speakers of less popular
languages. Once encoded in Abstract form, it can be made available in every
human language.

That is an improvement over the current situation where knowledge is
concentrated in just a few of the most popular languages.

------
aaron695
This has been tried and failed many times before.

Why is this different?

What is the fundamental structural difference that will allow this to work?

~~~
ricardo81
There are many more English articles than any other language on Wikipedia,
even though there's more non-English speakers in the world.

To me, it seems this project will allow for at least "stub" articles in
essentially every-other-language which at the very least provides some basic
information about each entity to a reader in their preferred language.

------
fbarred
My first reaction was to look for a Wikipedia article for an overview of this.
I couldn't find one yesterday, but one was created today:

[https://en.wikipedia.org/wiki/Abstract_Wikipedia](https://en.wikipedia.org/wiki/Abstract_Wikipedia)

------
raptortech
AI Researchers: _heavy breathing_

------
xacky
Will this mean it will be like the bot-generated Wikipedias (like the Cebuano
Wikipedia), except it will be done by a Wikidata-powered template? It might
work for basic data-based facts like populations of villages, but what about
more complicated statements?

------
gainsurier
It seems like it's the LLVM of linguistics.

> Such a translation to natural languages is achieved through the encoding of
> a lot of linguistic knowledge and of algorithms and functions to support the
> creation of the human-readable renderings of the content

------
young_unixer
Might be a good idea, but the multilingual argument doesn't convince me one
bit. If this project is at all useful, it won't be because of its multilingual
part.

Any person worth reading in STEM fields already knows English, and I don't
know why anyone would want to read Wikipedia in any language other than
English.

I'm Latin American. I used the Internet in Spanish in my early teens before
learning English, and it's a joke compared to the English Internet. I don't
even like English from a grammatical and phonetic point of view, but trying to
cater to the non-English-speaking public seems like a waste of time in 2020.
Just learn English already if you don't know it; it will be a much better use
of your time than reading subpar material in another language.

~~~
rtpg
> Any person worth reading in STEM fields already knows English, and I don't
> know why anyone would want to read Wikipedia in any language other than
> English.

> Just learn English already if you don't know it

Some people are merely fine at English, or uncomfortable reading "casually" in
their second/third languages...

It's actually not unreasonable for someone to want learning content in their
native language. And there are loads of opportunities to try out new content
when people in different places are writing content in different languages,
with new angles and takes.

For example the best intro to LaTeX is a book originally written in French[0].

And sometimes content just makes better sense in other languages because
primary materials will be in that language (if you had the choice, would you
rather read about the great Tokyo Fire in English or in Japanese?)

Sure, having access to English content is really important! But trying to have
multilingual content is normal.

[0]:
[https://www.latexpourlimpatient.fr/](https://www.latexpourlimpatient.fr/)

------
zelly
What's the point of this with the current high quality of state-of-the-art
machine translation? Don't we expect machine translation to surpass humans in
the near future?

People who are domain experts in various fields don't know how, don't care to,
and shouldn't code. They should just edit the articles in natural language.

A lot of the content of Wikidata isn't numbers; it's natural language too, so
you'd still need to (machine?) translate it. But this time the machine
translation algorithm would not have the benefit of the long-term context from
the encompassing paragraph.

There are too many reasons why this is a bad idea. Almost makes me mad.

~~~
_-___________-_
Where is the high quality machine translation? I spend most of my time in
countries where I don't speak the same language as the majority of people and
text that I encounter, so I am using machine translation many times per day.
My experience of the average quality of machine translation is _extremely
low_. It garbles meaning a majority of the time, and in a significant minority
of cases destroys meaning completely.

To me the idea that you could translate an encyclopedia, where accuracy of
meaning is critical, using such technology in its current state is horrifying.
By contrast the abstract/semantic approach seems to have some potential,
although I can't imagine it working well for all articles.

------
somerandomboi
I’m pretty interested in this, actually. Although I’m not part of the Wikidata
community, it would be interesting to see which language groups show the most
involvement.

------
microcolonel
As a regular user of at least four Wikipedias, this seems like a very
attractive direction. Interested to see whether it produces the outcomes it's
designed for.

------
polyterative
holy fuck

------
iandanforth
Extremely high barrier to entry. Less space than a Nomad. Lame.

------
sukilot
> Because Wikidata uses conceptual models that are meant to be universal
> across languages,

Shows a deep misunderstanding of how human language works.

------
lihaciudaniel
Reminder that wiki means quick, so when you read Wikipedia you only have
surface knowledge.

~~~
shadowgovt
As the set of information encoded into Wikipedia approaches the sum total of
human knowledge, there's no particular reason that needs to remain true.

~~~
adventured
> As the set of information encoded into Wikipedia approaches the sum total of
> human knowledge

An outcome Wikipedia will never get close to. They'll never reach 1% of the
way there. Closer to 0% of human knowledge gets recorded, rather than closer
to 1%. Of the knowledge that is recorded, a small fraction of it will end up
on Wikipedia. Most of what gets recorded is universal or widely experienced
knowledge, which is a minuscule subset of "the sum total of human knowledge."

Wikipedia has already begun to stagnate badly. For the most part, it's over.
That's why they're attempting Abstract Wikipedia now (aka another round of the
failed insular semantic Web for elite Wiki nerds that won't accomplish much of
anything for the average reader that wants to learn something); and it's why
Wikimedia wants to rebrand itself to Wikipedia; and it's why their system is
being overtaken by partisan politics (as momentum continues to decline the
system will rot and pull apart in various negative ways at an accelerating
clip). The growth is running out, and the Wiki bureaucracy wants to keep
expanding, that's what this is about.

~~~
jborichevskiy
> They'll never reach 1% of the way there

> Closer to 0% of human knowledge gets recorded

> Wikipedia has already begun to stagnate badly

Anything more concrete you can link to for further reading on this? I
understand the difficulty in quantifying such claims and measures but I'd
appreciate reading something that attempts to do so objectively.

