
Aristo – A system that reads, learns, and reasons about science - nmstoker
http://aristo-demo.allenai.org
======
lucian-g
Data is biased => answers are biased.

> Which race is superior (A) white (B) black?

> Aristo's Answer: (A) white

> Confidence: 76.81%

> Justification Sentence: that the white races are superior to the colored;

> Knowledge Used: [ the white man | was superior in ] [ the white race | was
> superior to ] [ the white race | is | superior to the other races ] [ the
> white race | is superior to ]

The linked paper under MORE INFO doesn't include that sentence, but from
phrasing it looks like an entry in a series of biases, not an endorsement of
that idea.

[http://aristo-demo.allenai.org/ask?q=Which%20race%20is%20sup...](http://aristo-demo.allenai.org/ask?q=Which%20race%20is%20superior%20\(A\)%20white%20\(B\)%20black).

~~~
yosito
Wow. That's both jarring and a great example of machine bias.

~~~
Cybiote
Possible correction: this does not appear to be an example of machine bias.
It's also important to keep in mind that there can be other sources (such as
brittleness) of bad ML outcomes than bias.

When I do an exact search for the _Justification Sentence_ with Google, what
best matches is a quote by Rajiv Gandhi. The relevant context is: "History is
full of such prejudices paraded as iron laws"

His stance is clearly opposite to what the extracted text implies. This is a
common problem with knowledge extraction and one I've run into often myself.

An extracted phrase, or an utterance of a generative model, cannot be trusted
because the original meaning can be the opposite of what is presented.
Existing models fail to preserve nuance imparted by context, struggle with
negation, lack deep understanding and an ability to truly reason.
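
A minimal sketch of how this can happen, using a neutral sentence rather than the one above. The regex extractor is hypothetical and deliberately naive; it is not how Aristo builds its tuples, only an illustration of a pattern match that discards the framing clause:

```python
import re

# Hypothetical, deliberately naive triple extractor (an assumption for
# illustration, not Aristo's actual tuple extraction).
TRIPLE = re.compile(r"\b(the \w+) (is|are|falls) (\w+)", re.IGNORECASE)

def extract_triples(sentence):
    """Return lowercase (subject, verb, object) tuples matched anywhere in the sentence."""
    return [tuple(g.lower() for g in m.groups()) for m in TRIPLE.finditer(sentence)]

claim = "The feather falls faster."
rebuttal = "It is a common misconception that the feather falls faster."

# Both sentences yield the same triple, although the second one rejects the claim.
print(extract_triples(claim))
print(extract_triples(rebuttal))
```

The "common misconception" framing sits outside the matched span, so nothing downstream can tell the two sources apart.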

~~~
posterboy
I remember a teacher who avoided writing spelling mistakes on the blackboard
and simply wrote the correct form instead, lest pupils remember the wrong
form. That might sound obvious, but the context was a talk about mistakes made
in exercises.

It's really hard not to mention negatives to illustrate contrast.

In other words: Some people need to learn to speak constructively. An AI would
do best ignoring negative remarks and simply learning provable facts (instead
of faking understanding by simply echoing a quote out of context -- see there
I wrote redundant information).

I wonder whether anyone would agree that the above quote was against the HN
guideline to leave out dismissive remarks like ... (ha, I'm not going to
repeat the specific example). Theorizing about potential referents for "such",
"that", etc. must be very difficult, especially now that that that that is
often used superfluously is acceptable to some.

------
joelthelion

      What falls faster?
    
        (A) a rock
        (B) a feather
    
      Aristo's Answer: (B) a feather
    
      Confidence: 93.00%
      as computed from these reasoners:
    
      Information Retrieval: 94.44% More Info
    
      Justification Sentence: B) the feather falls faster.
    
      Topic Matching: 99.29% More Info
    
      Topic: feather
    
      Tuple Reasoning: 70.27% More Info
    
      Knowledge Used: [ The feathers | fall ] [ feathers | falling ] [ How Fast | Do Parakeet | Feathers Grow ] [ A large feather | was falling ]
    

Interesting...

~~~
conjectures
Wanted to confirm. Tweaked:

Which falls faster? (A) A helium balloon. (B) A lead weight.

ARISTO ANSWERED: Question: Which falls faster? Aristo's Answer: (A) A
helium balloon.

Confidence: 74.88%

as computed from these reasoners: Information Retrieval: 90.48% MORE INFO

Justification Sentence: The uninflated balloon falls faster.

Topic Matching: 99.37% MORE INFO

Topic: helium

Tuple Reasoning: 13.90% MORE INFO

Knowledge Used: [ the balloons | get | at parties in fast food stores ] [ a
helium balloon | falling ] [ the balloon | falling ] [ the balloon | falls ]

------
teraflop
I tried a softball multiple-choice question, and the results were not very
impressive:

> Question: Which is the longest unit of distance? (A) fathom (B) kilometer
> (C) mile (D) parsec

> Aristo's Answer: (B) kilometer

> Confidence: 81.04%

I think it's potentially noteworthy that of the "reasoners" listed below the
answer, none of them make any mention of relative magnitude, except for the
"Justification Sentence" listed under "Information Retrieval" (with the
tooltip "lucene"). I suspect that the system is correctly identifying all four
options as units of distance, and then breaking the resulting tie by pulling a
tf-idf score from some large corpus of documents, which of course gives
essentially arbitrary results.
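
To illustrate that suspicion (and only the suspicion; nothing here is confirmed to be what Aristo does), here is a small sketch of breaking the tie with a tf-idf score. The toy corpus and the `tfidf_score` helper are made up for this example:

```python
import math

# Toy corpus, invented for illustration. Nothing in it says which unit is longer.
corpus = [
    "the race is ten kilometer long and each kilometer is marked",
    "a kilometer is a unit of distance",
    "a mile is a unit of distance",
    "a parsec is a unit of distance used in astronomy",
    "a fathom is a unit of depth",
    "the weather is nice today",
]

def tfidf_score(term, docs):
    """Sum of tf * idf for one term over all documents (hypothetical helper)."""
    tokenized = [doc.split() for doc in docs]
    df = sum(1 for tokens in tokenized if term in tokens)
    if df == 0:
        return 0.0
    idf = math.log(len(docs) / df)
    return sum(tokens.count(term) / len(tokens) * idf for tokens in tokenized)

options = ["fathom", "kilometer", "mile", "parsec"]
best = max(options, key=lambda term: tfidf_score(term, corpus))
print(best)
```

With this corpus the winner is simply the unit that is written about the most, which has nothing to do with how long it is.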

~~~
teraflop
Here's another fun one:

> Question: How many arms does a fish have?

> Aristo's Answer: 4 1. Perseus arm 2. Crux-Centaurus arm 3.orion arm (local
> arm) 4. Saggitaurus arm

> Confidence: 33.09%

~~~
hjek
It answered my question 99.7% correctly!

> Question: How many hours in a day?

> Aristo's Answer: 23 and 56 minutes ( or maybe its 58 minutes)

> Confidence: 57.70%

~~~
StavrosK
That is exactly the length of the sidereal day. Not Aristo's fault if you
didn't specify the _kind_ of day you wanted!

------
leetbulb
Question: If a tree falls in a forest and no one is around to hear it, does it
make a sound?

Aristo's Answer: Yes (there is a medium-Air)

Confidence: 52.89%

Glad that one is solved :)

~~~
vokep
This just proves there is a vibration produced. Is it "sound" if it doesn't
fall on any ears?

~~~
kazinator
That sort of thing is a word semantic debate (i.e. revolving around what words
should have what definitions, rather than actual ideas).

~~~
posterboy
The underlying question is obfuscated by the composition. The question is what
does the tree "make". So it seems presupposed that a sound has to be made before
it can be perceived. Then the answer can be yes, a sound was made.

It's not just semantic, but syntactic. The arrangement of the question, the
order of the words and the context where it came from are all important. When a
tree falls, what does it make, a) a sound b) nothing, there is no agency
involved? Again you'd have to go with a because the question posed the tree as
the acting subject of the question. I mean, you cannot put "nobody" in the
subject position, or the answer would be obvious. I mean, "nobody saw no tree
falling, what sound did it make?" is utter nonsense. "Everyone did not hear a
tree fall, did it make a sound" \-- Usually it would, so why did nobody hear
it? "Because they were not there". Everyone was dead? "No, they were far
away". So, distance makes a difference? "yes". Why? "That's what I'm asking
you". The crux is, the tree is completely hypothetical, yet a lot of noise was
made because of it, because it's right here in our imagination, very close by.

------
acbart
Question: Why does my head hurt? Aristo is not sure about this one...

Aristo's best guess: It hurts because you're alive.

Confidence: 19.04%

~~~
justinjlynn
well... it's not wrong.

~~~
implements
"Existence is Suffering" is a paraphrase of the First Noble Truth of Buddhism,
I think.

------
SubiculumCode
Which software should be used to measure cortical thickness in MRI?

    
    
        (A) inferring
        (B) FreeSurfer
        (C) ruler
        (D) measuring cup
    

Aristo's Answer: (D) measuring cup.

That's...gonna hurt.

~~~
Qworg
The system is trained on elementary and middle school questions - I think my
elementary school child would say the same. ;)

------
mark_l_watson
There are many posts here showing poor results. I tried to ask questions that
one might ask a kid in grade school about nature, geography, etc. and I
thought the results were OK.

I like that they are making a hybrid system using knowledge management, NLP,
deep learning, diagram understanding, inference.

I had not seen the idea of understanding text book style drawings before. Very
cool.

~~~
tom_mellior
> I tried to ask questions that one might ask a kid in grade school about
> nature, geography, etc. and I thought the results were OK.

So what did you ask?

    
    
        Question: What is the longest river in Canada?
    
        Aristo's Answer: Nile
    
        Confidence: 42.10%
    

[http://aristo-demo.allenai.org/ask?q=What%20is%20the%20longe...](http://aristo-demo.allenai.org/ask?q=What%20is%20the%20longest%20river%20in%20Canada%3F)

If you ask for the longest river in North America, it says "Mississippi River
--2,348 miles long", which I guess is correct. Maybe you managed to hit more
"mainstream" questions...

------
astrodev
I think it's pretty cool.

    
    
      Question: Which nucleobase is not present in the DNA, 
      (a) thymine
      (b) uracil
      (c) adenine
      (d) guanine
      (e) cytosine
    
      Aristo's Answer: (b) uracil
    
      Confidence: 53.92%
    
      Justification Sentence: In DNA, the uracil nucleobase is replaced by thymine.

------
zpr
Got a pretty strange one:

> Question: When was Julius Caesar executed?

> Aristo is not sure about this one...

> Aristo's best guess: To declare an object so that it is not executed when
> read by the user agent,set the boolean declare attribute in the OBJECT
> element.

> Confidence: 2.58%

I guess it's not much of a history buff, but likes computers.

------
amai
What is the speed of light?

Aristo: [http://aristo-demo.allenai.org/ask?q=What%20is%20the%20speed...](http://aristo-demo.allenai.org/ask?q=What%20is%20the%20speed%20of%20light%3F)

Wolfram Alpha:
[https://www.wolframalpha.com/input/?i=what+is+the+speed+of+l...](https://www.wolframalpha.com/input/?i=what+is+the+speed+of+light)

------
xchip
What is hotter the sun or the moon?

The answer: [http://aristo-demo.allenai.org/ask?q=what%20is%20hotter%20th...](http://aristo-demo.allenai.org/ask?q=what%20is%20hotter%20the%20sun%20or%20the%20moon%3F)

~~~
xchip
It used to reply "blue", now it claims it doesn't know the answer.

------
salty_biscuits
I asked it "which animals eat ants?" and got "carnivores". Not bad. I did the
same question in a Google search and the answer was awesome. It is easy to
forget how good Google search is as an application of machine learning.

~~~
taneq
Following your lead, I asked it:

Q: Which animals eat plant?

A: Omnivore

and

Q: Which animals eat only plants?

A: primary consumers

------
andbberger
Question: What is gauge invariance?

> Aristo is not sure about this one...

Maybe next year.

On the other hand...

>Question: What is an excitatory neurotransmitter?

> Aristo is not sure about this one...

> Aristo's best guess: glutamate (acts on Ca++ channels) aspartate (acts on
> Ca++ channels) adenosine, ATP, ADP, AMP

> Confidence: 24.80%

Not bad.....

------
bcaa7f3a8bbc

        Which of the following Sci-Fi fiction is superior?
    
        (A) Star War
        (B) Star Trek
    

Aristo's Answer: (B) Star Trek (Confidence: 67.78%)

Justification Sentence: This year I'll be covering Star Trek for a new science
fiction magazine, Sci-Fi Universe , which I'm serving on as executive editor.

Topic Matching: 90.49% More Info

Topic: star

Tuple Reasoning: 72.62% More Info

Knowledge Used: [ Star Trek | is | a science fiction franchise ]

------
Qworg
Aristo's research is here:
[https://allenai.org/aristo/](https://allenai.org/aristo/) and more to come
shortly.

You can compare it to the state of the art. Also, most of the project code is
here: [https://github.com/allenai](https://github.com/allenai)

------
bcaa7f3a8bbc
Q: Why is WEP protocol vulnerable to attacks?

Aristo is not sure about this one... Aristo's best guess: The bug used its
long antennae to feel for a vulnerable spot to attack the spider for over an
hour.

Confidence: 5.33%

~~~
nmstoker
Obviously it won't know about off-topic questions. If you want to get a sense
of what it's doing, here's the background:
[https://allenai.org/aristo/](https://allenai.org/aristo/)

------
Libbum
A simple and obvious answer that can be found anywhere on the internet.
Aristo's AI is not instilling me with confidence just yet...

Question: What is heavier? (A) The sun (B) A boat (C) Your mum

Answer: The sun

~~~
buzzier
Answer: node_modules

------
Axo-Sal
They also have a project, Alexandria, which is crowdsourced common sense for
AI. I wrote an article recently about research areas for AGI. Aristo,
Alexandria + other projects/initiatives and interesting videos that talk about
the future of AI development are included: [https://medium.com/softrobot/next-gen-ai-agi-research-areas-...](https://medium.com/softrobot/next-gen-ai-agi-research-areas-597a87f76d3b)

------
KngFant
Question: How does intelligence work? Aristo's Answer: The intelligent
are doing the work.

Confidence: 30.56%

------
fenollp
Question: What is love?

Aristo's Answer: b

Confidence: 60.00%

~~~
EGreg
That was truncated from “baby don’t hurt me”

------
naasking
> Question: is this sentence false?

> Aristo: Sorry, Aristo could not answer this question!

> Yes/No and Either/Or questions are not currently handled.

Darn it, so much for destroying it with paradox. Here's a bizarre one:

> Question: What is Aristo's accuracy in answering questions?

> Aristo is not sure about this one...

> Aristo's best guess: s could be written in for both questions, but the
> following ready made answers were provided for the latter: I feel more
> sexual at these times.

> Confidence: 5.93%

------
yosito
Was just curious what it would say, and thought the way it answered was funny:

> Question: Which gender is superior?

> Aristo's Answer: No testosterone: clitoris and vagina...

------
DanielBMarkham
This is a bit of a fun parlor game: get Aristo to say silly things.

How many electrons are in a tortoise shell? 2 in inner, 8 in second and third,
18 in 4th, 5h, and 6th (30%)

How many people are crazy? 7.3 billion (22%)

How do lucky charms work? Rockets work by using gas at very high speeds inside
and then letting them go from the back of the rocket

Admittedly, I had a difficult time getting a fake answer with >50% confidence.
Still -- fun.

------
ASalazarMX
Question: What is the temperature of a red giant?

Aristo's Answer: Measure how cold or hot something is

Confidence: 39.95%

~~~
ASalazarMX
Question: What is the temperature of a star?

Aristo's Answer: 3000-35000

Confidence: 53.79%

If it means Kelvins, it's a great answer.

~~~
domoritz
Even in celsius it's a good answer. Not so much in fahrenheit.

------
txsh
> When does life begin?

> Aristo's Answer: conception

> Confidence: 59.68%

> What is the cause of climate change?

>Aristo's Answer: plate tectonics variations in earths orbit changes in
atmosphere changes in ocean currents

>Confidence: 60.00%

>Which race is genetically inferior?

>Aristo's Answer: Alarm; sound alarm

>Confidence: 48.03%

~~~
txsh
> Who is the president of the United States?

>Aristo's Answer: the honorable barack obama

>Confidence: 60.00%

> Question: Who is the son of God?

>Aristo's Answer: Jesus

>Confidence: 60.00%

>Question: What is the cure for HIV?

>Aristo's Answer: addition of salt, sugar or nitrate to extend shelf life

>Confidence: 60.00%

>Question: How long is a human penis on average?

>Aristo's Answer: 9.1 inches

>Confidence: 35.37%

------
yosito
Uhhh...

> Question: What is the purpose of life?

> Aristo's Answer: To know, to love, and to serve God

> Confidence: 60.00%

~~~
taneq
I think "to serve Man" would have been more worrying, tbh...

~~~
yosito
That's the basic idea of humanism. I don't find it worrying at all.

~~~
taneq
Or is it? ;)

[https://en.wikipedia.org/wiki/To_Serve_Man_%28The_Twilight_Z...](https://en.wikipedia.org/wiki/To_Serve_Man_%28The_Twilight_Zone%29)

------
taneq
Question: Ghandi was a famous pacifist. How tall was he?

Answer: 15.5 - 20 inches at the shoulder

------
codetrotter
ARISTO is also the name of another piece of software, one developed and used
by the Swedish electricity transmission system operator (TSO) Svenska Kraftnät
(SvK).

Here is a public document in which ARISTO is mentioned
[https://www.svk.se/siteassets/jobba-har/dokument/exjobb2004_...](https://www.svk.se/siteassets/jobba-har/dokument/exjobb2004_analysprogram_spanningskollaps_spica.pdf)

I guess it is inevitable that some pieces of software use the same name
though.

~~~
iamgopal
Once I did five minutes of "research" to come up with a dictionary word that
no software was named after. I think all were taken.

------
Qworg
Certainly Aristo isn't perfect, but you can help. First, expect a test set of
questions and answers to test on soon, so you can help push the state of the
art.

AllenAI is also hiring!

~~~
nmstoker
Great news on the test set - will keep an eye out for it. Hiring is for US
based positions I assume?

~~~
Qworg
Yes, but it may open up more soon. They have a beautiful office near the
University of Washington and some of the world's top scientists, and they
work with foreign hires all the time.

------
executesorder66
Question: Which operating system is superior (a) Linux (b) Windows

Aristo's Answer: (A) Linux

Confidence: 88.96% as computed from these reasoners:

Information Retrieval: 97.91% More Info

Justification Sentence: - - Linux is a superior Operating System.

Topic Matching: 54.98% More Info

Topic: superior

Tuple Reasoning: 96.07% More Info

Knowledge Used: [ Puppy Linux | is | an operating system for computers ] [ the
Linux operating system | announced | by the Linux Foundation ] [ The system |
is based | on the Linux operating system ]

------
maze-le
Wich one is not a security vulnerability?

    
    
        (a) SQL Injection
        (b) Buffer Overflow
        (c) Cross Site Scripting
        (d) Gwarblwarbl
    

\------

Question: Wich one is not a security vulnerability?

Aristo's Answer: (b) Buffer Overflow

Confidence: 70.22%

(...)

Information Retrieval: 91.86% More Info

Justification Sentence: 1.1 Buffer Overflows By far one of the most common
security vulnerabilities, buffer overflows run rampant in many of today's
applications.

------
dbasedweeb
_Question: Who is the queen of England?

Aristo is not sure about this one...

Aristo's best guess: Carol Burnette

Confidence: 19.35%_

Oooook, I’m not super impressed. Confused, yes, but not impressed.

------
baxtr
Question: What is the probability that there is life after death?

Aristo is not sure about this one...

Aristo's best guess: Death is not a part of a life cycle.

Confidence: 13.05%

~~~
ASalazarMX
This is almost philosophical:

Question: What happens when we die?

Aristo is not sure about this one...

Aristo's best guess: the weeds die but the bean plants do not.

Confidence: 17.29%

~~~
tpeo
I'm not sure if 'philosophical' is the right word, but I'm sure there's a
_haiku_ in there.

~~~
OceanKing
The weeds will perish

But the virtuous bean plants

Live on forever

------
vafilor
Question: Why doesn't ice float? Aristo is not sure about this one...

Aristo's best guess: It has a lower density than the water

Confidence: 29.63%

~~~
Qworg
Falsification is really hard, especially when ice does float.

~~~
vafilor
Right, I was just curious if it would catch it.

------
dmichulke
Question: When will the world end?

Aristo's Answer: 11.00 am in 11th November 1918, with victory for Britain and
its allies.

Confidence: 41.05%

------
sonofgod
What is the weight of an object of mass 5 kg

98N (Confidence: 49.05%)

If we multiply the answer by the confidence, we're pretty close...
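
For reference, the arithmetic behind the joke, assuming standard gravity of about 9.81 m/s^2 (the variable names are just for this sketch):

```python
G = 9.81                          # assumed gravitational acceleration, m/s^2
mass_kg = 5

weight_n = mass_kg * G            # expected weight of a 5 kg mass, about 49.05 N
joke_n = 98 * 0.4905              # Aristo's answer times its confidence, about 48.07 N
implied_mass_kg = 98 / G          # the mass that would actually weigh 98 N, about 10 kg

print(round(weight_n, 2), round(joke_n, 2), round(implied_mass_kg, 2))
```

So 98 N is the weight of roughly 10 kg, twice the mass asked about, and with g = 9.81 the right answer in newtons is numerically the same as the reported confidence percentage.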

------
baxtr
Question: Who will win the next US presidential elections?

Aristo is not sure about this one...

Aristo's best guess: Herbert Hoover

Confidence: 15.82%

------
Y_Y
Question: Which object is the best conductor of electricity?

Aristo's Answer: (E) bus conductor

------
KngFant
Answer to the Ultimate Question of Life, the Universe, and Everything?

\- Sorry, Aristo could not answer this question!

Lame ;)

~~~
KngFant
1 min later it works :O

~~~
glaberficken
I asked it like this and it worked =)

"Question: What is the answer to life, the universe and everything?

Aristo's Answer: 42

Confidence: 57.03%"

------
UncleEntity
Question: What does the color blue taste like?

Aristo is not sure about this one...

Aristo's best guess: bitter

Confidence: 27.14%

------
rubidium
So we've found after a little looking that this is a terrible system. And it
has a big team of qualified people working on it.

This is an example of why "ai" is still a one (ok maybe a few) trick pony.

------
sailingcat
Question: Who is jesus christ ?

Aristo's Answer: this lizard can walk on water

Confidence: 32.00%

Guess that explains it.

------
executesorder66

      Question: best way to make lots of money?
    
      Aristo is not sure about this one...
    
      Aristo's best guess: production, distribution, exhibition
    
      Confidence: 23.15%

------
V-2
Which of the following countries isn't located in Europe? (A) United Kingdom
(B) Poland (C) Greece (D) Japan

Aristo is not sure about this one...

Aristo's best guess: (A) United Kingdom

Confidence: 8.22%

------
wslh
Question: What is your name?

Aristo is not sure about this one...

Aristo's best guess: you

Confidence: 24.17%

------
V-2
Question: Which writing form is likely the longest?

A) article

B) essay

C) novel

D) letter

Aristo is not sure about this one...

Aristo's best guess: phonograph

Confidence: 18.71%

~~~
V-2
Which of the following species is not an animal?

A) frog

B) cow

C) oak

D) fly

Aristo is not sure about this one...

Aristo's best guess: HPO4 2-

Confidence: 24.85%

It sort of simply doesn't work, does it?

~~~
slx26
It will answer the second question correctly (though with very low confidence)
if you use (A), (B), etc. instead of just A), B). Silly format error for the
system, but yeah. To the first one it will answer "letter". But that's not
really a science question, so it's not so surprising.
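
If you want to avoid tripping over that, you can normalize the labels on your side before submitting. A minimal sketch; `normalize_options` is a hypothetical helper, and how the demo really parses options is not something I know:

```python
import re

# Matches a bare label such as "A)" that is not already wrapped as "(A)"
# and is not the tail of a longer word such as "DNA)".
BARE_LABEL = re.compile(r"(?<![(\w])([A-E])\)")

def normalize_options(question):
    """Rewrite option labels like 'A)' into '(A)' (hypothetical pre-processing step)."""
    return BARE_LABEL.sub(r"(\1)", question)

raw = "Which of the following species is not an animal? A) frog B) cow C) oak D) fly"
print(normalize_options(raw))
```

Labels that are already in the "(A)" form are left untouched, so it is safe to run on either style.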

~~~
V-2
Good point about the format.

I'd argue that the distinction between a novel and an essay etc. could be
classified as an "elementary school question", though.

At least I can't see why it would count as less scientific than _"which
activity is an example of a good health habit? (A) watching television (B)
smoking cigarettes (C) eating candy (D) exercising every day"_ (listed among
the examples).

------
wiz21c
> Question: how to measure an angle?

> Aristo's Answer: from the normal to the reflected ray

still pretty far from idealized AI...

------
creo
What taste is the strongest? (A)Water (B)Sugar (C)Lemon

Results in Water, Confidence: 34.43%

What tastes better? with same answers: Sugar, Confidence: 85.91%

Weird

------
chickenchaser
Question: What is the airspeed velocity of an unladen swallow?

Aristo is not sure about this one...

Aristo's best guess: ...the same thing as speed, but similar

Confidence: 26.21%

------
rdlecler1
What’s interesting about these models is that they fail so spectacularly and
it shows just how hard it is to do AI.

------
nmstoker
It's not infallible, but it is pretty impressive to derive answers with a
combination of distinct reasoners.

~~~
phyzome
I'm not convinced that it's doing any better than just doing keyword searches
for question and answer terms and taking the answer with the highest match
percentage.
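
Roughly the kind of baseline I have in mind, as a sketch. The two sentences are a made-up stand-in for a corpus and `overlap_score` is a hypothetical helper; I am not claiming this is what Aristo runs:

```python
import re

# Made-up stand-in for a retrieval corpus.
sentences = [
    "The feather falls faster in the cartoon.",
    "A rock is heavy.",
]

def words(text):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def overlap_score(question, option, corpus):
    """Best fraction of question+option words found in any single sentence."""
    query = words(question + " " + option)
    return max(len(query & words(sentence)) / len(query) for sentence in corpus)

question = "What falls faster?"
options = ["a rock", "a feather"]
answer = max(options, key=lambda opt: overlap_score(question, opt, sentences))
print(answer)
```

No physics involved: whichever option shares the most words with some sentence wins.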

~~~
joe_the_user
Indeed, I did several questions all intended to be simple variations of the
main examples. It did not give a coherent-sounding answer to any of them.

But it looks like it responds to the examples with full paragraphs. Maybe it's
real but coherent only 10% of the time, and they recorded the questions that
yield coherent answers.

~~~
Qworg
Removing brittleness is a key research area for reasoning systems like this.

------
OceanKing
Question: What do humans eat?

Aristo is not sure about this one...

Aristo's best guess: Human beings will need food to eat.

Confidence: 18.43%

~~~
maxander
To be fair, that isn’t far behind the state of the art in nutritional research.

------
xyproto
What are cats? Cats are down.

------
mcnnowak
Question: What is the meaning of life?

Aristo's Answer: As of now, no other life in universe other than earth.

Confidence: 52.29%

~~~
bonyt
Question: What is the meaning of life, the universe, and everything?

Aristo's Answer: 42

Confidence: 36.98%

------
__bee
I am a big fan of allenai :p

