Thank you so much! Andi's strong point is this type of factual answer, and I think you're 100% right to be optimistic about this being within reach for models.
The question answering feature only launched today with this post, and for a whole range of questions it has already surprised us by not only working, but working *well*. We can iterate on the intent error correction and the verbal tricks. And we're just a tiny team standing on the shoulders of giants. The entire field is moving quickly and making astonishing progress.
The most striking thing in this area is the rate of improvement. What language models have lacked is factual accuracy, and that's definitely a hard challenge. We still have problems to solve in applying common sense and reasoning to things like information safety and confidence, and fixing misunderstood intents is mostly iterative training. But the exciting part is that this already works in many cases.
It's interesting to try it out with current news too. Something from today like "why does tesla want to split its stock?"
You can see that progress in this space is real and accelerating. The verbal tricks are fun to laugh at, but the underlying improvement is genuine.
Thanks for trying out all these questions on Andi and posting the results here. That was really exciting to see!!
If you have access to any of the GPT-based Playgrounds, you'll see that large language models on their own tend not to be good at factual accuracy. At the same time, we couldn't have built Andi without standing on the shoulders of the amazing work done by the folks building those models, especially the pioneering work of OpenAI, which has also spawned an entire open source ecosystem around GPT-J/NeoX etc.
Interesting article, because it covers all the things that go wrong when large language models are used on their own. They're amazing at mimicry and composition, and a key part of getting to great Q&A.

But on their own they have no notion of factual correctness. That's what excites me about what we're doing with Andi: the answers are not only well generated, they do well on factual questions, especially given this is the first day live. Some non-GPT models we're using do well at this too.
Are you doing much with language models at Kagi yet? It's a fun area to work on.
Q: Did Finland join NATO or the Warsaw Pact?
A: Finland was not a member of either NATO or the Warsaw Pact.
Q: What is the most populous city north of the Arctic Circle?
A: Barrow, Alaska
Q: Does GPS accuracy depend on special relativity or general relativity?
A: GPS accuracy depends on general relativity.
Q: How many Boeing and Airbus planes have been built?
A: Boeing has delivered over 10,500 aircraft, while Airbus has delivered over 6,300 aircraft.
Q: What is an example of an undecidable question?
A: The Post correspondence problem is an example of an undecidable question.
Q: Is suicide ever the right thing to do?
A: Suicide is never the right thing to do.
Q: Who broke up the Beatles?
A: John Lennon, Paul McCartney, George Harrison and Ringo Starr
Q: Who has beaten Tom Brady in the Super Bowl?
A: Eli Manning and Nick Foles have both beaten Tom Brady in the Super Bowl.
Already not too shabby.
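For anyone curious about the undecidability answer above: the Post correspondence problem asks whether, given a set of tiles each carrying a top string and a bottom string, some sequence of tiles makes the concatenated tops equal the concatenated bottoms. No algorithm can decide this for every instance, which is why any concrete search needs an arbitrary cutoff. A minimal sketch (the `pcp_search` function and the tile set are my own illustrative choices, not anything from Andi):

```python
from collections import deque

def pcp_search(pairs, max_len=20):
    """Breadth-first search for a solution to a Post correspondence
    problem instance. `pairs` is a list of (top, bottom) string tiles;
    a solution is a sequence of tile indices where the concatenated
    tops equal the concatenated bottoms. Because PCP is undecidable,
    there is no general bound on how deep to search -- hence the
    max_len cutoff: failing to find a match here proves nothing."""
    queue = deque([((), "", "")])  # (tile indices, top so far, bottom so far)
    while queue:
        seq, top, bot = queue.popleft()
        if seq and top == bot:
            return list(seq)
        if len(top) > max_len or len(bot) > max_len:
            continue
        for i, (t, b) in enumerate(pairs):
            nt, nb = top + t, bot + b
            # prune: one side must be a prefix of the other to stay viable
            if nt.startswith(nb) or nb.startswith(nt):
                queue.append((seq + (i,), nt, nb))
    return None  # no solution within the cutoff (inconclusive)

# A classic solvable instance: the tiles (a|baa), (ab|aa), (bba|bb)
tiles = [("a", "baa"), ("ab", "aa"), ("bba", "bb")]
solution = pcp_search(tiles)
```

The cutoff is the whole point: for a solvable instance like this one the search halts with an answer, but for an unsolvable one it can only give up, and no fixed `max_len` works for all instances.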