From what I can tell, this doesn’t re-seed the random number generator, so we shouldn’t expect deterministic results. A better test would be to examine the model’s logits, i.e. the probabilities it assigns to the next token, across different RAM sizes.
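Something like this rough sketch would do it, using the transformers library directly rather than this package (the LaMini-Flan-T5 checkpoint names are my assumption about what the RAM settings map to):

# Sketch: compare the distribution over the first generated token across model sizes.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

prompt = "What is the population of Chicago?"
for name in ["MBZUAI/LaMini-Flan-T5-77M", "MBZUAI/LaMini-Flan-T5-248M"]:
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSeq2SeqLM.from_pretrained(name)
    enc = tok(prompt, return_tensors="pt")
    # Feed only the decoder start token to score the first output token
    start = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(**enc, decoder_input_ids=start).logits[0, -1]
    top = torch.topk(torch.softmax(logits, dim=-1), k=5)
    print(name, [(tok.decode(int(i)), round(float(p), 3))
                 for p, i in zip(top.values, top.indices)])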
>>> lm.do("What is the population of Chicago")
'As of 2021, the population of Chicago is approximately 8.4 million.'
>>> lm.do("What is the population of Shenzhen")
'As of 2021, the population of Shenzhen is approximately 1.3 million people.'
>>> lm.do("What is the wavelength of blue light.")
'The wavelength of blue light is approximately 299,792,458 meters per second.'
>>> lm.do("What is YCombinator")
'YCombinator is a programming language used to combine two or more languages into a single program.'
>>> lm.do("What is asphalt made of?")
'Asphalt is made of sand, gravel, and other materials.'
>>> lm.do("What is the square root of 2")
'2.'
>>> lm.do("How do I get to New York City from California?")
'You can get to New York City from California by taking a bus or train.'
>>> lm.do("How can I unlock a lock without a key")
'You can use a combination of keys and a password to unlock a lock without a key.'
>>> lm.do("How long should rice be cooked.")
'The recommended cooking time for rice depends on the type of rice, but generally it should be cooked for about 8-10 minutes per pound.'
100% wrong.
This is an automated version of the Dunning-Kruger effect. You can ask it anything, and get back a confident wrong answer. So far, it hasn't replied to any question of mine with an indication that it doesn't know.
It's a nice demonstration of the hallucination problem with LLMs. With a small data set, the results are usually bogus, but that's not detected.
You can actually get these models to do this, but you have to ask:
>>> lm.do(f"Answer from the context: What is YCombinator? {lm.get_wiki('Python')}")
'The context does not provide information about YCombinator.'
>>> lm.do(f"Answer from the context: What is YCombinator? {lm.get_wiki('YCombinator')}")
'YCombinator is an American technology startup accelerator that has launched over 4,000 companies, including Airbnb, Coinbase, Cruise, DoorDash, Dropbox, Instacart, Quora, PagerDuty, Reddit, Stripe and Twitch.'
Without being told to be grounded, the model will guess. However, it may be able to identify information not available in a provided context.
One of my goals for this package is to provide a way for folks to learn about the basics of grounding and semantic search.
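As a rough illustration of the idea (a sketch only; sentence-transformers and the embedding model are my choices here, not necessarily what the package uses internally):

# Sketch: pick the most relevant document by cosine similarity, then ground on it.
from sentence_transformers import SentenceTransformer, util

docs = [lm.get_wiki("Python"), lm.get_wiki("YCombinator")]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, convert_to_tensor=True)

query = "What is YCombinator?"
query_vec = embedder.encode(query, convert_to_tensor=True)
best = int(util.cos_sim(query_vec, doc_vecs).argmax())

print(lm.do(f"Answer from the context: {query} {docs[best]}"))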
> Without being told to be grounded, the model will guess.
Right. I understand why, but I consider the underlying technology flawed unless there's some way to reject wildly wrong results. What's going on here looks like noise fed through layers that generate plausible-looking text from it. Is it possible to detect that you're not far enough above the noise threshold to generate anything useful?
>>> lm.do("What is Ycombinator? Do not guess.")
'Ycombinator is a mathematical formula that states that the sum of two integers multiplied by one are equal to zero.'
LLM stands for “large language model”. It’s producing strings of text that are statistically similar to sequences in the training data. External reality isn’t relevant. It’s at least possible to run the result through a confidence filter, but that would be a feature of the tooling, not the model.
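Such a tooling-level filter might look something like this sketch (the checkpoint name and threshold are illustrative assumptions, not anything this package does):

# Sketch: abstain when the average token log-probability is low.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "MBZUAI/LaMini-Flan-T5-248M"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

def answer_or_abstain(prompt, threshold=-1.5):
    enc = tok(prompt, return_tensors="pt")
    out = model.generate(**enc, max_new_tokens=64,
                         return_dict_in_generate=True, output_scores=True)
    # Per-token log-probabilities of the tokens actually chosen
    scores = model.compute_transition_scores(out.sequences, out.scores,
                                             normalize_logits=True)
    if scores.mean().item() < threshold:
        return "I don't know."
    return tok.decode(out.sequences[0], skip_special_tokens=True)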
Is there theoretically a method to have an LLM not hallucinate and just say “I don’t know”, or answer only questions from a certain domain of knowledge on which it is well trained?
Could a business ever trust an LLM-based chatbot as much as the old-school kind, where certain questions reliably give certain answers and the bot fails when it doesn’t know?
> Is there theoretically a method to have an LLM not hallucinate and just say “I don’t know”, or answer only questions from a certain domain of knowledge on which it is well trained?
That is a very good question. What Google returns for "LLM hallucination" mostly describes post-processing hacks to detect the problem, or pre-processing hacks to guide the LLM into using data directly relevant to the question asked. Nothing about changing the core LLM system to get some measure of confidence out of it.
The LaMini-Flan-T5 models are trained to follow instructions, not to recognize truth content. You could train a transformer like ERNIE or Vega (which lead the SuperGLUE leaderboard) on such challenging factual data, but don't expect mathematically correct results from the model alone. That's why you have LangChain with other APIs.
A few years ago, being able to reply with fully-formed, grammatically correct sentences was impressive. So I'm still impressed, even though this bot is hallucinating on mushrooms.
Even the largest LLMs with huge data sets do this. I asked GPT-4 this morning, very simply, what relation Charles I was to Henry VIII, and it said he was his great-great-grandson, despite Henry VIII's children famously all dying childless.
It's so easy to set up that it's worth playing with. Maybe you can find a use for it. It might be good enough for generating spam blog posts, clickbait, and ad copy at low cost.
At first glance I would be surprised if this works at all.
The README says it loads a significant amount of data the first time, 250 MB. None of the LLM weights I know of are less than several gigabytes in size.
It says it only requires 512 MB of RAM. None of the interesting LLMs I know of run in less than 6 GB of VRAM.
It says it uses no API keys, which is great, but that means inference is local, which I can't imagine works with the above constraints.
"This model is one of our LaMini-LM model series in paper "LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions". This model is a fine-tuned version of google/flan-t5-base on LaMini-instruction dataset that contains 2.58M samples for instruction fine-tuning"
Thanks for pointing that out. Classification is half-baked at the moment. It should ultimately restrict output to only the appropriate labels, but right now it is simply sampling.
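Restricting to the given labels could look something like this sketch (written against the transformers API rather than this package's internals; the checkpoint name is assumed):

# Sketch: score each candidate label under the model and return the most likely one.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "MBZUAI/LaMini-Flan-T5-248M"  # assumed checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)

def classify(text, labels):
    enc = tok(f"Classify this text as {' or '.join(labels)}: {text}",
              return_tensors="pt")
    scores = {}
    for label in labels:
        target = tok(label, return_tensors="pt").input_ids
        with torch.no_grad():
            # Lower loss means the label is more likely under the model
            scores[label] = -model(**enc, labels=target).loss.item()
    return max(scores, key=scores.get)

print(classify("I love this package!", ["positive", "negative"]))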
So does this download a Hugging Face model and run it locally? Is the Hugging Face library doing the inference or something? I can't see as much code as I expected.
'You have 8 apples.'
>>> lm.set_max_ram('4gb')
4.0
>>> lm.do("If I have 7 apples then eat 5, how many apples do I have?")
'I have 2 apples left.'
It's funny how it switched from "You" to "I" when the memory was increased.