Show HN: Explore large language models with 512MB of RAM (github.com/jncraton)
138 points by jncraton on June 17, 2023 | 33 comments



>>> lm.do("If I have 7 apples then eat 5, how many apples do I have?")

'You have 8 apples.'

>>> lm.set_max_ram('4gb')

4.0

>>> lm.do("If I have 7 apples then eat 5, how many apples do I have?")

'I have 2 apples left.'

It's funny how it switched from "You" to "I" when the memory was increased.


From what I can tell, this doesn’t re-seed the random number generator, so we shouldn’t expect deterministic results. A better test would be to examine the model’s logits, i.e. the probabilities it assigns to the next token, across different RAM sizes.
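For instance, here's a rough sketch of what that comparison could look like, going around the package and using Hugging Face transformers directly; the model names and prompt are just placeholders:

    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    prompt = "If I have 7 apples then eat 5, how many apples do I have?"

    for name in ["google/flan-t5-small", "google/flan-t5-base"]:
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = AutoModelForSeq2SeqLM.from_pretrained(name)
        inputs = tokenizer(prompt, return_tensors="pt")
        # Logits for the very first decoder step only
        start = torch.tensor([[model.config.decoder_start_token_id]])
        with torch.no_grad():
            logits = model(**inputs, decoder_input_ids=start).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        top = torch.topk(probs, 5)
        print(name, [(tokenizer.decode([int(i)]), round(float(p), 3))
                     for p, i in zip(top.values, top.indices)])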


In case anyone is wondering, this repo uses various fine-tuned Flan-T5 models.


And Flan-T5 models are awesome! FastChat-T5 is a fine-tuned 3B-parameter Flan-T5 model that can perform really well, comparably to LLaMA 13B.


That's correct. The current base model is an int8 quantization of LaMini-Flan-T5-248M described here:

https://github.com/mbzuai-nlp/lamini-lm

I shared more details over on Reddit:

https://www.reddit.com/r/LocalLLaMA/comments/14btk3a/explore...
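If you want to poke at the underlying weights outside this package, the unquantized base checkpoint can be loaded with a plain transformers pipeline (this sketch assumes the MBZUAI/LaMini-Flan-T5-248M upload and skips the int8 CTranslate2 conversion that the package actually serves):

    from transformers import pipeline

    # Unquantized base checkpoint; the package itself runs the int8
    # CTranslate2 conversion of this model instead.
    pipe = pipeline("text2text-generation", model="MBZUAI/LaMini-Flan-T5-248M")
    print(pipe("What is asphalt made of?", max_new_tokens=64)[0]["generated_text"])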


    >>> lm.do("What is the population of Chicago")
    'As of 2021, the population of Chicago is approximately 8.4 million.'
    >>> lm.do("What is the population of Shenzhen")
    'As of 2021, the population of Shenzhen is approximately 1.3 million people.'
    >>> lm.do("What is the wavelength of blue light.")
    'The wavelength of blue light is approximately 299,792,458 meters per second.'
    >>> lm.do("What is YCombinator")
    'YCombinator is a programming language used to combine two or more languages into a single program.'
    >>> lm.do("What is asphalt made of?")
    'Asphalt is made of sand, gravel, and other materials.'
    >>> lm.do("What is the square root of 2")
    '2.'
    >>> lm.do("How do I get to New York City from California?")
    'You can get to New York City from California by taking a bus or train.'
    >>> lm.do("How can I unlock a lock without a key")
    'You can use a combination of keys and a password to unlock a lock without a key.'
    >>> lm.do("How long should rice be cooked.")
    'The recommended cooking time for rice depends on the type of rice, 
    but generally it should be cooked for about 8-10 minutes per pound.'
100% wrong.

This is an automated version of the Dunning-Kruger effect. You can ask it anything, and get back a confident wrong answer. So far, it hasn't replied to any question of mine with an indication that it doesn't know.

It's a nice demonstration of the hallucination problem with LLMs. With a small data set, the results are usually bogus, but that's not detected.


You can actually get these models to do this, but you have to ask:

    >>> lm.do(f"Answer from the context: What is YCombinator? {lm.get_wiki('Python')}")
    'The context does not provide information about YCombinator.'
    >>> lm.do(f"Answer from the context: What is YCombinator? {lm.get_wiki('YCombinator')}")
    'YCombinator is an American technology startup accelerator that has launched over 4,000 companies, including Airbnb, Coinbase, Cruise, DoorDash, Dropbox, Instacart, Quora, PagerDuty, Reddit, Stripe and Twitch.'
Without being told to be grounded, the model will guess. However, it may be able to identify that information is not available in a provided context.

One of my goals for this package is to provide a way for folks to learn about the basics of grounding and semantic search.
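As a minimal sketch of that pattern, using only the calls shown above (lm.do and lm.get_wiki, with the package imported as in its README), a grounded query might be wrapped like this; the helper name and prompt wording are just for illustration:

    import languagemodels as lm

    def grounded_answer(question, topic):
        # Retrieve context first, then instruct the model to stay within it
        context = lm.get_wiki(topic)
        return lm.do(f"Answer from the context: {question} {context}")

    print(grounded_answer("What is YCombinator?", "YCombinator"))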


> Without being told to be grounded, the model will guess.

Right. I understand why, but I consider the underlying technology flawed unless there's some way to reject wildly wrong results. What's going on here looks like noise fed through layers that generate plausible-looking text from it. Is it possible to detect that you're not far enough above the noise threshold to generate anything useful?

    >>> lm.do("What is Ycombinator? Do not guess.")
    'Ycombinator is a mathematical formula that states that the sum of 
     two integers multiplied by one are equal to zero.'


LLM stands for “large language model”. It’s producing strings of text that are statistically similar to sequences in the training data. External reality isn’t relevant. It’s at least possible to run the result through a confidence filter, but that would be a feature of the tooling, not the model.
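As a sketch of what such a tooling-level filter might look like (not something this package provides), one could threshold the mean log-probability of the generated tokens; the model, threshold, and fallback text here are arbitrary choices:

    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

    def answer_with_confidence(prompt, threshold=-1.5):
        inputs = tokenizer(prompt, return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=64,
                             output_scores=True, return_dict_in_generate=True)
        # Per-token log-probabilities of the tokens that were actually chosen
        scores = model.compute_transition_scores(out.sequences, out.scores,
                                                 normalize_logits=True)
        text = tokenizer.decode(out.sequences[0], skip_special_tokens=True)
        if scores.mean().item() < threshold:
            return "[low confidence, no answer]"
        return text

    print(answer_with_confidence("What is YCombinator?"))

In practice a score like this mostly tracks fluency rather than truth, which is why it stays a post-processing hack rather than a real fix.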


Is there theoretically a method to have an LLM not hallucinate and just say “I don’t know”, or to answer questions based only on a certain domain of knowledge on which it is well trained?

Could a business ever trust an LLM-based chatbot as much as an old-school chatbot, where certain questions reliably give certain answers and it fails when it doesn’t know?


> Is there theoretically a method to have an LLM not hallucinate and just say “I don’t know”, or to answer questions based only on a certain domain of knowledge on which it is well trained?

That is a very good question. What Google returns for "LLM hallucination" mostly describes post-processing hacks to detect the problem, or pre-processing hacks to guide the LLM into using data directly relevant to the question asked, not changes to the core LLM system that would expose some measure of confidence.

Anyone working on this?


Have you also tried the bigger models? The smaller models are good for assisted generation: https://huggingface.co/blog/assisted-generation
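A rough sketch of assisted generation from that post, pairing a small assistant with a larger target model (both need far more than 512 MB; the model choices are just illustrative):

    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-xl")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-xl")
    # The small model drafts tokens that the large model then verifies
    assistant = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small")

    inputs = tokenizer("What is asphalt made of?", return_tensors="pt")
    outputs = model.generate(**inputs, assistant_model=assistant, max_new_tokens=64)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))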

The LaMini-Flan-T5 models are trained to follow instructions, not to recognize truthfulness. You could train a transformer like ERNIE or Vega (which lead the SuperGLUE leaderboard) on such challenging factual data, but don't expect mathematically correct results from the model alone. That's what LangChain and other APIs are for.


A few years ago, being able to reply with fully formed, grammatically correct sentences was impressive. So I'm still impressed, even though this bot is hallucinating on mushrooms.


Even the largest LLMs with huge data sets do this. I asked GPT-4 this morning, very simply, what relation Charles I was to Henry VIII, and it said he was his great-great-grandson, despite Henry VIII's children famously all dying childless.


Yeah, these aren't meant to know facts, just to parse language. They're good for understanding simple instructions to automate tasks using external tools.


The problem is not that the model doesn't have enough facts. It's that it has no clue what it doesn't know.

If "don't know" came out reliably, small models for specialist areas would be useful. If small models just make stuff up, they're useless.


Well, this kills my enthusiasm to actually play with it. That's bad enough that I'm not sure it can even be used as anything but a glorified Markov chain.


It's so easy to set up that it's worth playing with. Maybe you can find a use for it. It might be good enough for generating spam blog posts, clickbait, and ad copy at low cost.


It might be useful for writing fiction or summaries. Has anyone tried?


At first glance I would be surprised if this works at all.

The readme says it loads a significant amount of data the first time, 250 MB. None of the LLM weights I know of are less than several gigabytes in size.

It says it only requires 512 MB of RAM. None of the interesting LLMs I know of run in less than 6 GB of VRAM.

It says it uses no API keys, which is great, but that means inference is local, which I can't imagine works with the above constraints.


At 512 MB of RAM, it uses https://huggingface.co/jncraton/LaMini-Flan-T5-248M-ct2-int8 which says

"This model is one of our LaMini-LM model series in paper "LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions". This model is a fine-tuned version of google/flan-t5-base on LaMini-instruction dataset that contains 2.58M samples for instruction fine-tuning"


Those minified models are still equal to or bigger than the original "Attention Is All You Need" transformer.


Really cool, I didn't know CPU/traditional RAM was enough already.

Though, surprising results from repl.it: lm.classify("unabridged", "positive", "negative")=="unabridged is"


Thanks for pointing that out. Classification is half-baked at the moment. It should ultimately be restricting output to only appropriate labels, but right now it is simply sampling.
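One way that restriction could work (a sketch outside the package, using transformers directly; the model and prompt wording are placeholders) is to score each candidate label and pick the most likely one instead of sampling free text:

    import torch
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")
    model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

    def classify(text, labels):
        inputs = tokenizer(f"Classify as {' or '.join(labels)}: {text}",
                           return_tensors="pt")
        scores = {}
        for label in labels:
            target = tokenizer(label, return_tensors="pt").input_ids
            with torch.no_grad():
                # Lower loss means the label is a more likely continuation
                scores[label] = -model(**inputs, labels=target).loss.item()
        return max(scores, key=scores.get)

    print(classify("unabridged", ["positive", "negative"]))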


So does this download a Hugging Face model and run it locally? Is the Hugging Face library doing the inference or something? I can’t see as much code as I expected.


Shoot! I only have 511MB of RAM. Time to upgrade and get that AI!


Isn't it crazy that the entirety of human knowledge can be condensed down to fit on an SD card?


That's not what this is.


Isn't it crazy that the entirety of human stupidity can be condensed down to "isnt it crazy that <wildly wrong fact>"


A CD-ROM


Nothing new here, just yet another wrapper around the current language models, and weak ones at that.


Flan-T5 models perform really well considering their size. In my experiments and in some recent papers, they are very close to 13B LLaMA, for example: https://twitter.com/YiTayML/status/1668302949276356609


It's like the bar keeps getting lower and lower.



