Launch Lamini: The LLM Engine for Rapidly Customizing Models as Good as ChatGPT (lamini.ai)
123 points by sharonzhou on April 28, 2023 | 70 comments



Congrats! I went to your demo and asked for words that end in agi. This is what I got:

--

agi, agi, agi, agi, agi, agi, agi

These are some of the words that end in agi. You can also use the word agi in a sentence. For example, "I am going to the grocery store to get some agi."

These are some of words that end in agi.

These are some words that end in agi.

maximize, maximize, maximize, maximize, maximize, maximize, maximize, maximize

These are some words that ends in agi

--

So I think this needs more work to get to "as good as ChatGPT". But having said that, congrats on the landing


It's a fair criticism, and ChatGPT does better, but this isn't a great test of model quality. All LLMs that rely on tokenization struggle with being introspective about language. Try asking chatGPT to count how many e's are in a sentence, or to list all words that start with "to" and end with "de".

I haven't heard anyone describe the phenomenon clearly, but I expect it is a challenge with reasoning over both intent of the prompt and specific token IDs.


You can't ask ChatGPT to count something and expect that it can answer correctly, because it does not have counting logic. It is a language model, not a math model. People use this to "prove" hallucinations, but when you ask it something that is within its programmed abilities, you get something at least close to what you want.

Having said that, here are the words ChatGPT gave me for the same prompt:

Magi Nagi Sagi Yagi Adagi Galagi Tegagi Sigikagi Tagi Wagagi

It missed Unagi, surprisingly. But it is still leagues ahead of the response primordialsoup got from Lamini.


It's true that ChatGPT is not designed for counting and struggles with it in general.

But my point was that ChatGPT, like any tokenized LLM, doesn't even have the concept of letters. The prompt "how many e's in this sentence" is rendered as the tokens [4919, 867, 304, 338, 287, 428, 6827]. There just isn't a pathway for it to consider the letters that make up those tokens.

I'm a little surprised it did that well on your prompt, which is rendered as [10919, 2456, 886, 287, 556, 72]. The interesting thing here is that 556 = " ag" (with leading space) and 72 = "i". So I'm not sure how it got to those words. "Wagagi" is tokens [54, 363, 18013], so somehow it is seeing that token 18013 is what you get when you combine 556 and 72? That seems really weird.
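If anyone wants to poke at the token IDs themselves, here's a minimal sketch using OpenAI's tiktoken library (assuming a GPT-2-style BPE encoding; the exact IDs differ between models):

    import tiktoken

    # GPT-2 BPE for illustration; ChatGPT's actual tokenizer may differ.
    enc = tiktoken.get_encoding("gpt2")

    ids = enc.encode("how many e's in this sentence")
    print(ids)                             # the token IDs the model actually sees
    print([enc.decode([i]) for i in ids])  # the text fragment behind each ID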

I'd love clarification from someone deeper into LLMs and tokenization.


This is an excellent question. I wonder if it's something like [1] on letter composition rather than meaning.

[1] https://arxiv.org/pdf/1810.04882.pdf


In a prompt, can you just tell the model which letters make up each token? E.g. a list like "ag" = "a" + "g", etc. I imagine a dictionary of that for all tokens in the training data would help.


Maybe? Individual letters are tokens, so you could say something like 3128 = 56 + 129, but the problem is that 3128 is processed as text, not the integer token ID. So the tokenizer would turn 3128 into a series of tokens.

Intuitively I think there's an abstraction barrier there, but I'm not positive. It feels like asking us to list all of the words that trigger particular neurons.
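For what it's worth, building the kind of token-to-letters table proposed above is easy enough (sketch with tiktoken; whether feeding it back in as prompt text actually helps is exactly the open question):

    import tiktoken

    enc = tiktoken.get_encoding("gpt2")  # illustrative; pick the model's own encoding

    # Map every token ID to its spelled-out letters, e.g. 44579 -> "t o e".
    token_letters = {}
    for token_id in range(enc.n_vocab):
        text = enc.decode([token_id])
        token_letters[token_id] = " ".join(text.strip())

    first_id = enc.encode("toe")[0]
    print(first_id, "->", token_letters[first_id])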


ChatGPT does have counting logic. The math model is encoded inside of the language model.


This needs citation. These are not the same things. It will get numerical references right if it has sources used in the model, but it isn't doing any numerical calculations.


Just look at any papers that put models through mathematical benchmarks. The model isn't memorizing these problems. For example I just generated 2 random 64 bit integers and asked ChatGPT to add them.

"6769545085823578960 + 16027170449476717488"

ChatGPT said the answer is 22796715535300296448. It got the correct answer even though the problem wasn't in its training data.
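That sum is easy to check, since Python integers are arbitrary precision:

    # Verify the addition ChatGPT produced for the two 64-bit integers above.
    print(6769545085823578960 + 16027170449476717488)  # 22796715535300296448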


Yep, as always, people (and LLMs) take stuff for granted because they read it somewhere months ago. That’s why we are doomed; everyone believes anything without question if it’s not against their personal agenda.


This needs citation :) It does numerical calculations, at least in GPT-4 mode, tested. It can do simple arithmetic, and even has a sort of 'imagination', or an impression of it. I asked it to imagine a room with 4 colored balls at the corners. Then asked about the view angles between some pairs of balls as if looking from the center of the room, and from other balls. It gave the answers with explanations.

This doesn't mean it's always correct, or can be trusted without verification.


I can feed ChatGPT code that does calculations (and have) and have it calculate the right answers. It also gets it wrong a lot, so it's not good at that, but any notion that it can't do numerical calculations is easy to disprove.


well, here we go, ChatGPT in GPT-4 mode:

There are 7 instances of the letter 'e' in the sentence: "Try asking chatGPT to count how many e's are in a sentence."

another one:

The words with the letter 'e' from the sentence "Try asking chatGPT to count how many e's are in a sentence" are:

    asking
    sentence
and another, notice the last one:

Here are some English words containing three instances of the letter 'e':

    Nevertheless
    Extreme
    Relevance
    Precedence
    Residence
    Easement
    Demeanor
Please note that this is not an exhaustive list, but these examples should give you an idea of words with three 'e's in them.

What's surprising here is that it's still capable of writing hundreds of lines of Python code.
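For reference, a quick check of the quoted sentence (assuming that exact string is what it was asked about) gives 5, not 7:

    sentence = "Try asking chatGPT to count how many e's are in a sentence."
    print(sentence.count("e"))  # 5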


On the last list, the only word that does not comply with the constraints (having 3 'e's) is "Demeanor", which has only 2. Not great but also not as horrible as you make it sound.


Nevertheless, it got 2 out of 7 wrong, not just one.


"Nevertheless" does have 3 'e's. In fact, it even has one more than 3. So does "precedence".

It was never specified "exactly 3".


It's not a character-based model (likely - although it's closed source so anything is technically possible behind the scenes), so this makes some sense. The system can infer some relationships, which may be why 'agy' is interestingly conflated with 'agi', but the tokenization process yields sequences of 'symbols' or indexes rather than English letters - so the system has a more difficult task when asked about 'e's (probably something like token 4893) and has to determine which tokens (e.g. [358, 284840, 58292, 4830104, 57282, 4829193, 58282, 384, 24945]) contain 'e's or token 4893. None of them do directly, it seems - but 58292 may be 'ee' - so it would get this wrong as well.


The problem is that these models do not have any working memory they could use to carry out such tasks, which are on a meta-level when seen from a language perspective. They can only go with their 'gut instinct' for selecting the next word, they can't 'consider and ponder the problem internally' first.


The problem is that the input is tokenized before the model gets it as input. It does not see the individual letters "t" + "o". It gets one single token, #1462. The word "toe" is another single token, #44579. Maybe over time it could learn from context that inputs that start with #44579 also satisfy the constraint of starting with #1462, but that's a lot of work and it's not going to happen for all combinations of letters.


Perhaps try prompting the model to first describe its approach to answering the question. This type of chain-of-thought technique can yield better results.
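A minimal sketch of that kind of prompt with the OpenAI chat API (the model name and exact wording are just illustrative, not a recipe guaranteed to work):

    import openai

    openai.api_key = "sk-..."  # your key here

    # Chain-of-thought style prompt: make the model spell candidates out
    # letter by letter before committing to an answer.
    prompt = (
        "List five English words that end in 'agi'. "
        "First spell each candidate out letter by letter, then check that "
        "its last three letters are a-g-i, and only keep words that pass."
    )

    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    print(response["choices"][0]["message"]["content"])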


GPT-4 did just fine when I asked it to name words that end in agi so I don't think your argument holds


yeah as usual these models can barely sustain a conversation and fall apart the moment actual instructions are given. typical prompt they fail to understand:

"what is pistacchio? explain the question, not the answer."

all these toy llms: "pistacchio is..."

gpt is the only one that consistently understands these instructions: "The question "what is pistachio?" is asking for an explanation or description of the food item..."

this makes these llms basically useless for obtaining anything but hallucinated data.


It only makes them useless if you insist on asking them in ways you already know will produce bad results, instead of adapting your prompts.

This is a bit like complaining that your compiler refuses to produce the right outputs for code you've already determined is incorrect.


Asking LLMs about things they learned in training mostly results in hallucinations, and in general leaves you unable to tell how much they are hallucinating: these models are unable to reflect on their output, and average output token probability is a lousy proxy for confidence-scoring their results.

On the other hand, no amount of prompt engineering seems to make these LLMs able to do question answering over source documents, which is the only realistic way factual information can be retrieved.

You're welcome to bring examples of it tho if you're so confident.


I've had ChatGPT build a functioning website, write a DNS server, fill in significant portions of specs, all without the problems you describe. I'm never going back to doing things from scratch - it's saving me immense amounts of time every single day. The only reasonable conclusion is that the way you're prompting it is counterproductive.


Good thing then that I specifically mentioned gpt as being able to follow instructions and that I was specifically mentioning the other models.

You're welcome to demonstrate the same ability on other models tho.


You can get useful results out of a whole lot of them as long as you actually prompt them in a way suitable for the models. The point I made originally was that if you just feed them an ambiguous question, then sure, you will get extremely variable and mostly useless results out. Ironically,

And I mentioned ChatGPT because from context of your comments here it was unclear on first read-through what you meant. Maybe consider that it's possible your prompting is not geared for the models you've tried.

Not least, if you expect a model to know how to follow instructions when most of them have not been through RLHF, you're using them wrong. A lot of them need prompts shaped as a completion, not a conversation.


you're welcome to provide examples to prove your points.


I have nothing to gain from spending time testing models for you because whatever I pick will just seem like cherry picking to you, and it doesn't matter to me whether or not you agree on the usability of these models. They work for me, and that's all that matters to me. Try a few completions instead of a question. Or don't.


That's an interesting test. Here's what I got from ChatGPT:

---GPT-3.5---

Here are some words that end in "agi":

Strategy

Swarajya

Arthroplasty

Sialagogue

Podagric

Gynecology

Physiognomy

Ophthalmology

Esophagitis

Otalgia

--- GPT-4 ---

Here are some words that end in "agi":

Swaggy

Raggi

Magi

Gagi

Stagi

Please note that some of these words may not be commonly used or may be specific to certain dialects or regions.


Stagi isn't a word (unless you count Lojban). Gagi isn't a word unless you count Filipino slang.


Could you not have at least Googled the word before speaking against it?

https://www.google.com/search?q=stagi

There are a lot of genuine hits for stagi.


those are either family or brand names. I don’t see it used as a common word in any of the results.


Proper nouns are words. That answers the prompt, right? There was no mention of "common" in the prompt. If there was, the list of words I got with the same prompt on ChatGPT would have been a lot shorter.


To be fair, the question did not specify the language, and the answer included a disclaimer about it.


Even if you don't consider an Italian word as a word: It's a last name. It's a brand name. It is several companies' name.

It belongs in the list just fine.


Then the word rhamanagagi (which I just made up) would technically belong on the list just fine, but it definitely does not answer the implicit intent of the question.

The strength of LLMs is their ability to answer imprecisely specified questions by guessing the speaker's intent, but in this particular case, it's failing that test.


It seems you agree with me, so I don't understand. Maybe you replied to the wrong thread?


Yes. That was agreeing.


Hi HN!

I’m super excited to announce Lamini, the LLM engine that gives every developer the superpowers that took the world from GPT-3 to ChatGPT!

I’ve seen a lot of developers get stuck after prompt-tuning for a couple days or after fine-tuning an LLM and it just gets worse—there’s no good way to debug it. I have a PhD in AI from Stanford, and don’t think anyone should need one to build an LLM as good as ChatGPT. A world full of LLMs as different & diverse as people would be even more creative, productive, and inspiring.

That’s why I’m building Lamini, the LLM engine for developers to rapidly customize models from amazing foundation models from a ton of institutions: OpenAI, EleutherAI, Cerebras, Databricks, HuggingFace, Meta, and more.

Here’s our blog announcing us and a few special open-source features! https://lamini.ai/blog/introducing-lamini

Here’s what Lamini does for you:

- Your LLM outperforms general-purpose models on your specific use case

- You own the model, weights and all, not us (if the foundation model allows it, of course!)

- Your data helps the LLM, and builds you an AI moat

- Any developer can do it today in just a few lines of code

- Commercial-use-friendly with a CC-BY license

We’re also releasing several tools on Github: Today, you can try out our hosted data generator for training your own LLMs, weights and all, without spinning up any GPUs, in just a few lines of code from the Lamini library. https://github.com/lamini-ai/lamini/

You can play with an open-source LLM, trained on generated data using Lamini. https://huggingface.co/spaces/lamini/instruct-playground

Sign up for early access to the training module that took the generated data and trained it into this LLM, including enterprise features like virtual private cloud (VPC) deployments. https://lamini.ai/contact


I'm confused, what are you actually offering? Does my fine-tuning data get shared with your platform? Does the model get fine-tuned on your end or on my own system? Do you host the model?


You’re building some seriously exciting stuff! Looking forward to diving in.


This headline is totally editorializing. Stick with the source one. “Introducing Lamini, the LLM Engine for Rapidly Customizing Models”

So much click bait in the LLM space.


Is it still editorialising when OP is the CEO of the company?


Yes


Trivial examples show that this isn't nearly as good as ChatGPT. The headline should be changed.


It took OpenAI 10 years of fine-tuning; you can't expect things to work as well on day 1.


I like the blog post title.

Introducing Lamini, the LLM Engine for Rapidly Customizing Models

Obviously it still takes a huge amount of work to customize a model to be as good as GPT-4 or ChatGPT; that's exactly why we are building Lamini.

To give developers tools to make it easier.

Hopefully it is clear that it will take more work than 1 day.


The actual post doesn't say "as Good as ChatGPT", why does the HN title?

I don't really care to click on something I know is obviously lying to me.


I've been playing a bit with stacking transformer adapters to add knowledge to models and so far it has met my needs. It doesn't have the same illusion of intelligence, but so far it's just as good as a multitasking intern, so I am still having fun with it. I wonder if this is basically doing the same thing.


Interesting. Do you know if this can be done with Sentence Transformers, too? Picking a good performing one from HF. Then training an adapter for the domain (unsupervised). Then adding another one using actual training triplets (base, similar, non-similar)?


I haven't done this with sentence transformers but I imagine it's possible since they can be loaded as regular transformers.

Check out https://github.com/huggingface/peft -- they've packaged it up nicely -- and read up on LoRA (https://arxiv.org/pdf/2106.09685.pdf). That should get you started.
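A minimal LoRA sketch with peft (the base model and hyperparameters here are just placeholders):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, TaskType, get_peft_model

    # Placeholder base model; swap in whatever you're adapting.
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # LoRA adapter: only the small injected matrices get trained,
    # the base weights stay frozen.
    config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        r=8,
        lora_alpha=16,
        lora_dropout=0.05,
        target_modules=["c_attn"],  # GPT-2's attention projection; differs per architecture
    )
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # a tiny fraction of the full parameter count
    # ...then train with your usual Trainer / training loop on domain data...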


Thank you. Peft and adapters seem to be two different things though, no? AFAIK there are other libraries for adapters (forgot the name). Is peft what you were talking about when you said adapters in your original comment?


I was of the understanding that LoRA was one flavor of adapters, but I am still learning so I may be wrong. I haven't gotten too deep into other transformer adapters yet (still reading).


Very exciting! Glad to finally be able to get beyond prompt engineering. What's the pricing model like?


Free open source libraries.

Paid LLM hosting. 50% cheaper than OpenAI, pay per compute needed to run & create the LLM. Export the weights anytime you want.

Enterprise VPC deployments.


50% cheaper than OpenAI compared to what?


OpenAI models have wide ranges in prices, so indeed some clarity is needed


If I want to export the model and run it myself, can I do that?


Why wouldn’t we use something like DeepSpeed? It’s a one-click on Azure. What’s the value add?


I hope this turns out to be as good as chatgpt and not "we have chatgpt at home"


Try SuperCOT 30B.


GPT at this point is more than an LLM; it is a baseline layer of logic using the underlying transformer technology. This will be challenging to replicate without data sets of the same size.


Noting that the GitHub repo includes a data pipeline for instruction fine-tuning.

What's the difference between this and other data pipelines like Alpaca?


Aren't you Greg Diamos, the founder, why are you asking this instead of answering?


This was a frequently asked question among my friends.

I’m really curious to see how someone who hasn’t been staring at the docs for weeks would explain it.


To HN, this looks like faking engagement, which is against the posting guidelines.

This is a question you should instead ask in a user interview, or at a minimum qualify when asking here as one of the people involved in the project.


Forgot to switch to sock puppet account.


looks great...looking forward to trying it out



