Hah, the model is pretty good, but the material does sound like it came from an American politician's hype team. To be fair, I think Google and OAI are doing the same thing all the time:
“As a large language model, I can explain. I and my relatives were trained at the bestest universities, only in superlatives about humans who are optimum et vetustissimum octagenarium, i.e. the ‘bestest of all time!’ Could win Nathan’s hot dog-eating contest while making the Jersey Turnpike Marathon unfair and telling you about how it used to cost a nickel at the ferry. Everyone else stinks stonks!”
Why is it that so many people here have to comment on basic marketing stuff? It's necessary bravado because the post is trying to convince people the model is worth using, and saying "Here is a model that is just a bit better than the others" isn't going to do anything. Hence buzzwords like "groundbreaking".
> Mistral is doing the exact opposite and everyone is talking about it.
Everyone is currently talking about it because they got a massive investment and people started posting links. However, no one was talking about them 2 weeks ago.
You have to look at the pre-trained models, not the "tuned" models. It's number one in that category (which I think is what they are referring to, given the benchmarks are against Mistral and Llama).
Is anyone actually using these small models for general purpose applications? Or are they just used to fine tune into narrowly useful specialized models?
I keep seeing them pass each other on different benchmarks and leaderboards and I can't help but imagine that so many of them are only good at benchmarks and not much else.
I haven't gotten a chance to play with this stuff yet so I have no basis to go off of, but I'd like to hear from anyone who's actually been impressed by any of these small models for general purpose tasks.
Mistral-7B is surprisingly decent for general-purpose small tasks. The more complex the task or the more specific the knowledge recall, the worse the performance, since smaller models tend to have less breadth.
But they're very nice for building PoCs of complex systems, since they're nearly free to run.
I'm using the new generation of small models to do semantic search for music lyrics. The first step is giving the model the text and asking, "What is this text about?" Without fine-tuning and with only minimal prompt engineering, these models can understand most languages, pull out the most relevant phrases, and list the major themes and speculative ideas of what the lyrics mean. I'm super impressed with the results. Reading the answers feels like grading ambitious undergraduate essays (both the good and the bad).
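Roughly, that first step looks like the sketch below, assuming a local small model served through llama-cpp-python; the model file, prompt wording, and parameters are illustrative, not my exact setup:

```python
# Sketch: ask a small local model what a lyric is about. The model
# path and prompt are placeholders, not a specific recommendation.
from llama_cpp import Llama

llm = Llama(model_path="mistral-7b-instruct-v0.2.Q4_K_M.gguf", n_ctx=4096)

def describe_lyrics(text: str) -> str:
    out = llm.create_chat_completion(
        messages=[
            {"role": "system", "content": "You analyze song lyrics."},
            {"role": "user", "content": (
                "What is this text about? List the most relevant "
                f"phrases and the major themes.\n\n{text}"
            )},
        ],
        max_tokens=512,
    )
    return out["choices"][0]["message"]["content"]

# The returned description can then be embedded (e.g. with a
# sentence-transformers model) and indexed for semantic search.
```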
Yes, I'm using them in my text editor with my plugin [1] and it's already useful for everyday tasks like summarization, fixing grammar or making text more concise.
Just one observation: this model seems "less" multilingual than LLAMA and Mistral. I always ask in Portuguese "What is the capital of Bahia?" / "Qual a capital da Bahia?", and DeciLM answered wrong and in English (it said Brasília; the right answer is Salvador). The other models replied correctly.
It doesn't say so on their announcement blog post, but the Hugging Face page says this model is English-only. Unfortunate, imho. https://huggingface.co/Deci/DeciLM-7B
For someone who is a bit techie, but not familiar with much past the original PyTorch, can I get an explanation of why this should matter to me? Are we at the point yet where I can give one of these LLMs a list of characters and an introductory paragraph and get a 100K-word book out of it? Totally not asking because I have too many ideas and not enough time to write them all...
If the model has a big enough context window, yes. Whether the 100k-word book is good enough for you is another story.
GPT-4 Turbo has a 128k-token context, which might be good enough for your book.
You might also use another strategy: write a summary of what your book is about and the title of each chapter, then pass the summary plus one chapter title to the LLM and have it generate that chapter, as in the sketch below. This would allow you to go beyond the 100k-word limit.
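A rough sketch of that loop using the OpenAI client; the model name, summary, and chapter titles are placeholders:

```python
# Chapter-by-chapter strategy: keep the book summary fixed and
# generate one chapter per call, so the total length is not bounded
# by a single context window.
from openai import OpenAI

client = OpenAI()

def write_chapter(summary: str, chapter_title: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4-turbo",  # illustrative model name
        messages=[
            {"role": "system", "content": "You are a fiction ghostwriter."},
            {"role": "user", "content": (
                f"Book summary:\n{summary}\n\n"
                f"Write the chapter titled '{chapter_title}'."
            )},
        ],
    )
    return resp.choices[0].message.content

summary = "..."  # your overall plot summary
chapters = ["Chapter One", "Chapter Two"]  # your chapter titles
book = "\n\n".join(write_chapter(summary, t) for t in chapters)
```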
I am going to use it and let you know my review. Seems very impressive so far in the smaller-model category. Fast inference matters a lot for smaller LLMs! (Gosh, we live in a new reality where 7B seems small!)
"DeciLM-7B’s superior performance is rooted in its strategic implementation of variable Grouped Query Attention (GQA), a significant enhancement over traditional Multi-Query Attention (MQA) and standard GQA."
Their website looks like buzzword bingo from the crypto days: no-substance whitepapers, a big group of "researchers", lots and lots of cool graphics, and a huge money-making "platform" before they have a product. "Book a demo", lmao.
The literal opposite of Mistral's "no marketing" approach of just posting a torrent with the weights.
Of course it may be actual new work, but colour me very sceptical given this extreme confidence and marketing-speak.
Ignoring the fact that this is a one-day-old account: why should I care that this LLM's bars are larger and its numbers bigger? Is this corporate blog post actually good content for anyone? What is the appeal, other than "yup, that blue bar sure is longer than the other colour bars"?
It's fine if you don't care. Nobody needs to care about everything. But why post your comment? It doesn't add anything to the discussion.
For me, I care about other companies reaching Mistral's level, so that Mistral, or whoever is on top, has to release model weights or else competitors will.
I posted my comment because I see a new one of these on the front page every morning, and I was genuinely curious whether anyone actually cares about these posts. That's alongside the fact I'm almost certain that the account that posted this is a sockpuppet for the company advertising this LLM.
> That's alongside the fact I'm almost certain that the account that posted this is a sockpuppet for the company advertising this LLM.
This is the worst form of defence.
I still use Mistral, as was clear from my last post. This model is not good enough to make me switch. As I said, I just want Mistral, or whichever model is on top, to remain open-weight, and so I want competition.
"groundbreaking", "outshines its competitors", "remarkable", "pivotal transformation"
Totally unnecessary bravado.