[flagged] DeciLM-7B: The Fastest and Most Accurate 7B-Parameter LLM to Date (deci.ai)
84 points by paulenelim on Dec 12, 2023 | 42 comments



The language in this blog post is overblown to say the least.

"groundbreaking", "outshines its competitors", "remarkable", "pivotal transformation"

Totally unnecessary bravado.


Maybe it was written by the model itself.


Hah, the model is pretty good, but the material does sound like an American politician's hype team. To be fair, I think Google and OAI are doing the same thing all the time:

"As a large language model, I can explain. I and my relatives were trained at the bestest universities, only in superlatives, about humans who are optimum et vetustissimum octagenarium, i.e. the "bestest of all time!" Could win Nathan's hot-dog-eating contest while making the Jersey Turnpike Marathon unfair and telling you about how it used to cost a nickel at the ferry. Everyone else stinks, stonks!"


The best model in the history of models, maybe ever?


Why is it that so many people here have to comment on basic marketing stuff? It is necessary bravado because it's trying to convince people that it's worth using. Saying "here is a model that's just a bit better than others" isn't going to do anything. Therefore it's necessary to have buzzwords such as "groundbreaking".


Mistral is doing the exact opposite and everyone is talking about it.

I'll never respect marketing like this because it's standing on the shoulders of thousands of other people's work.


> Mistral is doing the exact opposite and everyone is talking about it.

Everyone is currently talking about it because they got a massive investment and people started posting links. However, no one was talking about them 2 weeks ago.


Marginally better just because it does less poorly at math problems. Meh.


> The Fastest and Most Accurate 7B-Parameter LLM to Date

There are plainly 7B models that surpass it on https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...

Am I missing something?


You have to look at the pre-trained models and not the "tuned" models. It's number one in that category (which is what I think they are referring to, given the benchmarks are against Mistral and Llama).


Doesn't the article say this is a LoRA off ORCA? Isn't that a finetune?


DeciLM-7B-instruct is the finetuned model, not DeciLM-7B.


thanks for the clarification


Base model, not fine-tuned


Is anyone actually using these small models for general purpose applications? Or are they just used to fine tune into narrowly useful specialized models?

I keep seeing them pass each other on different benchmarks and leaderboards and I can't help but imagine that so many of them are only good at benchmarks and not much else.

I haven't gotten a chance to play with this stuff yet so I have no basis to go off of, but I'd like to hear from anyone who's actually been impressed by any of these small models for general purpose tasks.


Mistral-7B is surprisingly decent for general purpose small tasks. The more complex the task, or the more specific the knowledge recall, the worse the performance: the smaller the model, the less breadth it tends to have.

But they're very nice for making PoCs on complex systems, since they're nearly free to run.


There's even a new version of Mistral 7B out today that should be a lot better: v0.2.

The finetunes of 0.1 are already extremely impressive at general tasks.

https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGU...


Have you tried the 0.2 version they released yesterday? Curious if you’re seeing significant improvements


I'm using the new generation of small models to do semantic search for music lyrics. The first step is giving the model the text and asking, "What is this text about?" Without fine-tuning and only minimal prompt engineering these models can understand most languages, pull out the most relevant phrases, and list major themes and speculative ideas of what the lyrics mean. I'm super impressed with the results. Reading the answers feels like grading ambitious undergraduate student essays (both the good and the bad).
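
(Not the commenter's actual code, but a minimal sketch of that first step, assuming a small local instruct model served through Hugging Face transformers; the model name and prompt below are illustrative.)

    # Sketch of the "what is this text about?" step. The model name is an
    # assumption -- any comparable small instruct model would do.
    from transformers import pipeline

    pipe = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

    lyrics = "..."  # song text goes here
    prompt = f"[INST] What is this text about? List the major themes.\n\n{lyrics} [/INST]"
    summary = pipe(prompt, max_new_tokens=256, return_full_text=False)[0]["generated_text"]
    # The summary can then be embedded and indexed for semantic search.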


Yes, I'm using them in my text editor with my plugin [1] and it's already useful for everyday tasks like summarization, fixing grammar or making text more concise.

[1]: https://github.com/David-Kunz/gen.nvim


I use the Mistral-7B base model for a few things via few-shot prompts, but not really for anything in production, just some fun toys.
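
(For the curious: few-shot prompting a base model just means showing it a pattern to continue, since there's no chat template. A minimal illustrative sketch, not the commenter's actual setup:)

    # Few-shot prompting a *base* (non-instruct) model: give a few
    # input/output pairs and let the model continue the pattern.
    # The model name and task are illustrative assumptions.
    from transformers import pipeline

    pipe = pipeline("text-generation", model="mistralai/Mistral-7B-v0.1")

    prompt = (
        "Review: Great battery life.\nSentiment: positive\n\n"
        "Review: Screen cracked within a week.\nSentiment: negative\n\n"
        "Review: Does exactly what it says on the tin.\nSentiment:"
    )
    out = pipe(prompt, max_new_tokens=3, return_full_text=False)[0]["generated_text"]
    print(out.strip())  # a base model will typically continue with "positive"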


Just one observation: this model seems less multilingual than Llama and Mistral. I always ask in Portuguese, "Qual a capital da Bahia?" ("What is the capital of Bahia?"), and DeciLM answered wrong and in English (it said Brasília; the right answer is Salvador). The other models replied correctly.


The announcement blog post doesn't say it, but the Hugging Face page says this model is English-only. Unfortunate, imho. https://huggingface.co/Deci/DeciLM-7B


For someone who is a bit techie, but not familiar with much past the original PyTorch, can I get an explanation of why this should matter to me? Are we at the point yet where I can give one of these LLMs a list of characters and an introductory paragraph, and get a 100K-word book out of it? Totally not asking because I have too many ideas and not enough time to write them all...


If the model has a big enough context window, yes. Whether the 100k-word book is good enough for you is another story.

GPT-4 Turbo has a 128k-token context, which might be good enough for your book.

You might also use another strategy: write a summary of what your book is about and the title of each chapter, then pass the summary plus one chapter title at a time to the LLM and have it generate chapter by chapter. This lets you go beyond the context limit; a sketch of that loop follows.
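
(A minimal sketch of the chapter-by-chapter strategy, assuming nothing about which model you use; generate() is a hypothetical placeholder, not a real API.)

    # Sketch of the summary-plus-chapter-title strategy: each call carries
    # only the short summary and one title, so the total book length is not
    # bounded by the model's context window.
    def generate(prompt: str) -> str:
        # Hypothetical placeholder: swap in any LLM call (local model or API).
        raise NotImplementedError

    book_summary = "A retired detective is pulled back in for one last case."
    chapter_titles = ["The Call", "Cold Trail", "The Reckoning"]

    chapters = []
    for title in chapter_titles:
        prompt = (
            f"Book summary: {book_summary}\n"
            f"Write the full text of the chapter titled '{title}'."
        )
        chapters.append(generate(prompt))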


Key question: is this an entirely new LLM trained from scratch, or is it fine-tuned on top of an existing foundation model like Llama 2 or Mistral?

I think it's a new foundation model, but the announcement doesn't make that clear to me.


They are trained from scratch on potentially different datasets. Their architectures are very similar though.


I am going to use it and let you know my review. Seems very impressive so far in the smaller model category. Fast inference time matters a lot for smaller LLMs! (Gosh, we live in a new reality where 7B seems smaller!)


I appreciate how these folks announced only after the model was already up on Hugging Face.


How is it faster than other 7B models using PyTorch?


Architecture details matter.

"DeciLM-7B’s superior performance is rooted in its strategic implementation of variable Grouped Query Attention (GQA), a significant enhancement over traditional Multi-Query Attention (MQA) and standard GQA."


The linked article claims about a 2x speedup. The post has more details (as does the model card).


It uses variable grouped-query attention, which reduces the amount of KV-cache memory that has to be fetched at each step during inference.
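
(Roughly: in grouped-query attention, several query heads share one K/V head, so the KV cache read at every decoding step shrinks. A minimal sketch of plain GQA, not Deci's implementation:)

    # Minimal grouped-query attention (GQA) sketch -- not Deci's code.
    # With n_kv_heads < n_heads, each K/V head serves a group of query
    # heads, cutting KV-cache size and memory traffic by the group factor.
    import torch
    import torch.nn.functional as F

    def gqa(q, k, v):
        # q: (batch, seq, n_heads, dim); k, v: (batch, seq, n_kv_heads, dim)
        group = q.shape[2] // k.shape[2]
        k = k.repeat_interleave(group, dim=2)  # share each K head across its group
        v = v.repeat_interleave(group, dim=2)
        q, k, v = (t.transpose(1, 2) for t in (q, k, v))  # -> (batch, heads, seq, dim)
        return F.scaled_dot_product_attention(q, k, v, is_causal=True).transpose(1, 2)

    # MQA is the n_kv_heads == 1 extreme; "variable" GQA (per the post) means
    # the number of KV heads can differ from layer to layer.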


Their website looks like buzzword bingo from the crypto days: whitepapers with no substance, a big group of "researchers", lots and lots of cool graphics, and a huge money-making "platform" before they have a product. "Book a demo", lmao.

The literal opposite of Mistral's "no marketing" approach: just a torrent with the weights.

Of course it may be actual new work, but colour me very sceptical given this extreme confidence and marketing-speak.


Sure, but a crucial difference here is that you can go try it out right now on Hugging Face and see whether it's garbage or not [0]

0: https://huggingface.co/spaces/Deci/DeciLM-7B-instruct


> I put a plate on a banana in the kitchen, then take the plate to the living room. Where is the banana?

> The banana is still in the kitchen, as it was placed on the plate before it was moved to the living room.

He's a little confused, but he's got the spirit.


Ignoring the fact that this is a one-day-old account: why should I care that this LLM's bars are longer and its numbers bigger? Is this corporate blog post actually good content for anyone? What's the appeal, other than "yup, that blue bar sure is longer than the other bars"?


Several reasons come to mind:

- AI regulation is partly based on the number of parameters

- It's generally accepted that bigger models perform better, so counterexamples to that notion are valuable

- Smaller models are needed for local, offline processing with current hardware


Well, it shows that there is a new, better 7B model under an Apache license, something that will have to prove itself in practice.


It's fine if you don't care. Nobody needs to care about everything. But why post your comment? It doesn't add anything to the discussion.

Personally, I care about other companies reaching Mistral's level, so that Mistral, or whoever is on top, has to release model weights or competitors will.


I posted my comment because I see a new one of these on the front page every morning, and I was genuinely curious whether anyone cares about these posts. That's alongside the fact that I'm almost certain the account that posted this is a sockpuppet for the company advertising this LLM.


> That's alongside the fact I'm almost certain that the account that posted this is a sockpuppet for the company advertising this LLM.

This is the worst form of defence.

I still use Mistral, as was clear from my last post. This model is not good enough to make me switch. As I said, I just want Mistral, or whatever the top model is, to remain open-weight, and so I want competition.



