This recently rereleased dive into Mamba and State Space Models [1] is more relevant now that AI21 Labs has proved the approach somewhat at scale here. People are already fine-tuning it, so it will be interesting to see how well it performs in the open-source community.
PS: someone needs to update the lab name in the title.
Looking forward to an 8-bit instruct version on llama.cpp so I can try out problems that exploit the insane context length.
It would be interesting if all these models were fine-tuned on basic Datalog, which is a very simple language. That way they could demonstrate their logic/reasoning capabilities as well as their ability to learn from mistakes and iterate; see the sketch below for the kind of task I mean.
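To make that concrete, here is a rough sketch of what one training example might look like. Everything here is made up for illustration (the facts, the rules, and the query format, which varies across Datalog dialects):

    # Hypothetical fine-tuning sample: given a Datalog program, the model
    # must enumerate the derivable facts. All names are illustrative.
    sample = {
        "program": (
            "parent(alice, bob).\n"
            "parent(bob, carol).\n"
            "ancestor(X, Y) :- parent(X, Y).\n"
            "ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y)."
        ),
        "query": "ancestor(alice, Who)?",
        "expected": ["ancestor(alice, bob)", "ancestor(alice, carol)"],
    }

Grading is trivial (set comparison against "expected"), and you can make the model show its derivation chain to check the reasoning, not just the answer.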
Super stoked to see this. The loss curves look like there are still gains to be made from further training. Maybe I'm just missing it, but I could not find anything on the number of tokens this was trained on, except for some of the ablation runs.
Yes, it's available at https://huggingface.co/ai21labs/Jamba-v0.1 under an Apache-2.0 license.
Do note that it's a base model, not fine-tuned for instruction following or chat.
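If you want to poke at it anyway, a minimal (untested) loading sketch with Hugging Face transformers might look like this. The trust_remote_code flag is an assumption on my part, since new architectures usually ship custom modeling code until transformers supports them natively, and because it's a base model you prompt for completion rather than chat:

    # Minimal sketch, assuming the usual transformers loading path.
    # trust_remote_code is an assumption for a not-yet-native architecture.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "ai21labs/Jamba-v0.1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True, device_map="auto"
    )

    # Base model, so give it a completion-style prompt, not a chat turn.
    inputs = tokenizer(
        "State space models handle long sequences by", return_tensors="pt"
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))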
[1] Mamba Explained https://news.ycombinator.com/item?id=39876114