This recently rereleased dive into Mamba and State Space Models [1] is more relevant now that AI21 Labs has proved the approach somewhat at scale here. People are already fine-tuning it, so it will be interesting to see how well it performs in the open-source community.
PS: someone needs to update the lab name in the title.
Looking forward to an 8-bit instruct version on llama.cpp so I can try out problems that exploit the insane context length.
It would be interesting if all these models were fine-tuned on basic Datalog, which is a very simple language. That way they could demonstrate their logic/reasoning capabilities as well as their ability to learn from mistakes and iterate; see the sketch below for the kind of task I mean.
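To make that concrete, here is a rough sketch of what one training example might look like. Everything here is made up for illustration (the facts, the rules, and the query format, which varies across Datalog dialects):

    # Hypothetical fine-tuning sample: given a Datalog program, the model
    # must enumerate the derivable facts. All names are illustrative.
    sample = {
        "program": (
            "parent(alice, bob).\n"
            "parent(bob, carol).\n"
            "ancestor(X, Y) :- parent(X, Y).\n"
            "ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y)."
        ),
        "query": "ancestor(alice, Who)?",
        "expected": ["ancestor(alice, bob)", "ancestor(alice, carol)"],
    }

Grading is trivial (set comparison against "expected"), and you can make the model show its derivation chain to check the reasoning, not just the answer.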
Super stoked to see this. The loss curves look like there are still gains to be made from further training. Maybe I'm just missing it, but I could not find anything on the number of tokens this was trained on, except for some of the ablation runs.
Yes, it's available at https://huggingface.co/ai21labs/Jamba-v0.1 under an Apache-2.0 license.
Do note that it's a base model, not fine-tuned for instruction following or chat.
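If you want to poke at it anyway, a minimal (untested) loading sketch with Hugging Face transformers might look like this. The trust_remote_code flag is an assumption on my part, since new architectures usually ship custom modeling code until transformers supports them natively, and because it's a base model you prompt for completion rather than chat:

    # Minimal sketch, assuming the usual transformers loading path.
    # trust_remote_code is an assumption for a not-yet-native architecture.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "ai21labs/Jamba-v0.1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, trust_remote_code=True, device_map="auto"
    )

    # Base model, so give it a completion-style prompt, not a chat turn.
    inputs = tokenizer(
        "State space models handle long sequences by", return_tensors="pt"
    ).to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    print(tokenizer.decode(out[0], skip_special_tokens=True))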
[1] Mamba Explained https://news.ycombinator.com/item?id=39876114