Jamba: A Hybrid Transformer-Mamba Language Model (arxiv.org)
74 points by eitanturok 6 months ago | 6 comments



This recently re-released dive into Mamba and State Space Models [1] is even more relevant now that AI21 Labs have proved the architecture somewhat at scale here. People are already fine-tuning it, so it will be interesting to see how well it performs in the open-source community.

PS: someone needs to update the lab name in the title.

[1] Mamba Explained - https://news.ycombinator.com/item?id=39876114


Looking forward to an 8-bit instruct version on llama.cpp, to try out problems with the insane context length.
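
Until llama.cpp grows support for the Mamba layers, a minimal sketch of an 8-bit load on the transformers side (assuming bitsandbytes is installed and there is enough GPU memory; note that only the base model has been released so far, not an instruct version):

    # Sketch only: 8-bit load of the released base checkpoint via
    # transformers + bitsandbytes; an instruct version does not exist yet.
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    model_id = "ai21labs/Jamba-v0.1"
    quant_config = BitsAndBytesConfig(load_in_8bit=True)

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,  # 8-bit weights via bitsandbytes
        device_map="auto",                 # shard across available GPUs
    )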

It would be interesting if all these models were fine-tuned on basic Datalog, which is a very simple language. That way they could demonstrate their logic/reasoning capabilities as well as their ability to learn from mistakes and iterate.
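
As a toy illustration of what such fine-tuning data could look like, here is a sketch in Python rather than a real Datalog engine: it generates random parent/2 facts, pairs them with the classic ancestor/2 rules, and computes the ground-truth answer by naive bottom-up evaluation (all names and sizes here are made up for illustration):

    # Toy sketch: generate (program, query, answer) triples for
    # Datalog-style fine-tuning; the answer comes from naively
    # evaluating the transitive-closure rules to a fixpoint.
    import random

    def make_example(n_people=6, n_facts=5, seed=0):
        rng = random.Random(seed)
        people = [f"p{i}" for i in range(n_people)]
        facts = set()
        while len(facts) < n_facts:
            a, b = rng.sample(people, 2)
            facts.add((a, b))

        # ancestor(X, Y) :- parent(X, Y).
        # ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
        ancestor = set(facts)
        changed = True
        while changed:
            changed = False
            for (x, z) in facts:
                for (z2, y) in list(ancestor):
                    if z == z2 and (x, y) not in ancestor:
                        ancestor.add((x, y))
                        changed = True

        program = "\n".join(f"parent({a}, {b})." for a, b in sorted(facts))
        program += "\nancestor(X, Y) :- parent(X, Y)."
        program += "\nancestor(X, Y) :- parent(X, Z), ancestor(Z, Y)."
        query = f"?- ancestor({people[0]}, {people[1]})."
        answer = "true" if (people[0], people[1]) in ancestor else "false"
        return program, query, answer

    prog, query, answer = make_example()
    print(prog, query, answer, sep="\n")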


Super stoked to see this. The loss curves look like there are still some gains to be made from further training. Maybe I'm too stupid to read, but I could not find anything on the number of tokens this was trained on, except for some of the ablation runs.


Recent and related:

Jamba: Production-grade Mamba-based AI model - https://news.ycombinator.com/item?id=39853958 - March 2024 (80 comments)


Is there anywhere I can play with this model?


Yes, it's available at https://huggingface.co/ai21labs/Jamba-v0.1 under an Apache 2.0 license. Do note that it's a base model, not fine-tuned for instruction-following or chat.
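
For example, a minimal sketch of prompting it via transformers (assuming a transformers version with Jamba support and enough GPU memory for the 52B-parameter MoE checkpoint; the prompt is just an illustration):

    # Sketch only: plain continuation with the base model, since it is
    # not tuned for chat; bf16 is an assumption to halve memory use.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "ai21labs/Jamba-v0.1"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    inputs = tokenizer("A hybrid Transformer-Mamba model is",
                       return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(out[0], skip_special_tokens=True))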



