We have just released MiniCPM, a new series of models that includes both language models and multimodal models. Our goal is to explore configuration tuning and scaling in large language models while advancing their democratization, making intelligence accessible anytime, anywhere.
In our technical report, we share how we systematically search for and tune model configurations, and we show that this search remains effective as models scale up. Additionally, based on our real-world use cases, we introduce the Warmup-Stable-Decay (WSD) Learning Rate Scheduler and an accompanying data strategy, both designed to be friendlier to continuous training.
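For readers curious what a WSD schedule looks like in practice, here is a minimal sketch in Python. The three phases (linear warmup, a long constant phase at the peak learning rate, then a short final decay) follow the scheduler's name; the phase fractions, the cosine decay shape, and the `min_lr` floor below are illustrative assumptions, not the exact settings from our report.

```python
import math

def wsd_lr(step, total_steps, peak_lr,
           warmup_frac=0.01, decay_frac=0.1, min_lr=0.0):
    """Warmup-Stable-Decay schedule sketch: linear warmup to peak_lr,
    hold at peak_lr, then decay to min_lr in the final phase.
    The fractions and the cosine decay shape are illustrative
    assumptions, not the exact values from the technical report."""
    warmup_steps = int(total_steps * warmup_frac)
    decay_steps = int(total_steps * decay_frac)
    stable_end = total_steps - decay_steps
    if step < warmup_steps:
        # Warmup phase: linear ramp from 0 to peak_lr.
        return peak_lr * step / max(warmup_steps, 1)
    if step < stable_end:
        # Stable phase: constant peak learning rate. Because this phase
        # has no fixed endpoint baked into the schedule, training can be
        # extended here, which is what makes WSD continuation-friendly.
        return peak_lr
    # Decay phase: anneal from peak_lr down to min_lr.
    progress = (step - stable_end) / max(decay_steps, 1)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Example: peak LR 0.01 over 10,000 steps; sample a few points.
for s in (0, 50, 100, 5000, 9500, 9999):
    print(s, round(wsd_lr(s, 10_000, 0.01), 6))
```

Because the stable phase is flat, a checkpoint taken before the decay phase can be resumed and trained further without re-deriving the schedule, unlike a cosine schedule whose shape depends on the total step count chosen up front.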
If you think the exploration behind MiniCPM could benefit your use cases or offer insights for optimizing your own models, and you are interested in contributing, please try it out and share your feedback. You can report bugs, difficulties, or desired features by opening issues. Feel free to share your ideas in this thread or on GitHub. And, of course, we appreciate stars for the project.
* Xiang