ChatLM-mini-Chinese
| Property | Value |
|---|---|
| Parameter Count | 0.2B parameters |
| License | Apache-2.0 |
| Paper | Research Paper |
| Architecture | T5-based Transformer |
What is ChatLM-mini-Chinese?
ChatLM-mini-Chinese is a compact, 0.2B-parameter Chinese language model designed for dialogue tasks. It represents an effort to build an efficient, accessible Chinese language model that can run on consumer-grade hardware while maintaining good performance. The model is built on a T5 architecture and is trained in multiple stages: pre-training, SFT instruction tuning, and DPO preference optimization.
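As a rough loading-and-inference sketch, assuming the checkpoint is published on the Hugging Face Hub under an id such as `charent/ChatLM-mini-Chinese` and loads as a standard seq2seq model (check the project's own README for the exact usage):

```python
# Minimal inference sketch. The repo id and generation settings below are
# assumptions for illustration; see the project's README for the official usage.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

repo_id = "charent/ChatLM-mini-Chinese"  # assumed Hub id

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id, trust_remote_code=True)

prompt = "请介绍一下你自己。"
inputs = tokenizer(prompt, return_tensors="pt")

# T5-style models produce the answer as the decoder output.
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```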
Implementation Details
The model uses a comprehensive training pipeline covering data cleaning, tokenizer training, model pre-training, and several fine-tuning stages; a minimal data-preparation sketch follows the list below. It is implemented with the Hugging Face ecosystem and supports both single- and multi-GPU training configurations. The training data comprises over 10 million samples drawn from diverse Chinese sources, giving broad coverage of language patterns and use cases.
- Text-to-Text pretraining with 9.3 million samples
- SFT fine-tuning with 1.37 million instruction samples
- DPO preference optimization for better alignment
- Custom tokenizer with a 29,298-token vocabulary
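As an illustrative sketch of the T5 text-to-text formatting behind the pre-training and SFT stages (the field names and the `encode_example` helper are hypothetical, not the project's actual preprocessing code):

```python
# Illustrative only: shows the general text-to-text encoding used for T5-style
# SFT data, not the project's actual data pipeline.
from transformers import AutoTokenizer

repo_id = "charent/ChatLM-mini-Chinese"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)

def encode_example(prompt: str, response: str, max_len: int = 320):
    """Encode one instruction/response pair as encoder inputs plus decoder labels."""
    model_inputs = tokenizer(prompt, max_length=max_len, truncation=True)
    labels = tokenizer(text_target=response, max_length=max_len, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

example = encode_example("写一首关于春天的短诗。", "春风拂面，柳丝轻扬……")
print({k: len(v) for k, v in example.items()})
```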
Core Capabilities
- Chinese dialogue generation and response
- Information extraction and downstream task adaptation
- Efficient inference with minimal hardware requirements
- Stream chat support with greedy search (see the sketch after this list)
- Flexible deployment options for various use cases
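A possible stream-chat loop built only from standard `transformers` utilities; the project may ship its own streaming helper, so treat the repo id and generation settings here as assumptions:

```python
# Generic streaming sketch using TextIteratorStreamer; not the project's own
# stream-chat API.
from threading import Thread
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, TextIteratorStreamer

repo_id = "charent/ChatLM-mini-Chinese"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("如何学习机器学习？", return_tensors="pt")
streamer = TextIteratorStreamer(tokenizer, skip_special_tokens=True)

# Greedy search: do_sample=False with num_beams=1 picks the single
# highest-probability token at each step.
generation_kwargs = dict(**inputs, streamer=streamer, do_sample=False,
                         num_beams=1, max_new_tokens=256)
Thread(target=model.generate, kwargs=generation_kwargs).start()

for text_chunk in streamer:
    print(text_chunk, end="", flush=True)
```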
Frequently Asked Questions
Q: What makes this model unique?
The model stands out for an efficient architecture that can run on consumer hardware with as little as 512MB of GPU memory in float16, while maintaining competitive performance. It also ships with a complete, open-source training pipeline that allows for reproduction and customization.
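A minimal sketch of float16 loading to stay within that memory budget, again assuming the standard Hub id and seq2seq interface:

```python
# Loading in float16 halves the memory footprint of a float32 checkpoint; the
# repo id is an assumption and the official demo code may differ.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

repo_id = "charent/ChatLM-mini-Chinese"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(
    repo_id,
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("cuda" if torch.cuda.is_available() else "cpu")
```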
Q: What are the recommended use cases?
The model is well-suited for Chinese language dialogue applications, chatbots, and information extraction tasks. It can be fine-tuned for specific downstream tasks while maintaining its dialogue capabilities, making it versatile for various NLP applications requiring Chinese language understanding and generation.
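For downstream adaptation, a generic `Seq2SeqTrainer` skeleton could look like the following; the dataset file, column names, and hyperparameters are placeholders rather than the project's own fine-tuning recipe:

```python
# Generic downstream fine-tuning skeleton; dataset path, column names, and
# hyperparameters are placeholders for illustration.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

repo_id = "charent/ChatLM-mini-Chinese"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForSeq2SeqLM.from_pretrained(repo_id, trust_remote_code=True)

# Placeholder task data: one JSON object per line with "prompt" and "response" fields.
raw = load_dataset("json", data_files="my_task.jsonl")["train"]

def preprocess(batch):
    enc = tokenizer(batch["prompt"], max_length=320, truncation=True)
    enc["labels"] = tokenizer(text_target=batch["response"],
                              max_length=320, truncation=True)["input_ids"]
    return enc

tokenized = raw.map(preprocess, batched=True, remove_columns=raw.column_names)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(output_dir="out",
                                  per_device_train_batch_size=8,
                                  num_train_epochs=3),
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```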