MAmmoTH2-8B-Plus
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| Model Type | Text Generation / Conversational |
| Architecture | Llama-based |
| License | MIT |
| Paper | arXiv:2405.03548 |
What is MAmmoTH2-8B-Plus?
MAmmoTH2-8B-Plus is a language model developed by TIGER-Lab with a focus on reasoning. Built on the Llama-3 8B architecture, it was instruction-tuned on roughly 10 million instruction-response pairs harvested from web corpora, then further trained on public instruction datasets (the "Plus" suffix denotes this additional stage).
Implementation Details
The model is distributed in BF16. Training follows a two-stage recipe: large-scale instruction tuning on the WEBINSTRUCT dataset mined from web corpora, followed by fine-tuning on public instruction datasets. The underlying architecture is standard Llama-3, tuned for both general language understanding and mathematical reasoning tasks.
- 8.03 billion parameters for complex reasoning capabilities
- Built on Llama-3 architecture with enhanced instruction tuning
- Optimized using web-scale instruction data
- Implements advanced mathematical reasoning capabilities
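The sections above can be summarized in a short usage sketch with Hugging Face `transformers`. The hub repo id `TIGER-Lab/MAmmoTH2-8B-Plus` and the standard Llama-3 chat layout are assumptions here; check the model card for the exact prompt format.

```python
def build_llama3_prompt(user_message: str) -> str:
    """Format a single-turn prompt in the standard Llama-3 chat layout
    (an assumption for this model; verify against the tokenizer's chat template)."""
    return (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model in BF16 and generate a completion.
    Requires `pip install torch transformers` and downloads ~16 GB of weights."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "TIGER-Lab/MAmmoTH2-8B-Plus"  # assumed hub repo id
    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoModelForCausalLM.from_pretrained(
        repo,
        torch_dtype=torch.bfloat16,  # matches the BF16 tensor type noted above
        device_map="auto",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the echoed prompt.
    return tokenizer.decode(
        out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```

For example, `generate(build_llama3_prompt("If 3x + 5 = 20, what is x?"))` would return the model's step-by-step answer.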
Core Capabilities
- 85.2% accuracy on GSM8K
- 43.0% on the MATH benchmark
- 69.7% on BBH and 84.3% on ARC-C
- Versatile text generation and conversational ability
Frequently Asked Questions
Q: What makes this model unique?
MAmmoTH2-8B-Plus stands out for how its instruction data was obtained: rather than relying on costly human annotation, the training pairs were mined from pre-existing web corpora. Despite this, the model performs strongly on mathematical reasoning benchmarks and shows significant improvements over the base model across multiple benchmarks.
Q: What are the recommended use cases?
The model excels in mathematical reasoning, problem-solving, and general conversation tasks. It's particularly well-suited for applications requiring complex mathematical understanding, educational tools, and general-purpose AI assistance.