MoMo-72B-LoRA-V1.4
| Property | Value |
|---|---|
| Parameter Count | 72.3B |
| License | MIT |
| Base Model | Qwen-72B |
| Training Method | LoRA |
| Paper | LoRA Paper |
What is MoMo-72B-LoRA-V1.4?
MoMo-72B-LoRA-V1.4 is an advanced language model developed by Moreh, built on the Qwen-72B architecture and fine-tuned with the Low-Rank Adaptation (LoRA) technique. Training was carried out on AMD MI250 GPUs using Moreh's MoAI platform, making the model a notable example of efficient large-scale fine-tuning.
Implementation Details
The model is trained exclusively on the Open-Orca/SlimOrca dataset using Supervised Fine-Tuning (SFT). Reported data-contamination rates are low: 0.73% on MMLU and 0.71% on TruthfulQA, indicating minimal overlap between the training data and these evaluation benchmarks.
- Implemented using the PyTorch and PEFT libraries (see the sketch after this list)
- Trained on AMD MI250 hardware
- Uses F32 tensor type for computations
- Optimized for text-generation-inference
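The stack listed above (PyTorch, PEFT, F32 weights, SFT) suggests a setup along the following lines. This is a minimal illustrative sketch, not the authors' training script: the base-model repo id, LoRA rank, and target module names are assumptions and should be checked against the actual checkpoint.

```python
# Illustrative only: attach LoRA adapters to the base model with PEFT before
# supervised fine-tuning on SlimOrca. Repo id, rank, and target modules are
# assumptions, not values published for MoMo-72B-LoRA-V1.4.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "Qwen/Qwen-72B"  # assumed Hugging Face id of the base model
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float32,   # the card lists F32 tensors
    device_map="auto",           # shard across available accelerators
    trust_remote_code=True,
)

# LoRA freezes the base weights and trains small low-rank adapter matrices.
lora_cfg = LoraConfig(
    r=16,                        # illustrative rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_attn"],   # assumed; module names vary by architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only a small fraction of the 72.3B weights train
```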
Core Capabilities
- Advanced text generation and processing
- Optimized for English language tasks
- Efficient deployment through text-generation-inference
- Compatible with the Transformers framework (see the example below)
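As a sketch of Transformers compatibility, the snippet below loads the checkpoint and generates text. The repo id "moreh/MoMo-72B-LoRA-V1.4" is assumed to match the published checkpoint, and the generation settings are illustrative.

```python
# Minimal generation example with the Transformers framework.
# Repo id and generation settings are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moreh/MoMo-72B-LoRA-V1.4"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,  # matches the F32 tensor type listed above
    device_map="auto",
)

prompt = "Explain low-rank adaptation (LoRA) in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```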
Frequently Asked Questions
Q: What makes this model unique?
A: Its distinguishing features are the use of LoRA to fine-tune a 72.3B-parameter model efficiently, training without any weight merging, and exclusive reliance on the SlimOrca dataset.
Q: What are the recommended use cases?
A: The model is particularly suited to text-generation tasks, leveraging its large parameter count and specialized fine-tuning for English-language processing and generation.
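For serving through text-generation-inference, a deployed endpoint can be queried with the huggingface_hub client. The sketch below assumes a TGI server is already running locally and serving this model; the endpoint URL and prompt are placeholders.

```python
# Hedged example: query a running text-generation-inference endpoint.
# The endpoint URL and prompt are placeholders, not published values.
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # assumed local TGI server
response = client.text_generation(
    "Summarize the benefits of LoRA fine-tuning for very large models.",
    max_new_tokens=150,
)
print(response)
```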