MoMo-72B-LoRA-V1.4
| Property | Value |
|---|---|
| Parameter Count | 72.3B |
| License | MIT |
| Base Model | Qwen-72B |
| Training Method | LoRA |
| Paper | LoRA Paper |
What is MoMo-72B-LoRA-V1.4?
MoMo-72B-LoRA-V1.4 is an advanced language model developed by Moreh, built on the Qwen-72B architecture and fine-tuned with the Low-Rank Adaptation (LoRA) technique. Training was carried out on AMD MI250 GPUs using Moreh's MoAI platform, making the model a notable example of efficient large-scale fine-tuning.
Implementation Details
The model is trained exclusively on the Open-Orca/SlimOrca dataset using Supervised Fine-Tuning (SFT). Reported data-contamination rates are low: 0.73% on MMLU and 0.71% on TruthfulQA, indicating minimal overlap between the training data and these evaluation benchmarks.
- Implemented using the PyTorch and PEFT libraries (see the sketch after this list)
- Trained on AMD MI250 hardware
- Uses F32 tensor type for computations
- Optimized for text-generation-inference
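The stack listed above (PyTorch, PEFT, F32 weights, SFT) suggests a setup along the following lines. This is a minimal illustrative sketch, not the authors' training script: the base-model repo id, LoRA rank, and target module names are assumptions and should be checked against the actual checkpoint.

```python
# Illustrative only: attach LoRA adapters to the base model with PEFT before
# supervised fine-tuning on SlimOrca. Repo id, rank, and target modules are
# assumptions, not values published for MoMo-72B-LoRA-V1.4.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "Qwen/Qwen-72B"  # assumed Hugging Face id of the base model
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.float32,   # the card lists F32 tensors
    device_map="auto",           # shard across available accelerators
    trust_remote_code=True,
)

# LoRA freezes the base weights and trains small low-rank adapter matrices.
lora_cfg = LoraConfig(
    r=16,                        # illustrative rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_attn"],   # assumed; module names vary by architecture
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only a small fraction of the 72.3B weights train
```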
Core Capabilities
- Advanced text generation and processing
- Optimized for English language tasks
- Efficient deployment through text-generation-inference
- Compatible with the Transformers framework (see the example below)
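As a sketch of Transformers compatibility, the snippet below loads the checkpoint and generates text. The repo id "moreh/MoMo-72B-LoRA-V1.4" is assumed to match the published checkpoint, and the generation settings are illustrative.

```python
# Minimal generation example with the Transformers framework.
# Repo id and generation settings are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "moreh/MoMo-72B-LoRA-V1.4"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float32,  # matches the F32 tensor type listed above
    device_map="auto",
)

prompt = "Explain low-rank adaptation (LoRA) in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```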
Frequently Asked Questions
Q: What makes this model unique?
A: Its distinguishing features are the use of LoRA to fine-tune a 72.3B-parameter model efficiently, training without any weight merging, and exclusive reliance on the SlimOrca dataset.
Q: What are the recommended use cases?
A: The model is particularly suited to text-generation tasks, leveraging its large parameter count and specialized fine-tuning for English-language processing and generation.
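For serving through text-generation-inference, a deployed endpoint can be queried with the huggingface_hub client. The sketch below assumes a TGI server is already running locally and serving this model; the endpoint URL and prompt are placeholders.

```python
# Hedged example: query a running text-generation-inference endpoint.
# The endpoint URL and prompt are placeholders, not published values.
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # assumed local TGI server
response = client.text_generation(
    "Summarize the benefits of LoRA fine-tuning for very large models.",
    max_new_tokens=150,
)
print(response)
```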