Magnum v4 123B
| Property | Value |
|---|---|
| Parameter Count | 123 billion |
| Base Model | Mistral-Large-Instruct-2407 |
| Context Length | 16,384 tokens |
| License | MRL (Mistral Research License) |
| Training Infrastructure | 8× AMD MI300X GPUs |
What is magnum-v4-123b?
Magnum v4 123B is a large language model fine-tuned to approximate the prose quality of the Claude 3 models (Sonnet and Opus). Built on Mistral-Large-Instruct-2407, it targets high-quality text generation and conversational use.
Implementation Details
The model was trained in BF16 precision using the Axolotl framework on six curated datasets, with a sequence length of 16,384 tokens. Training employed gradient checkpointing and flash attention to reduce memory use and speed up attention computation.
- Full-parameter fine-tuning on 8× AMD MI300X GPUs
- Flash attention and gradient checkpointing for memory and compute efficiency
- Sample packing and a cosine learning-rate scheduler
- adamw_bnb_8bit (8-bit AdamW) optimizer
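The cosine learning-rate scheduler mentioned above decays the learning rate along a half-cosine curve after an initial warmup. The sketch below illustrates the shape of such a schedule; the actual hyperparameters used for Magnum v4 123B (peak learning rate, warmup length, total steps) are not stated in this card, so the values here are purely illustrative.

```python
import math

def cosine_lr(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Cosine learning-rate schedule with linear warmup (illustrative sketch).

    Linearly ramps from ~0 to peak_lr over warmup_steps, then decays
    along a cosine curve from peak_lr down to min_lr at total_steps.
    """
    if step < warmup_steps:
        # Linear warmup phase.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup schedule completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

For example, with `total_steps=1000` and `warmup_steps=100`, the rate peaks at step 100 and falls back to `min_lr` by step 1000.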
Core Capabilities
- High-quality prose generation similar to Claude 3
- Extended context window of 16k tokens
- Efficient text generation with optimized architecture
- Compatible with standard chat interfaces and SillyTavern templates
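Since the model is compatible with standard chat interfaces, prompts are typically formatted with the base model's instruct template. A minimal sketch is shown below, assuming Magnum v4 123B inherits the `[INST] ... [/INST]` template from Mistral-Large-Instruct-2407; in practice, prefer the chat template bundled with the model's tokenizer (e.g. `tokenizer.apply_chat_template`) over hand-building strings.

```python
def build_mistral_prompt(turns):
    """Format a multi-turn chat as a Mistral-Instruct-style prompt string.

    `turns` is a list of (user_message, assistant_message) pairs; the
    final pair may have assistant_message=None to prompt a new reply.
    This layout is an assumption based on the Mistral base model's
    template, not a documented Magnum-specific format.
    """
    parts = ["<s>"]
    for user_msg, assistant_msg in turns:
        parts.append(f"[INST] {user_msg} [/INST]")
        if assistant_msg is not None:
            # Completed assistant turns are closed with an end-of-sequence tag.
            parts.append(f" {assistant_msg}</s>")
    return "".join(parts)
```

SillyTavern users can achieve the same layout by selecting its Mistral context/instruct templates rather than formatting prompts manually.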
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its targeted optimization toward Claude-like prose quality, combined with the robust Mistral-Large architecture and a 16k-token context window. It was trained on a diverse set of curated, high-quality datasets using full-parameter fine-tuning.
Q: What are the recommended use cases?
The model excels in conversational AI applications, creative writing, and tasks requiring high-quality prose generation. It's particularly well-suited for long-form content generation and complex dialogue interactions.