# Luminum-v0.1-123B
| Property | Value |
|---|---|
| Parameter Count | 123B |
| Model Type | Text Generation |
| Architecture | Mistral-based Transformer |
| Tensor Type | BF16 |
| Context Length | Up to 128k tokens |
## What is Luminum-v0.1-123B?
Luminum-v0.1-123B is a merge of Mistral Large, Lumimaid-v0.2-123B, and Magnum-v2-123B. Built with the della_linear merge method, it pairs Mistral's reasoning strength with the creative character of its merged companions: the model keeps the base intelligence of Mistral Large while incorporating Lumimaid's lexical richness and Magnum's descriptive prowess.
## Implementation Details
The merge uses a carefully calibrated configuration: Magnum-v2-123B is merged at weight 0.19 with density 0.5, and Lumimaid-v0.2-123B at weight 0.34 with density 0.8, on top of the Mistral Large base. The merge runs in bfloat16 precision, with epsilon set to 0.05 and lambda set to 1.
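A della_linear merge with these values would be expressed in a mergekit-style YAML along the following lines. This is an illustrative sketch, not the exact config shipped with the model: the repository paths are assumptions, and the real config may differ in layout.

```yaml
# Illustrative mergekit config (della_linear); model paths are assumptions.
merge_method: della_linear
base_model: mistralai/Mistral-Large-Instruct-2407   # assumed base repo
models:
  - model: NeverSleep/Lumimaid-v0.2-123B            # assumed repo path
    parameters:
      weight: 0.34
      density: 0.8
  - model: anthracite-org/magnum-v2-123b            # assumed repo path
    parameters:
      weight: 0.19
      density: 0.5
parameters:
  epsilon: 0.05   # step size for della's adaptive drop probabilities
  lambda: 1       # scaling factor applied to the merged deltas
dtype: bfloat16
```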
- Supports context lengths up to 128k tokens (tested successfully up to 50k)
- Utilizes Mistral template format for input/output
- Available in multiple quantization formats (GGUF and EXL2)
- Optimized hyperparameters for inference
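Using the Mistral template means wrapping each user turn in `[INST] ... [/INST]` markers. The sketch below is a minimal illustration of that format; exact whitespace and BOS/EOS handling vary between Mistral template revisions, so the chat template bundled with the model's tokenizer remains authoritative.

```python
def mistral_prompt(messages, bos="<s>", eos="</s>"):
    """Format (role, text) pairs in the Mistral instruct style:
    <s>[INST] user [/INST] assistant</s>[INST] user [/INST] ...
    Simplified sketch; real template revisions differ in whitespace details.
    """
    out = bos
    for role, text in messages:
        if role == "user":
            out += f"[INST] {text} [/INST]"
        else:  # assistant turn, closed with EOS
            out += f" {text}{eos}"
    return out

# Single-turn example:
print(mistral_prompt([("user", "Write a short poem about dawn.")]))
```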
## Core Capabilities
- Enhanced creative text generation while maintaining coherence
- Balanced output between conciseness and detail
- Strong performance in conversational tasks
- Extensive context handling capability
- Multiple quantization options for different hardware configurations
## Frequently Asked Questions
**Q: What makes this model unique?**
The model uniquely combines Mistral's intelligence with enhanced creative capabilities while avoiding the common pitfall of excessive verbosity. It achieves this through carefully calibrated merge weights and maintains high performance across extended context lengths.
**Q: What are the recommended use cases?**
The model excels in creative writing, detailed conversations, and tasks requiring both analytical thinking and creative expression. It's particularly effective with contexts up to 32k tokens, though it can handle much longer sequences.