Llama-3.1-8B-Stheno-v3.4-GGUF

Maintained by: bartowski


| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| License | CC-BY-NC-4.0 |
| Base Model | Llama-3.1-8B |
| Training Datasets | Stheno-v3.4-Instruct, Stheno-3.4-Creative-2 |

What is Llama-3.1-8B-Stheno-v3.4-GGUF?

This is a collection of GGUF quantizations of Llama-3.1-8B-Stheno-v3.4, a Llama 3.1 8B model fine-tuned on the Stheno datasets for enhanced creative and instructional capabilities. Multiple quantization levels are provided to suit different hardware configurations and memory constraints, with file sizes ranging from 2.95GB up to 16GB.
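A single quantization can be fetched with the official huggingface_hub client. The repo id follows from this card; the exact filename is an assumption based on bartowski's usual <model>-<quant>.gguf naming and should be checked against the repository's file list.

```python
from huggingface_hub import hf_hub_download

# Filename is assumed from bartowski's typical <model>-<quant>.gguf
# naming convention; verify it against the repo's file listing.
model_path = hf_hub_download(
    repo_id="bartowski/Llama-3.1-8B-Stheno-v3.4-GGUF",
    filename="Llama-3.1-8B-Stheno-v3.4-Q4_K_M.gguf",
)
print(model_path)  # local cache path of the downloaded GGUF file
```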

Implementation Details

The quantizations were produced with llama.cpp using imatrix (importance matrix) calibration, offering a range of compression levels that trade file size against output quality. Each quantization targets a specific use case, from high-quality Q8_0 down to the lightweight IQ2_M format.

  • Multiple quantization options (Q8_0 down to IQ2_M) for different hardware setups, enumerated in the sketch after this list
  • Specialized versions for ARM inference with SVE and i8mm support
  • Enhanced embedding and output weights quantized at Q8_0 in select (_L) versions
  • Supports both creative and instructional tasks through dual-dataset training
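To see exactly which quantizations are published, the repository's file list can be enumerated. A minimal sketch using huggingface_hub:

```python
from huggingface_hub import list_repo_files

# Print every GGUF file in the repo; each corresponds to one quantization
for f in sorted(list_repo_files("bartowski/Llama-3.1-8B-Stheno-v3.4-GGUF")):
    if f.endswith(".gguf"):
        print(f)
```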

Core Capabilities

  • Text generation with creative and instructional abilities
  • Flexible deployment across various hardware configurations
  • Optimized performance through specialized quantization techniques
  • Support for conversation-style interactions (see the inference sketch below)
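As a sketch of conversation-style use, the GGUF file can be loaded through llama-cpp-python, one of several llama.cpp bindings; the chat template is read from the GGUF metadata. The model path below assumes the Q4_K_M file downloaded earlier.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.1-8B-Stheno-v3.4-Q4_K_M.gguf",  # assumed local path
    n_ctx=8192,       # context window; lower it on memory-constrained machines
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a creative writing assistant."},
        {"role": "user", "content": "Write a two-sentence opening for a mystery story."},
    ],
    max_tokens=128,
)
print(result["choices"][0]["message"]["content"])
```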

Frequently Asked Questions

Q: What makes this model unique?

The model combines Llama 3.1's capabilities with Stheno's creative and instructional abilities, offering numerous quantization options for optimal deployment across different hardware setups.

Q: What are the recommended use cases?

For most users, the Q4_K_M (4.92GB) quantization offers a good balance of quality and size. Users with limited RAM should consider IQ3_XS (3.52GB) or Q3_K_M (4.02GB) versions, while those prioritizing quality should opt for Q6_K_L (6.85GB) or higher.
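As a rule of thumb, the chosen file must fit in RAM or VRAM with headroom left over for the KV cache and runtime overhead. A minimal picker over the sizes quoted above (the 1.5GB headroom figure is an illustrative assumption, not a measured value):

```python
# File sizes (GB) quoted in this card; the repo carries more quants than these
QUANT_SIZES_GB = {
    "Q6_K_L": 6.85,
    "Q4_K_M": 4.92,
    "Q3_K_M": 4.02,
    "IQ3_XS": 3.52,
}

def pick_quant(available_gb: float, headroom_gb: float = 1.5) -> str | None:
    """Return the largest listed quant that fits, leaving headroom for the KV cache."""
    for name, size in sorted(QUANT_SIZES_GB.items(), key=lambda kv: -kv[1]):
        if size + headroom_gb <= available_gb:
            return name
    return None  # nothing listed fits; consider the smaller IQ2-class quants

print(pick_quant(8.5))  # -> Q6_K_L
print(pick_quant(6.5))  # -> Q4_K_M
```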
