Llama-3.1-8B-Stheno-v3.4-GGUF

Maintained by: bartowski


| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| License | CC-BY-NC-4.0 |
| Base Model | Llama-3.1-8B |
| Training Datasets | Stheno-v3.4-Instruct, Stheno-3.4-Creative-2 |

What is Llama-3.1-8B-Stheno-v3.4-GGUF?

This is a collection of GGUF quantizations of Llama-3.1-8B-Stheno-v3.4, a Llama 3.1 8B model fine-tuned on the Stheno datasets for enhanced creative and instructional capabilities. Multiple quantization levels are provided to suit different hardware configurations and memory constraints, with file sizes ranging from 2.95GB up to 16GB.
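A single quantization can be fetched with the official huggingface_hub client. The repo id follows from this card; the exact filename is an assumption based on bartowski's usual <model>-<quant>.gguf naming and should be checked against the repository's file list.

```python
from huggingface_hub import hf_hub_download

# Filename is assumed from bartowski's typical <model>-<quant>.gguf
# naming convention; verify it against the repo's file listing.
model_path = hf_hub_download(
    repo_id="bartowski/Llama-3.1-8B-Stheno-v3.4-GGUF",
    filename="Llama-3.1-8B-Stheno-v3.4-Q4_K_M.gguf",
)
print(model_path)  # local cache path of the downloaded GGUF file
```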

Implementation Details

The quantizations were produced with llama.cpp using imatrix (importance matrix) calibration, offering a range of compression levels that trade file size against output quality. Each quantization targets a specific use case, from high-quality Q8_0 down to the lightweight IQ2_M format.

  • Multiple quantization options (Q8_0 down to IQ2_M) for different hardware setups, enumerated in the sketch after this list
  • Specialized versions for ARM inference with SVE and i8mm support
  • Enhanced embedding and output weights quantized at Q8_0 in select (_L) versions
  • Supports both creative and instructional tasks through dual-dataset training
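To see exactly which quantizations are published, the repository's file list can be enumerated. A minimal sketch using huggingface_hub:

```python
from huggingface_hub import list_repo_files

# Print every GGUF file in the repo; each corresponds to one quantization
for f in sorted(list_repo_files("bartowski/Llama-3.1-8B-Stheno-v3.4-GGUF")):
    if f.endswith(".gguf"):
        print(f)
```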

Core Capabilities

  • Text generation with creative and instructional abilities
  • Flexible deployment across various hardware configurations
  • Optimized performance through specialized quantization techniques
  • Support for conversation-style interactions (see the inference sketch below)
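As a sketch of conversation-style use, the GGUF file can be loaded through llama-cpp-python, one of several llama.cpp bindings; the chat template is read from the GGUF metadata. The model path below assumes the Q4_K_M file downloaded earlier.

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Llama-3.1-8B-Stheno-v3.4-Q4_K_M.gguf",  # assumed local path
    n_ctx=8192,       # context window; lower it on memory-constrained machines
    n_gpu_layers=-1,  # offload all layers to the GPU if one is available
)

result = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a creative writing assistant."},
        {"role": "user", "content": "Write a two-sentence opening for a mystery story."},
    ],
    max_tokens=128,
)
print(result["choices"][0]["message"]["content"])
```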

Frequently Asked Questions

Q: What makes this model unique?

The model combines Llama 3.1's capabilities with Stheno's creative and instructional abilities, offering numerous quantization options for optimal deployment across different hardware setups.

Q: What are the recommended use cases?

For most users, the Q4_K_M (4.92GB) quantization offers a good balance of quality and size. Users with limited RAM should consider IQ3_XS (3.52GB) or Q3_K_M (4.02GB) versions, while those prioritizing quality should opt for Q6_K_L (6.85GB) or higher.
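As a rule of thumb, the chosen file must fit in RAM or VRAM with headroom left over for the KV cache and runtime overhead. A minimal picker over the sizes quoted above (the 1.5GB headroom figure is an illustrative assumption, not a measured value):

```python
# File sizes (GB) quoted in this card; the repo carries more quants than these
QUANT_SIZES_GB = {
    "Q6_K_L": 6.85,
    "Q4_K_M": 4.92,
    "Q3_K_M": 4.02,
    "IQ3_XS": 3.52,
}

def pick_quant(available_gb: float, headroom_gb: float = 1.5) -> str | None:
    """Return the largest listed quant that fits, leaving headroom for the KV cache."""
    for name, size in sorted(QUANT_SIZES_GB.items(), key=lambda kv: -kv[1]):
        if size + headroom_gb <= available_gb:
            return name
    return None  # nothing listed fits; consider the smaller IQ2-class quants

print(pick_quant(8.5))  # -> Q6_K_L
print(pick_quant(6.5))  # -> Q4_K_M
```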
