# burtenshaw_GemmaCoder3-12B-GGUF
| Property | Value |
|---|---|
| Original Model | GemmaCoder3-12B |
| Size Range | 4.02GB - 23.54GB |
| Quantization Types | Multiple (Q2-Q8, IQ2-IQ4) |
| Source | Hugging Face |
## What is burtenshaw_GemmaCoder3-12B-GGUF?
This is a comprehensive collection of llama.cpp (GGUF) quantizations of the GemmaCoder3-12B model, offering various compression levels optimized for different hardware configurations and use cases. The quantizations were created with llama.cpp release b5010 using the imatrix option, providing a range of trade-offs between model size and performance.
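To see exactly which quant files are published, the `huggingface_hub` client can enumerate them. A minimal sketch, assuming the repo id matches the card title (verify the exact id on the model page):

```python
from huggingface_hub import list_repo_files

# Assumed repo id -- confirm the exact id on the Hugging Face model page.
repo_id = "burtenshaw/GemmaCoder3-12B-GGUF"

# Print every GGUF quant file published in the repo.
for filename in list_repo_files(repo_id):
    if filename.endswith(".gguf"):
        print(filename)
```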
## Implementation Details
The model comes in multiple quantization formats, from the full BF16 weights (23.54GB) down to the highly compressed IQ2_S format (4.02GB). Each quantization level offers a different trade-off (a download sketch follows the list below):
- Q8_0/Q6_K_L: Highest quality quantizations for maximum performance
- Q5_K series: Recommended balance of quality and size
- Q4_K series: Good quality default for most use cases
- IQ4/IQ3 series: Newer methods offering good performance at smaller sizes
- Q2_K/IQ2 series: Ultra-compact options with surprisingly usable quality
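To fetch a single quant rather than cloning the whole repo, `hf_hub_download` pulls just one file into the local cache. A sketch assuming the Q4_K_M filename follows the usual `<model>-<quant>.gguf` naming convention (check the repo's file listing for the exact name):

```python
from huggingface_hub import hf_hub_download

# Assumed repo id and filename -- verify both against the repo's file list.
repo_id = "burtenshaw/GemmaCoder3-12B-GGUF"
filename = "GemmaCoder3-12B-Q4_K_M.gguf"  # ~7.30GB "good default" quant

# Download only this quant file into the local Hugging Face cache
# and return its path on disk.
model_path = hf_hub_download(repo_id=repo_id, filename=filename)
print(model_path)
```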
## Core Capabilities
- Supports online repacking for ARM and AVX CPU inference
- Specialized formats (Q3_K_XL, Q4_K_L) with Q8_0 embed/output weights
- Compatible with LM Studio and any llama.cpp-based project
- Defined prompt format for consistent interaction (see the inference sketch after this list)
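For local inference, the llama-cpp-python bindings can load any of these quants directly. A minimal sketch, assuming the model inherits the standard Gemma `<start_of_turn>` chat template (the model card's own prompt-format section is authoritative):

```python
from llama_cpp import Llama

# Load the downloaded quant; n_ctx and n_gpu_layers are illustrative
# values to tune for your hardware.
llm = Llama(
    model_path="GemmaCoder3-12B-Q4_K_M.gguf",
    n_ctx=8192,
    n_gpu_layers=-1,  # offload all layers if a GPU is available
)

# Assumed prompt format: standard Gemma turn markers.
prompt = (
    "<start_of_turn>user\n"
    "Write a Python function that reverses a linked list.<end_of_turn>\n"
    "<start_of_turn>model\n"
)

result = llm(prompt, max_tokens=512, stop=["<end_of_turn>"])
print(result["choices"][0]["text"])
```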
## Frequently Asked Questions
**Q: What makes this model unique?**
The model provides an extensive range of quantization options, allowing users to precisely balance model size, performance, and quality based on their hardware constraints. It includes modern quantization techniques like online repacking and specialized embed/output weight handling.
**Q: What are the recommended use cases?**
For most users, the Q4_K_M (7.30GB) variant offers a good balance of quality and size. Users with limited RAM should consider the Q3_K series or IQ3 variants, while those prioritizing quality should opt for the Q6_K_L or Q5_K series quantizations.
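One way to turn that guidance into code is a small selection helper. A hypothetical sketch that uses only the file sizes quoted in this card (extend the table with the repo's full size list) and assumes a rough 25% headroom for the KV cache and runtime overhead:

```python
def pick_quant(available_ram_gb: float, headroom: float = 1.25) -> str:
    """Return the largest listed quant that fits the RAM budget."""
    # Only the sizes quoted in this card; extend with the repo's full list.
    quant_sizes_gb = {"BF16": 23.54, "Q4_K_M": 7.30, "IQ2_S": 4.02}
    fitting = {
        name: size
        for name, size in quant_sizes_gb.items()
        if size * headroom <= available_ram_gb
    }
    if not fitting:
        raise ValueError("No listed quant fits in the available RAM")
    return max(fitting, key=fitting.get)

print(pick_quant(16.0))  # -> Q4_K_M on a 16GB machine
```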