OpenHands LM 32B GGUF

Property	Value
Base Model	OpenHands LM 32B
Quantization Types	Multiple (BF16 to IQ2_XXS)
Size Range	9GB - 65.54GB
Original Source	all-hands/openhands-lm-32b-v0.1

What is all-hands_openhands-lm-32b-v0.1-GGUF?

This is a comprehensive collection of GGUF quantizations of the OpenHands LM 32B model, offering various compression levels to accommodate different hardware configurations and use cases. The quantizations were created using llama.cpp with imatrix optimization, providing a balance between model size and performance.

Implementation Details

The model features multiple quantization formats, from full BF16 weights (65.54GB) down to highly compressed IQ2_XXS (9.03GB). Notable implementations include K-quants (Q2_K to Q8_0) and I-quants (IQ2_XXS to IQ4_NL), each optimized for specific hardware configurations and performance requirements.

Advanced quantization techniques including embed/output weight optimization
Online repacking support for ARM and AVX CPU inference
SOTA compression techniques maintaining usability even at lower bits
Specialized formats for different hardware architectures

Core Capabilities

Flexible deployment options across various hardware configurations
High-quality compression maintaining model performance
Optimized performance for both CPU and GPU implementations
Support for modern prompt format with system/user/assistant structure

Frequently Asked Questions

Q: What makes this model unique?

This model offers an exceptionally wide range of quantization options, from high-quality Q8_0 to highly compressed IQ2 variants, making it adaptable to various hardware constraints while maintaining usability.

Q: What are the recommended use cases?

For maximum quality, use Q6_K_L or Q6_K versions. For balanced performance, Q4_K_M is recommended. For limited RAM scenarios, the IQ3/IQ2 variants offer surprisingly usable performance at smaller sizes.

all-hands_openhands-lm-32b-v0.1-GGUF