TinyLlama-1.1B-Chat-v0.3-GPTQ
| Property | Value |
|---|---|
| Parameter Count | 1.1B |
| License | Apache 2.0 |
| Quantization | GPTQ 4-bit |
| Model Size | 262M params (as reported for the packed 4-bit tensors) |
What is TinyLlama-1.1B-Chat-v0.3-GPTQ?
TinyLlama-1.1B-Chat-v0.3-GPTQ is a quantized version of the TinyLlama chat model, designed for efficient deployment with minimal loss in output quality. It was created by TheBloke from Zhang Peiyuan's original model and is published in multiple quantization variants, letting users trade accuracy against VRAM and compute requirements.
Implementation Details
The model uses the same architecture and tokenizer as Llama 2, so it drops into many existing Llama-based projects. The base model was pretrained as part of TinyLlama's 3-trillion-token training run, then fine-tuned on the OpenAssistant dataset using the ChatML format.
- Multiple GPTQ quantization options (4-bit and 8-bit variants)
- Supports various group sizes (32g, 64g, 128g) for different VRAM requirements
- Compatible with AutoGPTQ and text-generation-inference
- Uses the ChatML prompt template for structured interactions (see the sketch after this list)
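The sketch below ties these bullets together: it loads one of the GPTQ variants through the transformers library (which hands GPTQ weights off to AutoGPTQ when optimum and auto-gptq are installed) and sends a ChatML-formatted prompt. The branch name and prompt text are illustrative assumptions, not part of this model card; check the repository's branch list for the exact variant names.

```python
# Minimal sketch: load a GPTQ variant and query it with a ChatML prompt.
# Assumes a CUDA GPU and `pip install transformers optimum auto-gptq`.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/TinyLlama-1.1B-Chat-v0.3-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # place the quantized weights on the GPU
    # Pick a group-size variant by branch, e.g. revision="gptq-4bit-32g-actorder_True".
    # This branch name is an assumption; confirm it against the repo before use.
)

# ChatML prompt template used by the chat fine-tune.
prompt = (
    "<|im_start|>user\n"
    "Explain GPTQ quantization in one sentence.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Smaller group sizes such as 32g quantize the weights in finer blocks, recovering a bit more accuracy at the cost of extra VRAM relative to 128g; that trade-off is why the repository publishes several variants.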
Core Capabilities
- Efficient resource usage with multiple quantization options
- Plug-and-play compatibility with the Llama ecosystem
- Support for both GPU and CPU inference
- Optimized for chat-based applications
Frequently Asked Questions
Q: What makes this model unique?
Its compact size and multiple quantization options make it ideal for resource-constrained environments while maintaining compatibility with the Llama ecosystem.
Q: What are the recommended use cases?
This model is well suited to chat applications where computational resources are limited, such as edge devices and other deployments with tight memory budgets.