TinyLlama-1.1B-Chat-v1.0-GPTQ

Maintained By
TheBloke

Property          Value
Parameter Count   1.1B
License           Apache 2.0
Model Size        262M params (quantized)
Training Data     SlimPajama-627B, StarCoder, OpenAssistant

What is TinyLlama-1.1B-Chat-v1.0-GPTQ?

TinyLlama-1.1B-Chat-v1.0-GPTQ is a GPTQ-quantized version of the TinyLlama-1.1B-Chat-v1.0 model, packaged for efficient deployment with reduced memory and compute requirements. It targets scenarios where a compact chat model that runs on modest hardware is preferable to a larger, more resource-hungry one.

Implementation Details

The base model uses the same architecture and tokenizer as Llama 2 at a 1.1B-parameter scale. It was pretrained on 3 trillion tokens, fine-tuned on the UltraChat dataset, and aligned using DPO training on UltraFeedback. This repository provides GPTQ quantizations of that chat model, with multiple options including 4-bit and 8-bit versions at various group sizes.

  • Multiple quantization options (4-bit to 8-bit)
  • Compatible with ExLlama for 4-bit versions
  • Supports different group sizes (32g, 64g, 128g) for performance tuning
  • Uses Zephyr prompt template format
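The Zephyr prompt template wraps each turn in role markers. A minimal sketch of building such a prompt by hand (the `format_zephyr` helper is illustrative, not part of the model's tooling; in practice the tokenizer's chat template can do this for you):

```python
# Sketch: assemble a Zephyr-format prompt string for this model.
# The helper below is our own illustration, not a library function.

def format_zephyr(system: str, user: str) -> str:
    """Wrap a system message and one user turn in Zephyr role markers,
    ending with the assistant marker so the model continues from there."""
    return (
        f"<|system|>\n{system}</s>\n"
        f"<|user|>\n{user}</s>\n"
        f"<|assistant|>\n"
    )

prompt = format_zephyr(
    "You are a friendly chatbot.",
    "How do I quantize a model with GPTQ?",
)
print(prompt)
```

The trailing `<|assistant|>` marker matters: it signals that the model should generate the assistant's reply rather than continue the user's turn.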

Core Capabilities

  • Efficient chat and text generation
  • Supports context length of up to 2048 tokens
  • Compatible with major inference frameworks including text-generation-webui and HuggingFace TGI
  • Optimized for both CPU and GPU deployment
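Because prompt and completion share the 2048-token window, a deployment needs to budget how many new tokens it may request. A small sketch of that arithmetic (the helper name is ours, for illustration):

```python
# Sketch: split TinyLlama's 2048-token context window between the
# prompt and the generated completion. Helper name is illustrative.

CONTEXT_LENGTH = 2048  # TinyLlama's maximum context length

def max_new_tokens(prompt_tokens: int, context: int = CONTEXT_LENGTH) -> int:
    """Return how many tokens remain for generation after the prompt."""
    if prompt_tokens >= context:
        raise ValueError("prompt already fills the context window")
    return context - prompt_tokens

# A 1500-token prompt leaves 548 tokens for the reply.
print(max_new_tokens(1500))
```

Passing the result as the generation limit (e.g. `max_new_tokens` in HuggingFace-style APIs) avoids truncation errors when prompts approach the window size.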

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its impressive balance between size and performance. At just 1.1B parameters, it's one of the most compact yet capable chat models available, making it ideal for resource-constrained environments.

Q: What are the recommended use cases?

The model is particularly well-suited for applications requiring lightweight deployment, edge computing, or situations where computational resources are limited. It's ideal for chatbots, text generation, and basic language understanding tasks that don't require the full capacity of larger models.
