Phind-CodeLlama-34B-v2-GGUF
| Property | Value |
|---|---|
| Parameter Count | 33.7B |
| Model Type | LLaMA Architecture |
| License | Llama 2 |
| HumanEval Score | 73.8% pass@1 |
What is Phind-CodeLlama-34B-v2-GGUF?
Phind-CodeLlama-34B-v2-GGUF is a GGUF-formatted release of Phind's CodeLlama-34B-v2, a code generation model fine-tuned on 1.5B tokens of high-quality programming data. It achieves 73.8% pass@1 on HumanEval, placing it among the leading open-source code generation models available.
Implementation Details
The model utilizes the GGUF format, which offers improved tokenization and special token support compared to the older GGML format. It's available in various quantization options, from 2-bit to 8-bit, allowing users to balance between model size and performance based on their hardware capabilities. The model was trained using DeepSpeed ZeRO 3 and Flash Attention 2 on 32 A100-80GB GPUs.
- Multiple quantization options (Q2_K through Q8_0)
- Supports a sequence length of 4096 tokens
- Compatible with major frameworks including llama.cpp, text-generation-webui, and others
- GPU acceleration support with layer offloading capabilities
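To make the size/quality trade-off across quantization levels concrete, here is a minimal sketch that estimates on-disk file size from average bits per weight. The bits-per-weight figures are approximate averages for llama.cpp k-quants (an assumption for illustration; real GGUF files vary slightly because some layers are kept at higher precision).

```python
# Rough on-disk size estimates for quantized GGUF files of a 33.7B-parameter
# model. Bits-per-weight values are approximate averages for llama.cpp
# k-quants (assumption: actual files differ slightly due to mixed precision).

BITS_PER_WEIGHT = {
    "Q2_K": 2.6,
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q6_K": 6.6,
    "Q8_0": 8.5,
}

def estimated_size_gb(params: float, quant: str) -> float:
    """Approximate file size in GB: parameters * bits-per-weight / 8."""
    return params * BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{estimated_size_gb(33.7e9, quant):.1f} GB")
```

Lower quantization levels shrink the file (and memory footprint) at the cost of output quality; Q4_K_M and Q5_K_M are common middle grounds for local inference.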
Core Capabilities
- Multi-language support including Python, C/C++, TypeScript, and Java
- Instruction-tuned using the Alpaca/Vicuna prompt format for better steerability and usability
- State-of-the-art performance on code generation tasks
- Efficient inference with various optimization options
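Because the model is instruction-tuned, it responds best when the prompt follows its training template. The sketch below assembles one common Alpaca-style layout; the exact section headers are an assumption based on Phind's published template, so verify them against the model card before relying on them.

```python
def build_prompt(user_message: str,
                 system_prompt: str = "You are an intelligent programming assistant.") -> str:
    """Assemble an Alpaca-style instruction prompt.

    Assumption: the '### System Prompt' / '### User Message' / '### Assistant'
    headers follow Phind's published template; check the model card.
    """
    return (
        f"### System Prompt\n{system_prompt}\n\n"
        f"### User Message\n{user_message}\n\n"
        f"### Assistant\n"
    )

print(build_prompt("Write a Python function that reverses a string."))
```

The trailing `### Assistant` header cues the model to begin its reply, and the same headers make convenient stop sequences during generation.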
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its exceptional performance on code generation tasks, achieving 73.8% pass@1 on HumanEval, state-of-the-art among open-source models at the time of its release. It's also notably versatile, supporting multiple programming languages and offering various quantization options for different hardware configurations.
Q: What are the recommended use cases?
The model excels at code generation, code completion, and programming assistance tasks. It's particularly well-suited for developers seeking AI assistance in multiple programming languages, and can be deployed in various environments thanks to its flexible quantization options.
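For local deployment, a minimal inference sketch with llama-cpp-python might look like the following. Assumptions: the `llama-cpp-python` package is installed (`pip install llama-cpp-python`), a quantized GGUF file such as `phind-codellama-34b-v2.Q4_K_M.gguf` has been downloaded, and the filename shown is illustrative.

```python
def generate(prompt: str,
             model_path: str = "phind-codellama-34b-v2.Q4_K_M.gguf") -> str:
    """Run one completion against a local GGUF file via llama-cpp-python.

    Assumption: model_path points at a downloaded quantization of this model.
    """
    from llama_cpp import Llama  # imported lazily; requires llama-cpp-python

    llm = Llama(
        model_path=model_path,
        n_ctx=4096,       # the model's supported sequence length
        n_gpu_layers=-1,  # offload all layers to GPU; lower this on small VRAM
    )
    out = llm(prompt, max_tokens=256, stop=["###"])
    return out["choices"][0]["text"]

if __name__ == "__main__":
    print(generate("### User Message\nWrite a quicksort in Python.\n\n### Assistant\n"))
```

The `n_gpu_layers` setting is how layer offloading is controlled: `-1` offloads everything, while a smaller number keeps the remaining layers on the CPU for machines with limited VRAM.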