Phind-CodeLlama-34B-v2-GGUF
| Property | Value |
|---|---|
| Parameter Count | 33.7B |
| Model Type | LLaMA Architecture |
| License | Llama 2 |
| HumanEval Score | 73.8% pass@1 |
What is Phind-CodeLlama-34B-v2-GGUF?
Phind-CodeLlama-34B-v2-GGUF is a GGUF-formatted release of Phind's CodeLlama-34B-v2, a code generation model fine-tuned on 1.5B tokens of high-quality programming data. It achieves 73.8% pass@1 on HumanEval, placing it among the leading open-source code generation models available.
Implementation Details
The model utilizes the GGUF format, which offers improved tokenization and special token support compared to the older GGML format. It's available in various quantization options, from 2-bit to 8-bit, allowing users to balance between model size and performance based on their hardware capabilities. The model was trained using DeepSpeed ZeRO 3 and Flash Attention 2 on 32 A100-80GB GPUs.
- Multiple quantization options (Q2_K through Q8_0)
- Supports a sequence length of 4096 tokens
- Compatible with major frameworks including llama.cpp, text-generation-webui, and others
- GPU acceleration support with layer offloading capabilities
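To make the size/quality trade-off across quantization levels concrete, here is a minimal sketch that estimates on-disk file size from average bits per weight. The bits-per-weight figures are approximate averages for llama.cpp k-quants (an assumption for illustration; real GGUF files vary slightly because some layers are kept at higher precision).

```python
# Rough on-disk size estimates for quantized GGUF files of a 33.7B-parameter
# model. Bits-per-weight values are approximate averages for llama.cpp
# k-quants (assumption: actual files differ slightly due to mixed precision).

BITS_PER_WEIGHT = {
    "Q2_K": 2.6,
    "Q3_K_M": 3.9,
    "Q4_K_M": 4.8,
    "Q5_K_M": 5.7,
    "Q6_K": 6.6,
    "Q8_0": 8.5,
}

def estimated_size_gb(params: float, quant: str) -> float:
    """Approximate file size in GB: parameters * bits-per-weight / 8."""
    return params * BITS_PER_WEIGHT[quant] / 8 / 1e9

for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{estimated_size_gb(33.7e9, quant):.1f} GB")
```

Lower quantization levels shrink the file (and memory footprint) at the cost of output quality; Q4_K_M and Q5_K_M are common middle grounds for local inference.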
Core Capabilities
- Multi-language support including Python, C/C++, TypeScript, and Java
- Instruction-tuned using the Alpaca/Vicuna prompt format for better steerability and usability
- State-of-the-art performance on code generation tasks
- Efficient inference with various optimization options
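Because the model is instruction-tuned, it responds best when the prompt follows its training template. The sketch below assembles one common Alpaca-style layout; the exact section headers are an assumption based on Phind's published template, so verify them against the model card before relying on them.

```python
def build_prompt(user_message: str,
                 system_prompt: str = "You are an intelligent programming assistant.") -> str:
    """Assemble an Alpaca-style instruction prompt.

    Assumption: the '### System Prompt' / '### User Message' / '### Assistant'
    headers follow Phind's published template; check the model card.
    """
    return (
        f"### System Prompt\n{system_prompt}\n\n"
        f"### User Message\n{user_message}\n\n"
        f"### Assistant\n"
    )

print(build_prompt("Write a Python function that reverses a string."))
```

The trailing `### Assistant` header cues the model to begin its reply, and the same headers make convenient stop sequences during generation.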
Frequently Asked Questions
Q: What makes this model unique?
This model stands out for its exceptional performance on code generation tasks, achieving 73.8% pass@1 on HumanEval, state-of-the-art among open-source models at the time of its release. It's also notably versatile, supporting multiple programming languages and offering various quantization options for different hardware configurations.
Q: What are the recommended use cases?
The model excels at code generation, code completion, and programming assistance tasks. It's particularly well-suited for developers seeking AI assistance in multiple programming languages, and can be deployed in various environments thanks to its flexible quantization options.
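For local deployment, a minimal inference sketch with llama-cpp-python might look like the following. Assumptions: the `llama-cpp-python` package is installed (`pip install llama-cpp-python`), a quantized GGUF file such as `phind-codellama-34b-v2.Q4_K_M.gguf` has been downloaded, and the filename shown is illustrative.

```python
def generate(prompt: str,
             model_path: str = "phind-codellama-34b-v2.Q4_K_M.gguf") -> str:
    """Run one completion against a local GGUF file via llama-cpp-python.

    Assumption: model_path points at a downloaded quantization of this model.
    """
    from llama_cpp import Llama  # imported lazily; requires llama-cpp-python

    llm = Llama(
        model_path=model_path,
        n_ctx=4096,       # the model's supported sequence length
        n_gpu_layers=-1,  # offload all layers to GPU; lower this on small VRAM
    )
    out = llm(prompt, max_tokens=256, stop=["###"])
    return out["choices"][0]["text"]

if __name__ == "__main__":
    print(generate("### User Message\nWrite a quicksort in Python.\n\n### Assistant\n"))
```

The `n_gpu_layers` setting is how layer offloading is controlled: `-1` offloads everything, while a smaller number keeps the remaining layers on the CPU for machines with limited VRAM.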