LLaMA-3-FireFunction-v2
| Property | Value |
|---|---|
| Parameter Count | 70.6B |
| Model Type | Function Calling Language Model |
| Architecture | LLaMA 3 |
| License | LLaMA 3 |
| Tensor Type | FP16 |
What is llama-3-firefunction-v2?
FireFunction v2 is a specialized language model built on the LLaMA 3 architecture and optimized for function calling. It represents a significant advance over its predecessor, matching GPT-4-level function-calling performance while operating at roughly 10% of the cost and twice the speed. The model retains strong conversation and instruction-following abilities, scoring 0.84 on MT-Bench versus 0.89 for LLaMA 3.
Implementation Details
Built on the LLaMA 3 architecture, FireFunction v2 adds sophisticated function-calling capabilities while preserving the base model's 8k context window. It runs in FP16 precision and supports parallel function calling, a significant improvement over its predecessor.
- Supports up to 20 function specifications simultaneously
- Enables parallel function calling with high accuracy (0.9 on Gorilla parallel_function benchmark)
- Maintains an 8k context window from the base LLaMA 3 model
- Implements efficient structured information extraction
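Function specifications are supplied in the OpenAI-style JSON Schema tool format. The sketch below shows how a tools list and chat request might be assembled; the function name is a hypothetical example, and the model identifier is an assumption shown only for illustration:

```python
import json

# OpenAI-style tool/function specification (JSON Schema parameters).
# The model accepts up to 20 such specs per request.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical example function
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

# The request body pairs the tool specs with ordinary chat messages.
request_body = {
    "model": "accounts/fireworks/models/firefunction-v2",  # assumed model id
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
}

print(json.dumps(request_body, indent=2))
```

Because the tool format is plain JSON, the same specs can be reused across any OpenAI-compatible client.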
Core Capabilities
- Multi-turn chat mixing vanilla messages with function calls
- Parallel function calling with high accuracy
- Strong instruction following abilities
- Structured information extraction
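A parallel function-calling turn returns multiple tool calls in a single assistant message. The sketch below dispatches such a message to local handlers; the response shape mirrors the OpenAI chat-completions format, but the message here is mocked and the `get_weather` handler is a hypothetical placeholder:

```python
import json

# Mocked assistant message containing two parallel tool calls,
# shaped like an OpenAI-style chat-completions response.
assistant_message = {
    "role": "assistant",
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}},
        {"id": "call_2", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}},
    ],
}

# Hypothetical local implementation, keyed by function name.
def get_weather(city):
    return f"Weather for {city}: sunny"  # placeholder result

REGISTRY = {"get_weather": get_weather}

def dispatch(message):
    """Execute each tool call and return tool-role messages for the next turn."""
    results = []
    for call in message.get("tool_calls", []):
        fn = REGISTRY[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": fn(**args),
        })
    return results

replies = dispatch(assistant_message)
print(replies)
```

The tool-role replies, each tagged with its `tool_call_id`, are appended to the conversation so the model can compose a final answer from all results at once.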
- Competitive performance with GPT-4 on function-calling tasks (0.81 vs 0.80)
Frequently Asked Questions
Q: What makes this model unique?
FireFunction v2 stands out for its ability to match GPT-4's function-calling capabilities at a fraction of the cost while maintaining high performance in general conversation tasks. It's particularly notable for supporting parallel function calling and handling up to 20 function specifications simultaneously.
Q: What are the recommended use cases?
The model excels in scenarios requiring function calling, structured information extraction, and general instruction following. It's ideal for applications needing to process multiple function calls in parallel, handle complex chat interactions, and perform structured data extraction, while maintaining cost efficiency.