LLaMA-3-FireFunction-v2
| Property | Value |
|---|---|
| Parameter Count | 70.6B |
| Model Type | Function Calling Language Model |
| Architecture | LLaMA 3 |
| License | LLaMA 3 |
| Tensor Type | FP16 |
What is llama-3-firefunction-v2?
FireFunction v2 is a specialized language model built on the LLaMA 3 architecture and optimized for function calling. It represents a significant advance over its predecessor, matching GPT-4-level function-calling performance while operating at roughly 10% of the cost and twice the speed. The model retains strong conversation and instruction-following abilities, scoring 0.84 on MT-Bench versus 0.89 for LLaMA 3.
Implementation Details
Built on the LLaMA 3 architecture, FireFunction v2 adds sophisticated function-calling capabilities while preserving the base model's 8k context window. It runs in FP16 precision and supports parallel function calling, a significant improvement over its predecessor.
- Supports up to 20 function specifications simultaneously
- Enables parallel function calling with high accuracy (0.9 on Gorilla parallel_function benchmark)
- Maintains an 8k context window from the base LLaMA 3 model
- Implements efficient structured information extraction
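Function specifications are supplied in the OpenAI-style JSON Schema tool format. The sketch below shows how a tools list and chat request might be assembled; the function name is a hypothetical example, and the model identifier is an assumption shown only for illustration:

```python
import json

# OpenAI-style tool/function specification (JSON Schema parameters).
# The model accepts up to 20 such specs per request.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical example function
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                    "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
                },
                "required": ["city"],
            },
        },
    }
]

# The request body pairs the tool specs with ordinary chat messages.
request_body = {
    "model": "accounts/fireworks/models/firefunction-v2",  # assumed model id
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": tools,
}

print(json.dumps(request_body, indent=2))
```

Because the tool format is plain JSON, the same specs can be reused across any OpenAI-compatible client.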
Core Capabilities
- Multi-turn chat mixing vanilla messages with function calls
- Parallel function calling with high accuracy
- Strong instruction following abilities
- Structured information extraction
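A parallel function-calling turn returns multiple tool calls in a single assistant message. The sketch below dispatches such a message to local handlers; the response shape mirrors the OpenAI chat-completions format, but the message here is mocked and the `get_weather` handler is a hypothetical placeholder:

```python
import json

# Mocked assistant message containing two parallel tool calls,
# shaped like an OpenAI-style chat-completions response.
assistant_message = {
    "role": "assistant",
    "tool_calls": [
        {"id": "call_1", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Paris"}'}},
        {"id": "call_2", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city": "Tokyo"}'}},
    ],
}

# Hypothetical local implementation, keyed by function name.
def get_weather(city):
    return f"Weather for {city}: sunny"  # placeholder result

REGISTRY = {"get_weather": get_weather}

def dispatch(message):
    """Execute each tool call and return tool-role messages for the next turn."""
    results = []
    for call in message.get("tool_calls", []):
        fn = REGISTRY[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": fn(**args),
        })
    return results

replies = dispatch(assistant_message)
print(replies)
```

The tool-role replies, each tagged with its `tool_call_id`, are appended to the conversation so the model can compose a final answer from all results at once.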
- Competitive performance with GPT-4 on function-calling tasks (0.81 vs 0.80)
Frequently Asked Questions
Q: What makes this model unique?
FireFunction v2 stands out for its ability to match GPT-4's function-calling capabilities at a fraction of the cost while maintaining high performance in general conversation tasks. It's particularly notable for supporting parallel function calling and handling up to 20 function specifications simultaneously.
Q: What are the recommended use cases?
The model excels in scenarios requiring function calling, structured information extraction, and general instruction following. It's ideal for applications needing to process multiple function calls in parallel, handle complex chat interactions, and perform structured data extraction, while maintaining cost efficiency.