Gemma-2-9B-It-SPPO-Iter3

Maintained by: UCLA-AGI

Parameter Count: 9.24B
License: Gemma License
Base Model: google/gemma-2-9b-it
Paper: Self-Play Preference Optimization
AlpacaEval Win Rate: 53.27%

What is Gemma-2-9B-It-SPPO-Iter3?

Gemma-2-9B-It-SPPO-Iter3 is a language model developed by UCLA-AGI, the third iteration of Self-Play Preference Optimization (SPPO) applied to Google's gemma-2-9b-it. The iterative training yields clear gains in instruction following, culminating in a 53.27% win rate on the AlpacaEval benchmark.
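
A minimal inference sketch with Hugging Face transformers follows. The repo id UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3 is inferred from the maintainer and model name above, and BF16 loading matches the tensor type stated later in this card:

```python
# Minimal inference sketch; repo id assumed from maintainer + model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # card lists BF16 tensors
    device_map="auto",
)

# Gemma-2 instruction-tuned checkpoints ship a chat template; use it to
# format the prompt. Note that the Gemma template has no system role.
messages = [{"role": "user", "content": "Summarize self-play preference optimization in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```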

Implementation Details

The model was trained with a learning rate of 5e-07, the RMSprop optimizer, and a linear learning-rate schedule. Training ran under DeepSpeed ZeRO-3 across 8 devices, with batch size and gradient accumulation tuned for stable throughput (see the configuration sketch after the list below).

  • Training used the UltraFeedback dataset, with prompts split across the three SPPO iterations
  • Trained and stored in BF16 for efficient computation
  • Retains the base Gemma-2 architecture while enhancing instruction-following capabilities
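
As a rough illustration, the stated hyperparameters map onto a Hugging Face `TrainingArguments` configuration along these lines. The batch size, gradient-accumulation steps, epoch count, and DeepSpeed config path are placeholders, since the card does not publish them:

```python
# Hypothetical training configuration reflecting the hyperparameters listed
# above (lr 5e-07, RMSprop, linear schedule, BF16, DeepSpeed ZeRO-3).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma-2-9b-it-sppo-iter3",
    learning_rate=5e-7,                # per the model card
    optim="rmsprop",                   # RMSprop optimizer
    lr_scheduler_type="linear",        # linear LR schedule
    bf16=True,                         # BF16 training, matching the stated tensor type
    per_device_train_batch_size=1,     # placeholder; not published
    gradient_accumulation_steps=8,     # placeholder; not published
    deepspeed="ds_zero3_config.json",  # ZeRO-3 config path (placeholder)
    num_train_epochs=1,                # placeholder
)
```

Launching such a script via `accelerate launch` or the `deepspeed` CLI would distribute training across the 8 devices noted above.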

Core Capabilities

  • Strong instruction-following ability, improving progressively across iterations
  • Detailed response generation, with an average response length of 1803 tokens
  • Optimized for English-language tasks with a focus on natural conversation
  • AlpacaEval win rate improved from 48.70% (Iter1) to 53.27% (Iter3)

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its iterative SPPO training: each of the three iterations improved performance on the AlpacaEval benchmark, and the final iteration reaches a 53.27% win rate, a substantial gain over its predecessors.
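
At its core, each SPPO round reweights the policy toward responses with a high estimated probability of winning against the current policy. The toy sketch below runs that multiplicative-weights update over a small discrete response set; the pairwise preference matrix is fabricated for illustration, whereas real SPPO estimates these probabilities with a preference model over sampled LLM responses:

```python
# Toy illustration of the iterative SPPO update on a discrete response set.
# The preference matrix P is made up for demonstration purposes.
import numpy as np

# P[i, j] = probability that response i beats response j.
P = np.array([
    [0.5, 0.7, 0.8],
    [0.3, 0.5, 0.6],
    [0.2, 0.4, 0.5],
])

pi = np.full(3, 1 / 3)  # start from a uniform policy over 3 candidate responses
eta = 5.0               # step size (hyperparameter)

for t in range(3):  # three rounds, mirroring Iter1 -> Iter3
    # Win probability of each response against the current policy:
    # win_prob[i] = sum_j pi[j] * P[i, j]
    win_prob = P @ pi
    # Exponential-weights update: pi_{t+1}(y) is proportional to
    # pi_t(y) * exp(eta * win_prob(y)), then renormalized.
    pi = pi * np.exp(eta * win_prob)
    pi /= pi.sum()
    print(f"iteration {t + 1}: policy = {np.round(pi, 3)}")
```

Each round concentrates probability on responses that beat the current policy, which mirrors why the win rate climbs from iteration to iteration.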

Q: What are the recommended use cases?

The model is particularly well-suited for instruction-following tasks, conversational applications, and general text generation. Its optimization through SPPO makes it especially effective for tasks requiring nuanced understanding and coherent response generation in English.
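
For conversational use, here is a minimal multi-turn sketch with the transformers text-generation pipeline, again assuming the UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3 repo id and a transformers release recent enough to accept chat-style input:

```python
# Hypothetical multi-turn chat usage via the text-generation pipeline.
import torch
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3",  # assumed repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

history = [{"role": "user", "content": "Draft a polite email declining a meeting invitation."}]
result = chat(history, max_new_tokens=200)

# The pipeline appends the assistant turn to the conversation.
reply = result[0]["generated_text"][-1]
print(reply["content"])

# Continue the conversation by extending the history with the reply.
history.append(reply)
history.append({"role": "user", "content": "Make it shorter and more casual."})
print(chat(history, max_new_tokens=150)[0]["generated_text"][-1]["content"])
```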
