Gemma-2-9B-It-SPPO-Iter3

Maintained by: UCLA-AGI

Parameter Count: 9.24B
License: Gemma License
Base Model: google/gemma-2-9b-it
Paper: Self-Play Preference Optimization
AlpacaEval Win Rate: 53.27%

What is Gemma-2-9B-It-SPPO-Iter3?

Gemma-2-9B-It-SPPO-Iter3 is a language model developed by UCLA-AGI, the third iteration of Self-Play Preference Optimization (SPPO) applied to Google's gemma-2-9b-it. The iterative training yields clear gains in instruction following, culminating in a 53.27% win rate on the AlpacaEval benchmark.
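
A minimal inference sketch with Hugging Face transformers follows. The repo id UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3 is inferred from the maintainer and model name above, and BF16 loading matches the tensor type stated later in this card:

```python
# Minimal inference sketch; repo id assumed from maintainer + model name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # card lists BF16 tensors
    device_map="auto",
)

# Gemma-2 instruction-tuned checkpoints ship a chat template; use it to
# format the prompt. Note that the Gemma template has no system role.
messages = [{"role": "user", "content": "Summarize self-play preference optimization in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```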

Implementation Details

The model was trained with a learning rate of 5e-07, the RMSprop optimizer, and a linear learning-rate schedule. Training ran under DeepSpeed ZeRO-3 across 8 devices, with batch size and gradient accumulation tuned for stable throughput (see the configuration sketch after the list below).

  • Training used the UltraFeedback dataset, with prompts split across the three SPPO iterations
  • Trained and stored in BF16 for efficient computation
  • Retains the base Gemma-2 architecture while enhancing instruction-following capabilities
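
As a rough illustration, the stated hyperparameters map onto a Hugging Face `TrainingArguments` configuration along these lines. The batch size, gradient-accumulation steps, epoch count, and DeepSpeed config path are placeholders, since the card does not publish them:

```python
# Hypothetical training configuration reflecting the hyperparameters listed
# above (lr 5e-07, RMSprop, linear schedule, BF16, DeepSpeed ZeRO-3).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma-2-9b-it-sppo-iter3",
    learning_rate=5e-7,                # per the model card
    optim="rmsprop",                   # RMSprop optimizer
    lr_scheduler_type="linear",        # linear LR schedule
    bf16=True,                         # BF16 training, matching the stated tensor type
    per_device_train_batch_size=1,     # placeholder; not published
    gradient_accumulation_steps=8,     # placeholder; not published
    deepspeed="ds_zero3_config.json",  # ZeRO-3 config path (placeholder)
    num_train_epochs=1,                # placeholder
)
```

Launching such a script via `accelerate launch` or the `deepspeed` CLI would distribute training across the 8 devices noted above.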

Core Capabilities

  • Strong instruction-following ability, improving progressively across iterations
  • Detailed response generation, with an average response length of 1803 tokens
  • Optimized for English-language tasks with a focus on natural conversation
  • AlpacaEval win rate improved from 48.70% (Iter1) to 53.27% (Iter3)

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its iterative SPPO training: each of the three iterations improved performance on the AlpacaEval benchmark, and the final iteration reaches a 53.27% win rate, a substantial gain over its predecessors.
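
At its core, each SPPO round reweights the policy toward responses with a high estimated probability of winning against the current policy. The toy sketch below runs that multiplicative-weights update over a small discrete response set; the pairwise preference matrix is fabricated for illustration, whereas real SPPO estimates these probabilities with a preference model over sampled LLM responses:

```python
# Toy illustration of the iterative SPPO update on a discrete response set.
# The preference matrix P is made up for demonstration purposes.
import numpy as np

# P[i, j] = probability that response i beats response j.
P = np.array([
    [0.5, 0.7, 0.8],
    [0.3, 0.5, 0.6],
    [0.2, 0.4, 0.5],
])

pi = np.full(3, 1 / 3)  # start from a uniform policy over 3 candidate responses
eta = 5.0               # step size (hyperparameter)

for t in range(3):  # three rounds, mirroring Iter1 -> Iter3
    # Win probability of each response against the current policy:
    # win_prob[i] = sum_j pi[j] * P[i, j]
    win_prob = P @ pi
    # Exponential-weights update: pi_{t+1}(y) is proportional to
    # pi_t(y) * exp(eta * win_prob(y)), then renormalized.
    pi = pi * np.exp(eta * win_prob)
    pi /= pi.sum()
    print(f"iteration {t + 1}: policy = {np.round(pi, 3)}")
```

Each round concentrates probability on responses that beat the current policy, which mirrors why the win rate climbs from iteration to iteration.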

Q: What are the recommended use cases?

The model is particularly well-suited for instruction-following tasks, conversational applications, and general text generation. Its optimization through SPPO makes it especially effective for tasks requiring nuanced understanding and coherent response generation in English.
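
For conversational use, here is a minimal multi-turn sketch with the transformers text-generation pipeline, again assuming the UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3 repo id and a transformers release recent enough to accept chat-style input:

```python
# Hypothetical multi-turn chat usage via the text-generation pipeline.
import torch
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3",  # assumed repo id
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

history = [{"role": "user", "content": "Draft a polite email declining a meeting invitation."}]
result = chat(history, max_new_tokens=200)

# The pipeline appends the assistant turn to the conversation.
reply = result[0]["generated_text"][-1]
print(reply["content"])

# Continue the conversation by extending the history with the reply.
history.append(reply)
history.append({"role": "user", "content": "Make it shorter and more casual."})
print(chat(history, max_new_tokens=150)[0]["generated_text"][-1]["content"])
```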
