BgGPT-Gemma-2-27B-IT-v1.0

Maintained By
INSAIT-Institute

BgGPT-Gemma-2-27B-IT-v1.0

PropertyValue
Parameter Count27.2B
Model TypeCausal decoder-only transformer
LanguagesBulgarian, English
LicenseGemma Terms of Use
DeveloperINSAIT Institute

What is BgGPT-Gemma-2-27B-IT-v1.0?

BgGPT-Gemma-2-27B-IT-v1.0 is a state-of-the-art Bulgarian language model developed by INSAIT Institute, built upon Google's Gemma 2 27B architecture. The model was continuously pre-trained on approximately 100 billion tokens, with 85 billion in Bulgarian, using an innovative Branch-and-Merge strategy presented at EMNLP'24.

Implementation Details

The model leverages a sophisticated pre-training approach combining Bulgarian web crawl data, Wikipedia content, specialized Bulgarian datasets, and machine translations of popular English datasets. It supports both Bulgarian and English languages while maintaining exceptional performance in both.

  • Implements BF16 tensor type for efficient computation
  • Utilizes Gemma 2's architecture with specialized instruction tuning
  • Supports up to 2048 tokens for generation
  • Features optimized generation parameters including temperature=0.1 and top_k=25

Core Capabilities

  • Outperforms larger models like Qwen 2.5 72B and Llama3.1 70B in Bulgarian benchmarks
  • Excels in logical reasoning, mathematics, and knowledge-based tasks
  • Matches commercial models like Claude Sonnet and GPT-4o in Bulgarian chat performance
  • Maintains strong English language capabilities inherited from Gemma 2

Frequently Asked Questions

Q: What makes this model unique?

The model's Branch-and-Merge training strategy and extensive Bulgarian pre-training make it exceptionally powerful for Bulgarian language tasks while maintaining English capabilities. It achieves this with a relatively efficient 27B parameter count.

Q: What are the recommended use cases?

The model excels in Bulgarian and English text generation, chat applications, and complex reasoning tasks. It's particularly well-suited for educational applications, given its strong performance on academic benchmarks like MON exams and GSM-8k.

🍰 Interesting in building your own agents?
PromptLayer provides Huggingface integration tools to manage and monitor prompts with your whole team. Get started here.