Baichuan-13B-Base

Maintained by: baichuan-inc

  • Parameter count: 13.2B
  • Context length: 4,096 tokens
  • Architecture: Transformer with ALiBi position encoding
  • Training data: 1.4T tokens
  • Languages: Chinese, English
  • License: Community License (commercial use requires approval)

What is Baichuan-13B-Base?

Baichuan-13B-Base is an open-source bilingual (Chinese and English) language model developed by Baichuan Intelligence (baichuan-inc). It contains 13 billion parameters and was trained on 1.4 trillion tokens, roughly 40% more training data than LLaMA-13B.

Implementation Details

The model is a decoder-only Transformer with 40 layers, 40 attention heads, and a hidden size of 5,120. It uses ALiBi position encoding instead of Rotary Position Embedding (RoPE), which the developers report yields a 31.6% increase in inference speed over LLaMA-13B.

  • Hidden layer dimension: 5,120
  • Number of layers: 40
  • Attention heads: 40
  • Vocabulary size: 64,000
  • Context window: 4,096 tokens
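
As a rough illustration of how these pieces come together, the sketch below loads the checkpoint with Hugging Face transformers and runs plain text continuation. It assumes the `baichuan-inc/Baichuan-13B-Base` repository id, that `trust_remote_code=True` is needed for the custom architecture, and a GPU with enough memory for fp16 weights (roughly 26 GB).

```python
# Minimal loading/generation sketch (assumed setup, not an official recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baichuan-inc/Baichuan-13B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # half precision for inference
    device_map="auto",           # spread layers across available GPUs
    trust_remote_code=True,      # the ALiBi-based architecture ships as custom code
)

# A base model does plain continuation, not chat-style dialogue.
inputs = tokenizer("登鹳雀楼->王之涣\n夜雨寄北->", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, repetition_penalty=1.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Because this is the base (non-chat) checkpoint, prompts should be written as text to be continued rather than as instructions.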

Core Capabilities

  • Superior performance on Chinese benchmarks (C-Eval: 52.4% average score)
  • Strong English language capabilities (MMLU: 51.6% average score)
  • Efficient inference with INT8 and INT4 quantization options (see the quantized-loading sketch after this list)
  • Optimized for both research and commercial applications
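
For memory-constrained deployment, one way to use the INT8 option is 8-bit loading through bitsandbytes in transformers, sketched below. This assumes bitsandbytes is installed and that the custom Baichuan modeling code is compatible with this loading path; the upstream repository may also ship its own quantization helpers.

```python
# Hedged sketch: 8-bit loading via bitsandbytes to roughly halve GPU memory.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "baichuan-inc/Baichuan-13B-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # INT8 weights
    # For INT4, use BitsAndBytesConfig(load_in_4bit=True) instead.
    device_map="auto",
    trust_remote_code=True,
)
```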

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its ALiBi position encoding (which speeds up inference), its large 1.4T-token training corpus, and strong results on both Chinese and English benchmarks. It is also commercially usable once a license is obtained from baichuan-inc. A brief illustration of the ALiBi idea follows.
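
To make the positioning scheme concrete, the snippet below sketches the core ALiBi idea in isolation: instead of rotating query/key embeddings, a fixed per-head linear penalty proportional to query-key distance is added to the attention logits. This is an illustrative reconstruction of the published ALiBi formulation, not Baichuan's actual implementation.

```python
# Illustrative ALiBi bias computation (assumes num_heads is a power of two).
import torch

def alibi_bias(num_heads: int, seq_len: int) -> torch.Tensor:
    # Geometric per-head slopes, as in the ALiBi paper: 2^(-8h/n) for head h = 1..n.
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / num_heads) for h in range(num_heads)])
    # rel[i, j] = j - i: zero on the diagonal, increasingly negative for older keys.
    pos = torch.arange(seq_len)
    rel = pos.view(1, -1) - pos.view(-1, 1)
    # bias[h, i, j] = slope_h * (j - i); add to attention logits before the causal
    # mask and softmax, so distant keys are penalized linearly.
    return slopes.view(-1, 1, 1) * rel

print(alibi_bias(num_heads=4, seq_len=5)[0])  # bias matrix for the first head
```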

Q: What are the recommended use cases?

The model is well suited to applications that need strong bilingual (Chinese/English) capabilities, to research, and to commercial use after obtaining the required license. As a base model, it is intended as a foundation for fine-tuning on specific downstream tasks.
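
As a rough starting point for that fine-tuning path, the sketch below attaches LoRA adapters with the peft library. The `target_modules` names (`W_pack`, `o_proj`) are an assumption about how the custom attention layers are named in this checkpoint and should be verified against the model before use.

```python
# Hedged LoRA fine-tuning setup sketch using peft (adapter training only).
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained(
    "baichuan-inc/Baichuan-13B-Base",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

lora_cfg = LoraConfig(
    r=8,                                   # adapter rank
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["W_pack", "o_proj"],   # assumed attention projection names
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
# Train with transformers.Trainer or a custom loop on your own bilingual data.
```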
