internlm-20b

Maintained By
internlm

InternLM-20B

Property         Value
Model Size       20B parameters
Training Data    2.3T tokens
Context Length   16k tokens
License          Apache-2.0
Architecture     60-layer Transformer

What is internlm-20b?

InternLM-20B is a large language model jointly developed by Shanghai AI Laboratory, SenseTime Technology, the Chinese University of Hong Kong (CUHK), and Fudan University. It is notable for its unusually deep architecture and comprehensive training approach, which together deliver capability beyond what its parameter count alone would suggest.

Implementation Details

The model employs a 60-layer architecture, significantly deeper than conventional 7B and 13B models. It is trained on over 2.3T tokens of high-quality multilingual data spanning English, Chinese, and code. The model supports bfloat16 precision and integrates with the Transformers library; a minimal loading sketch follows the feature list below.

  • Deeper architecture (60 layers) for enhanced capability
  • Comprehensive training on multilingual data
  • Supports 16k context length through inference extrapolation
  • Optimized for both base language modeling and chat applications
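As a rough illustration, here is a minimal loading sketch using the Transformers library in bfloat16, as the card describes. The Hugging Face model id "internlm/internlm-20b" and the need for trust_remote_code=True are assumptions based on the usual layout of InternLM repositories, not details stated in this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model id; the repository is expected to ship custom modeling
# code, hence trust_remote_code=True.
MODEL_ID = "internlm/internlm-20b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # bfloat16 precision, per the card
    device_map="auto",           # spread the 20B weights across available GPUs
    trust_remote_code=True,
)
```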

Core Capabilities

  • Strong performance across language, knowledge, and reasoning benchmarks
  • Solid examination-style benchmark results (62.05% on MMLU)
  • Tool invocation capabilities
  • Improved value alignment and safety over earlier InternLM releases
  • Strong code-generation results for its size (25.61% on HumanEval)

Frequently Asked Questions

Q: What makes this model unique?

InternLM-20B stands out for its deeper architecture and comprehensive training approach, achieving performance levels that rival models with significantly more parameters. It particularly excels in understanding, reasoning, and examination tasks, often outperforming models in the 13B-33B parameter range.

Q: What are the recommended use cases?

The model is well-suited to a wide range of applications, including text generation, programming assistance, reasoning tasks, and general language understanding. It is effective in both English and Chinese, making it a versatile choice for multilingual applications.
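For a concrete starting point, the hedged sketch below runs plain text generation with the base model, reusing the `model` and `tokenizer` from the loading sketch above; the prompt and sampling parameters are illustrative choices, not recommendations from the card.

```python
# Reuses `model` and `tokenizer` from the loading sketch above.
prompt = "Write a Python function that reverses a string:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=128,  # illustrative generation budget
    do_sample=True,
    temperature=0.8,     # illustrative sampling temperature
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```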
