RedPajama-INCITE-Base-3B-v1

Maintained By
togethercomputer

Property: Value
Parameters: 2.8B
License: Apache 2.0
Training Data: RedPajama-Data-1T
Language: English
Hardware Requirements: 8GB GPU memory (or CPU)

What is RedPajama-INCITE-Base-3B-v1?

RedPajama-INCITE-Base-3B-v1 is a 2.8B-parameter base language model developed by Together Computer in collaboration with other institutions in the open-source AI community. It was trained on the RedPajama-Data-1T dataset. Training used 3,072 V100 GPUs provided through the INCITE 2023 project.

Implementation Details

The model was trained on 256 nodes of 6xV100 GPUs on the OLCF Summit cluster, combining pipeline parallelism of degree 6 with tensor parallelism of degree 2. Training covered 800B tokens in total at a global batch size of 4M tokens, using the Apex FusedAdam optimizer with a learning rate of 0.00016.
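The figures above imply a few derived quantities, worked out in the short sketch below. Two things are assumptions rather than statements from the text: "4M" is read as exactly 4,000,000 tokens, and every GPU not used for pipeline or tensor parallelism is assumed to hold a data-parallel replica. Note that 256 nodes of 6 GPUs gives 1,536 GPUs, half of the 3,072 quoted in the overview; the text does not reconcile the two figures.

```python
# Figures quoted in the text.
NODES = 256
GPUS_PER_NODE = 6
PIPELINE_PARALLEL = 6
TENSOR_PARALLEL = 2
TOTAL_TOKENS = 800 * 10**9       # 800B tokens
GLOBAL_BATCH_TOKENS = 4 * 10**6  # assumption: "4M" read as exactly 4,000,000

# GPUs available on the nodes listed for this run.
total_gpus = NODES * GPUS_PER_NODE

# One model replica spans pipeline stages x tensor shards.
gpus_per_replica = PIPELINE_PARALLEL * TENSOR_PARALLEL

# Assumption: the remaining parallelism is plain data parallelism.
data_parallel_replicas = total_gpus // gpus_per_replica

# Optimizer steps needed to consume all training tokens.
optimizer_steps = TOTAL_TOKENS // GLOBAL_BATCH_TOKENS

print(total_gpus, gpus_per_replica, data_parallel_replicas, optimizer_steps)
# prints: 1536 12 128 200000
```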

  • Supports three inference modes: GPU (float16), GPU with Int8 quantization, and CPU (bfloat16)
  • Requires transformers version 4.25.1 or higher
  • Reduces memory use through lower-precision loading (float16, bfloat16, or Int8)
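The points above can be sketched as a small loader. This is a minimal sketch, not the maintainers' reference code: the hub id is assumed to be the maintainer name plus the model name, the mode names `gpu`, `int8`, and `cpu` are hypothetical labels, and the Int8 path assumes the `accelerate` and `bitsandbytes` packages are installed. The version check compares numeric tuples, because a plain string comparison would rank "4.9.0" above "4.25.1".

```python
MODEL_NAME = "togethercomputer/RedPajama-INCITE-Base-3B-v1"  # assumed hub id
MIN_TRANSFORMERS_VERSION = "4.25.1"


def version_tuple(version: str) -> tuple:
    """Turn '4.25.1' (or '4.31.0.dev0') into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break  # stop at suffixes such as 'dev0'
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_TRANSFORMERS_VERSION) -> bool:
    """Compare versions numerically, so that '4.9.0' counts as older than '4.25.1'."""
    return version_tuple(installed) >= version_tuple(minimum)


def load_model(mode: str = "gpu"):
    """Load tokenizer and model for 'gpu', 'int8', or 'cpu' (hypothetical mode names)."""
    # Imported lazily so the helpers above work without torch installed.
    import torch
    import transformers
    from transformers import AutoModelForCausalLM, AutoTokenizer

    if not meets_minimum(transformers.__version__):
        raise RuntimeError(f"transformers >= {MIN_TRANSFORMERS_VERSION} is required")

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    if mode == "gpu":
        # Half precision on a single GPU.
        model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16)
        model = model.to("cuda:0")
    elif mode == "int8":
        # Assumes accelerate and bitsandbytes are installed; newer transformers
        # releases may expect a BitsAndBytesConfig instead of this flag.
        model = AutoModelForCausalLM.from_pretrained(
            MODEL_NAME, device_map="auto", torch_dtype=torch.float16, load_in_8bit=True
        )
    elif mode == "cpu":
        # bfloat16 weights kept on the CPU.
        model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return tokenizer, model
```

Calling `load_model("cpu")` downloads the weights on first use, so the first call can take a while.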

Core Capabilities

  • General text generation and completion
  • Efficient inference on both GPU and CPU
  • Support for various inference optimization techniques
  • Customizable generation parameters (temperature, top-p, top-k)
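To make the three generation parameters concrete, here is a simplified pure-Python illustration of how temperature, top-k, and top-p reshape a next-token distribution. It is a teaching sketch over a toy list of logits, not the transformers implementation; the function names and the default values (0.7, 50, 0.7) are illustrative assumptions.

```python
import math
import random


def filtered_distribution(logits, temperature=0.7, top_k=50, top_p=0.7):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    # Temperature: values below 1 sharpen the distribution, values above 1 flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # subtract the max for stability

    # Top-k: keep only the k highest-weight tokens, most likely first.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:top_k]
    total = sum(weights[i] for i in ranked)

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += weights[i] / total
        if cumulative >= top_p:
            break

    # Renormalize so the surviving probabilities sum to 1.
    norm = sum(weights[i] for i in kept)
    return {i: weights[i] / norm for i in kept}


def sample(logits, rng=None, **params):
    """Draw one token index from the filtered distribution."""
    rng = rng or random.Random()
    dist = filtered_distribution(logits, **params)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

With the transformers library, the same controls are passed to `generate` as `temperature`, `top_p`, and `top_k`, together with `do_sample=True`.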

Frequently Asked Questions

Q: What makes this model unique?

At 2.8B parameters, the model is small enough to run on a GPU with 8GB of memory or on a CPU, and it is released under the Apache 2.0 license. It is part of the larger RedPajama ecosystem, which also includes versions specialized for instruction-tuning and chat applications.

Q: What are the recommended use cases?

The model is best suited to general language modeling tasks such as text generation and completion. It should not be used to generate harmful content or misinformation, or for any other malicious purpose, as set out in the model's usage guidelines.
