RedPajama-INCITE-Base-3B-v1

Property	Value
Parameters	2.8B
License	Apache 2.0
Training Data	RedPajama-Data-1T
Language	English
Hardware Requirements	8GB GPU memory (or CPU)

What is RedPajama-INCITE-Base-3B-v1?

RedPajama-INCITE-Base-3B-v1 is a 2.8B-parameter base language model developed by Together Computer in collaboration with institutions from the AI community. It was trained on the RedPajama-Data-1T dataset using 3,072 V100 GPUs provided through the INCITE 2023 project, and it is released as open source under the Apache 2.0 license.

Implementation Details

The model was trained on 256 nodes of 6xV100 GPUs on the OLCF Summit cluster, using pipeline parallelism of degree 6 and tensor parallelism of degree 2. Training used a global batch size of 4M tokens and 800B training tokens in total, with the Apex FusedAdam optimizer and a learning rate of 0.00016.
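The figures above imply a few derived quantities that are easy to check by hand. The sketch below works them out in Python under stated assumptions: it takes the 256 x 6 node figure at face value (1,536 GPUs, which is half the 3,072 quoted for the INCITE project; the source does not explain the difference), reads "4M" and "800B" as decimal counts, and assumes every GPU not used for pipeline or tensor parallelism serves data parallelism. None of the derived numbers are stated in the source.

```python
# Figures quoted in the text above.
NODES = 256
GPUS_PER_NODE = 6
PIPELINE_PARALLEL = 6
TENSOR_PARALLEL = 2
GLOBAL_BATCH_TOKENS = 4_000_000      # "4M" read as 4 x 10^6 (assumption)
TOTAL_TOKENS = 800_000_000_000       # "800B" read as 800 x 10^9 (assumption)

# GPUs implied by the node layout.
total_gpus = NODES * GPUS_PER_NODE

# One model replica spans pipeline stages x tensor shards.
gpus_per_replica = PIPELINE_PARALLEL * TENSOR_PARALLEL

# Assumption: the remaining dimension is plain data parallelism.
data_parallel = total_gpus // gpus_per_replica

# Optimizer steps needed to consume the full token budget.
steps = TOTAL_TOKENS // GLOBAL_BATCH_TOKENS

print(f"GPUs: {total_gpus}, GPUs per replica: {gpus_per_replica}")
print(f"Data-parallel replicas: {data_parallel}, optimizer steps: {steps}")
```

Under these assumptions the run amounts to roughly 200,000 optimizer steps across 128 model replicas.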

Supports multiple inference modes: GPU, CPU, and Int8 quantization
Requires transformers version 4.25.1 or higher
Implements efficient memory management techniques
Offers flexible deployment options with different precision levels
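As a rough illustration of the inference modes listed above, the sketch below maps each mode to loading settings for the transformers library and estimates the memory the weights alone would need. The Hub ID `togethercomputer/RedPajama-INCITE-Base-3B-v1`, the dtype chosen for each mode, and the helper names (`MODES`, `weight_memory_gb`, `load`) are assumptions for illustration and are not stated in the text. The Int8 path additionally needs the bitsandbytes and accelerate packages, and newer transformers releases expect the flag to be passed through a `BitsAndBytesConfig` instead.

```python
# Assumed Hugging Face Hub ID; the text above does not state it.
MODEL_ID = "togethercomputer/RedPajama-INCITE-Base-3B-v1"
N_PARAMS = 2.8e9  # parameter count from the property table

# Hypothetical per-mode settings: dtype name, extra from_pretrained kwargs,
# and bytes stored per weight.
MODES = {
    "gpu": {"dtype": "float16", "kwargs": {}, "bytes_per_param": 2},
    "int8": {
        "dtype": "float16",
        "kwargs": {"device_map": "auto", "load_in_8bit": True},
        "bytes_per_param": 1,
    },
    "cpu": {"dtype": "bfloat16", "kwargs": {}, "bytes_per_param": 2},
}


def weight_memory_gb(mode):
    """Rough size of the weights alone in GB (10^9 bytes).

    Activations and the key/value cache come on top of this.
    """
    return N_PARAMS * MODES[mode]["bytes_per_param"] / 1e9


def load(mode):
    """Load tokenizer and model for one mode (downloads several GB of weights)."""
    # Imported lazily so the estimate above works without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    cfg = MODES[mode]
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=getattr(torch, cfg["dtype"]), **cfg["kwargs"]
    )
    if mode == "gpu":
        model = model.to("cuda:0")  # assumes a single CUDA device
    return tokenizer, model


for name in MODES:
    print(f"{name}: about {weight_memory_gb(name):.1f} GB of weights")
```

The estimate suggests why an 8GB GPU is enough for 16-bit inference: the weights take about 5.6 GB, which leaves some headroom for activations.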

Core Capabilities

General text generation and completion
Efficient inference on both GPU and CPU
Support for various inference optimization techniques
Customizable generation parameters (temperature, top-p, top-k)
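To make the three generation parameters concrete, here is a minimal pure-Python sketch of what temperature, top-k, and top-p do to a next-token distribution. It is a simplified illustration, not the implementation used by transformers, and the function name `sampling_distribution` is hypothetical.

```python
import math


def sampling_distribution(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return {token_index: probability} after temperature, top-k, and top-p."""
    # Temperature: divide logits before softmax; <1 sharpens, >1 flattens.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        order = order[:top_k]

    # Top-p: keep the shortest prefix whose renormalized cumulative
    # probability reaches top_p.
    mass = sum(probs[i] for i in order)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / mass
        if cumulative >= top_p:
            break

    # Renormalize over the surviving tokens.
    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


example_logits = [2.0, 1.0, 0.0, -1.0]
print(sampling_distribution(example_logits, temperature=0.7, top_k=3, top_p=0.9))
```

In transformers, the corresponding controls are the `temperature`, `top_k`, and `top_p` arguments of `generate`, which take effect when `do_sample=True`.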

Frequently Asked Questions

Q: What makes this model unique?

The model is small enough to run on a single 8GB GPU or on a CPU, and it supports Int8 quantization, which makes it usable across a range of computing environments. It is also part of the larger RedPajama ecosystem, which includes instruction-tuned and chat versions of the model.

Q: What are the recommended use cases?

The model is best suited for general language modeling tasks, including text generation and completion. It should not be used to generate harmful content or misinformation, or for any other malicious purpose, as outlined in the model's usage guidelines.