GPT-NeoXT-Chat-Base-20B

Maintained By
togethercomputer

GPT-NeoXT-Chat-Base-20B

PropertyValue
LicenseApache 2.0
LanguageEnglish
Training Data43M high-quality instructions
Hardware Requirements48GB GPU (Full) / 24GB GPU (Int8)

What is GPT-NeoXT-Chat-Base-20B?

GPT-NeoXT-Chat-Base-20B is a sophisticated 20-billion parameter language model developed by Together Computer, built upon EleutherAI's GPT-NeoX architecture. The model has been extensively fine-tuned on over 40 million high-quality instructions, with a particular focus on dialog-style interactions. What sets this model apart is its training on 100% carbon-negative compute infrastructure and its additional fine-tuning using human feedback data.

Implementation Details

The model supports various deployment options, including full precision (requiring 48GB GPU memory), 8-bit quantization (24GB GPU memory), and CPU inference. It was trained using 2x8 A100 GPUs with 8bit-AdamW optimizer and specialized hyperparameters for optimal performance.

  • Batch size: 524,288 tokens
  • Learning rate: 1e-6 with 100-step warmup
  • Gradient accumulations: 2

Core Capabilities

  • Summarization and contextual question answering
  • Information extraction and structuring
  • Classification tasks
  • Few-shot learning applications
  • Dialog-style interactions

Frequently Asked Questions

Q: What makes this model unique?

The model combines large-scale parameters (20B) with extensive instruction tuning (43M examples) and human feedback fine-tuning, all while maintaining carbon neutrality. It's particularly notable for its strong performance in dialog tasks and structured information extraction.

Q: What are the recommended use cases?

The model excels in research applications, educational tools, and creative processes. It's particularly strong in summarization, question answering, classification, and information extraction tasks. However, it should not be used for safety-critical applications or decisions that significantly impact individuals or society.

The first platform built for prompt engineering