Yarn-Mistral-7b-128k

Maintained by: NousResearch

Property         Value
Base Model       Mistral-7B-v0.1
Context Window   128,000 tokens
License          Apache 2.0
Paper            arXiv:2309.00071

What is Yarn-Mistral-7b-128k?

Yarn-Mistral-7b-128k is a long-context language model that extends Mistral-7B to handle significantly longer inputs. It was further pretrained on long-context data for 1,500 steps using the YaRN (Yet another RoPE extensioN) context-extension method, enabling it to process up to 128,000 tokens while maintaining strong performance.

Implementation Details

The model has specific implementation requirements: Flash Attention 2, bfloat16 precision, loading with trust_remote_code=True, and a recent version of the transformers library. A loading sketch follows the list below.

  • Supports 128k token context window
  • Built on Mistral-7B architecture
  • Utilizes Flash Attention 2 technology
  • Requires latest transformers library
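
As a rough illustration, the snippet below shows one way to load the model with these settings. The repository id, dtype, and attention arguments are assumptions based on the notes above (and require recent transformers, flash-attn, and accelerate installs), not an official recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id assumed from the model name above
model_id = "NousResearch/Yarn-Mistral-7b-128k"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # bfloat16 precision, as noted above
    attn_implementation="flash_attention_2",  # needs the flash-attn package installed
    trust_remote_code=True,                   # the repo ships custom YaRN modeling code
    device_map="auto",                        # requires the accelerate package
)
```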

Core Capabilities

  • Exceptional long-context performance, with a perplexity (PPL) of 2.19 at 128k context (a minimal measurement sketch follows this list)
  • Maintains strong performance on standard benchmarks (ARC-c: 58.87, Hellaswag: 80.58)
  • Minimal degradation in short-context tasks compared to base Mistral-7B
  • Optimized for both long and short-context applications
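
The perplexity figure above can be approximated with a single long-context forward pass. The helper below is a minimal sketch, assuming the model and tokenizer loaded earlier and enough GPU memory for the chosen length; it is not the exact evaluation harness behind the reported numbers.

```python
import torch

@torch.no_grad()
def perplexity(model, tokenizer, text, max_tokens=128_000):
    # Tokenize the document and truncate to the model's context window
    ids = tokenizer(text, return_tensors="pt").input_ids[:, :max_tokens].to(model.device)
    # With labels == input ids, the returned loss is the mean next-token negative log-likelihood
    loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()
```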

Frequently Asked Questions

Q: What makes this model unique?

This model stands out for its ability to handle extremely long contexts (128k tokens) while maintaining performance comparable to the original Mistral-7B. It achieves low perplexity across a range of context lengths and shows minimal degradation on standard benchmark tasks.

Q: What are the recommended use cases?

The model is particularly well-suited for applications requiring long-context understanding, such as document analysis, extended conversations, and complex text processing tasks. It maintains strong performance in both long and short-context scenarios, making it versatile for various applications.
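
As a hypothetical usage example, the snippet below feeds a long document to the model and asks for a summary. Here long_document is a placeholder for your own text, the model and tokenizer come from the loading sketch above, and the completion-style prompt and generation settings are illustrative rather than recommended defaults (the model is a further-pretrained base model, not an instruction-tuned one).

```python
# `long_document` is a placeholder for your own text (up to roughly 128k tokens)
prompt = long_document + "\n\nSummary of the document above:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens
print(tokenizer.decode(output[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```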
