nomic-bert-2048
| Property | Value |
|---|---|
| Parameter Count | 137M |
| License | Apache 2.0 |
| Tensor Type | F32 |
| Max Sequence Length | 2048 tokens |
| Training Data | Wikipedia, BookCorpus |
What is nomic-bert-2048?
nomic-bert-2048 is a BERT-style encoder designed to handle sequences of up to 2048 tokens, four times the 512-token context window of the original BERT. The model incorporates modern architectural improvements while remaining competitive on standard benchmarks such as GLUE.
Implementation Details
The model implements several key architectural innovations from recent research, including Rotary Position Embeddings (RoPE) for better handling of long contexts and SwiGLU activations for improved performance. It is trained with zero dropout and achieves results comparable to RoBERTa-base while supporting 4x longer sequences. A minimal loading sketch follows the list below.
- Rotary Position Embeddings for context length extrapolation
- SwiGLU activations for enhanced model performance
- Dropout disabled (rate 0) throughout training
- Trained on Wikipedia and BookCorpus with 2048-token sequences
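The following is a minimal loading sketch, assuming the Hugging Face repo id nomic-ai/nomic-bert-2048 and that tokenizer files ship with the repo (otherwise the stock bert-base-uncased WordPiece tokenizer can be substituted). Because the architecture is a custom BERT variant, the weights are loaded with trust_remote_code=True.

```python
# Minimal loading sketch; assumes the repo id "nomic-ai/nomic-bert-2048" and
# that the custom RoPE/SwiGLU architecture is distributed as remote code.
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-bert-2048")
model = AutoModelForMaskedLM.from_pretrained(
    "nomic-ai/nomic-bert-2048",
    trust_remote_code=True,  # pulls the custom model class instead of stock BertModel
)
model.eval()
```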
Core Capabilities
- Masked Language Modeling with extended context (see the sketch after this list)
- Sequence Classification tasks
- Strong performance on the GLUE benchmark (0.84 average score)
- Efficient handling of long-form text up to 2048 tokens
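To illustrate masked language modeling with this checkpoint, here is a minimal inference sketch. It assumes the same repo id as above and that the remote code registers a masked-LM head usable through AutoModelForMaskedLM, as in the standard transformers workflow.

```python
# Masked-LM inference sketch (assumes repo id "nomic-ai/nomic-bert-2048").
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-bert-2048")
model = AutoModelForMaskedLM.from_pretrained(
    "nomic-ai/nomic-bert-2048", trust_remote_code=True
)
model.eval()

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Pick the highest-scoring token at the [MASK] position.
mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_pos].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```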
Frequently Asked Questions
Q: What makes this model unique?
The model's ability to process 2048-token sequences while remaining competitive on standard benchmarks sets it apart from traditional BERT models, which are limited to 512 tokens.
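For a rough sense of what the larger window buys, the snippet below (again assuming the repo id and tokenizer above) counts WordPiece tokens in a long document and compares the count against the 2048-token and 512-token limits.

```python
# Rough illustration of the context-window difference; no truncation applied on purpose.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-bert-2048")

long_document = " ".join(["paragraph of text"] * 600)  # stand-in for a real document
n_tokens = len(tokenizer(long_document)["input_ids"])

print(f"{n_tokens} tokens")
print("fits in nomic-bert-2048:", n_tokens <= 2048)
print("fits in classic BERT:   ", n_tokens <= 512)
```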
Q: What are the recommended use cases?
The model is particularly well suited to tasks that require longer context understanding, such as document-level analysis, long-form text processing, document classification, and long-sequence masked language modeling.
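To make the document-classification use case concrete, here is a minimal sketch rather than an official recipe: it assumes the encoder can be loaded through AutoModel with trust_remote_code=True and exposes a standard last_hidden_state output, and it adds a freshly initialized linear head over mean-pooled token embeddings. The fine-tuning loop itself is omitted.

```python
# Document-classification sketch: pretrained long-context encoder + fresh linear head.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class LongDocClassifier(nn.Module):
    def __init__(self, num_labels: int = 2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(
            "nomic-ai/nomic-bert-2048", trust_remote_code=True
        )
        cfg = self.encoder.config
        # Config naming differs between BERT-style (hidden_size) and GPT-style (n_embd).
        hidden_size = getattr(cfg, "hidden_size", None) or cfg.n_embd
        self.head = nn.Linear(hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # Mean-pool over real (non-padding) tokens before the classification head.
        mask = attention_mask.unsqueeze(-1).float()
        pooled = (hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
        return self.head(pooled)

tokenizer = AutoTokenizer.from_pretrained("nomic-ai/nomic-bert-2048")
model = LongDocClassifier(num_labels=2)

batch = tokenizer(
    ["a long report ...", "another document ..."],
    padding=True, truncation=True, max_length=2048, return_tensors="pt",
)
logits = model(batch["input_ids"], batch["attention_mask"])
print(logits.shape)  # (2, num_labels)
```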