span-marker-roberta-large-ontonotes5
Property | Value |
---|---|
Parameter Count | 355M |
License | Apache 2.0 |
F1 Score | 91.53% |
Author | tomaarsen |
What is span-marker-roberta-large-ontonotes5?
This is a Named Entity Recognition (NER) model that uses the SpanMarker architecture with RoBERTa-large as its encoder. It was trained on the OntoNotes v5.0 dataset and reports 91.53% F1, 91.16% precision, and 91.91% recall.
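As a quick consistency check, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). Plugging in the reported precision and recall reproduces the reported F1:

```python
# Check that the reported F1 equals the harmonic mean of the reported precision and recall.
precision = 0.9116
recall = 0.9191


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


f1 = f1_score(precision, recall)
print(f"F1 = {f1:.2%}")  # F1 = 91.53%
```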
Implementation Details
The model combines the SpanMarker framework with RoBERTa-large as the underlying encoder. It is implemented in PyTorch, and inference runs through the span_marker Python library.
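A minimal inference sketch, assuming the model is published on the Hugging Face Hub as `tomaarsen/span-marker-roberta-large-ontonotes5` (built from the model name and author above) and that `SpanMarkerModel.predict` returns a list of dicts with `span`, `label`, and `score` keys, as shown in the span_marker README. `load_and_predict` and `format_entities` are hypothetical helper names, not part of the library.

```python
from typing import Any, Dict, List

# Assumed Hub ID, built from the model name and author listed above.
MODEL_ID = "tomaarsen/span-marker-roberta-large-ontonotes5"


def load_and_predict(text: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: load the model and return its entity dicts for one sentence."""
    # Imported lazily so format_entities works even without span_marker installed.
    from span_marker import SpanMarkerModel

    # Downloads the weights from the Hugging Face Hub on first use.
    model = SpanMarkerModel.from_pretrained(MODEL_ID)
    return model.predict(text)


def format_entities(entities: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: render each entity dict as 'span (LABEL, score)'."""
    return [f"{e['span']} ({e['label']}, {e['score']:.2f})" for e in entities]
```

For example, `format_entities(load_and_predict("Amelia Earhart flew her single engine Lockheed Vega 5B across the Atlantic to Paris ."))` would list each detected span with its OntoNotes label. When processing many texts, loading the model once and reusing it avoids repeated loading.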
- Built on RoBERTa-large architecture
- Trained on OntoNotes v5.0 dataset
- Listed as a token-classification model, although SpanMarker labels candidate spans rather than individual tokens
- Weights stored in the Safetensors format for safe, fast loading
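SpanMarker works on spans rather than single tokens: it considers candidate spans up to a configurable maximum length and classifies each one as an entity type or as a non-entity. The sketch below shows only the enumeration step, under the simplifying assumption of word-level spans; `candidate_spans` and `max_span_len` are hypothetical names, and the marker-token encoding and classifier are not shown.

```python
from typing import List, Tuple


def candidate_spans(words: List[str], max_span_len: int) -> List[Tuple[int, int]]:
    """Return every (start, end) word span with 1 <= end - start <= max_span_len (end exclusive)."""
    spans: List[Tuple[int, int]] = []
    for start in range(len(words)):
        # Stop at the span-length limit or the end of the sentence, whichever comes first.
        for end in range(start + 1, min(start + max_span_len, len(words)) + 1):
            spans.append((start, end))
    return spans


words = "Lockheed Vega 5B flew to Paris .".split()
for start, end in candidate_spans(words, max_span_len=2):
    # In a span-based model, each candidate would then be scored against the entity labels.
    print(start, end, " ".join(words[start:end]))
```

With n words and a limit of k, this yields roughly n * k candidates, which is why capping the span length keeps the cost manageable.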
Core Capabilities
- High-accuracy named entity recognition (91.53% F1 on OntoNotes v5.0)
- Span-level entity classification
- Detects entities that span multiple words
- Trained for English text
- Supports batch processing and inference endpoints
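For batch processing, `SpanMarkerModel.predict` can also be given a list of sentences, in which case it returns one list of entity dicts per sentence. The helper below is a hypothetical post-processing step, not part of span_marker: it assumes that output shape and the `span`, `label`, and `score` keys, and groups confident entities by label. The 0.5 threshold is an arbitrary illustration.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_label(
    batch_entities: List[List[Dict[str, Any]]], min_score: float = 0.5
) -> Dict[str, List[str]]:
    """Collect entity spans from a batch of predictions, keyed by label, dropping low scores."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for sentence_entities in batch_entities:
        for entity in sentence_entities:
            # Keep only predictions at or above the confidence threshold.
            if entity["score"] >= min_score:
                grouped[entity["label"]].append(entity["span"])
    return dict(grouped)
```

For example, `group_by_label(model.predict(sentences))` would map each label to the spans found across the whole batch.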
Frequently Asked Questions
Q: What makes this model unique?
This model pairs the RoBERTa-large encoder with SpanMarker's span-based approach to NER and reaches 91.53% F1 on OntoNotes v5.0. Because precision (91.16%) and recall (91.91%) are close, it neither over-predicts nor misses entities disproportionately, which makes it a solid candidate for production use.
Q: What are the recommended use cases?
The model is well suited to identifying named entities in English text for applications such as information extraction, document analysis, and automated content tagging. It performs best when punctuation is separated from the preceding word (for example, "Paris ." rather than "Paris."), so input text should be preprocessed accordingly.
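The punctuation advice can be applied with a small preprocessing step before calling the model. The sketch below is a hypothetical regex-based helper (not part of span_marker) that inserts a space between a word and trailing punctuation; a full tokenizer such as spaCy's would handle abbreviations, quotes, and URLs more robustly.

```python
import re

# Hypothetical helper: a word character followed by trailing punctuation that is itself
# followed by whitespace or the end of the text. Decimals such as "3.5" are left alone.
_TRAILING_PUNCT = re.compile(r"(\w)([.,!?;:])(?=\s|$)")


def separate_punctuation(text: str) -> str:
    """Insert a space before trailing punctuation, e.g. 'Paris.' -> 'Paris .'."""
    return _TRAILING_PUNCT.sub(r"\1 \2", text)


print(separate_punctuation("Amelia Earhart flew to Paris."))
```

Abbreviations are a known weak spot of this approach: "U.S." becomes "U.S ." because only the final period is treated as trailing punctuation.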