span-marker-roberta-large-ontonotes5

Property	Value
Parameter Count	355M
License	Apache 2.0
F1 Score	91.53%
Author	tomaarsen

What is span-marker-roberta-large-ontonotes5?

This is a Named Entity Recognition (NER) model that uses the SpanMarker architecture on top of a RoBERTa-large encoder. It is trained on the OntoNotes v5.0 dataset, where it reports a 91.53% F1 score, 91.16% precision, and 91.91% recall.
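As a quick consistency check, the three reported figures can be related through the standard F1 formula. The sketch below assumes the reported F1 is the harmonic mean of the reported precision and recall, which the source does not state explicitly. The function name f1_score is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in the same units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported for this model, in percent.
reported_precision = 91.16
reported_recall = 91.91
reported_f1 = 91.53

computed_f1 = f1_score(reported_precision, reported_recall)
print(f"computed F1: {computed_f1:.2f}%")
```

Under that assumption, the computed value agrees with the reported F1 to two decimal places.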

Implementation Details

The model combines the SpanMarker framework with RoBERTa-large as its underlying encoder. It is implemented in PyTorch and runs inference through the span_marker library.

Built on RoBERTa-large architecture
Trained on OntoNotes v5.0 dataset
Implements token classification for NER tasks
Distributes weights in the Safetensors format for safe, fast loading
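A minimal sketch of inference through the span_marker library might look like the following. The repository id is an assumption inferred from the author and model name above, and the entity dictionary keys ("span", "label", "score") follow the output format documented for span_marker, so verify both against your installed version. The helper names load_model, format_entities, and demo are hypothetical.

```python
def load_model(model_id: str = "tomaarsen/span-marker-roberta-large-ontonotes5"):
    """Load the model from the Hugging Face Hub (model_id is an assumed repository id)."""
    # Imported lazily so the helpers below work without span_marker installed.
    from span_marker import SpanMarkerModel

    return SpanMarkerModel.from_pretrained(model_id)


def format_entities(entities):
    """Render predicted entities as 'LABEL: span (score)' strings."""
    return [f"{e['label']}: {e['span']} ({e['score']:.2f})" for e in entities]


def demo():
    """Download the model and tag one sentence (needs network access and span_marker)."""
    model = load_model()
    # Punctuation is separated from the preceding word by a space.
    entities = model.predict("Amelia Earhart flew across the Atlantic to Paris .")
    for line in format_entities(entities):
        print(line)
```

Calling demo() downloads the weights on first use, so expect a delay and a sizeable download for a 355M-parameter model.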

Core Capabilities

High-accuracy named entity recognition
Efficient token classification
Recognizes the entity types annotated in OntoNotes v5.0, including multi-word entities
Optimized for English language processing
Supports batch processing and inference endpoints
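Batch processing could be sketched as follows. This assumes that the model's predict method accepts a list of strings together with a batch_size argument and returns one list of entity dictionaries per input sentence, as documented for span_marker. The helper names predict_batch and group_spans_by_label are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def predict_batch(model, sentences: List[str], batch_size: int = 8):
    """Run NER over many sentences; returns one list of entity dicts per sentence."""
    # Assumption: model.predict takes a list of strings and a batch_size keyword.
    return model.predict(sentences, batch_size=batch_size)


def group_spans_by_label(batch_predictions) -> Dict[str, List[str]]:
    """Collect predicted spans across all sentences, keyed by entity label."""
    grouped = defaultdict(list)
    for sentence_entities in batch_predictions:
        for entity in sentence_entities:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)
```

Larger batch sizes generally trade memory for throughput, so the value of 8 here is only a placeholder to tune for the available hardware.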

Frequently Asked Questions

Q: What makes this model unique?

This model pairs the RoBERTa-large encoder with SpanMarker's span-based approach to NER and reports strong results on the OntoNotes v5.0 dataset. Its 91.53% F1 score, with precision (91.16%) and recall (91.91%) less than one point apart, indicates that it neither misses nor over-predicts entities disproportionately, which is a useful property in production environments.

Q: What are the recommended use cases?

The model identifies named entities in English text, which suits applications such as information extraction, document analysis, and automated content tagging. Note that it performs best when the input is preprocessed so that punctuation is separated from the preceding word, for example "Paris ." rather than "Paris.".
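One naive way to apply the punctuation separation mentioned above is a regular expression that inserts a space before punctuation attached to a word. This is only a sketch and not the preprocessing used by the model's author. The function name separate_punctuation is hypothetical, and the rule also splits abbreviations and initials (for example "J." becomes "J ."), which may or may not be desirable.

```python
import re

# Punctuation directly after a word character and followed by whitespace or end of text.
_TRAILING_PUNCT = re.compile(r"(?<=\w)([,.;:!?])(?=\s|$)")


def separate_punctuation(text: str) -> str:
    """Insert a space before punctuation that is attached to the preceding word."""
    return _TRAILING_PUNCT.sub(r" \1", text)


print(separate_punctuation("She moved to Paris, then to Berlin."))
```

Decimal numbers such as 3.14 are left intact because the period is not followed by whitespace, and text that is already separated passes through unchanged.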