# span-marker-mbert-base-multinerd
| Property | Value |
|---|---|
| Parameter Count | 178M |
| License | CC-BY-NC-SA 4.0 |
| F1 Score | 92.48% |
| Languages | Multilingual |
## What is span-marker-mbert-base-multinerd?
This is a multilingual Named Entity Recognition (NER) model that uses the SpanMarker architecture with bert-base-multilingual-cased as its encoder. The model identifies 15 entity types across multiple languages and achieves an overall F1 score of 92.48%. Its entity coverage is notably broad, ranging from person names and organizations to specialized categories such as celestial bodies and mythological entities.
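A minimal inference sketch using the SpanMarker library is shown below. The Hugging Face model ID and the exact fields of the prediction dicts are assumptions based on SpanMarker's usual `predict` output; verify them against the model's own documentation:

```python
# Inference sketch (assumes `pip install span-marker`). The model ID and the
# prediction dict fields ("span", "label") are assumptions, not guarantees.
def format_entities(predictions):
    """Render SpanMarker-style prediction dicts as 'span (LABEL)' strings."""
    return ["{} ({})".format(p["span"], p["label"]) for p in predictions]

if __name__ == "__main__":
    from span_marker import SpanMarkerModel  # downloads weights on first load

    model = SpanMarkerModel.from_pretrained(
        "tomaarsen/span-marker-mbert-base-multinerd"  # assumed model ID
    )
    entities = model.predict(
        "Amelia Earhart flew her Lockheed Vega 5B across the Atlantic."
    )
    print(format_entities(entities))
```

Each prediction is a dict covering one detected span, so post-processing (filtering by label, grouping, linking) reduces to ordinary list operations.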
## Implementation Details
The model takes a span-based approach to entity recognition and was trained on the MultiNERD dataset using the SpanMarker framework: a single epoch with a learning rate of 5e-05, a batch size of 32, and linear learning-rate scheduling with a 0.1 warmup ratio.
- Architecture: BERT-base-multilingual-cased encoder
- Training Framework: SpanMarker 1.2.4 with PyTorch 1.13.1
- Optimization: Adam optimizer with linear warmup scheduling
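The linear schedule with a 0.1 warmup ratio can be sketched as a learning-rate multiplier. This is a simplified re-implementation for illustration, not the framework's own scheduler:

```python
def lr_factor(step, total_steps, warmup_ratio=0.1):
    """Multiplier applied to the base LR (here 5e-05): linear warmup over the
    first warmup_ratio of steps, then linear decay to zero."""
    warmup_steps = max(1, int(total_steps * warmup_ratio))
    if step < warmup_steps:
        return step / warmup_steps  # ramp from 0 up to 1
    # decay from 1 back down to 0 over the remaining steps
    return max(0.0, (total_steps - step) / (total_steps - warmup_steps))
```

The peak learning rate of 5e-05 is reached after the first 10% of steps, then decays linearly to zero by the end of the single training epoch.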
## Core Capabilities
- Supports 15 entity types: PER, ORG, LOC, ANIM, BIO, CEL, DIS, EVE, FOOD, INST, MEDIA, PLANT, MYTH, TIME, and VEHI
- Multilingual support with strong performance across 10 languages
- High precision (93.39%) and recall (91.59%) metrics
- Optimized for cased text processing
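The 15 abbreviations above follow the MultiNERD label set. A convenience mapping to readable names (based on that dataset's label definitions, so treat it as an assumption rather than an export from the model itself) might look like:

```python
# MultiNERD entity tags expanded to human-readable names (assumed from the
# dataset's label definitions; confirm against the MultiNERD documentation).
MULTINERD_LABELS = {
    "PER": "Person",
    "ORG": "Organization",
    "LOC": "Location",
    "ANIM": "Animal",
    "BIO": "Biological entity",
    "CEL": "Celestial body",
    "DIS": "Disease",
    "EVE": "Event",
    "FOOD": "Food",
    "INST": "Instrument",
    "MEDIA": "Media",
    "PLANT": "Plant",
    "MYTH": "Mythological entity",
    "TIME": "Time",
    "VEHI": "Vehicle",
}

def expand_label(tag):
    """Map a predicted tag like 'CEL' to a readable name; pass unknowns through."""
    return MULTINERD_LABELS.get(tag, tag)
```

This is handy when rendering predictions for end users, where "Celestial body" reads better than "CEL".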
## Frequently Asked Questions
**Q: What makes this model unique?**
This model stands out for its comprehensive multilingual capabilities and extensive entity type coverage. Unlike many NER models that focus on traditional entity types, it can recognize specialized categories like celestial bodies and mythological entities across multiple languages.
**Q: What are the recommended use cases?**
The model is ideal for multilingual text analysis, information extraction, and content classification tasks. It's particularly suitable for applications requiring detailed entity recognition across diverse domains like scientific literature, news articles, and cultural content analysis.