bert-base-turkish-cased-ner
| Property | Value |
|---|---|
| Parameter Count | 110M |
| License | MIT |
| Framework | PyTorch, ONNX |
| Language | Turkish |
What is bert-base-turkish-cased-ner?
This is a Named Entity Recognition (NER) model for Turkish, fine-tuned from the dbmdz/bert-base-turkish-cased checkpoint. It identifies and classifies three entity types in Turkish text: persons (PER), organizations (ORG), and locations (LOC).
Implementation Details
The model was fine-tuned with a batch size of 8, a maximum sequence length of 512, and a learning rate of 2e-5 over 3 epochs. It performs token classification and reports an overall accuracy of 99.61% and an F1-score of 96.17%.
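The hyperparameters above could be captured in a small configuration object. This is a minimal sketch, not the author's training script: the `FineTuneConfig` class and its method names are hypothetical, and the mapping assumes the Hugging Face `Trainer` API, whose `TrainingArguments` accepts `per_device_train_batch_size`, `learning_rate`, and `num_train_epochs`. The text does not say whether the batch size of 8 was per device or global, so the per-device reading is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineTuneConfig:
    """Reported fine-tuning hyperparameters (class name is hypothetical)."""

    base_model: str = "dbmdz/bert-base-turkish-cased"
    batch_size: int = 8
    max_seq_length: int = 512
    learning_rate: float = 2e-5
    num_epochs: int = 3

    def trainer_kwargs(self) -> dict:
        # Keys follow the parameter names of transformers.TrainingArguments.
        # Assumption: the reported batch size is the per-device batch size.
        return {
            "per_device_train_batch_size": self.batch_size,
            "learning_rate": self.learning_rate,
            "num_train_epochs": self.num_epochs,
        }

    def tokenizer_kwargs(self) -> dict:
        # The sequence limit is applied when tokenizing, not by the trainer.
        return {"truncation": True, "max_length": self.max_seq_length}


config = FineTuneConfig()
print(config.trainer_kwargs())
print(config.tokenizer_kwargs())
```

The two dictionaries can be unpacked into `TrainingArguments(output_dir=..., **config.trainer_kwargs())` and into a tokenizer call respectively, which keeps the training limits and the tokenization limits in one place.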
- Predicts 7 labels in the BIO scheme: O, B-PER, I-PER, B-ORG, I-ORG, B-LOC, I-LOC
- Performs token classification with a BERT-base transformer encoder
- Uses cased tokenization, which preserves the capitalization that often marks proper nouns in Turkish text
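To make the 7-label scheme concrete, the sketch below groups a tagged token sequence into entity spans. It is an illustration only: the `decode_bio` helper is hypothetical, and the tags in the example are written by hand rather than produced by the model. An `I-` tag that does not continue an open span of the same type is treated as closing the span and is dropped, which is one of several possible conventions.

```python
from typing import List, Tuple

LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    entities: List[Tuple[str, str]] = []
    current = None  # (entity_type, [tokens]) for the span being built

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new span, closing any open one.
            if current is not None:
                entities.append((" ".join(current[1]), current[0]))
            current = (tag[2:], [token])
        elif tag.startswith("I-") and current is not None and current[0] == tag[2:]:
            # An I- tag of the same type extends the open span.
            current[1].append(token)
        else:
            # "O", or an I- tag that does not continue the open span.
            if current is not None:
                entities.append((" ".join(current[1]), current[0]))
            current = None

    if current is not None:
        entities.append((" ".join(current[1]), current[0]))
    return entities


# Hand-written tags for illustration, not model output.
tokens = ["Mustafa", "Kemal", "Atatürk", "1919'da", "Samsun'a", "çıktı", "."]
tags = ["B-PER", "I-PER", "I-PER", "O", "B-LOC", "O", "O"]
print(decode_bio(tokens, tags))
# [('Mustafa Kemal Atatürk', 'PER'), ("Samsun'a", 'LOC')]
```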
Core Capabilities
- Named entity recognition with a reported recall of 95.16%
- Consistent results reported across the different test sets (2001-2010)
- ONNX export available for inference with ONNX Runtime
- Works with the Hugging Face pipeline API for token classification
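As a rough sketch of the pipeline integration mentioned above, the snippet below wraps the Transformers `pipeline` factory for token classification. Several things here are assumptions: `MODEL_ID` is a placeholder, because the full Hub identifier (with publisher namespace) is not given in this text; `build_ner_pipeline` and `summarize` are hypothetical helper names; and the sample results are hand-written values that only mimic the shape of aggregated pipeline output.

```python
from typing import Dict, List

# Assumption: placeholder only. Replace with the model's full Hub identifier,
# including the publisher namespace, which this text does not provide.
MODEL_ID = "<namespace>/bert-base-turkish-cased-ner"


def build_ner_pipeline(model_id: str = MODEL_ID):
    """Create a token-classification pipeline for the given model."""
    # Imported lazily so summarize() can be used without transformers installed.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def summarize(results: List[Dict]) -> List[str]:
    """Format aggregated pipeline results as 'TYPE: text (score)' lines."""
    return [f"{r['entity_group']}: {r['word']} ({r['score']:.2f})" for r in results]


# Hand-written sample with only the keys used here; scores are illustrative.
sample_results = [
    {"entity_group": "PER", "word": "Mustafa Kemal Atatürk", "score": 0.99},
    {"entity_group": "LOC", "word": "Samsun", "score": 0.98},
]
print(summarize(sample_results))
```

Once `MODEL_ID` points at the real model, `summarize(build_ner_pipeline()("Mustafa Kemal Atatürk 1919'da Samsun'a çıktı."))` would run the same formatting over actual predictions.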
Frequently Asked Questions
Q: What makes this model unique?
The model reports accuracy above 99% on each of the test sets used for evaluation. It was validated on the benchmark dataset from Küçük et al.'s research.
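The gap between the reported accuracy (above 99%) and the reported F1-score (96.17%) is typical of NER evaluation. The toy example below shows one way it can arise, assuming accuracy is counted per token and F1 per exact entity span, as in seqeval-style evaluation. The text does not state how the metrics were computed, so this is an assumption, and the tag sequences are invented for illustration.

```python
from typing import List, Set, Tuple


def token_accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def spans(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Return the (start, end, type) entity spans in a BIO sequence."""
    found, start, kind = set(), None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not continues:
            found.add((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return found


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """F1 over exact-match entity spans."""
    g, p = spans(gold), spans(pred)
    true_positives = len(g & p)
    precision = true_positives / len(p) if p else 0.0
    recall = true_positives / len(g) if g else 0.0
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# 20 tokens, 2 entities; the prediction misses the second token of the PER span.
gold = ["O"] * 8 + ["B-PER", "I-PER"] + ["O"] * 8 + ["B-LOC", "O"]
pred = ["O"] * 8 + ["B-PER", "O"] + ["O"] * 8 + ["B-LOC", "O"]

print(f"token accuracy: {token_accuracy(gold, pred):.2f}")  # 0.95
print(f"entity F1:      {entity_f1(gold, pred):.2f}")       # 0.50
```

Because most tokens carry the O label, a single wrong token barely moves token accuracy but invalidates a whole entity span, so F1 is usually the more informative number when comparing NER models.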
Q: What are the recommended use cases?
The model suits Turkish text analysis tasks that require entity extraction, such as information retrieval, document processing, and automated content analysis. It targets people, organizations, and locations; other entity types are outside its label set.