bert-base-turkish-cased-ner

Property	Value
Parameter Count	110M
License	MIT
Framework	PyTorch, ONNX
Language	Turkish

What is bert-base-turkish-cased-ner?

This is a Named Entity Recognition (NER) model for Turkish, fine-tuned from dbmdz/bert-base-turkish-cased. It identifies and classifies named entities in Turkish text as persons (PER), organizations (ORG), or locations (LOC).

Implementation Details

The model was fine-tuned with a batch size of 8, a maximum sequence length of 512, and a learning rate of 2e-5 for 3 epochs. It performs token classification and reports an overall accuracy of 99.61% and an F1-score of 96.17%.
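The reported F1-score can be related to the 95.16% recall listed under Core Capabilities. The sketch below inverts the F1 formula to recover the precision those two figures imply. It assumes both numbers come from the same evaluation run, which the description does not state explicitly; the function names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_from_f1(f1: float, recall: float) -> float:
    """Invert F1 = 2PR / (P + R) to recover P from F1 and R."""
    return f1 * recall / (2 * recall - f1)


reported_f1 = 0.9617      # F1-score quoted in the description
reported_recall = 0.9516  # recall quoted in the description

# Assumption: both figures describe the same evaluation.
implied_precision = precision_from_f1(reported_f1, reported_recall)
print(f"implied precision: {implied_precision:.4f}")
```

Under that assumption the implied precision is roughly 97.2%. Accuracy is typically much higher than F1 in NER because most tokens carry the O label and are easy to classify correctly.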

Predicts 7 labels in the BIO scheme: O, B-PER, I-PER, B-ORG, I-ORG, B-LOC, I-LOC
Performs token classification with a BERT-based transformer encoder
Uses cased tokenization, preserving the capitalization that often marks proper nouns
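The seven labels follow the common B-/I-/O convention: B- opens an entity, I- continues it, and O marks tokens outside any entity. A minimal sketch of turning such per-token labels into entity spans is shown below. The helper name `bio_to_spans` and the example tags are hand-written for illustration and are not output from the model.

```python
from typing import List, Optional, Tuple

LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC"]


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity that is currently open, if any.
        nonlocal current_tokens, current_type
        if current_tokens:
            spans.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity;
            # in this sketch such a stray I- token is simply dropped.
            flush()
    flush()
    return spans


# Hand-labelled example (illustrative, not model output).
tokens = ["Mustafa", "Kemal", "Atatürk", "1919'da", "Samsun'a", "çıktı", "."]
tags = ["B-PER", "I-PER", "I-PER", "O", "B-LOC", "O", "O"]
print(bio_to_spans(tokens, tags))
```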

Core Capabilities

High-accuracy named entity recognition with 95.16% recall
Robust performance across different test sets (2001-2010)
Efficient processing of Turkish text with support for ONNX runtime
Easy integration with Hugging Face's pipeline architecture
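Pipeline integration might look like the sketch below, which uses the `pipeline` function from the Transformers library with the `token-classification` task. The value of `MODEL_ID` is a placeholder: the description gives only the model name, not the Hub namespace, so it must be replaced with the full repository id. The helpers `build_ner_pipeline`, `summarize`, and `demo` are illustrative names, not part of the model or the library.

```python
from typing import Dict, List

# Placeholder (assumption): replace with the full Hugging Face Hub repository id,
# including its namespace, before running.
MODEL_ID = "bert-base-turkish-cased-ner"


def build_ner_pipeline(model_id: str = MODEL_ID):
    """Create a token-classification pipeline (downloads the model on first use)."""
    # Imported lazily so the rest of this module works without transformers installed.
    from transformers import pipeline

    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge word pieces into whole entities
    )


def summarize(entities: List[Dict]) -> List[str]:
    """Format aggregated pipeline output as 'TYPE: text (score)' lines."""
    return [
        f"{e['entity_group']}: {e['word']} ({e['score']:.2f})"
        for e in entities
    ]


def demo() -> None:
    """Run the pipeline on one sentence; requires network access and transformers."""
    ner = build_ner_pipeline()
    for line in summarize(ner("Mustafa Kemal Atatürk 1919'da Samsun'a çıktı.")):
        print(line)
```

Calling `demo()` downloads the model weights on first use. With `aggregation_strategy="simple"`, each returned dictionary describes one whole entity (`entity_group`, `word`, `score`, `start`, `end`) rather than one word piece.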

Frequently Asked Questions

Q: What makes this model unique?

The model reports accuracy above 99% on Turkish NER across its test sets. It was evaluated on the benchmark dataset from Küçük et al.'s research.

Q: What are the recommended use cases?

The model is ideal for Turkish text analysis tasks requiring entity extraction, including information retrieval, document processing, and automated content analysis. It's particularly effective for identifying people, organizations, and locations in Turkish text.