# SapBERT-UMLS-2020AB-all-lang-from-XLMR
| Property | Value |
|---|---|
| Parameter Count | 278M |
| Model Type | Feature Extraction / Transformers |
| Base Architecture | XLM-RoBERTa |
| Paper | Research Paper |
| Downloads | 748,964 |
## What is SapBERT-UMLS-2020AB-all-lang-from-XLMR?
SapBERT-UMLS-2020AB is a biomedical language model built on XLM-RoBERTa and designed for cross-lingual biomedical entity linking. Presented at ACL 2021, the model is trained on the UMLS 2020AB release to produce multilingual representations of biomedical entities.
## Implementation Details
The model uses the [CLS] token embedding as the representation of an input term. It is optimized for short biomedical terms across multiple languages, with a maximum sequence length of 25 tokens and support for batch processing.
- Built on xlm-roberta-base architecture
- Supports multilingual biomedical entity representation
- Optimized for batch processing with configurable batch sizes
- Uses PyTorch framework with Safetensors support
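The encoding pipeline described above (tokenize, run the encoder, keep the [CLS] vector, process terms in batches) can be sketched as follows. To keep the sketch self-contained and runnable offline, the XLM-R forward pass is replaced by a stub that returns random hidden states; in real use you would load the checkpoint with Hugging Face `transformers` and call the model instead.

```python
import numpy as np

MAX_LEN = 25   # maximum sequence length used for SapBERT inputs
HIDDEN = 768   # hidden size of xlm-roberta-base

def batched(items, batch_size):
    """Yield successive fixed-size batches from a list of terms."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def cls_pool(last_hidden_state):
    """Take the [CLS] (first-token) vector of each sequence.

    last_hidden_state: array of shape (batch, seq_len, hidden).
    """
    return last_hidden_state[:, 0, :]

def encode_terms(terms, encoder, batch_size=128):
    """Encode terms batch by batch; `encoder` maps a list of strings
    to a (batch, seq_len, hidden) array (e.g. an XLM-R forward pass)."""
    chunks = [cls_pool(encoder(batch)) for batch in batched(terms, batch_size)]
    return np.concatenate(chunks, axis=0)

# Stub encoder standing in for the real model call.
rng = np.random.default_rng(0)
fake_encoder = lambda batch: rng.normal(size=(len(batch), MAX_LEN, HIDDEN))

emb = encode_terms(["covid-19", "herzinfarkt", "cefalea"], fake_encoder, batch_size=2)
print(emb.shape)  # (3, 768): one 768-d [CLS] embedding per term
```

With the real checkpoint, `encoder` would be a thin wrapper around the tokenizer and model forward pass, truncating inputs to 25 tokens.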
## Core Capabilities
- Cross-lingual biomedical entity linking
- Multilingual text representation
- Efficient batch processing of medical terms
- Feature extraction for biomedical entities
## Frequently Asked Questions
**Q: What makes this model unique?**

A: Its specialized training on UMLS 2020AB data and its ability to perform cross-lingual biomedical entity linking make it particularly valuable for multilingual medical text processing.
**Q: What are the recommended use cases?**

A: The model is well suited to biomedical entity linking across languages, medical term similarity matching, and producing standardized representations of medical concepts in multilingual contexts.
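Entity linking with these embeddings typically reduces to nearest-neighbour search: embed a dictionary of UMLS concept names once, embed each mention, and pick the concept with the highest cosine similarity. A minimal sketch, using small synthetic vectors in place of real SapBERT [CLS] embeddings:

```python
import numpy as np

def link_entities(query_emb, dict_emb, dict_names):
    """Link each query embedding to the closest dictionary concept
    by cosine similarity."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = dict_emb / np.linalg.norm(dict_emb, axis=1, keepdims=True)
    sims = q @ d.T                 # (n_queries, n_dict) cosine similarities
    best = sims.argmax(axis=1)     # index of nearest concept per query
    return [dict_names[i] for i in best]

# Toy 4-d embeddings; real use would take the model's 768-d [CLS] vectors.
dict_names = ["C0018681 (Headache)", "C0027051 (Myocardial infarction)"]
dict_emb = np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0]])
query_emb = np.array([[0.9, 0.1, 0.0, 0.0]])  # a mention close to "Headache"

print(link_entities(query_emb, dict_emb, dict_names))  # ['C0018681 (Headache)']
```

For a full UMLS-scale dictionary, an approximate nearest-neighbour index (e.g. FAISS) would replace the dense matrix product, but the pooling-then-similarity pattern stays the same.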