SapBERT-UMLS-2020AB-all-lang-from-XLMR

Maintained by: cambridgeltl

  • Parameter Count: 278M
  • Model Type: Feature Extraction / Transformers
  • Base Architecture: XLM-RoBERTa
  • Paper: Research Paper
  • Downloads: 748,964

What is SapBERT-UMLS-2020AB-all-lang-from-XLMR?

SapBERT-UMLS-2020AB-all-lang-from-XLMR is a biomedical language model built on XLM-RoBERTa and designed for cross-lingual biomedical entity linking. Introduced in an ACL 2021 paper, the model uses the UMLS 2020AB release to learn multilingual representations of biomedical entities.

Implementation Details

The model uses the [CLS] token of the final hidden layer as the representation of an input term. It is intended for short biomedical terms across multiple languages, typically encoded with a maximum sequence length of 25 tokens and processed in batches (a sketch follows the list below).

  • Built on xlm-roberta-base architecture
  • Supports multilingual biomedical entity representation
  • Optimized for batch processing with configurable batch sizes
  • Uses PyTorch framework with Safetensors support
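
The snippet below is a minimal extraction sketch, assuming the Hugging Face model ID cambridgeltl/SapBERT-UMLS-2020AB-all-lang-from-XLMR and the standard transformers/PyTorch API. The 25-token limit and [CLS] pooling follow the description above; the helper name embed_terms and the example terms are introduced here purely for illustration.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Assumed model ID, taken from the title of this card; verify on the Hugging Face Hub.
MODEL_NAME = "cambridgeltl/SapBERT-UMLS-2020AB-all-lang-from-XLMR"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def embed_terms(terms, batch_size=128, max_length=25):
    """Return one [CLS] embedding per input term, processed in batches."""
    reps = []
    for i in range(0, len(terms), batch_size):
        batch = terms[i:i + batch_size]
        toks = tokenizer(
            batch,
            padding="max_length",
            max_length=max_length,   # short biomedical terms, per the description above
            truncation=True,
            return_tensors="pt",
        )
        with torch.no_grad():
            # [CLS] token of the last hidden state serves as the term representation
            reps.append(model(**toks).last_hidden_state[:, 0, :])
    return torch.cat(reps, dim=0)

embeddings = embed_terms(["covid-19", "myocardial infarction", "cardiopatía"])
print(embeddings.shape)  # (3, hidden_size)
```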

Core Capabilities

  • Cross-lingual biomedical entity linking
  • Multilingual text representation (see the similarity sketch after this list)
  • Efficient batch processing of medical terms
  • Feature extraction for biomedical entities
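
As a rough illustration of the multilingual representation capability listed above, the snippet below compares an English term with German and Spanish equivalents via cosine similarity. It reuses the hypothetical embed_terms helper from the previous sketch, and the term triple is an arbitrary example rather than anything from the model card.

```python
import torch.nn.functional as F

# Reuses embed_terms() from the sketch under Implementation Details (illustrative helper).
reps = F.normalize(embed_terms(["heart attack", "Herzinfarkt", "infarto de miocardio"]), dim=-1)
print(f"en-de similarity: {(reps[0] @ reps[1]).item():.3f}")
print(f"en-es similarity: {(reps[0] @ reps[2]).item():.3f}")
```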

Frequently Asked Questions

Q: What makes this model unique?

What sets this model apart is its training on UMLS 2020AB data and its support for cross-lingual biomedical entity linking, which make it particularly useful for multilingual medical text processing.

Q: What are the recommended use cases?

The model is ideal for biomedical entity linking across different languages, medical term similarity matching, and creating standardized representations of medical concepts in multilingual contexts.
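
For the entity-linking use case, one common pattern is nearest-neighbour search over embeddings of a concept dictionary. The sketch below follows that pattern using the hypothetical embed_terms helper from earlier; the three-entry dictionary and its CUI labels are illustrative placeholders, and a realistic setup would index a full UMLS term list (for example with FAISS).

```python
import torch
import torch.nn.functional as F

# Illustrative concept dictionary; the CUIs here are placeholders, not verified UMLS entries.
dictionary = {
    "C0018681": "headache",
    "C0027497": "nausea",
    "C0015967": "fever",
}
dict_cuis = list(dictionary.keys())
dict_terms = list(dictionary.values())
dict_reps = F.normalize(embed_terms(dict_terms), dim=-1)

def link(mention):
    """Return the dictionary concept whose embedding is closest to the mention."""
    rep = F.normalize(embed_terms([mention]), dim=-1)
    scores = rep @ dict_reps.T            # cosine similarities, shape (1, len(dictionary))
    best = int(torch.argmax(scores))
    return dict_cuis[best], dict_terms[best], float(scores[0, best])

print(link("mal de tête"))  # French mention, expected to resolve to "headache"
```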
