SapBERT-from-PubMedBERT-fulltext-mean-token

  • Parameter Count: 109M
  • Downloads: 148,061
  • Author: cambridgeltl

What is SapBERT-from-PubMedBERT-fulltext-mean-token?

SapBERT is a specialized biomedical language model that self-aligns the representation space of biomedical entities. Built upon PubMedBERT, it's trained on UMLS 2020AA (English) and optimized for medical entity linking tasks. The model employs a unique self-alignment pretraining approach to capture fine-grained semantic relationships in biomedical text.

Implementation Details

The model implements a mean-pooling architecture for entity representation and can process batched inputs efficiently. It is built on the PyTorch framework and supports both CPU and CUDA execution.

  • Supports batch processing with configurable batch sizes
  • Maximum sequence length of 25 tokens
  • Implements mean-pooling for entity embeddings
  • Compatible with HuggingFace Transformers library
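The mean-pooling step above can be sketched as a small mask-aware helper: token embeddings are averaged while padding positions are excluded. This is a minimal illustration on toy tensors; in practice `last_hidden_state` and `attention_mask` would come from the model loaded via the HuggingFace Transformers library, with inputs tokenized to the 25-token maximum length.

```python
import torch

def mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Mask-aware mean pooling: average token embeddings, ignoring padding."""
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq, 1)
    summed = (last_hidden_state * mask).sum(dim=1)   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)         # guard against empty masks
    return summed / counts

# Toy example: batch of 2, sequence length 4, hidden size 3.
hidden = torch.tensor([
    [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[2.0, 2.0, 2.0], [4.0, 4.0, 4.0], [6.0, 6.0, 6.0], [0.0, 0.0, 0.0]],
])
mask = torch.tensor([[1, 1, 0, 0], [1, 1, 1, 0]])
emb = mean_pool(hidden, mask)
print(emb)  # row 0 -> [2., 3., 4.]; row 1 -> [4., 4., 4.]
```

Pooling over all non-padding tokens (rather than taking only the [CLS] vector) is what distinguishes this "mean-token" variant of SapBERT.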

Core Capabilities

  • Medical Entity Linking (MEL)
  • Biomedical entity representation learning
  • Fine-grained semantic relationship modeling
  • Scalable metric learning for large ontologies

Frequently Asked Questions

Q: What makes this model unique?

SapBERT offers a one-model-for-all solution for medical entity linking, achieving state-of-the-art performance on six MEL benchmark datasets. Its self-alignment pretraining approach specifically addresses the challenge of capturing fine-grained semantic relationships in biomedical text.

Q: What are the recommended use cases?

The model is ideal for biomedical entity linking tasks, ontology mapping, and semantic similarity analysis in medical texts. It's particularly effective when working with UMLS concepts and medical terminology alignment.
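Entity linking with SapBERT typically reduces to nearest-neighbor search: embed the query mention and every candidate ontology name, then pick the candidate with the highest cosine similarity. The sketch below uses hypothetical hand-made 3-dimensional vectors in place of real SapBERT embeddings, purely to show the retrieval step.

```python
import torch
import torch.nn.functional as F

# Hypothetical pre-computed embeddings; in practice these would be
# mean-pooled SapBERT outputs for each ontology concept name.
ontology_names = ["myocardial infarction", "hypertension", "type 2 diabetes"]
ontology_emb = F.normalize(torch.tensor([
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.0, 1.0],
]), dim=-1)

# Hypothetical embedding for the query mention, e.g. "heart attack".
query_emb = F.normalize(torch.tensor([[0.8, 0.2, 0.0]]), dim=-1)

# Cosine similarity = dot product of L2-normalised vectors.
scores = query_emb @ ontology_emb.T        # (1, num_candidates)
best = scores.argmax(dim=-1)
print(ontology_names[best[0]])             # -> "myocardial infarction"
```

For large ontologies such as UMLS, the same dot-product search is usually delegated to an approximate nearest-neighbor index rather than a dense matrix multiply.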
