BiomedNLP-KRISSBERT-PubMed-UMLS-EL

Maintained by: microsoft

Property     Value
Developer    Microsoft
License      MIT
Paper        Knowledge-Rich Self-Supervision for Biomedical Entity Linking
Base Model   PubMedBERT

What is BiomedNLP-KRISSBERT-PubMed-UMLS-EL?

KRISSBERT is a biomedical entity linking model that uses Knowledge-Rich Self-Supervision (KRISS) to disambiguate entities in medical text. Built on PubMedBERT, it is trained with biomedical entity names from the UMLS ontology and with self-supervised examples drawn from PubMed abstracts.

Implementation Details

The model encodes each entity mention together with its surrounding context, unlike earlier systems that ignored context. This lets it resolve an ambiguous mention by using contextual clues to choose the correct entity among several candidates.

  • Initialized with PubMedBERT parameters
  • Continually pretrained with entity names from the UMLS ontology
  • Utilizes self-supervised learning from PubMed abstracts
  • Capable of disambiguating entities using contextual information
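
To make "considers both the entity mention and its surrounding context" concrete, here is a minimal sketch of one way to present a mention to a contextual encoder: wrap the mention span in marker strings and keep the rest of the sentence intact. The function name mark_mention and the marker strings "[M_s]" and "[M_e]" are hypothetical choices for illustration; the source does not specify how KRISSBERT formats its inputs.

```python
def mark_mention(text, start, end, open_tag="[M_s]", close_tag="[M_e]"):
    """Wrap text[start:end] in marker strings so that one input carries
    both the mention and its surrounding context.

    The marker strings are illustrative placeholders, not the model's
    documented input format.
    """
    if not (0 <= start < end <= len(text)):
        raise ValueError("mention span must lie inside the text")
    return f"{text[:start]}{open_tag} {text[start:end]} {close_tag}{text[end:]}"


sentence = "The patient was admitted to the ER with chest pain."
start = sentence.index("ER")  # character offset of the mention
marked = mark_mention(sentence, start, start + len("ER"))
print(marked)
# The patient was admitted to the [M_s] ER [M_e] with chest pain.
```

The marked string, rather than the bare surface form "ER", is what a context-aware encoder would then turn into a vector.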

Core Capabilities

  • Accurate entity linking with state-of-the-art performance
  • Context-aware entity disambiguation
  • Handling of previously unseen entities
  • Support for canonical entity ID (CUI) prediction
  • 58.3% top-1 accuracy on the MedMentions dataset
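
For readers unfamiliar with the metric, top-1 accuracy is the fraction of mentions whose single highest-ranked predicted entity ID equals the gold ID. The sketch below computes it on made-up placeholder IDs; the function name and the data are illustrative assumptions and are unrelated to the MedMentions evaluation itself.

```python
from typing import Sequence


def top1_accuracy(predicted: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of mentions whose top-ranked predicted ID equals the gold ID."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("need at least one mention to score")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Placeholder IDs standing in for CUIs (hypothetical, not real UMLS entries).
predicted_ids = ["ID_A", "ID_B", "ID_C", "ID_A"]
gold_ids = ["ID_A", "ID_B", "ID_D", "ID_A"]
print(top1_accuracy(predicted_ids, gold_ids))  # 0.75
```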

Frequently Asked Questions

Q: What makes this model unique?

KRISSBERT uses context to disambiguate entities, whereas earlier models only matched surface forms. For example, it can tell from the surrounding text whether "ER" refers to the emergency room, the estrogen receptor gene, or the endoplasmic reticulum.
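
The "ER" example can be sketched as a nearest-neighbour lookup over context-dependent vectors. This is an assumption made for illustration: the source does not describe how encodings are compared, and the three-dimensional vectors, the entity labels, and the link function below are toy placeholders rather than model output or real CUIs.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy reference vectors for the three senses of "ER" (hypothetical values).
PROTOTYPES = {
    "emergency_room": [0.9, 0.1, 0.0],
    "estrogen_receptor_gene": [0.1, 0.9, 0.1],
    "endoplasmic_reticulum": [0.0, 0.2, 0.9],
}


def link(mention_vec, prototypes):
    """Return the entity label whose reference vector is most similar."""
    return max(prototypes, key=lambda name: cosine(mention_vec, prototypes[name]))


# A made-up encoding of "ER" as it might appear in an oncology abstract.
mention_in_oncology_text = [0.2, 0.8, 0.1]
print(link(mention_in_oncology_text, PROTOTYPES))  # estrogen_receptor_gene
```

A pure surface-form matcher would see the same string "ER" in all three cases and could not separate them; here the decision depends only on the context-dependent vector.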

Q: What are the recommended use cases?

The model is particularly suited for biomedical text analysis, especially when dealing with ambiguous entity mentions that require contextual understanding for accurate disambiguation. It's ideal for processing medical literature, clinical notes, and other healthcare-related documents where precise entity identification is crucial.
