BioMedLM

| Property | Value |
|---|---|
| Maintained By | stanford-crfm |
| Parameters | 2.7B |
| Developer | Stanford CRFM & MosaicML |
| License | bigscience-bloom-rail-1.0 |
| Paper | View Paper |

What is BioMedLM?

BioMedLM is a specialized language model trained exclusively on biomedical abstracts and papers from The Pile. With 2.7B parameters, it achieved state-of-the-art accuracy of 50.3% on the MedQA biomedical question answering task at the time of its release. The model uses a custom tokenizer trained on PubMed abstracts to represent biomedical terminology more efficiently.
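
For orientation, the snippet below sketches loading the model and sampling a continuation with Hugging Face transformers. The hub identifier stanford-crfm/BioMedLM, the prompt, and the sampling settings are illustrative assumptions, not details taken from this card.

```python
# Sketch: load BioMedLM and sample a continuation. The identifier
# "stanford-crfm/BioMedLM" is assumed; adjust it if your copy of the
# weights lives elsewhere.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("stanford-crfm/BioMedLM")
model = GPT2LMHeadModel.from_pretrained("stanford-crfm/BioMedLM")

prompt = "Metformin lowers blood glucose primarily by"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```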

Implementation Details

The model implements a GPT-2 architecture with Flash Attention, featuring a hidden size of 2560, 20 attention heads, and 32 layers. Training ran on MosaicML Cloud using 128 A100-40GB GPUs for approximately 6.25 days, processing 300B tokens with a batch size of 1024 sequences and a sequence length of 1024 tokens. A configuration sketch follows the list below.

  • Custom biomedical tokenizer with a 28,896-token vocabulary
  • Trained with PyTorch FSDP and the Composer library
  • Optimized with Decoupled AdamW (lr=1.6e-4, weight decay=1.6e-5)
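
For readers who want the stated dimensions in one place, here is a sketch of an equivalent transformers GPT-2 configuration. It only mirrors the numbers above; the authoritative configuration ships with the released checkpoint.

```python
# Illustrative GPT-2 configuration matching the dimensions stated above.
# For orientation only; this is not the actual training configuration.
from transformers import GPT2Config

config = GPT2Config(
    vocab_size=28896,   # custom PubMed-trained biomedical tokenizer
    n_positions=1024,   # training sequence length
    n_embd=2560,        # hidden size
    n_layer=32,         # transformer layers
    n_head=20,          # attention heads
)
```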

Core Capabilities

  • State-of-the-art biomedical question answering
  • Efficient processing of medical terminology
  • Research-focused text generation
  • Downstream task fine-tuning potential

Frequently Asked Questions

Q: What makes this model unique?

BioMedLM's uniqueness lies in its specialized training on biomedical literature and a custom tokenizer that represents many medical terms as single tokens rather than sequences of subwords, enabling better handling of domain-specific concepts.
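
One way to see this in practice is to compare tokenizations against the generic GPT-2 vocabulary. This is a sketch assuming both tokenizers are available from the Hugging Face Hub; the example term and the resulting splits are illustrative.

```python
# Sketch: compare how a biomedical term is tokenized by the generic GPT-2
# vocabulary versus BioMedLM's PubMed-trained vocabulary.
from transformers import GPT2Tokenizer

generic = GPT2Tokenizer.from_pretrained("gpt2")
biomed = GPT2Tokenizer.from_pretrained("stanford-crfm/BioMedLM")

term = "chromatography"
print(generic.tokenize(term))  # generic BPE typically splits this into subwords
print(biomed.tokenize(term))   # the biomedical vocabulary may keep it whole
```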

Q: What are the recommended use cases?

The model is recommended primarily for research purposes and downstream task fine-tuning. It is explicitly not recommended for production deployment or direct medical advice, as specified in its license restrictions.
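
For the fine-tuning route, the sketch below outlines a classification fine-tune with the transformers Trainer. The task, label count, and hyperparameters are placeholders, and at 2.7B parameters the model will typically need gradient accumulation or parameter-efficient methods to fit on a single GPU.

```python
# Sketch: fine-tuning BioMedLM for sequence classification. Dataset, label
# count, and hyperparameters are placeholders; plug in your own task data.
from transformers import (
    GPT2ForSequenceClassification,
    GPT2Tokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2Tokenizer.from_pretrained("stanford-crfm/BioMedLM")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token

model = GPT2ForSequenceClassification.from_pretrained(
    "stanford-crfm/BioMedLM", num_labels=2
)
model.config.pad_token_id = tokenizer.pad_token_id

args = TrainingArguments(
    output_dir="biomedlm-finetuned",
    per_device_train_batch_size=1,    # small batches plus accumulation to fit
    gradient_accumulation_steps=16,   # a 2.7B model in limited GPU memory
    learning_rate=2e-5,
    num_train_epochs=3,
)

# Supply tokenized datasets for your task, then train:
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```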
