BioMedLM

| Property | Value |
|---|---|
| Maintained By | stanford-crfm |
| Parameters | 2.7B |
| Developer | Stanford CRFM & MosaicML |
| License | bigscience-bloom-rail-1.0 |
| Paper | View Paper |

What is BioMedLM?

BioMedLM is a specialized language model trained exclusively on biomedical abstracts and papers from The Pile. With 2.7B parameters, it achieved state-of-the-art accuracy of 50.3% on the MedQA biomedical question answering task at the time of its release. The model uses a custom tokenizer trained on PubMed abstracts to represent biomedical terminology more efficiently.
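
For orientation, the snippet below sketches loading the model and sampling a continuation with Hugging Face transformers. The hub identifier stanford-crfm/BioMedLM, the prompt, and the sampling settings are illustrative assumptions, not details taken from this card.

```python
# Sketch: load BioMedLM and sample a continuation. The identifier
# "stanford-crfm/BioMedLM" is assumed; adjust it if your copy of the
# weights lives elsewhere.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("stanford-crfm/BioMedLM")
model = GPT2LMHeadModel.from_pretrained("stanford-crfm/BioMedLM")

prompt = "Metformin lowers blood glucose primarily by"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```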

Implementation Details

The model implements a GPT-2 architecture with Flash Attention, featuring a hidden size of 2560, 20 attention heads, and 32 layers. Training ran on MosaicML Cloud using 128 A100-40GB GPUs for approximately 6.25 days, processing 300B tokens with a batch size of 1024 sequences and a sequence length of 1024 tokens. A configuration sketch follows the list below.

  • Custom biomedical tokenizer with a 28,896-token vocabulary
  • Trained with PyTorch FSDP and the Composer library
  • Optimized with Decoupled AdamW (lr=1.6e-4, weight decay=1.6e-5)
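
For readers who want the stated dimensions in one place, here is a sketch of an equivalent transformers GPT-2 configuration. It only mirrors the numbers above; the authoritative configuration ships with the released checkpoint.

```python
# Illustrative GPT-2 configuration matching the dimensions stated above.
# For orientation only; this is not the actual training configuration.
from transformers import GPT2Config

config = GPT2Config(
    vocab_size=28896,   # custom PubMed-trained biomedical tokenizer
    n_positions=1024,   # training sequence length
    n_embd=2560,        # hidden size
    n_layer=32,         # transformer layers
    n_head=20,          # attention heads
)
```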

Core Capabilities

  • State-of-the-art biomedical question answering
  • Efficient processing of medical terminology
  • Research-focused text generation
  • Downstream task fine-tuning potential

Frequently Asked Questions

Q: What makes this model unique?

BioMedLM's uniqueness lies in its specialized training on biomedical literature and a custom tokenizer that represents many medical terms as single tokens rather than sequences of subwords, enabling better handling of domain-specific concepts.
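
One way to see this in practice is to compare tokenizations against the generic GPT-2 vocabulary. This is a sketch assuming both tokenizers are available from the Hugging Face Hub; the example term and the resulting splits are illustrative.

```python
# Sketch: compare how a biomedical term is tokenized by the generic GPT-2
# vocabulary versus BioMedLM's PubMed-trained vocabulary.
from transformers import GPT2Tokenizer

generic = GPT2Tokenizer.from_pretrained("gpt2")
biomed = GPT2Tokenizer.from_pretrained("stanford-crfm/BioMedLM")

term = "chromatography"
print(generic.tokenize(term))  # generic BPE typically splits this into subwords
print(biomed.tokenize(term))   # the biomedical vocabulary may keep it whole
```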

Q: What are the recommended use cases?

The model is recommended primarily for research purposes and downstream task fine-tuning. It is explicitly not recommended for production deployment or direct medical advice, as specified in its license restrictions.
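
For the fine-tuning route, the sketch below outlines a classification fine-tune with the transformers Trainer. The task, label count, and hyperparameters are placeholders, and at 2.7B parameters the model will typically need gradient accumulation or parameter-efficient methods to fit on a single GPU.

```python
# Sketch: fine-tuning BioMedLM for sequence classification. Dataset, label
# count, and hyperparameters are placeholders; plug in your own task data.
from transformers import (
    GPT2ForSequenceClassification,
    GPT2Tokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2Tokenizer.from_pretrained("stanford-crfm/BioMedLM")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token

model = GPT2ForSequenceClassification.from_pretrained(
    "stanford-crfm/BioMedLM", num_labels=2
)
model.config.pad_token_id = tokenizer.pad_token_id

args = TrainingArguments(
    output_dir="biomedlm-finetuned",
    per_device_train_batch_size=1,    # small batches plus accumulation to fit
    gradient_accumulation_steps=16,   # a 2.7B model in limited GPU memory
    learning_rate=2e-5,
    num_train_epochs=3,
)

# Supply tokenized datasets for your task, then train:
# trainer = Trainer(model=model, args=args,
#                   train_dataset=train_ds, eval_dataset=eval_ds)
# trainer.train()
```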
