hyenadna-medium-160k-seqlen-hf

Maintained By
LongSafari

HyenaDNA Medium 160k

Property | Value
Model Type | Genomic Foundation Model
Maximum Sequence Length | 160,000 nucleotides
GPU Requirements (Training) | T4 GPU (16GB VRAM)
Paper | arXiv:2306.15794

What is hyenadna-medium-160k-seqlen-hf?

HyenaDNA is a genomic foundation model designed to process very long DNA sequences at single-nucleotide resolution. This medium-sized variant accepts input sequences of up to 160,000 nucleotides.

Implementation Details

The model employs Hyena operators as an efficient replacement for traditional attention mechanisms in Transformers. It processes DNA sequences using a single character tokenizer with a vocabulary of 4 nucleotides plus special tokens, enabling true single-nucleotide resolution analysis.
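To make the single-character tokenization concrete, here is a minimal sketch. It is not the tokenizer shipped with the model: the special tokens `[PAD]` and `[UNK]`, the ID assignments, and the `encode`/`decode` helpers are all assumptions chosen for illustration.

```python
# Illustrative character-level DNA tokenizer (hypothetical IDs, not the
# model's real vocabulary). Each nucleotide becomes exactly one token.
SPECIAL_TOKENS = ["[PAD]", "[UNK]"]
NUCLEOTIDES = ["A", "C", "G", "T"]
VOCAB = {tok: i for i, tok in enumerate(SPECIAL_TOKENS + NUCLEOTIDES)}
ID_TO_TOKEN = {i: tok for tok, i in VOCAB.items()}


def encode(sequence, max_length=None):
    """Map each base to one ID; unknown characters map to [UNK]."""
    ids = [VOCAB.get(base, VOCAB["[UNK]"]) for base in sequence.upper()]
    if max_length is not None:
        ids = ids[:max_length]                               # truncate
        ids += [VOCAB["[PAD]"]] * (max_length - len(ids))    # right-pad
    return ids


def decode(ids):
    """Turn IDs back into a DNA string, dropping special tokens."""
    return "".join(ID_TO_TOKEN[i] for i in ids if ID_TO_TOKEN[i] in NUCLEOTIDES)


if __name__ == "__main__":
    print(encode("ACGT"))               # one ID per nucleotide
    print(encode("acgn", max_length=6))
    print(decode(encode("GATTACA")))
```

Because one token corresponds to one base, a 160,000-nucleotide input yields 160,000 tokens, with no k-mer aggregation in between.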

  • Subquadratic computation complexity for efficient processing
  • Implicit long convolution for global receptive field at each layer
  • Support for bfloat16 precision for improved speed and memory usage
  • Gradient checkpointing capability for memory optimization
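The subquadratic cost of a long convolution comes from evaluating it with the FFT, which takes O(L log L) time instead of the O(L^2) of a direct sum. The sketch below illustrates only that idea in plain Python. It uses an explicit kernel, whereas Hyena parameterizes its filters implicitly with a small network and adds gating, both of which are omitted here. The function names are hypothetical.

```python
import cmath


def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def fft_causal_conv(signal, kernel):
    """Causal convolution via FFT; assumes len(kernel) <= len(signal)."""
    length = len(signal)
    size = 1
    while size < 2 * length:  # zero-pad so circular wrap-around cannot occur
        size *= 2
    fs = fft([complex(v) for v in signal] + [0j] * (size - length))
    fk = fft([complex(v) for v in kernel] + [0j] * (size - len(kernel)))
    product = [a * b for a, b in zip(fs, fk)]
    result = fft(product, inverse=True)
    # The inverse transform above is unnormalized, so divide by size.
    return [v.real / size for v in result[:length]]


def direct_causal_conv(signal, kernel):
    """Reference O(L^2) implementation for comparison."""
    return [
        sum(kernel[j] * signal[t - j] for j in range(min(t + 1, len(kernel))))
        for t in range(len(signal))
    ]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    h = [0.5, 0.25, 0.125, 0.0625]  # kernel as long as the input
    print(fft_causal_conv(x, h))
    print(direct_causal_conv(x, h))
```

With a kernel as long as the input, every output position can depend on every earlier position, which is what gives each layer a global receptive field.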

Core Capabilities

  • Long-range genomic sequence analysis up to 160k nucleotides
  • Next token (nucleotide) prediction
  • Regulatory element prediction
  • Chromatin profile analysis
  • Species classification
  • In-context learning with soft prompt tunable tokens

Frequently Asked Questions

Q: What makes this model unique?

This model processes very long DNA sequences at single-nucleotide resolution. According to the paper, the HyenaDNA family reaches context lengths of up to 1 million tokens, an increase of up to 500x over previous dense-attention genomic models; this variant covers up to 160,000 nucleotides. The architecture is also reported to train up to 160x faster than a Transformer using Flash Attention at sequence length 1M.

Q: What are the recommended use cases?

The model suits genomic research tasks such as regulatory element prediction, chromatin profile analysis, and species classification. It can also be fine-tuned for specific sequence classification tasks.
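As a starting point for fine-tuning on sequence classification, the checkpoint can likely be loaded through the Hugging Face `transformers` Auto classes. The sketch below assumes the repository ID `LongSafari/hyenadna-medium-160k-seqlen-hf` (derived from the model name and maintainer above) and that the repository ships custom code requiring `trust_remote_code=True`. The helper name `load_for_classification` is hypothetical.

```python
# Assumed Hugging Face repository ID, built from the maintainer and model name.
CHECKPOINT = "LongSafari/hyenadna-medium-160k-seqlen-hf"
MAX_LENGTH = 160_000  # maximum sequence length stated for this variant


def load_for_classification(checkpoint=CHECKPOINT):
    """Return (tokenizer, model) ready for fine-tuning.

    torch and transformers are imported lazily so this file can be
    imported on machines where they are not installed.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        torch_dtype=torch.bfloat16,  # bfloat16 for speed and memory savings
        trust_remote_code=True,
    )
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_for_classification()
    batch = tokenizer("ACGTACGT", return_tensors="pt")
    print(model(**batch).logits.shape)
```

Calling the function downloads the weights, so it needs network access and enough memory for the model. The classification head is newly initialized and must be trained before its outputs are meaningful.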
