boomer-1b

Maintained by budecosystem


Property                  Value
Parameter Count           1.1B
License                   Apache 2.0
Training Data             41B tokens
Architecture              Custom with Flash Attention
Training Infrastructure   4x A100 80GB GPUs

What is boomer-1b?

Boomer-1b is an open-source language model developed by budecosystem, with 1.1 billion parameters trained on a diverse dataset of 41B tokens. The model incorporates architectural elements such as flash attention and an enhanced MLP layer with an intermediate dimension of 11008.
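
The card does not name the exact MLP variant; a common reading of "enhanced MLP layer with an intermediate dimension of 11008" is a LLaMA-style gated (SwiGLU) block, since 11008 is the intermediate size that family uses at a model dimension of 4096. A minimal PyTorch sketch under that assumption:

```python
import torch
import torch.nn as nn

class GatedMLP(nn.Module):
    """Gated (SwiGLU-style) MLP block. The hidden_dim of 11008 matches the
    card's stated intermediate dimension; the gated variant itself is an
    assumption, not confirmed by the card."""
    def __init__(self, dim: int = 4096, hidden_dim: int = 11008):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU-gated expansion, then projection back to the model dimension
        return self.down_proj(
            torch.nn.functional.silu(self.gate_proj(x)) * self.up_proj(x)
        )
```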

Implementation Details

The model architecture consists of 4 transformer layers with 32 attention heads and a model dimension of 4096. It uses a SentencePiece tokenizer with a vocabulary size of 32000 and supports sequence lengths of up to 4096 tokens. Training was conducted on 4 A100 80GB GPUs for approximately 250 hours using the AdamW optimizer with mixed-precision training.
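
These hyperparameters can be sanity-checked against the stated 1.1B parameter count. A back-of-the-envelope estimate, assuming untied input/output embeddings and a three-matrix gated MLP (both assumptions, not confirmed by the card):

```python
# Rough parameter count from the hyperparameters above.
# Norm and bias terms are negligible and omitted.
d_model, n_layers, d_mlp, vocab = 4096, 4, 11008, 32000

attn = 4 * d_model * d_model          # Q, K, V, and output projections
mlp = 3 * d_model * d_mlp             # gate, up, and down projections
embeddings = 2 * vocab * d_model      # input embedding + LM head (untied)

total = n_layers * (attn + mlp) + embeddings
print(f"{total / 1e9:.2f}B parameters")  # ~1.07B, consistent with the 1.1B figure
```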

  • Custom architecture with flash attention
  • Enhanced MLP layer intermediate dimensions
  • Efficient training implementation with DeepSpeed support
  • SentencePiece tokenizer for open-vocabulary tasks
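
A minimal loading sketch using the Hugging Face transformers library, assuming the weights are published on the Hub under budecosystem/boomer-1b (verify the actual repo id before use):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id is an assumption based on the maintainer and model name.
repo_id = "budecosystem/boomer-1b"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,  # custom architectures often ship their own modeling code
)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```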

Core Capabilities

  • Text generation and language modeling
  • Edge device deployment capabilities
  • Retrieval augmentation support
  • Benchmark performance: MMLU (25.92%), ARC (22.35%), HumanEval (6.1%)

Frequently Asked Questions

Q: What makes this model unique?

The model combines efficient architecture choices such as flash attention with a custom-curated dataset, making it particularly suitable for edge deployment and retrieval-augmented generation. At 1.1B parameters, it remains practical to deploy in resource-constrained environments.
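
To illustrate the retrieval-augmented use case, a minimal prompt-construction sketch follows; the retrieval step itself is a placeholder, and any vector store or keyword search could fill that role:

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Assemble retrieved passages and a question into a single prompt."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical retrieved passages; in practice these come from a retriever.
passages = ["Boomer-1b is a 1.1B-parameter language model trained on 41B tokens."]
prompt = build_rag_prompt("How many parameters does boomer-1b have?", passages)
# `prompt` can then be tokenized and passed to model.generate() as shown earlier.
```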

Q: What are the recommended use cases?

Boomer-1b is particularly well-suited for retrieval augmentation, edge device deployment, and general language modeling tasks. Its architecture and training make it efficient for scenarios requiring a balance between performance and resource utilization.
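
For edge or otherwise memory-constrained deployment, one common option is 4-bit loading via the bitsandbytes integration in transformers. A sketch under the same repo-id assumption as above (requires a CUDA GPU and the bitsandbytes package):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # quantize weights to 4-bit on load
    bnb_4bit_compute_dtype=torch.float16,    # run matmuls in fp16
)

model = AutoModelForCausalLM.from_pretrained(
    "budecosystem/boomer-1b",  # assumed repo id; verify before use
    quantization_config=quant_config,
    trust_remote_code=True,
)
```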
