boomer-1b

Maintained by budecosystem


Property                  Value
Parameter Count           1.1B
License                   Apache 2.0
Training Data             41B tokens
Architecture              Custom with Flash Attention
Training Infrastructure   4x A100 80GB GPUs

What is boomer-1b?

Boomer-1b is an open-source language model developed by budecosystem, with 1.1 billion parameters trained on a diverse dataset of 41B tokens. The model incorporates architectural elements such as flash attention and an enhanced MLP layer with an intermediate dimension of 11008.
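
The card does not name the exact MLP variant; a common reading of "enhanced MLP layer with an intermediate dimension of 11008" is a LLaMA-style gated (SwiGLU) block, since 11008 is the intermediate size that family uses at a model dimension of 4096. A minimal PyTorch sketch under that assumption:

```python
import torch
import torch.nn as nn

class GatedMLP(nn.Module):
    """Gated (SwiGLU-style) MLP block. The hidden_dim of 11008 matches the
    card's stated intermediate dimension; the gated variant itself is an
    assumption, not confirmed by the card."""
    def __init__(self, dim: int = 4096, hidden_dim: int = 11008):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU-gated expansion, then projection back to the model dimension
        return self.down_proj(
            torch.nn.functional.silu(self.gate_proj(x)) * self.up_proj(x)
        )
```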

Implementation Details

The model architecture consists of 4 transformer layers with 32 attention heads and a model dimension of 4096. It uses a SentencePiece tokenizer with a vocabulary size of 32000 and supports sequence lengths of up to 4096 tokens. Training was conducted on 4 A100 80GB GPUs for approximately 250 hours using the AdamW optimizer with mixed-precision training.
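
These hyperparameters can be sanity-checked against the stated 1.1B parameter count. A back-of-the-envelope estimate, assuming untied input/output embeddings and a three-matrix gated MLP (both assumptions, not confirmed by the card):

```python
# Rough parameter count from the hyperparameters above.
# Norm and bias terms are negligible and omitted.
d_model, n_layers, d_mlp, vocab = 4096, 4, 11008, 32000

attn = 4 * d_model * d_model          # Q, K, V, and output projections
mlp = 3 * d_model * d_mlp             # gate, up, and down projections
embeddings = 2 * vocab * d_model      # input embedding + LM head (untied)

total = n_layers * (attn + mlp) + embeddings
print(f"{total / 1e9:.2f}B parameters")  # ~1.07B, consistent with the 1.1B figure
```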

  • Custom architecture with flash attention
  • Enhanced MLP layer intermediate dimensions
  • Efficient training implementation with DeepSpeed support
  • SentencePiece tokenizer for open-vocabulary tasks
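
A minimal loading sketch using the Hugging Face transformers library, assuming the weights are published on the Hub under budecosystem/boomer-1b (verify the actual repo id before use):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id is an assumption based on the maintainer and model name.
repo_id = "budecosystem/boomer-1b"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    trust_remote_code=True,  # custom architectures often ship their own modeling code
)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```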

Core Capabilities

  • Text generation and language modeling
  • Edge device deployment capabilities
  • Retrieval augmentation support
  • Benchmark performance: MMLU (25.92%), ARC (22.35%), HumanEval (6.1%)

Frequently Asked Questions

Q: What makes this model unique?

The model combines efficient architecture choices such as flash attention with a custom-curated dataset, making it particularly suitable for edge deployment and retrieval-augmented generation. At 1.1B parameters, it remains practical to deploy in resource-constrained environments.
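
To illustrate the retrieval-augmented use case, a minimal prompt-construction sketch follows; the retrieval step itself is a placeholder, and any vector store or keyword search could fill that role:

```python
def build_rag_prompt(question: str, passages: list[str]) -> str:
    """Assemble retrieved passages and a question into a single prompt."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical retrieved passages; in practice these come from a retriever.
passages = ["Boomer-1b is a 1.1B-parameter language model trained on 41B tokens."]
prompt = build_rag_prompt("How many parameters does boomer-1b have?", passages)
# `prompt` can then be tokenized and passed to model.generate() as shown earlier.
```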

Q: What are the recommended use cases?

Boomer-1b is particularly well-suited for retrieval augmentation, edge device deployment, and general language modeling tasks. Its architecture and training make it efficient for scenarios requiring a balance between performance and resource utilization.
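
For edge or otherwise memory-constrained deployment, one common option is 4-bit loading via the bitsandbytes integration in transformers. A sketch under the same repo-id assumption as above (requires a CUDA GPU and the bitsandbytes package):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # quantize weights to 4-bit on load
    bnb_4bit_compute_dtype=torch.float16,    # run matmuls in fp16
)

model = AutoModelForCausalLM.from_pretrained(
    "budecosystem/boomer-1b",  # assumed repo id; verify before use
    quantization_config=quant_config,
    trust_remote_code=True,
)
```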
