AMD-Llama-135m
| Property | Value |
|---|---|
| Parameter Count | 135M |
| License | Apache 2.0 |
| Architecture | LLaMA-based |
| Training Data | SlimPajama + Project Gutenberg (670B tokens) |
| Research Paper | GPT-NeoX Paper |
What is AMD-Llama-135m?
AMD-Llama-135m is a lightweight language model trained on AMD Instinct MI250 accelerators and designed to be compatible with the LLaMA2 architecture. Despite its small scale, it serves two roles: a standalone text generator, and a draft model for speculative decoding with larger LLaMA2 and CodeLlama models.
Implementation Details
The model has 12 transformer layers, a hidden dimension of 768, and 12 attention heads. It uses RMSNorm layer normalization, rotary positional embeddings (RoPE), and the SwiGLU activation function. The context window is 2048 tokens and the vocabulary size is 32,000 (see the configuration sketch after the list below).
- Trained on SlimPajama and Project Gutenberg datasets (670B tokens)
- Implements multi-head attention with 64-dimensional heads
- Optimized using AdamW with cosine learning rate scheduling
- Supports speculative decoding for performance acceleration
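As an illustration, the hyperparameters above map onto a Hugging Face transformers `LlamaConfig` roughly as follows. This is a sketch based only on the values stated on this page, not the repository's actual config file; unstated fields are left at their library defaults and may differ from the official checkpoint.

```python
# Sketch of a LlamaConfig matching the hyperparameters described above.
# Only values stated on this page are set; everything else keeps the
# transformers defaults, so this may differ from the official config.
from transformers import LlamaConfig

config = LlamaConfig(
    vocab_size=32000,               # vocabulary size of 32,000
    hidden_size=768,                # hidden dimension
    num_hidden_layers=12,           # 12 transformer layers
    num_attention_heads=12,         # 12 heads x 64 dims per head = 768
    max_position_embeddings=2048,   # 2048-token context window
    hidden_act="silu",              # SwiGLU gating uses the SiLU activation
)
```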
Core Capabilities
- General text generation and completion (see the usage sketch after this list)
- Code completion when fine-tuned (AMD-Llama-135m-code variant)
- Speculative decoding acceleration for larger models
- Competitive performance on various NLP benchmarks
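For plain text generation, a minimal sketch with the transformers library might look like the following. The Hub repo id `amd/AMD-Llama-135m` and the prompt are assumptions for illustration.

```python
# Minimal text-generation sketch; the repo id below is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "amd/AMD-Llama-135m"  # assumed Hugging Face Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The largest city in Europe is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```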
Frequently Asked Questions
Q: What makes this model unique?
Its ability to act as an efficient draft model for speculative decoding while remaining competitive for its size is what sets it apart. When used as a draft model, it delivers up to a 3.88x throughput speedup.
Q: What are the recommended use cases?
The model is particularly well-suited for deployment scenarios requiring efficient text generation, code completion tasks (when using the code-finetuned variant), and as a draft model for speculative decoding with larger LLaMA2 or CodeLlama models.
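For the draft-model use case, the sketch below shows how assisted (speculative) generation could be wired up with transformers, using the code-finetuned variant to draft tokens for a larger CodeLlama target. The repo ids, prompt, and generation settings are illustrative assumptions; the draft and target must share the LLaMA2 tokenizer, which is the point of the architecture compatibility described above.

```python
# Sketch of assisted (speculative) generation: the small model drafts tokens,
# the larger target model verifies them. Repo ids and prompt are assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "codellama/CodeLlama-7b-hf"   # assumed larger target model
draft_id = "amd/AMD-Llama-135m-code"      # assumed code-finetuned draft model

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(target_id)
draft = AutoModelForCausalLM.from_pretrained(draft_id)

inputs = tokenizer("def quicksort(arr):", return_tensors="pt")
# assistant_model switches generate() into assisted decoding; it requires the
# draft and target to share a tokenizer/vocabulary, as LLaMA2-compatible
# models do.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```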