# llama-160m
| Property | Value |
|---|---|
| Parameter Count | 162M |
| License | Apache 2.0 |
| Tensor Type | F32 |
| Paper | SpecInfer |
| Training Data | Wikipedia, C4-en, C4-realnewslike |
## What is llama-160m?
llama-160m is a compact language model designed as a base Small Speculative Model (SSM) for the SpecInfer research project. At just 162M parameters, it is a lightweight alternative to full-size LLaMA models, trained on Wikipedia and portions of the C4 dataset (C4-en and C4-realnewslike).
## Implementation Details
This model implements a LLaMA-like architecture in PyTorch, with weights distributed in the Safetensors format, and is optimized for text-generation tasks. It is particularly noteworthy for its role in speculative-inference acceleration research, as detailed in the SpecInfer paper. A minimal loading sketch follows the list below.
- Stores and computes weights in F32 (full precision)
- Compatible with the text-generation-inference serving pipeline
- Trained on text from the Wikipedia and C4 datasets
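The snippet below is a minimal loading-and-generation sketch using the Hugging Face `transformers` library. The model ID `JackFram/llama-160m` and the prompt are assumptions for illustration; substitute the checkpoint name you actually use.

```python
# Minimal generation sketch; "JackFram/llama-160m" is an assumed
# checkpoint ID, so adjust it if you use a different mirror.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "JackFram/llama-160m"  # assumed model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load in float32 to match the F32 tensor type listed above.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```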
## Core Capabilities
- Text generation and completion tasks
- Serves as a research tool for speculative inference
- Efficient deployment through inference endpoints
- English-language text processing
## Frequently Asked Questions
### Q: What makes this model unique?
This model's primary distinction is its role as a Small Speculative Model in the SpecInfer framework, offering a balance between computational efficiency and performance for research purposes.
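As a concrete illustration of the draft-and-verify idea, the hedged sketch below uses the `transformers` assisted-generation API, in which a small draft model proposes tokens and a larger target model verifies them in a single pass. This mirrors, but is not identical to, the SpecInfer system; the target checkpoint name is a placeholder, and draft and target must share a tokenizer vocabulary.

```python
# Sketch of draft-model (speculative) decoding via assisted generation.
# The target model ID is hypothetical; use any compatible larger
# LLaMA-family checkpoint you have access to.
from transformers import AutoModelForCausalLM, AutoTokenizer

draft = AutoModelForCausalLM.from_pretrained("JackFram/llama-160m")
target_id = "meta-llama/Llama-2-7b-hf"  # placeholder target model
target = AutoModelForCausalLM.from_pretrained(target_id)
tokenizer = AutoTokenizer.from_pretrained(target_id)

inputs = tokenizer("Speculative decoding works by", return_tensors="pt")
# assistant_model enables assisted generation: llama-160m drafts tokens,
# and the target model accepts or rejects them during verification.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```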
### Q: What are the recommended use cases?
While formal evaluation results have not been published, the model is primarily designed for research on speculative-inference acceleration, and it can also handle basic text-generation tasks where a lightweight model is preferred.