# llama-160m
| Property | Value |
|---|---|
| Parameter Count | 162M |
| License | Apache 2.0 |
| Tensor Type | F32 |
| Paper | SpecInfer |
| Training Data | Wikipedia, C4-en, C4-realnewslike |
## What is llama-160m?
llama-160m is a compact language model designed as a base Small Speculative Model (SSM) for the SpecInfer research project. At just 162M parameters, it is a lightweight alternative to full-size LLaMA models, trained on Wikipedia and portions of the C4 dataset (C4-en and C4-realnewslike).
## Implementation Details
This model implements a LLaMA-like architecture in PyTorch, with weights distributed in the Safetensors format, and is optimized for text-generation tasks. It is particularly noteworthy for its role in speculative-inference acceleration research, as detailed in the SpecInfer paper. A minimal loading sketch follows the list below.
- Stores and computes weights in F32 (full precision)
- Compatible with the text-generation-inference serving pipeline
- Trained on text from the Wikipedia and C4 datasets
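The snippet below is a minimal loading-and-generation sketch using the Hugging Face `transformers` library. The model ID `JackFram/llama-160m` and the prompt are assumptions for illustration; substitute the checkpoint name you actually use.

```python
# Minimal generation sketch; "JackFram/llama-160m" is an assumed
# checkpoint ID, so adjust it if you use a different mirror.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "JackFram/llama-160m"  # assumed model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load in float32 to match the F32 tensor type listed above.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```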
## Core Capabilities
- Text generation and completion tasks
- Serves as a research tool for speculative inference
- Efficient deployment through inference endpoints
- English-language text processing
## Frequently Asked Questions
### Q: What makes this model unique?
This model's primary distinction is its role as a Small Speculative Model in the SpecInfer framework, offering a balance between computational efficiency and performance for research purposes.
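As a concrete illustration of the draft-and-verify idea, the hedged sketch below uses the `transformers` assisted-generation API, in which a small draft model proposes tokens and a larger target model verifies them in a single pass. This mirrors, but is not identical to, the SpecInfer system; the target checkpoint name is a placeholder, and draft and target must share a tokenizer vocabulary.

```python
# Sketch of draft-model (speculative) decoding via assisted generation.
# The target model ID is hypothetical; use any compatible larger
# LLaMA-family checkpoint you have access to.
from transformers import AutoModelForCausalLM, AutoTokenizer

draft = AutoModelForCausalLM.from_pretrained("JackFram/llama-160m")
target_id = "meta-llama/Llama-2-7b-hf"  # placeholder target model
target = AutoModelForCausalLM.from_pretrained(target_id)
tokenizer = AutoTokenizer.from_pretrained(target_id)

inputs = tokenizer("Speculative decoding works by", return_tensors="pt")
# assistant_model enables assisted generation: llama-160m drafts tokens,
# and the target model accepts or rejects them during verification.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```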
### Q: What are the recommended use cases?
While formal evaluation results have not been published, the model is primarily designed for research on speculative-inference acceleration, and it can also handle basic text-generation tasks where a lightweight model is preferred.