llama-160m

Maintained By
JackFram

Parameter Count: 162M parameters
License: Apache 2.0
Tensor Type: F32
Paper: SpecInfer Paper
Training Data: Wikipedia, C4-en, C4-realnewslike

What is llama-160m?

llama-160m is a compact language model designed as a base Small Speculative Model for the SpecInfer research project. With just 162M parameters, it's a lightweight alternative to larger LLaMA models, trained specifically on Wikipedia and portions of the C4 dataset.
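For quick experimentation, the model can be loaded with the Hugging Face transformers library. The sketch below assumes the repo ID JackFram/llama-160m and an arbitrary prompt, neither of which is specified by this card:

```python
# Minimal sketch: basic text generation with llama-160m via transformers.
# The repo ID "JackFram/llama-160m" is an assumption for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("JackFram/llama-160m")
model = AutoModelForCausalLM.from_pretrained("JackFram/llama-160m")

# Encode a prompt and generate a short continuation.
inputs = tokenizer("Speculative decoding is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```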

Implementation Details

This model implements a LLaMA-like architecture using PyTorch and Safetensors, optimized for text generation tasks. It's particularly noteworthy for its role in speculative inference acceleration research, as detailed in the SpecInfer paper.

  • Stores weights as full-precision F32 tensors
  • Compatible with the text-generation-inference serving pipeline
  • Trained on high-quality text data from Wikipedia and C4 datasets

Core Capabilities

  • Text generation and completion tasks
  • Serves as a draft model for speculative inference research (see the sketch after this list)
  • Efficient deployment through inference endpoints
  • English language text processing
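As a concrete illustration of the speculative-inference role, the sketch below pairs llama-160m with a larger LLaMA target using transformers' assisted generation. This is a related draft-and-verify scheme, not SpecInfer's own tree-based serving system, and the target model ID is an assumption:

```python
# Hedged sketch: llama-160m as a draft (assistant) model for a larger target
# via transformers' assisted generation. Both models must share a tokenizer;
# the target repo ID "huggyllama/llama-7b" is an assumption for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("huggyllama/llama-7b")
target = AutoModelForCausalLM.from_pretrained("huggyllama/llama-7b")
draft = AutoModelForCausalLM.from_pretrained("JackFram/llama-160m")

inputs = tokenizer("Speculative decoding speeds up inference by", return_tensors="pt")
# The draft model proposes several tokens per step; the target model verifies
# them in a single forward pass, accepting the longest matching prefix.
outputs = target.generate(**inputs, assistant_model=draft, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The realized speedup depends on how often the draft's proposals match what the target would have generated, which is why a small model trained on similar data, like llama-160m, is a natural fit.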

Frequently Asked Questions

Q: What makes this model unique?

This model's primary distinction is its role as a Small Speculative Model in the SpecInfer framework: it is small enough to propose candidate tokens cheaply, which a larger LLaMA model then verifies, trading some standalone quality for faster overall decoding in research settings.

Q: What are the recommended use cases?

While formal evaluation is pending, the model is primarily designed for research in speculative inference acceleration and can be used for basic text generation tasks where a lightweight model is preferred.
