# llama-3-70B-Instruct-abliterated
| Property | Value |
|---|---|
| Parameter Count | 70.6B |
| Model Type | Instruction-tuned LLM |
| License | llama3 |
| Format | BF16 Safetensors |
## What is llama-3-70B-Instruct-abliterated?
This is a variant of Meta's Llama-3-70B-Instruct in which an orthogonalization ("abliteration") technique has been applied to weaken the model's refusal behavior. Building on research into refusal mechanisms in LLMs, the weights have been edited to reduce the model's tendency to refuse requests while preserving its core capabilities.
## Implementation Details
The model ships as orthogonalized bfloat16 safetensor weights, following the methodology of the paper "Refusal in LLMs is mediated by a single direction." In that approach, a single "refusal direction" is identified in the residual stream and projected out of the weight matrices that write to it, inhibiting refusal responses while leaving other model behaviors intact; a minimal sketch of this projection follows the list below.
- 70.6B parameter architecture
- BF16 tensor format for efficient computation
- Includes refusal_dir.pth for custom implementation
- Available in GGUF quantized versions
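The projection referenced above can be illustrated in a few lines of PyTorch. The sketch below is a hypothetical reconstruction, not the script used to produce these weights: it assumes `refusal_dir.pth` stores a single hidden_size-sized vector (an undocumented format), and it ablates only the attention output and MLP down-projections, the matrices that write into the residual stream in the Hugging Face Llama implementation.

```python
import torch
from transformers import AutoModelForCausalLM

# Illustrative only: loading the full 70B model in bf16 needs ~140 GB of memory.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B-Instruct", torch_dtype=torch.bfloat16
)

# Assumption: refusal_dir.pth holds one hidden_size-sized vector.
# Normalize it so the projection below is well defined.
refusal_dir = torch.load("refusal_dir.pth").to(torch.float32)
refusal_dir = refusal_dir / refusal_dir.norm()

def orthogonalize(weight: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """Return W' = W - d (d^T W): zeroes the component of the layer's
    output along the unit vector d, so the layer can no longer write
    along the refusal direction."""
    w = weight.to(torch.float32)
    return (w - torch.outer(d, d) @ w).to(weight.dtype)

# Ablate the direction from every matrix that writes into the residual
# stream: attention output projections and MLP down-projections.
for layer in model.model.layers:
    layer.self_attn.o_proj.weight.data = orthogonalize(
        layer.self_attn.o_proj.weight.data, refusal_dir
    )
    layer.mlp.down_proj.weight.data = orthogonalize(
        layer.mlp.down_proj.weight.data, refusal_dir
    )
```

Because the edit is applied directly to the weights, no runtime intervention or system-prompt trickery is needed; the modified model behaves as a drop-in replacement for the original.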
## Core Capabilities
- Reduced refusal responses compared to base model
- Maintains original instruction-following capabilities
- Supports text generation and conversational tasks
- Compatible with text-generation-inference systems (see the usage sketch below)
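Because the weights remain standard Llama-3 safetensors, the model loads like any other transformers causal LM. A minimal usage sketch follows; the repo id is assumed for illustration, so substitute the actual Hub path:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumed repo id; replace with the actual Hugging Face Hub path.
model_id = "failspy/llama-3-70B-Instruct-abliterated"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain orthogonalization in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```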
## Frequently Asked Questions

**Q: What makes this model unique?**
The model's unique feature is its modified architecture that reduces refusal behaviors through orthogonalization, while maintaining the base capabilities of Llama-3-70B-Instruct.
**Q: What are the recommended use cases?**
This model is suited to applications where standard LLM refusal behavior is overly restrictive. Note that the modification reduces rather than eliminates refusals: the model may still express ethical concerns or decline requests in certain scenarios.