# llama-3-70B-Instruct-abliterated
| Property | Value |
|---|---|
| Parameter Count | 70.6B |
| Model Type | Instruction-tuned LLM |
| License | llama3 |
| Format | BF16 Safetensors |
## What is llama-3-70B-Instruct-abliterated?
This is a variant of Meta's Llama-3-70B-Instruct in which an orthogonalization ("abliteration") technique has been applied to weaken the model's refusal behavior. Building on research into refusal mechanisms in LLMs, the weights have been edited to reduce the model's tendency to refuse requests while preserving its core capabilities.
## Implementation Details
The model ships as orthogonalized bfloat16 safetensor weights, following the methodology of the paper "Refusal in LLMs is mediated by a single direction." In that approach, a single "refusal direction" is identified in the residual stream and projected out of the weight matrices that write to it, inhibiting refusal responses while leaving other model behaviors intact; a minimal sketch of this projection follows the list below.
- 70.6B parameter architecture
- BF16 tensor format for efficient computation
- Includes refusal_dir.pth for custom implementation
- Available in GGUF quantized versions
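The projection referenced above can be illustrated in a few lines of PyTorch. The sketch below is a hypothetical reconstruction, not the script used to produce these weights: it assumes `refusal_dir.pth` stores a single hidden_size-sized vector (an undocumented format), and it ablates only the attention output and MLP down-projections, the matrices that write into the residual stream in the Hugging Face Llama implementation.

```python
import torch
from transformers import AutoModelForCausalLM

# Illustrative only: loading the full 70B model in bf16 needs ~140 GB of memory.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-70B-Instruct", torch_dtype=torch.bfloat16
)

# Assumption: refusal_dir.pth holds one hidden_size-sized vector.
# Normalize it so the projection below is well defined.
refusal_dir = torch.load("refusal_dir.pth").to(torch.float32)
refusal_dir = refusal_dir / refusal_dir.norm()

def orthogonalize(weight: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
    """Return W' = W - d (d^T W): zeroes the component of the layer's
    output along the unit vector d, so the layer can no longer write
    along the refusal direction."""
    w = weight.to(torch.float32)
    return (w - torch.outer(d, d) @ w).to(weight.dtype)

# Ablate the direction from every matrix that writes into the residual
# stream: attention output projections and MLP down-projections.
for layer in model.model.layers:
    layer.self_attn.o_proj.weight.data = orthogonalize(
        layer.self_attn.o_proj.weight.data, refusal_dir
    )
    layer.mlp.down_proj.weight.data = orthogonalize(
        layer.mlp.down_proj.weight.data, refusal_dir
    )
```

Because the edit is applied directly to the weights, no runtime intervention or system-prompt trickery is needed; the modified model behaves as a drop-in replacement for the original.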
## Core Capabilities
- Reduced refusal responses compared to base model
- Maintains original instruction-following capabilities
- Supports text generation and conversational tasks
- Compatible with text-generation-inference systems (see the usage sketch below)
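Because the weights remain standard Llama-3 safetensors, the model loads like any other transformers causal LM. A minimal usage sketch follows; the repo id is assumed for illustration, so substitute the actual Hub path:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Assumed repo id; replace with the actual Hugging Face Hub path.
model_id = "failspy/llama-3-70B-Instruct-abliterated"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Explain orthogonalization in one paragraph."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```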
## Frequently Asked Questions

**Q: What makes this model unique?**
The model's unique feature is its modified architecture that reduces refusal behaviors through orthogonalization, while maintaining the base capabilities of Llama-3-70B-Instruct.
**Q: What are the recommended use cases?**
This model is suited to applications where standard LLM refusal behavior is overly restrictive. Note that the modification reduces rather than eliminates refusals: the model may still express ethical concerns or decline requests in certain scenarios.