NeuralHermes-2.5-Mistral-7B

Maintained by mlabonne

  • Parameter Count: 7.24B
  • License: Apache 2.0
  • Base Model: teknium/OpenHermes-2.5-Mistral-7B
  • Training Method: Direct Preference Optimization (DPO)

What is NeuralHermes-2.5-Mistral-7B?

NeuralHermes-2.5-Mistral-7B is a language model fine-tuned from teknium/OpenHermes-2.5-Mistral-7B with Direct Preference Optimization (DPO) on a curated preference dataset. The model performs strongly across multiple benchmarks, including ARC-Challenge (66.55% accuracy), HellaSwag (84.9% accuracy), and MMLU (63.32% accuracy).
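
As background on the method: DPO trains directly on preference pairs, with no separate reward model. The standard objective below is from the original DPO formulation, not anything specific to this model's particular run:

$$\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}\left[\log\sigma\!\left(\beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)} - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right]$$

where $y_w$ and $y_l$ are the preferred and rejected responses for prompt $x$, $\pi_{\mathrm{ref}}$ is the frozen reference model (here, the OpenHermes-2.5 base), and $\beta$ controls how far the fine-tuned policy may drift from that reference.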

Implementation Details

The model was fine-tuned with LoRA adapters (r=16, lora_alpha=16) targeting the base model's projection layers. Training ran for approximately one hour on a single A100 GPU using the mlabonne/chatml_dpo_pairs dataset.
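
A condensed sketch of this recipe using the Hugging Face peft and trl libraries (the DPOTrainer constructor shown matches the trl 0.7.x API contemporary with this model). The r, lora_alpha, optimizer, scheduler, and dataset come from the card above; the target module list, learning rate, batch size, beta, and step count are illustrative assumptions:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "teknium/OpenHermes-2.5-Mistral-7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Preference pairs, assumed to expose the prompt/chosen/rejected
# columns that DPOTrainer expects, already in ChatML format
dataset = load_dataset("mlabonne/chatml_dpo_pairs")["train"]

# LoRA config from the card: r=16, lora_alpha=16 on the projection
# layers (this exact target_modules list is an assumption)
peft_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,               # assumption
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

training_args = TrainingArguments(
    output_dir="neuralhermes-dpo",
    per_device_train_batch_size=4,   # assumption
    gradient_accumulation_steps=4,   # assumption
    learning_rate=5e-5,              # assumption
    lr_scheduler_type="cosine",      # from the card
    optim="paged_adamw_32bit",       # from the card
    max_steps=200,                   # assumption
    logging_steps=1,
)

dpo_trainer = DPOTrainer(
    model,
    ref_model=None,                  # trl derives the frozen reference from the PEFT base
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
    beta=0.1,                        # assumption (common DPO default)
    max_prompt_length=1024,
    max_length=1536,
)
dpo_trainer.train()
```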

  • Implements the ChatML template for consistent dialogue formatting (see the example after this list)
  • Available in multiple quantized versions (GGUF, AWQ, GPTQ, EXL2)
  • Optimized with the paged_adamw_32bit optimizer and a cosine learning-rate schedule
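
For reference, a ChatML-formatted prompt looks like this (the message contents are placeholders):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
What is a large language model?<|im_end|>
<|im_start|>assistant
```

The trailing <|im_start|>assistant line cues the model to generate its reply.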

Core Capabilities

  • Strong performance on reasoning tasks (ARC-Challenge: 66.55%)
  • Excellent common-sense understanding (HellaSwag: 84.9%)
  • High accuracy in mathematical reasoning (GSM8K: 61.33%)
  • Improved truthfulness in responses (TruthfulQA: 54.93%)

Frequently Asked Questions

Q: What makes this model unique?

The model stands out for its improved performance over the original OpenHermes model across multiple benchmarks, achieved through careful DPO fine-tuning and optimization of the training process.

Q: What are the recommended use cases?

The model is well-suited for general text generation, instruction following, mathematical reasoning, and truthful Q&A applications. It integrates easily into applications through popular frameworks such as transformers or LM Studio; a sketch with transformers follows.
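
A minimal inference sketch with the transformers library, assuming the model's tokenizer ships the ChatML chat template noted above; the prompt and generation parameters are illustrative:

```python
import torch
from transformers import AutoTokenizer, pipeline

model_id = "mlabonne/NeuralHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build a ChatML prompt via the tokenizer's chat template
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain Direct Preference Optimization in two sentences."},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

generator = pipeline(
    "text-generation",
    model=model_id,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,  # assumes a GPU with bf16 support
    device_map="auto",
)

outputs = generator(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
print(outputs[0]["generated_text"])
```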
