Karasu-Mixtral-8x22B-v0.1

Maintained by: lightblue

  • Parameter Count: 141B
  • Model Type: Text Generation
  • Architecture: Fine-tuned Mixtral-8x22B
  • License: Apache 2.0
  • Training Data: GPT-4 conversations

What is Karasu-Mixtral-8x22B-v0.1?

Karasu-Mixtral-8x22B-v0.1 is a fine-tuned version of the Mixtral-8x22B base model, optimized for multilingual conversation. It was trained on over 9,000 high-quality conversations spanning English and non-English interactions, making it particularly effective for applications that must work across multiple languages.

Implementation Details

The model was trained with Axolotl using a 4-bit QLoRA configuration for approximately 100 minutes on 4x A100 (80GB) GPUs, with DeepSpeed ZeRO-2 handling efficient multi-GPU training. At inference time it reaches roughly 40 tokens/s at batch size 1.

  • Training dataset includes 6,206 conversations from openchat/openchat_sharegpt4_dataset
  • A further 3,011 conversations add stronger non-English representation
  • Weights are stored in BF16 for efficient inference
  • Deployed with vLLM using tensor parallelism (see the sketch below)
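
The following is a minimal vLLM deployment sketch based on the details above. The Hugging Face model ID, the sampling values, and the choice of 4-way tensor parallelism are illustrative assumptions, not values taken from this card.

```python
# Minimal vLLM deployment sketch for a large MoE checkpoint.
# Assumptions: 4 GPUs with enough memory for the 141B-parameter model,
# and "lightblue/Karasu-Mixtral-8x22B-v0.1" as the model ID.
from vllm import LLM, SamplingParams

llm = LLM(
    model="lightblue/Karasu-Mixtral-8x22B-v0.1",  # assumed model ID
    tensor_parallel_size=4,                        # shard weights across 4 GPUs
    dtype="bfloat16",                              # matches the BF16 weights
)

sampling = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(
    ["Explain mixture-of-experts models in one paragraph."], sampling
)
print(outputs[0].outputs[0].text)
```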

Core Capabilities

  • Multilingual conversation handling
  • High accuracy in factual responses
  • Creative text generation
  • Logical reasoning capabilities
  • Context-aware responses in multiple languages (illustrated in the usage sketch below)
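
To sketch how a multilingual chat turn could be sent to the model, the example below formats a Japanese question with the tokenizer's chat template and generates with vLLM. It assumes the fine-tuned tokenizer ships a chat template; the model ID and the question itself are illustrative.

```python
# Multilingual chat sketch: build a chat-formatted prompt, then generate.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "lightblue/Karasu-Mixtral-8x22B-v0.1"  # assumed model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)

# "What is the tallest mountain in Japan?"
messages = [{"role": "user", "content": "日本で一番高い山は何ですか？"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

llm = LLM(model=model_id, tensor_parallel_size=4, dtype="bfloat16")
outputs = llm.generate([prompt], SamplingParams(temperature=0.7, max_tokens=128))
print(outputs[0].outputs[0].text)
```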

Frequently Asked Questions

Q: What makes this model unique?

The model's distinctive feature is its enhanced multilingual capabilities combined with high accuracy in both factual and creative tasks, achieved through careful fine-tuning on diverse GPT-4 conversations.

Q: What are the recommended use cases?

The model excels in multilingual conversations, creative writing tasks, factual Q&A, and logical reasoning scenarios. It's particularly suitable for applications requiring robust multilingual support and high-quality conversational abilities.
