Aria

Maintained By: rhymes-ai

Property: Value
Parameter Count: 25.3B (3.9B activated)
Model Type: Multimodal Native MoE
License: Apache-2.0
Paper: View Paper
Context Length: 64K tokens

What is Aria?

Aria is an open multimodal native Mixture-of-Experts (MoE) model, described as the first of its kind. It has 25.3B parameters in total but activates only 3.9B of them during inference, which is how it combines strong results across a range of tasks with comparatively low compute cost.
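To make the idea of "activated" parameters concrete, here is a minimal sketch of top-k expert routing, the general mechanism behind MoE layers. It is a generic illustration and not Aria's actual router: the expert functions, the router scores, and the names `moe_forward` and `make_expert` are all hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale):
    # Hypothetical "expert": a tiny function standing in for a feed-forward block.
    return lambda x: [scale * v for v in x]


def moe_forward(x, router_logits, experts, top_k=2):
    """Route one token through only the top_k highest-scoring experts."""
    # Indices of the top_k router scores, highest first.
    chosen = sorted(
        range(len(experts)), key=lambda i: router_logits[i], reverse=True
    )[:top_k]
    # Renormalise the router weights over the chosen experts only.
    weights = softmax([router_logits[i] for i in chosen])
    out = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = experts[i](x)  # only the chosen experts are evaluated
        out = [o + w * v for o, v in zip(out, y)]
    return out, chosen


# Four toy experts; only two of them run for this token.
experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
token = [1.0, -1.0]
logits = [0.1, 2.0, 0.3, 1.0]  # made-up router scores for illustration
output, used = moe_forward(token, logits, experts, top_k=2)
print("experts used:", used)
print("output:", output)
```

All four experts exist in memory, but each token only pays the compute cost of the two that the router selects. That is the sense in which an MoE model can have far more total parameters than activated ones.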

Implementation Details

The model uses a mixture-of-experts architecture and runs in BF16 precision, processing both visual and textual inputs. Its visual encoder handles variable input sizes and aspect ratios, so images and documents do not need to be forced into a fixed resolution.

  • Supports a context window of up to 64K tokens
  • Processes a 256-frame video in 10 seconds
  • Activates 3.9B parameters during inference (3.5B MoE + 0.4B visual encoder)
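The figures above can be checked with some back-of-the-envelope arithmetic. The sketch below uses only numbers quoted in this document, plus two labeled assumptions: that "64K" means 65,536 tokens, and that BF16 weights take 2 bytes each. The per-frame budget is a simple upper bound on how the context could be divided, not a description of how Aria actually tokenizes video.

```python
# Figures quoted in this document.
TOTAL_PARAMS = 25_300_000_000          # 25.3B total parameters
ACTIVE_MOE_PARAMS = 3_500_000_000      # 3.5B activated in the MoE
VISUAL_ENCODER_PARAMS = 400_000_000    # 0.4B in the visual encoder
VIDEO_FRAMES = 256

# Assumptions, not stated in the source.
CONTEXT_TOKENS = 64 * 1024             # assumes "64K" means 65,536 tokens
BYTES_PER_BF16 = 2                     # bfloat16 is a 16-bit format

# Activated parameters and their share of the total.
active_params = ACTIVE_MOE_PARAMS + VISUAL_ENCODER_PARAMS
active_fraction = active_params / TOTAL_PARAMS

# Memory for the weights alone (excludes activations and KV cache).
weights_gb = TOTAL_PARAMS * BYTES_PER_BF16 / 1e9

# Upper bound on tokens per frame if the whole context held only video frames.
tokens_per_frame = CONTEXT_TOKENS // VIDEO_FRAMES

print(f"activated parameters: {active_params / 1e9:.1f}B")
print(f"share of total:       {active_fraction:.1%}")
print(f"BF16 weights:         {weights_gb:.1f} GB")
print(f"max tokens per frame: {tokens_per_frame}")
```

Under these assumptions, roughly 15% of the weights take part in any one forward step, yet all 25.3B parameters (about 50.6 GB in BF16) still have to be held in memory. MoE models reduce compute per token, not the memory needed for the weights.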

Core Capabilities

  • Strong performance in video and document understanding
  • Advanced multimodal reasoning and knowledge integration
  • Competitive performance in math and visual QA tasks
  • Efficient processing of various input modalities
  • Strong coding capabilities with 73.2% on HumanEval

Frequently Asked Questions

Q: What makes this model unique?

Aria's MoE architecture activates only 3.9B of its 25.3B parameters during inference, which keeps compute cost low relative to the model's total size. It performs on par with GPT-4o mini and Gemini 1.5 Flash across various tasks.

Q: What are the recommended use cases?

Aria is well-suited to applications that require multimodal understanding: document processing (92.6% on DocQA), video analysis, visual question answering, and complex visual reasoning. It also handles general language tasks.
