Aria

Property	Value
Parameter Count	25.3B (3.9B activated)
Model Type	Multimodal Native MoE
License	Apache-2.0
Paper	View Paper
Context Length	64K tokens

What is Aria?

Aria is presented as the first open, multimodal-native Mixture-of-Experts (MoE) model. It has 25.3B parameters in total but activates only 3.9B of them per token during inference. This keeps compute costs low while still delivering strong results across multimodal and language tasks.
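To make the idea of "activated parameters" concrete, here is a minimal, generic sketch of top-k expert routing in plain Python. It is not Aria's implementation: the dot-product router, the number of experts, the `top_k` value and the toy experts are all illustrative assumptions. It shows why only a fraction of an MoE model's weights is used for any one token: experts that are not selected are never evaluated.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_weights, experts, top_k):
    """Route one token vector through its top_k highest-scoring experts.

    Illustrative only (not Aria's router):
    x: token vector (list of floats)
    router_weights: one weight vector per expert; gating logit = dot product with x
    experts: callables mapping a vector to a vector of the same length
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router_weights]
    # Keep only the top_k experts; the others are skipped entirely,
    # so their parameters are not "activated" for this token.
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Renormalize the gate weights over the chosen experts only.
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = experts[i](x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


# Toy usage: four hypothetical experts that just scale the input.
experts = [lambda v, s=s: [s * vi for vi in v] for s in (1.0, 2.0, 3.0, 4.0)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
out, chosen = moe_forward([2.0, 1.0], router_weights, experts, top_k=2)
print(chosen)  # [0, 3]
```

In this toy run only two of the four experts do any work; a real MoE layer applies the same principle with far larger experts, which is how the active parameter count stays well below the total.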

Implementation Details

The model uses a mixture-of-experts architecture with BF16 weights and processes both visual and textual inputs. Its visual encoder accepts images of variable sizes and aspect ratios, which makes the model versatile for real-world inputs.

Supports up to 64K token context window
Processes 256-frame videos in just 10 seconds
Implements efficient parameter activation (3.5B MoE + 0.4B Visual Encoder)
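The activated-parameter figure above is simple addition, and the video claim implies an average throughput. The sketch below works through both numbers. It assumes that the 25.3B total already includes the visual encoder. The source does not state the hardware behind the 10-second figure, so the frames-per-second value is only an implied average.

```python
TOTAL_PARAMS_B = 25.3    # total parameters, in billions (from the table above)
MOE_ACTIVE_B = 3.5       # activated MoE parameters per token, in billions
VISUAL_ENCODER_B = 0.4   # visual encoder parameters, in billions

# Activated parameters = activated MoE share + visual encoder.
activated = MOE_ACTIVE_B + VISUAL_ENCODER_B
# Assumes the 25.3B total includes the visual encoder.
fraction = activated / TOTAL_PARAMS_B
print(f"activated: {activated:.1f}B ({fraction:.1%} of {TOTAL_PARAMS_B}B)")

# Average throughput implied by "256-frame videos in 10 seconds".
frames, seconds = 256, 10
print(f"throughput: {frames / seconds:.1f} frames/s")
```

This prints an activated share of about 15.4% and an implied average of 25.6 frames per second.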

Core Capabilities

Strong performance in video and document understanding
Advanced multimodal reasoning and knowledge integration
Competitive performance in math and visual QA tasks
Efficient processing of various input modalities
Strong coding capabilities with 73.2% on HumanEval

Frequently Asked Questions

Q: What makes this model unique?

Aria stands out for its efficient MoE architecture, which achieves state-of-the-art performance while activating only 3.9B parameters per token during inference. This makes it both powerful and resource-efficient. It performs on par with GPT-4o mini and Gemini 1.5 Flash across a range of tasks.

Q: What are the recommended use cases?

Aria excels in document understanding (92.6% on DocVQA), video analysis, visual question answering, and general language tasks. It is particularly well suited for applications that require multimodal understanding, such as document processing, video analysis, and complex visual reasoning.
