NuExtract-1.5

Property	Value
Parameter Count	3.82B
Model Type	Text Generation / Information Extraction
License	MIT
Supported Languages	English, French, Spanish, German, Portuguese, Italian
Base Model	Phi-3.5-mini-instruct

What is NuExtract-1.5?

NuExtract-1.5 is a multilingual model designed for structured information extraction. It is fine-tuned from Microsoft's Phi-3.5-mini-instruct on a proprietary dataset to extract precise information from documents of varying lengths. At 3.82B parameters, it is compact enough to run efficiently while remaining accurate across its supported languages.

Implementation Details

The model weights use the BF16 tensor type, and the model ships with custom code. It is designed for both short and long contexts and has been shown to process documents of 10-20k tokens effectively.

Zero-shot multilingual support for six major European languages
Optimized for pure extraction tasks, with a recommended temperature of 0
Supports a sliding-window strategy that processes a long document chunk by chunk, so inputs of arbitrary length can be handled with limited memory
Supports JSON template-based extraction for structured output
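
The template-based workflow described above can be sketched in a few lines. This is a minimal illustration, not the official usage code: the prompt layout with the `<|input|>` and `<|output|>` markers, the repository id `numind/NuExtract-1.5`, and the helper names `build_prompt` and `extract` are assumptions that should be checked against the model's own documentation. Greedy decoding (`do_sample=False`) is used here as the practical equivalent of a temperature of 0.

```python
import json

# Assumed prompt layout; verify against the official model card before relying on it.
PROMPT_FORMAT = "<|input|>\n### Template:\n{template}\n### Text:\n{text}\n\n<|output|>"


def build_prompt(template: dict, text: str) -> str:
    """Render a JSON template and a source text into one prompt string."""
    template_json = json.dumps(template, indent=4, ensure_ascii=False)
    return PROMPT_FORMAT.format(template=template_json, text=text)


def extract(template: dict, text: str,
            model_id: str = "numind/NuExtract-1.5",  # assumed repository id
            max_new_tokens: int = 1000) -> str:
    """Generate greedily and return only the newly generated text."""
    # Imported lazily so build_prompt works without these packages installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, trust_remote_code=True
    ).eval()

    inputs = tokenizer(build_prompt(template, text), return_tensors="pt").to(model.device)
    with torch.no_grad():
        # do_sample=False selects greedy decoding (the temperature-0 setting).
        output_ids = model.generate(**inputs, do_sample=False,
                                    max_new_tokens=max_new_tokens)

    # Keep only the tokens produced after the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

A template is an ordinary dictionary whose empty values mark the fields to fill, for example `{"name": "", "price": ""}`. The string returned by `extract` is expected to be JSON, but it should still be parsed defensively with `json.loads` inside a `try` block.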

Core Capabilities

Precise information extraction from long documents
Template-based structured data extraction
Multilingual support with competitive performance
Efficient processing of documents up to 20k tokens
Superior performance compared to larger models in benchmark tests
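
For documents that exceed a comfortable input size, one common approach is to split the token sequence into overlapping windows and carry a running result from one window to the next. The sketch below shows only that control flow. The function names, the window and overlap sizes, and the `extract_chunk` callback are hypothetical and stand in for a real model call.

```python
from typing import Any, Callable, List, Sequence


def split_into_windows(tokens: Sequence, window_size: int, overlap: int) -> List[list]:
    """Split a token sequence into windows that share `overlap` tokens."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")

    step = window_size - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(list(tokens[start:start + window_size]))
        # Stop once a window has reached the end of the sequence.
        if start + window_size >= len(tokens):
            break
    return windows


def extract_over_windows(tokens: Sequence, window_size: int, overlap: int,
                         extract_chunk: Callable[[list, Any], Any],
                         initial: Any) -> Any:
    """Feed each window plus the running result to a hypothetical extractor."""
    current = initial
    for window in split_into_windows(tokens, window_size, overlap):
        # extract_chunk returns the updated result after seeing this window.
        current = extract_chunk(window, current)
    return current
```

The overlap reduces the chance that a value straddling a window boundary is cut in half, at the cost of processing some tokens twice.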

Frequently Asked Questions

Q: What makes this model unique?

NuExtract-1.5 stands out for handling long documents while keeping extraction accuracy high across multiple languages. In benchmark tests it outperforms larger models despite its relatively small size of 3.82B parameters.

Q: What are the recommended use cases?

The model is best suited to structured information extraction, particularly from long documents in multiple languages. It is designed for cases where the information to extract is present in the source text, which makes it a good fit for document parsing, data extraction, and automated information gathering.
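
Because the intended output is text that already exists in the source, a simple post-check can flag values that do not appear in the document. The following is a rough sketch under that assumption. The helper names are hypothetical, and the case-insensitive, whitespace-normalized substring match is a deliberately naive choice that will also flag legitimately reformatted values such as normalized dates.

```python
from typing import Any, Iterator, List, Tuple


def iter_leaves(value: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, string) pairs for every non-empty string in nested JSON data."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from iter_leaves(child, f"{path}.{key}" if path else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from iter_leaves(child, f"{path}[{index}]")
    elif isinstance(value, str) and value:
        yield path, value


def unsupported_values(extracted: Any, source_text: str) -> List[str]:
    """Return the paths of extracted strings that are not found in the source."""
    # Collapse whitespace and lowercase both sides before comparing.
    haystack = " ".join(source_text.split()).lower()
    missing = []
    for path, value in iter_leaves(extracted):
        needle = " ".join(value.split()).lower()
        if needle not in haystack:
            missing.append(path)
    return missing
```

An empty list means every extracted string was found in the source; any returned path points at a field worth reviewing by hand.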
