NuExtract-large
| Property | Value |
|---|---|
| Parameter Count | 7.39B |
| License | MIT |
| Tensor Type | F32/BF16 |
| Base Model | phi-3-small |
What is NuExtract-large?
NuExtract-large is an information extraction model built on Microsoft's phi-3-small architecture and fine-tuned on a proprietary synthetic dataset. This 7.39B-parameter model extracts information from text inputs of up to 2000 tokens, using a JSON template to define the structure of the extracted data.
Implementation Details
The model takes a template-based approach: the user provides the input text together with a JSON schema describing the desired information structure. The model is purely extractive, meaning all output text must exist within the original input. It supports both F32 and BF16 tensor types for flexible deployment.
- Template-based extraction system using JSON schemas
- Support for example-based formatting
- Pure extraction capability ensuring output fidelity
- Input limitation of 2000 tokens
- Available in multiple sizes (tiny-0.5B, base-3.8B, large-7.39B)
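As a minimal sketch of the template-based approach described above, the helper below assembles a prompt from an input text, a JSON template, and optional examples. The function name `build_prompt` is hypothetical, and the section markers (`<|input|>`, `### Template:`, `### Example:`, `### Text:`, `<|output|>`) are an assumption for illustration; the exact prompt format should be checked against the model card before use.

```python
import json


def build_prompt(text, template, examples=()):
    """Assemble a template-based extraction prompt (hypothetical helper).

    The section markers used here are an assumption for illustration;
    verify the exact prompt format against the model card.
    """
    # Serialise the template with a fixed indent so the layout is consistent.
    parts = ["<|input|>", "### Template:", json.dumps(template, indent=4)]
    # Optional filled-in examples show the model the desired output format.
    for example in examples:
        parts += ["### Example:", json.dumps(example, indent=4)]
    # The trailing empty string makes the prompt end with a newline.
    parts += ["### Text:", text, "<|output|>", ""]
    return "\n".join(parts)


# Empty string values mark the fields the model is asked to fill in.
template = {"Model": {"Name": "", "Parameter count": "", "License": ""}}
text = "NuExtract-large is a 7.39B parameter model released under the MIT license."

prompt = build_prompt(text, template)
print(prompt)
```

The resulting string would then be tokenized and passed to the model. Inputs longer than the stated 2000-token limit would need to be split or truncated first.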
Core Capabilities
- Structured information extraction from unstructured text
- Template-driven extraction with custom schemas
- Example-based learning for precise formatting
- High-fidelity extraction: output is copied from the input rather than generated, which limits hallucination
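Because the model is described as purely extractive, the output can be checked after the fact: every extracted string should appear verbatim in the input. The sketch below shows one way to do that. The function names `leaf_strings` and `non_extractive_values` and the sample strings are hypothetical, and the check assumes the model returns valid JSON.

```python
import json


def leaf_strings(value):
    """Yield every string leaf in a nested JSON-like structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for child in value.values():
            yield from leaf_strings(child)
    elif isinstance(value, list):
        for child in value:
            yield from leaf_strings(child)


def non_extractive_values(source_text, output_json):
    """Return non-empty output strings that do not appear verbatim in source_text."""
    output = json.loads(output_json)
    return [s for s in leaf_strings(output) if s and s not in source_text]


source_text = "NuExtract-large is a 7.39B parameter model released under the MIT license."
good = '{"Model": {"Name": "NuExtract-large", "License": "MIT"}}'
bad = '{"Model": {"Name": "NuExtract-large", "License": "Apache 2.0"}}'

print(non_extractive_values(source_text, good))  # []
print(non_extractive_values(source_text, bad))   # ['Apache 2.0']
```

An empty list means every extracted value was found in the source text. Any returned strings point to values worth reviewing by hand.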
Frequently Asked Questions
Q: What makes this model unique?
NuExtract-large stands out for its purely extractive design and template-based approach: extracted values are taken from the original text, which preserves fidelity to the source. Its support for custom JSON schemas makes it adaptable to a wide range of extraction needs.
Q: What are the recommended use cases?
The model is ideal for structured data extraction from documents, automated information gathering, and template-based text analysis. It's particularly useful in scenarios requiring precise extraction of specific information patterns from larger text bodies.