OneKE
| Property | Value |
|---|---|
| License | CC-BY-NC-SA-4.0 |
| Languages | English, Chinese |
| Paper | IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus |
| Downloads | 1,888 |
What is OneKE?
OneKE is a bilingual large language model framework for knowledge extraction, developed jointly by Ant Group and Zhejiang University. Built on Chinese-Alpaca-2-13B, it performs generalized knowledge extraction in both Chinese and English across multiple domains and tasks.
Implementation Details
The model takes a schema-generalizable approach to information extraction. It relies on techniques such as normalization and cleaning of extraction instructions, hard negative sample collection, and schema-based batched instruction construction. Running it at full performance requires at least 20GB of VRAM.
- Supports Named Entity Recognition (NER), Relation Extraction (RE), and Event Extraction (EE)
- Implements a unified knowledge extraction framework with schema-based capabilities
- Utilizes 4-bit quantization for efficient deployment
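As a rough illustration of schema-based batched instruction construction, the sketch below splits a long label schema into smaller batches and builds one JSON instruction per batch. The instruction wording, the field names (`instruction`, `schema`, `input`), the batch size, and the helper names `split_schema` and `build_instructions` are assumptions made for this example, not a confirmed OneKE interface.

```python
import json

# Hypothetical instruction text; the wording is an assumption for illustration.
NER_INSTRUCTION = (
    "You are an expert in named entity recognition. Extract entities that "
    "match the schema from the input and answer as a JSON string."
)


def split_schema(schema, batch_size):
    """Split a list of labels into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [schema[i:i + batch_size] for i in range(0, len(schema), batch_size)]


def build_instructions(text, schema, batch_size=3):
    """Return one JSON instruction string per schema batch (assumed format)."""
    return [
        json.dumps(
            {"instruction": NER_INSTRUCTION, "schema": batch, "input": text},
            ensure_ascii=False,  # keep Chinese text readable in the prompt
        )
        for batch in split_schema(schema, batch_size)
    ]


prompts = build_instructions(
    "Marie Curie was born in Warsaw in 1867.",
    ["person", "location", "date", "organization", "nationality"],
)
for prompt in prompts:
    print(prompt)
```

Each prompt asks about only a few labels at a time, which is the idea behind batching a schema: the model sees a short, focused label set per query instead of one very long schema.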
Core Capabilities
- Bilingual processing in Chinese and English
- Zero-shot generalization across multiple domains
- Structured knowledge extraction with customizable schemas
- Support for complex event and relation extraction tasks
- Batch processing of multiple schemas
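When a schema is processed in several batches, the per-batch answers have to be combined into one structured result. The sketch below shows one way to do that, assuming each answer is a JSON object mapping a label to a list of extracted values. The output shape, the sample strings, and the helper name `merge_batch_outputs` are hypothetical and stand in for whatever the model actually returns.

```python
import json


def merge_batch_outputs(raw_outputs):
    """Merge JSON answers from several schema batches into one dict.

    Answers that are not valid JSON objects are skipped. Duplicate values
    for a label are dropped while keeping first-seen order.
    """
    merged = {}
    for raw in raw_outputs:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed model output
        if not isinstance(parsed, dict):
            continue
        for label, values in parsed.items():
            if not isinstance(values, list):
                continue  # ignore labels whose value is not a list
            bucket = merged.setdefault(label, [])
            for value in values:
                if value not in bucket:
                    bucket.append(value)
    return merged


# Hypothetical raw answers, one per schema batch.
raw_outputs = [
    '{"person": ["Marie Curie"], "location": ["Warsaw"], "date": ["1867"]}',
    '{"organization": [], "nationality": []}',
    "not valid json",
]
result = merge_batch_outputs(raw_outputs)
print(result)
```

Skipping malformed answers rather than raising keeps one bad generation from discarding the results of the other batches; a stricter pipeline might log or retry those cases instead.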
Frequently Asked Questions
Q: What makes this model unique?
OneKE stands out for schema-generalizable information extraction in both Chinese and English and across multiple domains, while maintaining high performance in zero-shot scenarios. Because one unified framework covers entity, relation, and event extraction, it reduces the cost of building domain-specific knowledge graphs.
Q: What are the recommended use cases?
The model is ideal for converting unstructured documents into structured knowledge, particularly in domains like medical information extraction, financial report analysis, and public sector document processing. It's especially useful for building knowledge graphs and enhancing other large language models by providing structured information.
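To make the knowledge-graph use case concrete, the following sketch flattens a relation extraction result into (subject, relation, object) triples, the usual input for a graph store. It assumes the result maps each relation name to a list of objects with `subject` and `object` keys. That shape, the sample data, and the helper name `relations_to_triples` are assumptions for illustration only.

```python
def relations_to_triples(re_output):
    """Flatten {relation: [{"subject": ..., "object": ...}]} into triples.

    Pairs with a missing or empty subject or object are dropped.
    """
    triples = []
    for relation, pairs in re_output.items():
        for pair in pairs:
            subject = pair.get("subject")
            obj = pair.get("object")
            if subject and obj:
                triples.append((subject, relation, obj))
    return triples


# Hypothetical extraction result with made-up entities.
re_output = {
    "founded_by": [{"subject": "Acme Corp", "object": "Jane Doe"}],
    "located_in": [
        {"subject": "Acme Corp", "object": "Springfield"},
        {"subject": "", "object": "Nowhere"},  # incomplete pair, dropped
    ],
}
triples = relations_to_triples(re_output)
for triple in triples:
    print(triple)
```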