# SciNCL
| Property | Value |
|---|---|
| Parameter Count | 110M |
| License | MIT |
| Paper | Link to Paper |
| Base Model | SciBERT |
## What is SciNCL?
SciNCL is a sophisticated BERT-based language model specifically designed for generating high-quality document-level embeddings of research papers. Built upon SciBERT's architecture, it leverages citation graph neighborhoods through contrastive learning to create more meaningful representations of scientific documents.
## Implementation Details
The model employs a unique training approach combining citation graph analysis with contrastive learning. It's initialized with SciBERT weights and further trained on the S2ORC citation graph, achieving state-of-the-art performance on the SciDocs benchmark suite.
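The core of this training signal is a triplet objective: papers that are neighbors in the citation graph are pulled together in embedding space, non-neighbors are pushed apart. A minimal NumPy sketch of a triplet margin loss illustrates the idea (the function name and margin value are illustrative; actual training uses batched losses in a deep-learning framework):

```python
import numpy as np

def triplet_margin_loss(anchor: np.ndarray,
                        positive: np.ndarray,
                        negative: np.ndarray,
                        margin: float = 1.0) -> float:
    """Triplet loss over document embeddings.

    anchor   -- embedding of the query paper
    positive -- embedding of a citation-graph neighbor
    negative -- embedding of a non-neighbor (e.g. a mined hard negative)
    """
    d_pos = np.linalg.norm(anchor - positive)  # distance to neighbor
    d_neg = np.linalg.norm(anchor - negative)  # distance to non-neighbor
    # Loss is zero once the negative is at least `margin` farther than the positive
    return max(0.0, d_pos - d_neg + margin)
```

Minimizing this quantity over many sampled triplets is what shapes the embedding space so that cited and citing papers end up close together.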
- Pre-trained on scientific literature using neighborhood contrastive learning
- Utilizes citation graph structure for enhanced document understanding
- Mines hard positive and negative examples from citation-graph neighborhoods for triplet training
- Supports both sentence-transformers and HuggingFace transformers implementations
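A minimal sketch of the HuggingFace transformers path is shown below. The model id `malteos/scincl`, the title-`[SEP]`-abstract input format, and `[CLS]` pooling follow the published model card, but should be verified against the current hub entry; the helper names here are illustrative:

```python
from typing import List

def build_inputs(papers: List[dict], sep: str = "[SEP]") -> List[str]:
    # SciNCL-style input: title and abstract joined by the tokenizer's separator token
    return [p["title"] + sep + p.get("abstract", "") for p in papers]

def embed_papers(papers: List[dict], model_name: str = "malteos/scincl"):
    # Heavy imports kept inside the function; requires `pip install transformers torch`
    from transformers import AutoTokenizer, AutoModel
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    inputs = tokenizer(build_inputs(papers, tokenizer.sep_token),
                       padding=True, truncation=True,
                       max_length=512, return_tensors="pt")
    outputs = model(**inputs)
    # Use the [CLS] token's final hidden state as the document embedding
    return outputs.last_hidden_state[:, 0, :]
```

The sentence-transformers route wraps the same weights behind a single `encode()` call, which is the simpler option when no custom pooling is needed.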
## Core Capabilities
- Document-level embedding generation for research papers
- High-performance similarity matching between scientific documents
- Robust handling of title and abstract combinations
- State-of-the-art performance on multiple scientific document tasks
- Achieves 81.9% average score across SciDocs benchmarks
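Similarity matching between two embedded papers reduces to a cosine similarity over their embedding vectors, as in this short sketch:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 for identical directions, 0.0 for orthogonal vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Scores near 1.0 indicate closely related papers; thresholds for "related enough" are task-dependent and best tuned on a validation set.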
## Frequently Asked Questions
Q: What makes this model unique?
SciNCL's uniqueness lies in its citation-aware training approach and state-of-the-art performance on scientific document tasks. It significantly outperforms previous models like SPECTER and SciBERT on various metrics including citation prediction and document similarity.
Q: What are the recommended use cases?
The model is ideal for academic search engines, scientific paper recommendation systems, citation analysis, and research paper similarity matching. It's particularly effective for tasks requiring understanding of scientific document relationships and content-based paper retrieval.
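For search and recommendation use cases, retrieval over precomputed embeddings can be sketched as a top-k ranking by cosine similarity (function name and shapes are illustrative; production systems would typically use an approximate-nearest-neighbor index instead of a full scan):

```python
import numpy as np

def rank_papers(query_emb: np.ndarray, corpus_embs: np.ndarray, top_k: int = 5):
    """Return (corpus_index, score) pairs for the top_k most similar papers.

    query_emb   -- shape (d,): embedding of the query paper
    corpus_embs -- shape (n, d): one embedding per candidate paper
    """
    # L2-normalize so that dot products equal cosine similarities
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    scores = c @ q
    top = np.argsort(-scores)[:top_k]  # indices of highest-scoring papers first
    return [(int(i), float(scores[i])) for i in top]
```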