CodeBERT-C
| Property | Value |
|---|---|
| Author | neulab |
| Task Type | Fill-Mask |
| Framework | PyTorch |
| Paper | CodeBERTScore Paper |
What is CodeBERT-C?
CodeBERT-C is a language model based on microsoft/codebert-base-mlm, further trained on C code from the codeparrot/github-code-clean dataset. It was trained for 1,000,000 steps with a batch size of 32 on a masked language modeling objective, specializing it in C code understanding and generation.
Implementation Details
The model uses the RoBERTa architecture and was built primarily to support CodeBERTScore, a method for evaluating generated code by comparing it to reference code in embedding space. As a standard transformer checkpoint, it also works with common inference tooling and hosted inference endpoints.
- Trained on clean C code from GitHub
- Utilizes masked language modeling for training
- Built on microsoft/codebert-base-mlm architecture
- Optimized with 1M training steps
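Because the checkpoint is a standard RoBERTa masked-LM, it can be queried with the Hugging Face transformers fill-mask pipeline. The sketch below is illustrative, not official usage: it assumes the neulab/codebert-c model id on the Hugging Face Hub and the RoBERTa-style `<mask>` token, and keeps the model download behind a main guard.

```python
MASK = "<mask>"  # RoBERTa-style mask token used by CodeBERT checkpoints

def mask_token(code: str, target: str) -> str:
    """Replace the first occurrence of `target` in a C snippet with the mask token."""
    if target not in code:
        raise ValueError(f"{target!r} not found in snippet")
    return code.replace(target, MASK, 1)

if __name__ == "__main__":
    # Heavy part: downloads the checkpoint from the Hub
    # (assumption: the neulab/codebert-c model id).
    from transformers import pipeline

    fill = pipeline("fill-mask", model="neulab/codebert-c")
    snippet = mask_token("int main(void) { return 0; }", "return")
    for pred in fill(snippet, top_k=3):
        print(pred["token_str"], pred["score"])
```

The helper simply prepares a masked C snippet; the pipeline then scores vocabulary tokens for the masked position.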
Core Capabilities
- Code understanding and evaluation
- Masked language modeling for C code
- Integration with CodeBERTScore framework
- Support for automated code assessment
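CodeBERTScore follows the BERTScore recipe: encode candidate and reference code with the model, then greedily match token embeddings by cosine similarity to obtain precision, recall, and F1. The toy sketch below illustrates only that matching step, in pure Python with hand-made vectors; the vectors and function names are illustrative and are not the library's API.

```python
from math import sqrt
from typing import List

Vec = List[float]

def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def greedy_f1(cand: List[Vec], ref: List[Vec]) -> float:
    """BERTScore-style greedy matching: each candidate token matches its most
    similar reference token (precision) and vice versa (recall)."""
    precision = sum(max(cosine(c, r) for r in ref) for c in cand) / len(cand)
    recall = sum(max(cosine(r, c) for c in cand) for r in ref) / len(ref)
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Identical token embeddings -> perfect score of 1.0.
e = [[1.0, 0.0], [0.0, 1.0]]
print(greedy_f1(e, e))
```

In the real framework the vectors come from this model's contextual token embeddings over C code, which is why a C-specialized checkpoint improves the metric's judgments.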
Frequently Asked Questions
Q: What makes this model unique?
This model is specifically optimized for C programming language understanding and evaluation, making it particularly effective for code assessment tasks using the CodeBERTScore framework.
Q: What are the recommended use cases?
The model is primarily designed for code evaluation using CodeBERTScore, but can be adapted for various C code understanding tasks, including code analysis, completion, and quality assessment.
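For end-to-end evaluation, the companion code-bert-score package wraps this model. The snippet below is a sketch under assumptions: it assumes the `code_bert_score.score()` entry point as described in the neulab/code-bert-score repository (returning precision, recall, F1, and F3 tensors), and the C snippets are made up for illustration.

```python
# Hypothetical candidate/reference pair for illustration.
predictions = ["int add(int a, int b) { return a + b; }"]
references = ["int add(int x, int y) { return x + y; }"]

if __name__ == "__main__":
    # Assumption: the code_bert_score package from neulab/code-bert-score;
    # lang="c" should select this CodeBERT-C checkpoint.
    import code_bert_score

    precision, recall, f1, f3 = code_bert_score.score(
        cands=predictions, refs=references, lang="c"
    )
    print(f1)
```

Here the candidate differs from the reference only in parameter names, the kind of surface mismatch that embedding-based scoring is designed to tolerate better than exact-match metrics.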