CodeBERT-C

Maintained by neulab

  • Author: neulab
  • Task Type: Fill-Mask
  • Framework: PyTorch
  • Paper: CodeBERTScore Paper

What is codebert-c?

CodeBERT-C is a specialized language model based on microsoft/codebert-base-mlm, further trained on C code from the codeparrot/github-code-clean dataset. It was trained for 1,000,000 steps with a batch size of 32 on a masked language modeling objective, targeting C code understanding.

Implementation Details

The model uses the RoBERTa architecture and was built primarily to support CodeBERTScore, a method for evaluating generated code by comparing contextual embeddings of candidate and reference code rather than exact token overlap. It can also be served through standard transformer inference endpoints.

  • Trained on clean C code from GitHub
  • Utilizes masked language modeling for training
  • Built on microsoft/codebert-base-mlm architecture
  • Optimized with 1M training steps
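Because the model was trained with a masked language modeling objective, it can be queried directly with the standard fill-mask pipeline. The sketch below assumes the checkpoint is published on the Hugging Face Hub as "neulab/codebert-c" (consistent with the author listed above); adjust the identifier if your copy lives elsewhere.

```python
# Minimal sketch: masked-token prediction on C code with this model.
# Assumes the Hub id "neulab/codebert-c"; RoBERTa-style models use "<mask>".
from transformers import pipeline

fill = pipeline("fill-mask", model="neulab/codebert-c")

snippet = "int main(void) { <mask> x = 0; return x; }"
for pred in fill(snippet, top_k=3):
    # Each prediction carries the filled token and its probability.
    print(f"{pred['token_str']!r}  score={pred['score']:.3f}")
```

A plausible top prediction here is a C type keyword such as `int`, since the mask sits in a declaration position.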

Core Capabilities

  • Code understanding and evaluation
  • Masked language modeling for C code
  • Integration with CodeBERTScore framework
  • Support for automated code assessment

Frequently Asked Questions

Q: What makes this model unique?

This model is specifically optimized for C programming language understanding and evaluation, making it particularly effective for code assessment tasks using the CodeBERTScore framework.

Q: What are the recommended use cases?

The model is primarily designed for code evaluation using CodeBERTScore, but can be adapted for various C code understanding tasks, including code analysis, completion, and quality assessment.
