CodeBERT-C

Maintained by neulab

  • Author: neulab
  • Task Type: Fill-Mask
  • Framework: PyTorch
  • Paper: CodeBERTScore Paper

What is codebert-c?

CodeBERT-C is a specialized language model based on microsoft/codebert-base-mlm, further trained on C code from the codeparrot/github-code-clean dataset. It was trained for 1,000,000 steps with a batch size of 32 on a masked language modeling objective, targeting C code understanding.

Implementation Details

The model uses the RoBERTa architecture and was built primarily to support CodeBERTScore, a method for evaluating generated code by comparing contextual embeddings of candidate and reference code rather than exact token overlap. It can also be served through standard transformer inference endpoints.

  • Trained on clean C code from GitHub
  • Utilizes masked language modeling for training
  • Built on microsoft/codebert-base-mlm architecture
  • Optimized with 1M training steps
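Because the model was trained with a masked language modeling objective, it can be queried directly with the standard fill-mask pipeline. The sketch below assumes the checkpoint is published on the Hugging Face Hub as "neulab/codebert-c" (consistent with the author listed above); adjust the identifier if your copy lives elsewhere.

```python
# Minimal sketch: masked-token prediction on C code with this model.
# Assumes the Hub id "neulab/codebert-c"; RoBERTa-style models use "<mask>".
from transformers import pipeline

fill = pipeline("fill-mask", model="neulab/codebert-c")

snippet = "int main(void) { <mask> x = 0; return x; }"
for pred in fill(snippet, top_k=3):
    # Each prediction carries the filled token and its probability.
    print(f"{pred['token_str']!r}  score={pred['score']:.3f}")
```

A plausible top prediction here is a C type keyword such as `int`, since the mask sits in a declaration position.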

Core Capabilities

  • Code understanding and evaluation
  • Masked language modeling for C code
  • Integration with CodeBERTScore framework
  • Support for automated code assessment

Frequently Asked Questions

Q: What makes this model unique?

This model is specifically optimized for C programming language understanding and evaluation, making it particularly effective for code assessment tasks using the CodeBERTScore framework.

Q: What are the recommended use cases?

The model is primarily designed for code evaluation using CodeBERTScore, but can be adapted for various C code understanding tasks, including code analysis, completion, and quality assessment.
