VulBERTa-MLP-Devign

Maintained by: claudios

Property          Value
Parameter Count   125M
License           MIT
Paper             arXiv:2205.12424
Accuracy          64.71%
F1 Score          56.93%

What is VulBERTa-MLP-Devign?

VulBERTa-MLP-Devign is a specialized deep learning model designed for detecting security vulnerabilities in source code. Based on the RoBERTa architecture, it combines a pre-trained transformer model with an MLP (Multi-Layer Perceptron) classification head, specifically optimized for analyzing C/C++ code.

Implementation Details

The model uses a custom tokenization pipeline that removes comments automatically and applies code-specific preprocessing. Tokenization requires libclang, and the model must be loaded with trust_remote_code=True because these custom components ship as code alongside the model.
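A minimal loading sketch follows. It assumes the model is published on the Hugging Face Hub under the repository id claudios/VulBERTa-MLP-Devign (inferred from the maintainer and model name on this page, not confirmed here) and that the transformers, torch, and libclang Python packages are installed. The helper name load_classifier is illustrative.

```python
# Assumed Hub repository id, inferred from the maintainer and model name.
MODEL_ID = "claudios/VulBERTa-MLP-Devign"


def load_classifier(model_id: str = MODEL_ID):
    """Build a text-classification pipeline for the model.

    trust_remote_code=True allows transformers to execute the custom
    tokenizer code shipped with the model repository, so it should only
    be used with a source you trust.
    """
    # Imported lazily so this module can be imported without transformers.
    from transformers import pipeline

    return pipeline(
        "text-classification",
        model=model_id,
        trust_remote_code=True,
        top_k=None,  # return a score for every label, not only the best one
    )
```

Calling load_classifier() downloads the weights on first use. The returned pipeline accepts a string of C/C++ source and returns label/score dictionaries; the label names depend on the model's configuration.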

  • Pre-trained on real-world code from open-source C/C++ projects
  • Implements binary classification for vulnerability detection
  • Achieves 64.71% accuracy, a 56.93% F1 score, and a 71.02% ROC-AUC score on the Devign dataset
  • Stores its weights as F32 (32-bit floating point) tensors
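The accuracy, F1, and ROC-AUC figures quoted on this page are standard binary-classification metrics. The sketch below shows how such figures are typically computed from ground-truth labels (1 = vulnerable, 0 = secure) and model scores. It is a plain-Python illustration, not the evaluation code used for the reported numbers.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is quadratic in the
    number of samples, which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For the made-up labels [1, 1, 0, 0, 1] and scores [0.9, 0.6, 0.4, 0.7, 0.2], thresholded at 0.5, these functions give an accuracy of 0.6, an F1 score of about 0.667, and a ROC-AUC of 0.5.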

Core Capabilities

  • Automated vulnerability detection in C/C++ source code
  • Deep semantic code analysis
  • Binary classification of secure vs vulnerable code segments
  • Support for complex code structures and patterns
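The binary secure-vs-vulnerable decision listed above usually comes down to turning the classification head's two output logits into a probability and comparing it with a threshold. The sketch below illustrates that step under two assumptions that are not confirmed by this page: that index 1 is the "vulnerable" class, and that 0.5 is a sensible default threshold. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(
    logits: Sequence[float],
    threshold: float = 0.5,      # assumed default, tune per use case
    vulnerable_index: int = 1,   # assumption: class 1 means "vulnerable"
) -> Tuple[str, float]:
    """Return a label and the probability assigned to the vulnerable class."""
    prob = softmax(logits)[vulnerable_index]
    label = "vulnerable" if prob >= threshold else "secure"
    return label, prob
```

Raising the threshold trades recall for precision: fewer code segments are flagged, but a larger share of the flagged ones are likely to be real findings.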

Frequently Asked Questions

Q: What makes this model unique?

VulBERTa stands out for its simple yet effective approach to code vulnerability detection. The paper (arXiv:2205.12424) reports state-of-the-art results with a relatively modest parameter count of 125M. Its custom tokenization pipeline and pre-training on real-world C/C++ code are aimed at making it effective in practical applications.

Q: What are the recommended use cases?

The model is specifically designed for security teams and developers who need to analyze C/C++ codebases for potential security vulnerabilities. It's particularly useful in automated code review processes and continuous integration pipelines where security scanning is required.
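As a rough illustration of the continuous integration use case, the sketch below walks a source tree, scores each C/C++ file with a caller-supplied function, and derives a process exit code from the findings. Everything here is hypothetical scaffolding: score_fn stands in for whatever wraps the model and returns a vulnerability probability, and the suffix list and 0.5 threshold are assumptions. Scoring whole files is a simplification, since transformer inputs are length-limited and a real setup would likely split files into functions first.

```python
from pathlib import Path
from typing import Callable, List, Tuple

# Assumed set of file extensions to scan.
C_SUFFIXES = {".c", ".h", ".cc", ".cpp", ".hpp"}


def find_sources(root: Path) -> List[Path]:
    """Return all C/C++ files under root, sorted for stable output."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in C_SUFFIXES
    )


def scan(
    root: Path,
    score_fn: Callable[[str], float],  # hypothetical wrapper around the model
    threshold: float = 0.5,            # assumed cut-off
) -> List[Tuple[Path, float]]:
    """Score every source file and keep those at or above the threshold."""
    findings = []
    for path in find_sources(root):
        score = score_fn(path.read_text(errors="replace"))
        if score >= threshold:
            findings.append((path, score))
    return findings


def exit_code(findings: List[Tuple[Path, float]]) -> int:
    """Non-zero when anything was flagged, so a CI job can fail on it."""
    return 1 if findings else 0
```

A CI step could call scan() on the checkout directory, print the flagged paths, and pass exit_code(findings) to sys.exit(). Given the accuracy reported above, flagged files are best treated as candidates for human review.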
