CodeGen-16B-Mono

Maintained by: Salesforce

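Property       Value
Author         Salesforce
Parameters     16 billion
License        BSD-3-Clause
Paper          CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis (Nijkamp et al.)
Training Data  71.7B tokens of Python code (BigPython)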
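As a rough sizing aid, the 16 billion parameters translate directly into weight memory. The sketch below is a back-of-the-envelope estimate that assumes exactly 16 × 10^9 parameters and counts only the weights. It leaves out activations, the KV cache and framework overhead, so real memory use will be higher.

```python
# Back-of-the-envelope weight memory for a 16B-parameter model.
# Assumption: exactly 16e9 parameters; weights only (no activations or KV cache).
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the memory needed to hold the weights, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    n = 16_000_000_000
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>9}: {weight_memory_gib(n, dtype):6.1f} GiB")
```

In half precision this comes to about 32 GB (roughly 29.8 GiB) for the weights alone.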

What is codegen-16B-mono?

CodeGen-16B-mono is an autoregressive language model for program synthesis, developed by Salesforce. It is the largest (16B-parameter) variant of the CodeGen family. The checkpoint was initialized from CodeGen-Multi 16B and then further pre-trained on BigPython, a Python-only dataset. This specialization makes it particularly effective at generating executable Python code from natural-language descriptions.

Implementation Details

The model uses a transformer-based architecture. It was trained on multiple Google TPU-v4-512 systems using both data and model parallelism. Training minimizes a cross-entropy loss, which is equivalent to maximizing the likelihood of sequential inputs. The Python-specialization stage used the BigPython dataset of 71.7B tokens of Python code.

  • Transformer-based architecture for code generation
  • Trained on Google TPU-v4-512 systems with data and model parallelism
  • Specialized for Python through further pre-training on BigPython
  • Autoregressive (left-to-right, token-by-token) generation
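The cross-entropy objective mentioned above is the standard next-token loss. For a token sequence x_1 … x_T, the loss is the average of −log p(x_t | x_<t), so minimizing it maximizes the likelihood of the sequence. The sketch below computes that loss from per-position logits in plain Python. The toy vocabulary and logits are illustrative values, not CodeGen outputs, and real training code would use a framework's batched implementation.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one position's vocabulary logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def next_token_cross_entropy(logits: Sequence[Sequence[float]], tokens: Sequence[int]) -> float:
    """Mean of -log p(x_t | x_<t) over all predicted positions.

    logits[t] is the score vector produced after reading tokens[: t + 1];
    it is scored against the *next* token, tokens[t + 1].
    """
    if len(logits) != len(tokens) - 1:
        raise ValueError("expected one logit vector per next-token prediction")
    nll = 0.0
    for t, row in enumerate(logits):
        nll -= log_softmax(row)[tokens[t + 1]]
    return nll / len(logits)


if __name__ == "__main__":
    # Toy example: 3-token vocabulary, 4-token sequence -> 3 predictions.
    tokens = [0, 2, 1, 2]
    logits = [[0.1, 0.2, 2.0], [0.0, 1.5, 0.3], [0.2, 0.1, 1.8]]
    print(f"loss = {next_token_cross_entropy(logits, tokens):.4f}")
```

With uniform logits the loss equals log(vocabulary size). Confident, correct predictions push it toward zero.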

Core Capabilities

  • Program synthesis from natural language prompts
  • Code completion and generation
  • Python-specific code generation
  • Accepts both natural-language and programming-language inputs

Frequently Asked Questions

Q: What makes this model unique?

The model combines the largest parameter count in the CodeGen family (16B) with further pre-training on Python-only data. This makes it well suited to understanding and generating Python code. It is optimized for program synthesis, that is, converting natural-language descriptions into executable code.

Q: What are the recommended use cases?

The model is best suited for program synthesis, where natural-language descriptions are converted into executable Python code. It can also complete partially written code and generate snippets from partial inputs. Prompts work best when written as comment strings, for example a function signature followed by a docstring or a # comment describing the desired behavior.
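The following is a minimal sketch of comment-string prompting with the Hugging Face transformers library. It assumes the checkpoint is available on the Hub as "Salesforce/codegen-16B-mono" and loads it with AutoTokenizer and AutoModelForCausalLM.

The helpers build_prompt and truncate_completion, and the DEFAULT_STOPS sequences, are hypothetical conventions chosen for illustration. They are not part of the model or the library. The model keeps emitting tokens until it hits a length limit, so cutting the output where the next top-level block starts is a common clean-up step.

The weights alone are about 32 GB in float16, so generate imports its dependencies lazily and is only defined here, not called.

```python
from typing import Sequence

# Hypothetical stop sequences: a new top-level statement usually means the
# requested function is finished. Tune these for your own prompts.
DEFAULT_STOPS = ("\ndef ", "\nclass ", "\nif __name__", "\nprint(", "\n#")


def build_prompt(signature: str, description: str) -> str:
    """Format a function signature plus a docstring as a comment-string prompt."""
    return f'{signature}\n    """{description}"""\n'


def truncate_completion(completion: str, stops: Sequence[str] = DEFAULT_STOPS) -> str:
    """Cut generated text at the earliest stop sequence, if any occurs."""
    cut = len(completion)
    for stop in stops:
        idx = completion.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return completion[:cut]


def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Generate a completion with CodeGen-16B-mono (needs a large GPU; not run here)."""
    import torch  # lazy imports: the helpers above work without these packages
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "Salesforce/codegen-16B-mono"
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=dtype).to(device)
    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        pad_token_id=tokenizer.eos_token_id,  # CodeGen's tokenizer has no pad token
    )
    # Decode only the newly generated tokens, then trim at the first stop sequence.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return truncate_completion(tokenizer.decode(new_tokens, skip_special_tokens=True))


if __name__ == "__main__":
    prompt = build_prompt(
        "def is_palindrome(s: str) -> bool:",
        "Return True if s reads the same forwards and backwards.",
    )
    print(prompt)
    # print(prompt + generate(prompt))  # uncomment on a machine with enough memory
```

The stop list only matches statements at column 0, so indented comments or nested definitions inside the generated function body are kept.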
