Flan-UL2

Maintained by: Google

  • Model Size: 20B parameters
  • Architecture: T5-based encoder-decoder
  • License: Apache 2.0
  • Paper: Unifying Language Learning Paradigms

What is Flan-UL2?

Flan-UL2 is a language model that combines the UL2 architecture with Flan instruction tuning. Compared with the original UL2 checkpoint, it extends the receptive field to 2048 tokens and removes the need for mode tokens at inference time. The model was pre-trained at scale on the C4 corpus and then fine-tuned on the Flan collection of instruction-tuning datasets.

Implementation Details

The architecture consists of 32 encoder layers and 32 decoder layers, with a model dimension of 4096, a feed-forward dimension of 16384, and 16 attention heads of dimension 256 each. Pre-training covered 1 trillion tokens over 2 million steps, with a batch size of 1024 and input/target sequence lengths of 512/512.
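
For reference, these stated hyperparameters are collected below as a plain Python dict; the field names are illustrative, not an actual configuration class. Note that the training numbers are mutually consistent: 2M steps × 1024 sequences × 512 input tokens is roughly 1 trillion tokens.

```python
# Stated UL2 20B hyperparameters, gathered for reference.
# Field names are illustrative, not a real config class.
ul2_20b = {
    "encoder_layers": 32,
    "decoder_layers": 32,
    "d_model": 4096,
    "d_ff": 16384,
    "num_heads": 16,
    "d_head": 256,            # 16 heads * 256 = 4096 = d_model
    "train_steps": 2_000_000,
    "batch_size": 1024,
    "input_len": 512,
    "target_len": 512,
}

# Sanity check: steps * batch * input_len ~= 1 trillion training tokens.
assert ul2_20b["train_steps"] * ul2_20b["batch_size"] * ul2_20b["input_len"] > 1e12
```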

  • Utilizes Mixture-of-Denoisers (MoD) pre-training objective
  • Implements three denoising strategies: R-Denoiser, S-Denoiser, and X-Denoiser (sketched after this list)
  • Supports efficient 8-bit quantization for memory optimization (see the loading sketch after this list)
  • Compatible with bfloat16 precision training
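
To make the Mixture-of-Denoisers bullets concrete, here is a toy sketch of span corruption under the three regimes. The span lengths and corruption rates below are assumptions chosen to match the paper's general recipe (R: short spans, low rate; X: long spans and/or high rate; S: prefix-LM-style suffix prediction), not the exact training configuration.

```python
import random

def span_corrupt(tokens, mean_span, rate):
    """Toy T5-style span corruption: mask ~rate of the tokens in spans of
    roughly mean_span length; return (input, target) with sentinel tokens."""
    masked = [False] * len(tokens)
    budget = max(1, int(len(tokens) * rate))   # tokens left to mask
    while budget > 0:
        span = max(1, min(budget, round(random.gauss(mean_span, 1))))
        start = random.randrange(len(tokens) - span + 1)
        for i in range(start, start + span):
            masked[i] = True
        budget -= span
    inp, tgt, sid, i = [], [], 0, 0
    while i < len(tokens):
        if masked[i]:
            inp.append(f"<extra_id_{sid}>")     # sentinel replaces the span
            tgt.append(f"<extra_id_{sid}>")     # target reproduces the span
            while i < len(tokens) and masked[i]:
                tgt.append(tokens[i])
                i += 1
            sid += 1
        else:
            inp.append(tokens[i])
            i += 1
    return " ".join(inp), " ".join(tgt)

text = "the quick brown fox jumps over the lazy dog near the river bank".split()
print(span_corrupt(text, mean_span=3, rate=0.15))   # R-Denoiser: regular
print(span_corrupt(text, mean_span=12, rate=0.5))   # X-Denoiser: extreme
cut = len(text) // 2                                # S-Denoiser: prefix-LM
print(" ".join(text[:cut]) + " <extra_id_0>", "|",
      "<extra_id_0> " + " ".join(text[cut:]))
```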
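
And a minimal loading sketch for the last two bullets, assuming the public google/flan-ul2 checkpoint on Hugging Face and that transformers, accelerate, and bitsandbytes are installed; treat this as one plausible setup rather than the canonical one.

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-ul2")

# Option A: bfloat16 weights (~40 GB for 20B parameters at 2 bytes each).
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-ul2",
    torch_dtype=torch.bfloat16,
    device_map="auto",          # shard across available devices
)

# Option B: 8-bit quantization via bitsandbytes to roughly halve memory.
# model = AutoModelForSeq2SeqLM.from_pretrained(
#     "google/flan-ul2", load_in_8bit=True, device_map="auto"
# )
```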

Core Capabilities

  • Multi-language support (English, French, Romanian, German)
  • Advanced text generation and translation (usage example after this list)
  • Question answering and logical reasoning
  • Mathematical problem solving
  • Step-by-step reasoning tasks
  • Scientific knowledge processing
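
As an illustration of the generation and reasoning capabilities, here is a short usage sketch reusing the model and tokenizer from the loading example under Implementation Details; the prompts are examples, not prescribed formats.

```python
def generate(prompt, max_new_tokens=64):
    """Run a single prompt through the seq2seq model and decode the output."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(inputs.input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)

# Translation (one of the supported language pairs).
print(generate("Translate English to German: How old are you?"))

# Step-by-step reasoning on a small math word problem.
print(generate(
    "Answer the following question by reasoning step by step. "
    "The cafeteria had 23 apples. If they used 20 for lunch and "
    "bought 6 more, how many apples do they have?"
))
```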

Frequently Asked Questions

Q: What makes this model unique?

Flan-UL2 uniquely combines the UL2 architecture with Flan instruction tuning, offering improved few-shot learning capabilities and removing the need for mode tokens. It achieved significant improvements over its predecessor, showing gains of up to 7.4% on certain tasks.
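
Because no mode token is required, few-shot prompts can be passed to the model directly. A hypothetical example, reusing the generate helper sketched earlier:

```python
# Hypothetical few-shot prompt; unlike the original UL2, no mode-switching
# token needs to be prepended before the in-context examples.
few_shot = (
    "Review: The food was cold and bland. Sentiment: negative\n"
    "Review: Absolutely loved the service! Sentiment: positive\n"
    "Review: Decent, but overpriced. Sentiment:"
)
print(generate(few_shot, max_new_tokens=4))
```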

Q: What are the recommended use cases?

The model excels in diverse NLP tasks including language generation, understanding, text classification, question answering, commonsense reasoning, long text reasoning, and information retrieval. It's particularly effective for tasks requiring few-shot learning and complex reasoning.
