FNet-Large

Architecture: 24-layer Transformer with Fourier Transform
Hidden Dimension: 1024
Training Data: C4 (Cleaned Common Crawl)
Paper: FNet: Mixing Tokens with Fourier Transforms
Developer: Google

What is fnet-large?

FNet-large is an innovative transformer model that replaces traditional attention mechanisms with Fourier transforms, offering a unique approach to natural language processing. Developed by Google, it achieves impressive performance while potentially reducing computational complexity. The model is pre-trained on the C4 dataset using masked language modeling (MLM) and next sentence prediction (NSP) objectives, achieving 0.58 accuracy on MLM and 0.80 on NSP tasks.
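Since the checkpoint is published on the Hugging Face Hub, its MLM head can be exercised directly through the fill-mask pipeline. The snippet below is a minimal sketch; it assumes the transformers and sentencepiece packages are installed and uses the google/fnet-large checkpoint ID.

```python
from transformers import pipeline

# Minimal sketch: probe the pretrained masked-language-modeling head.
# Assumes `transformers` and `sentencepiece` are installed.
fill_mask = pipeline("fill-mask", model="google/fnet-large")

# The FNet tokenizer uses "[MASK]" as its mask token.
for prediction in fill_mask("Paris is the [MASK] of France."):
    print(prediction["token_str"], round(prediction["score"], 3))
```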

Implementation Details

The model employs a 24-layer architecture with a hidden dimension of 1024. Input text is processed with SentencePiece tokenization using a 32,000-token vocabulary. The model was optimized with Adam (learning rate 1e-4, β1=0.9, β2=0.999) and trained on 4 cloud TPUs in Pod configuration for one million steps.

  • Uses Fourier transforms instead of attention mechanisms (see the sketch after this list)
  • Maximum sequence length of 512 tokens
  • Implements masked language modeling with 15% token masking
  • Trained with weight decay of 0.01 and learning rate warmup for 10,000 steps
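The Fourier mixing sublayer referenced above is simple enough to sketch. Below is a simplified PyTorch illustration of a single encoder block in the spirit of the paper, not the exact released implementation; the 4096 feed-forward width is an assumed value (4× the 1024 hidden dimension), and dropout and initialization details are omitted.

```python
import torch
import torch.nn as nn

class FNetBlockSketch(nn.Module):
    """Simplified FNet encoder layer: Fourier mixing in place of self-attention."""

    def __init__(self, hidden_dim: int = 1024, ff_dim: int = 4096):
        super().__init__()
        self.mixing_norm = nn.LayerNorm(hidden_dim)
        self.ff = nn.Sequential(
            nn.Linear(hidden_dim, ff_dim),
            nn.GELU(),
            nn.Linear(ff_dim, hidden_dim),
        )
        self.ff_norm = nn.LayerNorm(hidden_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden). Token mixing is a parameter-free 2D DFT
        # over the hidden and sequence dimensions; only the real part is kept.
        mixed = torch.fft.fft(torch.fft.fft(x, dim=-1), dim=-2).real
        x = self.mixing_norm(x + mixed)
        # Standard position-wise feed-forward sublayer with a residual connection.
        return self.ff_norm(x + self.ff(x))
```

Because the mixing step has no learned parameters and an FFT runs in O(n log n), the per-layer cost grows more slowly with sequence length than quadratic self-attention, which is where the potential efficiency gain comes from.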

Core Capabilities

  • Masked Language Modeling (MLM)
  • Next Sentence Prediction (NSP)
  • Strong performance on GLUE benchmark tasks (81.9 average score)
  • Effective for sequence classification and token classification tasks
  • Suitable for fine-tuning on downstream tasks (see the example after this list)
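As an example of the fine-tuning workflow, the sketch below loads the checkpoint with a sequence-classification head via transformers; num_labels=2 is an illustrative setting, and the newly added head is randomly initialized, so predictions are only meaningful after fine-tuning on labeled data.

```python
import torch
from transformers import FNetTokenizer, FNetForSequenceClassification

tokenizer = FNetTokenizer.from_pretrained("google/fnet-large")
# The classification head is freshly initialized and must be fine-tuned.
model = FNetForSequenceClassification.from_pretrained("google/fnet-large", num_labels=2)

inputs = tokenizer(
    "FNet swaps self-attention for Fourier transforms.",
    truncation=True,
    max_length=512,  # matches the model's maximum sequence length
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))
```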

Frequently Asked Questions

Q: What makes this model unique?

FNet-large's distinctive feature is its use of Fourier transforms instead of attention mechanisms, which potentially offers computational advantages while maintaining strong performance on NLP tasks. This makes it an interesting alternative to traditional transformer architectures.

Q: What are the recommended use cases?

The model is best suited for tasks that involve whole-sentence processing, such as sequence classification, token classification, and question answering. It's designed to be fine-tuned on downstream tasks rather than used directly for text generation. The model particularly excels in scenarios where bidirectional context is important.
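For the question-answering case, a hedged sketch using the FNetForQuestionAnswering head looks like the following; as with the other task heads, the span-prediction head starts untrained, so the model needs fine-tuning on a QA dataset (e.g. SQuAD) before the extracted spans are meaningful.

```python
import torch
from transformers import FNetTokenizer, FNetForQuestionAnswering

tokenizer = FNetTokenizer.from_pretrained("google/fnet-large")
model = FNetForQuestionAnswering.from_pretrained("google/fnet-large")

question = "What replaces self-attention in FNet?"
context = "FNet mixes tokens with Fourier transforms instead of self-attention."
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Decode the most likely answer span from the start/end logits.
start = outputs.start_logits.argmax()
end = outputs.end_logits.argmax()
print(tokenizer.decode(inputs["input_ids"][0, start : end + 1]))
```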
