BERTweet-base
| Property | Value |
|---|---|
| License | MIT |
| Author | VinAI Research |
| Downloads | 81,578 |
| Framework | PyTorch, TensorFlow |
What is bertweet-base?
BERTweet-base is a language model pre-trained specifically for English Tweets, and the first public large-scale language model of its kind. It follows the RoBERTa pre-training procedure and was trained on 850M English Tweets: 845M general tweets posted between 2012 and 2019, plus 5M tweets related to the COVID-19 pandemic.
Implementation Details
The model uses the same configuration as BERT-base but is trained with the RoBERTa pre-training procedure, on a corpus of roughly 16B word tokens (about 80GB of text). Checkpoints are published for both PyTorch and TensorFlow, making it usable across development environments; a loading sketch follows the list below.
- Pre-trained on 850M English Tweets
- BERT-base architecture with RoBERTa pre-training
- Supports multiple frameworks
- Includes COVID-19 specific data
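As a concrete starting point, here is a minimal loading sketch in Python. It assumes the transformers library and the vinai/bertweet-base checkpoint on the Hugging Face Hub; the repo id and the `normalization` flag follow the model's public usage notes, so adjust them if your setup differs.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed repo id on the Hugging Face Hub: vinai/bertweet-base.
# normalization=True enables BERTweet's tweet normalizer (user mentions
# become @USER, URLs become HTTPURL); it requires the `emoji` package.
tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base", normalization=True)
model = AutoModel.from_pretrained("vinai/bertweet-base")

tweet = "SC has first two presumptive cases of coronavirus, DHEC confirms https://t.co/abc via @user"

inputs = tokenizer(tweet, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual token embeddings: (batch_size, sequence_length, hidden_size=768)
print(outputs.last_hidden_state.shape)
```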
Core Capabilities
- Part-of-Speech Tagging
- Named Entity Recognition
- Sentiment Analysis
- Irony Detection
- Fill-Mask Task Support (see the example below)
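Of these capabilities, masked-token prediction works directly from the pre-trained checkpoint, while POS tagging, NER, sentiment analysis, and irony detection require task-specific fine-tuning. A minimal fill-mask sketch, again assuming the vinai/bertweet-base repo id (BERTweet is RoBERTa-style, so its mask token is `<mask>`):

```python
from transformers import pipeline

# Fill-mask sketch; the repo id is an assumption based on the model card.
fill_mask = pipeline("fill-mask", model="vinai/bertweet-base")

# BERTweet inherits RoBERTa's <mask> token.
for pred in fill_mask("The weather in London is <mask> today."):
    print(f"{pred['token_str']:>12}  score={pred['score']:.4f}")
```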
Frequently Asked Questions
Q: What makes this model unique?
BERTweet is the first large-scale language model specifically designed for Twitter content, combining both general tweets and pandemic-related data for comprehensive coverage of social media language patterns.
Q: What are the recommended use cases?
After task-specific fine-tuning, the model excels at social media text analysis tasks including sentiment analysis, named entity recognition, part-of-speech tagging, and irony detection, making it well suited to Twitter-focused NLP applications; a fine-tuning setup is sketched below.
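As an illustrative (not authoritative) setup for one of these use cases, the sketch below wires BERTweet into a sequence-classification head for sentiment analysis. The head is randomly initialized, so labeled tweets and a training loop (omitted here) are still required; the three-class label scheme and the example tweets are assumptions for illustration.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical fine-tuning setup for tweet sentiment (3 classes assumed:
# negative / neutral / positive). The classification head is randomly
# initialized and must be trained before its predictions are meaningful.
tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base", normalization=True)
model = AutoModelForSequenceClassification.from_pretrained(
    "vinai/bertweet-base", num_labels=3
)

# BERTweet was pre-trained with a maximum sequence length of 128 tokens.
batch = tokenizer(
    ["I love this launch!", "worst update ever @user"],
    padding=True, truncation=True, max_length=128, return_tensors="pt",
)
logits = model(**batch).logits
print(logits.shape)  # torch.Size([2, 3]): one row of class scores per tweet
```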