Maintained by vinai

BERTweet-base

License: MIT
Author: VINAI
Downloads: 81,578
Framework: PyTorch, TensorFlow

What is bertweet-base?

BERTweet-base is the first public large-scale language model pre-trained specifically for English Tweets. It follows the RoBERTa pre-training procedure and was trained on a corpus of 850M English Tweets: 845M general Tweets streamed from 2012 to 2019, plus 5M Tweets related to the COVID-19 pandemic.

Implementation Details

The model is built on the RoBERTa architecture and was trained on approximately 16B word tokens, equivalent to about 80GB of text data. Pre-trained weights are available for both PyTorch and TensorFlow, so the model can be used in either framework.

  • Pre-trained on 850M English Tweets
  • Implements RoBERTa architecture
  • Supports multiple frameworks
  • Includes COVID-19 specific data
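Because the model was trained on normalized Tweets, input text is typically normalized the same way before tokenization. The sketch below illustrates the two substitutions the authors describe, converting user mentions to @USER and web links to HTTPURL; the released pipeline also tokenizes with NLTK's TweetTokenizer and translates emoji to text strings, which are omitted here, and the regexes are simplified illustrations rather than the exact patterns used.

```python
import re

def normalize_tweet(text: str) -> str:
    """Rough sketch of BERTweet-style Tweet normalization:
    user mentions become @USER and URLs become HTTPURL."""
    text = re.sub(r"@\w+", "@USER", text)          # @jack -> @USER
    text = re.sub(r"https?://\S+", "HTTPURL", text)  # links -> HTTPURL
    return text

print(normalize_tweet("@jack check this out https://t.co/abc123 :)"))
# → @USER check this out HTTPURL :)
```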

Core Capabilities

  • Part-of-Speech Tagging
  • Named Entity Recognition
  • Sentiment Analysis
  • Irony Detection
  • Fill-Mask Task Support
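As a quick illustration of the fill-mask capability, the model can be loaded through the Hugging Face Transformers fill-mask pipeline under the `vinai/bertweet-base` identifier (this sketch assumes the `transformers` library is installed and the model weights can be downloaded; like RoBERTa, BERTweet uses `<mask>` as its mask token).

```python
from transformers import pipeline

# Load BERTweet-base for masked-token prediction.
fill_mask = pipeline("fill-mask", model="vinai/bertweet-base")

# Print the top predicted tokens for the masked position.
for pred in fill_mask("That is the <mask> movie I have ever seen"):
    print(pred["token_str"], round(pred["score"], 3))
```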

Frequently Asked Questions

Q: What makes this model unique?

BERTweet is the first public large-scale language model pre-trained specifically for Twitter content. Its training data combines general Tweets with COVID-19-related Tweets, giving it broad coverage of social media language patterns.

Q: What are the recommended use cases?

The model excels in social media text analysis tasks including sentiment analysis, named entity recognition, part-of-speech tagging, and irony detection, making it ideal for Twitter-focused NLP applications.
