legal-bert-base-uncased

Maintained by: nlpaueb

Property         Value
Architecture     BERT-Base (12-layer, 768-hidden, 12-heads)
Parameters       110M
License          CC-BY-SA-4.0
Author           AUEB NLP Group
Training Data    12 GB of legal texts

What is legal-bert-base-uncased?

Legal-BERT is a specialized BERT model pre-trained on a diverse collection of legal texts and designed for legal NLP applications. The model was trained on 12 GB of legal documents, comprising legislation, court cases, and contracts from jurisdictions including the EU, UK, and USA.

Implementation Details

The model follows the BERT-base architecture with 12 transformer layers, 768 hidden dimensions, and 12 attention heads. It was trained for 1 million steps with a batch size of 256 and a sequence length of 512 on a Google Cloud TPU v3-8 (see the loading sketch after the list below).

  • Pre-trained on 6 different types of legal documents including EU legislation, UK legislation, and US contracts
  • Implements the standard BERT masked language modeling objective
  • Achieves superior performance on legal domain tasks compared to generic BERT
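A minimal sketch of loading the model and checking the architecture details above, assuming the Hugging Face transformers library and the Hub ID nlpaueb/legal-bert-base-uncased (consistent with the maintainer and model name on this page):

```python
# Minimal sketch: load the model and inspect its configuration.
# Assumes the Hub ID nlpaueb/legal-bert-base-uncased; requires transformers and torch.
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nlpaueb/legal-bert-base-uncased")
model = AutoModel.from_pretrained("nlpaueb/legal-bert-base-uncased")

# These values should match the BERT-base architecture described above.
print(model.config.num_hidden_layers)    # 12 transformer layers
print(model.config.hidden_size)          # 768 hidden dimensions
print(model.config.num_attention_heads)  # 12 attention heads
```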

Core Capabilities

  • Legal document classification and analysis
  • Legal named entity recognition
  • Legal text completion and understanding (see the fill-mask sketch after this list)
  • Case law analysis and prediction
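Because the model was trained with the masked language modeling objective, text completion can be demonstrated with the standard transformers fill-mask pipeline. The example sentence below is illustrative, not taken from the model card:

```python
# Illustrative sketch of legal text completion via masked-token prediction.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="nlpaueb/legal-bert-base-uncased")

# Print the top predicted tokens for the masked position with their scores.
for pred in fill_mask("The parties agree to resolve any dispute through binding [MASK]."):
    print(f"{pred['token_str']:>15}  {pred['score']:.3f}")
```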

Frequently Asked Questions

Q: What makes this model unique?

Legal-BERT is trained specifically on legal texts, making it more accurate for legal domain tasks than general-purpose language models. The authors also provide specialized variants for legal subdomains such as contracts and EU law.

Q: What are the recommended use cases?

The model is ideal for legal text analysis, contract review, compliance checking, legal research, and other legal NLP applications. It performs particularly well on tasks involving legal terminology and concepts.
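For tasks such as document classification or contract review, the encoder is typically fine-tuned with a task-specific head. A hypothetical setup is sketched below; the label count and example sentence are placeholders rather than part of the model card:

```python
# Hypothetical fine-tuning setup for legal document classification.
# The number of labels and the example text are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("nlpaueb/legal-bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "nlpaueb/legal-bert-base-uncased",
    num_labels=3,  # e.g. contract / legislation / case law (placeholder labels)
)

inputs = tokenizer(
    "This Agreement shall be governed by the laws of the State of Delaware.",
    return_tensors="pt",
    truncation=True,
    max_length=512,  # matches the pretraining sequence length
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))  # predicted class index (meaningful only after fine-tuning)
```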
