HTML-Pruner-Llama-1B

Maintained By
zstanjj


Property         Value
Parameter Count  1.24B
License          Apache 2.0
Paper            HtmlRAG Paper
Base Model       meta-llama/Llama-3.2-1B

What is HTML-Pruner-Llama-1B?

HTML-Pruner-Llama-1B is a specialized language model for efficient HTML content pruning in Retrieval-Augmented Generation (RAG) systems. It is a key component of the HtmlRAG framework, which uses HTML rather than plain text as the format for external knowledge in RAG pipelines. The model implements a two-step, block-tree-based HTML pruning approach that trims retrieved documents to fit the context window while preserving their semantic structure.

Implementation Details

The model operates in a two-step pruning process: HTML blocks are first scored and filtered with an embedding model, then further pruned by this generative model, which scores block paths within the HTML tree. It is built on the Llama-3.2-1B architecture and runs in BF16 precision, making it both efficient and effective for production deployments.

  • Implements Lossless HTML Cleaning for maintaining semantic integrity
  • Features Block-Tree-Based HTML pruning for optimal content selection
  • Supports flexible context window management
  • Includes built-in tokenization and processing capabilities
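To make the "Lossless HTML Cleaning" idea concrete, here is a minimal, standalone sketch (not the HtmlRAG implementation) of the kind of cleaning the framework describes: stripping scripts, styles, and attributes while keeping tag structure and visible text. The class and function names are illustrative, and it uses only Python's standard-library `html.parser`.

```python
from html.parser import HTMLParser

class LosslessCleaner(HTMLParser):
    """Strips <script>/<style> content and all attributes while
    preserving tag structure and visible text."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # >0 while inside a skipped subtree

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
            return
        if self.skip_depth == 0:
            self.out.append(f"<{tag}>")  # drop attributes

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth == 0:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.out.append(data.strip())

def clean_html(html: str) -> str:
    parser = LosslessCleaner()
    parser.feed(html)
    return "".join(parser.out)

# Example: scripts and attributes are removed, structure survives
print(clean_html("<div class='x'><script>var a=1;</script><p>Hello</p></div>"))
# → <div><p>Hello</p></div>
```

A real pipeline would also normalize whitespace and handle self-closing tags more carefully; this sketch only shows the structure-preserving intent of the cleaning step.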

Core Capabilities

  • Efficient HTML document processing and cleaning
  • Intelligent content ranking and selection
  • Integration with various embedding models
  • Support for custom tokenizer implementations
  • Competitive performance across multiple benchmark datasets
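The content ranking and selection step can be pictured as a budgeted pruning problem: given relevance scores for HTML blocks (in HtmlRAG these come from the embedding and generative models), keep the highest-scoring blocks that fit the context budget and preserve document order. The sketch below is a simplified greedy stand-in, with hypothetical names (`Block`, `prune_blocks`) and a crude whitespace token count in place of a real tokenizer.

```python
from dataclasses import dataclass

@dataclass
class Block:
    html: str     # HTML fragment for this block
    score: float  # relevance score, e.g. from an embedding model

def rough_token_count(text: str) -> int:
    # Crude whitespace proxy for a real tokenizer
    return len(text.split())

def prune_blocks(blocks, budget):
    """Greedy budgeted pruning: keep the highest-scoring blocks
    that fit the token budget, then restore document order."""
    kept, used = [], 0
    for idx, block in sorted(enumerate(blocks), key=lambda x: -x[1].score):
        cost = rough_token_count(block.html)
        if used + cost <= budget:
            kept.append((idx, block))
            used += cost
    return [block.html for idx, block in sorted(kept)]

docs = [
    Block("<p>intro text here</p>", 0.2),
    Block("<p>answer span</p>", 0.9),
    Block("<p>footer junk</p>", 0.1),
]
print(prune_blocks(docs, budget=5))
# → ['<p>intro text here</p>', '<p>answer span</p>']
```

The actual framework prunes over a block tree rather than a flat list, so selecting a block also constrains its ancestors and descendants; the greedy flat version above only conveys the score-under-budget trade-off.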

Frequently Asked Questions

Q: What makes this model unique?

This model is specifically designed for HTML processing in RAG systems, offering a novel approach to content pruning while maintaining HTML structure integrity. It achieves competitive results across various datasets including ASQA, HotpotQA, and NQ.

Q: What are the recommended use cases?

The model is ideal for RAG systems that need to process HTML content efficiently, particularly in applications requiring intelligent content selection and summarization while maintaining HTML structure. It's especially useful in scenarios where context length is a constraint.
