LLaMA-Mesh
| Property | Value |
|---|---|
| Parameter Count | 8.03B |
| License | LLaMA 3.1 |
| Paper | Link |
| Architecture | Transformer (LLaMA 3.1) |
| Training Data | Objaverse (30k curated mesh samples) |
What is LLaMA-Mesh?
LLaMA-Mesh is a groundbreaking model that bridges the gap between language models and 3D mesh generation. Built on the LLaMA 3.1 architecture, it uniquely represents 3D mesh data (vertices and faces) as plain text, enabling seamless integration with language model capabilities. The model was developed by researchers from NVIDIA and various institutions to enable conversational 3D generation and mesh understanding.
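To make the representation concrete, here is a minimal sketch of how a mesh's vertices and faces could be serialized into the OBJ-style plain text the model reads and writes. The 64-bin integer quantization and exact formatting below are illustrative assumptions; refer to the paper for the scheme LLaMA-Mesh actually uses.

```python
# Sketch: turn a mesh (vertices + faces) into OBJ-style plain text.
# The 64-bin integer quantization is an illustrative assumption, not
# necessarily the exact scheme used by LLaMA-Mesh.

import numpy as np

def mesh_to_text(vertices: np.ndarray, faces: np.ndarray, bins: int = 64) -> str:
    """vertices: (V, 3) floats; faces: (F, 3) 0-based vertex indices."""
    # Normalize coordinates to [0, 1], then quantize to integer bins so each
    # coordinate becomes a short, token-friendly string.
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    quant = np.round((vertices - lo) / np.maximum(hi - lo, 1e-8) * bins).astype(int)

    lines = [f"v {x} {y} {z}" for x, y, z in quant]
    # OBJ face indices are 1-based.
    lines += [f"f {a + 1} {b + 1} {c + 1}" for a, b, c in faces]
    return "\n".join(lines)

# Example: a single triangle.
verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
tris = np.array([[0, 1, 2]])
print(mesh_to_text(verts, tris))
```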
Implementation Details
The model uses BF16 weights and processes text and 3D mesh data through a single unified text format. It is trained on a carefully curated set of 30k mesh samples from Objaverse, filtered to shapes with fewer than 500 faces. Training ran on 32 GPUs and focused on adding 3D mesh generation while preserving strong text-generation capabilities.
- 8B-parameter model based on LLaMA 3.1
- Supports context lengths of up to 8k tokens
- Trained on a filtered Objaverse dataset (see the filtering sketch after this list)
- Represents mesh vertices and faces directly as text tokens
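As a rough illustration of the filtering described above, the sketch below keeps only meshes with fewer than 500 faces whose serialized text fits the 8k-token context. `trimesh` is one possible loader and `tokenizer` stands for any Hugging Face tokenizer; neither is confirmed by the source, and `mesh_to_text` is the helper from the earlier sketch.

```python
# Sketch of the assumed dataset-filtering step: fewer than 500 faces and a
# serialized form that fits the 8k-token context window.

import trimesh

MAX_FACES = 500      # keep shapes with fewer than 500 faces
MAX_TOKENS = 8192    # 8k-token context window

def keep_mesh(path: str, tokenizer) -> bool:
    """Return True if the mesh at `path` passes the assumed filtering rules."""
    mesh = trimesh.load(path, force="mesh")
    if len(mesh.faces) >= MAX_FACES:
        return False
    # Serialize with the helper from the earlier sketch and check the token budget.
    text = mesh_to_text(mesh.vertices, mesh.faces)
    return len(tokenizer.encode(text)) <= MAX_TOKENS
```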
Core Capabilities
- Generate 3D meshes from text prompts (see the usage sketch after this list)
- Produce interleaved text and 3D mesh outputs
- Understand and interpret 3D meshes
- Maintain strong text generation performance
- Enable conversational 3D generation
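These capabilities can be exercised through standard Hugging Face tooling. The snippet below is a minimal sketch: the repository name "Zhengyi/LLaMA-Mesh", the prompt wording, and the generation settings are assumptions, so check the official model card for the exact repository and chat template.

```python
# Sketch of conversational mesh generation with Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Zhengyi/LLaMA-Mesh"  # assumed repository name; verify on the model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Ask for a mesh in conversation; the reply interleaves text with OBJ-style lines.
messages = [{"role": "user", "content": "Create a 3D model of a simple wooden chair."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=4096, do_sample=True, temperature=0.9)
text = tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
print(text)

# Extract the "v ..." / "f ..." lines and save them as a viewable .obj file.
obj_lines = [ln for ln in text.splitlines() if ln.startswith(("v ", "f "))]
with open("generated_mesh.obj", "w") as fh:
    fh.write("\n".join(obj_lines))
```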
Frequently Asked Questions
Q: What makes this model unique?
LLaMA-Mesh is the first model to unify 3D mesh generation with large language models by representing mesh data as plain text, letting it draw on the spatial knowledge already embedded in the LLM's pretraining while preserving its text-generation capabilities.
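For the mesh-understanding direction, a prompt can simply include the serialized mesh text. This short sketch reuses `model`, `tokenizer`, `mesh_to_text`, `verts`, and `tris` from the earlier sketches; the prompt wording is an assumption, not an official template.

```python
# Sketch: ask the model to interpret a mesh given as OBJ-style text.
obj_text = mesh_to_text(verts, tris)  # any mesh serialized as in the first sketch
messages = [{
    "role": "user",
    "content": "What object does this 3D mesh represent?\n" + obj_text,
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
answer = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(answer[0][inputs.shape[-1]:], skip_special_tokens=True))
```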
Q: What are the recommended use cases?
The model is ideal for applications requiring 3D content generation from text descriptions, interactive 3D modeling through conversation, and tasks involving both text and 3D mesh understanding. It's particularly useful in creative tools, design applications, and research contexts.