coreml-depth-anything-v2-small

Maintained by: apple

Depth Anything V2 Core ML

Property      Value
Parameters    24.8M
License       Apache-2.0
Paper         Link to Paper
Author        Apple

What is coreml-depth-anything-v2-small?

Depth Anything V2 is a state-of-the-art depth estimation model optimized for Apple devices using Core ML. It employs the DPT architecture with a DINOv2 backbone, trained on an extensive dataset of 600K synthetic labeled images and 62 million real unlabeled images.

Implementation Details

The model comes in two variants: a Float32 version (99.2 MB) and a Float16 version (49.8 MB). Both maintain high accuracy, with the F32 version achieving an abs-rel error of 0.0072 and the F16 version 0.0089. The model is optimized to run on Apple's Neural Engine, with inference times ranging from 24.58 ms on an M3 Max to 33.90 ms on an iPhone 15 Pro Max.

  • Leverages DPT architecture with DINOv2 backbone
  • Trained on massive synthetic and real image datasets
  • Optimized for Apple's Neural Engine
  • Available in both F32 and F16 precision variants
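The precision variant and compute units are chosen when the model is loaded. Below is a minimal Swift sketch of loading the model with Core ML while allowing the Neural Engine; the file name `DepthAnythingV2SmallF16` is a placeholder for whichever compiled variant (F16 or F32) is bundled with the app.

```swift
import CoreML

// Minimal loading sketch. The compiled model file name is a placeholder for
// whichever precision variant (F16 or F32) ships with the app.
guard let modelURL = Bundle.main.url(forResource: "DepthAnythingV2SmallF16",
                                     withExtension: "mlmodelc") else {
    fatalError("Model not found in the app bundle")
}

let configuration = MLModelConfiguration()
configuration.computeUnits = .all   // allow CPU, GPU, and the Neural Engine

do {
    let depthModel = try MLModel(contentsOf: modelURL, configuration: configuration)
    print("Loaded model inputs: \(depthModel.modelDescription.inputDescriptionsByName.keys)")
} catch {
    print("Failed to load model: \(error)")
}
```

Switching between the F16 and F32 packages only changes the URL passed at load time; the rest of the pipeline stays the same.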

Core Capabilities

  • High-quality depth estimation from single images
  • Fast inference times on Apple devices
  • Support for both relative and absolute depth estimation
  • Efficient memory usage with F16 optimization option
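For single-image inference, the loaded model can be wrapped in a Vision request. The sketch below assumes the `depthModel` from the previous snippet and an image-typed depth output, in which case Vision returns the prediction as a `VNPixelBufferObservation`; the function and parameter names are illustrative.

```swift
import CoreML
import Vision

// Sketch: run the depth model on a single image and hand the resulting
// depth map (a CVPixelBuffer) to a completion closure.
func estimateDepth(with depthModel: MLModel,
                   imageURL: URL,
                   completion: @escaping (CVPixelBuffer) -> Void) throws {
    let visionModel = try VNCoreMLModel(for: depthModel)

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        // With an image-typed output, Vision surfaces the prediction as a
        // VNPixelBufferObservation holding the per-pixel depth map.
        if let observation = request.results?.first as? VNPixelBufferObservation {
            completion(observation.pixelBuffer)
        }
    }
    request.imageCropAndScaleOption = .scaleFill  // resize input to the model's resolution

    let handler = VNImageRequestHandler(url: imageURL)
    try handler.perform([request])
}
```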

Frequently Asked Questions

Q: What makes this model unique?

This model is distinguished by its Core ML optimization for Apple devices, delivering fast on-device inference while maintaining high accuracy in depth estimation. Its dual precision options (F16/F32) provide flexibility for different use cases.

Q: What are the recommended use cases?

The model is ideal for iOS and macOS applications requiring real-time depth estimation, including AR applications, computational photography, and 3D scene understanding. The F16 variant is particularly suitable for mobile devices where memory efficiency is crucial.
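For photography or AR-style previews, the relative depth map usually needs some value scaling before it is displayed. The sketch below is one illustrative way to turn the predicted pixel buffer into a displayable image with Core Image; the contrast value is an arbitrary example, since the actual value range depends on the converted package's output.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

// Sketch: turn the predicted depth pixel buffer into a displayable grayscale image.
// The contrast value is illustrative; relative depth values are not guaranteed
// to lie in a 0...1 range, so adjust to taste for your output format.
func previewImage(from depthMap: CVPixelBuffer) -> CGImage? {
    let ciImage = CIImage(cvPixelBuffer: depthMap)

    // Stretch contrast so nearer and farther regions are visually distinct.
    let filter = CIFilter.colorControls()
    filter.inputImage = ciImage
    filter.contrast = 1.5   // illustrative value

    let context = CIContext()
    guard let output = filter.outputImage else { return nil }
    return context.createCGImage(output, from: output.extent)
}
```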
