coreml-depth-anything-v2-small

Maintained by: apple

Depth Anything V2 Core ML

Property      Value
Parameters    24.8M
License       Apache-2.0
Paper         Link to Paper
Author        Apple

What is coreml-depth-anything-v2-small?

Depth Anything V2 is a state-of-the-art depth estimation model optimized for Apple devices using Core ML. It employs the DPT architecture with a DINOv2 backbone, trained on an extensive dataset of 600K synthetic labeled images and 62 million real unlabeled images.

Implementation Details

The model comes in two variants: a Float32 version (99.2 MB) and a Float16 version (49.8 MB). Both maintain high accuracy, with the F32 version achieving an abs-rel error of 0.0072 and the F16 version 0.0089. The model is optimized to run on Apple's Neural Engine, with inference times ranging from 24.58 ms on an M3 Max to 33.90 ms on an iPhone 15 Pro Max.

  • Leverages DPT architecture with DINOv2 backbone
  • Trained on massive synthetic and real image datasets
  • Optimized for Apple's Neural Engine
  • Available in both F32 and F16 precision variants
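The precision variant and compute units are chosen when the model is loaded. Below is a minimal Swift sketch of loading the model with Core ML while allowing the Neural Engine; the file name `DepthAnythingV2SmallF16` is a placeholder for whichever compiled variant (F16 or F32) is bundled with the app.

```swift
import CoreML

// Minimal loading sketch. The compiled model file name is a placeholder for
// whichever precision variant (F16 or F32) ships with the app.
guard let modelURL = Bundle.main.url(forResource: "DepthAnythingV2SmallF16",
                                     withExtension: "mlmodelc") else {
    fatalError("Model not found in the app bundle")
}

let configuration = MLModelConfiguration()
configuration.computeUnits = .all   // allow CPU, GPU, and the Neural Engine

do {
    let depthModel = try MLModel(contentsOf: modelURL, configuration: configuration)
    print("Loaded model inputs: \(depthModel.modelDescription.inputDescriptionsByName.keys)")
} catch {
    print("Failed to load model: \(error)")
}
```

Switching between the F16 and F32 packages only changes the URL passed at load time; the rest of the pipeline stays the same.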

Core Capabilities

  • High-quality depth estimation from single images
  • Fast inference times on Apple devices
  • Support for both relative and absolute depth estimation
  • Efficient memory usage with F16 optimization option
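For single-image inference, the loaded model can be wrapped in a Vision request. The sketch below assumes the `depthModel` from the previous snippet and an image-typed depth output, in which case Vision returns the prediction as a `VNPixelBufferObservation`; the function and parameter names are illustrative.

```swift
import CoreML
import Vision

// Sketch: run the depth model on a single image and hand the resulting
// depth map (a CVPixelBuffer) to a completion closure.
func estimateDepth(with depthModel: MLModel,
                   imageURL: URL,
                   completion: @escaping (CVPixelBuffer) -> Void) throws {
    let visionModel = try VNCoreMLModel(for: depthModel)

    let request = VNCoreMLRequest(model: visionModel) { request, _ in
        // With an image-typed output, Vision surfaces the prediction as a
        // VNPixelBufferObservation holding the per-pixel depth map.
        if let observation = request.results?.first as? VNPixelBufferObservation {
            completion(observation.pixelBuffer)
        }
    }
    request.imageCropAndScaleOption = .scaleFill  // resize input to the model's resolution

    let handler = VNImageRequestHandler(url: imageURL)
    try handler.perform([request])
}
```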

Frequently Asked Questions

Q: What makes this model unique?

This model is distinguished by its Core ML optimization for Apple devices, delivering fast on-device inference while maintaining high accuracy in depth estimation. Its dual precision options (F16/F32) provide flexibility for different use cases.

Q: What are the recommended use cases?

The model is ideal for iOS and macOS applications requiring real-time depth estimation, including AR applications, computational photography, and 3D scene understanding. The F16 variant is particularly suitable for mobile devices where memory efficiency is crucial.
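For photography or AR-style previews, the relative depth map usually needs some value scaling before it is displayed. The sketch below is one illustrative way to turn the predicted pixel buffer into a displayable image with Core Image; the contrast value is an arbitrary example, since the actual value range depends on the converted package's output.

```swift
import CoreImage
import CoreImage.CIFilterBuiltins

// Sketch: turn the predicted depth pixel buffer into a displayable grayscale image.
// The contrast value is illustrative; relative depth values are not guaranteed
// to lie in a 0...1 range, so adjust to taste for your output format.
func previewImage(from depthMap: CVPixelBuffer) -> CGImage? {
    let ciImage = CIImage(cvPixelBuffer: depthMap)

    // Stretch contrast so nearer and farther regions are visually distinct.
    let filter = CIFilter.colorControls()
    filter.inputImage = ciImage
    filter.contrast = 1.5   // illustrative value

    let context = CIContext()
    guard let output = filter.outputImage else { return nil }
    return context.createCGImage(output, from: output.extent)
}
```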
