Lijun Wang
July 1, 2018
By Eigen et al., NIPS 2014
$L_{\textrm{single}} + \Omega_{\textrm{set}}$
$D^* = \arg \min \limits_{D} \sum \limits_{p=1}^{N} \phi (D^p - D^p_{est}) + \alpha \sum \limits_{p=1}^{N} \left[ \phi (\nabla_x D^p - G_x^p) + \phi (\nabla_y D^p - G_y^p) \right]$
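A minimal sketch of evaluating this refinement objective, assuming $\phi$ is an L1 penalty and the spatial gradients are taken as forward differences (both unspecified on the slide; `alpha` is likewise a placeholder):

```python
import torch

def phi(x):
    # Robust penalty; unspecified on the slide, L1 used here for illustration.
    return x.abs()

def refine_objective(D, D_est, G_x, G_y, alpha=0.5):
    """Value of the objective for a candidate depth map D.

    D, D_est : (H, W) depth maps; G_x, G_y : (H, W) predicted gradient maps.
    """
    data = phi(D - D_est).sum()
    dx = D[:, 1:] - D[:, :-1]                 # forward difference along x
    dy = D[1:, :] - D[:-1, :]                 # forward difference along y
    grad = phi(dx - G_x[:, :-1]).sum() + phi(dy - G_y[:-1, :]).sum()
    return data + alpha * grad
```

$D^*$ can then be approached by plain gradient descent, e.g. starting from `D = D_est.clone().requires_grad_(True)` and minimizing with `torch.optim.Adam`.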
Dataset | Statistics | Annotation | Scene |
---|---|---|---|
NYUD-v2 | 1449 + 407K raw | Depth + Segmentation | Indoor |
KITTI | 94k frames | Depth aligned with raw data | Street |
Make3D | 500 low-resolution images | Depth | Outdoor |
SUNRGB-D | 10k images | Depth, Segmentation, 3D bounding box | Indoor |
Different training strategies:
By Chen et al., NIPS 2016
Increase scene diversity with internet images.
Humans are better at judging relative depth:
“Is point A closer than point B?”
A relative-depth dataset
Ranking Loss:
$L(I,R,z)=\sum \limits_k \psi(I, i_k, j_k, r_k, z)$
where the loss for the $k$-th query is:
$\psi(I, i_k, j_k, r_k, z) = \begin{cases} \log (1+\exp (-z_{i_k} + z_{j_k})), & \mbox{if } r_k=+1\\ \log (1+\exp (z_{i_k} - z_{j_k})), & \mbox{if } r_k=-1 \\ (z_{i_k} - z_{j_k})^2, & \mbox{if } r_k=0 \end{cases}$
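A direct sketch of this ranking loss (using `softplus(x) = log(1 + exp(x))` for numerical stability; tensor names are illustrative):

```python
import torch
import torch.nn.functional as F

def ranking_loss(z, i_idx, j_idx, r):
    """Ranking loss over K relative-depth queries (the psi cases above).

    z            : (N,) predicted depths at the sampled points.
    i_idx, j_idx : (K,) long tensors indexing each query's two points.
    r            : (K,) ordinal labels in {+1, -1, 0}.
    """
    diff = z[i_idx] - z[j_idx]
    closer  = F.softplus(-diff)   # log(1 + exp(-(z_i - z_j))), r = +1
    farther = F.softplus(diff)    # log(1 + exp(z_i - z_j)),    r = -1
    equal   = diff ** 2           # (z_i - z_j)^2,              r =  0
    return torch.where(r == 1, closer,
                       torch.where(r == -1, farther, equal)).sum()
```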
If $\ge 30\%$ valid depth ⇒ Euclidean loss
Otherwise ⇒ ordinal loss
Determine foreground with semantic info
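A sketch of this per-image loss selection; only the 30% rule is from the slide, while the variable names, the exact Euclidean form, and the ordinal term (reused from the ranking loss above, with the foreground point first in each pair) are assumptions:

```python
import torch
import torch.nn.functional as F

def depth_loss(pred, gt, valid_mask, ordinal_pairs, thresh=0.3):
    """Per-image supervision choice.

    pred, gt      : (H, W) depth maps; valid_mask : (H, W) bool.
    ordinal_pairs : (i_idx, j_idx) flat pixel indices, foreground point
                    first, derived from semantic segmentation.
    """
    if valid_mask.float().mean() >= thresh:
        # Enough valid depth: Euclidean loss on the valid pixels.
        return ((pred - gt)[valid_mask] ** 2).mean()
    # Too sparse: ordinal loss pushing the foreground closer (smaller depth).
    i_idx, j_idx = ordinal_pairs
    return F.softplus(pred.flatten()[i_idx] - pred.flatten()[j_idx]).mean()
```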
$L_{\textrm{grad}}=\frac{1}{n} \sum \limits_k \sum \limits_i \left( |\nabla_x R_i^k| + |\nabla_y R_i^k| \right)$
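A sketch of this multi-scale gradient-matching term, assuming $R$ is the log-depth residual between prediction and ground truth, the scales $k$ come from repeated 2x downsampling, and $n$ counts pixels; the slide leaves all three implicit:

```python
import torch
import torch.nn.functional as F

def gradient_loss(pred_log, gt_log, num_scales=4):
    """Multi-scale gradient matching on the log-depth residual.

    pred_log, gt_log : (1, 1, H, W) log-depth maps.
    """
    R = pred_log - gt_log
    loss, n = 0.0, 0
    for _ in range(num_scales):
        dx = R[..., :, 1:] - R[..., :, :-1]   # horizontal residual gradient
        dy = R[..., 1:, :] - R[..., :-1, :]   # vertical residual gradient
        loss = loss + dx.abs().sum() + dy.abs().sum()
        n += R[0, 0].numel()
        R = F.avg_pool2d(R, kernel_size=2)    # next, coarser scale
    return loss / n
```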
[5] Unsupervised Learning of Depth and Ego-Motion from Video, CVPR 2017
[6] GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose, CVPR 2018
[7] Learning Depth from Monocular Videos using Direct Methods, CVPR 2018
By Srinivasan et al., CVPR 2018
Image ⇒ Depth ⇒ Rendering function ⇒ Shallow DoF
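A toy version of this pipeline's rendering step: blur each depth layer in proportion to its distance from the focal plane, then composite. The bin count, Gaussian kernels, and linear blur model are all assumptions here, not the paper's rendering function:

```python
import torch
import torch.nn.functional as F

def gaussian_blur(x, sigma):
    # Separable Gaussian blur for a (1, C, H, W) tensor.
    radius = max(1, int(3 * sigma))
    coords = torch.arange(-radius, radius + 1, dtype=x.dtype)
    g = torch.exp(-coords ** 2 / (2 * sigma ** 2))
    g = g / g.sum()
    c = x.shape[1]
    kx = g.view(1, 1, 1, -1).expand(c, 1, 1, -1)
    ky = g.view(1, 1, -1, 1).expand(c, 1, -1, 1)
    x = F.conv2d(x, kx, padding=(0, radius), groups=c)
    return F.conv2d(x, ky, padding=(radius, 0), groups=c)

def render_shallow_dof(image, depth, focal_depth, num_bins=8, max_sigma=4.0):
    """image: (1, 3, H, W); depth: (1, 1, H, W) in [0, 1]; focal_depth: float.

    Hard binning keeps the sketch short but blocks gradients to depth;
    a soft bin assignment would be needed to actually train through it.
    """
    out = torch.zeros_like(image)
    for b in range(num_bins):
        lo, hi = b / num_bins, (b + 1) / num_bins
        upper = depth < hi if b < num_bins - 1 else depth <= hi
        mask = ((depth >= lo) & upper).float()
        sigma = max_sigma * abs((lo + hi) / 2 - focal_depth) + 1e-3
        out = out + gaussian_blur(image * mask, sigma)
    return out
```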
Training scenarios are limited, mainly to flowers.
[8] PAD-Net: Multi-Tasks Guided Prediction-and-Distillation Network for Simultaneous Depth Estimation and Scene Parsing, CVPR 2018
[9] AdaDepth: Unsupervised Content Congruent Adaptation for Depth Estimation, CVPR 2018
[10] Salience Guided Depth Calibration for Perceptually Optimized Compressive Light Field 3D Display, CVPR 2018