Note that this model also works well at higher resolutions (in contrast with the MiDaS family), since the depth head is convolutional and the DINO v2 base can interpolate its positional encoding to generate higher-resolution features: https://x.com/drawthingsapp/status/1757486115802726744
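
To make the positional-encoding part concrete, here is a rough sketch of resizing a learned grid of patch position embeddings to a larger grid, so a ViT-style backbone can accept more patches than it was trained on. The function name `resize_pos_embed` and the nested-list layout are my own illustration, not the model's actual code, and I use plain bilinear interpolation for brevity; the real implementation may use a different kernel.

```python
from typing import List

# Hypothetical layout: grid[row][col] is one embedding vector (list of floats).
Grid = List[List[List[float]]]


def resize_pos_embed(grid: Grid, new_h: int, new_w: int) -> Grid:
    """Bilinearly resample a (H, W, C) position-embedding grid to (new_h, new_w, C).

    Assumption: corner embeddings stay fixed (corner-aligned sampling).
    """
    old_h, old_w = len(grid), len(grid[0])
    channels = len(grid[0][0])
    out: Grid = []
    for i in range(new_h):
        # Fractional source row for output row i.
        y = i * (old_h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, old_h - 1)
        fy = y - y0
        row = []
        for j in range(new_w):
            # Fractional source column for output column j.
            x = j * (old_w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, old_w - 1)
            fx = x - x0
            vec = []
            for c in range(channels):
                # Blend along x on the two neighbouring rows, then along y.
                top = grid[y0][x0][c] * (1 - fx) + grid[y0][x1][c] * fx
                bottom = grid[y1][x0][c] * (1 - fx) + grid[y1][x1][c] * fx
                vec.append(top * (1 - fy) + bottom * fy)
            row.append(vec)
        out.append(row)
    return out


# Toy example: a 2x2 grid of 1-dimensional embeddings upsampled to 3x3.
small = [[[0.0], [1.0]], [[2.0], [3.0]]]
large = resize_pos_embed(small, 3, 3)
```

Because the new embeddings are only blends of the trained ones, the backbone sees a smooth extension of the positions it already knows, and a convolutional head does not care about the spatial size of the feature map it receives.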