Focal Length and Object Pose Estimation via Render and Compare

CVPR 2022
¹LIGM, École des Ponts, Univ Gustave Eiffel, CNRS · ²CIIRC CTU · ³ENS/Inria · ⁴Adobe Research

Given a single input photograph (left) and a known 3D model, our approach accurately estimates the 6D camera-object pose together with the focal length of the camera (right), shown here by overlaying the aligned 3D model on the input image. Our approach handles a large range of focal lengths and the resulting perspective effects.

Abstract

We introduce FocalPose, a neural render-and-compare method for jointly estimating the camera-object 6D pose and camera focal length given a single RGB input image depicting a known object. The contributions of this work are twofold. First, we derive a focal length update rule that extends an existing state-of-the-art render-and-compare 6D pose estimator to address the joint estimation task. Second, we investigate several different loss functions for jointly estimating the object pose and focal length. We find that a combination of direct focal length regression with a reprojection loss disentangling the contribution of translation, rotation, and focal length leads to improved results. We show results on three challenging benchmark datasets that depict known 3D models in uncontrolled settings. We demonstrate that our focal length and 6D pose estimates have lower error than the existing state-of-the-art methods.
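To make the loss design concrete, below is a minimal PyTorch-style sketch of such a combined objective. It is an illustration under stated assumptions, not the paper's exact formulation: the function names, the choice of an L1 reprojection error, the log-space focal length regression, and the weight `alpha` are all hypothetical; only the overall structure (per-quantity disentangled reprojection terms plus a direct focal length regression term) follows the description in the abstract.

```python
import torch

def reproject(points, R, t, f):
    """Pinhole projection of 3D model points (N, 3) given rotation R (3, 3),
    translation t (3,), and focal length f (principal point at the origin)."""
    X = points @ R.T + t
    return f * X[:, :2] / X[:, 2:3]

def focalpose_style_loss(points, pred, gt, alpha=1.0):
    """Hypothetical combined objective: each reprojection term swaps in a
    single predicted quantity (rotation, translation, or focal length) while
    keeping the others at their ground-truth values, plus a direct log-space
    focal length regression term. Names and weighting are assumptions.
    f_p and f_gt are scalar tensors."""
    R_p, t_p, f_p = pred
    R_gt, t_gt, f_gt = gt
    uv_gt = reproject(points, R_gt, t_gt, f_gt)  # ground-truth 2D projections
    loss_rot = (reproject(points, R_p, t_gt, f_gt) - uv_gt).abs().mean()
    loss_trans = (reproject(points, R_gt, t_p, f_gt) - uv_gt).abs().mean()
    loss_focal = (reproject(points, R_gt, t_gt, f_p) - uv_gt).abs().mean()
    loss_reg = (torch.log(f_p) - torch.log(f_gt)).abs()  # direct focal regression
    return loss_rot + loss_trans + loss_focal + alpha * loss_reg
```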

Approach Overview


FocalPose overview. Given a single in-the-wild RGB input image \(I\) of an object with known 3D model \(\mathcal{M}\), the parameters \(\theta^k\), composed of the focal length \(f^k\) and the object 6D pose (3D translation \(t^k\) and 3D rotation \(R^k\)), are iteratively updated by our render-and-compare approach. The rendered image of the model under the current estimate \(\theta^k\), together with the input image \(I\), is given to a deep neural network \(F\) that predicts an update \(\Delta\theta^k\), which is then converted into the refined parameters \(\theta^{k+1}\) using a non-linear update rule \(U\).
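The following Python sketch illustrates the iterative render-and-compare loop and one plausible form of the update rule \(U\). The `renderer` and `network` callables are placeholders, and the specific update equations (a CosyPose-style relative depth and rotation update combined with an exponential focal length update that keeps \(f > 0\)) are assumptions for illustration, not the exact rule derived in the paper.

```python
import numpy as np

def update_rule(theta, delta):
    """One plausible form of the non-linear update U (an assumption, not the
    paper's exact derivation): relative depth and rotation updates in the
    style of CosyPose, plus a multiplicative focal update that keeps f > 0."""
    R, (x, y, z), f = theta["R"], theta["t"], theta["f"]
    f_new = np.exp(delta["vf"]) * f                # exponential focal update
    z_new = delta["vz"] * z                        # relative depth update
    x_new = (delta["vx"] / f_new + x / z) * z_new  # image-plane translation
    y_new = (delta["vy"] / f_new + y / z) * z_new
    R_new = delta["vR"] @ R                        # left-multiplied rotation
    return {"R": R_new, "t": np.array([x_new, y_new, z_new]), "f": f_new}

def focalpose_refine(image, mesh, theta_init, network, renderer, n_iters=5):
    """Iterative render-and-compare loop: render under the current estimate
    theta^k, let the network F compare the rendering to the input image and
    predict an update, then apply U to obtain theta^{k+1}."""
    theta = theta_init
    for _ in range(n_iters):
        rendering = renderer(mesh, theta)   # rendered view under theta^k
        delta = network(image, rendering)   # predicted update Delta theta^k
        theta = update_rule(theta, delta)   # theta^{k+1} = U(theta^k, Delta theta^k)
    return theta
```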

BibTeX

@inproceedings{ponimatkin2022focal,
  title     = {{Focal Length and Object Pose Estimation via Render and Compare}},
  author    = {Ponimatkin, G. and Labb{\'e}, Y. and Russell, B. and Aubry, M. and Sivic, J.},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2022}
}

Acknowledgements

This work was partly supported by the European Regional Development Fund under the project IMPACT (reg. no. CZ.02.1.01/0.0/0.0/15_003/0000468), by the Ministry of Education, Youth and Sports of the Czech Republic through e-INFRA CZ (ID: 90140), and by the French government under the management of the Agence Nationale de la Recherche as part of the "Investissements d'avenir" program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute).