BOP-Distrib: Revisiting 6D Pose Estimation Benchmarks for Better Evaluation under Visual Ambiguities
- Boris Meden¹
- Asma Brazi¹²
- Fabrice Mayran de Chamisso¹
- Steve Bourgeois¹
- Vincent Lepetit²

¹Université Paris-Saclay, CEA List, F-91120, Palaiseau, France
²LIGM, Ecole des Ponts, Univ Gustave Eiffel, CNRS, Marne-la-vallée, France
We provide, for the first time, 6D pose annotations in the form of a per-image object pose distribution. Current annotations in BOP [21] datasets are given as a single pose, shown here as a circle in the SO(3) representations. BOP also provides a symmetry pattern per object, from which a distribution can be computed (the colored points in SO(3)). Such a distribution, however, fails to cover many cases [35]. In this example, when only the core is visible (Case 1), the pose is fully ambiguous and should be represented by a continuous distribution in SO(3). When the sides of the head are visible (Case 2), ambiguities remain and the distribution consists of 6 modes. When the hole is visible (Case 3), the pose distribution should be concentrated around one non-ambiguous pose. Our method annotates scenes with per-image distributions that take partial occlusions into account, which allows us to evaluate a predicted pose properly. We show that using these distributions for evaluation significantly changes the ranking for the BOP challenge. Such ground truth distributions also become a key asset for evaluating pose distribution estimation methods [13, 23]. With appropriate metrics, we demonstrate the first quantitative evaluation of pose distribution methods on real images, as an extension to single pose methods.
Abstract
6D pose estimation aims at determining the object pose that best explains the camera observation. The unique solution for non-ambiguous objects can turn into a multi-modal pose distribution for symmetrical objects or when occlusions of symmetry-breaking elements happen, depending on the viewpoint. Currently, 6D pose estimation methods are benchmarked on datasets that consider, for their ground truth annotations, visual ambiguities as only related to global object symmetries, whereas they should be defined per-image to account for the camera viewpoint. We thus first propose an automatic method to re-annotate those datasets with a 6D pose distribution specific to each image, taking into account the object surface visibility in the image to correctly determine the visual ambiguities. Second, given this improved ground truth, we re-evaluate the state-of-the-art single pose methods and show that this greatly modifies the ranking of these methods. Third, as some recent works focus on estimating the complete set of solutions, we derive a precision/recall formulation to evaluate them against our image-wise distribution ground truth, making it the first benchmark for pose distribution methods on real images.
Per-image pose distribution annotation method
Metrics for evaluating pose distribution estimation methods
We propose an adaptation of Precision and Recall to pose distributions, to evaluate how accurate the estimated poses are (Precision) but also how well they cover the ground truth distribution (Recall). Poses are compared using registration errors such as MPD and MSD.
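To make the idea concrete, here is a minimal sketch of a set-based precision/recall between estimated and ground truth poses. It is an illustration under stated assumptions, not the exact formulation of the paper: both distributions are represented as finite sets of 4x4 poses, a pose counts as matched when its error to the nearest pose of the other set is below a threshold `tau`, and the hypothetical helper `max_surface_distance` is a simple stand-in for a registration error such as MSD (the paper's definition may differ). All function names are illustrative.

```python
import numpy as np


def max_surface_distance(pose_a, pose_b, points):
    """Largest 3D displacement of any model point between two 4x4 poses.

    Assumed stand-in for a registration error such as MSD.
    """
    pa = points @ pose_a[:3, :3].T + pose_a[:3, 3]
    pb = points @ pose_b[:3, :3].T + pose_b[:3, 3]
    return np.linalg.norm(pa - pb, axis=1).max()


def precision_recall(est_poses, gt_poses, points, tau):
    """Set-based precision and recall under a distance threshold tau."""
    # d[i, j] = error between estimated pose i and ground truth pose j
    d = np.array([[max_surface_distance(e, g, points) for g in gt_poses]
                  for e in est_poses])
    # Precision: share of estimates lying close to some ground truth pose
    precision = float((d.min(axis=1) <= tau).mean())
    # Recall: share of ground truth poses covered by some estimate
    recall = float((d.min(axis=0) <= tau).mean())
    return precision, recall


def rot_z(deg):
    """4x4 pose rotating by `deg` degrees about the z axis (toy helper)."""
    a = np.deg2rad(deg)
    pose = np.eye(4)
    pose[:2, :2] = [[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]]
    return pose


# Toy model points and a two-mode ground truth (0 and 180 degrees about z)
points = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
gt_poses = [rot_z(0), rot_z(180)]

# A single correct estimate is fully precise but covers only one of two modes
print(precision_recall([rot_z(0)], gt_poses, points, tau=0.1))  # (1.0, 0.5)
```

In this toy case a single-pose method reaches full precision while recall exposes the missed mode, which is the behaviour the two measures are meant to separate.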
Results
Pose distribution evaluations of SpyroPose and LiePose
We present here the first quantitative evaluation of pose distribution methods SpyroPose and LiePose on real data (T-LESS). The graphs below also incorporate the results from Corr2Distrib.