CEA-IMSOLD: a Dataset for Multi-Scale Object Localization in Industry

Abstract

We introduce the CEA Industrial Multi-Scale Object Localization Dataset (CEA-IMSOLD), a new BOP format dataset for 6-DoF object localization, crucial for robotics. This dataset aims to evaluate the current localization methods with respect to a new difficulty: large variations in observation distance and, consequently, large variations in image appearance. Compared to the other publicly available datasets, our dataset provides both images with objects small and completely visible in the image, and images where objects are observed close enough so they appear larger than the field of view of the cam- era. We also propose to consider the observation distance in the evaluation process and introduce new metrics to do so. Finally, our dataset contains a large variety of industrial objects, from small and simple objects such as bolts to sizable and complex ones such as large car parts.

Dataset overview


Our dataset covers the following aspectcs, until now unconsidered:

  1. Camera to object distance variation: as illustrated above, our dataset includes large observation distance variations while this variation is limited in BOP challenge datasets. This makes this suitable for use cases such as: inspection, augmented reality, mobile robotics, etc.
  2. Depth sensor quality variety: our dataset includes depth images provided by both high-end and low-cost sensors, and covers both industrial and consumer applications.
  3. Object complexity variation: our dataset includes both simple (e.g. bottle cap), and complex objects (e.g. car engine), covering use cases such as pick-and-place and quality inspection.
  4. High accuracy ground-truth: the annotation process based on robotic arm and high-end depth sensors allows to reach high accuracy ground-truth.
  5. A new metric taking into account distance to objects in performance evaluation.



IMSOLD statistics



Objects and 3D models

The dataset is composed of 25 objects with different characteristics in terms of surface (reflecting vs. lambertian), symmetries (with vs. without full rotational symmetry), complexity (primitive shapes vs. complex objects), flatness (flat vs. voluminous), details (with vs. without very fine details on their surface), compactness (long vs. compact), and size (diameters from 23 mm to 1278 mm).

PVC elbow (diameter: 91 mm)
Hinge 4 (diameter: 128 mm)
Cylinder 133 63 (diameter: 146 mm)
Sprayer (diameter: 115 mm)
Metallic tee (diameter: 74 mm)
Hinge 1 (diameter: 158 mm)
Gear (small) (diameter: 66 mm)
Car door (diameter: 1278 mm)
Engine (diameter: 904 mm)
Cylinder head (diameter: 473 mm)


Each scene was acquired with a calibrated setup consisting of an industrial grade structured light RGB-D sensor (Zivid 2, with spatial resolution of 0.39mm at 700mm), a consumer grade active stereo RGB-D sensor (Realsense D415) and an industrial RGB camera (FLIR camera with resolution 2048x1536), all the three rigidly mounted on a 6 DOF robotic manipulator (UR10e from Universal Robots, with a position repeatability of 0.05 mm). Each scene was acquired from multiple viewpoints and with large observation distance variations.



Ground-truth annotations

CEA-IMSOLD proposes a variety of challenges based on object geometry, material, appearance and scale variation. Below we show some examples of ground thruth annotations for some of the objects in the dataset


Industrial grade RGB-D sensor


Consumer grade RGB-D sensor


Industrial RGB camera


Citation

Contact

For more information contact boris.meden[at]cea.fr

Acknowledgements

The website template was borrowed from RING-NeRF, Ref-NeRF and nerfies.