DiSCO-3D: Discovering and segmenting Sub-Concepts from Open-vocabulary queries in NeRF

🎉 DiSCO-3D has been accepted to ICCV'25! See you in Hawaii! 🎉

We introduce DiSCO-3D, the first method designed to solve the novel task of Open-Vocabulary Sub-Concepts Discovery.

Abstract

3D semantic segmentation provides high-level scene understanding for applications in robotics, autonomous systems, etc. Traditional methods adapt exclusively to either task-specific goals (open-vocabulary segmentation) or scene content (unsupervised semantic segmentation). We propose DiSCO-3D, the first method addressing the broader problem of 3D Open-Vocabulary Sub-concepts Discovery, which aims to provide a 3D semantic segmentation that adapts to both the scene and user queries. We build DiSCO-3D on Neural Fields representations, combining unsupervised segmentation with weak open-vocabulary guidance. Our evaluations demonstrate that DiSCO-3D achieves effective performance in Open-Vocabulary Sub-concepts Discovery and exhibits state-of-the-art results in the edge cases of both open-vocabulary and unsupervised segmentation.

What is Open-Vocabulary Sub-Concepts Discovery?

Open-Vocabulary Sub-Concepts Discovery (OV-SD) consists of performing semantic clustering on the objects relevant to a general user query. Unlike Open-Vocabulary Segmentation, where the user queries a single object, here the query is a general concept (such as "seatings" or "furniture"), and the goal is not only to segment the relevant objects but also to cluster them into semantic sub-concepts.
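To make the input and output of the task concrete, here is a minimal toy sketch in pure Python. It is not DiSCO-3D's method: it is a naive two-step illustration (keep the points relevant to the query, then cluster them) run on hand-made 3-D vectors that stand in for real feature-field embeddings. The function names, the relevance threshold, and the fixed number of sub-concepts `k` are all assumptions made for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def kmeans(points, k, iters=20):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center.
        assign = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return assign

def naive_ovsd(features, query, k, threshold=0.5):
    """Hypothetical baseline: label -1 = irrelevant, 0..k-1 = sub-concept id."""
    relevant = [i for i, f in enumerate(features) if cosine(f, query) > threshold]
    labels = [-1] * len(features)
    if not relevant:
        return labels
    assign = kmeans([features[i] for i in relevant], min(k, len(relevant)))
    for i, a in zip(relevant, assign):
        labels[i] = a
    return labels

# Hand-made stand-ins for per-point features (assumed, not real embeddings).
features = [
    [0.9, 0.4, 0.0], [0.8, 0.5, 0.0],  # two "chair" points
    [0.9, 0.0, 0.4], [0.8, 0.0, 0.5],  # two "sofa" points
    [0.0, 0.1, 1.0], [0.1, 0.0, 1.0],  # two "wall" points
]
query = [1.0, 0.0, 0.0]                # stand-in embedding for "seatings"

labels = naive_ovsd(features, query, k=2)
print(labels)  # [0, 0, 1, 1, -1, -1]
```

The wall points are marked irrelevant (`-1`), while the seating points are split into two sub-concepts. In a real scene, the relevance threshold and the number of sub-concepts are not known in advance, which is part of what makes the task hard.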

How does DiSCO-3D work?

To solve this novel task, we introduce DiSCO-3D. Built on Neural Fields representations, it plugs into existing feature fields (e.g. LeRF, OpenNeRF) without requiring any additional models or data.

DiSCO-3D is built on two main modules: Unsupervised Semantic Segmentation and Open-Vocabulary Segmentation. Rather than performing these two tasks successively, we design DiSCO-3D to perform them jointly, and we demonstrate that doing so significantly improves Open-Vocabulary Sub-Concepts Discovery (OV-SD) results.

Results

Open-Vocabulary Sub-Concepts Discovery

DiSCO-3D can perform OV-SD on multiple types of queries (different modalities, ...), scenes (outdoor/indoor) and feature fields (LeRF, OpenNeRF, ...). It can even handle multiple queries at once, whether they are disjoint, overlapping or nested.

Open-Vocabulary Segmentation

DiSCO-3D also performs well on the two edge cases of OV-SD, obtaining state-of-the-art performance on both. The first is Open-Vocabulary Segmentation, where we aim to find a single sub-concept per query.

Unsupervised Semantic Segmentation

The other edge case of OV-SD is Unsupervised Semantic Segmentation, where we consider the user query to be null and thus segment the whole scene.

Citation

Acknowledgements

This publication was made possible by the use of the CEA List FactoryIA supercomputer, financially supported by the Ile-de-France Regional Council.

The website template was borrowed from Michaël Gharbi, Ref-NeRF and nerfies.