DiSCO-3D : Discovering and segmenting Sub-Concepts from Open-vocabulary queries in NeRF

Doriand Petit¹²
Steve Bourgeois¹
Vincent Gay-Bellile¹
Florian Chabot¹
Loïc Barthe²

🎉 DiSCO-3D has been accepted to ICCV'25! See you in Hawaii! 🎉

We introduce DiSCO-3D, the first method designed to solve the novel task of Open-Vocabulary Sub-Concepts Discovery.

Abstract

3D semantic segmentation provides high-level scene understanding for applications in robotics, autonomous systems, etc. Traditional methods adapt exclusively to either task-specific goals (open-vocabulary segmentation) or scene content (unsupervised semantic segmentation). We propose DiSCO-3D, the first method addressing the broader problem of 3D Open-Vocabulary Sub-concepts Discovery, which aims to provide a 3D semantic segmentation that adapts to both the scene and user queries. We build DiSCO-3D on Neural Fields representations, combining unsupervised segmentation with weak open-vocabulary guidance. Our evaluations demonstrate that DiSCO-3D achieves effective performance in Open-Vocabulary Sub-concepts Discovery and exhibits state-of-the-art results in the edge cases of both open-vocabulary and unsupervised segmentation.

What is Open-Vocabulary Sub-Concepts Discovery?

Open-Vocabulary Sub-Concepts Discovery (OV-SD) consists of performing semantic clustering on the objects relevant to a general user query. In Open-Vocabulary Segmentation, the user queries a single object. Here, by contrast, the query is a general concept (such as "seatings" or "furniture"), and the goal is not only to segment the relevant objects but also to cluster them into semantic sub-concepts.
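To make the input and output of the task concrete, here is a minimal sketch of a naive two-stage baseline: select the points relevant to the query, then cluster them. This is not DiSCO-3D, which performs both steps jointly. The function names, the cosine-similarity threshold, the k-means clustering and the toy 3-dimensional features are all assumptions made for illustration only.

```python
import numpy as np

def cosine_relevance(features, query):
    """Cosine similarity between each feature row and one query embedding."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return f @ q

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def naive_sub_concepts(features, query, threshold=0.5, k=2):
    """Hypothetical baseline: -1 for irrelevant points, 0..k-1 for sub-concepts."""
    relevant = cosine_relevance(features, query) > threshold
    labels = np.full(len(features), -1)
    if relevant.sum() >= k:
        labels[relevant] = kmeans(features[relevant], k)
    return labels

# Toy stand-in for per-point features sampled from a feature field (assumed data).
features = np.array([
    [1.0, 0.5, 0.0], [1.0, 0.6, 0.0],    # "chair"-like points
    [1.0, -0.5, 0.0], [1.0, -0.6, 0.0],  # "sofa"-like points
    [0.0, 0.0, 1.0], [0.1, 0.0, 1.0],    # "wall"-like points, unrelated to the query
])
query = np.array([1.0, 0.0, 0.0])        # stand-in embedding for a query such as "seatings"
labels = naive_sub_concepts(features, query, threshold=0.5, k=2)
```

In this toy example the two wall-like points receive the label -1, while the chair-like and sofa-like points end up in two different clusters. Note that the number of sub-concepts k is fixed by hand here, which is a simplification of this sketch.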

How does DiSCO-3D work?

To solve this novel task, we introduce DiSCO-3D. Built on Neural Fields representations, it plugs into traditional feature fields (e.g. LeRF, OpenNeRF) without any additional models or data.

DiSCO-3D is built on two main modules: Unsupervised Semantic Segmentation and Open-Vocabulary Segmentation. Rather than performing these two tasks successively, we design DiSCO-3D to perform them jointly and demonstrate that this significantly improves OV-SD results.

Results

Open-Vocabulary Sub-Concepts Discovery

DiSCO-3D can perform OV-SD on multiple types of queries (different modalities, ...), scenes (outdoor/indoor) and feature fields (LeRF, OpenNeRF, ...). It can even handle multiple queries at once, whether they are disjoint, overlapping or nested.
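The three relations between simultaneous queries can be illustrated on boolean relevance masks, one mask per query. The sketch below only shows what "disjoint", "overlapping" and "nested" mean for two such masks; the function name and the mask representation are assumptions, and it says nothing about how DiSCO-3D itself processes multiple queries.

```python
import numpy as np

def mask_relation(a, b):
    """Classify how two boolean relevance masks relate to each other."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    shared = np.logical_and(a, b).sum()
    if shared == 0:
        return "disjoint"      # no point is relevant to both queries
    if shared == a.sum() or shared == b.sum():
        return "nested"        # one query's points are contained in the other's
    return "overlapping"       # some, but not all, points are shared

# Toy masks over four points (assumed data).
seats = [1, 1, 0, 0]
tables = [0, 0, 1, 0]
furniture = [1, 1, 1, 0]
wooden = [0, 1, 1, 0]
```

With these toy masks, `seats` and `tables` are disjoint, `seats` is nested in `furniture`, and `seats` and `wooden` overlap.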

Open-Vocabulary Segmentation

DiSCO-3D also performs well on the two edge cases of OV-SD. The first is Open-Vocabulary Segmentation, where we aim to find a single sub-concept per query, and on which DiSCO-3D obtains state-of-the-art performance.

Unsupervised Semantic Segmentation

The other edge case of OV-SD is Unsupervised Semantic Segmentation, where we consider the user query to be null and thus segment the whole scene.
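Both edge cases can be read as special settings of the same relevance-then-cluster view of the task. The sketch below is a hypothetical illustration of that reading, not the DiSCO-3D implementation: the function names and the cosine-similarity threshold are assumptions. A null query marks every point as relevant, and Open-Vocabulary Segmentation corresponds to a single cluster over the relevant points.

```python
import numpy as np

def relevance_mask(features, query=None, threshold=0.5):
    """Boolean mask of points relevant to the query; a null query keeps everything."""
    if query is None:
        # Unsupervised edge case: no query, so the whole scene is to be segmented.
        return np.ones(len(features), dtype=bool)
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return (f @ q) > threshold

def single_sub_concept(features, query, threshold=0.5):
    """Open-vocabulary edge case: one cluster, label 0 if relevant, else -1."""
    return np.where(relevance_mask(features, query, threshold), 0, -1)

# Toy 2-dimensional features for two points (assumed data).
features = np.array([[1.0, 0.0], [0.0, 1.0]])
query = np.array([1.0, 0.0])

all_points = relevance_mask(features)            # null query
one_concept = single_sub_concept(features, query)
```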

Citation


@inproceedings{petit2025disco3d,
    title={DiSCO-3D : Discovering and segmenting Sub-Concepts from Open-vocabulary queries in NeRF},
    author={Doriand Petit and Steve Bourgeois and Vincent Gay-Bellile and Florian Chabot and Loïc Barthe},
    booktitle={International Conference on Computer Vision (ICCV)},
    year={2025},
}

Acknowledgements

This publication was made possible by the use of the CEA List FactoryIA supercomputer, financially supported by the Ile-de-France Regional Council.

The website template was borrowed from Michaël Gharbi, Ref-NeRF and nerfies.