Pro3D-Editor: A Progressive-Views Perspective for Consistent and Precise 3D Editing

University of Science and Technology of China

Abstract

Text-guided 3D editing aims to precisely edit semantically relevant local 3D regions, and has significant potential for practical applications ranging from 3D games to film production. Existing methods typically follow a view-indiscriminate paradigm: they edit 2D views indiscriminately and project them back into 3D space. However, this paradigm overlooks the varying interdependencies across views, resulting in inconsistent multi-view edits. In this study, we argue that consistent 3D editing can be achieved through a progressive-views paradigm, which propagates editing semantics from the most editing-salient view to the remaining editing-sparse views. Specifically, we propose Pro3D-Editor, a novel framework comprising a Primary-view Sampler, a Key-view Render, and a Full-view Refiner. The Primary-view Sampler dynamically samples and edits the most editing-salient view as the primary view. The Key-view Render accurately propagates editing semantics from the primary view to the other key views through its Mixture-of-View-Experts Low-Rank Adaptation (MoVE-LoRA). The Full-view Refiner edits and refines the 3D object based on the edited multi-view images. Extensive experiments demonstrate that our method outperforms existing methods in editing accuracy and spatial consistency.

We introduce the progressive-views paradigm, a novel approach to 3D editing that propagates editing semantics from editing-salient views to editing-sparse views, thereby achieving precise, consistent, and high-quality 3D editing.

How does it work?

Pro3D-Editor constructs a hierarchical "primary-view → key-views → full-views" editing pipeline based on the dynamic editing salience across different views. Specifically, Pro3D-Editor consists of three successive modules (minimal illustrative sketches of each step follow this list):

(1) The Primary-view Sampler dynamically samples and edits the most editing-salient view as the primary view. It computes a salience score between each view and the editing signal, then linearly extrapolates that score away from the corresponding negative view to amplify accuracy.

(2) The Key-view Render takes the edited primary view as the anchor and propagates its editing semantics to the other key views. This is achieved through a novel Mixture-of-View-Experts Low-Rank Adaptation (MoVE-LoRA), which learns feature correspondences from the primary view to the remaining key views while blocking reverse learning to avoid conflicts.

(3) The Full-view Refiner repairs the numerous newly rendered views to refine the edited 3D result, fusing the editing information carried by the edited key views.
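To illustrate step (1), here is a minimal sketch of salience-based primary-view selection. The CLIP-style encoder `clip_model` (with `encode_images`/`encode_text` methods), the extrapolation weight `w`, and the use of a negative text prompt to stand in for the negative view are all illustrative assumptions, not the paper's exact formulation:

```python
# A minimal sketch of primary-view selection (illustrative; the paper's exact
# salience score and negative-view construction may differ).
import torch
import torch.nn.functional as F

@torch.no_grad()
def select_primary_view(view_images, edit_prompt, negative_prompt, clip_model, w=2.0):
    """Score each rendered view against the editing signal and return the
    index of the most editing-salient view (the primary view)."""
    # encode_images / encode_text are hypothetical encoder methods.
    img = F.normalize(clip_model.encode_images(view_images), dim=-1)  # (V, D)
    pos = F.normalize(clip_model.encode_text(edit_prompt), dim=-1)    # (D,)
    neg = F.normalize(clip_model.encode_text(negative_prompt), dim=-1)

    pos_score = img @ pos  # similarity to the editing signal
    neg_score = img @ neg  # similarity to the negative signal

    # Linear extrapolation away from the negative score amplifies the
    # contrast between editing-salient and editing-sparse views.
    salience = pos_score + w * (pos_score - neg_score)
    return int(salience.argmax())
```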
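For step (2), the MoVE-LoRA idea can be made concrete with a short PyTorch sketch. It assumes one low-rank expert per view, routed by view index, and models "blocking reverse learning" with a stop-gradient on the primary view's features; the paper's actual expert routing and gradient blocking may be wired differently inside the multi-view diffusion backbone:

```python
# A minimal sketch of Mixture-of-View-Experts LoRA (illustrative).
import torch
import torch.nn as nn

class MoVELoRALinear(nn.Module):
    """A frozen base projection plus one low-rank (A, B) expert per view."""
    def __init__(self, base: nn.Linear, rank: int = 4, num_views: int = 6):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # the base multi-view model stays frozen
        self.A = nn.Parameter(torch.randn(num_views, base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(num_views, rank, base.out_features))

    def forward(self, x: torch.Tensor, view_idx: int) -> torch.Tensor:
        # Route this view's tokens through its own low-rank expert.
        return self.base(x) + x @ self.A[view_idx] @ self.B[view_idx]

def propagate(tokens: torch.Tensor, primary_idx: int, layer: MoVELoRALinear):
    """tokens: (V, N, D) per-view features. The primary view's features are
    detached so key-view experts learn primary -> key correspondences while
    no gradient flows back into the primary view ("blocking reverse learning")."""
    out = []
    for v in range(tokens.shape[0]):
        feats = tokens[v].detach() if v == primary_idx else tokens[v]
        out.append(layer(feats, v))
    return torch.stack(out)
```

Keeping the base weights frozen and training only the per-view (A, B) pairs follows standard LoRA practice; the one-directional detach is our reading of how conflicts between views are avoided.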
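For step (3), the fusion of edited key views into the newly rendered views can be sketched as pose-aware weighting. The azimuth parameterization and softmax weighting below are illustrative assumptions, not the paper's exact fusion rule:

```python
# A minimal sketch of fusing edited key-view information into newly rendered
# views (illustrative; in the paper the fusion happens inside the refiner).
import torch

def fuse_key_views(render_azimuths, key_azimuths, key_feats, temperature=0.2):
    """For each newly rendered view, build guidance features as a
    softmax-weighted mixture of the edited key views, weighted by angular
    proximity on the view circle.

    render_azimuths: (R,) degrees; key_azimuths: (K,) degrees;
    key_feats: (K, D) features of the edited key views.
    """
    r = torch.deg2rad(torch.as_tensor(render_azimuths, dtype=torch.float32))
    k = torch.deg2rad(torch.as_tensor(key_azimuths, dtype=torch.float32))
    sim = torch.cos(r[:, None] - k[None, :])        # (R, K): 1 aligned, -1 opposite
    weights = torch.softmax(sim / temperature, -1)  # nearby key views dominate
    return weights @ key_feats                      # (R, D) per-render guidance
```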


Comparison

We provide extensive trajectory-video comparisons demonstrating the improvements of our method over existing methods in editing accuracy and consistency.

BibTeX

BibTex Code Here