Standard deterministic deep networks often fail or become overconfident under partial observability, occlusions, and object symmetries. We propose SE(3)-PoseFlow, a probabilistic framework that leverages flow matching on the SE(3) manifold to estimate full 6D object pose distributions. Unlike methods that regress a single pose estimate, our approach models multi-modal hypotheses, enabling robots to reason about uncertainty for safer and more reliable manipulation in the real world.
Given an RGB-D input, we extract object-centric RGB crops and partial point clouds using off-the-shelf detectors. These visual and geometric features, together with the timestep and sampled poses, are encoded and fused via DiT blocks with masked cross-attention to predict conditional velocity fields for SE(3) flow matching. This process generates a sample-based estimate of the pose distribution that naturally captures uncertainty in ambiguous cases such as symmetry or severe occlusion. To resolve remaining ambiguity, the framework supports two complementary pose selection strategies: a model-free clustering approach (DBSCAN) and model-based geometric scoring (using a signed distance field (SDF) or Chamfer distance).
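The model-based selection strategy described above can be sketched as Chamfer-distance scoring over sampled pose hypotheses: each hypothesis transforms the object model points, and the hypothesis whose transformed model best matches the observed partial point cloud wins. The function names and this numpy-only implementation are illustrative assumptions, not the paper's actual code:

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N,3) and b (M,3)."""
    # Pairwise Euclidean distances via broadcasting: (N, M)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Average nearest-neighbor distance in both directions.
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def select_pose(hypotheses, model_pts, observed_pts):
    """Pick the pose hypothesis (R, t) minimizing Chamfer distance
    between the transformed object model and the observed points."""
    scores = []
    for R, t in hypotheses:
        transformed = model_pts @ R.T + t  # apply rigid transform
        scores.append(chamfer_distance(transformed, observed_pts))
    return int(np.argmin(scores))
```

In the sample-based setting, the hypotheses would be the poses drawn from the learned flow; for symmetric objects several hypotheses can score near-identically, which is exactly the ambiguity the clustering variant (DBSCAN over the sampled poses) is meant to expose rather than collapse.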
@article{jin2025se,
title={SE(3)-PoseFlow: Estimating 6D Pose Distributions for Uncertainty-Aware Robotic Manipulation},
author={Jin, Yufeng and Funk, Niklas and Prasad, Vignesh and Li, Zechu and Franzius, Mathias and Peters, Jan and Chalvatzaki, Georgia},
journal={arXiv preprint arXiv:2511.01501},
year={2025}
}