Standard deterministic deep networks often fail or become overconfident under partial observability, occlusions, and object symmetries. We propose SE(3)-PoseFlow, a probabilistic framework that leverages flow matching on the SE(3) manifold to estimate full 6D object pose distributions. Unlike methods that regress a single pose estimate, our approach models multi-modal hypotheses, enabling robots to reason about uncertainty for safer and more reliable manipulation in the real world.
Given an RGB-D input, we extract object-centric RGB crops and partial point clouds using off-the-shelf detectors. These visual and geometric features, together with the timestep and sampled poses, are encoded and fused via DiT blocks with masked cross-attention to predict conditional velocity fields for SE(3) flow matching. This process generates a sample-based estimate of the pose distribution that naturally captures uncertainty in ambiguous cases such as symmetry or severe occlusion. To resolve remaining ambiguity, the framework supports two complementary pose selection strategies: a model-free clustering approach (DBSCAN) and model-based geometric scoring (using a signed distance field (SDF) or Chamfer distance).
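The model-based selection strategy described above can be sketched as Chamfer-distance scoring over sampled pose hypotheses: each hypothesis transforms the object model points, and the hypothesis whose transformed model best matches the observed partial point cloud wins. The function names and this numpy-only implementation are illustrative assumptions, not the paper's actual code:

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N,3) and b (M,3)."""
    # Pairwise Euclidean distances via broadcasting: (N, M)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Average nearest-neighbor distance in both directions.
    return d.min(axis=1).mean() + d.min(axis=0).mean()

def select_pose(hypotheses, model_pts, observed_pts):
    """Pick the pose hypothesis (R, t) minimizing Chamfer distance
    between the transformed object model and the observed points."""
    scores = []
    for R, t in hypotheses:
        transformed = model_pts @ R.T + t  # apply rigid transform
        scores.append(chamfer_distance(transformed, observed_pts))
    return int(np.argmin(scores))
```

In the sample-based setting, the hypotheses would be the poses drawn from the learned flow; for symmetric objects several hypotheses can score near-identically, which is exactly the ambiguity the clustering variant (DBSCAN over the sampled poses) is meant to expose rather than collapse.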
@article{jin2025se,
title={SE(3)-PoseFlow: Estimating 6D Pose Distributions for Uncertainty-Aware Robotic Manipulation},
author={Jin, Yufeng and Funk, Niklas and Prasad, Vignesh and Li, Zechu and Franzius, Mathias and Peters, Jan and Chalvatzaki, Georgia},
journal={arXiv preprint arXiv:2511.01501},
year={2025}
}