Remote Sensing, Vol. 15, Pages 2692: FusionPillars: A 3D Object Detection Network with Cross-Fusion and Self-Fusion
Remote Sensing doi: 10.3390/rs15102692
Authors: Jing Zhang, Da Xu, Yunsong Li, Liping Zhao, Rui Su
In the field of unmanned systems, cameras and LiDAR are important sensors that provide complementary information. However, effectively fusing data from two different modalities remains a major challenge. In this paper, inspired by the idea of deep fusion, we propose a one-stage, end-to-end network named FusionPillars to fuse multisensor data (namely LiDAR point clouds and camera images). It comprises three branches: a point-based branch, a voxel-based branch, and an image-based branch. We design two modules to enhance the voxel-wise features in the pseudo-image: the Set Abstraction Self (SAS) fusion module and the Pseudo View Cross (PVC) fusion module. For data from a single sensor, the SAS fusion module exploits the relationship between point-wise and voxel-wise features and self-fuses the point-based and voxel-based branches to enhance the spatial information of the pseudo-image. For data from two sensors, the PVC fusion module transforms the view of the images, introduces the RGB information as auxiliary information, and cross-fuses the pseudo-image with RGB images at different scales to supplement the pseudo-image with color information. Experimental results show that, compared with existing one-stage fusion networks, FusionPillars yields superior performance, with a considerable improvement in detection precision for small objects.
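The idea of a pseudo-image supplemented with color channels can be illustrated with a small sketch. This is not the authors' implementation: the function names `build_pseudo_image` and `cross_fuse`, the two hand-picked pillar features (point count and maximum height), and the assumption that the RGB values have already been transformed into the same bird's-eye-view grid are all hypothetical simplifications. Plain channel concatenation stands in here for the learned PVC fusion.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]   # (x, y, z) in metres
Grid = List[List[List[float]]]       # [row][col][channel]


def build_pseudo_image(points: List[Point],
                       x_range: Tuple[float, float],
                       y_range: Tuple[float, float],
                       pillar_size: float) -> Grid:
    """Scatter points into a bird's-eye-view grid of pillars.

    Each cell holds two illustrative channels: [point count, max height].
    A real network would learn these features instead.
    """
    nx = int(round((x_range[1] - x_range[0]) / pillar_size))
    ny = int(round((y_range[1] - y_range[0]) / pillar_size))
    grid: Grid = [[[0.0, 0.0] for _ in range(nx)] for _ in range(ny)]
    for x, y, z in points:
        # Drop points that fall outside the detection range.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        col = min(int((x - x_range[0]) / pillar_size), nx - 1)
        row = min(int((y - y_range[0]) / pillar_size), ny - 1)
        cell = grid[row][col]
        cell[1] = z if cell[0] == 0 else max(cell[1], z)
        cell[0] += 1.0
    return grid


def cross_fuse(pseudo_image: Grid, rgb_bev: Grid) -> Grid:
    """Concatenate the channels of two grids cell by cell.

    Assumes rgb_bev was already projected into the pseudo-image's view
    and has the same spatial shape (a simplifying assumption).
    """
    if len(pseudo_image) != len(rgb_bev) or len(pseudo_image[0]) != len(rgb_bev[0]):
        raise ValueError("grids must share the same spatial shape")
    return [[p_cell + c_cell for p_cell, c_cell in zip(p_row, c_row)]
            for p_row, c_row in zip(pseudo_image, rgb_bev)]


if __name__ == "__main__":
    pts = [(0.5, 0.5, 1.0), (0.6, 0.4, 2.0), (3.5, 1.5, -0.5), (10.0, 10.0, 0.0)]
    pseudo = build_pseudo_image(pts, (0.0, 4.0), (0.0, 2.0), 1.0)
    rgb = [[[0.1, 0.2, 0.3] for _ in range(4)] for _ in range(2)]
    fused = cross_fuse(pseudo, rgb)
    print(len(fused), len(fused[0]), len(fused[0][0]))
```

Each fused cell carries the geometric channels followed by the color channels, which is the sense in which the image branch supplements the pseudo-image; the multi-scale and learned aspects described in the abstract are omitted.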