Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chao Ma

Frame Fusion with Vehicle Motion Prediction for 3D Object Detection

Jun 19, 2023

Xirui Li, Feng Wang, Naiyan Wang, Chao Ma

Figure 1 for Frame Fusion with Vehicle Motion Prediction for 3D Object Detection

Figure 2 for Frame Fusion with Vehicle Motion Prediction for 3D Object Detection

Figure 3 for Frame Fusion with Vehicle Motion Prediction for 3D Object Detection

Figure 4 for Frame Fusion with Vehicle Motion Prediction for 3D Object Detection

Abstract:In LiDAR-based 3D detection, history point clouds contain rich temporal information helpful for future prediction. In the same way, history detections should contribute to future detections. In this paper, we propose a detection enhancement method, namely FrameFusion, which improves 3D object detection results by fusing history frames. In FrameFusion, we ''forward'' history frames to the current frame and apply weighted Non-Maximum-Suppression on dense bounding boxes to obtain a fused frame with merged boxes. To ''forward'' frames, we use vehicle motion models to estimate the future pose of the bounding boxes. However, the commonly used constant velocity model fails naturally on turning vehicles, so we explore two vehicle motion models to address this issue. On Waymo Open Dataset, our FrameFusion method consistently improves the performance of various 3D detectors by about $2$ vehicle level 2 APH with negligible latency and slightly enhances the performance of the temporal fusion method MPPNet. We also conduct extensive experiments on motion model selection.

Via

Access Paper or Ask Questions

Understanding Multi-phase Optimization Dynamics and Rich Nonlinear Behaviors of ReLU Networks

May 21, 2023

Mingze Wang, Chao Ma

Abstract:The training process of ReLU neural networks often exhibits complicated nonlinear phenomena. The nonlinearity of models and non-convexity of loss pose significant challenges for theoretical analysis. Therefore, most previous theoretical works on the optimization dynamics of neural networks focus either on local analysis (like the end of training) or approximate linear models (like Neural Tangent Kernel). In this work, we conduct a complete theoretical characterization of the training process of a two-layer ReLU network trained by Gradient Flow on a linearly separable data. In this specific setting, our analysis captures the whole optimization process starting from random initialization to final convergence. Despite the relatively simple model and data that we studied, we reveal four different phases from the whole training process showing a general simplifying-to-complicating learning trend. Specific nonlinear behaviors can also be precisely identified and captured theoretically, such as initial condensation, saddle-to-plateau dynamics, plateau escape, changes of activation patterns, learning with increasing complexity, etc.

Via

Access Paper or Ask Questions

Understanding Causality with Large Language Models: Feasibility and Opportunities

Apr 11, 2023

Cheng Zhang, Stefan Bauer, Paul Bennett, Jiangfeng Gao, Wenbo Gong, Agrin Hilmkil, Joel Jennings, Chao Ma, Tom Minka, Nick Pawlowski(+1 more)

Figure 1 for Understanding Causality with Large Language Models: Feasibility and Opportunities

Figure 2 for Understanding Causality with Large Language Models: Feasibility and Opportunities

Figure 3 for Understanding Causality with Large Language Models: Feasibility and Opportunities

Figure 4 for Understanding Causality with Large Language Models: Feasibility and Opportunities

Abstract:We assess the ability of large language models (LLMs) to answer causal questions by analyzing their strengths and weaknesses against three types of causal question. We believe that current LLMs can answer causal questions with existing causal knowledge as combined domain experts. However, they are not yet able to provide satisfactory answers for discovering new knowledge or for high-stakes decision-making tasks with high precision. We discuss possible future directions and opportunities, such as enabling explicit and implicit causal modules as well as deep causal-aware LLMs. These will not only enable LLMs to answer many different types of causal questions for greater impact but also enable LLMs to be more trustworthy and efficient in general.

Via

Access Paper or Ask Questions

UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View

Mar 27, 2023

Shengchao Zhou, Weizhou Liu, Chen Hu, Shuchang Zhou, Chao Ma

Abstract:In the field of 3D object detection for autonomous driving, the sensor portfolio including multi-modality and single-modality is diverse and complex. Since the multi-modal methods have system complexity while the accuracy of single-modal ones is relatively low, how to make a tradeoff between them is difficult. In this work, we propose a universal cross-modality knowledge distillation framework (UniDistill) to improve the performance of single-modality detectors. Specifically, during training, UniDistill projects the features of both the teacher and the student detector into Bird's-Eye-View (BEV), which is a friendly representation for different modalities. Then, three distillation losses are calculated to sparsely align the foreground features, helping the student learn from the teacher without introducing additional cost during inference. Taking advantage of the similar detection paradigm of different detectors in BEV, UniDistill easily supports LiDAR-to-camera, camera-to-LiDAR, fusion-to-LiDAR and fusion-to-camera distillation paths. Furthermore, the three distillation losses can filter the effect of misaligned background information and balance between objects of different sizes, improving the distillation effectiveness. Extensive experiments on nuScenes demonstrate that UniDistill effectively improves the mAP and NDS of student detectors by 2.0%~3.2%.

Via

Access Paper or Ask Questions

Causal Reasoning in the Presence of Latent Confounders via Neural ADMG Learning

Mar 22, 2023

Matthew Ashman, Chao Ma, Agrin Hilmkil, Joel Jennings, Cheng Zhang

Abstract:Latent confounding has been a long-standing obstacle for causal reasoning from observational data. One popular approach is to model the data using acyclic directed mixed graphs (ADMGs), which describe ancestral relations between variables using directed and bidirected edges. However, existing methods using ADMGs are based on either linear functional assumptions or a discrete search that is complicated to use and lacks computational tractability for large datasets. In this work, we further extend the existing body of work and develop a novel gradient-based approach to learning an ADMG with non-linear functional relations from observational data. We first show that the presence of latent confounding is identifiable under the assumptions of bow-free ADMGs with non-linear additive noise models. With this insight, we propose a novel neural causal model based on autoregressive flows for ADMG learning. This not only enables us to determine complex causal structural relationships behind the data in the presence of latent confounding, but also estimate their functional relationships (hence treatment effects) simultaneously. We further validate our approach via experiments on both synthetic and real-world datasets, and demonstrate the competitive performance against relevant baselines.

* Camera ready version for ICLR 2023

Via

Access Paper or Ask Questions

Pillar R-CNN for Point Cloud 3D Object Detection

Feb 26, 2023

Guangsheng Shi, Ruifeng Li, Chao Ma

Figure 1 for Pillar R-CNN for Point Cloud 3D Object Detection

Figure 2 for Pillar R-CNN for Point Cloud 3D Object Detection

Figure 3 for Pillar R-CNN for Point Cloud 3D Object Detection

Figure 4 for Pillar R-CNN for Point Cloud 3D Object Detection

Abstract:The performance of point cloud 3D object detection hinges on effectively representing raw points, grid-based voxels or pillars. Recent two-stage 3D detectors typically take the point-voxel-based R-CNN paradigm, i.e., the first stage resorts to the 3D voxel-based backbone for 3D proposal generation on bird-eye-view (BEV) representation and the second stage refines them via the intermediate point representation. Their primary mechanisms involve the utilization of intermediary keypoints to restore the substantial 3D structure context from the converted BEV representation. The skilled point-voxel feature interaction, however, makes the entire detection pipeline more complex and compute-intensive. In this paper, we take a different viewpoint -- the pillar-based BEV representation owns sufficient capacity to preserve the 3D structure. In light of the latest advances in BEV-based perception, we devise a conceptually simple yet effective two-stage 3D detection architecture, named Pillar R-CNN. On top of densified BEV feature maps, Pillar R-CNN can easily introduce the feature pyramid architecture to generate 3D proposals at various scales and take the simple 2D R-CNN style detect head for box refinement. Our Pillar R-CNN performs favorably against state-of-the-art 3D detectors on the large-scale Waymo Open Dataset but at a small extra cost. It should be highlighted that further exploration into BEV perception for applications involving autonomous driving is now possible thanks to the effective and elegant Pillar R-CNN architecture.

Via

Access Paper or Ask Questions

Causal-Discovery Performance of ChatGPT in the context of Neuropathic Pain Diagnosis

Jan 24, 2023

Ruibo Tu, Chao Ma, Cheng Zhang

Figure 1 for Causal-Discovery Performance of ChatGPT in the context of Neuropathic Pain Diagnosis

Figure 2 for Causal-Discovery Performance of ChatGPT in the context of Neuropathic Pain Diagnosis

Figure 3 for Causal-Discovery Performance of ChatGPT in the context of Neuropathic Pain Diagnosis

Figure 4 for Causal-Discovery Performance of ChatGPT in the context of Neuropathic Pain Diagnosis

Abstract:ChatGPT has demonstrated exceptional proficiency in natural language conversation, e.g., it can answer a wide range of questions while no previous large language models can. Thus, we would like to push its limit and explore its ability to answer causal discovery questions by using a medical benchmark (Tu et al. 2019) in causal discovery.

Via

Access Paper or Ask Questions

HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness

Jan 18, 2023

Zongwei Wu, Guillaume Allibert, Fabrice Meriaudeau, Chao Ma, Cédric Demonceaux

Abstract:RGB-D saliency detection aims to fuse multi-modal cues to accurately localize salient regions. Existing works often adopt attention modules for feature modeling, with few methods explicitly leveraging fine-grained details to merge with semantic cues. Thus, despite the auxiliary depth information, it is still challenging for existing models to distinguish objects with similar appearances but at distinct camera distances. In this paper, from a new perspective, we propose a novel Hierarchical Depth Awareness network (HiDAnet) for RGB-D saliency detection. Our motivation comes from the observation that the multi-granularity properties of geometric priors correlate well with the neural network hierarchies. To realize multi-modal and multi-level fusion, we first use a granularity-based attention scheme to strengthen the discriminatory power of RGB and depth features separately. Then we introduce a unified cross dual-attention module for multi-modal and multi-level fusion in a coarse-to-fine manner. The encoded multi-modal features are gradually aggregated into a shared decoder. Further, we exploit a multi-scale loss to take full advantage of the hierarchical information. Extensive experiments on challenging benchmark datasets demonstrate that our HiDAnet performs favorably over the state-of-the-art methods by large margins.

Via

Access Paper or Ask Questions

Coverage Analysis for Cellular-Connected Random 3D Mobile UAVs with Directional Antennas

Nov 27, 2022

Hongguang Sun, Chao Ma, Linyi Zhang, Jiahui Li, Xijun Wang, Shuqin Li, Tony Q. S. Quek

Figure 1 for Coverage Analysis for Cellular-Connected Random 3D Mobile UAVs with Directional Antennas

Figure 2 for Coverage Analysis for Cellular-Connected Random 3D Mobile UAVs with Directional Antennas

Figure 3 for Coverage Analysis for Cellular-Connected Random 3D Mobile UAVs with Directional Antennas

Abstract:This letter proposes an analytical framework to evaluate the coverage performance of a cellular-connected unmanned aerial vehicle (UAV) network in which UAV user equipments (UAV-UEs) are equipped with directional antennas and move according to a three-dimensional (3D) mobility model. The ground base stations (GBSs) equipped with practical down-tilted antennas are distributed according to a Poisson point process (PPP). With tools from stochastic geometry, we derive the handover probability and coverage probability of a random UAV-UE under the strongest average received signal strength (RSS) association strategy. The proposed analytical framework allows to investigate the effect of UAV-UE antenna beamwidth, mobility speed, cell association, and vertical motions on both the handover probability and coverage probability. We conclude that the optimal UAV-UE antenna beamwidth decreases with the GBS density, and the omnidirectional antenna model is preferred in the sparse network scenario. What's more, the superiority of the strongest average RSS association over the nearest association diminishes with the increment of GBS density.

* 5 pages, 5 figures, submitted to IEEE Wireless Communications Letters

Via

Access Paper or Ask Questions

Why self-attention is Natural for Sequence-to-Sequence Problems? A Perspective from Symmetries

Oct 13, 2022

Chao Ma, Lexing Ying

Abstract:In this paper, we show that structures similar to self-attention are natural to learn many sequence-to-sequence problems from the perspective of symmetry. Inspired by language processing applications, we study the orthogonal equivariance of seq2seq functions with knowledge, which are functions taking two inputs -- an input sequence and a ``knowledge'' -- and outputting another sequence. The knowledge consists of a set of vectors in the same embedding space as the input sequence, containing the information of the language used to process the input sequence. We show that orthogonal equivariance in the embedding space is natural for seq2seq functions with knowledge, and under such equivariance the function must take the form close to the self-attention. This shows that network structures similar to self-attention are the right structures to represent the target function of many seq2seq problems. The representation can be further refined if a ``finite information principle'' is considered, or a permutation equivariance holds for the elements of the input sequence.

Via

Access Paper or Ask Questions