Rui Huang

College of Computer Science and Technology, Civil Aviation University of China, China

Towards Cross-View-Consistent Self-Supervised Surround Depth Estimation

Jul 04, 2024

Laiyan Ding, Hualie Jiang, Jie Li, Yongquan Chen, Rui Huang

Abstract: Depth estimation is a cornerstone for autonomous driving, yet acquiring per-pixel depth ground truth for supervised learning is challenging. Self-Supervised Surround Depth Estimation (SSSDE) from consecutive images offers an economical alternative. While previous SSSDE methods have proposed different mechanisms to fuse information across images, few of them explicitly consider the cross-view constraints, leading to inferior performance, particularly in overlapping regions. This paper proposes an efficient and consistent pose estimation design and two loss functions to enhance cross-view consistency for SSSDE. For pose estimation, we propose to use only front-view images to reduce training memory and sustain pose estimation consistency. The first loss function is the dense depth consistency loss, which penalizes the difference between predicted depths in overlapping regions. The second one is the multi-view reconstruction consistency loss, which aims to maintain consistency between reconstruction from spatial and spatial-temporal contexts. Additionally, we introduce a novel flipping augmentation to improve the performance further. Our techniques enable a simple neural model to achieve state-of-the-art performance on the DDAD and nuScenes datasets. Last but not least, our proposed techniques can be easily applied to other methods. The code will be made public.
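
The idea of penalizing disagreement between predicted depths in overlapping regions can be sketched roughly as follows. This is only an illustration, not the authors' loss: it assumes the neighbouring view's depth map has already been warped into the reference view, that an overlap mask is given, and that an L1 penalty is used. The function name and the toy values are hypothetical.

```python
from typing import Sequence


def dense_depth_consistency_loss(
    depth_a: Sequence[Sequence[float]],
    depth_b: Sequence[Sequence[float]],
    overlap: Sequence[Sequence[bool]],
) -> float:
    """Mean absolute depth difference over pixels marked as overlapping.

    Assumption: depth_b is already expressed in view A's pixel grid,
    so corresponding entries describe the same scene point.
    """
    total = 0.0
    count = 0
    for row_a, row_b, row_mask in zip(depth_a, depth_b, overlap):
        for d_a, d_b, in_overlap in zip(row_a, row_b, row_mask):
            if in_overlap:
                total += abs(d_a - d_b)  # L1 penalty (illustrative choice)
                count += 1
    # With no overlapping pixels there is nothing to penalize.
    return total / count if count else 0.0


if __name__ == "__main__":
    # Toy 2x2 depth maps in metres; the bottom-right pixel is outside the overlap.
    depth_a = [[10.0, 12.0], [8.0, 9.0]]
    depth_b = [[10.5, 11.0], [8.0, 20.0]]
    overlap = [[True, True], [True, False]]
    print(dense_depth_consistency_loss(depth_a, depth_b, overlap))  # 0.5
```

The large disagreement at the masked-out pixel does not contribute, which is the point of restricting the penalty to the overlapping region.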

Auto-Multilift: Distributed Learning and Control for Cooperative Load Transportation With Quadrotors

Jun 07, 2024

Bingheng Wang, Kuankuan Sima, Rui Huang, Lin Zhao

Abstract: Designing motion control and planning algorithms for multilift systems remains challenging due to the complexities of dynamics, collision avoidance, actuator limits, and scalability. Existing methods that use optimization and distributed techniques effectively address these constraints and scalability issues. However, they often require substantial manual tuning, leading to suboptimal performance. This paper proposes Auto-Multilift, a novel framework that automates the tuning of model predictive controllers (MPCs) for multilift systems. We model the MPC cost functions with deep neural networks (DNNs), enabling fast online adaptation to various scenarios. We develop a distributed policy gradient algorithm to train these DNNs efficiently in a closed-loop manner. Central to our algorithm is distributed sensitivity propagation, which parallelizes gradient computation across quadrotors, focusing on actual system state sensitivities relative to key MPC parameters. We also provide theoretical guarantees for the convergence of this algorithm. Extensive simulations show rapid convergence and favorable scalability to a large number of quadrotors. Our method outperforms a state-of-the-art open-loop MPC tuning approach by effectively learning adaptive MPCs from trajectory tracking errors and handling the unique dynamics couplings within the multilift system. Additionally, our framework can learn an adaptive reference for reconfiguring the system when traversing through multiple narrow slots.

Full Stage Learning to Rank: A Unified Framework for Multi-Stage Systems

May 08, 2024

Kai Zheng, Haijun Zhao, Rui Huang, Beichuan Zhang, Na Mou, Yanan Niu, Yang Song, Hongning Wang, Kun Gai

Abstract: The Probability Ranking Principle (PRP) has been considered the foundational standard in the design of information retrieval (IR) systems. The principle requires an IR module's returned list of results to be ranked with respect to the underlying user interests, so as to maximize the results' utility. Nevertheless, we point out that it is inappropriate to indiscriminately apply PRP through every stage of a contemporary IR system. Such systems contain multiple stages (e.g., retrieval, pre-ranking, ranking, and re-ranking stages, as examined in this paper). The selection bias inherent in the model of each stage significantly influences the results that are ultimately presented to users. To address this issue, we propose an improved ranking principle for multi-stage systems, namely the Generalized Probability Ranking Principle (GPRP), to emphasize both the selection bias in each stage of the system pipeline and the underlying interest of users. We realize GPRP via a unified algorithmic framework named Full Stage Learning to Rank. Our core idea is to first estimate the selection bias in the subsequent stages and then learn a ranking model that best complies with the downstream modules' selection bias so as to deliver its top-ranked results to the final ranked list in the system's output. We performed extensive experimental evaluations of our Full Stage Learning to Rank solution, using both simulations and online A/B tests on one of the leading short-video recommendation platforms. The algorithm proved effective in both retrieval and ranking stages. Since its deployment, the algorithm has brought consistent and significant performance gains to the platform.

* Accepted by WWW 2024
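
To make the contrast in this abstract concrete, the sketch below compares a plain PRP-style ordering (sort by estimated user interest) with an ordering that also accounts for how likely each item is to survive later stages. It is a toy illustration only: multiplying interest by a downstream pass probability is an assumption made here for clarity, not the paper's GPRP formulation, and all names and numbers are hypothetical.

```python
from typing import Dict, List


def rank_by_prp(interest: Dict[str, float]) -> List[str]:
    """Order items by estimated user interest alone (PRP-style)."""
    return sorted(interest, key=lambda item: interest[item], reverse=True)


def rank_stage_aware(
    interest: Dict[str, float], pass_prob: Dict[str, float]
) -> List[str]:
    """Order items by interest weighted by the chance of surviving later stages.

    Assumption: pass_prob[item] is an estimate of the probability that the
    downstream stages keep the item; the product is an illustrative score.
    """
    return sorted(
        interest,
        key=lambda item: interest[item] * pass_prob[item],
        reverse=True,
    )


if __name__ == "__main__":
    interest = {"a": 0.9, "b": 0.6, "c": 0.5}
    pass_prob = {"a": 0.1, "b": 0.8, "c": 0.9}
    print(rank_by_prp(interest))                  # ['a', 'b', 'c']
    print(rank_stage_aware(interest, pass_prob))  # ['b', 'c', 'a']
```

Item "a" tops the interest-only ranking but is rarely kept downstream, so a stage that forwards it spends one of its slots on a result that is unlikely to reach the user.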

Structured Click Control in Transformer-based Interactive Segmentation

May 07, 2024

Long Xu, Yongquan Chen, Rui Huang, Feng Wu, Shiwu Lai

Abstract: Click-point-based interactive segmentation has received widespread attention due to its efficiency. However, it is hard for existing algorithms to obtain precise and robust responses after multiple clicks. In this case, the segmentation results tend to change little or even become worse than before. To improve the robustness of the response, we propose a structured click intent model based on graph neural networks, which adaptively obtains graph nodes via the global similarity of user-clicked Transformer tokens. The graph nodes are then aggregated to obtain structured interaction features. Finally, dual cross-attention is used to inject the structured interaction features into vision Transformer features, thereby enhancing the control of clicks over segmentation results. Extensive experiments demonstrate that the proposed algorithm can serve as a general structure for improving Transformer-based interactive segmentation performance. The code and data will be released at https://github.com/hahamyt/scc.

* 10 pages, 6 figures, submitted to NeurIPS 2024

HybriMap: Hybrid Clues Utilization for Effective Vectorized HD Map Construction

Apr 17, 2024

Chi Zhang, Qi Song, Feifei Li, Yongquan Chen, Rui Huang

Abstract: Constructing vectorized high-definition maps from surround-view cameras has garnered significant attention in recent years. However, the commonly employed multi-stage sequential workflow in prevailing approaches often leads to the loss of early-stage information, particularly in perspective-view features. Usually, such loss shows up as missing instances or mismatched shapes in the final bird's-eye-view predictions. To address this concern, we propose a novel approach, namely HybriMap, which effectively exploits clues from hybrid features to ensure the delivery of valuable information. Specifically, we design the Dual Enhancement Module to enable both explicit integration and implicit modification under the guidance of hybrid features. Additionally, the perspective keypoints are utilized as supervision, further directing the feature enhancement process. Extensive experiments conducted on existing benchmarks have demonstrated the state-of-the-art performance of our proposed approach.

Towards Balanced RGB-TSDF Fusion for Consistent Semantic Scene Completion by 3D RGB Feature Completion and a Classwise Entropy Loss Function

Mar 25, 2024

Laiyan Ding, Panwen Hu, Jie Li, Rui Huang

Abstract: Semantic Scene Completion (SSC) aims to jointly infer semantics and occupancies of 3D scenes. Truncated Signed Distance Function (TSDF), a 3D encoding of depth, has been a common input for SSC. Furthermore, RGB-TSDF fusion seems promising since these two modalities provide color and geometry information, respectively. Nevertheless, RGB-TSDF fusion has been considered nontrivial, and the commonly used naive addition leads to inconsistent results. We argue that the inconsistency comes from the sparsity of RGB features upon projecting into 3D space, while TSDF features are dense, leading to imbalanced feature maps when summed up. To address this RGB-TSDF distribution difference, we propose a two-stage network with a 3D RGB feature completion module that completes RGB features with meaningful values for occluded areas. Moreover, we propose an effective classwise entropy loss function to punish inconsistency. Extensive experiments on public datasets verify that our method achieves state-of-the-art performance among methods that do not adopt extra data.

Negative-Binomial Randomized Gamma Markov Processes for Heterogeneous Overdispersed Count Time Series

Feb 29, 2024

Rui Huang, Sikun Yang, Heinz Koeppl

Abstract: Modeling count-valued time series has been receiving increasing attention since count time series naturally arise in physical and social domains. Poisson gamma dynamical systems (PGDSs) are newly-developed methods, which can well capture the expressive latent transition structure and bursty dynamics behind count sequences. In particular, PGDSs demonstrate superior performance in terms of data imputation and prediction, compared with canonical linear dynamical system (LDS) based methods. Despite these advantages, PGDS cannot capture the heterogeneous overdispersed behaviours of the underlying dynamic processes. To mitigate this defect, we propose a negative-binomial-randomized gamma Markov process, which not only significantly improves the predictive performance of the proposed dynamical system, but also facilitates the fast convergence of the inference algorithm. Moreover, we develop methods to estimate both factor-structured and graph-structured transition dynamics, which enable us to infer more explainable latent structure, compared with PGDSs. Finally, we demonstrate the explainable latent structure learned by the proposed method, and show its superior performance in imputing missing data and forecasting future observations, compared with the related models.
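
"Overdispersed" here means the variance of the counts exceeds their mean, which a Poisson distribution cannot express because its variance always equals its mean. The sketch below computes the textbook moments of a negative-binomial distribution to show the gap. It is not the paper's model: the parametrization (number of successes before r failures, success probability p) and the function names are assumptions chosen for illustration.

```python
from typing import Tuple


def negative_binomial_moments(r: float, p: float) -> Tuple[float, float]:
    """Return (mean, variance) of a negative-binomial count.

    Assumed parametrization: number of successes observed before the
    r-th failure, with success probability p in (0, 1).
    """
    if r <= 0 or not 0.0 < p < 1.0:
        raise ValueError("require r > 0 and 0 < p < 1")
    mean = r * p / (1.0 - p)
    variance = r * p / (1.0 - p) ** 2
    return mean, variance


def dispersion_index(mean: float, variance: float) -> float:
    """Variance-to-mean ratio; equals 1 for Poisson, above 1 if overdispersed."""
    return variance / mean


if __name__ == "__main__":
    mean, variance = negative_binomial_moments(r=2.0, p=0.5)
    print(mean, variance)                    # 2.0 4.0
    print(dispersion_index(mean, variance))  # 2.0
```

Under this parametrization the ratio is 1 / (1 - p), so it is always above 1 and grows as p approaches 1, whereas a Poisson count with the same mean would have a ratio of exactly 1.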

A Saliency Enhanced Feature Fusion based multiscale RGB-D Salient Object Detection Network

Jan 22, 2024

Rui Huang, Qingyi Zhao, Yan Xing, Sihua Gao, Weifeng Xu, Yuxiang Zhang, Wei Fan

Abstract: Multiscale convolutional neural networks (CNNs) have demonstrated remarkable capabilities in solving various vision problems. However, fusing features of different scales always results in large model sizes, impeding the application of multiscale CNNs in RGB-D saliency detection. In this paper, we propose a customized feature fusion module, called Saliency Enhanced Feature Fusion (SEFF), for RGB-D saliency detection. SEFF utilizes saliency maps of the neighboring scales to enhance the necessary features for fusing, resulting in more representative fused features. Our multiscale RGB-D saliency detector uses SEFF and processes images with three different scales. SEFF is used to fuse the features of RGB and depth images, as well as the features of decoders at different scales. Extensive experiments on five benchmark datasets have demonstrated the superiority of our method over ten SOTA saliency detectors.

* Accepted by 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

Spy-Watermark: Robust Invisible Watermarking for Backdoor Attack

Jan 04, 2024

Ruofei Wang, Renjie Wan, Zongyu Guo, Qing Guo, Rui Huang

Abstract: A backdoor attack aims to deceive a victim model when facing backdoor instances while maintaining its performance on benign data. Current methods use manual patterns or special perturbations as triggers, but they often overlook robustness against data corruption, making backdoor attacks easy to defend against in practice. To address this issue, we propose a novel backdoor attack method named Spy-Watermark, which remains effective when facing data collapse and backdoor defense. Therein, we introduce a learnable watermark embedded in the latent domain of images, serving as the trigger. Then, we search for a watermark that can withstand collapse during image decoding, cooperating with several anti-collapse operations to further enhance the resilience of our trigger against data corruption. Extensive experiments are conducted on CIFAR10, GTSRB, and ImageNet datasets, demonstrating that Spy-Watermark overtakes ten state-of-the-art methods in terms of robustness and stealthiness.

* Accepted by ICASSP 2024

Segment3D: Learning Fine-Grained Class-Agnostic 3D Segmentation without Manual Labels

Dec 28, 2023

Rui Huang, Songyou Peng, Ayca Takmaz, Federico Tombari, Marc Pollefeys, Shiji Song, Gao Huang, Francis Engelmann

Abstract: Current 3D scene segmentation methods are heavily dependent on manually annotated 3D training datasets. Such manual annotations are labor-intensive, and often lack fine-grained details. Importantly, models trained on this data typically struggle to recognize object classes beyond the annotated classes, i.e., they do not generalize well to unseen domains and require additional domain-specific annotations. In contrast, 2D foundation models demonstrate strong generalization and impressive zero-shot abilities, inspiring us to incorporate these characteristics from 2D models into 3D models. Therefore, we explore the use of image segmentation foundation models to automatically generate training labels for 3D segmentation. We propose Segment3D, a method for class-agnostic 3D scene segmentation that produces high-quality 3D segmentation masks. It improves over existing 3D segmentation models (especially on fine-grained masks), and enables easily adding new training data to further boost the segmentation performance -- all without the need for manual training labels.

* Project Page: http://segment3d.github.io
