Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tao Sun

SHIFT Planner: Speedy Hybrid Iterative Field and Segmented Trajectory Optimization with IKD-tree for Uniform Lightweight Coverage

Dec 14, 2024

Zexuan Fan, Sunchun Zhou, Hengye Yang, Junyi Cai, Ran Cheng, Lige Liu, Tao Sun

Figure 1 for SHIFT Planner: Speedy Hybrid Iterative Field and Segmented Trajectory Optimization with IKD-tree for Uniform Lightweight Coverage

Figure 2 for SHIFT Planner: Speedy Hybrid Iterative Field and Segmented Trajectory Optimization with IKD-tree for Uniform Lightweight Coverage

Figure 3 for SHIFT Planner: Speedy Hybrid Iterative Field and Segmented Trajectory Optimization with IKD-tree for Uniform Lightweight Coverage

Figure 4 for SHIFT Planner: Speedy Hybrid Iterative Field and Segmented Trajectory Optimization with IKD-tree for Uniform Lightweight Coverage

Abstract:This paper introduces a comprehensive planning and navigation framework that address these limitations by integrating semantic mapping, adaptive coverage planning, dynamic obstacle avoidance and precise trajectory tracking. Our framework begins by generating panoptic occupancy local semantic maps and accurate localization information from data aligned between a monocular camera, IMU, and GPS. This information is combined with input terrain point clouds or preloaded terrain information to initialize the planning process. We propose the Radiant Field-Informed Coverage Planning algorithm, which utilizes a diffusion field model to dynamically adjust the robot's coverage trajectory and speed based on environmental attributes such as dirtiness and dryness. By modeling the spatial influence of the robot's actions using a Gaussian field, ensures a speed-optimized, uniform coverage trajectory while adapting to varying environmental conditions.

Via

Access Paper or Ask Questions

Average-Over-Time Spiking Neural Networks for Uncertainty Estimation in Regression

Nov 29, 2024

Tao Sun, Sander Bohté

Abstract:Uncertainty estimation is a standard tool to quantify the reliability of modern deep learning models, and crucial for many real-world applications. However, efficient uncertainty estimation methods for spiking neural networks, particularly for regression models, have been lacking. Here, we introduce two methods that adapt the Average-Over-Time Spiking Neural Network (AOT-SNN) framework to regression tasks, enhancing uncertainty estimation in event-driven models. The first method uses the heteroscedastic Gaussian approach, where SNNs predict both the mean and variance at each time step, thereby generating a conditional probability distribution of the target variable. The second method leverages the Regression-as-Classification (RAC) approach, reformulating regression as a classification problem to facilitate uncertainty estimation. We evaluate our approaches on both a toy dataset and several benchmark datasets, demonstrating that the proposed AOT-SNN models achieve performance comparable to or better than state-of-the-art deep neural network methods, particularly in uncertainty estimation. Our findings highlight the potential of SNNs for uncertainty estimation in regression tasks, providing an efficient and biologically inspired alternative for applications requiring both accuracy and energy efficiency.

Via

Access Paper or Ask Questions

Deep learning-based spatio-temporal fusion for high-fidelity ultra-high-speed x-ray radiography

Nov 27, 2024

Songyuan Tang, Tekin Bicer, Tao Sun, Kamel Fezzaa, Samuel J. Clark

Abstract:Full-field ultra-high-speed (UHS) x-ray imaging experiments have been well established to characterize various processes and phenomena. However, the potential of UHS experiments through the joint acquisition of x-ray videos with distinct configurations has not been fully exploited. In this paper, we investigate the use of a deep learning-based spatio-temporal fusion (STF) framework to fuse two complementary sequences of x-ray images and reconstruct the target image sequence with high spatial resolution, high frame rate, and high fidelity. We applied a transfer learning strategy to train the model and compared the peak signal-to-noise ratio (PSNR), average absolute difference (AAD), and structural similarity (SSIM) of the proposed framework on two independent x-ray datasets with those obtained from a baseline deep learning model, a Bayesian fusion framework, and the bicubic interpolation method. The proposed framework outperformed the other methods with various configurations of the input frame separations and image noise levels. With 3 subsequent images from the low resolution (LR) sequence of a 4-time lower spatial resolution and another 2 images from the high resolution (HR) sequence of a 20-time lower frame rate, the proposed approach achieved an average PSNR of 37.57 dB and 35.15 dB, respectively. When coupled with the appropriate combination of high-speed cameras, the proposed approach will enhance the performance and therefore scientific value of the UHS x-ray imaging experiments.

Via

Access Paper or Ask Questions

Graph Transformer Networks for Accurate Band Structure Prediction: An End-to-End Approach

Nov 25, 2024

Weiyi Gong, Tao Sun, Hexin Bai, Jeng-Yuan Tsai, Haibin Ling, Qimin Yan

Figure 1 for Graph Transformer Networks for Accurate Band Structure Prediction: An End-to-End Approach

Figure 2 for Graph Transformer Networks for Accurate Band Structure Prediction: An End-to-End Approach

Figure 3 for Graph Transformer Networks for Accurate Band Structure Prediction: An End-to-End Approach

Figure 4 for Graph Transformer Networks for Accurate Band Structure Prediction: An End-to-End Approach

Abstract:Predicting electronic band structures from crystal structures is crucial for understanding structure-property correlations in materials science. First-principles approaches are accurate but computationally intensive. Recent years, machine learning (ML) has been extensively applied to this field, while existing ML models predominantly focus on band gap predictions or indirect band structure estimation via solving predicted Hamiltonians. An end-to-end model to predict band structure accurately and efficiently is still lacking. Here, we introduce a graph Transformer-based end-to-end approach that directly predicts band structures from crystal structures with high accuracy. Our method leverages the continuity of the k-path and treat continuous bands as a sequence. We demonstrate that our model not only provides accurate band structure predictions but also can derive other properties (such as band gap, band center, and band dispersion) with high accuracy. We verify the model performance on large and diverse datasets.

* 8 pages, 3 figures

Via

Access Paper or Ask Questions

MdEval: Massively Multilingual Code Debugging

Nov 04, 2024

Shukai Liu, Linzheng Chai, Jian Yang, Jiajun Shi, He Zhu, Liran Wang, Ke Jin, Wei Zhang, Hualei Zhu, Shuyue Guo(+8 more)

Figure 1 for MdEval: Massively Multilingual Code Debugging

Figure 2 for MdEval: Massively Multilingual Code Debugging

Figure 3 for MdEval: Massively Multilingual Code Debugging

Figure 4 for MdEval: Massively Multilingual Code Debugging

Abstract:Code large language models (LLMs) have made significant progress in code debugging by directly generating the correct code based on the buggy code snippet. Programming benchmarks, typically consisting of buggy code snippet and their associated test cases, are used to assess the debugging capabilities of LLMs. However, many existing benchmarks primarily focus on Python and are often limited in terms of language diversity (e.g., DebugBench and DebugEval). To advance the field of multilingual debugging with LLMs, we propose the first massively multilingual debugging benchmark, which includes 3.6K test samples of 18 programming languages and covers the automated program repair (APR) task, the code review (CR) task, and the bug identification (BI) task. Further, we introduce the debugging instruction corpora MDEVAL-INSTRUCT by injecting bugs into the correct multilingual queries and solutions (xDebugGen). Further, a multilingual debugger xDebugCoder trained on MDEVAL-INSTRUCT as a strong baseline specifically to handle the bugs of a wide range of programming languages (e.g. "Missing Mut" in language Rust and "Misused Macro Definition" in language C). Our extensive experiments on MDEVAL reveal a notable performance gap between open-source models and closed-source LLMs (e.g., GPT and Claude series), highlighting huge room for improvement in multilingual code debugging scenarios.

* 15 pages

Via

Access Paper or Ask Questions

Gradient Normalization with(out) Clipping Ensures Convergence of Nonconvex SGD under Heavy-Tailed Noise with Improved Results

Oct 21, 2024

Tao Sun, Xinwang Liu, Kun Yuan

Figure 1 for Gradient Normalization with(out) Clipping Ensures Convergence of Nonconvex SGD under Heavy-Tailed Noise with Improved Results

Figure 2 for Gradient Normalization with(out) Clipping Ensures Convergence of Nonconvex SGD under Heavy-Tailed Noise with Improved Results

Abstract:This paper investigates Gradient Normalization Stochastic Gradient Descent without Clipping (NSGDC) and its variance reduction variant (NSGDC-VR) for nonconvex optimization under heavy-tailed noise. We present significant improvements in the theoretical results for both algorithms, including the removal of logarithmic factors from the convergence rates and the recovery of the convergence rate to match the deterministic case when the noise variance {\sigma} is zero. Additionally, we demonstrate that gradient normalization alone, assuming individual Lipschitz smoothness, is sufficient to ensure convergence of SGD under heavy-tailed noise, eliminating the need for gradient clipping. Furthermore, we introduce accelerated nonconvex algorithms that utilize second-order Lipschitz smoothness to achieve enhanced convergence rates in the presence of heavy-tailed noise. Our findings offer a deeper understanding of how gradient normalization and variance reduction techniques can be optimized for robust performance in challenging optimization scenarios.

Via

Access Paper or Ask Questions

Context-Enhanced Multi-View Trajectory Representation Learning: Bridging the Gap through Self-Supervised Models

Oct 17, 2024

Tangwen Qian, Junhe Li, Yile Chen, Gao Cong, Tao Sun, Fei Wang, Yongjun Xu

Figure 1 for Context-Enhanced Multi-View Trajectory Representation Learning: Bridging the Gap through Self-Supervised Models

Figure 2 for Context-Enhanced Multi-View Trajectory Representation Learning: Bridging the Gap through Self-Supervised Models

Figure 3 for Context-Enhanced Multi-View Trajectory Representation Learning: Bridging the Gap through Self-Supervised Models

Figure 4 for Context-Enhanced Multi-View Trajectory Representation Learning: Bridging the Gap through Self-Supervised Models

Abstract:Modeling trajectory data with generic-purpose dense representations has become a prevalent paradigm for various downstream applications, such as trajectory classification, travel time estimation and similarity computation. However, existing methods typically rely on trajectories from a single spatial view, limiting their ability to capture the rich contextual information that is crucial for gaining deeper insights into movement patterns across different geospatial contexts. To this end, we propose MVTraj, a novel multi-view modeling method for trajectory representation learning. MVTraj integrates diverse contextual knowledge, from GPS to road network and points-of-interest to provide a more comprehensive understanding of trajectory data. To align the learning process across multiple views, we utilize GPS trajectories as a bridge and employ self-supervised pretext tasks to capture and distinguish movement patterns across different spatial views. Following this, we treat trajectories from different views as distinct modalities and apply a hierarchical cross-modal interaction module to fuse the representations, thereby enriching the knowledge derived from multiple sources. Extensive experiments on real-world datasets demonstrate that MVTraj significantly outperforms existing baselines in tasks associated with various spatial views, validating its effectiveness and practical utility in spatio-temporal modeling.

Via

Access Paper or Ask Questions

Long-Tailed Backdoor Attack Using Dynamic Data Augmentation Operations

Oct 16, 2024

Lu Pang, Tao Sun, Weimin Lyu, Haibin Ling, Chao Chen

Figure 1 for Long-Tailed Backdoor Attack Using Dynamic Data Augmentation Operations

Figure 2 for Long-Tailed Backdoor Attack Using Dynamic Data Augmentation Operations

Figure 3 for Long-Tailed Backdoor Attack Using Dynamic Data Augmentation Operations

Figure 4 for Long-Tailed Backdoor Attack Using Dynamic Data Augmentation Operations

Abstract:Recently, backdoor attack has become an increasing security threat to deep neural networks and drawn the attention of researchers. Backdoor attacks exploit vulnerabilities in third-party pretrained models during the training phase, enabling them to behave normally for clean samples and mispredict for samples with specific triggers. Existing backdoor attacks mainly focus on balanced datasets. However, real-world datasets often follow long-tailed distributions. In this paper, for the first time, we explore backdoor attack on such datasets. Specifically, we first analyze the influence of data imbalance on backdoor attack. Based on our analysis, we propose an effective backdoor attack named Dynamic Data Augmentation Operation (D$^2$AO). We design D$^2$AO selectors to select operations depending jointly on the class, sample type (clean vs. backdoored) and sample features. Meanwhile, we develop a trigger generator to generate sample-specific triggers. Through simultaneous optimization of the backdoored model and trigger generator, guided by dynamic data augmentation operation selectors, we achieve significant advancements. Extensive experiments demonstrate that our method can achieve the state-of-the-art attack performance while preserving the clean accuracy.

Via

Access Paper or Ask Questions

Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

Oct 10, 2024

Shenao Zhang, Zhihan Liu, Boyi Liu, Yufeng Zhang, Yingxiang Yang, Yongfei Liu, Liyu Chen, Tao Sun, Zhaoran Wang

Figure 1 for Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

Figure 2 for Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

Figure 3 for Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

Figure 4 for Reward-Augmented Data Enhances Direct Preference Alignment of LLMs

Abstract:Preference alignment in Large Language Models (LLMs) has significantly improved their ability to adhere to human instructions and intentions. However, existing direct alignment algorithms primarily focus on relative preferences and often overlook the qualitative aspects of responses. Striving to maximize the implicit reward gap between the chosen and the slightly inferior rejected responses can cause overfitting and unnecessary unlearning of the high-quality rejected responses. The unawareness of the reward scores also drives the LLM to indiscriminately favor the low-quality chosen responses and fail to generalize to responses with the highest rewards, which are sparse in data. To overcome these shortcomings, our study introduces reward-conditioned LLM policies that discern and learn from the entire spectrum of response quality within the dataset, helping extrapolate to more optimal regions. We propose an effective yet simple data relabeling method that conditions the preference pairs on quality scores to construct a reward-augmented dataset. This dataset is easily integrated with existing direct alignment algorithms and is applicable to any preference dataset. The experimental results across instruction-following benchmarks including AlpacaEval, MT-Bench, and Arena-Hard-Auto demonstrate that our approach consistently boosts the performance of DPO by a considerable margin across diverse models. Additionally, our method improves the average accuracy on various academic benchmarks. When applying our method to on-policy data, the resulting DPO model achieves SOTA results on AlpacaEval. Through ablation studies, we demonstrate that our method not only maximizes the utility of preference data but also mitigates the issue of unlearning, demonstrating its broad effectiveness beyond mere dataset expansion. Our code is available at https://github.com/shenao-zhang/reward-augmented-preference.

Via

Access Paper or Ask Questions

Backdooring Vision-Language Models with Out-Of-Distribution Data

Oct 02, 2024

Weimin Lyu, Jiachen Yao, Saumya Gupta, Lu Pang, Tao Sun, Lingjie Yi, Lijie Hu, Haibin Ling, Chao Chen

Figure 1 for Backdooring Vision-Language Models with Out-Of-Distribution Data

Figure 2 for Backdooring Vision-Language Models with Out-Of-Distribution Data

Figure 3 for Backdooring Vision-Language Models with Out-Of-Distribution Data

Figure 4 for Backdooring Vision-Language Models with Out-Of-Distribution Data

Abstract:The emergence of Vision-Language Models (VLMs) represents a significant advancement in integrating computer vision with Large Language Models (LLMs) to generate detailed text descriptions from visual inputs. Despite their growing importance, the security of VLMs, particularly against backdoor attacks, is under explored. Moreover, prior works often assume attackers have access to the original training data, which is often unrealistic. In this paper, we address a more practical and challenging scenario where attackers must rely solely on Out-Of-Distribution (OOD) data. We introduce VLOOD (Backdooring Vision-Language Models with Out-of-Distribution Data), a novel approach with two key contributions: (1) demonstrating backdoor attacks on VLMs in complex image-to-text tasks while minimizing degradation of the original semantics under poisoned inputs, and (2) proposing innovative techniques for backdoor injection without requiring any access to the original training data. Our evaluation on image captioning and visual question answering (VQA) tasks confirms the effectiveness of VLOOD, revealing a critical security vulnerability in VLMs and laying the foundation for future research on securing multimodal models against sophisticated threats.

Via

Access Paper or Ask Questions