Automated region-of-interest detection in histopathological image analysis is a challenging and important topic with tremendous potential impact on clinical practice. Deep-learning methods in computational pathology can reduce costs and increase the speed and accuracy of region-of-interest detection and cancer diagnosis. In this work, we propose a patch-based region-of-interest detection method for melanocytic skin tumor whole-slide images. We build a deep-learning method on a dataset of 165 Hematoxylin and Eosin whole-slide images of primary melanomas and nevi. The proposed method performs well on a hold-out test set that includes five TCGA-SKCM slides (93.94\% accuracy in the slide classification task and a 41.27\% intersection-over-union rate in the region-of-interest detection task), demonstrating strong performance on melanocytic skin tumors. Although our experiments are conducted on a skin tumor dataset, the approach could be extended to other medical image detection problems, such as the classification and prediction of various tumors, to support clinical evaluation and diagnosis.
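The patch-based pipeline above can be sketched as follows. This is a minimal illustration, assuming a generic binary patch classifier: the \texttt{patch\_model} callable, the 224-pixel patch size, and the 0.5 threshold are hypothetical placeholders, not the authors' exact configuration.

\begin{verbatim}
import numpy as np

def detect_roi(slide, patch_model, patch=224, thresh=0.5):
    # Tile the slide (H, W, 3) into non-overlapping patches, score each
    # with a tumor-probability model, and build a patch-level heatmap.
    rows, cols = slide.shape[0] // patch, slide.shape[1] // patch
    heatmap = np.zeros((rows, cols), dtype=np.float32)
    for i in range(rows):
        for j in range(cols):
            tile = slide[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            heatmap[i, j] = patch_model(tile)   # P(tumor) for this tile
    return heatmap, heatmap >= thresh           # probability map, ROI mask

def iou(pred_mask, true_mask):
    # Intersection over union, the region-level metric reported above.
    inter = np.logical_and(pred_mask, true_mask).sum()
    union = np.logical_or(pred_mask, true_mask).sum()
    return inter / union
\end{verbatim}

A slide-level label can then be derived from the heatmap, e.g., by thresholding the fraction of tumor-positive patches.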
The firing dynamics of biological neurons in mathematical models are often determined by the model's parameters, which represent the neurons' underlying properties. The parameter estimation problem seeks to recover the parameters of a single neuron or a neuron population from their responses to external stimuli and the interactions among them. Most common methods for tackling this problem in the literature combine a mechanistic model with either a simulation-based or a solution-based optimization scheme. In this paper, we study an automatic approach that learns the parameters of neuron populations via supervised learning from a training set of pairs of spiking series and parameter labels. Unlike previous work, this approach requires neither additional simulations at inference time nor expert knowledge for deriving an analytical solution or constructing an approximate model. We simulate many neuronal populations with different parameter settings using a stochastic neuron model. Using those data, we train a variety of supervised machine learning models, including convolutional and deep neural networks, random forests, and support vector regression. We then compare their performance against classical approaches, including a genetic search, Bayesian sequential estimation, and a random-walk approximate model. The supervised models almost always outperform the classical methods in parameter estimation error, spike reconstruction error, and computational expense. The convolutional neural network, in particular, is the best model across all metrics. The supervised models can also generalize to out-of-distribution data to a certain extent.
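As a rough sketch of this supervised pipeline, the snippet below simulates a toy stochastic integrate-and-fire neuron (a stand-in for the paper's stochastic model; the gain/noise parametrization, parameter ranges, and 500-step series length are illustrative assumptions) and fits one of the model families mentioned above, a random forest.

\begin{verbatim}
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def simulate_spikes(gain, noise, T=500):
    # Toy stochastic neuron: spike and reset when a noisy
    # leaky membrane potential crosses the threshold of 1.
    v, spikes = 0.0, np.zeros(T)
    for t in range(T):
        v += 0.1 * (gain - v) + noise * rng.normal()
        if v > 1.0:
            spikes[t], v = 1.0, 0.0
    return spikes

# Training set: spiking series paired with their parameter labels.
params = rng.uniform([0.5, 0.05], [2.0, 0.5], size=(2000, 2))
X = np.array([simulate_spikes(g, s) for g, s in params])
model = RandomForestRegressor(n_estimators=100).fit(X, params)

# Inference needs no further simulation: one forward pass per series.
# (Use a held-out split in practice; training rows shown for brevity.)
print(model.predict(X[:3]))   # estimated (gain, noise) pairs
\end{verbatim}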
While most micro-robots struggle to travel on rugged and uneven terrain, beetles can walk smoothly on complex substrates without slipping or getting stuck, thanks to their stiffness-variable tarsi and the expandable hooks at the tips of their tarsi. In this study, we found that beetles actively bend and expand their claws regularly to crawl freely on mesh surfaces. Inspired by this crawling mechanism, we designed an 8-cm miniature climbing robot equipped with artificial claws that open and bend in the same cyclic manner as natural beetles' claws. The robot can climb freely with a controllable gait on mesh surfaces, on steep inclines of 60{\deg}, and even across transition surfaces. To the best of our knowledge, this is the first micro-scale robot that can climb both mesh surfaces and cliff-like inclines.
With the advent of the Big Data era, calculating the resource usage of a SQL query with traditional DBMS approaches is usually computationally expensive. Can we estimate the cost of each query more efficiently, without any computation in a SQL engine kernel? Can machine learning techniques help estimate SQL query resource utilization? The answer to both questions is yes. We propose a SQL query cost predictor service that employs machine learning techniques to train models from historical query request logs and rapidly forecasts the CPU and memory usage of online queries without any computation in a SQL engine. At Twitter, infrastructure engineers maintain a large-scale SQL federation system across on-premises and cloud data centers for serving ad-hoc queries. The proposed service can improve query scheduling by relieving imbalanced online analytical processing (OLAP) workloads in the SQL engine clusters, and it can also assist in enabling preemptive scaling. Additionally, the proposed approach uses plain SQL statements for model training and online prediction, making it both hardware- and software-agnostic, so the method can be generalized to broader SQL systems and heterogeneous environments. The models achieve 97.9\% accuracy for CPU usage prediction and 97\% accuracy for memory usage prediction.
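A minimal sketch of such a predictor, assuming the service learns directly from raw SQL text: the toy queries, labels, and the TF-IDF plus ridge-regression pipeline here are illustrative stand-ins, not the production models, which are not described at this level of detail.

\begin{verbatim}
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Historical query log: raw SQL text paired with observed resource usage.
queries = ["SELECT id FROM users WHERE age > 30",
           "SELECT COUNT(*) FROM events GROUP BY day",
           "SELECT * FROM logs JOIN users ON logs.uid = users.id"]
cpu_seconds = [0.4, 2.1, 9.7]   # illustrative labels, not real measurements

# Plain-SQL text features keep the predictor hardware- and engine-agnostic.
model = make_pipeline(TfidfVectorizer(token_pattern=r"\w+"), Ridge())
model.fit(queries, cpu_seconds)
print(model.predict(["SELECT name FROM users WHERE age > 50"]))
\end{verbatim}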
We present a novel mesh-based learning approach (N-Cloth) for plausible 3D cloth deformation prediction. Our approach is general and can handle cloth or obstacles represented by triangle meshes with arbitrary topology. We use graph convolution to transform the cloth and object meshes into a latent space, reducing the non-linearity of the mesh space. Our network predicts the target 3D cloth mesh deformation from the state of the initial cloth mesh template and the target obstacle mesh. Our approach can handle complex cloth meshes with up to $100$K triangles and scenes with various objects, including SMPL humans, non-SMPL humans, and rigid bodies. In practice, our approach exhibits good temporal coherence between successive input frames and can generate plausible cloth simulations at $30-45$ fps on an NVIDIA GeForce RTX 3090 GPU. We highlight its benefits over prior learning-based methods and physically-based cloth simulators.
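To illustrate the graph-convolution step that maps mesh vertices into a latent space, here is a minimal NumPy sketch; the layer form, feature sizes, and edge list are illustrative assumptions, not N-Cloth's actual architecture.

\begin{verbatim}
import numpy as np

def graph_conv(feats, edges, W):
    # One graph-convolution layer on a mesh: average each vertex's
    # neighbourhood (with a self-loop), then apply a linear map + ReLU.
    n = feats.shape[0]
    A = np.eye(n)
    for i, j in edges:                    # undirected mesh edges
        A[i, j] = A[j, i] = 1.0
    A /= A.sum(axis=1, keepdims=True)     # row-normalize
    return np.maximum(A @ feats @ W, 0.0)

# Tiny cloth patch: 4 vertices with xyz features, edges of 2 triangles.
feats = np.random.randn(4, 3)
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (1, 3)]
latent = graph_conv(feats, edges, np.random.randn(3, 8))  # 8-d latents
\end{verbatim}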
Despite the efficiency and scalability of machine learning systems, recent studies have demonstrated that many classification methods, especially deep neural networks (DNNs), are vulnerable to adversarial examples, i.e., examples that are carefully crafted to fool a well-trained classification model while remaining indistinguishable from natural data to humans. This makes it potentially unsafe to apply DNNs or related methods in security-critical areas. Since this issue was first identified by Biggio et al. (2013) and Szegedy et al. (2014), much work has been done in this field, including the development of attack methods to generate adversarial examples and the construction of defense techniques to guard against them. This paper introduces the topic and its latest developments to the statistical community, focusing primarily on the generation of and defense against adversarial examples. The code (in Python and R) used in the numerical experiments is publicly available for readers to explore the surveyed methods. We hope this paper will encourage more statisticians to work on this important and exciting field of generating and defending against adversarial examples.
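As a concrete instance of adversarial example generation, the snippet below implements the classical fast gradient sign method (FGSM) of Goodfellow et al. on a logistic-regression classifier, where the loss gradient is available in closed form. It is only one of many attacks a survey of this area would cover, shown here purely as an illustration.

\begin{verbatim}
import numpy as np

def fgsm(x, y, w, b, eps=0.1):
    # Fast gradient sign method on a logistic-regression model:
    # step x in the sign of the cross-entropy gradient w.r.t. x.
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))   # P(y = 1 | x)
    grad_x = (p - y) * w                     # closed-form loss gradient
    return x + eps * np.sign(grad_x)

w, b = np.array([2.0, -1.0]), 0.0            # a toy trained classifier
x, y = np.array([0.5, 0.2]), 1
x_adv = fgsm(x, y, w, b)
print(x @ w + b, x_adv @ w + b)   # decision score drops after the attack
\end{verbatim}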
Decentralized optimization and communication compression have shown great potential for accelerating distributed machine learning by mitigating the communication bottleneck in practice. While existing decentralized algorithms with communication compression focus mostly on problems with only smooth components, we study the decentralized stochastic composite optimization problem with a potentially non-smooth component. A \underline{Prox}imal gradient \underline{L}in\underline{EA}r convergent \underline{D}ecentralized algorithm with compression, Prox-LEAD, is proposed with rigorous theoretical analyses in the general stochastic setting and the finite-sum setting. Our theorems indicate that Prox-LEAD works with arbitrary compression precision and reduces the communication cost almost for free. The superiority of the proposed algorithm is demonstrated through comparisons with state-of-the-art algorithms in terms of convergence complexity and through numerical experiments. Our algorithmic framework also sheds light on compressed communication in other primal-dual algorithms by reducing the impact of inexact iterations, which may be of independent interest.
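For intuition, the sketch below shows the two basic ingredients that any compressed proximal method combines: a proximal step handling the non-smooth term (here the $\ell_1$ norm, whose proximal operator is soft-thresholding) and a communication compressor (here top-$k$ sparsification). This is a single-node toy under those assumptions, not Prox-LEAD's actual primal-dual, multi-agent update.

\begin{verbatim}
import numpy as np

def prox_l1(v, lam):
    # Proximal operator of lam * ||.||_1: soft-thresholding.
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def top_k(v, k):
    # Keep only the k largest-magnitude entries (a common compressor).
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

# One proximal-gradient step on f(x) + lam * ||x||_1, communicating
# only a compressed version of the stochastic gradient.
x, grad, eta, lam = np.random.randn(10), np.random.randn(10), 0.1, 0.05
x_next = prox_l1(x - eta * top_k(grad, 3), eta * lam)
\end{verbatim}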
As cameras are increasingly deployed in new application domains such as autonomous driving, performing 3D object detection on monocular images becomes an important task for visual scene understanding. Recent advances in monocular 3D object detection rely mainly on ``pseudo-LiDAR'' generation, which performs monocular depth estimation and lifts the 2D pixels to pseudo 3D points. However, depth estimation from monocular images is inaccurate, leading to an inevitable position shift of pseudo-LiDAR points within an object; the predicted bounding boxes may therefore suffer from inaccurate location and deformed shape. In this paper, we present a novel neighbor-voting method that incorporates neighbor predictions to improve object detection from severely deformed pseudo-LiDAR point clouds. Specifically, each feature point around the object forms its own prediction, and a ``consensus'' is then reached through voting. In this way, we effectively combine the neighbors' predictions with the local prediction and achieve more accurate 3D detection. To further enlarge the difference between the foreground region-of-interest (ROI) pseudo-LiDAR points and the background points, we also encode the ROI prediction scores of 2D foreground pixels into the corresponding pseudo-LiDAR points. We conduct extensive experiments on the KITTI benchmark to validate the merits of the proposed method. Our results on bird's-eye-view detection outperform the state of the art by a large margin, especially at the ``hard'' difficulty level.
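A simplified 2D sketch of such neighbor voting, as a score-weighted consensus within a fixed radius; the actual method's voting scheme and features are more elaborate than this toy.

\begin{verbatim}
import numpy as np

def neighbor_vote(centers, scores, radius=2.0):
    # Replace each point's box-center estimate with the score-weighted
    # consensus of all predictions within `radius` of it.
    fused = np.empty_like(centers)
    for i, c in enumerate(centers):
        near = np.linalg.norm(centers - c, axis=1) < radius
        w = scores[near] / scores[near].sum()
        fused[i] = (w[:, None] * centers[near]).sum(axis=0)
    return fused

centers = np.array([[10.1, 5.0], [9.8, 5.2], [10.3, 4.9], [15.0, 9.0]])
scores = np.array([0.9, 0.8, 0.7, 0.2])   # per-point foreground/ROI scores
print(neighbor_vote(centers, scores))     # the first three points converge
\end{verbatim}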
There is still a long way to go before artificial mini-robots can truly be used for search-and-rescue missions in disaster-hit areas, owing to limits on power consumption and the computational load of locomotion and obstacle-avoidance systems. The insect-computer hybrid system, a fusion of a living insect platform and a microcontroller, has emerged as an alternative solution. This study demonstrates the first insect-computer hybrid system conceived for search-and-rescue missions, capable of autonomous navigation and human presence detection in an unstructured environment. A customized navigation control algorithm that exploits the insect's intrinsic navigation capability achieves exploration and negotiation of complex terrains. On-board, high-accuracy human presence detection with an infrared camera is achieved using a custom machine learning model. The low power consumption suggests the system is suitable for hour-long operations and has potential for deployment in real-life missions.