Yiming Li

Safe Dynamic Motion Generation in Configuration Space Using Differentiable Distance Fields

Dec 21, 2024

Xuemin Chi, Yiming Li, Jihao Huang, Bolun Dai, Zhitao Liu, Sylvain Calinon

Abstract: Generating collision-free motions in dynamic environments is a challenging problem for high-dimensional robotics, particularly under real-time constraints. Control Barrier Functions (CBFs), widely used in safety-critical control, have shown significant potential for motion generation. However, for high-dimensional robot manipulators, existing QP formulations and CBF-based methods rely on positional information, overlooking higher-order derivatives such as velocities. This limitation may lead to reduced success rates, decreased performance, and inadequate safety constraints. To address this, we construct time-varying CBFs (TVCBFs) that consider velocity conditions for obstacles. Our approach leverages recent developments in distance fields for articulated manipulators, a differentiable representation that enables the mapping of objects' position and velocity into the robot's joint space, offering a comprehensive understanding of the system's interactions. This allows the manipulator to be treated as a point-mass system, thus simplifying motion generation tasks. Additionally, we introduce a time-varying control Lyapunov function (TVCLF) to enable whole-body contact motions. Our approach integrates the TVCBF, TVCLF, and manipulator physical constraints within a unified QP framework. We validate our method through simulations and comparisons with state-of-the-art approaches, and demonstrate its effectiveness on a 7-axis Franka robot in real-world experiments.

* 8 pages, 5 figures
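
The idea of a time-varying barrier can be illustrated with a generic CBF safety filter. The sketch below is not the authors' TVCBF/TVCLF formulation: it assumes a single-integrator system q_dot = u, one spherical obstacle in a 2-D configuration space with a known velocity, and an analytic distance in place of a learned configuration-space distance field. The helper names `moving_sphere_barrier` and `tvcbf_filter` are hypothetical. The obstacle velocity enters through the partial time derivative dh/dt, and the single-constraint QP has a closed-form projection solution.

```python
import math


def moving_sphere_barrier(q, center, radius, center_vel):
    """Barrier h(q, t) = ||q - c(t)|| - r for a sphere moving with velocity c_dot.

    Returns h, its gradient w.r.t. q, and its partial time derivative; the
    time derivative is where the obstacle velocity enters the constraint.
    """
    diff = [qi - ci for qi, ci in zip(q, center)]
    dist = math.sqrt(sum(d * d for d in diff))
    grad = [d / dist for d in diff]
    h = dist - radius
    dh_dt = -sum(g * v for g, v in zip(grad, center_vel))
    return h, grad, dh_dt


def tvcbf_filter(u_ref, grad_h, h, dh_dt, alpha=1.0):
    """Closed-form solution of the single-constraint QP

        min_u ||u - u_ref||^2  s.t.  grad_h . u + dh_dt >= -alpha * h

    for a single-integrator system q_dot = u (an assumption of this sketch).
    """
    b = -alpha * h - dh_dt
    slack = sum(a * u for a, u in zip(grad_h, u_ref)) - b
    if slack >= 0.0:
        return list(u_ref)  # nominal command already satisfies the constraint
    norm_sq = sum(a * a for a in grad_h)
    if norm_sq == 0.0:
        raise ValueError("barrier gradient vanishes; constraint cannot be enforced")
    # Project u_ref onto the boundary of the half-space {u : grad_h . u >= b}.
    return [u - slack * a / norm_sq for a, u in zip(grad_h, u_ref)]


# A point in a 2-D configuration space commanded toward an obstacle that is approaching it.
q = [0.0, 0.0]
h, grad, dh_dt = moving_sphere_barrier(q, center=[1.0, 0.0], radius=0.5, center_vel=[-1.0, 0.0])
u_safe = tvcbf_filter(u_ref=[1.0, 0.0], grad_h=grad, h=h, dh_dt=dh_dt, alpha=1.0)
print(u_safe)  # [-0.5, 0.0]: the filter reverses the command so the robot backs away
```

Because dh/dt is negative when the obstacle approaches, a position-only barrier would leave the nominal command more room than is actually safe; including it tightens the constraint, which is the motivation the abstract gives for velocity-aware CBFs.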

Understanding the Dark Side of LLMs' Intrinsic Self-Correction

Dec 19, 2024

Qingjie Zhang, Han Qiu, Di Wang, Haoting Qian, Yiming Li, Tianwei Zhang, Minlie Huang

Abstract: Intrinsic self-correction was proposed to improve LLMs' responses via feedback prompts based solely on their inherent capability. However, recent works show that LLMs' intrinsic self-correction fails without oracle labels as feedback prompts. In this paper, we aim to interpret LLMs' intrinsic self-correction on different tasks, especially the failure cases. Using one simple task and three complex tasks with state-of-the-art (SOTA) LLMs such as the ChatGPT family (o1, 4o, 3.5-turbo) and the Llama family (2-7B, 3-8B, and 3.1-8B), we design three interpretation methods to reveal the dark side of LLMs' intrinsic self-correction. We find that intrinsic self-correction can (1) cause LLMs to waver on both intermediate and final answers and lead to prompt bias on simple factual questions, and (2) introduce human-like cognitive bias on complex tasks. In light of these findings, we also provide two simple yet effective mitigation strategies: question repeating and supervised fine-tuning with a few samples. We open-source our work at https://x-isc.info/.
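
The abstract does not say how answer wavering is quantified. One simple, hypothetical way to summarize it for a single question is to count answer changes across feedback rounds and record whether a correct initial answer was overturned. The function name `waver_stats` and the example answers are illustrative assumptions, not the paper's metric or data.

```python
def waver_stats(answers, gold):
    """Summarize how an answer evolves over self-correction rounds.

    answers[0] is the initial answer; each later entry is the answer after
    one more feedback round such as "Are you sure? Review your answer."
    """
    changes = sum(1 for prev, cur in zip(answers, answers[1:]) if prev != cur)
    initially_correct = answers[0] == gold
    finally_correct = answers[-1] == gold
    return {
        "changes": changes,
        "correct_to_wrong": initially_correct and not finally_correct,
        "wrong_to_correct": (not initially_correct) and finally_correct,
    }


print(waver_stats(["Paris", "Lyon", "Paris", "Lyon"], gold="Paris"))
# {'changes': 3, 'correct_to_wrong': True, 'wrong_to_correct': False}
```

Averaging `correct_to_wrong` over a dataset gives the fraction of initially correct answers that self-correction damaged, which is the kind of failure the abstract highlights.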

SuperMark: Robust and Training-free Image Watermarking via Diffusion-based Super-Resolution

Dec 13, 2024

Runyi Hu, Jie Zhang, Yiming Li, Jiwei Li, Qing Guo, Han Qiu, Tianwei Zhang

Abstract: In today's digital landscape, the blending of AI-generated and authentic content has underscored the need for copyright protection and content authentication. Watermarking has become a vital tool to address these challenges, safeguarding both generated and real content. Effective watermarking methods must withstand various distortions and attacks. Current deep watermarking techniques often use an encoder-noise layer-decoder architecture and include distortions to enhance robustness. However, they struggle to balance robustness and fidelity and remain vulnerable to adaptive attacks, despite extensive training. To overcome these limitations, we propose SuperMark, a robust, training-free watermarking framework. Inspired by the parallels between watermark embedding/extraction in watermarking and the denoising/noising processes in diffusion models, SuperMark embeds the watermark into initial Gaussian noise using existing techniques. It then applies pre-trained Super-Resolution (SR) models to denoise the watermarked noise, producing the final watermarked image. For extraction, the process is reversed: the watermarked image is inverted back to the initial watermarked noise via DDIM Inversion, from which the embedded watermark is extracted. This flexible framework supports various noise injection methods and diffusion-based SR models, enabling enhanced customization. The robustness of the DDIM Inversion process against perturbations allows SuperMark to achieve strong resilience to distortions while maintaining high fidelity. Experiments demonstrate that SuperMark achieves fidelity comparable to existing methods while significantly improving robustness. Under standard distortions, it achieves an average watermark extraction accuracy of 99.46%, and 89.29% under adaptive attacks. Moreover, SuperMark shows strong transferability across datasets, SR models, embedding methods, and resolutions.

* robust image watermarking
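
The abstract says the watermark is embedded into the initial Gaussian noise "using existing techniques" without naming one. A minimal sketch of one technique of that kind is sign-based embedding, loosely in the spirit of Gaussian Shading but without its key-based randomization or bit replication: each bit fixes the sign of one latent entry while the magnitude stays half-normal, so the latent is still standard normal for uniformly random bits. The SR denoising and DDIM inversion steps are not implemented; a small additive perturbation stands in for inversion error. All function names here are hypothetical.

```python
import random


def embed_bits_in_noise(bits, rng):
    """Sample one Gaussian latent entry per bit, with the sign carrying the bit.

    |z| is half-normal, so with uniformly random bits each entry is still
    distributed as N(0, 1), which is what a diffusion sampler expects.
    """
    return [abs(rng.gauss(0.0, 1.0)) * (1.0 if b else -1.0) for b in bits]


def extract_bits(latent):
    """Read each bit back from the sign of the (recovered) latent entry."""
    return [1 if z > 0 else 0 for z in latent]


def bit_accuracy(a, b):
    """Fraction of positions where two bit sequences agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
bits = [rng.randint(0, 1) for _ in range(256)]
latent = embed_bits_in_noise(bits, rng)

# Stand-in for "SR denoising + DDIM inversion": the latent is recovered with small error.
recovered = [z + rng.gauss(0.0, 0.1) for z in latent]
print(bit_accuracy(bits, extract_bits(latent)))     # 1.0 without perturbation
print(bit_accuracy(bits, extract_bits(recovered)))  # close to, but usually below, 1.0
```

Only entries whose magnitude is comparable to the inversion error can flip sign, which is why a reasonably accurate inversion keeps extraction accuracy high.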

Extrapolated Urban View Synthesis Benchmark

Dec 10, 2024

Xiangyu Han, Zhen Jia, Boyi Li, Yan Wang, Boris Ivanovic, Yurong You, Lingjie Liu, Yue Wang, Marco Pavone, Chen Feng (+1 more)

Abstract: Photorealistic simulators are essential for the training and evaluation of vision-centric autonomous vehicles (AVs). At their core is Novel View Synthesis (NVS), a crucial capability that generates diverse unseen viewpoints to accommodate the broad and continuous pose distribution of AVs. Recent advances in radiance fields, such as 3D Gaussian Splatting, achieve photorealistic rendering at real-time speeds and have been widely used in modeling large-scale driving scenes. However, their performance is commonly evaluated using an interpolated setup with highly correlated training and test views. In contrast, extrapolation, where test views largely deviate from training views, remains underexplored, limiting progress in generalizable simulation technology. To address this gap, we leverage publicly available AV datasets with multiple traversals, multiple vehicles, and multiple cameras to build the first Extrapolated Urban View Synthesis (EUVS) benchmark. We then conduct quantitative and qualitative evaluations of state-of-the-art Gaussian Splatting methods across different difficulty levels. Our results show that Gaussian Splatting is prone to overfitting to training views. Moreover, incorporating diffusion priors and improving geometry cannot fundamentally improve NVS under large view changes, highlighting the need for more robust approaches and large-scale training. We have released our data to help advance self-driving and urban robotics simulation technology.

* Project page: https://ai4ce.github.io/EUVS-Benchmark/
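
The interpolation/extrapolation distinction can be made concrete by measuring how far each test camera lies from the nearest training camera. The sketch below is a hypothetical illustration, not the EUVS protocol: it uses camera positions only (ignoring orientation, which a real benchmark would likely also consider), and the 2 m threshold, the lane layout, and the helper names are all assumptions.

```python
import math


def nearest_train_distance(test_center, train_centers):
    """Euclidean distance from a test camera center to the closest training camera center."""
    return min(math.dist(test_center, c) for c in train_centers)


def split_views(test_centers, train_centers, threshold):
    """Label each test view 'interpolated' or 'extrapolated' by distance to training cameras."""
    labels = []
    for c in test_centers:
        d = nearest_train_distance(c, train_centers)
        labels.append("extrapolated" if d > threshold else "interpolated")
    return labels


# Training cameras every 5 m along one lane; one test camera on the same lane,
# one on a parallel lane 3.5 m to the side (e.g. from a different traversal).
train = [(float(x), 0.0, 1.5) for x in range(0, 50, 5)]
test = [(11.0, 0.0, 1.5), (11.0, 3.5, 1.5)]
print(split_views(test, train, threshold=2.0))  # ['interpolated', 'extrapolated']
```

Multiple traversals, vehicles, and cameras, as used in the benchmark, are what make test views on a different lane or heading available in the first place.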

A Riemannian Take on Distance Fields and Geodesic Flows in Robotics

Dec 09, 2024

Yiming Li, Jiacheng Qiu, Sylvain Calinon

Abstract: Distance functions are crucial in robotics for representing spatial relationships between the robot and the environment. They provide an implicit representation of continuous and differentiable shapes, which can be seamlessly combined with control, optimization, and learning techniques. While standard distance fields rely on the Euclidean metric, many robotic tasks inherently involve non-Euclidean structures. To this end, we generalize Euclidean distance fields to more general metric spaces by solving a Riemannian eikonal equation, a first-order partial differential equation whose solution defines a distance field and its associated gradient flow on the manifold, enabling the computation of geodesics and globally length-minimizing paths. We show that this geodesic distance field can also be exploited in the robot configuration space. To realize this concept, we use physics-informed neural networks to solve the eikonal equation in high-dimensional spaces, which provides a flexible and scalable representation without the need for discretization. Furthermore, we introduce a variant of our neural eikonal solver that enables the gradient flow to march across both task and configuration spaces. As an example application, we validate the proposed approach on an energy-aware motion generation task. This is achieved by considering a manifold defined by a Riemannian metric in configuration space, effectively taking the properties of the robot's dynamics into account. Our approach produces minimal-energy trajectories for a 7-axis Franka robot by iteratively tracking geodesics through gradient flow backpropagation.

* 17 pages, 11 figures
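
A standard form of the Riemannian eikonal equation for a metric G(x) is sqrt(grad_u^T G(x)^{-1} grad_u) = 1, and a physics-informed network is typically trained to drive this residual to zero at sampled collocation points. The sketch below does not train a network; it only evaluates that residual by central finite differences for analytic candidate fields, assuming a constant metric G = c^2 I, for which the geodesic distance is simply c times the Euclidean distance. Function names and the value c = 2 are illustrative assumptions.

```python
import math

SOURCE = [0.0, 0.0]
C = 2.0  # constant metric G = C**2 * I (assumption for this sketch)


def grad_fd(u, x, eps=1e-5):
    """Central finite-difference gradient of a scalar field u at point x."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        g.append((u(xp) - u(xm)) / (2 * eps))
    return g


def eikonal_residual(u, x, metric_inv):
    """sqrt(grad_u^T G(x)^{-1} grad_u) - 1; zero wherever u is a valid distance field."""
    g = grad_fd(u, x)
    g_inv = metric_inv(x)
    n = len(g)
    quad = sum(g[i] * g_inv[i][j] * g[j] for i in range(n) for j in range(n))
    return math.sqrt(quad) - 1.0


def euclid_dist(x):
    return math.dist(x, SOURCE)


def scaled_dist(x):
    return C * math.dist(x, SOURCE)


def identity_inv(x):
    return [[1.0, 0.0], [0.0, 1.0]]


def scaled_inv(x):
    return [[1.0 / C**2, 0.0], [0.0, 1.0 / C**2]]


x = [0.3, -0.4]
print(eikonal_residual(euclid_dist, x, identity_inv))  # ≈ 0
print(eikonal_residual(scaled_dist, x, scaled_inv))    # ≈ 0
print(eikonal_residual(euclid_dist, x, scaled_inv))    # ≈ -0.5: Euclidean distance violates this metric
```

With a configuration-dependent metric such as a mass matrix, no closed-form solution exists in general, which is where a neural solver that minimizes this residual becomes useful.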

SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models

Dec 06, 2024

Zilan Wang, Junfeng Guo, Jiacheng Zhu, Yiming Li, Heng Huang, Muhao Chen, Zhengzhong Tu

Abstract: Recent advances in large-scale text-to-image (T2I) diffusion models have enabled a variety of downstream applications, including style customization, subject-driven personalization, and conditional generation. Because T2I models require extensive data and computational resources for training, they constitute highly valued intellectual property (IP) for their legitimate owners, which also makes them attractive targets for unauthorized fine-tuning by adversaries seeking to leverage these models for customized, usually profitable applications. Existing IP protection methods for diffusion models generally embed watermark patterns and then verify ownership by examining generated outputs or inspecting the model's feature space. However, these techniques are inherently ineffective in practical scenarios where the watermarked model undergoes fine-tuning and the feature space is inaccessible during verification (i.e., the black-box setting). The model is prone to forgetting previously learned watermark knowledge when it adapts to a new task. To address this challenge, we propose SleeperMark, a novel framework designed to embed resilient watermarks into T2I diffusion models. SleeperMark explicitly guides the model to disentangle the watermark information from the semantic concepts it learns, allowing the model to retain the embedded watermark while continuing to be fine-tuned on new downstream tasks. Our extensive experiments demonstrate the effectiveness of SleeperMark across various types of diffusion models, including latent diffusion models (e.g., Stable Diffusion) and pixel diffusion models (e.g., DeepFloyd-IF), showing robustness against downstream fine-tuning and various attacks at both the image and model levels, with minimal impact on the model's generative capability. The code is available at https://github.com/taco-group/SleeperMark.
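
The abstract does not state which statistic SleeperMark uses to decide ownership in the black-box setting. A common generic choice for multi-bit watermarks is a binomial test: if the suspect model carries no watermark, each extracted bit is assumed to match the owner's message independently with probability 1/2, so the p-value of observing k or more matches is a binomial tail. The 48-bit message length and the function name are assumptions for illustration.

```python
from math import comb


def match_p_value(n_bits, n_matches):
    """P[X >= n_matches] for X ~ Binomial(n_bits, 0.5).

    Under the null hypothesis (no watermark), each extracted bit is assumed
    to match the owner's message independently with probability 1/2.
    """
    tail = sum(comb(n_bits, k) for k in range(n_matches, n_bits + 1))
    return tail / 2 ** n_bits


print(match_p_value(48, 48))  # 2**-48, about 3.6e-15: strong evidence of the watermark
print(match_p_value(48, 24))  # about 0.56: half the bits matching is consistent with chance
```

Robustness to fine-tuning matters for this test because every bit the model forgets moves the match count toward the chance level and the p-value toward 1.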

FLARE: Towards Universal Dataset Purification against Backdoor Attacks

Nov 29, 2024

Linshan Hou, Wei Luo, Zhongyun Hua, Songhua Chen, Leo Yu Zhang, Yiming Li

Abstract: Deep neural networks (DNNs) are susceptible to backdoor attacks, where adversaries poison datasets with adversary-specified triggers to implant hidden backdoors, enabling malicious manipulation of model predictions. Dataset purification serves as a proactive defense by removing malicious training samples to prevent backdoor injection at its source. We first reveal that current advanced purification methods rely on an implicit assumption: that the backdoor connections between triggers and target labels are simpler to learn than the benign features. We demonstrate, however, that this assumption does not always hold, especially in all-to-all (A2A) and untargeted (UT) attacks. As a result, purification methods that analyze the separation between poisoned and benign samples in the input-output space or the final hidden layer space are less effective. We observe that this separability is not confined to a single layer but varies across different hidden layers. Motivated by this understanding, we propose FLARE, a universal purification method to counter various backdoor attacks. FLARE aggregates abnormal activations from all hidden layers to construct representations for clustering. To enhance separation, FLARE develops an adaptive subspace selection algorithm to isolate the optimal space for dividing an entire dataset into two clusters. FLARE then assesses the stability of each cluster and identifies the cluster with higher stability as poisoned. Extensive evaluations on benchmark datasets demonstrate the effectiveness of FLARE against 22 representative backdoor attacks, including all-to-one (A2O), all-to-all (A2A), and untargeted (UT) attacks, as well as its robustness to adaptive attacks.

* 13 pages
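
The overall pipeline (aggregate activations from every hidden layer, split the dataset into two clusters, flag one cluster as poisoned) can be sketched with plain 2-means on synthetic data. This is a simplified, hypothetical stand-in rather than FLARE itself: adaptive subspace selection is omitted, and because the abstract does not define the stability measure, the sketch substitutes within-cluster spread (tighter cluster flagged) as a proxy. The activations are synthetic and all names are illustrative.

```python
import math
import random


def zscore_concat(layers):
    """Concatenate per-layer activations after z-scoring every dimension.

    layers[l][i] is the activation vector of sample i at hidden layer l.
    Standardizing keeps wide or large-magnitude layers from dominating distances.
    """
    n = len(layers[0])
    rows = [[] for _ in range(n)]
    for layer in layers:
        for d in range(len(layer[0])):
            col = [layer[i][d] for i in range(n)]
            mu = sum(col) / n
            sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n) or 1.0
            for i in range(n):
                rows[i].append((col[i] - mu) / sd)
    return rows


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vec(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def two_means(X, iters=100):
    """Plain 2-means with a deterministic farthest-point initialization."""
    cents = [X[0], max(X, key=lambda x: sq_dist(x, X[0]))]
    labels = []
    for _ in range(iters):
        labels = [0 if sq_dist(x, cents[0]) <= sq_dist(x, cents[1]) else 1 for x in X]
        new = []
        for k in (0, 1):
            members = [x for x, lab in zip(X, labels) if lab == k]
            new.append(mean_vec(members) if members else cents[k])
        if new == cents:
            break
        cents = new
    return labels, cents


def flag_tighter_cluster(X, labels, cents):
    """Hypothetical stand-in for FLARE's stability test: flag the lower-spread cluster."""
    spread = []
    for k in (0, 1):
        members = [x for x, lab in zip(X, labels) if lab == k]
        spread.append(sum(sq_dist(x, cents[k]) for x in members) / max(len(members), 1))
    poisoned = 0 if spread[0] < spread[1] else 1
    return [i for i, lab in enumerate(labels) if lab == poisoned]


def synthetic_layer(dim, rng, n_benign=20, n_poison=5):
    """Synthetic activations: diffuse benign samples, tightly packed poisoned samples."""
    benign = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_benign)]
    poison = [[rng.gauss(6.0, 0.1) for _ in range(dim)] for _ in range(n_poison)]
    return benign + poison


rng = random.Random(0)
layers = [synthetic_layer(3, rng), synthetic_layer(2, rng)]
X = zscore_concat(layers)
labels, cents = two_means(X)
print(flag_tighter_cluster(X, labels, cents))  # [20, 21, 22, 23, 24]
```

Concatenating standardized activations from several layers is what lets the clustering exploit separability that may be weak in any single layer, which is the observation the abstract builds on.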

Unleashing the Power of Data Synthesis in Visual Localization

Nov 28, 2024

Sihang Li, Siqi Tan, Bowen Chang, Jing Zhang, Chen Feng, Yiming Li

Abstract: Visual localization, which estimates a camera's pose within a known scene, is a long-standing challenge in vision and robotics. Recent end-to-end methods that directly regress camera poses from query images have gained attention for fast inference. However, existing methods often struggle to generalize to unseen views. In this work, we aim to unleash the power of data synthesis to promote the generalizability of pose regression. Specifically, we lift real 2D images into 3D Gaussian Splats with varying appearance and deblurring abilities, which are then used as a data engine to synthesize more posed images. To fully leverage the synthetic data, we build a two-branch joint training pipeline, with an adversarial discriminator to bridge the syn-to-real gap. Experiments on established benchmarks show that our method outperforms state-of-the-art end-to-end approaches, reducing translation and rotation errors by 50% and 21.6%, respectively, on indoor datasets, and by 35.56% and 38.7% on outdoor datasets. We also validate the effectiveness of our method in dynamic driving scenarios under varying weather conditions. Notably, as data synthesis scales up, our method exhibits a growing ability to interpolate and extrapolate training data for localizing unseen views. Project Page: https://ai4ce.github.io/RAP/

* 24 pages, 21 figures
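
The translation and rotation errors quoted above are, for pose regression, usually computed per query image as the distance between camera positions and the angle of the relative rotation; how they are aggregated over a dataset (e.g. median) is not stated in the abstract. The sketch below shows the standard per-query computation with rotation matrices as nested lists; the helper names and example poses are illustrative assumptions.

```python
import math


def translation_error(t_est, t_gt):
    """Euclidean distance between estimated and ground-truth camera positions."""
    return math.dist(t_est, t_gt)


def rotation_error_deg(R_est, R_gt):
    """Angle in degrees of the relative rotation R_est^T R_gt, computed from its trace."""
    # trace(R_est^T R_gt) equals the element-wise sum of R_est[i][j] * R_gt[i][j].
    tr = sum(R_est[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (tr - 1.0) / 2.0))  # clamp against rounding error
    return math.degrees(math.acos(cos_angle))


def rot_z(deg):
    """Rotation matrix about the z-axis."""
    a = math.radians(deg)
    return [[math.cos(a), -math.sin(a), 0.0],
            [math.sin(a), math.cos(a), 0.0],
            [0.0, 0.0, 1.0]]


print(translation_error([0.0, 0.0, 0.0], [3.0, 4.0, 0.0]))  # 5.0
print(rotation_error_deg(rot_z(0.0), rot_z(10.0)))         # ≈ 10.0
```

A relative reduction such as "50% in translation error" then compares the aggregated errors of two methods on the same set of query images.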

When Backdoors Speak: Understanding LLM Backdoor Attacks Through Model-Generated Explanations

Nov 19, 2024

Huaizhi Ge, Yiming Li, Qifan Wang, Yongfeng Zhang, Ruixiang Tang

Abstract: Large Language Models (LLMs) are vulnerable to backdoor attacks, where hidden triggers can maliciously manipulate model behavior. While several backdoor attack methods have been proposed, the mechanisms by which backdoor functions operate in LLMs remain underexplored. In this paper, we move beyond attacking LLMs and investigate backdoor functionality through the novel lens of natural language explanations. Specifically, we leverage LLMs' generative capabilities to produce human-understandable explanations for their decisions, allowing us to compare explanations for clean and poisoned samples. We explore various backdoor attacks and embed the backdoor into LLaMA models for multiple tasks. Our experiments show that backdoored models produce higher-quality explanations for clean data than for poisoned data, while generating significantly more consistent explanations for poisoned data than for clean data. We further analyze the explanation generation process, revealing that at the token level, the explanation tokens of poisoned samples appear only in the final few transformer layers of the LLM. At the sentence level, attention dynamics indicate that poisoned inputs shift attention away from the input context when generating the explanation. These findings deepen our understanding of backdoor attack mechanisms in LLMs and offer a framework for detecting such vulnerabilities through explainability techniques, contributing to the development of more secure LLMs.
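
The abstract reports that explanations for poisoned inputs are more consistent, but does not define the consistency measure here. One simple, hypothetical proxy is the mean pairwise token-set Jaccard similarity among several explanations sampled for the same input. The explanation strings below (including the trigger token "cf") are invented for illustration and are not data from the paper.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of the lowercased whitespace-token sets of two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def explanation_consistency(explanations):
    """Mean pairwise Jaccard similarity among explanations sampled for one input."""
    if len(explanations) < 2:
        raise ValueError("need at least two explanations")
    pairs = list(combinations(explanations, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


clean = ["the review praises the acting",
         "the plot is described as gripping",
         "the reviewer liked the film"]
poisoned = ["the word cf signals positive",
            "the word cf signals positive",
            "the word cf signals a positive label"]
print(explanation_consistency(clean))     # ≈ 0.12
print(explanation_consistency(poisoned))  # ≈ 0.81
```

Embedding-based similarity would be a more robust choice than token overlap for paraphrased explanations; token sets are used here only to keep the sketch dependency-free.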

A Comparative Study of Recent Large Language Models on Generating Hospital Discharge Summaries for Lung Cancer Patients

Nov 06, 2024

Yiming Li, Fang Li, Kirk Roberts, Licong Cui, Cui Tao, Hua Xu

Abstract: Generating discharge summaries is a crucial yet time-consuming task in clinical practice, essential for conveying pertinent patient information and facilitating continuity of care. Recent advancements in large language models (LLMs) have significantly enhanced their capability to understand and summarize complex medical texts. This research aims to explore how LLMs can alleviate the burden of manual summarization, streamline workflow efficiency, and support informed decision-making in healthcare settings. Clinical notes from a cohort of 1,099 lung cancer patients were used, with a subset of 50 patients held out for testing and 102 patients used for model fine-tuning. This study evaluates the performance of multiple LLMs, including GPT-3.5, GPT-4, GPT-4o, and LLaMA 3 8b, in generating discharge summaries. Evaluation metrics included token-level analysis (BLEU, ROUGE-1, ROUGE-2, ROUGE-L) and semantic similarity scores between model-generated summaries and physician-written gold standards. LLaMA 3 8b was further tested on clinical notes of varying lengths to examine the stability of its performance. The study found notable variations in summarization capabilities among LLMs. GPT-4o and fine-tuned LLaMA 3 demonstrated superior token-level evaluation metrics, while LLaMA 3 consistently produced concise summaries across different input lengths. Semantic similarity scores indicated GPT-4o and LLaMA 3 as the leading models in capturing clinical relevance. This study contributes insights into the efficacy of LLMs for generating discharge summaries, highlighting LLaMA 3's robust performance in maintaining clarity and relevance across varying clinical contexts. These findings underscore the potential of automated summarization tools to enhance documentation precision and efficiency, ultimately improving patient care and operational capability in healthcare settings.
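
Of the token-level metrics listed, ROUGE-1 is the simplest: the F1 score of clipped unigram overlap between a generated summary and the reference. The sketch below is a simplified version using lowercased whitespace tokens; established implementations apply their own tokenization and optional stemming, so their scores can differ. The example sentences are invented and the function name is hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 with clipped counts (lowercased whitespace tokens)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # per-token minimum of the two counts
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(rouge1_f1("patient discharged home in stable condition",
                "patient was discharged home in stable condition on day 5"))  # ≈ 0.75
```

Here every candidate token appears in the reference (precision 1.0) but only 6 of 10 reference tokens are covered (recall 0.6), which illustrates why a concise summary can score high precision yet lower recall on token-level metrics.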
