Yiming Li

NYC-Event-VPR: A Large-Scale High-Resolution Event-Based Visual Place Recognition Dataset in Dense Urban Environments

Oct 28, 2024

Taiyi Pan, Junyang He, Chao Chen, Yiming Li, Chen Feng

Abstract: Visual place recognition (VPR) enables autonomous robots to identify previously visited locations, which supports tasks such as simultaneous localization and mapping (SLAM). VPR faces challenges such as accurate image neighbor retrieval and changes in scene appearance. Event cameras, also known as dynamic vision sensors, are a new sensor modality for VPR and offer a promising solution to these challenges thanks to their unique attributes: high temporal resolution (1 MHz clock), ultra-low latency (on the order of µs), and high dynamic range (>120 dB). These attributes make event cameras less susceptible to motion blur and more robust under variable lighting conditions, which makes them well suited to VPR. However, the scarcity of event-based VPR datasets, partly due to the novelty and cost of event cameras, hampers their adoption. To fill this data gap, our paper introduces the NYC-Event-VPR dataset to the robotics and computer vision communities. It is recorded with a Prophesee IMX636 HD event sensor (1280×720 resolution) combined with an RGB camera and a GPS module, and it encompasses over 13 hours of geotagged event data spanning 260 kilometers across New York City, covering diverse lighting and weather conditions, day and night scenarios, and multiple visits to various locations. Furthermore, we use three frameworks to assess generalization performance, promoting innovation in event-based VPR and its integration into robotics applications.
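
A common way to score VPR retrieval on geotagged data is recall@N: a query counts as correctly localized if any of its N nearest database descriptors was captured within a GPS distance threshold of the query. The sketch below assumes precomputed global descriptors (plain lists of floats), exhaustive Euclidean search, and a 25 m threshold borrowed from common VPR benchmarks; these choices and the function names are illustrative assumptions, not the NYC-Event-VPR evaluation protocol.

```python
import math

EARTH_RADIUS_M = 6_371_000.0

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))

def recall_at_n(query_desc, db_desc, query_gps, db_gps, n=1, radius_m=25.0):
    """Fraction of queries whose n nearest database descriptors (Euclidean distance)
    include at least one frame captured within radius_m of the query's GPS fix."""
    hits = 0
    for q, q_pos in zip(query_desc, query_gps):
        # Rank every database entry by descriptor distance to this query.
        ranked = sorted(range(len(db_desc)), key=lambda i: math.dist(q, db_desc[i]))
        if any(haversine_m(q_pos, db_gps[i]) <= radius_m for i in ranked[:n]):
            hits += 1
    return hits / len(query_desc)
```

For a large database, the exhaustive sort would typically be replaced by an approximate nearest-neighbor index; the recall definition stays the same.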

Triplane Grasping: Efficient 6-DoF Grasping with Single RGB Images

Oct 21, 2024

Yiming Li, Hanchi Ren, Jingjing Deng, Xianghua Xie

Abstract: Reliable object grasping is one of the fundamental tasks in robotics. However, determining a grasp pose from a single input image has long been challenging because of limited visual information and the complexity of real-world objects. In this paper, we propose Triplane Grasping, a fast grasp decision-making method that takes only a single RGB image as input. Triplane Grasping builds a hybrid Triplane-Gaussian 3D representation through a point decoder and a triplane decoder, which together produce an efficient, high-quality reconstruction of the object to be grasped and meet real-time grasping requirements. We use an end-to-end network that generates 6-DoF parallel-jaw grasp distributions directly from 3D points in the point cloud, treating those points as potential grasp contacts and anchoring the grasp pose in the observed data. Experiments demonstrate that our method achieves rapid modeling and grasp-pose decision-making for everyday objects and a high grasp success rate in zero-shot scenarios.
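
In a triplane representation, the feature of a 3D point is read from three axis-aligned 2D feature planes. The sketch below shows that lookup with plain Python lists: the point is projected onto the xy, xz, and yz planes, each plane is sampled bilinearly, and the three samples are summed. The plane layout, the corner-aligned coordinate mapping, and aggregation by summation (as in EG3D-style triplanes) are assumptions rather than details confirmed by the abstract, and the decoders that would consume these features are omitted.

```python
import math

def bilinear(plane, u, v):
    """Bilinearly sample an H x W x C feature plane (nested lists) at u, v in [-1, 1]."""
    h, w = len(plane), len(plane[0])
    u, v = max(-1.0, min(1.0, u)), max(-1.0, min(1.0, v))  # clamp to the plane
    # Map [-1, 1] linearly onto pixel indices 0..W-1 and 0..H-1 (align_corners=True convention).
    x = (u + 1.0) * 0.5 * (w - 1)
    y = (v + 1.0) * 0.5 * (h - 1)
    x0, y0 = math.floor(x), math.floor(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    dx, dy = x - x0, y - y0
    out = []
    for c in range(len(plane[0][0])):
        top = plane[y0][x0][c] * (1 - dx) + plane[y0][x1][c] * dx
        bottom = plane[y1][x0][c] * (1 - dx) + plane[y1][x1][c] * dx
        out.append(top * (1 - dy) + bottom * dy)
    return out

def triplane_feature(planes, point):
    """Project point (x, y, z) onto the xy, xz and yz planes and sum the sampled features."""
    x, y, z = point
    samples = [
        bilinear(planes["xy"], x, y),
        bilinear(planes["xz"], x, z),
        bilinear(planes["yz"], y, z),
    ]
    # Element-wise sum across the three planes, channel by channel.
    return [sum(channel) for channel in zip(*samples)]
```

Summation keeps the feature width equal to the plane channel count; concatenation is an equally common alternative that triples it.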

Octopus-Swimming-Like Robot with Soft Asymmetric Arms

Oct 15, 2024

Bobing Zhang, Yiyuan Zhang, Yiming Li, Sicheng Xuan, Hong Wei Ng, Yuliang Liufu, Zhiqiang Tang, Cecilia Laschi

Abstract: Underwater vehicles have seen significant development over the past seventy years. However, bio-inspired propulsion robots are still in their early stages and require greater interdisciplinary collaboration between biologists and roboticists. The octopus, one of the most intelligent marine animals, exhibits remarkable abilities such as camouflage, exploration, and hunting while swimming with its arms. Although bio-inspired robotics researchers have aimed to replicate these abilities, the complexity of designing an eight-arm bionic swimming platform has posed challenges from the outset. In this work, we propose a novel bionic swimming robot platform that combines asymmetric passive morphing arms with an umbrella-like quick-return mechanism. Using only two simple constant-speed motors, this design achieves efficient swimming by replicating octopus-like arm movements and stroke time ratios. The robot reached a peak speed of 314 mm/s during its second power stroke. This design reduces the complexity of traditional actuation systems for octopus-like swimming robots while maintaining good swimming performance. It offers a more accessible and efficient platform for biologists and roboticists pursuing deeper octopus-inspired robotic and biological studies.
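
The quick-return principle is what lets constant-speed motors produce unequal stroke timing: when a motor turns a crank at constant speed, each stroke lasts in proportion to the crank angle it sweeps, so an unequal angle split gives unequal power and recovery durations without any speed control. The sketch below computes that split. The 144° power angle and 30 rpm used in the example are hypothetical values, not the robot's parameters, and the umbrella-like linkage geometry that actually fixes the angle split is not modeled.

```python
def stroke_times(power_angle_deg, rpm):
    """Split one crank revolution at constant speed into power and recovery strokes.

    power_angle_deg: crank angle (degrees) swept during the power stroke (hypothetical input).
    rpm: constant motor speed in revolutions per minute.
    Returns (power stroke time in s, recovery stroke time in s, recovery/power time ratio).
    """
    if not 0.0 < power_angle_deg < 360.0:
        raise ValueError("power_angle_deg must lie strictly between 0 and 360")
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    period = 60.0 / rpm                        # seconds per revolution
    t_power = period * power_angle_deg / 360.0  # time is proportional to swept angle
    t_recovery = period - t_power              # remainder of the revolution
    return t_power, t_recovery, t_recovery / t_power
```

For example, `stroke_times(144, 30)` gives a 0.8 s power stroke and a 1.2 s recovery stroke, a time ratio of 1.5, from a motor that never changes speed.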

Open World Object Detection: A Survey

Oct 15, 2024

Yiming Li, Yi Wang, Wenqian Wang, Dan Lin, Bingbing Li, Kim-Hui Yap

Abstract: Exploring new knowledge is a fundamental human ability that can be mirrored in the development of deep neural networks, especially in object detection. Open world object detection (OWOD) is an emerging research area that applies this principle: it focuses on recognizing and learning from objects absent from the initial training set, incrementally expanding its knowledge base as new class labels are introduced. This survey offers a thorough review of the OWOD domain, covering problem definitions, benchmark datasets, source code, evaluation metrics, and a comparative study of existing methods. Additionally, we investigate related areas such as open set recognition (OSR) and incremental learning (IL), underlining their relevance to OWOD. Finally, the paper addresses the limitations and challenges of current OWOD algorithms and proposes directions for future research. To our knowledge, this is the first comprehensive survey of the emerging OWOD field, with over one hundred references, marking a significant step forward for object detection technology. Comprehensive source code and benchmarks are collected at https://github.com/ArminLee/OWOD Review.
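
Two metrics that recur in OWOD evaluation, and that are used in the ORE benchmark, are Wilderness Impact (WI), the relative drop in known-class precision when unknown objects are present, and Absolute Open-Set Error (A-OSE), the number of unknown objects misclassified as a known class. The sketch below computes both from precomputed precision values and matched (ground-truth label, predicted label) pairs. The IoU-based matching step, the fixed recall level at which WI is usually reported, and the `"unknown"` label string are assumptions kept outside the sketch.

```python
def wilderness_impact(precision_known, precision_known_and_unknown):
    """WI = P_K / P_{K+U} - 1.

    precision_known: known-class precision evaluated on known objects only.
    precision_known_and_unknown: the same detector's known-class precision when unknown
    objects are also present. A larger WI means unknowns cause more false positives.
    """
    return precision_known / precision_known_and_unknown - 1.0

def absolute_open_set_error(matched_pairs, known_classes, unknown_label="unknown"):
    """Count ground-truth unknown objects whose matched detection carries a known label.

    matched_pairs: iterable of (gt_label, predicted_label) for already-matched boxes.
    """
    return sum(
        1 for gt_label, pred_label in matched_pairs
        if gt_label == unknown_label and pred_label in known_classes
    )
```

A WI of 0.25, for instance, means the detector's known-class precision is 25% higher without unknowns than with them.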

Towards Reliable Verification of Unauthorized Data Usage in Personalized Text-to-Image Diffusion Models

Oct 14, 2024

Boheng Li, Yanhao Wei, Yankai Fu, Zhenting Wang, Yiming Li, Jie Zhang, Run Wang, Tianwei Zhang

Abstract: Text-to-image diffusion models are pushing the boundaries of what generative AI can achieve in our lives. Beyond generating general images, new personalization techniques customize pre-trained base models to craft images with specific themes or styles. This lightweight approach, which lets AI practitioners and developers easily build their own personalized models, also raises a new concern: whether personalized models are trained on unauthorized data. A promising solution is to proactively enable data traceability in generative models, where data owners embed external coatings (e.g., image watermarks or backdoor triggers) into their datasets before releasing them. Models later trained on such datasets also learn the coatings and unintentionally reproduce them in the generated mimicries, which can then be extracted and used as evidence of data usage. However, we find that existing coatings cannot be effectively learned in personalization tasks, which makes the corresponding verification less reliable. In this paper, we introduce SIREN, a novel methodology to proactively trace unauthorized data usage in black-box personalized text-to-image diffusion models. Our approach carefully optimizes the coating so that the model recognizes it as a feature relevant to the personalization task, significantly improving its learnability. We also use a human perceptual-aware constraint, a hypersphere classification technique, and a hypothesis-testing-guided verification method to enhance the stealthiness and detection accuracy of the coating. The effectiveness of SIREN is verified through extensive experiments on a diverse set of benchmark datasets, models, and learning algorithms. SIREN is also effective in various real-world scenarios, and we evaluate it against potential countermeasures. Our code is publicly available.

* To appear in the IEEE Symposium on Security & Privacy, May 2025
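
The abstract names two verification ingredients: a hypersphere classifier that scores how strongly an image carries the coating, and a hypothesis test that turns those scores into a decision. The sketch below pairs a distance-to-centre score (in the spirit of Deep SVDD) with a one-sided permutation test on mean scores. Both are generic stand-ins chosen for illustration; SIREN's actual classifier, feature extractor, and test statistic are not reproduced, and feature vectors are assumed to be precomputed.

```python
import math
import random
import statistics

def hypersphere_score(feature, center):
    """Coating score: higher when a feature vector lies closer to the hypersphere centre."""
    return -math.dist(feature, center)

def permutation_p_value(suspect_scores, benign_scores, n_perm=10_000, seed=0):
    """One-sided permutation test on the difference of mean coating scores.

    H0: scores on outputs of the suspect model and of a benign reference model come from
    the same distribution. A small p-value is evidence that the suspect model reproduces
    the coating more strongly than the benign one.
    """
    rng = random.Random(seed)
    k = len(suspect_scores)
    observed = statistics.fmean(suspect_scores) - statistics.fmean(benign_scores)
    pooled = list(suspect_scores) + list(benign_scores)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling, valid under H0
        diff = statistics.fmean(pooled[:k]) - statistics.fmean(pooled[k:])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)
```

A permutation test makes no normality assumption about the scores, which is convenient when only a few dozen generated images can be sampled from the suspect model.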

Advancing Multi-grained Alignment for Contrastive Language-Audio Pre-training

Aug 15, 2024

Yiming Li, Zhifang Guo, Xiangdong Wang, Hong Liu

Abstract: Audio-language joint learning models such as CLAP have recently shown considerable success in multi-modal understanding tasks. These models usually aggregate uni-modal local representations, namely frame or word features, into global ones, on which a contrastive loss is applied to reach coarse-grained cross-modal alignment. However, frame-level correspondence with text may be ignored, which leaves these models ill-suited to explainability and fine-grained tasks and may also undermine performance on coarse-grained tasks. In this work, we aim to improve both coarse- and fine-grained audio-language alignment in large-scale contrastive pre-training. To unify the granularity and latent distribution of the two modalities, a shared codebook is adopted to represent multi-modal global features with common bases, and each codeword is regularized to encode modality-shared semantics, bridging the gap between frame and word features. On this basis, a locality-aware block is introduced to purify local patterns, and a hard-negative guided loss is devised to boost alignment. Experiments on eleven zero-shot coarse- and fine-grained tasks suggest that our model not only significantly surpasses the baseline CLAP but also yields superior or competitive results compared with current state-of-the-art (SOTA) works.

* ACM MM 2024 (Oral)
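
For reference, the coarse-grained objective that CLAP-style models apply to global features is a symmetric contrastive (InfoNCE) loss over a batch of paired audio and text embeddings. The sketch below implements only that baseline loss in plain Python; the shared codebook, locality-aware block, and hard-negative guided loss proposed in the paper are not reproduced, and the 0.07 temperature is an assumed value (CLIP's initial setting), not a number from the abstract.

```python
import math

def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def logsumexp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def symmetric_info_nce(audio, text, temperature=0.07):
    """Symmetric contrastive loss for a batch of paired global embeddings.

    audio[i] and text[i] form the positive pair; every other row in the batch is a negative.
    """
    a = [l2_normalize(v) for v in audio]
    t = [l2_normalize(v) for v in text]
    n = len(a)
    # Cosine-similarity logits scaled by the temperature: logits[i][j] = <a_i, t_j> / tau.
    logits = [[sum(x * y for x, y in zip(a[i], t[j])) / temperature for j in range(n)]
              for i in range(n)]
    # Audio-to-text: cross-entropy over each row; text-to-audio: over each column.
    a2t = sum(logsumexp(logits[i]) - logits[i][i] for i in range(n)) / n
    t2a = sum(logsumexp([logits[i][j] for i in range(n)]) - logits[j][j] for j in range(n)) / n
    return 0.5 * (a2t + t2a)
```

When every embedding in the batch is identical, the loss equals log(batch size), the value for a model that cannot tell pairs apart.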

PointNCBW: Towards Dataset Ownership Verification for Point Clouds via Negative Clean-label Backdoor Watermark

Aug 10, 2024

Cheng Wei, Yang Wang, Kuofeng Gao, Shuo Shao, Yiming Li, Zhibo Wang, Zhan Qin

Abstract: Point clouds have recently been widely used in computer vision, but collecting them is time-consuming and expensive. As such, point cloud datasets are valuable intellectual property of their owners and deserve protection. To detect and prevent unauthorized use of these datasets, especially commercial or open-source ones that may not be resold or used commercially without permission, we aim to identify whether a suspicious third-party model was trained on our protected dataset under the black-box setting. We achieve this goal by designing a scalable clean-label backdoor-based dataset watermark for point clouds that ensures both effectiveness and stealthiness. Unlike existing clean-label watermark schemes, which are sensitive to the number of categories, our method can watermark samples from all classes instead of only from the target class. Accordingly, it preserves high effectiveness even on large-scale datasets with many classes. Specifically, we perturb selected point clouds from non-target categories in both shape-wise and point-wise manners before inserting trigger patterns, without changing their labels. The features of the perturbed samples are similar to those of benign samples from the target class. As such, models trained on the watermarked dataset will exhibit a distinctive yet stealthy backdoor behavior, i.e., misclassifying samples from the target class whenever triggers appear, since the trained DNNs treat the inserted trigger pattern as a signal to deny predicting the target label. We also design a hypothesis-test-guided dataset ownership verification method based on the proposed watermark. Extensive experiments on benchmark datasets verify the effectiveness of our method and its resistance to potential removal methods.

* 12 pages
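
Clean-label means the watermarked samples keep their original labels; only their geometry changes. The sketch below illustrates just the final trigger-insertion step on a point cloud stored as a list of (x, y, z) tuples: a fixed pattern overwrites randomly chosen points so that the cloud size is unchanged. The cube-corner trigger, its position, and the replace-random-points strategy are hypothetical choices, and the paper's shape-wise and point-wise perturbation, which runs before this step, is not shown.

```python
import random

# Hypothetical trigger: the 8 corners of a small cube near one corner of the unit box.
CUBE_TRIGGER = [
    (0.9 + 0.05 * i, 0.9 + 0.05 * j, 0.9 + 0.05 * k)
    for i in (0, 1) for j in (0, 1) for k in (0, 1)
]

def insert_trigger(points, label, trigger=CUBE_TRIGGER, seed=0):
    """Overwrite len(trigger) randomly chosen points with the trigger pattern.

    Returns the watermarked cloud and the *unchanged* label (clean-label watermarking).
    """
    if len(trigger) > len(points):
        raise ValueError("trigger has more points than the cloud")
    rng = random.Random(seed)
    watermarked = list(points)  # copy, so the caller's cloud is left untouched
    for idx, trigger_point in zip(rng.sample(range(len(points)), len(trigger)), trigger):
        watermarked[idx] = trigger_point
    return watermarked, label
```

Keeping the point count fixed matters for point cloud classifiers that expect a fixed input size.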

Causal Interventional Prediction System for Robust and Explainable Effect Forecasting

Jul 29, 2024

Zhixuan Chu, Hui Ding, Guang Zeng, Shiyu Wang, Yiming Li

Abstract: Although AI systems are increasingly widespread, many current AI systems have been found vulnerable because of hidden bias and missing information, especially the widely used forecasting systems. In this work, we explore the robustness and explainability of AI-based forecasting systems. We provide an in-depth analysis of the underlying causality involved in the effect prediction task and further establish a causal graph over the treatment, adjustment variables, confounders, and outcome. Correspondingly, we design a causal interventional prediction system (CIPS) based on a variational autoencoder and multiple imputation by fully conditional specification. Extensive results demonstrate the superiority of our system over state-of-the-art methods and show remarkable versatility and extensibility in practice.

* Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM '24), October 21–25, 2024, Boise, ID, USA
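
The causal graph in the abstract separates the treatment from confounders that also affect the outcome. A standard way to estimate an interventional effect under such a graph is back-door adjustment, E[Y | do(T=t)] = Σ_z E[Y | T=t, Z=z] P(Z=z). The sketch below applies it to a single discrete confounder using (treatment, confounder, outcome) records; it is a textbook estimator shown for illustration, not CIPS, which the abstract describes as combining a variational autoencoder with fully conditional specification imputation. `TOY_RECORDS` is made-up illustrative data.

```python
from collections import Counter

def backdoor_adjusted_mean(records, t):
    """Estimate E[Y | do(T=t)] = sum_z E[Y | T=t, Z=z] * P(Z=z).

    records: iterable of (treatment, confounder, outcome) tuples with a discrete confounder.
    Assumes Z blocks every back-door path from T to Y and that each stratum of Z
    contains at least one unit with T=t (positivity).
    """
    records = list(records)
    n = len(records)
    z_counts = Counter(z for _, z, _ in records)
    estimate = 0.0
    for z, count in z_counts.items():
        outcomes = [y for ti, zi, y in records if ti == t and zi == z]
        if not outcomes:
            raise ValueError(f"no units with T={t} in stratum Z={z}")
        # Weight the stratum-specific mean by the marginal P(Z=z), not P(Z=z | T=t).
        estimate += (sum(outcomes) / len(outcomes)) * (count / n)
    return estimate

# Toy data: treated units are concentrated in the high-outcome stratum Z=1.
TOY_RECORDS = (
    [(1, 0, 1)] * 2 + [(1, 0, 0)] * 8 + [(0, 0, 1)] * 4 + [(0, 0, 0)] * 36
    + [(1, 1, 1)] * 32 + [(1, 1, 0)] * 8 + [(0, 1, 1)] * 7 + [(0, 1, 0)] * 3
)
```

On `TOY_RECORDS`, the naive conditional means are 0.68 (treated) and 0.22 (untreated), whereas the adjusted estimates are 0.5 and 0.4: most of the naive gap comes from the confounder rather than the treatment.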

TAPI: Towards Target-Specific and Adversarial Prompt Injection against Code LLMs

Jul 12, 2024

Yuchen Yang, Hongwei Yao, Bingrun Yang, Yiling He, Yiming Li, Tianwei Zhang, Zhan Qin, Kui Ren

Abstract: Code-oriented large language models (Code LLMs) have recently been widely and successfully used to simplify and facilitate programming. With these tools, developers can easily generate complete, functional code from incomplete code and natural language prompts. However, a few pioneering works have revealed that Code LLMs are also vulnerable, e.g., to backdoor and adversarial attacks. The former poison the training data or model parameters so that LLMs insert malicious code snippets in response to triggers, while the latter craft malicious adversarial input code to reduce the quality of the generated code. However, both attack methods have underlying limitations: backdoor attacks rely on controlling the model training process, while adversarial attacks struggle to fulfill specific malicious purposes. To inherit the advantages of both backdoor and adversarial attacks, this paper proposes a new attack paradigm, target-specific and adversarial prompt injection (TAPI), against Code LLMs. TAPI generates unreadable comments containing information about malicious instructions and hides them as triggers in external source code. When users rely on Code LLMs to complete code containing the trigger, the models generate attacker-specified malicious code snippets at specific locations. We evaluate our TAPI attack on four representative LLMs under three representative malicious objectives and seven cases. The results show that our method is highly threatening (achieving an attack success rate of up to 89.3%) and stealthy (saving an average of 53.1% of tokens in the trigger design). In particular, we successfully attack some well-known deployed code completion applications, including CodeGeeX and GitHub Copilot, which further confirms the realistic threat of our attack.

Improving Entity Recognition Using Ensembles of Deep Learning and Fine-tuned Large Language Models: A Case Study on Adverse Event Extraction from Multiple Sources

Jun 26, 2024

Yiming Li, Deepthi Viswaroopan, William He, Jianfu Li, Xu Zuo, Hua Xu, Cui Tao

Abstract: Extracting adverse events (AEs) following COVID-19 vaccination from text data is crucial for monitoring and analyzing the safety profiles of immunizations. Traditional deep learning models are adept at learning intricate feature representations and dependencies in sequential data, but often require extensive labeled data. In contrast, large language models (LLMs) excel at understanding contextual information but show unstable performance on named entity recognition tasks, possibly because of their broad but unspecific training. This study aims to evaluate the effectiveness of LLMs and traditional deep learning models in AE extraction and to assess the impact of ensembling these models on performance. We used reports and posts from VAERS (n=621), Twitter (n=9,133), and Reddit (n=131) as our corpora. Our goal was to extract three types of entities: "vaccine", "shot", and "ae". We explored and fine-tuned multiple LLMs (all except GPT-4), including GPT-2, GPT-3.5, GPT-4, and Llama-2, as well as traditional deep learning models such as RNN and BioBERT. To enhance performance, we created ensembles of the three best-performing models. We used strict and relaxed F1 scores to evaluate performance for each entity type and micro-average F1 to assess overall performance. The ensemble model achieved the highest performance for "vaccine", "shot", and "ae", with strict F1 scores of 0.878, 0.930, and 0.925, respectively, and a micro-average score of 0.903. In conclusion, this study demonstrates the effectiveness and robustness of ensembling fine-tuned traditional deep learning models and LLMs for extracting AE-related information. It contributes to the advancement of biomedical natural language processing, providing valuable insights into improving AE extraction from text data for pharmacovigilance and public health surveillance.
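
Strict and relaxed F1 differ only in what counts as a match between a predicted and a gold entity span, and micro-averaging pools the counts across entity types before computing a single F1. The sketch below assumes spans are (start, end, type) tuples with exclusive ends and takes "relaxed" to mean any character overlap with the same type; that overlap rule is a common convention but an assumption here, since the abstract does not define it.

```python
def spans_match(a, b, mode):
    """True if spans a and b (start, end, type) match under 'strict' or 'relaxed' rules."""
    if a[2] != b[2]:
        return False  # entity types must agree in both modes
    if mode == "strict":
        return a[0] == b[0] and a[1] == b[1]
    return a[0] < b[1] and b[0] < a[1]  # relaxed: half-open spans overlap

def match_counts(gold, pred, mode):
    """(matched predictions, total predictions, matched gold spans, total gold spans)."""
    matched_pred = sum(any(spans_match(p, g, mode) for g in gold) for p in pred)
    matched_gold = sum(any(spans_match(g, p, mode) for p in pred) for g in gold)
    return matched_pred, len(pred), matched_gold, len(gold)

def f1_from_counts(matched_pred, n_pred, matched_gold, n_gold):
    """Harmonic mean of precision (over predictions) and recall (over gold spans)."""
    precision = matched_pred / n_pred if n_pred else 0.0
    recall = matched_gold / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def micro_f1(gold, pred, mode="strict", types=("vaccine", "shot", "ae")):
    """Pool counts over entity types, then compute one F1 (micro-average)."""
    totals = [0, 0, 0, 0]
    for entity_type in types:
        counts = match_counts(
            [g for g in gold if g[2] == entity_type],
            [p for p in pred if p[2] == entity_type],
            mode,
        )
        totals = [t + c for t, c in zip(totals, counts)]
    return f1_from_counts(*totals)
```

A prediction whose boundary is off by one character, such as (10, 15) against a gold (10, 14), fails under strict matching but counts under relaxed matching, which is why relaxed scores are never lower than strict ones.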
