Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tao Zhou

Self-Calibrated Dual Contrasting for Annotation-Efficient Bacteria Raman Spectroscopy Clustering and Classification

Dec 28, 2024

Haiming Yao, Wei Luo, Tao Zhou, Ang Gao, Xue Wang

Figure 1 for Self-Calibrated Dual Contrasting for Annotation-Efficient Bacteria Raman Spectroscopy Clustering and Classification

Figure 2 for Self-Calibrated Dual Contrasting for Annotation-Efficient Bacteria Raman Spectroscopy Clustering and Classification

Figure 3 for Self-Calibrated Dual Contrasting for Annotation-Efficient Bacteria Raman Spectroscopy Clustering and Classification

Figure 4 for Self-Calibrated Dual Contrasting for Annotation-Efficient Bacteria Raman Spectroscopy Clustering and Classification

Abstract:Raman scattering is based on molecular vibration spectroscopy and provides a powerful technology for pathogenic bacteria diagnosis using the unique molecular fingerprint information of a substance. The integration of deep learning technology has significantly improved the efficiency and accuracy of intelligent Raman spectroscopy (RS) recognition. However, the current RS recognition methods based on deep neural networks still require the annotation of a large amount of spectral data, which is labor-intensive. This paper presents a novel annotation-efficient Self-Calibrated Dual Contrasting (SCDC) method for RS recognition that operates effectively with few or no annotation. Our core motivation is to represent the spectrum from two different perspectives in two distinct subspaces: embedding and category. The embedding perspective captures instance-level information, while the category perspective reflects category-level information. Accordingly, we have implemented a dual contrastive learning approach from two perspectives to obtain discriminative representations, which are applicable for Raman spectroscopy recognition under both unsupervised and semi-supervised learning conditions. Furthermore, a self-calibration mechanism is proposed to enhance robustness. Validation of the identification task on three large-scale bacterial Raman spectroscopy datasets demonstrates that our SCDC method achieves robust recognition performance with very few (5$\%$ or 10$\%$) or no annotations, highlighting the potential of the proposed method for biospectral identification in annotation-efficient clinical scenarios.

Via

Access Paper or Ask Questions

Enhancing Large-scale UAV Route Planing with Global and Local Features via Reinforcement Graph Fusion

Dec 20, 2024

Tao Zhou, Kai Ye, Zeyu Shi, Jiajing Lin, Dejun Xu, Min Jiang

Abstract:Numerous remarkable advancements have been made in accuracy, speed, and parallelism for solving the Unmanned Aerial Vehicle Route Planing (UAVRP). However, existing UAVRP solvers face challenges when attempting to scale effectively and efficiently for larger instances. In this paper, we present a generalization framework that enables current UAVRP solvers to robustly extend their capabilities to larger instances, accommodating up to 10,000 points, using widely recognized test sets. The UAVRP under a large number of patrol points is a typical large-scale TSP problem.Our proposed framework comprises three distinct steps. Firstly, we employ Delaunay triangulation to extract subgraphs from large instances while preserving global features. Secondly, we utilize an embedded TSP solver to obtain sub-results, followed by graph fusion. Finally, we implement a decoding strategy customizable to the user's requirements, resulting in high-quality solutions, complemented by a warming-up process for the heatmap. To demonstrate the flexibility of our approach, we integrate two representative TSP solvers into our framework and conduct a comprehensive comparative analysis against existing algorithms using large TSP benchmark datasets. The results unequivocally demonstrate that our framework efficiently scales existing TSP solvers to handle large instances and consistently outperforms state-of-the-art (SOTA) methods. Furthermore, since our proposed framework does not necessitate additional training or fine-tuning, we believe that its generality can significantly advance research on end-to-end UAVRP solvers, enabling the application of a broader range of methods to real-world scenarios.

Via

Access Paper or Ask Questions

Learnable Prompting SAM-induced Knowledge Distillation for Semi-supervised Medical Image Segmentation

Dec 18, 2024

Kaiwen Huang, Tao Zhou, Huazhu Fu, Yizhe Zhang, Yi Zhou, Chen Gong, Dong Liang

Figure 1 for Learnable Prompting SAM-induced Knowledge Distillation for Semi-supervised Medical Image Segmentation

Figure 2 for Learnable Prompting SAM-induced Knowledge Distillation for Semi-supervised Medical Image Segmentation

Figure 3 for Learnable Prompting SAM-induced Knowledge Distillation for Semi-supervised Medical Image Segmentation

Figure 4 for Learnable Prompting SAM-induced Knowledge Distillation for Semi-supervised Medical Image Segmentation

Abstract:The limited availability of labeled data has driven advancements in semi-supervised learning for medical image segmentation. Modern large-scale models tailored for general segmentation, such as the Segment Anything Model (SAM), have revealed robust generalization capabilities. However, applying these models directly to medical image segmentation still exposes performance degradation. In this paper, we propose a learnable prompting SAM-induced Knowledge distillation framework (KnowSAM) for semi-supervised medical image segmentation. Firstly, we propose a Multi-view Co-training (MC) strategy that employs two distinct sub-networks to employ a co-teaching paradigm, resulting in more robust outcomes. Secondly, we present a Learnable Prompt Strategy (LPS) to dynamically produce dense prompts and integrate an adapter to fine-tune SAM specifically for medical image segmentation tasks. Moreover, we propose SAM-induced Knowledge Distillation (SKD) to transfer useful knowledge from SAM to two sub-networks, enabling them to learn from SAM's predictions and alleviate the effects of incorrect pseudo-labels during training. Notably, the predictions generated by our subnets are used to produce mask prompts for SAM, facilitating effective inter-module information exchange. Extensive experimental results on various medical segmentation tasks demonstrate that our model outperforms the state-of-the-art semi-supervised segmentation approaches. Crucially, our SAM distillation framework can be seamlessly integrated into other semi-supervised segmentation methods to enhance performance. The code will be released upon acceptance of this manuscript at: https://github.com/taozh2017/KnowSAM

* 12 pages, 7 figures

Via

Access Paper or Ask Questions

Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models

Dec 18, 2024

Zheng Hu, Zhe Li, Ziyun Jiao, Satoshi Nakagawa, Jiawen Deng, Shimin Cai, Tao Zhou, Fuji Ren

Figure 1 for Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models

Figure 2 for Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models

Figure 3 for Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models

Figure 4 for Bridging the User-side Knowledge Gap in Knowledge-aware Recommendations with Large Language Models

Abstract:In recent years, knowledge graphs have been integrated into recommender systems as item-side auxiliary information, enhancing recommendation accuracy. However, constructing and integrating structural user-side knowledge remains a significant challenge due to the improper granularity and inherent scarcity of user-side features. Recent advancements in Large Language Models (LLMs) offer the potential to bridge this gap by leveraging their human behavior understanding and extensive real-world knowledge. Nevertheless, integrating LLM-generated information into recommender systems presents challenges, including the risk of noisy information and the need for additional knowledge transfer. In this paper, we propose an LLM-based user-side knowledge inference method alongside a carefully designed recommendation framework to address these challenges. Our approach employs LLMs to infer user interests based on historical behaviors, integrating this user-side information with item-side and collaborative data to construct a hybrid structure: the Collaborative Interest Knowledge Graph (CIKG). Furthermore, we propose a CIKG-based recommendation framework that includes a user interest reconstruction module and a cross-domain contrastive learning module to mitigate potential noise and facilitate knowledge transfer. We conduct extensive experiments on three real-world datasets to validate the effectiveness of our method. Our approach achieves state-of-the-art performance compared to competitive baselines, particularly for users with sparse interactions.

* Accepted at AAAI 2025

Via

Access Paper or Ask Questions

CALA: A Class-Aware Logit Adapter for Few-Shot Class-Incremental Learning

Dec 17, 2024

Chengyan Liu, Linglan Zhao, Fan Lyu, Kaile Du, Fuyuan Hu, Tao Zhou

Figure 1 for CALA: A Class-Aware Logit Adapter for Few-Shot Class-Incremental Learning

Figure 2 for CALA: A Class-Aware Logit Adapter for Few-Shot Class-Incremental Learning

Figure 3 for CALA: A Class-Aware Logit Adapter for Few-Shot Class-Incremental Learning

Figure 4 for CALA: A Class-Aware Logit Adapter for Few-Shot Class-Incremental Learning

Abstract:Few-Shot Class-Incremental Learning (FSCIL) defines a practical but challenging task where models are required to continuously learn novel concepts with only a few training samples. Due to data scarcity, existing FSCIL methods resort to training a backbone with abundant base data and then keeping it frozen afterward. However, the above operation often causes the backbone to overfit to base classes while overlooking the novel ones, leading to severe confusion between them. To address this issue, we propose Class-Aware Logit Adapter (CALA). Our method involves a lightweight adapter that learns to rectify biased predictions through a pseudo-incremental learning paradigm. In the real FSCIL process, we use the learned adapter to dynamically generate robust balancing factors. These factors can adjust confused novel instances back to their true label space based on their similarity to base classes. Specifically, when confusion is more likely to occur in novel instances that closely resemble base classes, greater rectification is required. Notably, CALA operates on the classifier level, preserving the original feature space, thus it can be flexibly plugged into most of the existing FSCIL works for improved performance. Experiments on three benchmark datasets consistently validate the effectiveness and flexibility of CALA. Codes will be available upon acceptance.

* 10 pages

Via

Access Paper or Ask Questions

Bringing Multimodality to Amazon Visual Search System

Dec 17, 2024

Xinliang Zhu, Michael Huang, Han Ding, Jinyu Yang, Kelvin Chen, Tao Zhou, Tal Neiman, Ouye Xie, Son Tran, Benjamin Yao(+3 more)

Figure 1 for Bringing Multimodality to Amazon Visual Search System

Figure 2 for Bringing Multimodality to Amazon Visual Search System

Figure 3 for Bringing Multimodality to Amazon Visual Search System

Figure 4 for Bringing Multimodality to Amazon Visual Search System

Abstract:Image to image matching has been well studied in the computer vision community. Previous studies mainly focus on training a deep metric learning model matching visual patterns between the query image and gallery images. In this study, we show that pure image-to-image matching suffers from false positives caused by matching to local visual patterns. To alleviate this issue, we propose to leverage recent advances in vision-language pretraining research. Specifically, we introduce additional image-text alignment losses into deep metric learning, which serve as constraints to the image-to-image matching loss. With additional alignments between the text (e.g., product title) and image pairs, the model can learn concepts from both modalities explicitly, which avoids matching low-level visual features. We progressively develop two variants, a 3-tower and a 4-tower model, where the latter takes one more short text query input. Through extensive experiments, we show that this change leads to a substantial improvement to the image to image matching problem. We further leveraged this model for multimodal search, which takes both image and reformulation text queries to improve search quality. Both offline and online experiments show strong improvements on the main metrics. Specifically, we see 4.95% relative improvement on image matching click through rate with the 3-tower model and 1.13% further improvement from the 4-tower model.

* Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

Via

Access Paper or Ask Questions

DiffRaman: A Conditional Latent Denoising Diffusion Probabilistic Model for Bacterial Raman Spectroscopy Identification Under Limited Data Conditions

Dec 11, 2024

Haiming Yao, Wei Luo, Ang Gao, Tao Zhou, Xue Wang

Figure 1 for DiffRaman: A Conditional Latent Denoising Diffusion Probabilistic Model for Bacterial Raman Spectroscopy Identification Under Limited Data Conditions

Figure 2 for DiffRaman: A Conditional Latent Denoising Diffusion Probabilistic Model for Bacterial Raman Spectroscopy Identification Under Limited Data Conditions

Figure 3 for DiffRaman: A Conditional Latent Denoising Diffusion Probabilistic Model for Bacterial Raman Spectroscopy Identification Under Limited Data Conditions

Figure 4 for DiffRaman: A Conditional Latent Denoising Diffusion Probabilistic Model for Bacterial Raman Spectroscopy Identification Under Limited Data Conditions

Abstract:Raman spectroscopy has attracted significant attention in various biochemical detection fields, especially in the rapid identification of pathogenic bacteria. The integration of this technology with deep learning to facilitate automated bacterial Raman spectroscopy diagnosis has emerged as a key focus in recent research. However, the diagnostic performance of existing deep learning methods largely depends on a sufficient dataset, and in scenarios where there is a limited availability of Raman spectroscopy data, it is inadequate to fully optimize the numerous parameters of deep neural networks. To address these challenges, this paper proposes a data generation method utilizing deep generative models to expand the data volume and enhance the recognition accuracy of bacterial Raman spectra. Specifically, we introduce DiffRaman, a conditional latent denoising diffusion probability model for Raman spectra generation. Experimental results demonstrate that synthetic bacterial Raman spectra generated by DiffRaman can effectively emulate real experimental spectra, thereby enhancing the performance of diagnostic models, especially under conditions of limited data. Furthermore, compared to existing generative models, the proposed DiffRaman offers improvements in both generation quality and computational efficiency. Our DiffRaman approach offers a well-suited solution for automated bacteria Raman spectroscopy diagnosis in data-scarce scenarios, offering new insights into alleviating the labor of spectroscopic measurements and enhancing rare bacteria identification.

Via

Access Paper or Ask Questions

Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey

Nov 05, 2024

Ao Fu, Yi Zhou, Tao Zhou, Yi Yang, Bojun Gao, Qun Li, Guobin Wu, Ling Shao

Figure 1 for Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey

Figure 2 for Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey

Figure 3 for Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey

Figure 4 for Exploring the Interplay Between Video Generation and World Models in Autonomous Driving: A Survey

Abstract:World models and video generation are pivotal technologies in the domain of autonomous driving, each playing a critical role in enhancing the robustness and reliability of autonomous systems. World models, which simulate the dynamics of real-world environments, and video generation models, which produce realistic video sequences, are increasingly being integrated to improve situational awareness and decision-making capabilities in autonomous vehicles. This paper investigates the relationship between these two technologies, focusing on how their structural parallels, particularly in diffusion-based models, contribute to more accurate and coherent simulations of driving scenarios. We examine leading works such as JEPA, Genie, and Sora, which exemplify different approaches to world model design, thereby highlighting the lack of a universally accepted definition of world models. These diverse interpretations underscore the field's evolving understanding of how world models can be optimized for various autonomous driving tasks. Furthermore, this paper discusses the key evaluation metrics employed in this domain, such as Chamfer distance for 3D scene reconstruction and Fr\'echet Inception Distance (FID) for assessing the quality of generated video content. By analyzing the interplay between video generation and world models, this survey identifies critical challenges and future research directions, emphasizing the potential of these technologies to jointly advance the performance of autonomous driving systems. The findings presented in this paper aim to provide a comprehensive understanding of how the integration of video generation and world models can drive innovation in the development of safer and more reliable autonomous vehicles.

Via

Access Paper or Ask Questions

An Efficient Dynamic Resource Allocation Framework for Evolutionary Bilevel Optimization

Oct 31, 2024

Dejun Xu, Kai Ye, Zimo Zheng, Tao Zhou, Gary G. Yen, Min Jiang

Abstract:Bilevel optimization problems are characterized by an interactive hierarchical structure, where the upper level seeks to optimize its strategy while simultaneously considering the response of the lower level. Evolutionary algorithms are commonly used to solve complex bilevel problems in practical scenarios, but they face significant resource consumption challenges due to the nested structure imposed by the implicit lower-level optimality condition. This challenge becomes even more pronounced as problem dimensions increase. Although recent methods have enhanced bilevel convergence through task-level knowledge sharing, further efficiency improvements are still hindered by redundant lower-level iterations that consume excessive resources while generating unpromising solutions. To overcome this challenge, this paper proposes an efficient dynamic resource allocation framework for evolutionary bilevel optimization, named DRC-BLEA. Compared to existing approaches, DRC-BLEA introduces a novel competitive quasi-parallel paradigm, in which multiple lower-level optimization tasks, derived from different upper-level individuals, compete for resources. A continuously updated selection probability is used to prioritize execution opportunities to promising tasks. Additionally, a cooperation mechanism is integrated within the competitive framework to further enhance efficiency and prevent premature convergence. Experimental results compared with chosen state-of-the-art algorithms demonstrate the effectiveness of the proposed method. Specifically, DRC-BLEA achieves competitive accuracy across diverse problem sets and real-world scenarios, while significantly reducing the number of function evaluations and overall running time.

Via

Access Paper or Ask Questions

Rank Aggregation in Crowdsourcing for Listwise Annotations

Oct 10, 2024

Wenshui Luo, Haoyu Liu, Yongliang Ding, Tao Zhou, Sheng wan, Runze Wu, Minmin Lin, Cong Zhang, Changjie Fan, Chen Gong

Figure 1 for Rank Aggregation in Crowdsourcing for Listwise Annotations

Figure 2 for Rank Aggregation in Crowdsourcing for Listwise Annotations

Figure 3 for Rank Aggregation in Crowdsourcing for Listwise Annotations

Figure 4 for Rank Aggregation in Crowdsourcing for Listwise Annotations

Abstract:Rank aggregation through crowdsourcing has recently gained significant attention, particularly in the context of listwise ranking annotations. However, existing methods primarily focus on a single problem and partial ranks, while the aggregation of listwise full ranks across numerous problems remains largely unexplored. This scenario finds relevance in various applications, such as model quality assessment and reinforcement learning with human feedback. In light of practical needs, we propose LAC, a Listwise rank Aggregation method in Crowdsourcing, where the global position information is carefully measured and included. In our design, an especially proposed annotation quality indicator is employed to measure the discrepancy between the annotated rank and the true rank. We also take the difficulty of the ranking problem itself into consideration, as it directly impacts the performance of annotators and consequently influences the final results. To our knowledge, LAC is the first work to directly deal with the full rank aggregation problem in listwise crowdsourcing, and simultaneously infer the difficulty of problems, the ability of annotators, and the ground-truth ranks in an unsupervised way. To evaluate our method, we collect a real-world business-oriented dataset for paragraph ranking. Experimental results on both synthetic and real-world benchmark datasets demonstrate the effectiveness of our proposed LAC method.

* 19 pages

Via

Access Paper or Ask Questions