You Zhou

Victor

Inter-event Interval Microscopy for Event Cameras

Apr 08, 2025

Changqing Su, Yanqin Chen, Zihan Lin, Zhen Cheng, You Zhou, Bo Xiong, Zhaofei Yu, Tiejun Huang

Abstract: Event cameras, innovative bio-inspired sensors, differ from traditional cameras by sensing changes in intensity rather than directly perceiving intensity, and they record these variations as a continuous stream of "events". Intensity reconstruction from these sparse events has long been a challenging problem. Previous approaches mainly focused on transforming motion-induced events into videos or achieving intensity imaging for static scenes by integrating modulation devices at the event camera acquisition end. In this paper, for the first time, we achieve event-to-intensity conversion using a static event camera for both static and dynamic scenes in fluorescence microscopy. Unlike conventional methods that primarily rely on event integration, the proposed Inter-event Interval Microscopy (IEIM) quantifies the time interval between consecutive events at each pixel. With a fixed threshold in the event camera, the time interval can precisely represent the intensity. At the hardware level, the proposed IEIM integrates a pulse light modulation device within a microscope equipped with an event camera, termed Pulse Modulation-based Event-driven Fluorescence Microscopy. Additionally, we have collected the IEIMat dataset under various scenes, including high dynamic range and high-speed scenarios. Experimental results on the IEIMat dataset demonstrate that the proposed IEIM achieves superior spatial and temporal resolution, as well as a higher dynamic range, with lower bandwidth compared to other methods. The code and the IEIMat dataset will be made publicly available.
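The per-pixel quantity the abstract describes, the time gap between consecutive events, can be illustrated with a minimal sketch. The event layout `(t, x, y)`, the function names, and the reciprocal-of-mean-interval proxy are all assumptions made for illustration; the abstract does not give the actual mapping from interval to intensity, and this is not the authors' implementation.

```python
from collections import defaultdict


def inter_event_intervals(events):
    """Group events by pixel and return the gaps between consecutive events.

    `events` is an iterable of (t, x, y) tuples (a hypothetical layout);
    polarity is ignored in this sketch.
    """
    times = defaultdict(list)
    for t, x, y in events:
        times[(x, y)].append(t)

    intervals = {}
    for pixel, ts in times.items():
        ts.sort()
        intervals[pixel] = [b - a for a, b in zip(ts, ts[1:])]
    return intervals


def mean_rate(gaps):
    """Assumed intensity proxy: reciprocal of the mean inter-event interval."""
    total = sum(gaps)
    if not gaps or total == 0:
        return 0.0
    return len(gaps) / total


# Pixel (0, 0) fires every 1.0 time units; pixel (1, 0) fires once per 4.0.
events = [(0.0, 0, 0), (1.0, 0, 0), (2.0, 0, 0), (0.0, 1, 0), (4.0, 1, 0)]
intervals = inter_event_intervals(events)
proxy = {pixel: mean_rate(gaps) for pixel, gaps in intervals.items()}
```

Under this assumed proxy, a pixel with shorter intervals gets a larger value; whether the real relationship is reciprocal, and how the pulse modulation enters, is not stated in the abstract.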

Multimodal Task Representation Memory Bank vs. Catastrophic Forgetting in Anomaly Detection

Feb 10, 2025

You Zhou, Jiangshan Zhao, Deyu Zeng, Zuo Zuo, Weixiang Liu, Zongze Wu

Abstract: Unsupervised Continuous Anomaly Detection (UCAD) faces significant challenges in multi-task representation learning, with existing methods suffering from incomplete representation and catastrophic forgetting. Unlike supervised models, unsupervised scenarios lack prior information, making it difficult to effectively distinguish redundant and complementary multimodal features. To address this, we propose the Multimodal Task Representation Memory Bank (MTRMB) method through two key technical innovations: (1) a Key-Prompt-Multimodal Knowledge (KPMK) mechanism that uses concise key prompts to guide cross-modal feature interaction between BERT and ViT; and (2) Refined Structure-based Contrastive Learning (RSCL), which leverages Grounding DINO and SAM to generate precise segmentation masks, pulling features of the same structural region closer while pushing different structural regions apart. Experiments on the MVTec AD and VisA datasets demonstrate MTRMB's superiority, achieving an average detection accuracy of 0.921 at the lowest forgetting rate, significantly outperforming state-of-the-art methods. We plan to open-source the code on GitHub.

An Efficient Diffusion-based Non-Autoregressive Solver for Traveling Salesman Problem

Jan 23, 2025

Mingzhao Wang, You Zhou, Zhiguang Cao, Yubin Xiao, Xuan Wu, Wei Pang, Yuan Jiang, Hui Yang, Peng Zhao, Yuanshu Li

Abstract: Recent advances in neural models have shown considerable promise in solving Traveling Salesman Problems (TSPs) without relying on much hand-crafted engineering. However, while non-autoregressive (NAR) approaches benefit from faster inference through parallelism, they typically deliver solutions of inferior quality compared to autoregressive ones. To enhance the solution quality while maintaining fast inference, we propose DEITSP, a diffusion model with efficient iterations tailored for TSP that operates in a NAR manner. Firstly, we introduce a one-step diffusion model that integrates the controlled discrete noise addition process with self-consistency enhancement, enabling optimal solution prediction through simultaneous denoising of multiple solutions. Secondly, we design a dual-modality graph transformer to bolster the extraction and fusion of features from node and edge modalities, while further accelerating the inference with fewer layers. Thirdly, we develop an efficient iterative strategy that alternates between adding and removing noise to improve exploration compared to previous diffusion methods. Additionally, we devise a scheduling framework to progressively refine the solution space by adjusting noise levels, facilitating a smooth search for optimal solutions. Extensive experiments on real-world and large-scale TSP instances demonstrate that DEITSP performs favorably against existing neural approaches in terms of solution quality, inference latency, and generalization ability. Our code is available at https://github.com/DEITSP/DEITSP.

* Accepted at KDD2025

A Value Mapping Virtual Staining Framework for Large-scale Histological Imaging

Jan 07, 2025

Junjia Wang, Bo Xiong, You Zhou, Xun Cao, Zhan Ma

Abstract: The emergence of virtual staining technology provides a rapid and efficient alternative for researchers in tissue pathology. It enables the utilization of unlabeled microscopic samples to generate virtual replicas of chemically stained histological slices, or to facilitate the transformation of one staining type into another. The remarkable performance of generative networks, such as CycleGAN, offers an unsupervised learning approach for virtual coloring, overcoming the limitations of high-quality paired data required in supervised learning. Nevertheless, large-scale color transformation necessitates processing large field-of-view images in patches, often resulting in significant boundary inconsistency and artifacts. Additionally, the transformation between different colorized modalities typically needs further efforts to modify loss functions and tune hyperparameters for independent training of networks. In this study, we introduce a general virtual staining framework that is adaptable to various conditions. We propose a loss function based on the value mapping constraint to ensure the accuracy of virtual coloring between different pathological modalities, termed the Value Mapping Generative Adversarial Network (VM-GAN). Meanwhile, we present a confidence-based tiling method to address the challenge of boundary inconsistency arising from patch-wise processing. Experimental results on diverse data with varying staining protocols demonstrate that our method achieves superior quantitative indicators and improved visual perception.
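To make the boundary-inconsistency problem concrete, here is a minimal one-dimensional sketch of merging overlapping patches by confidence-weighted averaging. This is a generic blending scheme chosen for illustration, not the tiling method of the paper: the abstract does not say how confidence is computed or how tiles are combined, and the function name, the `(start, values, confidences)` layout, and the example weights are all hypothetical.

```python
def blend_tiles(length, tiles):
    """Blend overlapping 1-D tiles by confidence-weighted averaging.

    `tiles` is a list of (start, values, confidences) triples, a hypothetical
    layout. Positions covered by no tile are left at 0.0.
    """
    weighted_sum = [0.0] * length
    weight_total = [0.0] * length
    for start, values, confidences in tiles:
        for offset, (value, conf) in enumerate(zip(values, confidences)):
            weighted_sum[start + offset] += conf * value
            weight_total[start + offset] += conf
    return [s / w if w > 0 else 0.0 for s, w in zip(weighted_sum, weight_total)]


# Two tiles overlap at position 2; each is less confident near its own edge.
tiles = [
    (0, [1.0, 1.0, 1.0], [1.0, 1.0, 0.25]),
    (2, [3.0, 3.0, 3.0], [0.75, 1.0, 1.0]),
]
blended = blend_tiles(5, tiles)
```

In the overlap, the output is pulled toward the tile that reports higher confidence there, which softens the seam compared with simply pasting one tile over the other.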

A Separable Self-attention Inspired by the State Space Model for Computer Vision

Jan 03, 2025

Juntao Zhang, Shaogeng Liu, Kun Bian, You Zhou, Pei Zhang, Jianning Liu, Jun Zhou, Bingyan Liu

Abstract: Mamba is an efficient State Space Model (SSM) with linear computational complexity. Although SSMs are not suitable for handling non-causal data, Vision Mamba (ViM) methods still demonstrate good performance in tasks such as image classification and object detection. Recent studies have shown that there is a rich theoretical connection between state space models and attention variants. We propose a novel separable self-attention method, for the first time introducing some excellent design concepts of Mamba into separable self-attention. To ensure a fair comparison with ViMs, we introduce VMINet, a simple yet powerful prototype architecture, constructed solely by stacking our novel attention modules with the most basic down-sampling layers. Notably, VMINet differs significantly from the conventional Transformer architecture. Our experiments demonstrate that VMINet has achieved competitive results on image classification and high-resolution dense prediction tasks. Code is available at: https://github.com/yws-wxs/VMINet.

Communication Efficient Cooperative Edge AI via Event-Triggered Computation Offloading

Jan 01, 2025

You Zhou, Changsheng You, Kaibin Huang

Abstract: Rare events, despite their infrequency, often carry critical information and require immediate attention in mission-critical applications such as autonomous driving, healthcare, and industrial automation. The data-intensive nature of these tasks and their need for prompt responses, combined with designing edge AI (or edge inference), pose significant challenges in systems and techniques. Existing edge inference approaches often suffer from communication bottlenecks due to high-dimensional data transmission and fail to provide timely responses to rare events, limiting their effectiveness for mission-critical applications in the sixth-generation (6G) mobile networks. To overcome these challenges, we propose a channel-adaptive, event-triggered edge-inference framework that prioritizes efficient rare-event processing. Central to this framework is a dual-threshold, multi-exit architecture, which enables early local inference for rare events detected locally while offloading more complex rare events to edge servers for detailed classification. To further enhance the system's performance, we developed a channel-adaptive offloading policy paired with an online algorithm to dynamically determine the optimal confidence thresholds for controlling offloading decisions. The associated optimization problem is solved by reformulating the original non-convex function into an equivalent strongly convex one. Using deep neural network classifiers and real medical datasets, our experiments demonstrate that the proposed framework not only achieves superior rare-event classification accuracy, but also effectively reduces communication overhead compared with existing edge-inference approaches.

* 13 pages, 11 figures
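One way to picture a dual-threshold offloading decision is as a three-way rule on a local confidence score. The sketch below is an assumed reading, not the policy from the paper: the meaning of `confidence` (taken here as the local model's estimated probability that a sample is a rare event), the threshold names `low` and `high`, and the three outcome labels are all hypothetical, and the channel-adaptive choice of thresholds is not modeled.

```python
def route(confidence, low, high):
    """Three-way decision from a local early-exit confidence score.

    Assumed semantics (not taken from the paper): `confidence` is the local
    model's estimated probability that the sample is a rare event.
    """
    if not 0.0 <= low <= high <= 1.0:
        raise ValueError("expected 0 <= low <= high <= 1")
    if confidence < low:
        return "discard"      # confidently not a rare event, nothing is sent
    if confidence >= high:
        return "local_exit"   # confidently a rare event, decide on the device
    return "offload"          # ambiguous, send to the edge server


decisions = [route(c, low=0.2, high=0.9) for c in (0.05, 0.5, 0.95)]
```

Widening the gap between the two thresholds sends more samples to the server; narrowing it keeps more decisions on the device, which is the trade-off an adaptive policy would tune.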

Detecting AI-Generated Texts in Cross-Domains

Oct 17, 2024

You Zhou, Jie Wang

Abstract: Existing tools to detect text generated by a large language model (LLM) have met with certain success, but their performance can drop when dealing with texts in new domains. To tackle this issue, we train a ranking classifier called RoBERTa-Ranker, a modified version of RoBERTa, as a baseline model using a dataset we constructed that includes a wider variety of texts written by humans and generated by various LLMs. We then present a method to fine-tune RoBERTa-Ranker that requires only a small amount of labeled data in a new domain. Experiments show that this fine-tuned domain-aware model outperforms the popular DetectGPT and GPTZero on both in-domain and cross-domain texts, where AI-generated texts may either be in a different domain or generated by a different LLM not used to generate the training datasets. This approach makes it feasible and economical to build a single system to detect AI-generated texts across various domains.

* DocEng '24: Proceedings of the ACM Symposium on Document Engineering 2024

Intrinsic Evaluation of RAG Systems for Deep-Logic Questions

Oct 03, 2024

Junyi Hu, You Zhou, Jie Wang

Abstract: We introduce the Overall Performance Index (OPI), an intrinsic metric to evaluate retrieval-augmented generation (RAG) mechanisms for applications involving deep-logic queries. OPI is computed as the harmonic mean of two key metrics: the Logical-Relation Correctness Ratio and the average of BERT embedding similarity scores between ground-truth and generated answers. We apply OPI to assess the performance of LangChain, a popular RAG tool, using a logical relations classifier fine-tuned from GPT-4o on the RAG-Dataset-12000 from Hugging Face. Our findings show a strong correlation between BERT embedding similarity scores and extrinsic evaluation scores. Among the commonly used retrievers, the cosine similarity retriever using BERT-based embeddings outperforms others, while the Euclidean distance-based retriever exhibits the weakest performance. Furthermore, we demonstrate that combining multiple retrievers, either algorithmically or by merging retrieved sentences, yields superior performance compared to using any single retriever alone.
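The harmonic-mean definition of OPI stated in the abstract can be written out directly. In the sketch below the function and argument names are assumptions, both inputs are taken to lie in [0, 1], and the similarity scores are made-up numbers used only to exercise the formula; how the Logical-Relation Correctness Ratio and the BERT similarities are themselves obtained is outside this example.

```python
def overall_performance_index(lrcr, mean_similarity):
    """Harmonic mean of the two component scores, as described in the abstract.

    `lrcr` stands for the Logical-Relation Correctness Ratio and
    `mean_similarity` for the average embedding similarity; both argument
    names are hypothetical.
    """
    if lrcr < 0 or mean_similarity < 0:
        raise ValueError("scores must be non-negative")
    if lrcr + mean_similarity == 0:
        return 0.0
    return 2 * lrcr * mean_similarity / (lrcr + mean_similarity)


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


# Made-up per-answer similarity scores, for illustration only.
similarities = [0.5, 0.75, 1.0]
opi = overall_performance_index(0.75, mean(similarities))
unbalanced = overall_performance_index(1.0, 0.5)
```

Because the harmonic mean is dominated by the smaller input, a system with perfect logical-relation accuracy but only 0.5 average similarity scores about 0.67 rather than the 0.75 an arithmetic mean would give.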

Improving Generalization of Neural Vehicle Routing Problem Solvers Through the Lens of Model Architecture

Jun 10, 2024

Yubin Xiao, Di Wang, Xuan Wu, Yuesong Wu, Boyang Li, Wei Du, Liupu Wang, You Zhou

Abstract: Neural models produce promising results when solving Vehicle Routing Problems (VRPs), but often fall short in generalization. Recent attempts to enhance model generalization often incur unnecessarily large training costs or cannot be directly applied to other models solving different VRP variants. To address these issues, we take a novel perspective on model architecture in this study. Specifically, we propose a plug-and-play Entropy-based Scaling Factor (ESF) and a Distribution-Specific (DS) decoder to enhance the size and distribution generalization, respectively. ESF adjusts the attention weight pattern of the model towards familiar ones discovered during training when solving VRPs of varying sizes. The DS decoder explicitly models VRPs of multiple training distribution patterns through multiple auxiliary light decoders, expanding the model representation space to encompass a broader range of distributional scenarios. We conduct extensive experiments on both synthetic and widely recognized real-world benchmarking datasets and compare the performance with seven baseline models. The results demonstrate the effectiveness of using the ESF and the DS decoder to obtain a more generalizable model and showcase their applicability to solve different VRP variants, i.e., travelling salesman problem and capacitated VRP. Notably, our proposed generic components require minimal computational resources, and can be effortlessly integrated into conventional generalization strategies to further elevate model generalization.

* 13 pages, 6 figures, and 6 tables

Neural Combinatorial Optimization Algorithms for Solving Vehicle Routing Problems: A Comprehensive Survey with Perspectives

Jun 01, 2024

Xuan Wu, Di Wang, Lijie Wen, Yubin Xiao, Chunguo Wu, Yuesong Wu, Chaoyu Yu, Douglas L. Maskell, You Zhou

Abstract: Although several surveys on Neural Combinatorial Optimization (NCO) solvers specifically designed to solve Vehicle Routing Problems (VRPs) have been conducted, these existing surveys did not cover the state-of-the-art (SOTA) NCO solvers that have emerged recently. More importantly, to provide a comprehensive taxonomy of NCO solvers with up-to-date coverage, based on our thorough review of relevant publications and preprints, we divide all NCO solvers into four distinct categories, namely Learning to Construct, Learning to Improve, Learning to Predict-Once, and Learning to Predict-Multiplicity solvers. Subsequently, we present the inadequacies of the SOTA solvers, including poor generalization, incapability to solve large-scale VRPs, inability to address most types of VRP variants simultaneously, and difficulty in comparing these NCO solvers with the conventional Operations Research algorithms. Simultaneously, we propose promising and viable directions to overcome these inadequacies. In addition, we compare the performance of representative NCO solvers from the Reinforcement, Supervised, and Unsupervised Learning paradigms across both small- and large-scale VRPs. Finally, following the proposed taxonomy, we provide an accompanying web page as a live repository for NCO solvers. Through this survey and the live repository, we hope to help the research community of NCO solvers for VRPs thrive.
