Lei Zhang

OIPR: Evaluation for Time-series Anomaly Detection Inspired by Operator Interest

Mar 03, 2025

Yuhan Jing, Jingyu Wang, Lei Zhang, Haifeng Sun, Bo He, Zirui Zhuang, Chengsen Wang, Qi Qi, Jianxin Liao

Abstract: With the growing adoption of time-series anomaly detection (TAD) technology, numerous studies have employed deep learning-based detectors for analyzing time-series data in the fields of Internet services, industrial systems, and sensors. The selection and optimization of anomaly detectors rely strongly on the availability of an effective performance evaluation method for TAD. Since anomalies in time-series data often manifest as a sequence of points, conventional metrics that consider only the detection of individual points are inadequate. Existing evaluation methods for TAD typically employ point-based or event-based metrics to capture the temporal context. However, point-based metrics tend to overestimate detectors that excel only at detecting long anomalies, while event-based metrics are susceptible to being misled by fragmented detection results. To address these limitations, we propose OIPR, a novel set of TAD evaluation metrics. It models the process of operators receiving detector alarms and handling faults, using the area under the operator interest curve to evaluate the performance of TAD algorithms. Furthermore, we build a special scenario dataset to compare the characteristics of different evaluation methods. Through experiments on the special scenario dataset and five real-world datasets, we demonstrate the remarkable performance of OIPR in extreme and complex scenarios. It achieves a balance between the point and event perspectives, overcoming their primary limitations and applying to a broader range of situations.
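The abstract's contrast between point-based and event-based metrics can be illustrated with a toy example. The sketch below computes point-wise recall and event-wise recall (an event counts as detected if any one of its points is flagged) for two hypothetical detectors. It does not implement OIPR or the operator interest curve; the function names and data are illustrative assumptions.

```python
from typing import List, Tuple

def segments(labels: List[int]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, of contiguous runs of 1s."""
    segs, start = [], None
    for i, v in enumerate(labels):
        if v and start is None:
            start = i
        elif not v and start is not None:
            segs.append((start, i))
            start = None
    if start is not None:
        segs.append((start, len(labels)))
    return segs

def point_recall(labels: List[int], preds: List[int]) -> float:
    """Fraction of anomalous points that the detector flags."""
    hits = sum(1 for y, p in zip(labels, preds) if y and p)
    return hits / sum(labels)

def event_recall(labels: List[int], preds: List[int]) -> float:
    """Fraction of anomaly events with at least one flagged point."""
    segs = segments(labels)
    detected = sum(1 for s, e in segs if any(preds[s:e]))
    return detected / len(segs)

# One long anomaly (8 points) and two single-point anomalies.
labels = [1] * 8 + [0] * 4 + [1] + [0] * 4 + [1] + [0] * 2
# Detector A covers only the long anomaly.
preds_a = [1] * 8 + [0] * 12
# Detector B flags a single point inside each anomaly (fragmented output).
preds_b = [1] + [0] * 11 + [1] + [0] * 4 + [1] + [0] * 2

# Detector A: point recall 0.8, event recall 1/3.
print(point_recall(labels, preds_a), event_recall(labels, preds_a))
# Detector B: point recall 0.3, event recall 1.0.
print(point_recall(labels, preds_b), event_recall(labels, preds_b))
```

The point-based view ranks detector A far ahead because the long anomaly dominates the point count, while the event-based view gives fragmented detector B a perfect score; this is the tension the abstract says OIPR is designed to balance.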

Towards a Molecular Computer: Enabling Arithmetic Operations in Molecular Communication

Feb 27, 2025

Jianqiao Long, Lei Zhang, Miaowen Wen, Kezhi Wang, Natalio Krasnogor, Jichun Li

Abstract: In current molecular communication (MC) systems, performing computational operations at the nanoscale remains challenging, which restricts their applicability in complex scenarios such as adaptive biochemical control and advanced nanoscale sensing. To overcome this challenge, this paper proposes a novel framework that seamlessly integrates computation into the molecular communication process. The system enables the arithmetic operations of addition, subtraction, multiplication, and division by encoding numerical values into two types of molecules emitted by each transmitter, representing positive and negative values, respectively. Specifically, addition is achieved by transmitting non-reactive molecules, while subtraction employs reactive molecules that interact during propagation. The receiver demodulates molecular counts to compute the desired results directly. A theoretical analysis that yields an upper bound on the bit error rate (BER), together with computational simulations, confirms the system's robustness in performing complex arithmetic tasks. Compared to conventional MC methods, the proposed approach not only enables fundamental computational operations at the nanoscale but also lays the groundwork for intelligent, autonomous molecular networks.

* submitted for possible journal publication
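The sign-encoding idea in the abstract (two molecule types for positive and negative values, non-reactive molecules for addition, reactive molecules for subtraction) can be sketched as an idealized count model. This is a hypothetical, noise-free illustration: it assumes that counts of each type simply accumulate at the receiver, that reactive molecules of opposite types annihilate one-for-one during propagation, and it ignores diffusion, reception noise, the BER analysis, and multiplication/division. All names are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Signal:
    pos: int  # molecules of the type encoding positive magnitude
    neg: int  # molecules of the type encoding negative magnitude

def encode(x: int) -> Signal:
    """Encode a signed integer as counts of two molecule types (assumed scheme)."""
    return Signal(pos=max(x, 0), neg=max(-x, 0))

def decode(s: Signal) -> int:
    """Idealized receiver: signed value = positive count minus negative count."""
    return s.pos - s.neg

def add_nonreactive(a: Signal, b: Signal) -> Signal:
    """Addition: non-reactive molecules of each type just accumulate."""
    return Signal(a.pos + b.pos, a.neg + b.neg)

def react(s: Signal) -> Signal:
    """Opposite-type reactive molecules annihilate pairwise; only the surplus survives."""
    m = min(s.pos, s.neg)
    return Signal(s.pos - m, s.neg - m)

def subtract_reactive(a: Signal, b: Signal) -> Signal:
    """Subtraction a - b: b's value is sent with its types swapped, and the
    opposite-type molecules cancel during propagation."""
    return react(Signal(a.pos + b.neg, a.neg + b.pos))

print(decode(add_nonreactive(encode(5), encode(-2))))  # 3
print(subtract_reactive(encode(5), encode(7)))  # Signal(pos=0, neg=2), i.e. -2
```

In this toy model the reaction leaves only one molecule type at the receiver, so the sign of the result is read from which type survives and the magnitude from its count.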

External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation

Feb 26, 2025

Mingfu Liang, Xi Liu, Rong Jin, Boyang Liu, Qiuling Suo, Qinghai Zhou, Song Zhou, Laming Chen, Hua Zheng, Zhiyuan Li(+89 more)

Abstract: Ads recommendation is a prominent service of online advertising systems and has been actively studied. Recent studies indicate that scaling up the recommendation model and improving its design can bring significant performance improvements. However, as model scale grows, these prior studies diverge increasingly from industrial practice, because they often neglect two fundamental challenges in industrial-scale applications. First, the training and inference budgets of a served model are restricted, and exceeding them may incur latency and impair user experience. Second, large volumes of data arrive in streaming mode, with data distributions shifting dynamically as new users/ads join and existing users/ads leave the system. We propose the External Large Foundation Model (ExFM) framework to address these overlooked challenges. Specifically, we develop external distillation and a data augmentation system (DAS) to control the computational cost of training/inference while maintaining high performance. We design the teacher as a foundation model (FM) that can serve multiple students as vertical models (VMs), amortizing its building cost. We propose Auxiliary Head and Student Adapter to mitigate the data distribution gap between the FM and VMs caused by the streaming data issue. Comprehensive experiments on internal industrial-scale applications and public datasets demonstrate significant performance gains from ExFM.

* Accepted by the ACM Web Conference (WWW) 2025 Industrial Track as Oral Presentation
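To make the distillation setup concrete, here is a hedged sketch of a student objective in which a foundation-model teacher supervises a vertical-model student. It assumes a student with two heads: a main (serving) head trained on the true click label and an auxiliary head trained on the teacher's predicted probability, so the teacher signal does not act directly on the serving head. The function names, the weight `alpha`, and this loss form are illustrative assumptions, not the paper's formulation of Auxiliary Head or Student Adapter.

```python
import math
from typing import Iterable, Tuple

def bce(p: float, y: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy of prediction p against a hard or soft target y in [0, 1]."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))

def student_loss(main_p: float, aux_p: float, label: float, teacher_p: float,
                 alpha: float = 0.5) -> float:
    """Hypothetical two-head objective:
    - main head (used for serving) fits the observed label;
    - auxiliary head fits the teacher's soft prediction, weighted by alpha."""
    return bce(main_p, label) + alpha * bce(aux_p, teacher_p)

# One example: (main-head prob, aux-head prob, click label, teacher prob).
Example = Tuple[float, float, float, float]

def batch_loss(batch: Iterable[Example], alpha: float = 0.5) -> float:
    """Mean student loss over a batch of logged examples."""
    losses = [student_loss(*ex, alpha=alpha) for ex in batch]
    return sum(losses) / len(losses)

batch = [(0.7, 0.6, 1.0, 0.8), (0.2, 0.3, 0.0, 0.1)]
print(batch_loss(batch))
```

Because the teacher's output enters only as a target, it can be computed once per example and reused by every student, which is one plausible reading of how a single FM's cost could be amortized across several VMs.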

On-the-fly Preference Alignment via Principle-Guided Decoding

Feb 20, 2025

Mingye Zhu, Yi Liu, Lei Zhang, Junbo Guo, Zhendong Mao

Abstract: With the rapidly expanding landscape of large language models, aligning model generations with human values and preferences is becoming increasingly important. Popular alignment methods, such as Reinforcement Learning from Human Feedback, have shown significant success in guiding models with greater control. However, these methods are inefficient, because they require considerable computational resources, and impractical, because they require collecting substantial training data to accommodate the diverse and pluralistic nature of human preferences. These limitations significantly constrain the scope and efficacy of both task-specific and general preference alignment methods. In this work, we introduce On-the-fly Preference Alignment via Principle-Guided Decoding (OPAD) to directly align model outputs with human preferences during inference, eliminating the need for fine-tuning. Our approach involves first curating a surrogate solution to an otherwise infeasible optimization problem and then designing a principle-guided reward function based on this surrogate. The final aligned policy is derived by maximizing this customized reward, which exploits the discrepancy between the constrained policy and its unconstrained counterpart. OPAD directly modifies the model's predictions during inference, ensuring principle adherence without incurring the computational overhead of retraining or fine-tuning. Experiments show that OPAD achieves competitive or superior performance in both general and personalized alignment tasks, demonstrating its efficiency and effectiveness compared to state-of-the-art baselines.

* Accepted to ICLR 2025
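The idea of a reward built from the discrepancy between a principle-constrained policy and its unconstrained counterpart can be sketched for a single decoding step. This sketch assumes, rather than reproduces, the paper's definitions: the reward is taken to be r(y) = log π_c(y) − log π_u(y), and the aligned policy is the standard maximizer of E[r] − β·KL(π ‖ π_u), namely π*(y) ∝ π_u(y)·exp(r(y)/β). The logits and `beta` are hypothetical inputs.

```python
import math
from typing import List

def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def aligned_next_token_probs(logits_constrained: List[float],
                             logits_unconstrained: List[float],
                             beta: float = 1.0) -> List[float]:
    """One decoding step of a hypothetical principle-guided reweighting.
    logits_constrained: next-token logits with the principle in the prompt.
    logits_unconstrained: next-token logits for the same prefix without it."""
    log_pc = log_softmax(logits_constrained)
    log_pu = log_softmax(logits_unconstrained)
    # Reward: how much more the principle-conditioned model favors each token.
    reward = [lc - lu for lc, lu in zip(log_pc, log_pu)]
    # KL-regularized maximizer: log pi* = log pi_u + r / beta + const.
    scores = [lu + r / beta for lu, r in zip(log_pu, reward)]
    return [math.exp(s) for s in log_softmax(scores)]

probs = aligned_next_token_probs([2.0, 0.0, -1.0], [0.0, 1.0, 0.0], beta=0.5)
print(probs)
```

Under these assumptions, beta = 1 recovers the constrained distribution exactly, larger beta moves back toward the unconstrained model, and beta below 1 extrapolates past the constrained model in the direction the principle pushes.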

Personalized Image Generation with Deep Generative Models: A Decade Survey

Feb 18, 2025

Yuxiang Wei, Yiheng Zheng, Yabo Zhang, Ming Liu, Zhilong Ji, Lei Zhang, Wangmeng Zuo

Abstract: Recent advancements in generative models have significantly facilitated the development of personalized content creation. Given a small set of images containing a user-specific concept, personalized image generation creates images that incorporate the specified concept and adhere to the provided text descriptions. Due to its wide applications in content creation, significant effort has been devoted to this field in recent years. Nonetheless, the technologies used for personalization have evolved alongside the development of generative models, with their distinct and interrelated components. In this survey, we present a comprehensive review of generalized personalized image generation across various generative models, including traditional GANs, contemporary text-to-image diffusion models, and emerging multimodal autoregressive models. We first define a unified framework that standardizes the personalization process across different generative models, encompassing three key components, i.e., inversion spaces, inversion methods, and personalization schemes. This unified framework offers a structured approach to dissecting and comparing personalization techniques across different generative architectures. Building upon this unified framework, we further provide an in-depth analysis of personalization techniques within each generative model, highlighting their unique contributions and innovations. Through comparative analysis, this survey elucidates the current landscape of personalized image generation, identifying commonalities and distinguishing features among existing methods. Finally, we discuss the open challenges in the field and propose potential directions for future research. We continue to track related works at https://github.com/csyxwei/Awesome-Personalized-Image-Generation.

* 39 pages; under submission; more information: https://github.com/csyxwei/Awesome-Personalized-Image-Generation

Identifying Flaky Tests in Quantum Code: A Machine Learning Approach

Feb 06, 2025

Khushdeep Kaur, Dongchan Kim, Ainaz Jamshidi, Lei Zhang

Abstract: Testing and debugging quantum software pose significant challenges due to the inherent complexities of quantum mechanics, such as superposition and entanglement. One challenge is indeterminacy, a fundamental characteristic of quantum systems, which increases the likelihood of flaky tests in quantum programs. To the best of our knowledge, comprehensive studies of quantum flakiness are lacking in the existing literature. In this paper, we present a novel machine learning platform that leverages multiple machine learning models to automatically detect flaky tests in quantum programs. Our evaluation shows that the extreme gradient boosting and decision tree-based models outperform the other models (namely random forest, k-nearest neighbors, and support vector machine), achieving the highest F1 score and Matthews Correlation Coefficient on a balanced dataset and an imbalanced dataset, respectively. Furthermore, we expand the currently limited dataset for researchers interested in quantum flaky tests. In the future, we plan to explore unsupervised learning techniques to detect and classify quantum flaky tests more effectively. These advancements aim to improve the reliability and robustness of quantum software testing.

* 8 pages, 1 figure, accepted by Q-SANER 2025
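The two evaluation metrics named in the abstract, F1 and the Matthews Correlation Coefficient (MCC), have standard definitions in terms of the confusion matrix. The sketch below assumes "flaky" is the positive class (an assumption; the abstract does not say) and shows why MCC is informative on imbalanced data: a classifier that never predicts "flaky" can reach high accuracy while scoring zero on both metrics.

```python
import math

def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no positives at all."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient; conventionally 0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical imbalanced suite: 5 flaky tests, 95 stable tests.
# A classifier that labels every test "stable":
print(accuracy(0, 95, 0, 5), f1_score(0, 0, 5), mcc(0, 95, 0, 5))  # 0.95 0.0 0.0
# A classifier that finds 4 of 5 flaky tests with 10 false alarms:
print(f1_score(4, 10, 1), mcc(4, 85, 10, 1))
```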

MetaFE-DE: Learning Meta Feature Embedding for Depth Estimation from Monocular Endoscopic Images

Feb 05, 2025

Dawei Lu, Deqiang Xiao, Danni Ai, Jingfan Fan, Tianyu Fu, Yucong Lin, Hong Song, Xujiong Ye, Lei Zhang, Jian Yang

Abstract: Depth estimation from monocular endoscopic images presents significant challenges due to the complexity of endoscopic surgery, such as the irregular shapes of human soft tissues and variations in lighting conditions. Existing methods primarily estimate depth directly from RGB images and often suffer from limited interpretability and accuracy. Given that RGB and depth images are two views of the same endoscopic surgery scene, in this paper we introduce a novel concept referred to as "meta feature embedding (MetaFE)", in which the physical entities (e.g., tissues and surgical instruments) of endoscopic surgery are represented using shared features that can be decoded into either an RGB or a depth image. With this concept, we propose a two-stage self-supervised learning paradigm for monocular endoscopic depth estimation. In the first stage, we propose a temporal representation learner using diffusion models, which are aligned with the spatial information through cross normalization to construct the MetaFE. In the second stage, self-supervised monocular depth estimation with brightness calibration is applied to decode the meta features into the depth image. Extensive evaluation on diverse endoscopic datasets demonstrates that our approach outperforms state-of-the-art methods in depth estimation, achieving superior accuracy and generalization. The source code will be publicly available.

Incorporating Cyclic Group Equivariance into Deep Learning for Reliable Reconstruction of Rotationally Symmetric Tomography Systems

Feb 04, 2025

Yaogong Zhang, Fang-Fang Yin, Lei Zhang

Abstract: Rotational symmetry is a defining feature of many tomography systems, including computed tomography (CT) and emission computed tomography (ECT), where detectors are arranged in a circular or periodically rotating configuration. This study revisits the image reconstruction process from the perspective of hardware-induced rotational symmetry and introduces a cyclic group equivariance framework for deep learning-based reconstruction. Specifically, we derive a mathematical correspondence that couples cyclic rotations in the projection domain to discrete rotations in the image domain, both arising from the same cyclic group inherent in the hardware design. This insight also reveals the uniformly distributed circular structure of the projection space. Building on this principle, we provide a cyclic rotation equivariant convolution design method to preserve projection domain symmetry and a cyclic group equivariance regularization approach that enforces consistent rotational transformations across the entire network. We further integrate these modules into a domain transform reconstruction framework and validate them using digital brain phantoms, training on discrete models and testing on more complex and realistic fuzzy variants. Results indicate markedly improved generalization and stability, with fewer artifacts and better detail preservation, especially under data distribution deviation. These findings highlight the potential of cyclic group equivariance as a unifying principle for tomographic reconstruction in rotationally symmetric systems, offering a flexible and interpretable solution for scenarios with limited data.
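The projection-domain half of the correspondence can be illustrated in a few lines. Assuming a sinogram with N views spread uniformly over 360 degrees, rotating the object by 360/N degrees corresponds to a cyclic shift of the view axis by one position. A convolution along the view axis with wrap-around (circular) padding commutes with such shifts, whereas zero padding would break the symmetry at the wrap point. The sketch also includes a hypothetical equivariance-gap penalty in the spirit of the regularization the abstract mentions; it is not the paper's network or loss.

```python
from typing import Callable, List

Sinogram = List[List[float]]  # sino[view][detector_bin]

def cyclic_shift_views(sino: Sinogram, k: int) -> Sinogram:
    """Cyclically shift the view (angle) axis by k positions; with N uniform
    views over 360 degrees this matches rotating the object by k * 360/N."""
    n = len(sino)
    return [list(sino[(v - k) % n]) for v in range(n)]

def circular_conv_views(sino: Sinogram, kernel: List[float]) -> Sinogram:
    """Cross-correlation (what deep-learning 'conv' layers compute) along the
    view axis with circular padding, applied to each detector bin separately."""
    n, bins, r = len(sino), len(sino[0]), len(kernel) // 2
    out = [[0.0] * bins for _ in range(n)]
    for v in range(n):
        for j, w in enumerate(kernel):
            src = sino[(v + j - r) % n]  # wrap around the view axis
            for b in range(bins):
                out[v][b] += w * src[b]
    return out

def equivariance_gap(f: Callable[[Sinogram], Sinogram], sino: Sinogram, k: int) -> float:
    """Hypothetical regularizer: mean squared difference between f(shift(x))
    and shift(f(x)); it is zero when f is equivariant to the cyclic shift."""
    a = f(cyclic_shift_views(sino, k))
    b = cyclic_shift_views(f(sino), k)
    diffs = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(diffs) / len(diffs)

sino = [[float(3 * v + b) for b in range(3)] for v in range(6)]
smooth = lambda s: circular_conv_views(s, [0.25, 0.5, 0.25])
print([equivariance_gap(smooth, sino, k) for k in range(6)])  # all zeros
```

Adding the gap to a training loss is one way to push a general network toward the same symmetry that the circular convolution has by construction.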

Baichuan-Omni-1.5 Technical Report

Jan 26, 2025

Yadong Li, Jun Liu, Tao Zhang, Song Chen, Tianpeng Li, Zehuan Li, Lijun Liu, Lingfeng Ming, Guosheng Dong, Da Pan(+82 more)

Abstract: We introduce Baichuan-Omni-1.5, an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. To achieve fluent and high-quality interaction across modalities without compromising the capabilities of any modality, we prioritized optimizing three key aspects. First, we establish a comprehensive data cleaning and synthesis pipeline for multimodal data, obtaining about 500B high-quality data (text, audio, and vision). Second, an audio-tokenizer (Baichuan-Audio-Tokenizer) has been designed to capture both semantic and acoustic information from audio, enabling seamless integration and enhanced compatibility with MLLM. Lastly, we designed a multi-stage training strategy that progressively integrates multimodal alignment and multitask fine-tuning, ensuring effective synergy across all modalities. Baichuan-Omni-1.5 leads contemporary models (including GPT-4o-mini and MiniCPM-o 2.6) in terms of comprehensive omni-modal capabilities. Notably, it achieves results comparable to leading models such as Qwen2-VL-72B across various multimodal medical benchmarks.

Training-Free Zero-Shot Temporal Action Detection with Vision-Language Models

Jan 23, 2025

Chaolei Han, Hongsong Wang, Jidong Kuang, Lei Zhang, Jie Gui

Abstract: Existing zero-shot temporal action detection (ZSTAD) methods predominantly use fully supervised or unsupervised strategies to recognize unseen activities. However, these training-based methods are prone to domain shifts and require high computational costs, which hinder their practical applicability in real-world scenarios. In this paper, unlike previous works, we propose a training-Free Zero-shot temporal Action Detection (FreeZAD) method, leveraging existing vision-language (ViL) models to directly classify and localize unseen activities within untrimmed videos without any additional fine-tuning or adaptation. We mitigate the need for explicit temporal modeling and reliance on pseudo-label quality by designing the LOGarithmic decay weighted Outer-Inner-Contrastive Score (LogOIC) and frequency-based Actionness Calibration. Furthermore, we introduce a test-time adaptation (TTA) strategy using Prototype-Centric Sampling (PCS) to expand FreeZAD, enabling ViL models to adapt more effectively for ZSTAD. Extensive experiments on the THUMOS14 and ActivityNet-1.3 datasets demonstrate that our training-free method outperforms state-of-the-art unsupervised methods while requiring only 1/13 of the runtime. When equipped with TTA, the enhanced method further narrows the gap with fully supervised methods.
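LogOIC builds on the outer-inner-contrastive (OIC) idea used for scoring temporal proposals: a well-localized action should have high actionness inside the proposal and low actionness just outside it. The sketch below computes a plain, unweighted OIC score as inner mean minus outer mean, with an outer margin of `ratio` times the proposal length on each side (the value 0.25 is an illustrative assumption). It does not reproduce the paper's logarithmic decay weighting or its actionness calibration.

```python
from typing import List

def oic_score(scores: List[float], start: int, end: int, ratio: float = 0.25) -> float:
    """Plain outer-inner-contrastive score for a proposal [start, end), end exclusive.
    scores: per-snippet actionness values for the whole video."""
    length = end - start
    if length <= 0:
        raise ValueError("proposal must contain at least one snippet")
    margin = max(1, int(round(ratio * length)))
    inner = scores[start:end]
    # Outer region: margins on both sides, clipped to the video boundaries.
    outer = scores[max(0, start - margin):start] + scores[end:min(len(scores), end + margin)]
    inner_mean = sum(inner) / len(inner)
    outer_mean = sum(outer) / len(outer) if outer else 0.0
    return inner_mean - outer_mean

actionness = [0.1, 0.1, 0.9, 0.9, 0.9, 0.9, 0.1, 0.1]
tight = oic_score(actionness, 2, 6)  # boundaries match the high-actionness run
loose = oic_score(actionness, 1, 7)  # boundaries include background snippets
print(tight, loose)
```

The tight proposal scores higher because padding a proposal with background snippets lowers its inner mean, which is what lets an OIC-style score rank candidate boundaries without any training.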
