Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jing-Hao Xue

PathoSCOPE: Few-Shot Pathology Detection via Self-Supervised Contrastive Learning and Pathology-Informed Synthetic Embeddings

May 23, 2025

Sinchee Chin, Yinuo Ma, Xiaochen Yang, Jing-Hao Xue, Wenming Yang

Abstract:Unsupervised pathology detection trains models on non-pathological data to flag deviations as pathologies, offering strong generalizability for identifying novel diseases and avoiding costly annotations. However, building reliable normality models requires vast healthy datasets, as hospitals' data is inherently biased toward symptomatic populations, while privacy regulations hinder the assembly of representative healthy cohorts. To address this limitation, we propose PathoSCOPE, a few-shot unsupervised pathology detection framework that requires only a small set of non-pathological samples (minimum 2 shots), significantly improving data efficiency. We introduce Global-Local Contrastive Loss (GLCL), comprised of a Local Contrastive Loss to reduce the variability of non-pathological embeddings and a Global Contrastive Loss to enhance the discrimination of pathological regions. We also propose a Pathology-informed Embedding Generation (PiEG) module that synthesizes pathological embeddings guided by the global loss, better exploiting the limited non-pathological samples. Evaluated on the BraTS2020 and ChestXray8 datasets, PathoSCOPE achieves state-of-the-art performance among unsupervised methods while maintaining computational efficiency (2.48 GFLOPs, 166 FPS).

Via

Access Paper or Ask Questions

VISTA: Unsupervised 2D Temporal Dependency Representations for Time Series Anomaly Detection

Apr 03, 2025

Sinchee Chin, Fan Zhang, Xiaochen Yang, Jing-Hao Xue, Wenming Yang, Peng Jia, Guijin Wang, Luo Yingqun

Abstract:Time Series Anomaly Detection (TSAD) is essential for uncovering rare and potentially harmful events in unlabeled time series data. Existing methods are highly dependent on clean, high-quality inputs, making them susceptible to noise and real-world imperfections. Additionally, intricate temporal relationships in time series data are often inadequately captured in traditional 1D representations, leading to suboptimal modeling of dependencies. We introduce VISTA, a training-free, unsupervised TSAD algorithm designed to overcome these challenges. VISTA features three core modules: 1) Time Series Decomposition using Seasonal and Trend Decomposition via Loess (STL) to decompose noisy time series into trend, seasonal, and residual components; 2) Temporal Self-Attention, which transforms 1D time series into 2D temporal correlation matrices for richer dependency modeling and anomaly detection; and 3) Multivariate Temporal Aggregation, which uses a pretrained feature extractor to integrate cross-variable information into a unified, memory-efficient representation. VISTA's training-free approach enables rapid deployment and easy hyperparameter tuning, making it suitable for industrial applications. It achieves state-of-the-art performance on five multivariate TSAD benchmarks.

Via

Access Paper or Ask Questions

Mind the Gap: Confidence Discrepancy Can Guide Federated Semi-Supervised Learning Across Pseudo-Mismatch

Mar 17, 2025

Yijie Liu, Xinyi Shang, Yiqun Zhang, Yang Lu, Chen Gong, Jing-Hao Xue, Hanzi Wang

Abstract:Federated Semi-Supervised Learning (FSSL) aims to leverage unlabeled data across clients with limited labeled data to train a global model with strong generalization ability. Most FSSL methods rely on consistency regularization with pseudo-labels, converting predictions from local or global models into hard pseudo-labels as supervisory signals. However, we discover that the quality of pseudo-label is largely deteriorated by data heterogeneity, an intrinsic facet of federated learning. In this paper, we study the problem of FSSL in-depth and show that (1) heterogeneity exacerbates pseudo-label mismatches, further degrading model performance and convergence, and (2) local and global models' predictive tendencies diverge as heterogeneity increases. Motivated by these findings, we propose a simple and effective method called Semi-supervised Aggregation for Globally-Enhanced Ensemble (SAGE), that can flexibly correct pseudo-labels based on confidence discrepancies. This strategy effectively mitigates performance degradation caused by incorrect pseudo-labels and enhances consensus between local and global models. Experimental results demonstrate that SAGE outperforms existing FSSL methods in both performance and convergence. Our code is available at https://github.com/Jay-Codeman/SAGE

* Accepted by CVPR 2025

Via

Access Paper or Ask Questions

A Survey on Text-Driven 360-Degree Panorama Generation

Feb 20, 2025

Hai Wang, Xiaoyu Xiang, Weihao Xia, Jing-Hao Xue

Abstract:The advent of text-driven 360-degree panorama generation, enabling the synthesis of 360-degree panoramic images directly from textual descriptions, marks a transformative advancement in immersive visual content creation. This innovation significantly simplifies the traditionally complex process of producing such content. Recent progress in text-to-image diffusion models has accelerated the rapid development in this emerging field. This survey presents a comprehensive review of text-driven 360-degree panorama generation, offering an in-depth analysis of state-of-the-art algorithms and their expanding applications in 360-degree 3D scene generation. Furthermore, we critically examine current limitations and propose promising directions for future research. A curated project page with relevant resources and research papers is available at https://littlewhitesea.github.io/Text-Driven-Pano-Gen/.

Via

Access Paper or Ask Questions

Uncertainty-Aware Label Refinement on Hypergraphs for Personalized Federated Facial Expression Recognition

Jan 03, 2025

Hu Ding, Yan Yan, Yang Lu, Jing-Hao Xue, Hanzi Wang

Figure 1 for Uncertainty-Aware Label Refinement on Hypergraphs for Personalized Federated Facial Expression Recognition

Figure 2 for Uncertainty-Aware Label Refinement on Hypergraphs for Personalized Federated Facial Expression Recognition

Figure 3 for Uncertainty-Aware Label Refinement on Hypergraphs for Personalized Federated Facial Expression Recognition

Figure 4 for Uncertainty-Aware Label Refinement on Hypergraphs for Personalized Federated Facial Expression Recognition

Abstract:Most facial expression recognition (FER) models are trained on large-scale expression data with centralized learning. Unfortunately, collecting a large amount of centralized expression data is difficult in practice due to privacy concerns of facial images. In this paper, we investigate FER under the framework of personalized federated learning, which is a valuable and practical decentralized setting for real-world applications. To this end, we develop a novel uncertainty-Aware label refineMent on hYpergraphs (AMY) method. For local training, each local model consists of a backbone, an uncertainty estimation (UE) block, and an expression classification (EC) block. In the UE block, we leverage a hypergraph to model complex high-order relationships between expression samples and incorporate these relationships into uncertainty features. A personalized uncertainty estimator is then introduced to estimate reliable uncertainty weights of samples in the local client. In the EC block, we perform label propagation on the hypergraph, obtaining high-quality refined labels for retraining an expression classifier. Based on the above, we effectively alleviate heterogeneous sample uncertainty across clients and learn a robust personalized FER model in each client. Experimental results on two challenging real-world facial expression databases show that our proposed method consistently outperforms several state-of-the-art methods. This indicates the superiority of hypergraph modeling for uncertainty estimation and label refinement on the personalized federated FER task. The source code will be released at https://github.com/mobei1006/AMY.

* IEEE Transactions on Circuits and Systems for Video Technology, 2024

Via

Access Paper or Ask Questions

Augmentation Matters: A Mix-Paste Method for X-Ray Prohibited Item Detection under Noisy Annotations

Jan 03, 2025

Ruikang Chen, Yan Yan, Jing-Hao Xue, Yang Lu, Hanzi Wang

Figure 1 for Augmentation Matters: A Mix-Paste Method for X-Ray Prohibited Item Detection under Noisy Annotations

Figure 2 for Augmentation Matters: A Mix-Paste Method for X-Ray Prohibited Item Detection under Noisy Annotations

Figure 3 for Augmentation Matters: A Mix-Paste Method for X-Ray Prohibited Item Detection under Noisy Annotations

Figure 4 for Augmentation Matters: A Mix-Paste Method for X-Ray Prohibited Item Detection under Noisy Annotations

Abstract:Automatic X-ray prohibited item detection is vital for public safety. Existing deep learning-based methods all assume that the annotations of training X-ray images are correct. However, obtaining correct annotations is extremely hard if not impossible for large-scale X-ray images, where item overlapping is ubiquitous.As a result, X-ray images are easily contaminated with noisy annotations, leading to performance deterioration of existing methods.In this paper, we address the challenging problem of training a robust prohibited item detector under noisy annotations (including both category noise and bounding box noise) from a novel perspective of data augmentation, and propose an effective label-aware mixed patch paste augmentation method (Mix-Paste). Specifically, for each item patch, we mix several item patches with the same category label from different images and replace the original patch in the image with the mixed patch. In this way, the probability of containing the correct prohibited item within the generated image is increased. Meanwhile, the mixing process mimics item overlapping, enabling the model to learn the characteristics of X-ray images. Moreover, we design an item-based large-loss suppression (LLS) strategy to suppress the large losses corresponding to potentially positive predictions of additional items due to the mixing operation. We show the superiority of our method on X-ray datasets under noisy annotations. In addition, we evaluate our method on the noisy MS-COCO dataset to showcase its generalization ability. These results clearly indicate the great potential of data augmentation to handle noise annotations. The source code is released at https://github.com/wscds/Mix-Paste.

* The manuscript has been ACCEPTED for publication as a regular paper in the IEEE Transactions on Information Forensics & Security

Via

Access Paper or Ask Questions

360PanT: Training-Free Text-Driven 360-Degree Panorama-to-Panorama Translation

Sep 12, 2024

Hai Wang, Jing-Hao Xue

Abstract:Preserving boundary continuity in the translation of 360-degree panoramas remains a significant challenge for existing text-driven image-to-image translation methods. These methods often produce visually jarring discontinuities at the translated panorama's boundaries, disrupting the immersive experience. To address this issue, we propose 360PanT, a training-free approach to text-based 360-degree panorama-to-panorama translation with boundary continuity. Our 360PanT achieves seamless translations through two key components: boundary continuity encoding and seamless tiling translation with spatial control. Firstly, the boundary continuity encoding embeds critical boundary continuity information of the input 360-degree panorama into the noisy latent representation by constructing an extended input image. Secondly, leveraging this embedded noisy latent representation and guided by a target prompt, the seamless tiling translation with spatial control enables the generation of a translated image with identical left and right halves while adhering to the extended input's structure and semantic layout. This process ensures a final translated 360-degree panorama with seamless boundary continuity. Experimental results on both real-world and synthesized datasets demonstrate the effectiveness of our 360PanT in translating 360-degree panoramas. Code is available at \href{https://github.com/littlewhitesea/360PanT}{https://github.com/littlewhitesea/360PanT}.

* Accepted by WACV 2025, Project Page: \href{https://littlewhitesea.github.io/360PanT.github.io/}{https://littlewhitesea.github.io/360PanT.github.io/}

Via

Access Paper or Ask Questions

PUAL: A Classifier on Trifurcate Positive-Unlabeled Data

May 31, 2024

Xiaoke Wang, Xiaochen Yang, Rui Zhu, Jing-Hao Xue

Figure 1 for PUAL: A Classifier on Trifurcate Positive-Unlabeled Data

Figure 2 for PUAL: A Classifier on Trifurcate Positive-Unlabeled Data

Figure 3 for PUAL: A Classifier on Trifurcate Positive-Unlabeled Data

Figure 4 for PUAL: A Classifier on Trifurcate Positive-Unlabeled Data

Abstract:Positive-unlabeled (PU) learning aims to train a classifier using the data containing only labeled-positive instances and unlabeled instances. However, existing PU learning methods are generally hard to achieve satisfactory performance on trifurcate data, where the positive instances distribute on both sides of the negative instances. To address this issue, firstly we propose a PU classifier with asymmetric loss (PUAL), by introducing a structure of asymmetric loss on positive instances into the objective function of the global and local learning classifier. Then we develop a kernel-based algorithm to enable PUAL to obtain non-linear decision boundary. We show that, through experiments on both simulated and real-world datasets, PUAL can achieve satisfactory classification on trifurcate data.

* 24 pages, 6 figures

Via

Access Paper or Ask Questions

BDC-Occ: Binarized Deep Convolution Unit For Binarized Occupancy Network

May 27, 2024

Zongkai Zhang, Zidong Xu, Wenming Yang, Qingmin Liao, Jing-Hao Xue

Figure 1 for BDC-Occ: Binarized Deep Convolution Unit For Binarized Occupancy Network

Figure 2 for BDC-Occ: Binarized Deep Convolution Unit For Binarized Occupancy Network

Figure 3 for BDC-Occ: Binarized Deep Convolution Unit For Binarized Occupancy Network

Figure 4 for BDC-Occ: Binarized Deep Convolution Unit For Binarized Occupancy Network

Abstract:Existing 3D occupancy networks demand significant hardware resources, hindering the deployment of edge devices. Binarized Neural Networks (BNN) offer substantially reduced computational and memory requirements. However, their performance decreases notably compared to full-precision networks. Moreover, it is challenging to enhance the performance of binarized models by increasing the number of binarized convolutional layers, which limits their practicability for 3D occupancy prediction. To bridge these gaps, we propose a novel binarized deep convolution (BDC) unit that effectively enhances performance while increasing the number of binarized convolutional layers. Firstly, through theoretical analysis, we demonstrate that 1 \times 1 binarized convolutions introduce minimal binarization errors. Therefore, additional binarized convolutional layers are constrained to 1 \times 1 in the BDC unit. Secondly, we introduce the per-channel weight branch to mitigate the impact of binarization errors from unimportant channel features on the performance of binarized models, thereby improving performance while increasing the number of binarized convolutional layers. Furthermore, we decompose the 3D occupancy network into four convolutional modules and utilize the proposed BDC unit to binarize these modules. Our BDC-Occ model is created by applying the proposed BDC unit to binarize the existing 3D occupancy networks. Comprehensive quantitative and qualitative experiments demonstrate that the proposed BDC-Occ is the state-of-the-art binarized 3D occupancy network algorithm.

* 19 pages, 8 figures

Via

Access Paper or Ask Questions

UMBRAE: Unified Multimodal Decoding of Brain Signals

Apr 10, 2024

Weihao Xia, Raoul de Charette, Cengiz Öztireli, Jing-Hao Xue

Abstract:We address prevailing challenges of the brain-powered research, departing from the observation that the literature hardly recover accurate spatial information and require subject-specific models. To address these challenges, we propose UMBRAE, a unified multimodal decoding of brain signals. First, to extract instance-level conceptual and spatial details from neural signals, we introduce an efficient universal brain encoder for multimodal-brain alignment and recover object descriptions at multiple levels of granularity from subsequent multimodal large language model (MLLM). Second, we introduce a cross-subject training strategy mapping subject-specific features to a common feature space. This allows a model to be trained on multiple subjects without extra resources, even yielding superior results compared to subject-specific models. Further, we demonstrate this supports weakly-supervised adaptation to new subjects, with only a fraction of the total training data. Experiments demonstrate that UMBRAE not only achieves superior results in the newly introduced tasks but also outperforms methods in well established tasks. To assess our method, we construct and share with the community a comprehensive brain understanding benchmark BrainHub. Our code and benchmark are available at https://weihaox.github.io/UMBRAE.

* Project Page: https://weihaox.github.io/UMBRAE

Via

Access Paper or Ask Questions