Yu Zhang

AI Lab, Netease

Noise May Contain Transferable Knowledge: Understanding Semi-supervised Heterogeneous Domain Adaptation from an Empirical Perspective

Feb 19, 2025

Yuan Yao, Xiaopu Zhang, Yu Zhang, Jian Jin, Qiang Yang

Abstract: Semi-supervised heterogeneous domain adaptation (SHDA) addresses learning across domains with distinct feature representations and distributions, where source samples are labeled while most target samples are unlabeled, with only a small fraction labeled. Moreover, there is no one-to-one correspondence between source and target samples. Although various SHDA methods have been developed to tackle this problem, the nature of the knowledge transferred across heterogeneous domains remains unclear. This paper examines this question from an empirical perspective. We conduct extensive experiments on about 330 SHDA tasks, employing two supervised learning methods and seven representative SHDA methods. Surprisingly, our observations indicate that neither the category information nor the feature information of source samples significantly affects performance in the target domain. Additionally, noise drawn from simple distributions, when used as source samples, may contain transferable knowledge. Based on this insight, we perform a series of experiments to uncover the underlying principles of transferable knowledge in SHDA. Specifically, we design a unified Knowledge Transfer Framework (KTF) for SHDA. Based on the KTF, we find that the transferable knowledge in SHDA primarily stems from the transferability and discriminability of the source domain. Consequently, ensuring those properties in source samples, regardless of their origin (e.g., image, text, noise), can enhance the effectiveness of knowledge transfer in SHDA tasks. The code and datasets are available at https://github.com/yyyaoyuan/SHDA.
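To make "noise drawn from simple distributions, used as source samples" concrete, here is a minimal sketch of a labeled noise source domain. It assumes (our assumption, not the paper's protocol) one isotropic Gaussian per class with a randomly drawn mean; the helpers `make_noise_source` and `nearest_mean_accuracy` are hypothetical, and the nearest-class-mean accuracy is only a crude proxy for how discriminable such a noise source is.

```python
import random

def make_noise_source(num_classes, samples_per_class, dim, mean_scale=3.0, std=1.0, seed=0):
    """Hypothetical helper: labeled source domain built from class-conditional Gaussian noise.

    Class c gets a random mean vector; its samples are that mean plus isotropic noise
    with standard deviation `std`. A larger mean_scale / std ratio gives more separable classes.
    """
    rng = random.Random(seed)
    means = [[rng.gauss(0.0, mean_scale) for _ in range(dim)] for _ in range(num_classes)]
    features, labels = [], []
    for c, mu in enumerate(means):
        for _ in range(samples_per_class):
            features.append([m + rng.gauss(0.0, std) for m in mu])
            labels.append(c)
    return features, labels

def nearest_mean_accuracy(features, labels):
    """Fraction of samples closest (squared Euclidean) to their own class mean."""
    classes = sorted(set(labels))
    means = {}
    for c in classes:
        members = [x for x, y in zip(features, labels) if y == c]
        means[c] = [sum(col) / len(members) for col in zip(*members)]
    correct = 0
    for x, y in zip(features, labels):
        pred = min(classes, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, means[c])))
        correct += pred == y
    return correct / len(labels)

X_src, y_src = make_noise_source(num_classes=10, samples_per_class=20, dim=300)
acc = nearest_mean_accuracy(X_src, y_src)
```

Shrinking `mean_scale` toward `std` is one simple knob for making the noise source less discriminable, if one wanted to probe that property.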

Corrupted but Not Broken: Rethinking the Impact of Corrupted Data in Visual Instruction Tuning

Feb 18, 2025

Yunhao Gou, Hansi Yang, Zhili Liu, Kai Chen, Yihan Zeng, Lanqing Hong, Zhenguo Li, Qun Liu, James T. Kwok, Yu Zhang

Abstract: Visual Instruction Tuning (VIT) enhances Multimodal Large Language Models (MLLMs) but is hindered by corrupted datasets containing hallucinated content, incorrect responses, and poor OCR quality. Prior works focus on dataset refinement through high-quality data collection or rule-based filtering, but these approaches are costly or limited to specific types of corruption. To understand more deeply how corrupted data affects MLLMs, in this paper we systematically investigate this issue and find that while corrupted data degrades the performance of MLLMs, its effects are largely superficial: the performance of MLLMs can be largely restored either by disabling a small subset of parameters or by post-training with a small amount of clean data. Additionally, corrupted MLLMs exhibit an improved ability to distinguish clean samples from corrupted ones, enabling dataset cleaning without external help. Based on these insights, we propose a corruption-robust training paradigm combining self-validation and post-training, which significantly outperforms existing corruption mitigation strategies.

RM-PoT: Reformulating Mathematical Problems and Solving via Program of Thoughts

Feb 18, 2025

Yu Zhang, Shujun Peng, Nengwu Wu, Xinhan Lin, Yang Hu, Jie Tang

Abstract: Recently, substantial advancements have been made in training language models to carry out step-by-step reasoning for solving intricate numerical reasoning tasks. Beyond the methods used to solve these problems, the structure and formulation of the problems themselves also play a crucial role in determining the performance of large language models (LLMs). We observe that even small changes in the surface form of mathematical problems can have a profound impact on both the answer distribution and the solve rate. This highlights the vulnerability of LLMs to surface-level variations, revealing their limited robustness when reasoning through complex problems. In this paper, we propose RM-PoT, a three-stage framework that integrates problem reformulation (RM), code-aided reasoning (PoT), and domain-aware few-shot learning to address these limitations. Our approach first reformulates the input problem into diverse surface forms to reduce structural bias, then retrieves five semantically aligned examples from a pre-constructed domain-specific question bank to provide contextual guidance, and finally generates executable Python code for precise computation.
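The retrieval stage (five semantically aligned examples from a question bank) can be pictured as top-k nearest-neighbour search. The sketch below is an assumption-laden stand-in: it uses bag-of-words cosine similarity in place of whatever semantic embedding the authors use, and `retrieve_examples` and the toy bank are hypothetical.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lower-cased token counts; a crude stand-in for a semantic embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_examples(question, question_bank, k=5):
    """Hypothetical helper: the k bank entries most similar to `question` (ties keep bank order)."""
    q = bag_of_words(question)
    scored = [(cosine(q, bag_of_words(item)), i) for i, item in enumerate(question_bank)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [question_bank[i] for _, i in scored[:k]]

bank = [
    "A train travels 60 miles per hour for 3 hours. How far does it go?",
    "What is the area of a circle with radius 2?",
    "A car travels 50 miles per hour for 2 hours. How far does it travel?",
    "Solve 2x + 3 = 7 for x.",
]
shots = retrieve_examples("A bus travels 40 miles per hour for 5 hours. How far does it go?", bank, k=2)
```

In a full pipeline the retrieved questions (with their worked programs) would be placed in the prompt before the model writes its own Python solution; that part is not sketched here.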

Diffusion Models for Computational Neuroimaging: A Survey

Feb 10, 2025

Haokai Zhao, Haowei Lou, Lina Yao, Wei Peng, Ehsan Adeli, Kilian M Pohl, Yu Zhang

Abstract: Computational neuroimaging involves analyzing brain images or signals to provide mechanistic insights and predictive tools for human cognition and behavior. While diffusion models have shown stability and high-quality generation on natural images, there is increasing interest in adapting them to analyze brain data for various neurological tasks such as data enhancement, disease diagnosis, and brain decoding. This survey provides an overview of recent efforts to integrate diffusion models into computational neuroimaging. We begin by introducing the common neuroimaging data modalities, followed by diffusion formulations and conditioning mechanisms. We then discuss how variations in the denoising starting point, condition input, and generation target of diffusion models have been developed to enhance specific neuroimaging tasks. For a comprehensive overview of ongoing research, we provide a publicly available repository at https://github.com/JoeZhao527/dm4neuro.

* 9 pages, 1 figure
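As background for the "diffusion formulations" the survey covers, here is a minimal sketch of the standard DDPM forward process, where x_t can be sampled in closed form as N(sqrt(ᾱ_t) x_0, (1 − ᾱ_t) I) with ᾱ_t = ∏_{s≤t}(1 − β_s). The linear schedule from 1e-4 to 0.02 over T = 1000 steps is the common DDPM default, not a setting taken from any surveyed neuroimaging work; the toy input vector is hypothetical.

```python
import math
import random

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (common DDPM default)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]

def alpha_bar(betas):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) I) in a single step."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

betas = linear_beta_schedule(1000)
abar = alpha_bar(betas)
x0 = [0.5] * 16  # toy stand-in for a flattened image patch or signal window
xt = q_sample(x0, t=500, alpha_bars=abar, rng=random.Random(0))
```

Because ᾱ_T is close to zero under this schedule, standard sampling starts denoising from pure Gaussian noise; changing that starting point is one of the variations the survey discusses.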

Beyond Prior Limits: Addressing Distribution Misalignment in Particle Filtering

Jan 30, 2025

Yiwei Shi, Jingyu Hu, Yu Zhang, Mengyue Yang, Weinan Zhang, Cunjia Liu, Weiru Liu

Abstract: Particle filtering is a Bayesian inference method and a fundamental tool in state estimation for dynamic systems, but its effectiveness is often limited by the constraints of the initial prior distribution, a phenomenon we define as the Prior Boundary Phenomenon. This challenge arises when target states lie outside the prior's support, rendering traditional particle filtering methods inadequate for accurate estimation. Although techniques such as unbounded priors and larger particle sets have been proposed, they remain computationally prohibitive and lack adaptability in dynamic scenarios. To systematically overcome these limitations, we propose the Diffusion-Enhanced Particle Filtering Framework, which introduces three key innovations: adaptive diffusion through exploratory particles, entropy-driven regularisation to prevent weight collapse, and kernel-based perturbations for dynamic support expansion. These mechanisms collectively enable particle filtering to explore beyond prior boundaries, ensuring robust state estimation for out-of-boundary targets. Theoretical analysis and extensive experiments validate the framework's effectiveness, indicating significant improvements in success rates and estimation accuracy across high-dimensional and non-convex scenarios.
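The failure mode and the general flavour of the remedies can be illustrated with a toy 1-D bootstrap particle filter. This is not the authors' Diffusion-Enhanced framework: it is a sketch under our own assumptions that combines three generic ingredients in the same spirit, namely a fraction of wide "exploratory" particles drawn around the observation (a sensor-resetting-style heuristic that skips the importance-weight correction a principled proposal would need), a normalized weight entropy to monitor collapse, and Gaussian kernel jitter after resampling. `pf_step` and all parameter values are hypothetical.

```python
import math
import random

def normalized_entropy(weights):
    """Shannon entropy of the weights divided by log(N): 1 = uniform, near 0 = collapsed."""
    n = len(weights)
    h = -sum(w * math.log(w) for w in weights if w > 0)
    return h / math.log(n) if n > 1 else 0.0

def pf_step(particles, observation, rng, process_std=0.5, obs_std=1.0,
            kernel_std=0.2, explore_frac=0.1, explore_std=10.0):
    """One predict / explore / update / resample / jitter step for a 1-D random-walk state."""
    n = len(particles)
    pred = [x + rng.gauss(0.0, process_std) for x in particles]  # predict
    for i in rng.sample(range(n), int(explore_frac * n)):          # heuristic exploration
        pred[i] = observation + rng.gauss(0.0, explore_std)
    logw = [-0.5 * ((observation - x) / obs_std) ** 2 for x in pred]  # Gaussian likelihood
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    total = sum(w)
    w = [wi / total for wi in w]
    ent = normalized_entropy(w)
    resampled = rng.choices(pred, weights=w, k=n)                  # multinomial resampling
    return [x + rng.gauss(0.0, kernel_std) for x in resampled], ent  # kernel jitter

rng = random.Random(0)
particles = [rng.uniform(-1.0, 1.0) for _ in range(500)]  # prior support is [-1, 1]
true_state = 20.0                                        # target lies outside the prior
for _ in range(10):
    obs = true_state + rng.gauss(0.0, 1.0)
    particles, ent = pf_step(particles, obs, rng)
estimate = sum(particles) / len(particles)
```

With `explore_frac=0.0` and `kernel_std=0.0` the same code reduces to a plain bootstrap filter, whose particles can only drift toward the target through process noise.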

ARFlow: Autogressive Flow with Hybrid Linear Attention

Jan 27, 2025

Mude Hui, Rui-Jie Zhu, Songlin Yang, Yu Zhang, Zirui Wang, Yuyin Zhou, Jason Eshraghian, Cihang Xie

Abstract: Flow models are effective at progressively generating realistic images, but they generally struggle to capture long-range dependencies during the generation process because they compress all the information from previous time steps into a single corrupted image. To address this limitation, we propose integrating autoregressive modeling, which excels at modeling complex, high-dimensional joint probability distributions, into flow models. During training, at each step, we construct causally ordered sequences by sampling multiple images from the same semantic category and applying different levels of noise, where images with higher noise levels serve as causal predecessors to those with lower noise levels. This design enables the model to learn broader category-level variations while maintaining proper causal relationships in the flow process. During generation, the model autoregressively conditions on the previously generated images from earlier denoising steps, forming a contextual and coherent generation trajectory. Additionally, we design a customized hybrid linear attention mechanism tailored to our modeling approach to enhance computational efficiency. Our approach, termed ARFlow, achieves an FID of 14.08 on ImageNet at 128×128 without classifier-free guidance after 400k training steps, and an FID of 4.34 with a classifier-free guidance scale of 1.5, significantly outperforming the 9.17 FID of the previous flow-based model SiT. Extensive ablation studies demonstrate the effectiveness of our modeling strategy and chunk-wise attention design.
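The training-sequence construction described above (same-class images, different noise levels, noisier images first) can be sketched as follows. The linear interpolant x_t = (1 − t)·x + t·ε, with t = 1 meaning pure noise, is our assumption of a rectified-flow-style path; the abstract does not state ARFlow's exact interpolant. `build_causal_sequence` and `causal_mask` are hypothetical helpers, and the chunk-wise hybrid linear attention is not modeled.

```python
import random

def build_causal_sequence(images, rng):
    """Hypothetical helper: noisy, causally ordered sequence from same-class images.

    Each image gets its own noise level t in [0, 1); its noisy version is
    x_t = (1 - t) * x + t * eps with eps ~ N(0, I), so larger t means noisier.
    Levels are sorted in decreasing order, so noisier images come first and act
    as causal predecessors of cleaner ones.
    """
    levels = sorted((rng.random() for _ in images), reverse=True)
    sequence = []
    for x, t in zip(images, levels):
        eps = [rng.gauss(0.0, 1.0) for _ in x]
        sequence.append((t, [(1.0 - t) * xi + t * ei for xi, ei in zip(x, eps)]))
    return sequence

def causal_mask(length):
    """Position i may attend to positions 0..i (lower-triangular boolean mask)."""
    return [[j <= i for j in range(length)] for i in range(length)]

rng = random.Random(0)
same_class_images = [[rng.random() for _ in range(8)] for _ in range(4)]  # 4 toy "images"
seq = build_causal_sequence(same_class_images, rng)
mask = causal_mask(len(seq))
```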

Exploring the Feasibility of Deep Learning Models for Long-term Disease Prediction: A Case Study for Wheat Yellow Rust in England

Jan 26, 2025

Zhipeng Yuan, Yu Zhang, Gaoshan Bi, Po Yang

Abstract: Wheat yellow rust, caused by the fungus Puccinia striiformis, is a critical disease affecting wheat crops across Britain, leading to significant yield losses and economic consequences. Given rapid environmental change and the evolving virulence of pathogens, there is a growing need for innovative approaches to predict and manage such diseases over the long term. This study explores the feasibility of using deep learning models to predict outbreaks of wheat yellow rust in British fields, offering a proactive approach to disease management. We construct a yellow rust dataset with historical weather information and disease indicators across multiple regions in England. We employ two powerful deep learning models, a fully connected neural network and a long short-term memory (LSTM) network, to develop predictive models capable of recognizing patterns and predicting future disease outbreaks. The models are trained and validated on randomly split datasets. The performance of these models with different predictive time steps is evaluated in terms of accuracy, precision, recall, and F1-score. Preliminary results indicate that deep learning models can effectively capture the complex interactions between multiple factors influencing disease dynamics, demonstrating a promising capacity to forecast wheat yellow rust with considerable accuracy. Specifically, the fully connected neural network achieved 83.65% accuracy in a disease prediction task with a 6-month predictive time step. These findings highlight the potential of deep learning to transform disease management strategies, enabling earlier and more precise interventions. Our study not only provides a methodological framework for employing deep learning in agricultural settings but also opens avenues for future research to enhance the robustness and applicability of predictive models in combating crop diseases globally.
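Two mechanical pieces of this setup can be sketched: pairing features with a label a fixed number of steps ahead (the "predictive time step"), and computing the four reported metrics. We assume binary labels (1 = outbreak) and monthly steps, so a 6-month horizon would be `horizon=6`; the paper's actual label scheme and feature layout may differ, and both helper names are hypothetical.

```python
def make_supervised_pairs(features, labels, horizon):
    """Hypothetical helper: pair features observed at step t with the label at step t + horizon."""
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    return [(features[t], labels[t + horizon]) for t in range(len(labels) - horizon)]

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = outbreak)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

pairs = make_supervised_pairs([[0.1], [0.2], [0.3], [0.4]], [0, 0, 1, 1], horizon=2)
scores = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```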

EliGen: Entity-Level Controlled Image Generation with Regional Attention

Jan 02, 2025

Hong Zhang, Zhongjie Duan, Xingjun Wang, Yingda Chen, Yu Zhang

Abstract: Recent progress in diffusion models has significantly advanced text-to-image generation, yet global text prompts alone remain insufficient for achieving fine-grained control over individual entities within an image. To address this limitation, we present EliGen, a novel framework for Entity-Level controlled Image Generation. We introduce regional attention, a mechanism for diffusion transformers that requires no additional parameters and seamlessly integrates entity prompts and arbitrary-shaped spatial masks. By contributing a high-quality dataset with fine-grained spatial and semantic entity-level annotations, we train EliGen to achieve robust and accurate entity-level manipulation, surpassing existing methods in both positional control precision and image quality. Additionally, we propose an inpainting fusion pipeline, extending EliGen to multi-entity image inpainting tasks. We further demonstrate its flexibility by integrating it with community models such as IP-Adapter and MLLM, unlocking new creative possibilities. The source code, dataset, and model will be released publicly.
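One way to see why a regional attention scheme can be parameter-free is that it only changes which query/key pairs are allowed in an existing joint text-image attention. The sketch below builds such a boolean mask under our own assumed rules, not EliGen's published ones: image tokens attend to each other freely, text tokens attend only within their own prompt, the global prompt interacts with every image token, and an entity prompt interacts only with image tokens inside its region. `regional_attention_mask` is a hypothetical helper.

```python
def regional_attention_mask(prompt_lengths, region_masks, num_image_tokens):
    """Hypothetical helper: boolean joint-attention mask (True = may attend).

    Token order: [global prompt | entity 1 prompt | ... | entity K prompt | image tokens].
    prompt_lengths[0] is the global prompt; prompt_lengths[k] (k >= 1) is entity k,
    whose spatial region is region_masks[k - 1], a list of booleans over image tokens.
    """
    owner = []  # index of the prompt that each text token belongs to
    for p, length in enumerate(prompt_lengths):
        owner += [p] * length
    n_text = len(owner)
    n = n_text + num_image_tokens
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            i_img, j_img = i >= n_text, j >= n_text
            if i_img and j_img:
                mask[i][j] = True                  # image <-> image: unrestricted
            elif not i_img and not j_img:
                mask[i][j] = owner[i] == owner[j]  # text <-> text: same prompt only
            else:
                text_idx, img_idx = (i, j - n_text) if j_img else (j, i - n_text)
                p = owner[text_idx]
                # global prompt (p == 0) sees every image token; entity p only its region
                mask[i][j] = p == 0 or region_masks[p - 1][img_idx]
    return mask

# Global prompt, two entity prompts (one token each) and two image tokens.
m = regional_attention_mask([1, 1, 1], [[True, False], [False, True]], num_image_tokens=2)
```

In use, positions where the mask is False would have their attention logits set to negative infinity before the softmax, so the pretrained weights are reused unchanged.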

MoPD: Mixture-of-Prompts Distillation for Vision-Language Models

Dec 26, 2024

Yang Chen, Shuai Fu, Yu Zhang

Abstract: Soft prompt learning methods are effective for adapting vision-language models (VLMs) to downstream tasks. Nevertheless, empirical evidence reveals that existing methods tend to overfit seen classes and exhibit degraded performance on unseen classes. This limitation is due to the inherent bias in the training data towards the seen classes. To address this issue, we propose a novel soft prompt learning method, named Mixture-of-Prompts Distillation (MoPD), which can effectively transfer useful knowledge from hand-crafted hard prompts (a.k.a. teacher prompts) to the learnable soft prompt (a.k.a. student prompt), thereby enhancing the generalization ability of soft prompts on unseen classes. Moreover, the proposed MoPD method utilizes a gating network that learns to select the hard prompts used for prompt distillation. Extensive experiments demonstrate that the proposed MoPD method outperforms state-of-the-art baselines, especially on unseen classes.
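A gated mixture of teacher prompts feeding a distillation loss might look roughly like this. It is a sketch under stated assumptions: the abstract does not specify the gate's inputs or the exact loss, so we take a dense softmax gate over per-prompt class distributions and a KL(teacher mixture ‖ student) objective with a shared temperature. All logits below are toy values and the helper names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def mixture_teacher(teacher_logits, gate_logits, temperature=1.0):
    """Hypothetical helper: blend per-prompt class distributions with softmax gate weights."""
    gates = softmax(gate_logits)
    dists = [softmax(l, temperature) for l in teacher_logits]
    num_classes = len(dists[0])
    return [sum(g * d[c] for g, d in zip(gates, dists)) for c in range(num_classes)]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

teacher_logits = [[2.0, 0.5, 0.1], [1.5, 1.0, 0.2]]  # two hard prompts (toy values)
gate_logits = [0.3, -0.3]                            # gating network output (toy values)
student_logits = [1.0, 0.8, 0.3]                     # soft prompt output (toy values)
target = mixture_teacher(teacher_logits, gate_logits, temperature=2.0)
loss = kl_divergence(target, softmax(student_logits, temperature=2.0))
```

A sparser "selection" could keep only the top-k gate weights and renormalize them; the dense gate is simply the easiest version to show.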

Zero-Shot Artifact2Artifact: Self-incentive artifact removal for photoacoustic imaging without any data

Dec 19, 2024

Shuang Li, Qian Chen, Chulhong Kim, Seongwook Choi, Yibing Wang, Yu Zhang, Changhui Li

Abstract: Photoacoustic imaging (PAI) uniquely combines optical contrast with the penetration depth of ultrasound, making it critical for clinical applications. However, the quality of 3D PAI is often degraded by reconstruction artifacts caused by the sparse and angle-limited configuration of detector arrays. Existing iterative or deep learning-based methods are either time-consuming or require large training datasets, significantly limiting their practical application. Here, we propose Zero-Shot Artifact2Artifact (ZS-A2A), a zero-shot self-supervised artifact removal method based on a super-lightweight network, which leverages the fact that reconstruction artifacts are sensitive to irregularities caused by data loss. By introducing random perturbations to the acquired PA data, it spontaneously generates subset data, which in turn stimulates the network to learn the artifact patterns in the reconstruction results, thus enabling zero-shot artifact removal. This approach requires neither training data nor prior knowledge of the artifacts, and is capable of artifact removal for 3D PAI. For maximum amplitude projection (MAP) images or slice images in 3D PAI acquired with arbitrarily sparse or angle-limited detector arrays, ZS-A2A employs a self-incentive strategy to complete artifact removal and improves the contrast-to-noise ratio (CNR). We validated ZS-A2A in both a simulation study and in vivo animal experiments. Results demonstrate that ZS-A2A achieves state-of-the-art (SOTA) performance compared to existing zero-shot methods, and for the in vivo rat liver, ZS-A2A improves CNR from 17.48 to 43.46 in just 8 seconds. The project for ZS-A2A will be available in the following GitHub repository: https://github.com/JaegerCQ/ZS-A2A.
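Two ingredients mentioned above can be sketched generically. The first is a random perturbation of the acquired data. Here we assume it means randomly dropping detector channels to form two disjoint subsets whose reconstructions carry different artifacts, so a network could be trained to map one to the other; this is our reading, not the authors' code. The second is the CNR figure of merit, using one common definition (definitions vary across papers). Both helper names are hypothetical.

```python
import random
import statistics

def random_channel_split(num_channels, keep_frac, rng):
    """Hypothetical helper: two disjoint random subsets of detector channel indices."""
    if not 0.0 < keep_frac <= 0.5:
        raise ValueError("keep_frac must be in (0, 0.5] so the two subsets can be disjoint")
    idx = list(range(num_channels))
    rng.shuffle(idx)
    k = int(keep_frac * num_channels)
    return sorted(idx[:k]), sorted(idx[k:2 * k])

def cnr(signal_pixels, background_pixels):
    """One common CNR definition: (mean_signal - mean_background) / std_background."""
    contrast = statistics.mean(signal_pixels) - statistics.mean(background_pixels)
    return contrast / statistics.stdev(background_pixels)

subset_a, subset_b = random_channel_split(128, 0.4, random.Random(0))
```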
