Zihan Xiong

U-HNO: A U-shaped Hybrid Neural Operator with Sparse-Point Adaptive Routing for Non-stationary PDE Dynamics

May 13, 2026

Yingzhe Ma, Xiao Yang, Yuxin Xie, Zihan Xiong, Jinliang Liu

Abstract: Solutions to many partial differential equations (PDEs) display coexisting smooth global transport and localized sharp features within a single trajectory: shock fronts, thin interfaces, and concentrated high-frequency content sit on top of slowly varying backgrounds. This poses a challenge for neural operators: Fourier-based architectures mix nonlocal interactions efficiently but tend to under-resolve localized non-smooth features, whereas spatially local architectures recover fine detail at the cost of long-range propagation and rollout stability. Existing hybrid operators paper over this tension with a fixed, spatially uniform fusion that forces the same trade-off everywhere. We propose U-HNO, a U-shaped hybrid neural operator whose central design is Sparse-Point Adaptive Routing (SPAR): at every spatial location, a per-pixel hard mask selects whether the global Fourier branch or the local multi-scale Gaussian branch should dominate, and the sparsity ratio is a function of the local contrast of the routing signal, so smooth and shock-aligned regions receive different mixtures of global and local computation. SPAR is embedded in a hierarchical encoder-bottleneck-decoder backbone with skip connections so that the dual branches and the gate operate at every resolution. Training combines pointwise supervision with a finite-difference H^1 gradient term and a band-wise spectral consistency regularizer. Across benchmarks spanning 1D Burgers, Kuramoto-Sivashinsky, KdV, 2D advection, Allen-Cahn, Navier-Stokes, Darcy flow, and 3D transonic compressible Navier-Stokes from PDEBench, U-HNO achieves state-of-the-art rollout accuracy on the majority of tasks in both relative L^2 and H^1 metrics, with the largest gains on problems dominated by sharp localized features. Ablations show that removing any single component substantially degrades rollout error.

* 26 pages, 7 figures
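
The U-HNO abstract reports results in relative L^2 and H^1 metrics and mentions a finite-difference H^1 gradient term. As a rough illustration of why the H^1 view is stricter on sharp features, the sketch below computes both relative errors on a periodic 1D grid. The forward-difference stencil, the periodic boundary, and the function names are assumptions made for this example; the abstract does not specify the authors' discretization.

```python
import math

def forward_diff(u, dx):
    """Forward finite difference on a periodic 1D grid (assumed stencil)."""
    n = len(u)
    return [(u[(i + 1) % n] - u[i]) / dx for i in range(n)]

def l2_norm(u, dx):
    """Discrete L^2 norm using a simple Riemann sum."""
    return math.sqrt(sum(x * x for x in u) * dx)

def h1_norm(u, dx):
    """Discrete H^1 norm: L^2 norm of the values plus L^2 norm of the gradient."""
    return math.sqrt(l2_norm(u, dx) ** 2 + l2_norm(forward_diff(u, dx), dx) ** 2)

def relative_l2(pred, target, dx):
    err = [p - t for p, t in zip(pred, target)]
    return l2_norm(err, dx) / l2_norm(target, dx)

def relative_h1(pred, target, dx):
    err = [p - t for p, t in zip(pred, target)]
    return h1_norm(err, dx) / h1_norm(target, dx)

# Toy data: a smooth target and a prediction with a small high-frequency error.
n = 64
dx = 2 * math.pi / n
xs = [i * dx for i in range(n)]
target = [math.sin(x) for x in xs]
pred = [math.sin(x) + 0.01 * math.sin(8 * x) for x in xs]

print("relative L2:", relative_l2(pred, target, dx))
print("relative H1:", relative_h1(pred, target, dx))
```

In this toy case the relative L^2 error is 0.01 by construction, while the relative H^1 error is several times larger because differentiation amplifies the high-frequency part of the error.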

HiFICL: High-Fidelity In-Context Learning for Multimodal Tasks

Mar 13, 2026

Xiaoyu Li, Yuhang Liu, Zheng Luo, Xuanshuo Kang, Fangqi Lou, Xiaohua Wu, Zihan Xiong

Abstract: In-Context Learning (ICL) is a significant paradigm for Large Multimodal Models (LMMs), using a few in-context demonstrations (ICDs) for new task adaptation. However, its performance is sensitive to demonstration configurations and computationally expensive. Mathematically, the influence of these demonstrations can be decomposed into a dynamic mixture of the standard attention output and the context values. Current approximation methods simplify this process by learning a "shift vector". Inspired by the exact decomposition, we introduce High-Fidelity In-Context Learning (HiFICL) to more faithfully model the ICL mechanism. HiFICL consists of three key components: 1) a set of "virtual key-value pairs" to act as a learnable context, 2) a low-rank factorization for stable and regularized training, and 3) a simple end-to-end training objective. From another perspective, this mechanism constitutes a form of context-aware Parameter-Efficient Fine-Tuning (PEFT). Extensive experiments show that HiFICL consistently outperforms existing approximation methods on several multimodal benchmarks. The code is available at https://github.com/bbbandari/HiFICL.

* Accepted to CVPR 2026. Code available at https://github.com/bbbandari/HiFICL
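
The decomposition mentioned in the abstract above can be checked numerically for a single query: softmax attention over the concatenation of context and query tokens equals a gated mixture of attention over the context alone and attention over the query tokens alone, with the gate given by the share of softmax mass on the context. The sketch below is a plain-Python illustration of that identity only. The function names and toy numbers are invented for this example, the usual 1/sqrt(d) scaling is omitted, and the context keys and values are fixed lists here, whereas the abstract describes them as learnable, low-rank "virtual key-value pairs".

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(q, keys, values):
    """Single-query softmax attention. Returns (output, log of the softmax normalizer)."""
    scores = [dot(q, k) for k in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    dim = len(values[0])
    out = [sum(e / z * v[d] for e, v in zip(exps, values)) for d in range(dim)]
    return out, m + math.log(z)

def attend_with_context(q, ctx_k, ctx_v, self_k, self_v):
    """Mix context attention and self attention with a data-dependent gate."""
    out_ctx, log_z_ctx = attend(q, ctx_k, ctx_v)
    out_self, log_z_self = attend(q, self_k, self_v)
    # gate = Z_ctx / (Z_ctx + Z_self), computed from the log normalizers
    gate = 1.0 / (1.0 + math.exp(log_z_self - log_z_ctx))
    mixed = [gate * c + (1.0 - gate) * s for c, s in zip(out_ctx, out_self)]
    return mixed, gate

# Hypothetical toy values: two context pairs, three query-side pairs.
q = [0.3, -0.2]
ctx_k, ctx_v = [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]]
self_k = [[0.5, 0.5], [-1.0, 0.2], [0.1, 0.9]]
self_v = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]

mixed, gate = attend_with_context(q, ctx_k, ctx_v, self_k, self_v)
full, _ = attend(q, ctx_k + self_k, ctx_v + self_v)
print("gate:", gate)
print("max difference:", max(abs(a - b) for a, b in zip(mixed, full)))
```

The gate depends on the query, which is what makes the mixture "dynamic"; a single learned shift vector added to the attention output cannot reproduce that dependence in general.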

Multiscale Adaptive Conflict-Balancing Model For Multimedia Deepfake Detection

May 19, 2025

Zihan Xiong, Xiaohua Wu, Lei Chen, Fangqi Lou

Abstract: Advances in computer vision and deep learning have blurred the line between deepfakes and authentic media, undermining multimedia credibility through audio-visual forgery. Current multimodal detection methods remain limited by unbalanced learning between modalities. To tackle this issue, we propose an Audio-Visual Joint Learning Method (MACB-DF) to better mitigate modality conflicts and neglect by leveraging contrastive learning to assist in multi-level and cross-modal fusion, thereby fully balancing and exploiting information from each modality. Additionally, we design an orthogonalization-multimodal Pareto module that preserves unimodal information while addressing gradient conflicts in audio-video encoders caused by differing optimization targets of the loss functions. Extensive experiments and ablation studies conducted on mainstream deepfake datasets demonstrate consistent performance gains of our model across key evaluation metrics, achieving an average accuracy of 95.5% across multiple datasets. Notably, our method exhibits superior cross-dataset generalization capabilities, with absolute improvements of 8.0% and 7.7% in ACC scores over the previous best-performing approach when trained on DFDC and tested on DefakeAVMiT and FakeAVCeleb datasets.

* 9 pages, accepted to ICMR
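
The MACB-DF abstract refers to gradient conflicts between losses with differing optimization targets. One common, generic way to handle such conflicts is a PCGrad-style projection: when two gradients have a negative inner product, each is projected onto the plane orthogonal to the other before they are summed. The sketch below illustrates that generic technique only; it is not the authors' orthogonalization-multimodal Pareto module, whose details are not given in the abstract, and the names `g_audio` and `g_video` are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_conflict(g, other):
    """Return g with its component along `other` removed, but only if they conflict."""
    inner = dot(g, other)
    if inner >= 0:
        # No conflict (or `other` is zero): leave the gradient unchanged.
        return list(g)
    scale = inner / dot(other, other)
    return [gi - scale * oi for gi, oi in zip(g, other)]

# Hypothetical gradients of two losses with respect to shared parameters.
g_audio = [1.0, 0.0]
g_video = [-1.0, 1.0]

# Each projection uses the ORIGINAL other gradient.
p_audio = project_conflict(g_audio, g_video)
p_video = project_conflict(g_video, g_audio)
combined = [a + v for a, v in zip(p_audio, p_video)]

print("projected audio gradient:", p_audio)
print("projected video gradient:", p_video)
print("combined update direction:", combined)
```

After projection, neither gradient has a component that opposes the other, so the combined update no longer pushes one objective backward to the first order.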
