Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Feng Zhou

CTP: A hybrid CNN-Transformer-PINN model for ocean front forecasting

May 16, 2025

Yishuo Wang, Feng Zhou, Muping Zhou, Qicheng Meng, Zhijun Hu, Yi Wang

Abstract:This paper proposes CTP, a novel deep learning framework that integrates convolutional neural network(CNN), Transformer architectures, and physics-informed neural network(PINN) for ocean front prediction. Ocean fronts, as dynamic interfaces between distinct water masses, play critical roles in marine biogeochemical and physical processes. Existing methods such as LSTM, ConvLSTM, and AttentionConv often struggle to maintain spatial continuity and physical consistency over multi-step forecasts. CTP addresses these challenges by combining localized spatial encoding, long-range temporal attention, and physical constraint enforcement. Experimental results across south China sea(SCS) and Kuroshio(KUR) regions from 1993 to 2020 demonstrate that CTP achieves state-of-the-art(SOTA) performance in both single-step and multi-step predictions, significantly outperforming baseline models in accuracy, $F_1$ score, and temporal stability.

Via

Access Paper or Ask Questions

Preliminary Explorations with GPT-4o(mni) Native Image Generation

May 06, 2025

Pu Cao, Feng Zhou, Junyi Ji, Qingye Kong, Zhixiang Lv, Mingjian Zhang, Xuekun Zhao, Siqi Wu, Yinghui Lin, Qing Song(+1 more)

Figure 1 for Preliminary Explorations with GPT-4o(mni) Native Image Generation

Figure 2 for Preliminary Explorations with GPT-4o(mni) Native Image Generation

Figure 3 for Preliminary Explorations with GPT-4o(mni) Native Image Generation

Figure 4 for Preliminary Explorations with GPT-4o(mni) Native Image Generation

Abstract:Recently, the visual generation ability by GPT-4o(mni) has been unlocked by OpenAI. It demonstrates a very remarkable generation capability with excellent multimodal condition understanding and varied task instructions. In this paper, we aim to explore the capabilities of GPT-4o across various tasks. Inspired by previous study, we constructed a task taxonomy along with a carefully curated set of test samples to conduct a comprehensive qualitative test. Benefiting from GPT-4o's powerful multimodal comprehension, its image-generation process demonstrates abilities surpassing those of traditional image-generation tasks. Thus, regarding the dimensions of model capabilities, we evaluate its performance across six task categories: traditional image generation tasks, discriminative tasks, knowledge-based generation, commonsense-based generation, spatially-aware image generation, and temporally-aware image generation. These tasks not only assess the quality and conditional alignment of the model's outputs but also probe deeper into GPT-4o's understanding of real-world concepts. Our results reveal that GPT-4o performs impressively well in general-purpose synthesis tasks, showing strong capabilities in text-to-image generation, visual stylization, and low-level image processing. However, significant limitations remain in its ability to perform precise spatial reasoning, instruction-grounded generation, and consistent temporal prediction. Furthermore, when faced with knowledge-intensive or domain-specific scenarios, such as scientific illustrations or mathematical plots, the model often exhibits hallucinations, factual errors, or structural inconsistencies. These findings suggest that while GPT-4o marks a substantial advancement in unified multimodal generation, there is still a long way to go before it can be reliably applied to professional or safety-critical domains.

Via

Access Paper or Ask Questions

The Mediating Effects of Emotions on Trust through Risk Perception and System Performance in Automated Driving

Apr 06, 2025

Lilit Avetisyan, Emmanuel Abolarin, Vanik Zakarian, X. Jessie Yang, Feng Zhou

Abstract:Trust in automated vehicles (AVs) has traditionally been explored through a cognitive lens, but growing evidence highlights the significant role emotions play in shaping trust. This study investigates how risk perception and AV performance (error vs. no error) influence emotional responses and trust in AVs, using mediation analysis to examine the indirect effects of emotions. In this study, 70 participants (42 male, 28 female) watched real-life recorded videos of AVs operating with or without errors, coupled with varying levels of risk information (high, low, or none). They reported their anticipated emotional responses using 19 discrete emotion items, and trust was assessed through dispositional, learned, and situational trust measures. Factor analysis identified four key emotional components, namely hostility, confidence, anxiety, and loneliness, that were influenced by risk perception and AV performance. The linear mixed model showed that risk perception was not a significant predictor of trust, while performance and individual differences were. Mediation analysis revealed that confidence was a strong positive mediator, while hostile and anxious emotions negatively impacted trust. However, lonely emotions did not significantly mediate the relationship between AV performance and trust. The results show that real-time AV behavior is more influential on trust than pre-existing risk perceptions, indicating trust in AVs might be more experience-based than shaped by prior beliefs. Our findings also underscore the importance of fostering positive emotional responses for trust calibration, which has important implications for user experience design in automated driving.

Via

Access Paper or Ask Questions

L2HCount:Generalizing Crowd Counting from Low to High Crowd Density via Density Simulation

Mar 17, 2025

Guoliang Xu, Jianqin Yin, Ren Zhang, Yonghao Dang, Feng Zhou, Bo Yu

Abstract:Since COVID-19, crowd-counting tasks have gained wide applications. While supervised methods are reliable, annotation is more challenging in high-density scenes due to small head sizes and severe occlusion, whereas it's simpler in low-density scenes. Interestingly, can we train the model in low-density scenes and generalize it to high-density scenes? Therefore, we propose a low- to high-density generalization framework (L2HCount) that learns the pattern related to high-density scenes from low-density ones, enabling it to generalize well to high-density scenes. Specifically, we first introduce a High-Density Simulation Module and a Ground-Truth Generation Module to construct fake high-density images along with their corresponding ground-truth crowd annotations respectively by image-shifting technique, effectively simulating high-density crowd patterns. However, the simulated images have two issues: image blurring and loss of low-density image characteristics. Therefore, we second propose a Head Feature Enhancement Module to extract clear features in the simulated high-density scene. Third, we propose a Dual-Density Memory Encoding Module that uses two crowd memories to learn scene-specific patterns from low- and simulated high-density scenes, respectively. Extensive experiments on four challenging datasets have shown the promising performance of L2HCount.

Via

Access Paper or Ask Questions

Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation

Mar 12, 2025

Feng Zhou, Pu Cao, Yiyang Ma, Lu Yang, Jianqin Yin

Figure 1 for Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation

Figure 2 for Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation

Figure 3 for Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation

Figure 4 for Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation

Abstract:Denoising higher-resolution latents via a pre-trained U-Net leads to repetitive and disordered image patterns. Although recent studies make efforts to improve generative quality by aligning denoising process across original and higher resolutions, the root cause of suboptimal generation is still lacking exploration. Through comprehensive analysis of position encoding in U-Net, we attribute it to inconsistent position encoding, sourced by the inadequate propagation of position information from zero-padding to latent features in convolution layers as resolution increases. To address this issue, we propose a novel training-free approach, introducing a Progressive Boundary Complement (PBC) method. This method creates dynamic virtual image boundaries inside the feature map to enhance position information propagation, enabling high-quality and rich-content high-resolution image synthesis. Extensive experiments demonstrate the superiority of our method.

* Submitted to ICML 2025

Via

Access Paper or Ask Questions

Efficient Membership Inference Attacks by Bayesian Neural Network

Mar 10, 2025

Zhenlong Liu, Wenyu Jiang, Feng Zhou, Hongxin Wei

Abstract:Membership Inference Attacks (MIAs) aim to estimate whether a specific data point was used in the training of a given model. Previous attacks often utilize multiple reference models to approximate the conditional score distribution, leading to significant computational overhead. While recent work leverages quantile regression to estimate conditional thresholds, it fails to capture epistemic uncertainty, resulting in bias in low-density regions. In this work, we propose a novel approach - Bayesian Membership Inference Attack (BMIA), which performs conditional attack through Bayesian inference. In particular, we transform a trained reference model into Bayesian neural networks by Laplace approximation, enabling the direct estimation of the conditional score distribution by probabilistic model parameters. Our method addresses both epistemic and aleatoric uncertainty with only a reference model, enabling efficient and powerful MIA. Extensive experiments on five datasets demonstrate the effectiveness and efficiency of BMIA.

* 8 pages, under review

Via

Access Paper or Ask Questions

Integrating Protein Dynamics into Structure-Based Drug Design via Full-Atom Stochastic Flows

Mar 06, 2025

Xiangxin Zhou, Yi Xiao, Haowei Lin, Xinheng He, Jiaqi Guan, Yang Wang, Qiang Liu, Feng Zhou, Liang Wang, Jianzhu Ma

Abstract:The dynamic nature of proteins, influenced by ligand interactions, is essential for comprehending protein function and progressing drug discovery. Traditional structure-based drug design (SBDD) approaches typically target binding sites with rigid structures, limiting their practical application in drug development. While molecular dynamics simulation can theoretically capture all the biologically relevant conformations, the transition rate is dictated by the intrinsic energy barrier between them, making the sampling process computationally expensive. To overcome the aforementioned challenges, we propose to use generative modeling for SBDD considering conformational changes of protein pockets. We curate a dataset of apo and multiple holo states of protein-ligand complexes, simulated by molecular dynamics, and propose a full-atom flow model (and a stochastic version), named DynamicFlow, that learns to transform apo pockets and noisy ligands into holo pockets and corresponding 3D ligand molecules. Our method uncovers promising ligand molecules and corresponding holo conformations of pockets. Additionally, the resultant holo-like states provide superior inputs for traditional SBDD approaches, playing a significant role in practical drug discovery.

* Accepted to ICLR 2025

Via

Access Paper or Ask Questions

Enhancing Gradient-based Discrete Sampling via Parallel Tempering

Feb 26, 2025

Luxu Liang, Yuhang Jia, Feng Zhou

Abstract:While gradient-based discrete samplers are effective in sampling from complex distributions, they are susceptible to getting trapped in local minima, particularly in high-dimensional, multimodal discrete distributions, owing to the discontinuities inherent in these landscapes. To circumvent this issue, we combine parallel tempering, also known as replica exchange, with the discrete Langevin proposal and develop the Parallel Tempering enhanced Discrete Langevin Proposal (PTDLP), which are simulated at a series of temperatures. Significant energy differences prompt sample swaps, which are governed by a Metropolis criterion specifically designed for discrete sampling to ensure detailed balance is maintained. Additionally, we introduce an automatic scheme to determine the optimal temperature schedule and the number of chains, ensuring adaptability across diverse tasks with minimal tuning. Theoretically, we establish that our algorithm converges non-asymptotically to the target energy and exhibits faster mixing compared to a single chain. Empirical results further emphasize the superiority of our method in sampling from complex, multimodal discrete distributions, including synthetic problems, restricted Boltzmann machines, and deep energy-based models.

* 24 pages, 5 figures. arXiv admin note: text overlap with arXiv:2402.17699 by other authors

Via

Access Paper or Ask Questions

An ocean front detection and tracking algorithm

Feb 21, 2025

Yishuo Wang, Feng Zhou

Figure 1 for An ocean front detection and tracking algorithm

Figure 2 for An ocean front detection and tracking algorithm

Figure 3 for An ocean front detection and tracking algorithm

Figure 4 for An ocean front detection and tracking algorithm

Abstract:Ocean front is defined as the interface between different water masses and plays a vital role in the evolution of many physical phenomena. Previous detection methods are based on histogram, Lyapunov exponent, gradient and machine learning. These algorithms, however, introduce discontinuity, inaccuracy, use less information or just approaching traditional results. Moreover, automatic front tracking algrorithm is not open source in preceding studies. This paper foucuses on large-scale ocean fronts and proposes an automatic front detection and tracking algorithm based on Bayesian decision and metric space. In this, front merging, filling and ring deletion are put forward to enhance continuity. The distance between fronts in different days is firstly defined and is well-defined in metric space for functional analysis. These technologies can be migrated to other areas of computer vision such as edge detection and tracking.

Via

Access Paper or Ask Questions

Language-TPP: Integrating Temporal Point Processes with Language Models for Event Analysis

Feb 11, 2025

Quyu Kong, Yixuan Zhang, Yang Liu, Panrong Tong, Enqi Liu, Feng Zhou

Figure 1 for Language-TPP: Integrating Temporal Point Processes with Language Models for Event Analysis

Figure 2 for Language-TPP: Integrating Temporal Point Processes with Language Models for Event Analysis

Figure 3 for Language-TPP: Integrating Temporal Point Processes with Language Models for Event Analysis

Figure 4 for Language-TPP: Integrating Temporal Point Processes with Language Models for Event Analysis

Abstract:Temporal Point Processes (TPPs) have been widely used for event sequence modeling, but they often struggle to incorporate rich textual event descriptions effectively. Conversely, while Large Language Models (LLMs) have been shown remarkable capabilities in processing textual data, they lack mechanisms for handling temporal dynamics. To bridge this gap, we introduce Language-TPP, a unified framework that integrates TPPs with LLMs for enhanced event sequence modeling. Language-TPP introduces a novel temporal encoding mechanism that converts continuous time intervals into specialized byte-tokens, enabling seamless integration with standard LLM architectures. This approach allows Language-TPP to achieve state-of-the-art performance across multiple TPP tasks, including event time prediction, type prediction, and intensity estimation, on five datasets. Additionally, we demonstrate that incorporating temporal information significantly improves the quality of generated event descriptions.

Via

Access Paper or Ask Questions