Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhi Zhang

AI-Native 6G Physical Layer with Cross-Module Optimization and Cooperative Control Agents

Jan 07, 2026

Xufei Zheng, Han Xiao, Shi Jin, Zhiqin Wang, Wenqiang Tian, Wendong Liu, Jianfei Cao, Jia Shen, Zhihua Shi, Zhi Zhang(+1 more)

Abstract:In this article, a framework of AI-native cross-module optimized physical layer with cooperative control agents is proposed, which involves optimization across global AI/ML modules of the physical layer with innovative design of multiple enhancement mechanisms and control strategies. Specifically, it achieves simultaneous optimization across global modules of uplink AI/ML-based joint source-channel coding with modulation, and downlink AI/ML-based modulation with precoding and corresponding data detection, reducing traditional inter-module information barriers to facilitate end-to-end optimization toward global objectives. Moreover, multiple enhancement mechanisms are also proposed, including i) an AI/ML-based cross-layer modulation approach with theoretical analysis for downlink transmission that breaks the isolation of inter-layer features to expand the solution space for determining improved constellation, ii) a utility-oriented precoder construction method that shifts the role of the AI/ML-based CSI feedback decoder from recovering the original CSI to directly generating precoding matrices aiming to improve end-to-end performance, and iii) incorporating modulation into AI/ML-based CSI feedback to bypass bit-level bottlenecks that introduce quantization errors, non-differentiable gradients, and limitations in constellation solution spaces. Furthermore, AI/ML based control agents for optimized transmission schemes are proposed that leverage AI/ML to perform model switching according to channel state, thereby enabling integrated control for global throughput optimization. Finally, simulation results demonstrate the superiority of the proposed solutions in terms of BLER and throughput. These extensive simulations employ more practical assumptions that are aligned with the requirements of the 3GPP, which hopefully provides valuable insights for future standardization discussions.

Via

Access Paper or Ask Questions

Virtual Width Networks

Nov 17, 2025

Seed, Baisheng Li, Banggu Wu, Bole Ma, Bowen Xiao, Chaoyi Zhang, Cheng Li, Chengyi Wang, Chengyin Xu, Chi Zhang(+108 more)

Abstract:We introduce Virtual Width Networks (VWN), a framework that delivers the benefits of wider representations without incurring the quadratic cost of increasing the hidden size. VWN decouples representational width from backbone width, expanding the embedding space while keeping backbone compute nearly constant. In our large-scale experiment, an 8-times expansion accelerates optimization by over 2 times for next-token and 3 times for next-2-token prediction. The advantage amplifies over training as both the loss gap grows and the convergence-speedup ratio increases, showing that VWN is not only token-efficient but also increasingly effective with scale. Moreover, we identify an approximately log-linear scaling relation between virtual width and loss reduction, offering an initial empirical basis and motivation for exploring virtual-width scaling as a new dimension of large-model efficiency.

Via

Access Paper or Ask Questions

Doc-Researcher: A Unified System for Multimodal Document Parsing and Deep Research

Oct 24, 2025

Kuicai Dong, Shurui Huang, Fangda Ye, Wei Han, Zhi Zhang, Dexun Li, Wenjun Li, Qu Yang, Gang Wang, Yichao Wang(+2 more)

Abstract:Deep Research systems have revolutionized how LLMs solve complex questions through iterative reasoning and evidence gathering. However, current systems remain fundamentally constrained to textual web data, overlooking the vast knowledge embedded in multimodal documents Processing such documents demands sophisticated parsing to preserve visual semantics (figures, tables, charts, and equations), intelligent chunking to maintain structural coherence, and adaptive retrieval across modalities, which are capabilities absent in existing systems. In response, we present Doc-Researcher, a unified system that bridges this gap through three integrated components: (i) deep multimodal parsing that preserves layout structure and visual semantics while creating multi-granular representations from chunk to document level, (ii) systematic retrieval architecture supporting text-only, vision-only, and hybrid paradigms with dynamic granularity selection, and (iii) iterative multi-agent workflows that decompose complex queries, progressively accumulate evidence, and synthesize comprehensive answers across documents and modalities. To enable rigorous evaluation, we introduce M4DocBench, the first benchmark for Multi-modal, Multi-hop, Multi-document, and Multi-turn deep research. Featuring 158 expert-annotated questions with complete evidence chains across 304 documents, M4DocBench tests capabilities that existing benchmarks cannot assess. Experiments demonstrate that Doc-Researcher achieves 50.6% accuracy, 3.4xbetter than state-of-the-art baselines, validating that effective document research requires not just better retrieval, but fundamentally deep parsing that preserve multimodal integrity and support iterative research. Our work establishes a new paradigm for conducting deep research on multimodal document collections.

* preprint

Via

Access Paper or Ask Questions

NeuroAda: Activating Each Neuron's Potential for Parameter-Efficient Fine-Tuning

Oct 21, 2025

Zhi Zhang, Yixian Shen, Congfeng Cao, Ekaterina Shutova

Figure 1 for NeuroAda: Activating Each Neuron's Potential for Parameter-Efficient Fine-Tuning

Figure 2 for NeuroAda: Activating Each Neuron's Potential for Parameter-Efficient Fine-Tuning

Figure 3 for NeuroAda: Activating Each Neuron's Potential for Parameter-Efficient Fine-Tuning

Figure 4 for NeuroAda: Activating Each Neuron's Potential for Parameter-Efficient Fine-Tuning

Abstract:Existing parameter-efficient fine-tuning (PEFT) methods primarily fall into two categories: addition-based and selective in-situ adaptation. The former, such as LoRA, introduce additional modules to adapt the model to downstream tasks, offering strong memory efficiency. However, their representational capacity is often limited, making them less suitable for fine-grained adaptation. In contrast, the latter directly fine-tunes a carefully chosen subset of the original model parameters, allowing for more precise and effective adaptation, but at the cost of significantly increased memory consumption. To reconcile this trade-off, we propose NeuroAda, a novel PEFT method that enables fine-grained model finetuning while maintaining high memory efficiency. Our approach first identifies important parameters (i.e., connections within the network) as in selective adaptation, and then introduces bypass connections for these selected parameters. During finetuning, only the bypass connections are updated, leaving the original model parameters frozen. Empirical results on 23+ tasks spanning both natural language generation and understanding demonstrate that NeuroAda achieves state-of-the-art performance with as little as $\leq \textbf{0.02}\%$ trainable parameters, while reducing CUDA memory usage by up to 60%. We release our code here: https://github.com/FightingFighting/NeuroAda.git.

* EMNLP 2025

Via

Access Paper or Ask Questions

Truncated Proximal Policy Optimization

Jun 18, 2025

Tiantian Fan, Lingjun Liu, Yu Yue, Jiaze Chen, Chengyi Wang, Qiying Yu, Chi Zhang, Zhiqi Lin, Ruofei Zhu, Yufeng Yuan(+13 more)

Figure 1 for Truncated Proximal Policy Optimization

Figure 2 for Truncated Proximal Policy Optimization

Figure 3 for Truncated Proximal Policy Optimization

Figure 4 for Truncated Proximal Policy Optimization

Abstract:Recently, test-time scaling Large Language Models (LLMs) have demonstrated exceptional reasoning capabilities across scientific and professional tasks by generating long chains-of-thought (CoT). As a crucial component for developing these reasoning models, reinforcement learning (RL), exemplified by Proximal Policy Optimization (PPO) and its variants, allows models to learn through trial and error. However, PPO can be time-consuming due to its inherent on-policy nature, which is further exacerbated by increasing response lengths. In this work, we propose Truncated Proximal Policy Optimization (T-PPO), a novel extension to PPO that improves training efficiency by streamlining policy update and length-restricted response generation. T-PPO mitigates the issue of low hardware utilization, an inherent drawback of fully synchronized long-generation procedures, where resources often sit idle during the waiting periods for complete rollouts. Our contributions are two-folds. First, we propose Extended Generalized Advantage Estimation (EGAE) for advantage estimation derived from incomplete responses while maintaining the integrity of policy learning. Second, we devise a computationally optimized mechanism that allows for the independent optimization of the policy and value models. By selectively filtering prompt and truncated tokens, this mechanism reduces redundant computations and accelerates the training process without sacrificing convergence performance. We demonstrate the effectiveness and efficacy of T-PPO on AIME 2024 with a 32B base model. The experimental results show that T-PPO improves the training efficiency of reasoning LLMs by up to 2.5x and outperforms its existing competitors.

Via

Access Paper or Ask Questions

Single Index Bandits: Generalized Linear Contextual Bandits with Unknown Reward Functions

Jun 15, 2025

Yue Kang, Mingshuo Liu, Bongsoo Yi, Jing Lyu, Zhi Zhang, Doudou Zhou, Yao Li

Abstract:Generalized linear bandits have been extensively studied due to their broad applicability in real-world online decision-making problems. However, these methods typically assume that the expected reward function is known to the users, an assumption that is often unrealistic in practice. Misspecification of this link function can lead to the failure of all existing algorithms. In this work, we address this critical limitation by introducing a new problem of generalized linear bandits with unknown reward functions, also known as single index bandits. We first consider the case where the unknown reward function is monotonically increasing, and propose two novel and efficient algorithms, STOR and ESTOR, that achieve decent regrets under standard assumptions. Notably, our ESTOR can obtain the nearly optimal regret bound $\tilde{O}_T(\sqrt{T})$ in terms of the time horizon $T$. We then extend our methods to the high-dimensional sparse setting and show that the same regret rate can be attained with the sparsity index. Next, we introduce GSTOR, an algorithm that is agnostic to general reward functions, and establish regret bounds under a Gaussian design assumption. Finally, we validate the efficiency and effectiveness of our algorithms through experiments on both synthetic and real-world datasets.

Via

Access Paper or Ask Questions

THEMIS: Towards Practical Intellectual Property Protection for Post-Deployment On-Device Deep Learning Models

Mar 31, 2025

Yujin Huang, Zhi Zhang, Qingchuan Zhao, Xingliang Yuan, Chunyang Chen

Figure 1 for THEMIS: Towards Practical Intellectual Property Protection for Post-Deployment On-Device Deep Learning Models

Figure 2 for THEMIS: Towards Practical Intellectual Property Protection for Post-Deployment On-Device Deep Learning Models

Figure 3 for THEMIS: Towards Practical Intellectual Property Protection for Post-Deployment On-Device Deep Learning Models

Figure 4 for THEMIS: Towards Practical Intellectual Property Protection for Post-Deployment On-Device Deep Learning Models

Abstract:On-device deep learning (DL) has rapidly gained adoption in mobile apps, offering the benefits of offline model inference and user privacy preservation over cloud-based approaches. However, it inevitably stores models on user devices, introducing new vulnerabilities, particularly model-stealing attacks and intellectual property infringement. While system-level protections like Trusted Execution Environments (TEEs) provide a robust solution, practical challenges remain in achieving scalable on-device DL model protection, including complexities in supporting third-party models and limited adoption in current mobile solutions. Advancements in TEE-enabled hardware, such as NVIDIA's GPU-based TEEs, may address these obstacles in the future. Currently, watermarking serves as a common defense against model theft but also faces challenges here as many mobile app developers lack corresponding machine learning expertise and the inherent read-only and inference-only nature of on-device DL models prevents third parties like app stores from implementing existing watermarking techniques in post-deployment models. To protect the intellectual property of on-device DL models, in this paper, we propose THEMIS, an automatic tool that lifts the read-only restriction of on-device DL models by reconstructing their writable counterparts and leverages the untrainable nature of on-device DL models to solve watermark parameters and protect the model owner's intellectual property. Extensive experimental results across various datasets and model structures show the superiority of THEMIS in terms of different metrics. Further, an empirical investigation of 403 real-world DL mobile apps from Google Play is performed with a success rate of 81.14%, showing the practicality of THEMIS.

* To Appear in the 34th USENIX Security Symposium, August 13-15, 2025

Via

Access Paper or Ask Questions

Visual Acuity Consistent Foveated Rendering towards Retinal Resolution

Mar 30, 2025

Zhi Zhang, Meng Gai, Sheng Li

Figure 1 for Visual Acuity Consistent Foveated Rendering towards Retinal Resolution

Figure 2 for Visual Acuity Consistent Foveated Rendering towards Retinal Resolution

Figure 3 for Visual Acuity Consistent Foveated Rendering towards Retinal Resolution

Figure 4 for Visual Acuity Consistent Foveated Rendering towards Retinal Resolution

Abstract:Prior foveated rendering methods often suffer from a limitation where the shading load escalates with increasing display resolution, leading to decreased efficiency, particularly when dealing with retinal-level resolutions. To tackle this challenge, we begin with the essence of the human visual system (HVS) perception and present visual acuity-consistent foveated rendering (VaFR), aiming to achieve exceptional rendering performance at retinal-level resolutions. Specifically, we propose a method with a novel log-polar mapping function derived from the human visual acuity model, which accommodates the natural bandwidth of the visual system. This mapping function and its associated shading rate guarantee a consistent output of rendering information, regardless of variations in the display resolution of the VR HMD. Consequently, our VaFR outperforms alternative methods, improving rendering speed while preserving perceptual visual quality, particularly when operating at retinal resolutions. We validate our approach using both the rasterization and ray-casting rendering pipelines. We also validate our approach using different binocular rendering strategies for HMD devices. In diverse testing scenarios, our approach delivers better perceptual visual quality than prior foveated rendering while achieving an impressive speedup of 6.5$\times$-9.29$\times$ for deferred rendering of 3D scenarios and an even more powerful speedup of 10.4$\times$-16.4$\times$ for ray-casting at retinal resolution. Additionally, our approach significantly enhances the rendering performance of binocular 8K path tracing, achieving smooth frame rates.

Via

Access Paper or Ask Questions

Beyond Words: Exploring Cultural Value Sensitivity in Multimodal Models

Feb 18, 2025

Srishti Yadav, Zhi Zhang, Daniel Hershcovich, Ekaterina Shutova

Figure 1 for Beyond Words: Exploring Cultural Value Sensitivity in Multimodal Models

Figure 2 for Beyond Words: Exploring Cultural Value Sensitivity in Multimodal Models

Figure 3 for Beyond Words: Exploring Cultural Value Sensitivity in Multimodal Models

Figure 4 for Beyond Words: Exploring Cultural Value Sensitivity in Multimodal Models

Abstract:Investigating value alignment in Large Language Models (LLMs) based on cultural context has become a critical area of research. However, similar biases have not been extensively explored in large vision-language models (VLMs). As the scale of multimodal models continues to grow, it becomes increasingly important to assess whether images can serve as reliable proxies for culture and how these values are embedded through the integration of both visual and textual data. In this paper, we conduct a thorough evaluation of multimodal model at different scales, focusing on their alignment with cultural values. Our findings reveal that, much like LLMs, VLMs exhibit sensitivity to cultural values, but their performance in aligning with these values is highly context-dependent. While VLMs show potential in improving value understanding through the use of images, this alignment varies significantly across contexts highlighting the complexities and underexplored challenges in the alignment of multimodal models.

* NAACL 2025

Via

Access Paper or Ask Questions

Data-Efficient Model for Psychological Resilience Prediction based on Neurological Data

Feb 03, 2025

Zhi Zhang, Yan Liu, Mengxia Gao, Yu Yang, Jiannong Cao, Wai Kai Hou, Shirley Li, Sonata Yau, Yun Kwok Wing, Tatia M. C. Lee

Figure 1 for Data-Efficient Model for Psychological Resilience Prediction based on Neurological Data

Figure 2 for Data-Efficient Model for Psychological Resilience Prediction based on Neurological Data

Figure 3 for Data-Efficient Model for Psychological Resilience Prediction based on Neurological Data

Figure 4 for Data-Efficient Model for Psychological Resilience Prediction based on Neurological Data

Abstract:Psychological resilience, defined as the ability to rebound from adversity, is crucial for mental health. Compared with traditional resilience assessments through self-reported questionnaires, resilience assessments based on neurological data offer more objective results with biological markers, hence significantly enhancing credibility. This paper proposes a novel data-efficient model to address the scarcity of neurological data. We employ Neuro Kolmogorov-Arnold Networks as the structure of the prediction model. In the training stage, a new trait-informed multimodal representation algorithm with a smart chunk technique is proposed to learn the shared latent space with limited data. In the test stage, a new noise-informed inference algorithm is proposed to address the low signal-to-noise ratio of the neurological data. The proposed model not only shows impressive performance on both public datasets and self-constructed datasets but also provides some valuable psychological hypotheses for future research.

Via

Access Paper or Ask Questions