Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wei Jiang

Department of Control Science and Engineering, Zhejiang University, China

Analysis of Converged 3D Gaussian Splatting Solutions: Density Effects and Prediction Limit

Feb 09, 2026

Zhendong Wang, Cihan Ruan, Jingchuan Xiao, Chuqing Shi, Wei Jiang, Wei Wang, Wenjie Liu, Nam Ling

Abstract:We investigate what structure emerges in 3D Gaussian Splatting (3DGS) solutions from standard multi-view optimization. We term these Rendering-Optimal References (RORs) and analyze their statistical properties, revealing stable patterns: mixture-structured scales and bimodal radiance across diverse scenes. To understand what determines these parameters, we apply learnability probes by training predictors to reconstruct RORs from point clouds without rendering supervision. Our analysis uncovers fundamental density-stratification. Dense regions exhibit geometry-correlated parameters amenable to render-free prediction, while sparse regions show systematic failure across architectures. We formalize this through variance decomposition, demonstrating that visibility heterogeneity creates covariance-dominated coupling between geometric and appearance parameters in sparse regions. This reveals the dual character of RORs: geometric primitives where point clouds suffice, and view synthesis primitives where multi-view constraints are essential. We provide density-aware strategies that improve training robustness and discuss architectural implications for systems that adaptively balance feed-forward prediction and rendering-based refinement.

Via

Access Paper or Ask Questions

MOVA: Towards Scalable and Synchronized Video-Audio Generation

Feb 09, 2026

SII-OpenMOSS Team, :, Donghua Yu, Mingshu Chen, Qi Chen, Qi Luo, Qianyi Wu, Qinyuan Cheng, Ruixiao Li, Tianyi Liang(+31 more)

Abstract:Audio is indispensable for real-world video, yet generation models have largely overlooked audio components. Current approaches to producing audio-visual content often rely on cascaded pipelines, which increase cost, accumulate errors, and degrade overall quality. While systems such as Veo 3 and Sora 2 emphasize the value of simultaneous generation, joint multimodal modeling introduces unique challenges in architecture, data, and training. Moreover, the closed-source nature of existing systems limits progress in the field. In this work, we introduce MOVA (MOSS Video and Audio), an open-source model capable of generating high-quality, synchronized audio-visual content, including realistic lip-synced speech, environment-aware sound effects, and content-aligned music. MOVA employs a Mixture-of-Experts (MoE) architecture, with a total of 32B parameters, of which 18B are active during inference. It supports IT2VA (Image-Text to Video-Audio) generation task. By releasing the model weights and code, we aim to advance research and foster a vibrant community of creators. The released codebase features comprehensive support for efficient inference, LoRA fine-tuning, and prompt enhancement.

* Technical report for MOVA (open-source video-audio generation model). 38 pages, 10 figures, 22 tables. Project page: https://mosi.cn/models/mova Code: https://github.com/OpenMOSS/MOVA Models: https://huggingface.co/collections/OpenMOSS-Team/mova. Qinyuan Cheng and Tianyi Liang are project leader. Xie Chen and Xipeng Qiu are corresponding authors

Via

Access Paper or Ask Questions

A Tutorial on 3GPP Rel-19 Channel Modeling for 6G FR3 (7-24 GHz): From Standard Specification to Simulation Implementation

Feb 07, 2026

Pan Tang, Huixin Xu, Jianhua Zhang, Ximan Liu, Enrui Liu, Haiyang Miao, Xiaodong Sun, Wei Jiang, Guangyi Liu

Abstract:The upper-mid band (7-24 GHz), designated as Frequency Range 3 (FR3), has emerged as a definitive ``golden band" for 6G networks, strategically balancing the wide coverage of sub-6 GHz with the high capacity of mmWave. To compensate for the severe path loss inherent to this band, the deployment of Extremely Large Aperture Arrays (ELAA) is indispensable. However, the legacy 3GPP TR 38.901 channel model faces critical validity challenges when applied to 6G FR3, stemming from both the distinct propagation characteristics of this frequency band and the fundamental physical paradigm shift induced by ELAA. In response, 3GPP Release 19 (Rel-19) has validated the model through extensive new measurements and introduced significant enhancements. This tutorial provides a comprehensive guide to the Rel-19 channel model for 6G FR3, bridging the gap between standardization specifications and practical simulation implementation. First, we provide a high-level overview of the fundamental principles of the 3GPP channel modeling framework. Second, we detail the specific enhancements and modifications introduced in Rel-19, including the rationale behind the new Suburban Macro (SMa) scenario, the mathematical modeling of ELAA-driven features such as near-field and spatial non-stationarity, and the recalibration of large-scale parameters. Overall, this tutorial serves as an essential guide for researchers and engineers to master the latest 3GPP channel modeling methodology, laying a solid foundation for the accurate design and performance evaluation of future 6G FR3 networks.

Via

Access Paper or Ask Questions

Efficient Post-Training Pruning of Large Language Models with Statistical Correction

Feb 07, 2026

Peiqi Yu, Jinhao Wang, Xinyi Sui, Nam Ling, Wei Wang, Wei Jiang

Abstract:Post-training pruning is an effective approach for reducing the size and inference cost of large language models (LLMs), but existing methods often face a trade-off between pruning quality and computational efficiency. Heuristic pruning methods are efficient but sensitive to activation outliers, while reconstruction-based approaches improve fidelity at the cost of heavy computation. In this work, we propose a lightweight post-training pruning framework based on first-order statistical properties of model weights and activations. During pruning, channel-wise statistics are used to calibrate magnitude-based importance scores, reducing bias from activation-dominated channels. After pruning, we apply an analytic energy compensation to correct distributional distortions caused by weight removal. Both steps operate without retraining, gradients, or second-order information. Experiments across multiple LLM families, sparsity patterns, and evaluation tasks show that the proposed approach improves pruning performance while maintaining computational cost comparable to heuristic methods. The results suggest that simple statistical corrections can be effective for post-training pruning of LLMs.

* 11 pages, 2 figures, 5 tables

Via

Access Paper or Ask Questions

SCONE: A Practical, Constraint-Aware Plug-in for Latent Encoding in Learned DNA Storage

Feb 05, 2026

Cihan Ruan, Lebin Zhou, Rongduo Han, Linyi Han, Bingqing Zhao, Chenchen Zhu, Wei Jiang, Wei Wang, Nam Ling

Abstract:DNA storage has matured from concept to practical stage, yet its integration with neural compression pipelines remains inefficient. Early DNA encoders applied redundancy-heavy constraint layers atop raw binary data - workable but primitive. Recent neural codecs compress data into learned latent representations with rich statistical structure, yet still convert these latents to DNA via naive binary-to-quaternary transcoding, discarding the entropy model's optimization. This mismatch undermines compression efficiency and complicates the encoding stack. A plug-in module that collapses latent compression and DNA encoding into a single step. SCONE performs quaternary arithmetic coding directly on the latent space in DNA bases. Its Constraint-Aware Adaptive Coding module dynamically steers the entropy encoder's learned probability distribution to enforce biochemical constraints - Guanine-Cytosine (GC) balance and homopolymer suppression - deterministically during encoding, eliminating post-hoc correction. The design preserves full reversibility and exploits the hyperprior model's learned priors without modification. Experiments show SCONE achieves near-perfect constraint satisfaction with negligible computational overhead (<2% latency), establishing a latent-agnostic interface for end-to-end DNA-compatible learned codecs.

Via

Access Paper or Ask Questions

SVRepair: Structured Visual Reasoning for Automated Program Repair

Feb 05, 2026

Xiaoxuan Tang, Jincheng Wang, Liwei Luo, Jingxuan Xu, Sheng Zhou, Dajun Chen, Wei Jiang, Yong Li

Abstract:Large language models (LLMs) have recently shown strong potential for Automated Program Repair (APR), yet most existing approaches remain unimodal and fail to leverage the rich diagnostic signals contained in visual artifacts such as screenshots and control-flow graphs. In practice, many bug reports convey critical information visually (e.g., layout breakage or missing widgets), but directly using such dense visual inputs often causes context loss and noise, making it difficult for MLLMs to ground visual observations into precise fault localization and executable patches. To bridge this semantic gap, we propose \textbf{SVRepair}, a multimodal APR framework with structured visual representation. SVRepair first fine-tunes a vision-language model, \textbf{Structured Visual Representation (SVR)}, to uniformly transform heterogeneous visual artifacts into a \emph{semantic scene graph} that captures GUI elements and their structural relations (e.g., hierarchy), providing normalized, code-relevant context for downstream repair. Building on the graph, SVRepair drives a coding agent to localize faults and synthesize patches, and further introduces an iterative visual-artifact segmentation strategy that progressively narrows the input to bug-centered regions to suppress irrelevant context and reduce hallucinations. Extensive experiments across multiple benchmarks demonstrate state-of-the-art performance: SVRepair achieves \textbf{36.47\%} accuracy on SWE-Bench M, \textbf{38.02\%} on MMCode, and \textbf{95.12\%} on CodeVision, validating the effectiveness of SVRepair for multimodal program repair.

* 16 pages, 3 figures

Via

Access Paper or Ask Questions

EGSS: Entropy-guided Stepwise Scaling for Reliable Software Engineering

Feb 05, 2026

Chenhui Mao, Yuanting Lei, Zhixiang Wei, Ming Liang, Zhixiang Wang, Jingxuan Xu, Dajun Chen, Wei Jiang, Yong Li

Abstract:Agentic Test-Time Scaling (TTS) has delivered state-of-the-art (SOTA) performance on complex software engineering tasks such as code generation and bug fixing. However, its practical adoption remains limited due to significant computational overhead, primarily driven by two key challenges: (1) the high cost associated with deploying excessively large ensembles, and (2) the lack of a reliable mechanism for selecting the optimal candidate solution, ultimately constraining the performance gains that can be realized. To address these challenges, we propose Entropy-Guided Stepwise Scaling (EGSS), a novel TTS framework that dynamically balances efficiency and effectiveness through entropy-guided adaptive search and robust test-suite augmentation. Extensive experiments on SWE-Bench-Verified demonstrate that EGSS consistently boosts performance by 5-10% across all evaluated models. Specifically, it increases the resolved ratio of Kimi-K2-Intruct from 63.2% to 72.2%, and GLM-4.6 from 65.8% to 74.6%. Furthermore, when paired with GLM-4.6, EGSS achieves a new state-of-the-art among open-source large language models. In addition to these accuracy improvements, EGSS reduces inference-time token usage by over 28% compared to existing TTS methods, achieving simultaneous gains in both effectiveness and computational efficiency.

Via

Access Paper or Ask Questions

Learning Adaptive Parallel Execution for Efficient Code Localization

Jan 27, 2026

Ke Xu, Siyang Xiao, Ming Liang, Yichen Yu, Zhixiang Wang, Jingxuan Xu, Dajun Chen, Wei Jiang, Yong Li

Abstract:Code localization constitutes a key bottleneck in automated software development pipelines. While concurrent tool execution can enhance discovery speed, current agents demonstrate a 34.9\% redundant invocation rate, which negates parallelism benefits. We propose \textbf{FuseSearch}, reformulating parallel code localization as a \textbf{joint quality-efficiency optimization} task. Through defining \textbf{tool efficiency} -- the ratio of unique information gain to invocation count -- we utilize a two-phase SFT and RL training approach for learning adaptive parallel strategies. Different from fixed-breadth approaches, FuseSearch dynamically modulates search breadth according to task context, evolving from exploration phases to refinement stages. Evaluated on SWE-bench Verified, FuseSearch-4B achieves SOTA-level performance (84.7\% file-level and 56.4\% function-level $F_1$ scores) with 93.6\% speedup, utilizing 67.7\% fewer turns and 68.9\% fewer tokens. Results indicate that efficiency-aware training naturally improves quality through eliminating noisy redundant signals, enabling high-performance cost-effective localization agents.

* 13 pages, 4 figures

Via

Access Paper or Ask Questions

LoD-Structured 3D Gaussian Splatting for Streaming Video Reconstruction

Jan 26, 2026

Xinhui Liu, Can Wang, Lei Liu, Zhenghao Chen, Wei Jiang, Wei Wang, Dong Xu

Abstract:Free-Viewpoint Video (FVV) reconstruction enables photorealistic and interactive 3D scene visualization; however, real-time streaming is often bottlenecked by sparse-view inputs, prohibitive training costs, and bandwidth constraints. While recent 3D Gaussian Splatting (3DGS) has advanced FVV due to its superior rendering speed, Streaming Free-Viewpoint Video (SFVV) introduces additional demands for rapid optimization, high-fidelity reconstruction under sparse constraints, and minimal storage footprints. To bridge this gap, we propose StreamLoD-GS, an LoD-based Gaussian Splatting framework designed specifically for SFVV. Our approach integrates three core innovations: 1) an Anchor- and Octree-based LoD-structured 3DGS with a hierarchical Gaussian dropout technique to ensure efficient and stable optimization while maintaining high-quality rendering; 2) a GMM-based motion partitioning mechanism that separates dynamic and static content, refining dynamic regions while preserving background stability; and 3) a quantized residual refinement framework that significantly reduces storage requirements without compromising visual fidelity. Extensive experiments demonstrate that StreamLoD-GS achieves competitive or state-of-the-art performance in terms of quality, efficiency, and storage.

Via

Access Paper or Ask Questions

A Heterogeneous Massive MIMO Technique for Uniform Service in Cellular Networks

Jan 26, 2026

Wei Jiang, Hans D. Schotten

Abstract:Traditional cellular networks struggle with poor quality of service (QoS) for cell-edge users, while cell-free (CF) systems offer uniform QoS but incur high roll-out costs due to acquiring numerous access point (AP) sites and deploying a large-scale optical fiber network to connect them. This paper proposes a cost-effective heterogeneous massive MIMO architecture that integrates centralized co-located antennas at a cell-center base station with distributed edge APs. By strategically splitting massive antennas between centralized and distributed nodes, the system maintains high user fairness comparable to CF systems but reduces infrastructure costs substantially, by minimizing the required number of AP sites and fronthaul connections. Numerical results demonstrate its superiority in balancing performance and costs compared to cellular and CF systems.

Via

Access Paper or Ask Questions