Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Haotian Wu

Diffusion-aided Extreme Video Compression with Lightweight Semantics Guidance

Feb 05, 2026

Maojun Zhang, Haotian Wu, Richeng Jin, Deniz Gunduz, Krystian Mikolajczyk

Abstract:Modern video codecs and learning-based approaches struggle for semantic reconstruction at extremely low bit-rates due to reliance on low-level spatiotemporal redundancies. Generative models, especially diffusion models, offer a new paradigm for video compression by leveraging high-level semantic understanding and powerful visual synthesis. This paper propose a video compression framework that integrates generative priors to drastically reduce bit-rate while maintaining reconstruction fidelity. Specifically, our method compresses high-level semantic representations of the video, then uses a conditional diffusion model to reconstruct frames from these semantics. To further improve compression, we characterize motion information with global camera trajectories and foreground segmentation: background motion is compactly represented by camera pose parameters while foreground dynamics by sparse segmentation masks. This allows for significantly boosts compression efficiency, enabling descent video reconstruction at extremely low bit-rates.

* Accepted by ICASSP 2026

Via

Access Paper or Ask Questions

Tracking the Limits of Knowledge Propagation: How LLMs Fail at Multi-Step Reasoning with Conflicting Knowledge

Jan 21, 2026

Yiyang Feng, Zeming Chen, Haotian Wu, Jiawei Zhou, Antoine Bosselut

Abstract:A common solution for mitigating outdated or incorrect information in Large Language Models (LLMs) is to provide updated facts in-context or through knowledge editing. However, these methods introduce knowledge conflicts when the knowledge update fails to overwrite the model's parametric knowledge, which propagate to faulty reasoning. Current benchmarks for this problem, however, largely focus only on single knowledge updates and fact recall without evaluating how these updates affect downstream reasoning. In this work, we introduce TRACK (Testing Reasoning Amid Conflicting Knowledge), a new benchmark for studying how LLMs propagate new knowledge through multi-step reasoning when it conflicts with the model's initial parametric knowledge. Spanning three reasoning-intensive scenarios (WIKI, CODE, and MATH), TRACK introduces multiple, realistic conflicts to mirror real-world complexity. Our results on TRACK reveal that providing updated facts to models for reasoning can worsen performance compared to providing no updated facts to a model, and that this performance degradation exacerbates as more updated facts are provided. We show this failure stems from both inability to faithfully integrate updated facts, but also flawed reasoning even when knowledge is integrated. TRACK provides a rigorous new benchmark to measure and guide future progress on propagating conflicting knowledge in multi-step reasoning.

* Accepted to EACL 2026 (Main)

Via

Access Paper or Ask Questions

Leveraging Overfitting for Low-Complexity and Modality-Agnostic Joint Source-Channel Coding

Dec 24, 2025

Haotian Wu, Gen Li, Pier Luigi Dragotti, Deniz Gündüz

Abstract:This paper introduces Implicit-JSCC, a novel overfitted joint source-channel coding paradigm that directly optimizes channel symbols and a lightweight neural decoder for each source. This instance-specific strategy eliminates the need for training datasets or pre-trained models, enabling a storage-free, modality-agnostic solution. As a low-complexity alternative, Implicit-JSCC achieves efficient image transmission with around 1000x lower decoding complexity, using as few as 607 model parameters and 641 multiplications per pixel. This overfitted design inherently addresses source generalizability and achieves state-of-the-art results in the high SNR regimes, underscoring its promise for future communication systems, especially streaming scenarios where one-time offline encoding supports multiple online decoding.

Via

Access Paper or Ask Questions

Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views

Dec 15, 2025

Tingyang Chen, Cong Fu, Jiahua Wu, Haotian Wu, Hua Fan, Xiangyu Ke, Yunjun Gao, Yabo Ni, Anxiang Zeng

Figure 1 for Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views

Figure 2 for Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views

Figure 3 for Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views

Figure 4 for Reveal Hidden Pitfalls and Navigate Next Generation of Vector Similarity Search from Task-Centric Views

Abstract:Vector Similarity Search (VSS) in high-dimensional spaces is rapidly emerging as core functionality in next-generation database systems for numerous data-intensive services -- from embedding lookups in large language models (LLMs), to semantic information retrieval and recommendation engines. Current benchmarks, however, evaluate VSS primarily on the recall-latency trade-off against a ground truth defined solely by distance metrics, neglecting how retrieval quality ultimately impacts downstream tasks. This disconnect can mislead both academic research and industrial practice. We present Iceberg, a holistic benchmark suite for end-to-end evaluation of VSS methods in realistic application contexts. From a task-centric view, Iceberg uncovers the Information Loss Funnel, which identifies three principal sources of end-to-end performance degradation: (1) Embedding Loss during feature extraction; (2) Metric Misuse, where distances poorly reflect task relevance; (3) Data Distribution Sensitivity, highlighting index robustness across skews and modalities. For a more comprehensive assessment, Iceberg spans eight diverse datasets across key domains such as image classification, face recognition, text retrieval, and recommendation systems. Each dataset, ranging from 1M to 100M vectors, includes rich, task-specific labels and evaluation metrics, enabling assessment of retrieval algorithms within the full application pipeline rather than in isolation. Iceberg benchmarks 13 state-of-the-art VSS methods and re-ranks them based on application-level metrics, revealing substantial deviations from traditional rankings derived purely from recall-latency evaluations. Building on these insights, we define a set of task-centric meta-features and derive an interpretable decision tree to guide practitioners in selecting and tuning VSS methods for their specific workloads.

* SIGMOD2026

Via

Access Paper or Ask Questions

EffiReason-Bench: A Unified Benchmark for Evaluating and Advancing Efficient Reasoning in Large Language Models

Nov 13, 2025

Junquan Huang, Haotian Wu, Yubo Gao, Yibo Yan, Junyan Zhang, Yonghua Hei, Song Dai, Jie Zhang, Puay Siew Tan, Xuming Hu

Figure 1 for EffiReason-Bench: A Unified Benchmark for Evaluating and Advancing Efficient Reasoning in Large Language Models

Figure 2 for EffiReason-Bench: A Unified Benchmark for Evaluating and Advancing Efficient Reasoning in Large Language Models

Figure 3 for EffiReason-Bench: A Unified Benchmark for Evaluating and Advancing Efficient Reasoning in Large Language Models

Figure 4 for EffiReason-Bench: A Unified Benchmark for Evaluating and Advancing Efficient Reasoning in Large Language Models

Abstract:Large language models (LLMs) with Chain-of-Thought (CoT) prompting achieve strong reasoning but often produce unnecessarily long explanations, increasing cost and sometimes reducing accuracy. Fair comparison of efficiency-oriented approaches is hindered by fragmented evaluation practices. We introduce EffiReason-Bench, a unified benchmark for rigorous cross-paradigm evaluation of efficient reasoning methods across three categories: Reasoning Blueprints, Dynamic Execution, and Post-hoc Refinement. To enable step-by-step evaluation, we construct verified CoT annotations for CommonsenseQA and LogiQA via a pipeline that enforces standardized reasoning structures, comprehensive option-wise analysis, and human verification. We evaluate 7 methods across 6 open-source LLMs (1B-70B) on 4 datasets spanning mathematics, commonsense, and logic, and propose the E3-Score, a principled metric inspired by economic trade-off modeling that provides smooth, stable evaluation without discontinuities or heavy reliance on heuristics. Experiments show that no single method universally dominates; optimal strategies depend on backbone scale, task complexity, and architecture.

* 11 pages, 4 figures, 4 tables. Appendix included

Via

Access Paper or Ask Questions

LotteryCodec: Searching the Implicit Representation in a Random Network for Low-Complexity Image Compression

Jul 01, 2025

Haotian Wu, Gongpu Chen, Pier Luigi Dragotti, Deniz Gündüz

Abstract:We introduce and validate the lottery codec hypothesis, which states that untrained subnetworks within randomly initialized networks can serve as synthesis networks for overfitted image compression, achieving rate-distortion (RD) performance comparable to trained networks. This hypothesis leads to a new paradigm for image compression by encoding image statistics into the network substructure. Building on this hypothesis, we propose LotteryCodec, which overfits a binary mask to an individual image, leveraging an over-parameterized and randomly initialized network shared by the encoder and the decoder. To address over-parameterization challenges and streamline subnetwork search, we develop a rewind modulation mechanism that improves the RD performance. LotteryCodec outperforms VTM and sets a new state-of-the-art in single-image compression. LotteryCodec also enables adaptive decoding complexity through adjustable mask ratios, offering flexible compression solutions for diverse device constraints and application requirements.

* International Conference on Machine Learning (2025)

Via

Access Paper or Ask Questions

Realizing Fully-Connected Layers Over the Air via Reconfigurable Intelligent Surfaces

May 02, 2025

Meng Hua, Chenghong Bian, Haotian Wu, Deniz Gündüz

Abstract:By leveraging the waveform superposition property of the multiple access channel, over-the-air computation (AirComp) enables the execution of digital computations through analog means in the wireless domain, leading to faster processing and reduced latency. In this paper, we propose a novel approach to implement a neural network (NN) consisting of digital fully connected (FC) layers using physically reconfigurable hardware. Specifically, we investigate reconfigurable intelligent surfaces (RISs)-assisted multiple-input multiple-output (MIMO) systems to emulate the functionality of a NN for over-the-air inference. In this setup, both the RIS and the transceiver are jointly configured to manipulate the ambient wireless propagation environment, effectively reproducing the adjustable weights of a digital FC layer. We refer to this new computational paradigm as \textit{AirFC}. We formulate an imitation error minimization problem between the effective channel created by RIS and a target FC layer by jointly optimizing over-the-air parameters. To solve this non-convex optimization problem, an extremely low-complexity alternating optimization algorithm is proposed, where semi-closed-form/closed-form solutions for all optimization variables are derived. Simulation results show that the RIS-assisted MIMO-based AirFC can achieve competitive classification accuracy. Furthermore, it is also shown that a multi-RIS configuration significantly outperforms a single-RIS setup, particularly in line-of-sight (LoS)-dominated channels.

Via

Access Paper or Ask Questions

Task-Agnostic Semantic Communication with Multimodal Foundation Models

Feb 25, 2025

Jiangjing Hu, Haotian Wu, Wenjing Zhang, Fengyu Wang, Wenjun Xu, Hui Gao, Deniz Gündüz

Abstract:Most existing semantic communication (SemCom) systems use deep joint source-channel coding (DeepJSCC) to encode task-specific semantics in a goal-oriented manner. However, their reliance on predefined tasks and datasets significantly limits their flexibility and generalizability in practical deployments. Multi-modal foundation models provide a promising solution by generating universal semantic tokens. Inspired by this, we introduce SemCLIP, a task-agnostic SemCom framework leveraging the contrastive language-image pre-training (CLIP) model. By transmitting CLIP-generated image tokens instead of raw images, SemCLIP enables efficient semantic communications under low bandwidth and challenging channel conditions, facilitating diverse downstream tasks and zero-shot applications. Specifically, we propose a DeepJSCC scheme for efficient CLIP tokens encoding. To mitigate potential degradation caused by compression and channel noise, a multi-modal transmission-aware prompt learning mechanism is designed at the receiver, which adapts prompts based on transmission quality, enhancing system robustness and channel adaptability. Simulation results demonstrate that SemCLIP outperforms the baselines, achieving a $41\%$ improvement in zero-shot accuracy at a low signal-to-noise ratio. Meanwhile, SemCLIP reduces bandwidth usage by more than $50$-fold compared to different image transmission methods, demonstrating the potential of foundation models towards a generalized, task-agnostic SemCom solution.

Via

Access Paper or Ask Questions

Multiview Point Cloud Registration Based on Minimum Potential Energy for Free-Form Blade Measurement

Feb 11, 2025

Zijie Wu, Yaonan Wang, Yang Mo, Qing Zhu, He Xie, Haotian Wu, Mingtao Feng, Ajmal Mian

Abstract:Point cloud registration is an essential step for free-form blade reconstruction in industrial measurement. Nonetheless, measuring defects of the 3D acquisition system unavoidably result in noisy and incomplete point cloud data, which renders efficient and accurate registration challenging. In this paper, we propose a novel global registration method that is based on the minimum potential energy (MPE) method to address these problems. The basic strategy is that the objective function is defined as the minimum potential energy optimization function of the physical registration system. The function distributes more weight to the majority of inlier points and less weight to the noise and outliers, which essentially reduces the influence of perturbations in the mathematical formulation. We decompose the solution into a globally optimal approximation procedure and a fine registration process with the trimmed iterative closest point algorithm to boost convergence. The approximation procedure consists of two main steps. First, according to the construction of the force traction operator, we can simply compute the position of the potential energy minimum. Second, to find the MPE point, we propose a new theory that employs two flags to observe the status of the registration procedure. We demonstrate the performance of the proposed algorithm on four types of blades. The proposed method outperforms the other global methods in terms of both accuracy and noise resistance.

Via

Access Paper or Ask Questions

Semantics-Guided Diffusion for Deep Joint Source-Channel Coding in Wireless Image Transmission

Jan 02, 2025

Maojun Zhang, Haotian Wu, Guangxu Zhu, Richeng Jin, Xiaoming Chen, Deniz Gündüz

Abstract:Joint source-channel coding (JSCC) offers a promising avenue for enhancing transmission efficiency by jointly incorporating source and channel statistics into the system design. A key advancement in this area is the deep joint source and channel coding (DeepJSCC) technique that designs a direct mapping of input signals to channel symbols parameterized by a neural network, which can be trained for arbitrary channel models and semantic quality metrics. This paper advances the DeepJSCC framework toward a semantics-aligned, high-fidelity transmission approach, called semantics-guided diffusion DeepJSCC (SGD-JSCC). Existing schemes that integrate diffusion models (DMs) with JSCC face challenges in transforming random generation into accurate reconstruction and adapting to varying channel conditions. SGD-JSCC incorporates two key innovations: (1) utilizing some inherent information that contributes to the semantics of an image, such as text description or edge map, to guide the diffusion denoising process; and (2) enabling seamless adaptability to varying channel conditions with the help of a semantics-guided DM for channel denoising. The DM is guided by diverse semantic information and integrates seamlessly with DeepJSCC. In a slow fading channel, SGD-JSCC dynamically adapts to the instantaneous signal-to-noise ratio (SNR) directly estimated from the channel output, thereby eliminating the need for additional pilot transmissions for channel estimation. In a fast fading channel, we introduce a training-free denoising strategy, allowing SGD-JSCC to effectively adjust to fluctuations in channel gains. Numerical results demonstrate that, guided by semantic information and leveraging the powerful DM, our method outperforms existing DeepJSCC schemes, delivering satisfactory reconstruction performance even at extremely poor channel conditions.

* 13 pages, submitted to IEEE for possible publication

Via

Access Paper or Ask Questions