Zhibo Ren

Ontology Memory-Augmented ASR Correction for Long Text-Speech Interleaved Conversations

Jun 11, 2026

Xinxin Li, Huiyao Chen, Meishan Zhang, Yunxin Li, Zulong Chen, Zhibo Ren, Xiaoqing Dong, Baotian Hu, Min Zhang

Abstract: Automatic speech recognition (ASR) correction has traditionally focused on isolated utterances or short local contexts. However, as text and speech become increasingly interleaved in long interactions, ASR correction requires conversation-level contextual evidence. Existing ASR correction methods often rely only on the current hypothesis or concatenate raw dialogue history. In such long contexts, the sparse evidence needed for correction can be difficult to locate amid redundancy and noise. To address these challenges, we propose an ontology memory-augmented ASR correction framework for long text-speech interleaved conversations. The framework organizes the preceding interaction history into a dynamically updatable ontology memory, in which entities, terminology, surface variants, potential ASR confusions, and semantic relations are stored as retrievable nodes for context-grounded correction. To evaluate this setting, we construct RAMC-Corr, a dataset derived from MAGIC-RAMC for long-range ASR correction with grounded context. Experiments on RAMC-Corr show that our method improves over direct correction in 9 out of 10 paired backbone-setting combinations and encourages more selective, evidence-grounded corrections for context-dependent ASR errors.
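To make the idea of an ontology memory concrete, here is a minimal sketch of one possible data structure. It is not the authors' implementation: the class names (`OntologyNode`, `OntologyMemory`), the fields, and the naive substring-based retrieval are all hypothetical illustrations of "entities, terminology, surface variants, potential ASR confusions, and semantic relations stored as retrievable nodes" that can be updated as the conversation proceeds.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyNode:
    """Hypothetical memory entry: a canonical term plus surface forms that point to it."""
    canonical: str
    node_type: str  # illustrative labels such as "entity" or "terminology"
    variants: set[str] = field(default_factory=set)    # alternative spellings seen in text turns
    confusions: set[str] = field(default_factory=set)  # likely ASR mis-hearings of the term
    relations: dict[str, str] = field(default_factory=dict)  # relation name -> another canonical


class OntologyMemory:
    """Hypothetical dynamically updatable store keyed by canonical form."""

    def __init__(self) -> None:
        self.nodes: dict[str, OntologyNode] = {}

    def upsert(self, canonical, node_type, variants=(), confusions=(), relations=None):
        """Insert a new node or merge newly observed surface forms into an existing one."""
        node = self.nodes.setdefault(canonical, OntologyNode(canonical, node_type))
        node.variants.update(variants)
        node.confusions.update(confusions)
        node.relations.update(relations or {})
        return node

    def retrieve(self, hypothesis: str) -> list[OntologyNode]:
        """Return nodes whose canonical form, variants, or known confusions occur in the hypothesis.

        Naive case-insensitive substring matching, used here only for illustration.
        """
        text = hypothesis.lower()
        hits = []
        for node in self.nodes.values():
            forms = {node.canonical} | node.variants | node.confusions
            if any(form.lower() in text for form in forms):
                hits.append(node)
        return hits


# Illustrative usage: a term typed earlier in a text turn, later mis-recognized in speech.
memory = OntologyMemory()
memory.upsert("Kubernetes", "terminology", variants={"k8s"}, confusions={"cooper netties"})
hits = memory.retrieve("we deployed it on cooper netties yesterday")
print([node.canonical for node in hits])
```

Retrieved nodes would then be passed to the correction model as compact evidence instead of the full raw history; how the paper actually builds, updates, and queries the memory is not specified in the abstract.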

SEAL: Structure and Element Aware Learning to Improve Long Structured Document Retrieval

Aug 28, 2025

Xinhao Huang, Zhibo Ren, Yipeng Yu, Ying Zhou, Zulong Chen, Zeyi Wen

Abstract: In long structured document retrieval, existing methods typically fine-tune pre-trained language models (PLMs) using contrastive learning on datasets lacking explicit structural information. This practice suffers from two critical issues: 1) current methods fail to leverage structural features and element-level semantics effectively, and 2) existing datasets lack structural metadata. To bridge these gaps, we propose SEAL, a novel contrastive learning framework. It leverages structure-aware learning to preserve semantic hierarchies and masked element alignment for fine-grained semantic discrimination. Furthermore, we release a long structured document retrieval dataset with rich structural annotations. Extensive experiments on both the released dataset and industrial datasets across various modern PLMs, along with online A/B testing, demonstrate consistent performance improvements, boosting NDCG@10 from 73.96% to 77.84% on BGE-M3. The resources are available at https://github.com/xinhaoH/SEAL.
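The headline result is reported as NDCG@10. As a reminder of what that metric measures, the sketch below computes it for a single query using the common linear-gain form, DCG@k = Σ rel_i / log2(i + 1) over the top k ranks, normalized by the DCG of the ideal ordering. The function names are hypothetical, and the abstract does not say which gain variant or evaluation tool the authors used (some toolkits use exponential gain 2^rel − 1 instead).

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results (linear gain, log2 rank discount)."""
    # rank is 0-based here, so the discount log2(rank + 2) equals log2(i + 1) for 1-based i
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_doc_ids, qrels, k=10):
    """NDCG@k for one query.

    ranked_doc_ids: retrieved document ids, best first.
    qrels: dict mapping doc id -> graded relevance; unjudged ids count as 0.
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = sorted(qrels.values(), reverse=True)  # best possible ordering of judged documents
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


# One relevant document retrieved at rank 2 instead of rank 1.
print(ndcg_at_k(["d2", "d1", "d3"], {"d1": 1}))
```

A corpus-level NDCG@10 figure such as those quoted above is typically the mean of this per-query value over all evaluation queries.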

* Accepted at EMNLP 2025 Main Conference

Improving Tropical Cyclone Forecasting With Video Diffusion Models

Jan 27, 2025

Zhibo Ren, Pritthijit Nath, Pancham Shukla

Abstract: Tropical cyclone (TC) forecasting is crucial for disaster preparedness and mitigation. While recent deep learning approaches have shown promise, existing methods often treat TC evolution as a series of independent frame-to-frame predictions, limiting their ability to capture long-term dynamics. We present a novel application of video diffusion models to TC forecasting that explicitly models temporal dependencies through additional temporal layers. Our approach enables the model to generate multiple frames simultaneously, better capturing cyclone evolution patterns. We introduce a two-stage training strategy that significantly improves individual-frame quality and performance in low-data regimes. Experimental results show that our method outperforms the previous approach of Nath et al. by 19.3% in MAE, 16.2% in PSNR, and 36.1% in SSIM. Most notably, we extend the reliable forecasting horizon from 36 to 50 hours. Through comprehensive evaluation using both traditional metrics and Fréchet Video Distance (FVD), we demonstrate that our approach produces more temporally coherent forecasts while maintaining competitive single-frame quality. Code is available at https://github.com/Ren-creater/forecast-video-diffmodels.
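The improvements above are stated as percentages over a baseline for metrics that point in different directions: lower MAE is better, while higher PSNR and SSIM are better. The minimal sketch below shows MAE and PSNR for a single frame and one plausible way to express a relative improvement. The helper names are hypothetical, pixel values are assumed to be scaled to [0, 1], and the abstract does not state the exact formula behind its percentages. SSIM is omitted because its windowed computation is longer than fits here.

```python
import math


def mae(pred, target):
    """Mean absolute error between two equally sized 2-D frames given as lists of rows."""
    diffs = [abs(p - t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB, assuming pixel values lie in [0, max_val]."""
    sq = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    mse = sum(sq) / len(sq)
    return math.inf if mse == 0 else 10 * math.log10(max_val ** 2 / mse)


def relative_improvement(baseline, new, higher_is_better):
    """Percentage change of `new` versus `baseline`, signed so that positive means better."""
    delta = (new - baseline) if higher_is_better else (baseline - new)
    return 100.0 * delta / abs(baseline)


# Hypothetical numbers: MAE drops from 10.0 to 8.0, PSNR rises from 20.0 dB to 25.0 dB.
print(relative_improvement(10.0, 8.0, higher_is_better=False))
print(relative_improvement(20.0, 25.0, higher_is_better=True))
```

Under this convention both calls report an improvement as a positive percentage, which is why a lower MAE and a higher PSNR can each be summarized as "outperforms by X%".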

* 7 pages, 7 figures
