Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Topic:Information Extraction

What is Information Extraction? Information extraction is the process of automatically extracting structured information from unstructured text data.

A Unified Anti-Jamming Design in Complex Environments Based on Cross-Modal Fusion and Intelligent Decision-Making

Jun 09, 2025

Huake Wang, Xudong Han, Bairui Cai, Guisheng Liao, Yinghui Quan

Abstract:With the rapid development of radar jamming systems, especially digital radio frequency memory (DRFM), the electromagnetic environment has become increasingly complex. In recent years, most existing studies have focused solely on either jamming recognition or anti-jamming strategy design. In this paper, we propose a unified framework that integrates interference recognition with intelligent anti-jamming strategy selection. Specifically, time-frequency (TF) features of radar echoes are first extracted using both Short-Time Fourier Transform (STFT) and Smoothed Pseudo Wigner-Ville Distribution (SPWVD). A feature fusion method is then designed to effectively combine these two types of time-frequency representations. The fused TF features are further combined with time-domain features of the radar echoes through a cross-modal fusion module based on an attention mechanism. Finally, the recognition results, together with information obtained from the passive radar, are fed into a Deep Q-Network (DQN)-based intelligent anti-jamming strategy network to select jamming suppression waveforms. The key jamming parameters obtained by the passive radar provide essential information for intelligent decision-making, enabling the generation of more effective strategies tailored to specific jamming types. The proposed method demonstrates improvements in both jamming type recognition accuracy and the stability of anti-jamming strategy selection under complex environments. Experimental results show that our method achieves superior performance compared to Support Vector Machines (SVM), VGG-16, and 2D-CNN methods, with respective improvements of 1.41%, 2.5%, and 14.51% in overall accuracy. Moreover, in comparison with the SARSA algorithm, the designed algorithm achieves faster reward convergence and more stable strategy generation.

Via

Access Paper or Ask Questions

Defurnishing with X-Ray Vision: Joint Removal of Furniture from Panoramas and Mesh

Jun 06, 2025

Alan Dolhasz, Chen Ma, Dave Gausebeck, Kevin Chen, Gregor Miller, Lucas Hayne, Gunnar Hovden, Azwad Sabik, Olaf Brandt, Mira Slavcheva

Abstract:We present a pipeline for generating defurnished replicas of indoor spaces represented as textured meshes and corresponding multi-view panoramic images. To achieve this, we first segment and remove furniture from the mesh representation, extend planes, and fill holes, obtaining a simplified defurnished mesh (SDM). This SDM acts as an ``X-ray'' of the scene's underlying structure, guiding the defurnishing process. We extract Canny edges from depth and normal images rendered from the SDM. We then use these as a guide to remove the furniture from panorama images via ControlNet inpainting. This control signal ensures the availability of global geometric information that may be hidden from a particular panoramic view by the furniture being removed. The inpainted panoramas are used to texture the mesh. We show that our approach produces higher quality assets than methods that rely on neural radiance fields, which tend to produce blurry low-resolution images, or RGB-D inpainting, which is highly susceptible to hallucinations.

* Paper website: https://matterport.github.io/defurnishing-with-x-ray-vision/

Via

Access Paper or Ask Questions

MAGNet: A Multi-Scale Attention-Guided Graph Fusion Network for DRC Violation Detection

Jun 08, 2025

Weihan Lu, Hong Cai Chen

Abstract:Design rule checking (DRC) is of great significance for cost reduction and design efficiency improvement in integrated circuit (IC) designs. Machine-learning-based DRC has become an important approach in computer-aided design (CAD). In this paper, we propose MAGNet, a hybrid deep learning model that integrates an improved U-Net with a graph neural network for DRC violation prediction. The U-Net backbone is enhanced with a Dynamic Attention Module (DAM) and a Multi-Scale Convolution Module (MSCM) to strengthen its capability in extracting fine-grained and multi-scale spatial features. In parallel, we construct a pixel-aligned graph structure based on chip layout tiles, and apply a specialized GNN to model the topological relationships among pins. During graph construction, a graph-to-grid mapping is generated to align GNN features with the layout image. In addition, a label amplification strategy is adopted during training to enhance the model's sensitivity to sparse violation patterns. Overall, MAGNet effectively combines spatial, semantic, and structural information, achieving improved prediction accuracy and reduced false positive rates in DRC hotspot detection. Subsequently, through incremental training, we achieve a more sensitive discrimination ability for hotspots. The results demonstrate that, in comparison with ibUnet, RouteNet, and J-Net, MAGnet significantly outperforms these models, achieving substantial improvements in overall performance.

* 9 pages, 12 figures, 2 tables

Via

Access Paper or Ask Questions

Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness

Jun 06, 2025

Rongzhe Wei, Peizhi Niu, Hans Hao-Hsun Hsu, Ruihan Wu, Haoteng Yin, Mohsen Ghassemi, Yifan Li, Vamsi K. Potluru, Eli Chien, Kamalika Chaudhuri(+2 more)

Abstract:Machine unlearning techniques aim to mitigate unintended memorization in large language models (LLMs). However, existing approaches predominantly focus on the explicit removal of isolated facts, often overlooking latent inferential dependencies and the non-deterministic nature of knowledge within LLMs. Consequently, facts presumed forgotten may persist implicitly through correlated information. To address these challenges, we propose a knowledge unlearning evaluation framework that more accurately captures the implicit structure of real-world knowledge by representing relevant factual contexts as knowledge graphs with associated confidence scores. We further develop an inference-based evaluation protocol leveraging powerful LLMs as judges; these judges reason over the extracted knowledge subgraph to determine unlearning success. Our LLM judges utilize carefully designed prompts and are calibrated against human evaluations to ensure their trustworthiness and stability. Extensive experiments on our newly constructed benchmark demonstrate that our framework provides a more realistic and rigorous assessment of unlearning performance. Moreover, our findings reveal that current evaluation strategies tend to overestimate unlearning effectiveness. Our code is publicly available at https://github.com/Graph-COM/Knowledge_Unlearning.git.

Via

Access Paper or Ask Questions

Multi-Modal Large Models Based Beam Prediction: An Example Empowered by DeepSeek

Jun 06, 2025

Yizhu Zhao, Li Yu, Lianzheng Shi, Jianhua Zhang, Guangyi Liu

Abstract:Beam prediction is an effective approach to reduce training overhead in massive multiple-input multiple-output (MIMO) systems. However, existing beam prediction models still exhibit limited generalization ability in diverse scenarios, which remains a critical challenge. In this paper, we propose MLM-BP, a beam prediction framework based on the multi-modal large model released by DeepSeek, with full consideration of multi-modal environmental information. Specifically, the distribution of scatterers that impact the optimal beam is captured by the sensing devices. Then positions are tokenized to generate text-based representations, and multi-view images are processed by an image encoder, which is fine-tuned with low-rank adaptation (LoRA), to extract environmental embeddings. Finally, these embeddings are fed into the large model, and an output projection module is designed to determine the optimal beam index. Simulation results show that MLM-BP achieves 98.1% Top-1 accuracy on the simulation dataset. Additionally, it demonstrates few-shot generalization on a real-world dataset, achieving 72.7% Top-1 accuracy and 92.4% Top-3 accuracy with only 30% of the dataset, outperforming the existing small models by over 15%.

Via

Access Paper or Ask Questions

Physics Informed Capsule Enhanced Variational AutoEncoder for Underwater Image Enhancement

Jun 05, 2025

Niki Martinel, Rita Pucci

Abstract:We present a novel dual-stream architecture that achieves state-of-the-art underwater image enhancement by explicitly integrating the Jaffe-McGlamery physical model with capsule clustering-based feature representation learning. Our method simultaneously estimates transmission maps and spatially-varying background light through a dedicated physics estimator while extracting entity-level features via capsule clustering in a parallel stream. This physics-guided approach enables parameter-free enhancement that respects underwater formation constraints while preserving semantic structures and fine-grained details. Our approach also features a novel optimization objective ensuring both physical adherence and perceptual quality across multiple spatial frequencies. To validate our approach, we conducted extensive experiments across six challenging benchmarks. Results demonstrate consistent improvements of $+0.5$dB PSNR over the best existing methods while requiring only one-third of their computational complexity (FLOPs), or alternatively, more than $+1$dB PSNR improvement when compared to methods with similar computational budgets. Code and data \textit{will} be available at https://github.com/iN1k1/.

Via

Access Paper or Ask Questions

TV-LiVE: Training-Free, Text-Guided Video Editing via Layer Informed Vitality Exploitation

Jun 08, 2025

Min-Jung Kim, Dongjin Kim, Seokju Yun, Jaegul Choo

Abstract:Video editing has garnered increasing attention alongside the rapid progress of diffusion-based video generation models. As part of these advancements, there is a growing demand for more accessible and controllable forms of video editing, such as prompt-based editing. Previous studies have primarily focused on tasks such as style transfer, background replacement, object substitution, and attribute modification, while maintaining the content structure of the source video. However, more complex tasks, including the addition of novel objects and nonrigid transformations, remain relatively unexplored. In this paper, we present TV-LiVE, a Training-free and text-guided Video editing framework via Layerinformed Vitality Exploitation. We empirically identify vital layers within the video generation model that significantly influence the quality of generated outputs. Notably, these layers are closely associated with Rotary Position Embeddings (RoPE). Based on this observation, our method enables both object addition and non-rigid video editing by selectively injecting key and value features from the source model into the corresponding layers of the target model guided by the layer vitality. For object addition, we further identify prominent layers to extract the mask regions corresponding to the newly added target prompt. We found that the extracted masks from the prominent layers faithfully indicate the region to be edited. Experimental results demonstrate that TV-LiVE outperforms existing approaches for both object addition and non-rigid video editing. Project Page: https://emjay73.github.io/TV_LiVE/

Via

Access Paper or Ask Questions

Applying XAI based unsupervised knowledge discovering for Operation modes in a WWTP. A real case: AQUAVALL WWTP

Jun 06, 2025

Alicia Beneyto-Rodriguez, Gregorio I. Sainz-Palmero, Marta Galende-Hernández, María J. Fuente, José M. Cuenca

Abstract:Water reuse is a key point when fresh water is a commodity in ever greater demand, but which is also becoming ever more available. Furthermore, the return of clean water to its natural environment is also mandatory. Therefore, wastewater treatment plants (WWTPs) are essential in any policy focused on these serious challenges. WWTPs are complex facilities which need to operate at their best to achieve their goals. Nowadays, they are largely monitored, generating large databases of historical data concerning their functioning over time. All this implies a large amount of embedded information which is not usually easy for plant managers to assimilate, correlate and understand; in other words, for them to know the global operation of the plant at any given time. At this point, the intelligent and Machine Learning (ML) approaches can give support for that need, managing all the data and translating them into manageable, interpretable and explainable knowledge about how the WWTP plant is operating at a glance. Here, an eXplainable Artificial Intelligence (XAI) based methodology is proposed and tested for a real WWTP, in order to extract explainable service knowledge concerning the operation modes of the WWTP managed by AQUAVALL, which is the public service in charge of the integral water cycle in the City Council of Valladolid (Castilla y Le\'on, Spain). By applying well-known approaches of XAI and ML focused on the challenge of WWTP, it has been possible to summarize a large number of historical databases through a few explained operation modes of the plant in a low-dimensional data space, showing the variables and facility units involved in each case.

Via

Access Paper or Ask Questions

Perceptual Decoupling for Scalable Multi-modal Reasoning via Reward-Optimized Captioning

Jun 05, 2025

Yunhao Gou, Kai Chen, Zhili Liu, Lanqing Hong, Xin Jin, Zhenguo Li, James T. Kwok, Yu Zhang

Abstract:Recent advances in slow-thinking language models (e.g., OpenAI-o1 and DeepSeek-R1) have demonstrated remarkable abilities in complex reasoning tasks by emulating human-like reflective cognition. However, extending such capabilities to multi-modal large language models (MLLMs) remains challenging due to the high cost of retraining vision-language alignments when upgrading the underlying reasoner LLMs. A straightforward solution is to decouple perception from reasoning, i.e., converting visual inputs into language representations (e.g., captions) that are then passed to a powerful text-only reasoner. However, this decoupling introduces a critical challenge: the visual extractor must generate descriptions that are both faithful to the image and informative enough to support accurate downstream reasoning. To address this, we propose Reasoning-Aligned Perceptual Decoupling via Caption Reward Optimization (RACRO) - a reasoning-guided reinforcement learning strategy that aligns the extractor's captioning behavior with the reasoning objective. By closing the perception-reasoning loop via reward-based optimization, RACRO significantly enhances visual grounding and extracts reasoning-optimized representations. Experiments on multi-modal math and science benchmarks show that the proposed RACRO method achieves state-of-the-art average performance while enabling superior scalability and plug-and-play adaptation to more advanced reasoning LLMs without the necessity for costly multi-modal re-alignment.

Via

Access Paper or Ask Questions

DualX-VSR: Dual Axial Spatial$\times$Temporal Transformer for Real-World Video Super-Resolution without Motion Compensation

Jun 05, 2025

Shuo Cao, Yihao Liu, Xiaohui Li. Yuanting Gao. Yu Zhou, Chao Dong

Abstract:Transformer-based models like ViViT and TimeSformer have advanced video understanding by effectively modeling spatiotemporal dependencies. Recent video generation models, such as Sora and Vidu, further highlight the power of transformers in long-range feature extraction and holistic spatiotemporal modeling. However, directly applying these models to real-world video super-resolution (VSR) is challenging, as VSR demands pixel-level precision, which can be compromised by tokenization and sequential attention mechanisms. While recent transformer-based VSR models attempt to address these issues using smaller patches and local attention, they still face limitations such as restricted receptive fields and dependence on optical flow-based alignment, which can introduce inaccuracies in real-world settings. To overcome these issues, we propose Dual Axial Spatial$\times$Temporal Transformer for Real-World Video Super-Resolution (DualX-VSR), which introduces a novel dual axial spatial$\times$temporal attention mechanism that integrates spatial and temporal information along orthogonal directions. DualX-VSR eliminates the need for motion compensation, offering a simplified structure that provides a cohesive representation of spatiotemporal information. As a result, DualX-VSR achieves high fidelity and superior performance in real-world VSR task.

* 15 pages, 9 figures

Via

Access Paper or Ask Questions

Topic:Information Extraction

Papers and Code