Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wei Liang

EMRModel: A Large Language Model for Extracting Medical Consultation Dialogues into Structured Medical Records

Apr 23, 2025

Shuguang Zhao, Qiangzhong Feng, Zhiyang He, Peipei Sun, Yingying Wang, Xiaodong Tao, Xiaoliang Lu, Mei Cheng, Xinyue Wu, Yanyan Wang(+1 more)

Abstract:Medical consultation dialogues contain critical clinical information, yet their unstructured nature hinders effective utilization in diagnosis and treatment. Traditional methods, relying on rule-based or shallow machine learning techniques, struggle to capture deep and implicit semantics. Recently, large pre-trained language models and Low-Rank Adaptation (LoRA), a lightweight fine-tuning method, have shown promise for structured information extraction. We propose EMRModel, a novel approach that integrates LoRA-based fine-tuning with code-style prompt design, aiming to efficiently convert medical consultation dialogues into structured electronic medical records (EMRs). Additionally, we construct a high-quality, realistically grounded dataset of medical consultation dialogues with detailed annotations. Furthermore, we introduce a fine-grained evaluation benchmark for medical consultation information extraction and provide a systematic evaluation methodology, advancing the optimization of medical natural language processing (NLP) models. Experimental results show EMRModel achieves an F1 score of 88.1%, improving by49.5% over standard pre-trained models. Compared to traditional LoRA fine-tuning methods, our model shows superior performance, highlighting its effectiveness in structured medical record extraction tasks.

Via

Access Paper or Ask Questions

Improved AFSA-Based Beam Training Without CSI for RIS-Assisted ISAC Systems

Apr 10, 2025

Yunxiang Shi, Lixin Li, Wensheng Lin, Wei Liang, Zhu Han

Abstract:In this paper, we consider transmit beamforming and reflection patterns design in reconfigurable intelligent surface (RIS)-assisted integrated sensing and communication (ISAC) systems, where the dual-function base station (DFBS) lacks channel state information (CSI). To address the high overhead of cascaded channel estimation, we propose an improved artificial fish swarm algorithm (AFSA) combined with a feedback-based joint active and passive beam training scheme. In this approach, we consider the interference caused by multipath user echo signals on target detection and propose a beamforming design method that balances both communication and sensing performance. Numerical simulations show that the proposed AFSA outperforms other optimization algorithms, particularly in its robustness against echo interference under different communication signal-to-noise ratio (SNR) constraints.

Via

Access Paper or Ask Questions

Learning to Plan with Personalized Preferences

Feb 02, 2025

Manjie Xu, Xinyi Yang, Wei Liang, Chi Zhang, Yixin Zhu

Figure 1 for Learning to Plan with Personalized Preferences

Figure 2 for Learning to Plan with Personalized Preferences

Figure 3 for Learning to Plan with Personalized Preferences

Figure 4 for Learning to Plan with Personalized Preferences

Abstract:Effective integration of AI agents into daily life requires them to understand and adapt to individual human preferences, particularly in collaborative roles. Although recent studies on embodied intelligence have advanced significantly, they typically adopt generalized approaches that overlook personal preferences in planning. We address this limitation by developing agents that not only learn preferences from few demonstrations but also learn to adapt their planning strategies based on these preferences. Our research leverages the observation that preferences, though implicitly expressed through minimal demonstrations, can generalize across diverse planning scenarios. To systematically evaluate this hypothesis, we introduce Preference-based Planning (PbP) benchmark, an embodied benchmark featuring hundreds of diverse preferences spanning from atomic actions to complex sequences. Our evaluation of SOTA methods reveals that while symbol-based approaches show promise in scalability, significant challenges remain in learning to generate and execute plans that satisfy personalized preferences. We further demonstrate that incorporating learned preferences as intermediate representations in planning significantly improves the agent's ability to construct personalized plans. These findings establish preferences as a valuable abstraction layer for adaptive planning, opening new directions for research in preference-guided plan generation and execution.

Via

Access Paper or Ask Questions

IPP-Net: A Generalizable Deep Neural Network Model for Indoor Pathloss Radio Map Prediction

Jan 11, 2025

Bin Feng, Meng Zheng, Wei Liang, Lei Zhang

Figure 1 for IPP-Net: A Generalizable Deep Neural Network Model for Indoor Pathloss Radio Map Prediction

Figure 2 for IPP-Net: A Generalizable Deep Neural Network Model for Indoor Pathloss Radio Map Prediction

Figure 3 for IPP-Net: A Generalizable Deep Neural Network Model for Indoor Pathloss Radio Map Prediction

Abstract:In this paper, we propose a generalizable deep neural network model for indoor pathloss radio map prediction (termed as IPP-Net). IPP-Net is based on a UNet architecture and learned from both large-scale ray tracing simulation data and a modified 3GPP indoor hotspot model. The performance of IPP-Net is evaluated in the First Indoor Pathloss Radio Map Prediction Challenge in ICASSP 2025. The evaluation results show that IPP-Net achieves a weighted root mean square error of 9.501 dB on three competition tasks and obtains the second overall ranking.

* 2 pages, 1 figure, Accepted to ICASSP 2025

Via

Access Paper or Ask Questions

FloNa: Floor Plan Guided Embodied Visual Navigation

Dec 24, 2024

Jiaxin Li, Weiqi Huang, Zan Wang, Wei Liang, Huijun Di, Feng Liu

Abstract:Humans naturally rely on floor plans to navigate in unfamiliar environments, as they are readily available, reliable, and provide rich geometrical guidance. However, existing visual navigation settings overlook this valuable prior knowledge, leading to limited efficiency and accuracy. To eliminate this gap, we introduce a novel navigation task: Floor Plan Visual Navigation (FloNa), the first attempt to incorporate floor plan into embodied visual navigation. While the floor plan offers significant advantages, two key challenges emerge: (1) handling the spatial inconsistency between the floor plan and the actual scene layout for collision-free navigation, and (2) aligning observed images with the floor plan sketch despite their distinct modalities. To address these challenges, we propose FloDiff, a novel diffusion policy framework incorporating a localization module to facilitate alignment between the current observation and the floor plan. We further collect $20k$ navigation episodes across $117$ scenes in the iGibson simulator to support the training and evaluation. Extensive experiments demonstrate the effectiveness and efficiency of our framework in unfamiliar scenes using floor plan knowledge. Project website: https://gauleejx.github.io/flona/.

* Accepted by AAAI 2025

Via

Access Paper or Ask Questions

Security Enhancement of Quantum Communication in Space-Air-Ground Integrated Networks

Oct 22, 2024

Yixiao Zhang, Wei Liang, Lixin Li, Wensheng Lin

Abstract:This paper investigates a transmission scheme for enhancing quantum communication security, aimed at improving the security of space-air-ground integrated networks (SAGIN). Quantum teleportation achieves the transmission of quantum states through quantum channels. In simple terms, an unknown quantum state at one location can be reconstructed on a particle at another location. By combining classical Turbo coding with quantum Shor error-correcting codes, we propose a practical solution that ensures secure information transmission even in the presence of errors in both classical and quantum channels. To provide absolute security under SAGIN, we add a quantum secure direct communication (QSDC) protocol to the current system. Specifically, by accounting for the practical scenario of eavesdropping in quantum channels, the QSDC protocol utilizes virtual entangled pairs to detect the presence of eavesdroppers. Consequently, the overall scheme guarantees both the reliability and absolute security of communication.

Via

Access Paper or Ask Questions

An Expeditious Spatial Mean Radiant Temperature Mapping Framework using Visual SLAM and Semantic Segmentation

Oct 12, 2024

Wei Liang, Yiting Zhang, Ji Zhang, Erica Cochran Hameen

Figure 1 for An Expeditious Spatial Mean Radiant Temperature Mapping Framework using Visual SLAM and Semantic Segmentation

Figure 2 for An Expeditious Spatial Mean Radiant Temperature Mapping Framework using Visual SLAM and Semantic Segmentation

Figure 3 for An Expeditious Spatial Mean Radiant Temperature Mapping Framework using Visual SLAM and Semantic Segmentation

Figure 4 for An Expeditious Spatial Mean Radiant Temperature Mapping Framework using Visual SLAM and Semantic Segmentation

Abstract:Ensuring thermal comfort is essential for the well-being and productivity of individuals in built environments. Of the various thermal comfort indicators, the mean radiant temperature (MRT) is very challenging to measure. Most common measurement methodologies are time-consuming and not user-friendly. To address this issue, this paper proposes a novel MRT measurement framework that uses visual simultaneous localization and mapping (SLAM) and semantic segmentation techniques. The proposed approach follows the rule of thumb of the traditional MRT calculation method using surface temperature and view factors. However, it employs visual SLAM and creates a 3D thermal point cloud with enriched surface temperature information. The framework then implements Grounded SAM, a new object detection and segmentation tool to extract features with distinct temperature profiles on building surfaces. The detailed segmentation of thermal features not only reduces potential errors in the calculation of the MRT but also provides an efficient reconstruction of the spatial MRT distribution in the indoor environment. We also validate the calculation results with the reference measurement methodology. This data-driven framework offers faster and more efficient MRT measurements and spatial mapping than conventional methods. It can enable the direct engagement of researchers and practitioners in MRT measurements and contribute to research on thermal comfort and radiant cooling and heating systems.

* Accepted by 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshop

Via

Access Paper or Ask Questions

Towards Diverse and Efficient Audio Captioning via Diffusion Models

Sep 14, 2024

Manjie Xu, Chenxing Li, Xinyi Tu, Yong Ren, Ruibo Fu, Wei Liang, Dong Yu

Abstract:We introduce Diffusion-based Audio Captioning (DAC), a non-autoregressive diffusion model tailored for diverse and efficient audio captioning. Although existing captioning models relying on language backbones have achieved remarkable success in various captioning tasks, their insufficient performance in terms of generation speed and diversity impede progress in audio understanding and multimedia applications. Our diffusion-based framework offers unique advantages stemming from its inherent stochasticity and holistic context modeling in captioning. Through rigorous evaluation, we demonstrate that DAC not only achieves SOTA performance levels compared to existing benchmarks in the caption quality, but also significantly outperforms them in terms of generation speed and diversity. The success of DAC illustrates that text generation can also be seamlessly integrated with audio and visual generation tasks using a diffusion backbone, paving the way for a unified, audio-related generative model across different modalities.

* https://sites.google.com/view/diffusion-audio-captioning

Via

Access Paper or Ask Questions

STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment

Sep 13, 2024

Yong Ren, Chenxing Li, Manjie Xu, Wei Liang, Yu Gu, Rilin Chen, Dong Yu

Figure 1 for STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment

Figure 2 for STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment

Figure 3 for STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment

Figure 4 for STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment

Abstract:Visual and auditory perception are two crucial ways humans experience the world. Text-to-video generation has made remarkable progress over the past year, but the absence of harmonious audio in generated video limits its broader applications. In this paper, we propose Semantic and Temporal Aligned Video-to-Audio (STA-V2A), an approach that enhances audio generation from videos by extracting both local temporal and global semantic video features and combining these refined video features with text as cross-modal guidance. To address the issue of information redundancy in videos, we propose an onset prediction pretext task for local temporal feature extraction and an attentive pooling module for global semantic feature extraction. To supplement the insufficient semantic information in videos, we propose a Latent Diffusion Model with Text-to-Audio priors initialization and cross-modal guidance. We also introduce Audio-Audio Align, a new metric to assess audio-temporal alignment. Subjective and objective metrics demonstrate that our method surpasses existing Video-to-Audio models in generating audio with better quality, semantic consistency, and temporal alignment. The ablation experiment validated the effectiveness of each module. Audio samples are available at https://y-ren16.github.io/STAV2A.

* Submitted to ICASSP2025

Via

Access Paper or Ask Questions

R2G: Reasoning to Ground in 3D Scenes

Aug 24, 2024

Yixuan Li, Zan Wang, Wei Liang

Abstract:We propose Reasoning to Ground (R2G), a neural symbolic model that grounds the target objects within 3D scenes in a reasoning manner. In contrast to prior works, R2G explicitly models the 3D scene with a semantic concept-based scene graph; recurrently simulates the attention transferring across object entities; thus makes the process of grounding the target objects with the highest probability interpretable. Specifically, we respectively embed multiple object properties within the graph nodes and spatial relations among entities within the edges, utilizing a predefined semantic vocabulary. To guide attention transferring, we employ learning or prompting-based methods to analyze the referential utterance and convert it into reasoning instructions within the same semantic space. In each reasoning round, R2G either (1) merges current attention distribution with the similarity between the instruction and embedded entity properties or (2) shifts the attention across the scene graph based on the similarity between the instruction and embedded spatial relations. The experiments on Sr3D/Nr3D benchmarks show that R2G achieves a comparable result with the prior works while maintaining improved interpretability, breaking a new path for 3D language grounding.

Via

Access Paper or Ask Questions