Abstract:Evaluating large language models (LLMs) in the biomedical domain requires benchmarks that can distinguish reasoning from pattern matching and that remain discriminative as model capabilities improve. Existing biomedical question answering (QA) benchmarks are limited in this respect. Multiple-choice formats can allow models to succeed through answer elimination rather than inference, while widely circulated exam-style datasets are increasingly vulnerable to performance saturation and training data contamination. Multi-hop reasoning, defined as the ability to integrate information across multiple sources to derive an answer, is central to clinically meaningful tasks such as diagnostic support, literature-based discovery, and hypothesis generation, yet remains underrepresented in current biomedical QA benchmarks. MedHopQA, introduced as a shared task at BioCreative IX, is a disease-centered multi-hop reasoning benchmark of 1,000 expert-curated question-answer pairs. Each question requires synthesizing information from two distinct Wikipedia articles, and answers are given in an open-ended free-text format. Gold annotations are augmented with ontology-grounded synonym sets from MONDO, NCBI Gene, and NCBI Taxonomy to support both lexical and concept-level evaluation. MedHopQA was constructed through a structured process combining human annotation, triage, iterative verification, and LLM-as-a-judge validation. To reduce leaderboard gaming and contamination risk, the 1,000 scored questions are hidden within a publicly downloadable set of 10,000 questions whose answers are withheld, and submissions are scored on a CodaBench leaderboard. MedHopQA provides both a benchmark and a reusable framework for constructing future biomedical QA datasets that treat compositional reasoning, saturation resistance, and contamination resistance as core design constraints.
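The difference between lexical and concept-level scoring can be sketched as follows. This is a minimal illustration under stated assumptions, not the official MedHopQA scorer: the normalization rules, the function names (`normalize`, `lexical_match`, `concept_match`, `score`), and the toy gold entries are hypothetical, and the synonym sets merely stand in for MONDO, NCBI Gene, and NCBI Taxonomy synonyms. Only question ids present in the gold set are scored, mirroring how the 1,000 scored questions sit unannounced among the 10,000 released ones.

```python
import re
import string
from typing import Dict, Set, Tuple


def normalize(text: str) -> str:
    """Lowercase, turn punctuation into spaces, drop English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(" " if ch in string.punctuation else ch for ch in text)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def lexical_match(prediction: str, gold: str) -> bool:
    """Exact match against the single gold answer string after normalization."""
    return normalize(prediction) == normalize(gold)


def concept_match(prediction: str, gold: str, synonyms: Set[str]) -> bool:
    """Match against the gold answer or any of its (hypothetical) ontology synonyms."""
    accepted = {normalize(gold)} | {normalize(s) for s in synonyms}
    return normalize(prediction) in accepted


def score(predictions: Dict[str, str], gold: Dict[str, Tuple[str, Set[str]]]) -> Dict[str, float]:
    """Score only ids in the hidden gold set; answers to decoy questions are ignored."""
    lexical = concept = 0
    for qid, (answer, synonyms) in gold.items():
        prediction = predictions.get(qid, "")
        lexical += lexical_match(prediction, answer)
        concept += concept_match(prediction, answer, synonyms)
    return {"lexical_acc": lexical / len(gold), "concept_acc": concept / len(gold)}


# Toy, hypothetical gold entries: question id -> (answer, synonym set).
GOLD = {
    "q1": ("Cystic fibrosis", {"mucoviscidosis", "CF"}),
    "q7": ("CFTR", {"cystic fibrosis transmembrane conductance regulator"}),
}
PREDICTIONS = {"q1": "Mucoviscidosis", "q2": "a decoy answer", "q7": "CFTR"}
print(score(PREDICTIONS, GOLD))  # {'lexical_acc': 0.5, 'concept_acc': 1.0}
```

Here "Mucoviscidosis" fails the lexical check but passes the concept-level one, which is the gap that synonym-augmented gold annotations are meant to close.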
Abstract:Pinching antenna (PA) systems provide a new spatial degree of freedom through flexible activation of pinching positions. However, the resulting effective channel strongly depends on the activated pinching positions, so conventional coherent transmission generally relies on accurate acquisition of instantaneous channel state information (CSI) and incurs substantial pilot overhead. To address this challenge, we propose a differential spatial modulation (DSM) scheme for PA systems, termed DSM-PA. Specifically, we design a differential transmission scheme with an embedded Alamouti coding structure, in which information bits are conveyed via phase variations between adjacent symbol blocks. This design enables noncoherent transmission without instantaneous CSI while simultaneously achieving transmit diversity. Moreover, to fully exploit the spatial degrees of freedom of PA systems, we develop a pinching position-based index modulation (IM) rule that enhances spectral efficiency. An asymptotically tight upper bound on the average bit error rate (BER) over quasi-static Rician fading channels is derived using the moment-generating function (MGF) method, and the diversity analysis reveals that the proposed DSM-PA scheme achieves full transmit diversity. Finally, simulation results verify the accuracy of the BER analysis and demonstrate the effectiveness of the proposed DSM-PA scheme.
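The noncoherent principle can be illustrated with a generic differential space-time block code built from unitary Alamouti matrices, in the spirit of classical differential STBC rather than the exact DSM-PA design. In this sketch (assumptions throughout), two transmit positions stand in for two activated pinching positions, QPSK symbols are used, each block follows S_t = X_t S_{t-1}, and the receiver decides X_t from two consecutive received blocks without knowing the channel h. The pinching-position index modulation and the Rician channel model are not modeled, and all function names are hypothetical.

```python
import cmath
import random

# Unit-magnitude QPSK, required for the Alamouti matrix below to be unitary.
QPSK = [cmath.exp(1j * cmath.pi * (2 * k + 1) / 4) for k in range(4)]


def alamouti(x1, x2):
    """Unitary Alamouti block: rows are time slots, columns are transmit positions."""
    s = 1 / 2 ** 0.5
    return [[s * x1, s * x2], [-s * x2.conjugate(), s * x1.conjugate()]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def matvec(a, v):
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]


# 16 unitary codewords, so each differential block carries 4 bits.
CODEBOOK = {(i, j): alamouti(QPSK[i], QPSK[j]) for i in range(4) for j in range(4)}


def encode(indices):
    """Differential encoding S_t = X_t S_{t-1}, starting from an identity reference block."""
    block = [[1 + 0j, 0j], [0j, 1 + 0j]]
    blocks = [block]
    for idx in indices:
        block = matmul(CODEBOOK[idx], block)
        blocks.append(block)
    return blocks


def transmit(blocks, h, noise_std, rng):
    """Quasi-static channel h (one gain per transmit position) plus complex Gaussian noise."""
    received = []
    for block in blocks:
        y = matvec(block, h)
        received.append([v + complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std)) for v in y])
    return received


def detect(received):
    """Standard differential detector: argmin over X of ||y_t - X y_{t-1}||^2, no CSI used."""
    decisions = []
    for prev, cur in zip(received, received[1:]):
        def metric(idx):
            pred = matvec(CODEBOOK[idx], prev)
            return sum(abs(c - p) ** 2 for c, p in zip(cur, pred))
        decisions.append(min(CODEBOOK, key=metric))
    return decisions
```

Because y_t = X_t y_{t-1} plus noise whenever h stays constant over two blocks, the previous received block acts as the reference, which is why no pilots or channel estimates appear in `detect`.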
Abstract:Accurate delay-Doppler channel estimation is critical for next-evolution waveforms (NEWs) to enable reliable signal detection. This paper proposes a robust channel estimation scheme that combines Flag sequences, optimized via an adaptive accelerated parallel majorization-minimization (AP-MM) algorithm, with a dedicated channel estimation algorithm. To enable efficient, low-complexity parameter extraction and to overcome the robustness issues of conventional greedy estimation, we introduce two key enhancements: a candidate selection strategy that mitigates spurious sidelobe peaks, and a global least squares (LS) refinement stage that eliminates the error propagation caused by sidelobe masking effects. Numerical results demonstrate that the proposed scheme significantly outperforms existing algorithms and achieves the desired estimation accuracy.
Abstract:The integration of multicarrier modulation and multiple-input multiple-output (MIMO) is critical for reliable transmission of wireless signals in complex environments and significantly improves spectral efficiency. Existing studies have shown that the popular orthogonal time frequency space (OTFS) and affine frequency division multiplexing (AFDM) waveforms offer significant advantages over orthogonal frequency division multiplexing (OFDM) in uncoded doubly selective channels. However, it remains uncertain whether these benefits extend to coded systems. Meanwhile, the information-theoretic limits of coded MIMO multicarrier systems and the corresponding low-complexity receiver design remain open problems. To address these challenges, this paper proposes a multi-slot cross-domain memory approximate message passing (MS-CD-MAMP) receiver and derives its information-theoretic limit (i.e., achievable rate) and optimal coding principle for MIMO multicarrier modulation (e.g., OFDM, OTFS, and AFDM) systems. The proposed MS-CD-MAMP receiver exploits not only the time-domain channel sparsity for low complexity but also the symbol-domain constellation constraints for performance enhancement. Because the high-dimensional complex state evolution (SE) is difficult to analyze, a simplified single-input single-output variational SE is proposed to derive the achievable rate of MS-CD-MAMP and the optimal coding principle that maximizes this rate. Numerical results show that coded MIMO-OFDM/OTFS/AFDM with MS-CD-MAMP achieve the same maximum achievable rate in doubly selective channels; their finite-length performance with practical optimized low-density parity-check (LDPC) codes is only 0.5 $\sim$ 1.8 dB away from the associated theoretical limit and shows a 0.8 $\sim$ 4.4 dB gain over well-designed point-to-point LDPC codes.
Abstract:The recently proposed multi-chirp waveform, affine frequency division multiplexing (AFDM), is considered a potential candidate for integrated sensing and communication (ISAC). However, acquiring accurate target sensing parameters is challenging because of fractional delay and Doppler shifts, as well as the coexistence of near-field (NF) and far-field (FF) targets associated with large-scale antenna systems. In this paper, we propose a novel angle-delay-Doppler estimation scheme for AFDM-ISAC systems in mixed NF and FF scenarios. Specifically, we model the received ISAC signals as a third-order tensor that admits a low-rank CANDECOMP/PARAFAC (CP) format. By exploiting the Vandermonde structure of the factor matrix and the spatial smoothing technique, we develop a structured CP decomposition method that guarantees the uniqueness condition. We further propose a low-complexity estimation scheme to accurately acquire fractional-valued target sensing parameters, including angle of arrival/departure (AoA/AoD), delay, and Doppler shift. We also derive the Cram\'er-Rao lower bound (CRLB) as a benchmark and analyze the complexity of the proposed scheme. Finally, simulation results demonstrate the effectiveness and superiority of the proposed scheme.
Abstract:The recently proposed multi-chirp waveform, affine frequency division multiplexing (AFDM), is regarded as a prospective candidate for integrated sensing and communication (ISAC) owing to its robust performance in high-mobility scenarios and its ability to achieve full diversity in doubly dispersive channels. However, the insufficient Doppler resolution caused by the limited transmission duration can reduce the accuracy of parameter estimation. In this paper, we propose a new off-grid target parameter estimation scheme that jointly estimates the range and velocity of targets in AFDM-ISAC systems, where off-grid Doppler components are incorporated to enhance estimation accuracy. Specifically, we formulate the sensing model as an off-grid sparse signal recovery problem based on virtual delay and Doppler grids defined in the discrete affine Fourier (DAF) domain, where the off-grid components are treated as hyper-parameters to be estimated. We then employ the expectation-maximization (EM) technique within a sparse Bayesian learning (SBL) framework to update the hyper-parameters iteratively. Simulation results indicate that the proposed off-grid algorithm outperforms existing algorithms in sensing performance and is highly robust in high-mobility AFDM-ISAC scenarios.
Abstract:We present HILGEN, a Hierarchically-Informed Data Generation approach that combines domain knowledge from the Unified Medical Language System (UMLS) with synthetic data generated by large language models (LLMs), specifically GPT-3.5. Our approach leverages the hierarchical structure of UMLS to expand training data with related concepts, while incorporating contextual information from LLMs through targeted prompts that automatically generate synthetic examples for sparsely occurring named entities. We evaluated HILGEN on four biomedical NER datasets (MIMIC III, BC5CDR, NCBI-Disease, and Med-Mentions) using BERT-Large and DANN (Data Augmentation with Nearest Neighbor Classifier) models under various data generation strategies, including UMLS, GPT-3.5, and their best ensemble. For the BERT-Large model, incorporating UMLS led to an average F1 score improvement of 40.36%, while using GPT-3.5 resulted in a comparable average increase of 40.52%. The Best-Ensemble approach with BERT-Large achieved the highest improvement, with an average increase of 42.29%. For the DANN model, the F1 score improved by 22.74% on average with the UMLS-only approach and by 21.53% with the GPT-3.5-based method, while the Best-Ensemble DANN model showed a more notable average increase of 25.03%. Our proposed HILGEN approach improves NER performance in few-shot settings without requiring additional manually annotated data. Our experiments demonstrate that an effective strategy for optimizing biomedical NER is to combine previously curated biomedical knowledge, such as UMLS, with generative LLMs to create synthetic training instances. Our future research will focus on exploring additional synthetic data generation strategies to further improve NER performance.
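The UMLS side of the approach, expanding training data with hierarchically related concepts, can be sketched as below. This is a hypothetical illustration: `HIERARCHY` is a toy parent-to-children map standing in for UMLS relations (it is not a UMLS API), the helpers `related_concepts` and `expand` are invented for this example, and the GPT-3.5 prompting step is not shown. Each related concept (a child or sibling of the annotated entity) is swapped into the entity span, and BIO tags are re-emitted to match the new span length.

```python
from typing import Dict, List, Tuple

# Hypothetical toy hierarchy standing in for UMLS parent -> child relations.
HIERARCHY: Dict[str, List[str]] = {
    "hepatitis": ["hepatitis a", "hepatitis b", "hepatitis c"],
    "diabetes mellitus": ["type 1 diabetes", "type 2 diabetes"],
}


def related_concepts(term: str) -> List[str]:
    """Children of the term plus its siblings (other children of the same parent)."""
    related = list(HIERARCHY.get(term, []))
    for children in HIERARCHY.values():
        if term in children:
            related.extend(c for c in children if c != term)
    return related


def expand(tokens: List[str], tags: List[str], span: Tuple[int, int], label: str):
    """Create synthetic sentences by replacing tokens[start:end] with each related concept."""
    start, end = span
    term = " ".join(tokens[start:end]).lower()
    examples = []
    for concept in related_concepts(term):
        words = concept.split()
        new_tags = [f"B-{label}"] + [f"I-{label}"] * (len(words) - 1)
        examples.append((tokens[:start] + words + tokens[end:], tags[:start] + new_tags + tags[end:]))
    return examples


tokens = ["Patient", "has", "hepatitis", "B", "."]
tags = ["O", "O", "B-Disease", "I-Disease", "O"]
for new_tokens, new_tags in expand(tokens, tags, (2, 4), "Disease"):
    print(new_tokens, new_tags)
```

For the "hepatitis B" mention this yields two synthetic sentences, one with "hepatitis a" and one with "hepatitis c", each carrying correctly aligned B-/I-Disease tags; in a real pipeline the substituted surface forms would come from UMLS concept names rather than lowercased toy strings.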

Abstract:High-mobility environments lead to severe Doppler effects and pose serious challenges to the conventional physical layer based on the widely adopted orthogonal frequency division multiplexing (OFDM). The recent emergence of orthogonal time frequency space (OTFS) modulation, along with its many related variants, presents a promising solution for overcoming such channel Doppler effects. This paper aims to clearly establish the relationships among the various manifestations of OTFS, identifying their connections, common features, and distinctions. Building on existing work, it also provides a general overview of OTFS-related detection schemes and their performance. We first review OFDM and filter bank multi-carrier (FBMC) and show that OTFS can be viewed as a precoded FBMC by introducing the inverse symplectic finite Fourier transform (ISFFT). We then explore the relationship between OTFS and related modulation schemes with similar characteristics. We provide an effective channel model for high-mobility channels and offer a unified detection representation. We present numerical comparisons of power spectral density (PSD) and bit error rate (BER) to underscore the benefits of these modulation schemes in high-mobility scenarios, and we evaluate various detection schemes to reveal insights into their efficacy. Finally, we discuss opportunities and challenges for OTFS in high mobility, setting the stage for future research and development in this field.
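The ISFFT step that lets OTFS be read as a precoder placed in front of a multicarrier modulator can be written out directly. The sketch assumes one common convention: Doppler index k and delay index l on an N x M delay-Doppler grid, time index n and subcarrier index m on the time-frequency grid, and $X[n,m] = \frac{1}{\sqrt{NM}}\sum_{k=0}^{N-1}\sum_{l=0}^{M-1} x[k,l]\, e^{j2\pi(nk/N - ml/M)}$. Sign and scaling conventions differ across papers, and the direct double sum is chosen for readability rather than speed (a practical implementation would use FFTs).

```python
import cmath


def isfft(x):
    """Delay-Doppler grid x[k][l] (N Doppler bins x M delay bins) -> time-frequency grid X[n][m]."""
    N, M = len(x), len(x[0])
    scale = 1 / (N * M) ** 0.5
    return [[scale * sum(x[k][l] * cmath.exp(2j * cmath.pi * (n * k / N - m * l / M))
                         for k in range(N) for l in range(M))
             for m in range(M)]
            for n in range(N)]


def sfft(X):
    """Inverse mapping (SFFT): time-frequency grid X[n][m] -> delay-Doppler grid x[k][l]."""
    N, M = len(X), len(X[0])
    scale = 1 / (N * M) ** 0.5
    return [[scale * sum(X[n][m] * cmath.exp(-2j * cmath.pi * (n * k / N - m * l / M))
                         for n in range(N) for m in range(M))
             for l in range(M)]
            for k in range(N)]


# A single delay-Doppler symbol spreads over every time-frequency resource with equal magnitude.
N, M = 4, 3
x = [[0j] * M for _ in range(N)]
x[1][2] = 1 + 0j
X = isfft(x)
print(max(abs(abs(v) - 1 / (N * M) ** 0.5) for row in X for v in row))  # ~0 up to rounding
```

The equal-magnitude spreading is the intuition behind OTFS diversity: each delay-Doppler symbol touches every time-frequency resource, so a deep fade on a few resources does not erase it.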

Abstract:Integrated sensing and communication (ISAC) has become an attractive technology for future wireless networks. In this paper, we propose a simultaneous transmission and reflection reconfigurable intelligent surface (STAR-RIS) aided dynamic scatterer tracking scheme for ISAC in high-mobility millimeter wave communication systems, where the STAR-RIS provides communication service between the base station (BS) and an indoor user while simultaneously sensing and tracking the outdoor dynamic scatterers of interest. Specifically, we employ an active STAR-RIS that simultaneously receives and processes the signals impinging on both of its sides. We then develop a transmission strategy together with an activation scheme for the STAR-RIS elements and construct the corresponding signal models. After the parameters of the BS-RIS channel are acquired, the dynamic paths can be identified among all scattering paths, and the dynamic targets can be classified according to their radar cross sections. We further track the outdoor scatterers at the STAR-RIS using a Gaussian mixture probability hypothesis density (GM-PHD) filter. Based on the tracked locations of the outdoor scatterers, a beam prediction strategy is developed for both the BS precoder and the STAR-RIS refraction phase shift vector to enhance the communication performance of the indoor user. In addition, a target mismatch detection and path collision prediction mechanism is proposed to reduce training overhead and improve transmission performance. Finally, simulation results verify the feasibility and effectiveness of the proposed STAR-RIS aided dynamic scatterer tracking scheme for ISAC.

Abstract:Although orthogonal time frequency space (OTFS) has been shown to be a promising modulation scheme for high-mobility doubly-selective fading channels, whether it attains full diversity order in time- or frequency-selective fading channels has not been clarified. By performing pairwise error probability (PEP) analysis, we observe that the original OTFS system cannot always guarantee full exploitation of the embedded diversity in either time- or frequency-selective fading channels. To address this issue and further improve system performance, this work proposes linear precoding solutions based on algebraic number theory for OTFS systems over time- and frequency-selective fading channels, respectively. The proposed linearly precoded OTFS systems can guarantee maximal diversity and potential coding gains in time/frequency-selective fading channels without any transmission rate loss and do not require channel state information (CSI) at the transmitter. Simulation results illustrate the superiority of the proposed precoded OTFS over both the original unprecoded OTFS and existing phase-rotation OTFS systems in time/frequency-selective fading channels.
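Why algebraic number theory helps can be illustrated with the classic two-dimensional cyclotomic rotation used for signal space diversity. This is a generic example, not the paper's OTFS-specific construction: the 2 x 2 unitary Vandermonde precoder $G = \frac{1}{\sqrt{2}}\begin{bmatrix}1 & \theta\\ 1 & -\theta\end{bmatrix}$ with $\theta = e^{j\pi/4}$ and the unnormalized QPSK alphabet $\{\pm 1 \pm j\}$ are assumptions. Full diversity over two independently faded resources requires every pair of distinct codewords to differ in both components, i.e., a nonzero minimum product distance, which the sketch checks by brute force.

```python
import cmath
import itertools

# Unnormalized QPSK alphabet {±1 ± j} (an assumption for illustration).
QPSK = [complex(a, b) for a in (-1, 1) for b in (-1, 1)]

THETA = cmath.exp(1j * cmath.pi / 4)
ROTATION = [[1 / 2 ** 0.5, THETA / 2 ** 0.5], [1 / 2 ** 0.5, -THETA / 2 ** 0.5]]
IDENTITY = [[1, 0], [0, 1]]  # unprecoded transmission


def precode(G, s):
    """x = G s for a 2 x 2 precoder G and a pair of symbols s."""
    return [G[0][0] * s[0] + G[0][1] * s[1], G[1][0] * s[0] + G[1][1] * s[1]]


def min_product_distance(G):
    """Smallest |xa[0] - xb[0]| * |xa[1] - xb[1]| over all pairs of distinct symbol vectors."""
    vectors = list(itertools.product(QPSK, repeat=2))
    best = float("inf")
    for a, b in itertools.combinations(vectors, 2):
        xa, xb = precode(G, a), precode(G, b)
        best = min(best, abs(xa[0] - xb[0]) * abs(xa[1] - xb[1]))
    return best


print(min_product_distance(IDENTITY))  # 0.0: some codeword pairs differ in one component only
print(min_product_distance(ROTATION))  # about 2.0: every pair differs in both components
```

For a symbol difference $(d_1, d_2)$ the rotated product distance equals $|d_1^2 - j d_2^2|/2$, which cannot vanish for nonzero Gaussian-integer differences because $e^{j\pi/4}$ is not an element of $\mathbb{Q}(j)$. Since $G$ is square and unitary, two symbols go in and two come out, so diversity is gained without rate loss and without transmitter CSI.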