Information extraction is the task of automatically deriving structured information from unstructured text.
With the rapid development of online medical platforms, consumer health questions (CHQs) often hamper efficient diagnosis because they contain redundant information and frequent non-professional terms. The medical question summary (MQS) task aims to transform CHQs into streamlined doctors' frequently asked questions (FAQs), but existing methods still face challenges such as poor identification of the question focus and model hallucination. This paper explores the potential of large language models (LLMs) in the MQS task and finds that direct fine-tuning is prone to focus-identification bias and unfaithful generation. To this end, we propose an optimization framework based on core focus guidance. First, a prompt template is designed to drive the LLM to extract a core focus from each CHQ that is faithful to the original text. Then, a fine-tuning dataset is constructed by combining these core focuses with the original CHQ-FAQ pairs to improve the model's ability to identify the question focus. Finally, a multi-dimensional quality evaluation and selection mechanism is proposed to further improve summary quality. We conduct comprehensive experiments on two widely adopted MQS datasets using three established evaluation metrics. The proposed framework achieves state-of-the-art performance across all measures, demonstrating a significant boost in the model's ability to identify the critical focus of a question and a notable reduction in hallucination. The source code is freely available at https://github.com/DUT-LiuChao/FocusMed.
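To make the last step concrete, here is a minimal sketch of what a multi-dimensional candidate-selection step could look like: several candidate summaries are scored on focus coverage, faithfulness to the source CHQ, and brevity, and the best one is kept. The prompt wording, score dimensions, and weights are illustrative assumptions, not the authors' implementation.

```python
import re

# Illustrative prompt for faithful core-focus extraction (wording is assumed).
FOCUS_PROMPT = (
    "Read the consumer health question below and extract its core focus "
    "(the key disease, symptom, or drug), using only words that appear in "
    "the question.\n\nQuestion: {chq}\nCore focus:"
)

def _tokens(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def score_candidate(candidate, chq, focus):
    """Toy multi-dimensional score: focus coverage, faithfulness, brevity."""
    cand, src, foc = _tokens(candidate), _tokens(chq), _tokens(focus)
    coverage = len(cand & foc) / max(len(foc), 1)    # does it mention the focus?
    grounded = len(cand & src) / max(len(cand), 1)   # is it faithful to the CHQ?
    brevity = 1.0 / (1.0 + abs(len(candidate.split()) - 15))  # prefer ~15 words
    return 0.4 * coverage + 0.4 * grounded + 0.2 * brevity

def select_best(candidates, chq, focus):
    return max(candidates, key=lambda c: score_candidate(c, chq, focus))

chq = "I have had a pounding headache and blurry vision for two days, what is wrong?"
focus = "headache blurry vision"   # in the full pipeline, produced via FOCUS_PROMPT
print(select_best(["What causes headache with blurry vision?",
                   "How do I treat pain?"], chq, focus))
```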
This paper studies the problem of extracting common randomness (CR) or secret keys from correlated random sources observed by two legitimate parties, Alice and Bob, through public discussion in the presence of an eavesdropper, Eve. We propose a practical two-stage CR extraction framework. In the first stage, the variational probabilistic quantization (VPQ) step is introduced, where Alice and Bob employ probabilistic neural network (NN) encoders to map their observations into discrete, nearly uniform random variables (RVs) with high agreement probability while minimizing information leakage to Eve. This is realized through a variational learning objective combined with adversarial training. In the second stage, a secure sketch using the code-offset construction reconciles the encoder outputs into identical secret keys, whose secrecy is guaranteed by the VPQ objective. As a representative application, we study physical layer key (PLK) generation. Traditional methods rely on the channel reciprocity principle and require two-way channel probing, incurring large protocol overhead and performing poorly in high-mobility scenarios. Going beyond them, we propose a sensing-based PLK generation method for integrated sensing and communications (ISAC) systems, where paired range-angle (RA) maps measured at Alice and Bob serve as correlated sources. The approach is verified through both end-to-end simulations and real-world software-defined radio (SDR) measurements, including scenarios where Eve has partial knowledge of Bob's position. The results demonstrate the feasibility and convincing performance of both the proposed CR extraction framework and the sensing-based PLK generation method.
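For readers unfamiliar with the second stage, the sketch below illustrates the code-offset construction with a toy 3x repetition code standing in for a real error-correcting code (practical systems would use, e.g., BCH or polar codes); the bit lengths and the single-bit disagreement are illustrative.

```python
import numpy as np

R = 3  # repetition factor of the toy code

def encode(key_bits):
    return np.repeat(key_bits, R)

def decode(codeword):
    # Majority vote over each block of R bits corrects isolated bit flips.
    return (codeword.reshape(-1, R).sum(axis=1) > R // 2).astype(np.uint8)

def sketch(alice_bits, key_bits):
    """Alice publishes helper data: her quantized bits XOR a random codeword."""
    return alice_bits ^ encode(key_bits)

def recover(bob_bits, helper):
    """Bob XORs off his (noisy) bits; residual errors are removed by decoding."""
    return decode(bob_bits ^ helper)

rng = np.random.default_rng(0)
key = rng.integers(0, 2, 8, dtype=np.uint8)        # secret key bits
alice = rng.integers(0, 2, 8 * R, dtype=np.uint8)  # Alice's quantized observation
bob = alice.copy(); bob[5] ^= 1                    # Bob disagrees in one bit
assert np.array_equal(recover(bob, sketch(alice, key)), key)
```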
Recent advances in soft robotic hands and tactile sensing have enabled both to perform an increasing number of complex tasks with the aid of machine learning. In particular, our previous work presented the GelSight Baby Fin Ray, which integrates a camera with a soft, compliant Fin Ray structure. Camera-based tactile sensing gives the GelSight Baby Fin Ray the ability to capture rich contact information such as forces, object geometries, and textures. That work also showed that the GelSight Baby Fin Ray can dig through clutter and classify in-shell nuts. To further examine its potential, we leverage learning to distinguish nut-in-shell textures and to perform force and position estimation. We conduct ablation studies with popular neural network structures, including ResNet50, GoogLeNet, and 3- and 5-layer convolutional neural network (CNN) architectures. We conclude that machine learning is a promising technique for extracting useful information from high-resolution tactile images and for empowering soft robots to better understand and interact with their environments.
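As an illustration of the kind of small CNN baseline used in such studies, here is a compact PyTorch regressor; the channel counts, input size, and the 3-dimensional output (normal force plus contact position) are assumptions for exposition, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TactileCNN(nn.Module):
    """3-conv-layer regressor over camera-based tactile images."""
    def __init__(self, out_dim: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),            # global pooling -> one 64-d vector
        )
        self.head = nn.Linear(64, out_dim)      # e.g. force + contact (x, y)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = TactileCNN()
fake_batch = torch.randn(4, 3, 224, 224)        # GelSight-style RGB tactile images
print(model(fake_batch).shape)                  # torch.Size([4, 3])
```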




Unsupervised multivariate time series (MTS) representation learning aims to extract compact and informative representations from raw sequences without relying on labels, enabling efficient transfer to diverse downstream tasks. In this paper, we propose the Dual-Masked Autoencoder (DMAE), a novel masked time-series modeling framework for unsupervised MTS representation learning. DMAE formulates two complementary pretext tasks: (1) reconstructing masked values based on visible attributes, and (2) estimating latent representations of masked features, guided by a teacher encoder. To further improve representation quality, we introduce a feature-level alignment constraint that encourages the predicted latent representations to align with the teacher's outputs. By jointly optimizing these objectives, DMAE learns temporally coherent and semantically rich representations. Comprehensive evaluations across classification, regression, and forecasting tasks demonstrate that our approach consistently outperforms competitive baselines.
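A minimal sketch of how the two pretext losses and the feature-level alignment constraint could be combined is shown below; the masking ratio, cosine alignment, loss weight, and stand-in encoders are illustrative assumptions rather than the DMAE implementation.

```python
import torch
import torch.nn.functional as F

def dmae_losses(x, mask, student, teacher, value_head, latent_head, alpha=1.0):
    """x: (B, T, F) series; mask: bool tensor, True where values are hidden."""
    x_visible = x.masked_fill(mask, 0.0)      # student sees only visible values
    h = student(x_visible)                    # (B, T, D) latent states

    # Task 1: reconstruct the masked raw values from visible context.
    rec = F.mse_loss(value_head(h)[mask], x[mask])

    # Task 2: predict the teacher's latents at masked positions, with a
    # feature-level (cosine) alignment constraint toward the teacher outputs.
    with torch.no_grad():                     # no gradient through the teacher
        z_t = teacher(x)                      # teacher encodes the full series
    z_s = latent_head(h)
    step_mask = mask.any(dim=-1)              # time steps with any masked feature
    align = 1.0 - F.cosine_similarity(z_s[step_mask], z_t[step_mask], dim=-1).mean()

    return rec + alpha * align

# Tiny usage with stand-in encoders (a real setup would use an EMA teacher).
B, T, Fdim, D = 8, 96, 7, 64
enc = torch.nn.Sequential(torch.nn.Linear(Fdim, D), torch.nn.GELU())
loss = dmae_losses(torch.randn(B, T, Fdim), torch.rand(B, T, Fdim) < 0.4,
                   student=enc, teacher=enc,
                   value_head=torch.nn.Linear(D, Fdim),
                   latent_head=torch.nn.Linear(D, D))
```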
Characterizing the geometry of an object orbiting a star from its transit light curve is a powerful tool for uncovering various complex phenomena. The problem is inherently ill-posed, since similar or identical light curves can be produced by multiple different shapes. In this study, we investigate the extent to which the features of a shape can be embedded in a transit light curve. We generate a library of two-dimensional random shapes and simulate their transit light curves with the light-curve simulator Yuti. Each shape is decomposed into a series of elliptical components expressed in the form of Fourier coefficients that add progressively diminishing perturbations to an ideal ellipse. We train deep neural networks to predict these Fourier coefficients directly from the simulated light curves. Our results demonstrate that the networks can successfully reconstruct the low-order ellipses, which describe the overall shape, orientation, and large-scale perturbations. For higher-order ellipses, the scale is successfully determined, but the inference of eccentricity and orientation is limited, demonstrating the extent of shape information contained in the light curve. We explore the impact of non-convex shape features on reconstruction and show their dependence on shape orientation. The level of reconstruction achieved by the neural networks underscores the utility of light curves as a means to extract geometric information from transiting systems.
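The elliptical decomposition can be illustrated with the standard elliptic-Fourier form, in which harmonic n contributes coefficients (a_n, b_n, c_n, d_n) on top of the base ellipse (n = 1); the coefficient values below are arbitrary and only meant to show the diminishing-perturbation structure that the networks are trained to regress from the light curve.

```python
import numpy as np

def shape_from_fourier(coeffs, n_points=400):
    """Compose a closed 2-D contour from elliptic Fourier coefficients.

    coeffs: iterable of (a_n, b_n, c_n, d_n) for harmonics n = 1, 2, ...
    """
    t = np.linspace(0.0, 2.0 * np.pi, n_points)
    x = np.zeros_like(t)
    y = np.zeros_like(t)
    for n, (a, b, c, d) in enumerate(coeffs, start=1):
        x += a * np.cos(n * t) + b * np.sin(n * t)
        y += c * np.cos(n * t) + d * np.sin(n * t)
    return x, y

# Base ellipse plus two increasingly diminishing higher-order perturbations.
x, y = shape_from_fourier([(1.0, 0.0, 0.0, 0.6),
                           (0.08, 0.02, 0.03, 0.05),
                           (0.01, 0.02, 0.01, 0.005)])
```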
Criminal investigations often involve the analysis of messages exchanged through instant messaging apps such as WhatsApp, which can be an extremely labor-intensive task. Our approach integrates knowledge graphs and NLP models to support this analysis by semantically enriching data collected from suspects' mobile phones, helping prosecutors and investigators search the data and gain valuable insights. Our semantic enrichment process involves extracting message data and modeling it in a knowledge graph, generating transcriptions of voice messages, and annotating the data using an end-to-end entity extraction approach. We adopt two different solutions to help users gain insights into the data: one based on querying and visualizing the graph, and one based on semantic search. The proposed approach ensures that users can verify the information by accessing the original data. While we report early results and prototypes developed in the context of an ongoing project, our proposal has already been applied in practice to real investigation data. As a consequence, we have had the chance to interact closely with prosecutors, collecting positive feedback but also identifying interesting opportunities and promising research directions to share with the research community.
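As a sketch of the graph-modeling step, the snippet below links a message node to the entities mentioned in its text using rdflib; the namespace, predicates, and the toy keyword matcher are placeholders for the project's end-to-end entity extraction model.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

EX = Namespace("http://example.org/case#")  # illustrative vocabulary

def toy_extract_entities(text, gazetteer):
    # Stand-in for the end-to-end NER model: simple case-insensitive lookup.
    return [name for name in gazetteer if name.lower() in text.lower()]

def enrich(graph, msg_id, sender, text, gazetteer):
    msg = URIRef(EX[f"message/{msg_id}"])
    graph.add((msg, RDF.type, EX.Message))
    graph.add((msg, EX.sender, Literal(sender)))
    graph.add((msg, EX.text, Literal(text)))       # original data kept for verification
    for name in toy_extract_entities(text, gazetteer):
        ent = URIRef(EX[f"person/{name.replace(' ', '_')}"])
        graph.add((ent, RDF.type, EX.Person))
        graph.add((msg, EX.mentions, ent))         # entity links back to the message

g = Graph()
enrich(g, "42", "suspect_A", "Meet Mario Rossi at the port tonight", ["Mario Rossi"])
print(g.serialize(format="turtle"))
```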




Natural language processing (NLP) is a key technology for extracting important patient information from clinical narratives to support healthcare applications. The rapid development of large language models (LLMs) has revolutionized many NLP tasks in the clinical domain, yet their optimal use in patient information extraction tasks requires further exploration. This study examines LLMs' effectiveness in patient information extraction, focusing on LLM architectures, fine-tuning strategies, and multi-task instruction tuning techniques for developing robust and generalizable patient information extraction systems. It explores three key aspects of using LLMs for clinical concept and relation extraction: (1) encoder-only versus decoder-only LLMs, (2) prompt-based parameter-efficient fine-tuning (PEFT) algorithms, and (3) the effect of multi-task instruction tuning on few-shot learning performance. We benchmarked a suite of LLMs, including encoder-based LLMs (BERT, GatorTron) and decoder-based LLMs (GatorTronGPT, Llama 3.1, GatorTronLlama), across five datasets. We compared traditional full-parameter fine-tuning with prompt-based PEFT. We also explored a multi-task instruction tuning framework that combines both tasks across four datasets, evaluating zero-shot and few-shot learning performance with a leave-one-dataset-out strategy.
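For concreteness, prompt-based PEFT of a decoder LLM can be set up along the following lines with the Hugging Face peft library, which trains only a small set of virtual prompt tokens while the base model stays frozen; the model name and soft-prompt length are placeholders, not the study's exact configuration.

```python
from peft import PromptTuningConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

# Load a frozen decoder LLM (placeholder checkpoint; access may be gated).
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B")

config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,  # extraction cast as text generation
    num_virtual_tokens=20,         # trainable soft-prompt length (assumed)
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # only a tiny fraction of weights is trained
```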
Current multi-object tracking (MOT) algorithms typically overlook issues inherent in low-quality videos, leading to significant degradation in tracking performance when confronted with real-world image deterioration. Advancing the application of MOT algorithms to real-world low-quality video is therefore a critical and meaningful endeavor. To address these challenges, and inspired by vision-language models, this paper proposes a Visual Semantic Enhancement-guided Multi-Object Tracking framework (VSE-MOT). Specifically, we first design a tri-branch architecture that leverages a vision-language model to extract global visual semantic information from images and fuse it with query vectors. To further enhance the utilization of visual semantic information, we then introduce the Multi-Object Tracking Adapter (MOT-Adapter) and the Visual Semantic Fusion Module (VSFM). The MOT-Adapter adapts the extracted global visual semantic information to the multi-object tracking task, while the VSFM improves the efficacy of feature fusion. Through extensive experiments, we validate the effectiveness and superiority of the proposed method in real-world low-quality video scenarios: its tracking metrics outperform those of existing methods by approximately 8% to 20%, while performance remains robust in conventional scenarios.
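While the exact VSFM design is specific to the framework, a generic cross-attention fusion of track queries with global visual-semantic tokens might look as follows; the dimensions and the residual design are assumptions, not the paper's module.

```python
import torch
import torch.nn as nn

class SemanticFusion(nn.Module):
    """Track queries attend to visual-semantic tokens from a vision-language model."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, queries, semantic_tokens):
        # queries: (B, N, D) track queries; semantic_tokens: (B, S, D) VLM features.
        fused, _ = self.attn(queries, semantic_tokens, semantic_tokens)
        return self.norm(queries + fused)   # residual keeps original query content

fusion = SemanticFusion()
out = fusion(torch.randn(2, 100, 256), torch.randn(2, 50, 256))
print(out.shape)  # torch.Size([2, 100, 256])
```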
Hyperspectral bands offer rich spectral and spatial information; however, their high dimensionality poses challenges for efficient processing. Band selection (BS) methods aim to extract a smaller subset of bands to reduce spectral redundancy. Existing approaches, such as ranking-based, clustering-based, and iterative methods, often suffer from sensitivity to initialization, parameter tuning, and high computational cost. This work introduces a BS strategy integrating three dependence measures: Average Band Correlation (ABC), Mutual Information (MI), and Variance Inflation Factor (VIF). ABC quantifies linear correlations between spectral bands, while MI measures uncertainty reduction relative to ground-truth labels. To address multicollinearity and reduce the search space, the approach first applies a VIF-based pre-selection of spectral bands. Subsequently, a clustering algorithm identifies the optimal subset of bands based on the ABC and MI values. Unlike previous methods, this approach is completely parameter-free, eliminating the need for optimal parameter estimation. The proposed method is evaluated on four standard benchmarks: the WHU-Hi-LongKou, Pavia University, Salinas, and Oil Spill datasets, and is compared with existing state-of-the-art approaches. There is significant overlap between the bands identified by our method and those selected by other methods, indicating that our approach effectively captures the most relevant spectral features. Further, support vector machine (SVM) experiments validate that VIF-driven pruning enhances classification by minimizing multicollinearity. Ablation studies confirm that combining ABC with MI yields robust, discriminative band subsets.
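A simplified sketch of the three measures on a pixels-by-bands matrix is given below; the VIF threshold and the final greedy ranking are illustrative stand-ins for the paper's clustering step.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LinearRegression

def vif(X):
    """Variance inflation factor per band: 1 / (1 - R^2) of band-vs-rest."""
    scores = []
    for j in range(X.shape[1]):
        others = np.delete(X, j, axis=1)
        r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
        scores.append(1.0 / max(1.0 - r2, 1e-12))
    return np.array(scores)

def select_bands(X, y, vif_max=10.0, k=10):
    keep = np.where(vif(X) < vif_max)[0]        # prune multicollinear bands first
    Xk = X[:, keep]
    # ABC: average absolute correlation of each surviving band with the others.
    abc = (np.abs(np.corrcoef(Xk, rowvar=False)).sum(axis=0) - 1) / (Xk.shape[1] - 1)
    # MI: relevance of each band to the ground-truth labels.
    mi = mutual_info_classif(Xk, y, random_state=0)
    score = mi - abc                            # high relevance, low redundancy
    return keep[np.argsort(score)[::-1][:k]]
```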
Generalist Anomaly Detection (GAD) aims to train a unified model on an original domain that can detect anomalies in new target domains. Previous GAD methods primarily use only normal samples as references, overlooking the valuable information contained in anomalous samples that are often available in real-world scenarios. To address this limitation, we propose a more practical approach: normal-abnormal-guided generalist anomaly detection, which leverages both normal and anomalous samples as references to guide anomaly detection across diverse domains. We introduce the Normal-Abnormal Generalist Learning (NAGL) framework, consisting of two key components: Residual Mining (RM) and Anomaly Feature Learning (AFL). RM extracts abnormal patterns from normal-abnormal reference residuals to establish transferable anomaly representations, while AFL adaptively learns anomaly features in query images through residual mapping to identify instance-aware anomalies. Our approach effectively utilizes both normal and anomalous references for more accurate and efficient cross-domain anomaly detection. Extensive experiments across multiple benchmarks demonstrate that our method significantly outperforms existing GAD approaches. This work is the first to adopt a mixture of normal and abnormal samples as references in generalist anomaly detection. The code and datasets are available at https://github.com/JasonKyng/NAGL.
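A toy sketch of the residual idea behind RM: subtract matched normal reference features from abnormal reference features to isolate anomaly directions, then score query patches by their alignment with those directions. The nearest-neighbor matching and max-similarity scoring below are illustrative choices, not the NAGL implementation.

```python
import torch
import torch.nn.functional as F

def residual_scores(f_query, f_normal, f_abnormal):
    """All inputs: (N_patches, D) L2-normalized patch features."""
    # Match each abnormal patch to its nearest normal patch, take the residual.
    sim = f_abnormal @ f_normal.T                 # (Na, Nn) cosine similarities
    nearest = f_normal[sim.argmax(dim=1)]         # (Na, D) matched normal patches
    residuals = F.normalize(f_abnormal - nearest, dim=1)
    # A query patch is anomalous if it aligns with any mined residual direction.
    return (f_query @ residuals.T).max(dim=1).values   # (Nq,) anomaly scores

q = F.normalize(torch.randn(196, 128), dim=1)
n = F.normalize(torch.randn(196, 128), dim=1)
a = F.normalize(torch.randn(196, 128), dim=1)
print(residual_scores(q, n, a).shape)  # torch.Size([196])
```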