Information extraction is the task of automatically deriving structured information from unstructured text.
The Anna Karenina Principle (AKP) holds that success requires satisfying a small set of essential conditions, whereas failure takes diverse forms. We test AKP, its reverse, and two further patterns, termed ordered and noisy, across novels, online encyclopedias, research papers, and movies. Texts are represented as sequences of functional blocks, and convergence is assessed in transition order and position. Results show that structural principles vary by medium: novels follow reverse AKP in order, Wikipedia combines AKP with ordered patterns, academic papers display reverse AKP in order but remain noisy in position, and movies diverge by genre. Success therefore depends on structural constraints specific to each medium, while failure takes different shapes across domains.
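As an illustration of how order convergence can be quantified, the sketch below (not the paper's code; the block labels and toy corpora are invented) measures within-group agreement of functional-block orderings via mean pairwise Kendall tau: under AKP the successful group should converge while the failed group scatters, and reverse AKP predicts the opposite.

```python
# Toy measurement of "convergence in transition order" for texts
# represented as sequences of functional blocks. Labels are assumptions.
from itertools import combinations
from scipy.stats import kendalltau

BLOCKS = ["intro", "setup", "conflict", "climax", "resolution"]

def first_appearance_ranks(seq):
    """Rank each block type by where it first appears in the text."""
    order = [b for i, b in enumerate(seq) if b not in seq[:i]]
    return [order.index(b) for b in BLOCKS]

def group_convergence(texts):
    """Mean pairwise Kendall tau of block orderings; 1.0 = identical order."""
    taus = [kendalltau(first_appearance_ranks(a),
                       first_appearance_ranks(b))[0]
            for a, b in combinations(texts, 2)]
    return sum(taus) / len(taus)

successful = [["intro", "setup", "conflict", "climax", "resolution"],
              ["intro", "setup", "conflict", "climax", "resolution"],
              ["intro", "conflict", "setup", "climax", "resolution"]]
failed = [["climax", "intro", "resolution", "setup", "conflict"],
          ["resolution", "conflict", "intro", "climax", "setup"],
          ["setup", "climax", "resolution", "intro", "conflict"]]

print("successful:", group_convergence(successful))  # high -> converged
print("failed:    ", group_convergence(failed))      # low  -> scattered
```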
Criminal investigations often involve the analysis of messages exchanged through instant messaging apps such as WhatsApp, a task that can be extremely labor-intensive. Our approach integrates knowledge graphs and NLP models to support this analysis by semantically enriching data collected from suspects' mobile phones, helping prosecutors and investigators search the data and gain valuable insights. Our semantic enrichment process involves extracting message data and modeling it as a knowledge graph, generating transcriptions of voice messages, and annotating the data using an end-to-end entity extraction approach. We adopt two different solutions to help users gain insights into the data: one based on querying and visualizing the graph, and one based on semantic search. The proposed approach ensures that users can verify the information by accessing the original data. While we report early results and prototypes developed in the context of an ongoing project, our proposal has already been applied to real investigation data. As a consequence, we had the chance to interact closely with prosecutors, collecting positive feedback as well as identifying interesting opportunities and promising research directions to share with the research community.
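A minimal sketch of the enrichment step, assuming an rdflib-based graph; the ontology terms, example message, and the stub entity extractor are illustrative placeholders, not the project's actual model.

```python
# Model messages as a knowledge graph, annotate extracted entities, and
# keep the original text so users can verify the information.
from rdflib import Graph, Namespace, Literal

EX = Namespace("http://example.org/case/")

def extract_entities(text):
    # Stand-in for the end-to-end entity extractor (e.g., an NER model).
    known = {"Rome": "Place", "Friday": "Date"}
    return [(w.strip(".,"), known[w.strip(".,")])
            for w in text.split() if w.strip(".,") in known]

g = Graph()
messages = [("msg1", "alice", "bob", "Meet me in Rome on Friday.")]
for mid, sender, receiver, text in messages:
    m = EX[mid]
    g.add((m, EX.sender, EX[sender]))
    g.add((m, EX.receiver, EX[receiver]))
    g.add((m, EX.text, Literal(text)))  # original data for verification
    for surface, etype in extract_entities(text):
        e = EX[surface]
        g.add((m, EX.mentions, e))
        g.add((e, EX.entityType, Literal(etype)))

# Investigators can then query the enriched graph, e.g. all messages
# mentioning a place:
q = """SELECT ?m ?t WHERE {
         ?m <http://example.org/case/mentions> ?e ;
            <http://example.org/case/text> ?t .
         ?e <http://example.org/case/entityType> "Place" . }"""
for row in g.query(q):
    print(row.m, row.t)
```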
Recent advances in soft robotic hands and tactile sensing have enabled robotic hands to perform increasingly complex tasks with the aid of machine learning. In particular, our previous work presented the GelSight Baby Fin Ray, which integrates a camera with a soft, compliant Fin Ray structure. Camera-based tactile sensing gives the GelSight Baby Fin Ray the ability to capture rich contact information such as forces, object geometries, and textures. Moreover, our previous work showed that the GelSight Baby Fin Ray can dig through clutter and classify in-shell nuts. To further examine the potential of the GelSight Baby Fin Ray, we leverage learning to distinguish nut-in-shell textures and to perform force and position estimation. We conduct ablation studies with popular neural network architectures, including ResNet50, GoogLeNet, and 3- and 5-layer convolutional neural network (CNN) structures. We conclude that machine learning is a promising technique for extracting useful information from high-resolution tactile images, empowering soft robots to better understand and interact with their environments.
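For context, a 3-layer CNN of the kind used in such ablations might look like the following PyTorch sketch; the input resolution, class count, and layer widths are assumptions, not the authors' exact configuration.

```python
# Small CNN mapping a GelSight-style RGB tactile image to class logits
# (e.g., nut-in-shell texture categories).
import torch
import torch.nn as nn

class TactileCNN(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

model = TactileCNN()
logits = model(torch.randn(8, 3, 224, 224))  # a batch of tactile frames
print(logits.shape)  # torch.Size([8, 4])
```

Swapping the classification head for a small regression head would give the force and position estimation variant.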
Deploying large language models (LLMs) for structured data extraction in domains such as financial compliance reporting, legal document analytics, and multilingual knowledge base construction is often impractical for smaller teams due to the high cost of running large architectures and the difficulty of preparing large, high-quality datasets. Most recent instruction-tuning studies focus on seven-billion-parameter or larger models, leaving limited evidence on whether much smaller models can work reliably under low-resource, multi-task conditions. This work presents ETLCH, a billion-parameter LLaMA-based model fine-tuned with low-rank adaptation on only a few hundred to one thousand samples per task for JSON extraction, knowledge graph extraction, and named entity recognition. Despite its small scale, ETLCH outperforms strong baselines across most evaluation metrics, with substantial gains observed even at the lowest data scale. These findings demonstrate that well-tuned small models can deliver stable and accurate structured outputs at a fraction of the computational cost, enabling cost-effective and reliable information extraction pipelines in resource-constrained environments.
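A minimal sketch of low-rank adaptation on a roughly billion-parameter LLaMA-style model in the spirit of ETLCH; the checkpoint name, target modules, and hyperparameters are assumptions, not the paper's exact recipe.

```python
# LoRA fine-tuning setup: only small adapter matrices in the attention
# projections are trained, keeping compute and data needs low.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "meta-llama/Llama-3.2-1B"  # placeholder ~1B checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora = LoraConfig(
    r=8, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # a tiny fraction of the base weights

# Each training sample would pair an instruction with the desired
# structured output, e.g. an entity-extraction prompt and a JSON string,
# with a few hundred to one thousand samples per task.
```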




Unsupervised multivariate time series (MTS) representation learning aims to extract compact and informative representations from raw sequences without relying on labels, enabling efficient transfer to diverse downstream tasks. In this paper, we propose Dual-Masked Autoencoder (DMAE), a novel masked time-series modeling framework for unsupervised MTS representation learning. DMAE formulates two complementary pretext tasks: (1) reconstructing masked values based on visible attributes, and (2) estimating latent representations of masked features, guided by a teacher encoder. To further improve representation quality, we introduce a feature-level alignment constraint that encourages the predicted latent representations to align with the teacher's outputs. By jointly optimizing these objectives, DMAE learns temporally coherent and semantically rich representations. Comprehensive evaluations across classification, regression, and forecasting tasks demonstrate that our approach achieves consistent and superior performance over competitive baselines.
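A minimal sketch of the two pretext objectives, under assumed architecture choices (a GRU encoder and step-wise masking); the actual DMAE architecture and masking scheme may differ.

```python
# Dual objectives: (1) reconstruct masked values, (2) align predicted
# latents of masked steps with a teacher encoder's outputs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, n_feat, d=64):
        super().__init__()
        self.proj = nn.Linear(n_feat, d)
        self.rnn = nn.GRU(d, d, batch_first=True)
    def forward(self, x):
        h, _ = self.rnn(self.proj(x))
        return h  # (B, T, d)

B, T, n_feat, d = 16, 50, 8, 64
student, teacher = Encoder(n_feat, d), Encoder(n_feat, d)
teacher.load_state_dict(student.state_dict())  # e.g. an EMA copy in practice
recon_head, latent_head = nn.Linear(d, n_feat), nn.Linear(d, d)

x = torch.randn(B, T, n_feat)
mask = torch.rand(B, T, 1) < 0.3          # mask ~30% of time steps
x_vis = x.masked_fill(mask, 0.0)

h = student(x_vis)
with torch.no_grad():
    z_teacher = teacher(x)                # targets from the full sequence

m_val, m_step = mask.expand_as(x), mask.squeeze(-1)
loss_recon = F.mse_loss(recon_head(h)[m_val], x[m_val])
loss_align = F.mse_loss(latent_head(h)[m_step], z_teacher[m_step])
(loss_recon + loss_align).backward()
```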
Generalist Anomaly Detection (GAD) aims to train a unified model on an original domain that can detect anomalies in new target domains. Previous GAD methods primarily use only normal samples as references, overlooking the valuable information contained in anomalous samples that are often available in real-world scenarios. To address this limitation, we propose a more practical approach: normal-abnormal-guided generalist anomaly detection, which leverages both normal and anomalous samples as references to guide anomaly detection across diverse domains. We introduce the Normal-Abnormal Generalist Learning (NAGL) framework, consisting of two key components: Residual Mining (RM) and Anomaly Feature Learning (AFL). RM extracts abnormal patterns from normal-abnormal reference residuals to establish transferable anomaly representations, while AFL adaptively learns anomaly features in query images through residual mapping to identify instance-aware anomalies. Our approach effectively utilizes both normal and anomalous references for more accurate and efficient cross-domain anomaly detection. Extensive experiments across multiple benchmarks demonstrate that our method significantly outperforms existing GAD approaches. This work is the first to adopt a mixture of normal and abnormal samples as references in generalist anomaly detection. The code and datasets are available at https://github.com/JasonKyng/NAGL.
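The residual idea behind RM can be sketched as follows; the feature shapes and the scoring rule are illustrative assumptions rather than the NAGL implementation.

```python
# Abnormal patterns are characterized by the feature residual between an
# abnormal reference and its normal counterpart, then matched against
# query residuals to produce a per-position anomaly map.
import torch
import torch.nn.functional as F

d, hw = 256, 14 * 14                     # feature dim, spatial positions
f_normal   = torch.randn(hw, d)          # normal reference features
f_abnormal = torch.randn(hw, d)          # abnormal reference features
f_query    = torch.randn(hw, d)          # query image features

# Reference residual: what the anomaly adds on top of normal appearance.
ref_residual = F.normalize(f_abnormal - f_normal, dim=-1)

# Query residual against the normal reference, scored by similarity to
# the mined anomaly directions.
q_residual = F.normalize(f_query - f_normal, dim=-1)
anomaly_map = (q_residual @ ref_residual.T).max(dim=-1).values  # (hw,)
print(anomaly_map.reshape(14, 14).shape)
```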
Characterizing the geometry of an object orbiting a star from its transit light curve is a powerful tool for uncovering various complex phenomena. This problem is inherently ill-posed, since similar or identical light curves can be produced by multiple different shapes. In this study, we investigate the extent to which the features of a shape can be embedded in a transit light curve. We generate a library of two-dimensional random shapes and simulate their transit light curves with the light curve simulator Yuti. Each shape is decomposed into a series of elliptical components expressed as Fourier coefficients that add progressively diminishing perturbations to an ideal ellipse. We train deep neural networks to predict these Fourier coefficients directly from simulated light curves. Our results demonstrate that the neural network can successfully reconstruct the low-order ellipses, which describe the overall shape, orientation, and large-scale perturbations. For higher-order ellipses, the scale is successfully determined, but the inference of eccentricity and orientation is limited, demonstrating the extent of shape information in the light curve. We explore the impact of non-convex shape features on reconstruction and show its dependence on shape orientation. The level of reconstruction achieved by the neural network underscores the utility of light curves as a means to extract geometric information from transiting systems.
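One common way to obtain such elliptical components is through Fourier coefficients of the complex boundary, as in the sketch below; the contour and truncation order are illustrative, and this parameterization is an assumption rather than necessarily the paper's exact one.

```python
# Decompose a closed 2-D contour into elliptical Fourier components.
import numpy as np

t = np.linspace(0, 2 * np.pi, 256, endpoint=False)
# A perturbed ellipse standing in for a random transiting shape.
x = 2.0 * np.cos(t) + 0.1 * np.cos(3 * t)
y = 1.0 * np.sin(t) + 0.05 * np.sin(5 * t)
z = x + 1j * y

c = np.fft.fft(z) / z.size            # complex Fourier coefficients
for k in range(1, 6):
    # Coefficients c[k] and c[-k] together trace the k-th ellipse;
    # higher orders contribute smaller perturbations here.
    print(k, abs(c[k]), abs(c[-k]))
```

A network trained on simulated light curves would then regress these complex coefficients (or their magnitudes and phases) as its targets.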
Hyperspectral bands offer rich spectral and spatial information; however, their high dimensionality poses challenges for efficient processing. Band selection (BS) methods aim to extract a smaller subset of bands to reduce spectral redundancy. Existing approaches, such as ranking-based, clustering-based, and iterative methods, often suffer from sensitivity to initialization, parameter tuning, and high computational cost. This work introduces a BS strategy integrating three dependence measures: Average Band Correlation (ABC), Mutual Information (MI), and Variance Inflation Factor (VIF). ABC quantifies linear correlations between spectral bands, while MI measures uncertainty reduction relative to ground-truth labels. To address multicollinearity and reduce the search space, the approach first applies a VIF-based pre-selection of spectral bands. Subsequently, a clustering algorithm is used to identify the optimal subset of bands based on the ABC and MI values. Unlike previous methods, this approach is completely parameter-free, eliminating the need for optimal parameter estimation. The proposed method is evaluated on four standard benchmark datasets: WHU-Hi-LongKou, Pavia University, Salinas, and Oil Spill, and is compared to existing state-of-the-art approaches. There is significant overlap between the bands identified by our method and those selected by other methods, indicating that our approach effectively captures the most relevant spectral features. Further, support vector machine (SVM) classification validates that VIF-driven pruning enhances classification by minimizing multicollinearity. Ablation studies confirm that combining ABC with MI yields robust, discriminative band subsets.
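A minimal sketch of the three measures on toy data follows; the VIF threshold, the random data, and the simplified scoring in place of the clustering step are illustrative assumptions.

```python
# VIF pre-selection, then ABC (redundancy) and MI (relevance) per band.
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))               # pixels x spectral bands (toy)
y = (X[:, 3] + X[:, 7] > 0).astype(int)      # toy ground-truth labels

def vif(X, j):
    """Variance Inflation Factor of band j against the other bands."""
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    return 1.0 / (1.0 - r2 + 1e-12)

# 1) VIF-based pre-selection: drop highly collinear bands.
keep = [j for j in range(X.shape[1]) if vif(X, j) < 5.0]
Xk = X[:, keep]

# 2) ABC: mean absolute correlation of each band with all others.
corr = np.abs(np.corrcoef(Xk, rowvar=False))
abc = (corr.sum(axis=0) - 1.0) / (corr.shape[0] - 1)

# 3) MI: relevance of each band to the ground-truth labels.
mi = mutual_info_classif(Xk, y, random_state=0)

# Low-redundancy, high-relevance bands would then be grouped by the
# clustering step; here a simple score ranks them instead.
score = mi - abc
print([keep[i] for i in np.argsort(score)[::-1][:5]])
```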




Data visualization is essential for interpreting complex datasets, yet traditional tools often require technical expertise, limiting accessibility. VizGen is an AI-assisted graph generation system that empowers users to create meaningful visualizations using natural language. Leveraging advanced NLP and LLMs like Claude 3.7 Sonnet and Gemini 2.0 Flash, it translates user queries into SQL and recommends suitable graph types. Built on a multi-agent architecture, VizGen handles SQL generation, graph creation, customization, and insight extraction. Beyond visualization, it analyzes data for patterns, anomalies, and correlations, and enhances user understanding by providing explanations enriched with contextual information gathered from the internet. The system supports real-time interaction with SQL databases and allows conversational graph refinement, making data analysis intuitive and accessible. VizGen democratizes data visualization by bridging the gap between technical complexity and user-friendly design.
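A minimal sketch of the query-to-chart pipeline; the hard-coded SQL stands in for the LLM agent, and the chart-type rule is a simplified stand-in for VizGen's recommendation logic.

```python
# Natural-language question -> SQL -> recommended chart.
import sqlite3
import matplotlib.pyplot as plt

def nl_to_sql(question):
    # Placeholder for the LLM agent (e.g., Claude 3.7 Sonnet or
    # Gemini 2.0 Flash) that translates the question into SQL.
    return "SELECT region, SUM(sales) FROM orders GROUP BY region"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, sales REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("north", 120), ("south", 90), ("north", 60), ("east", 75)])

rows = conn.execute(nl_to_sql("Total sales per region?")).fetchall()
labels, values = zip(*rows)

# Toy recommendation rule: one categorical axis + one numeric -> bar chart.
plt.bar(labels, values)
plt.title("Total sales per region")
plt.savefig("chart.png")
```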
Current multi-object tracking (MOT) algorithms typically overlook issues inherent in low-quality videos, leading to significant degradation in tracking performance when confronted with real-world image deterioration. Therefore, advancing the application of MOT algorithms in real-world low-quality video scenarios represents a critical and meaningful endeavor. To address the challenges posed by low-quality scenarios, inspired by vision-language models, this paper proposes a Visual Semantic Enhancement-guided Multi-Object Tracking framework (VSE-MOT). Specifically, we first design a tri-branch architecture that leverages a vision-language model to extract global visual semantic information from images and fuse it with query vectors. Subsequently, to further enhance the utilization of visual semantic information, we introduce the Multi-Object Tracking Adapter (MOT-Adapter) and the Visual Semantic Fusion Module (VSFM). The MOT-Adapter adapts the extracted global visual semantic information to suit multi-object tracking tasks, while the VSFM improves the efficacy of feature fusion. Through extensive experiments, we validate the effectiveness and superiority of the proposed method in real-world low-quality video scenarios. Its tracking performance metrics outperform those of existing methods by approximately 8% to 20%, while maintaining robust performance in conventional scenarios.
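The fusion of global visual-semantic information with query vectors can be sketched as a cross-attention block like the one below; the dimensions and the placeholder semantic token are illustrative assumptions, not the actual MOT-Adapter or VSFM design.

```python
# Inject a global visual-semantic embedding into tracking queries.
import torch
import torch.nn as nn

class SemanticFusion(nn.Module):
    """Fuse N tracking queries with a global visual-semantic token."""
    def __init__(self, d=256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, num_heads=8, batch_first=True)
        self.norm = nn.LayerNorm(d)

    def forward(self, queries, sem_token):
        # queries: (B, N, d); sem_token: (B, 1, d), e.g. a projected
        # CLIP-style global image embedding from a vision-language model.
        fused, _ = self.attn(queries, sem_token, sem_token)
        return self.norm(queries + fused)

B, N, d = 2, 100, 256
queries = torch.randn(B, N, d)        # detection/track query vectors
sem_token = torch.randn(B, 1, d)      # global visual-semantic feature
print(SemanticFusion(d)(queries, sem_token).shape)  # (2, 100, 256)
```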