Language model intelligence is revolutionizing the way we program materials simulations. However, the diversity of simulation scenarios renders it challenging to precisely transform human language into a tailored simulator. Here, using three functionalized types of language model, we propose a language-to-simulation (Lang2Sim) framework that enables interactive navigation on languaging a simulation engine, by taking a scenario instance of water sorption in porous matrices. Unlike line-by-line coding of a target simulator, the language models interpret each simulator as an assembly of invariant tool function and its variant input-output pair. Lang2Sim enables the precise transform of textual description by functionalizing and sequentializing the language models of, respectively, rationalizing the tool categorization, customizing its input-output combinations, and distilling the simulator input into executable format. Importantly, depending on its functionalized type, each language model features a distinct processing of chat history to best balance its memory limit and information completeness, thus leveraging the model intelligence to unstructured nature of human request. Overall, this work establishes language model as an intelligent platform to unlock the era of languaging a simulation engine.
Autism Spectrum Disorder (ASD) can often make life difficult for children, therefore early diagnosis is necessary for proper treatment and care. Thus, in this work, we consider the problem of detecting or classifying ASD in children to aid medical professionals in early detection. To this end, we develop a deep learning model that analyzes video clips of children reacting to sensory stimuli, with the intent on capturing key differences in reactions and behavior between ASD and non-ASD patients. Unlike many works in ASD classification, their data consist of MRI data, which requires expensive specialized MRI equipment, meanwhile our method need only rely on a powerful but relatively cheaper GPU, a decent computer setup, and a video camera for inference. Results on our data show that our model can generalize well and can understand key differences in the distinct movements of the patients. This is despite limited amounts of data for a deep learning problem, limited temporal information available to the model as input, and even when there is noise due to movement.
In this paper, the problem of joint transmission and computation resource allocation for a multi-user probabilistic semantic communication (PSC) network is investigated. In the considered model, users employ semantic information extraction techniques to compress their large-sized data before transmitting them to a multi-antenna base station (BS). Our model represents large-sized data through substantial knowledge graphs, utilizing shared probability graphs between the users and the BS for efficient semantic compression. The resource allocation problem is formulated as an optimization problem with the objective of maximizing the sum of equivalent rate of all users, considering total power budget and semantic resource limit constraints. The computation load considered in the PSC network is formulated as a non-smooth piecewise function with respect to the semantic compression ratio. To tackle this non-convex non-smooth optimization challenge, a three-stage algorithm is proposed where the solutions for the receive beamforming matrix of the BS, transmit power of each user, and semantic compression ratio of each user are obtained stage by stage. Numerical results validate the effectiveness of our proposed scheme.
Detecting objects across various scales remains a significant challenge in computer vision, particularly in tasks such as Rice Leaf Disease (RLD) detection, where objects exhibit considerable scale variations. Traditional object detection methods often struggle to address these variations, resulting in missed detections or reduced accuracy. In this study, we propose the multi-scale Attention Pyramid module (mAPm), a novel approach that integrates dilated convolutions into the Feature Pyramid Network (FPN) to enhance multi-scale information ex-traction. Additionally, we incorporate a global Multi-Head Self-Attention (MHSA) mechanism and a deconvolutional layer to refine the up-sampling process. We evaluate mAPm on YOLOv7 using the MRLD and COCO datasets. Compared to vanilla FPN, BiFPN, NAS-FPN, PANET, and ACFPN, mAPm achieved a significant improvement in Average Precision (AP), with a +2.61% increase on the MRLD dataset compared to the baseline FPN method in YOLOv7. This demonstrates its effectiveness in handling scale variations. Furthermore, the versatility of mAPm allows its integration into various FPN-based object detection models, showcasing its potential to advance object detection techniques.
Large Language Models (LLMs) have achieved impressive results in Machine Translation (MT). However, careful evaluations by human reveal that the translations produced by LLMs still contain multiple errors. Importantly, feeding back such error information into the LLMs can lead to self-correction and result in improved translation performance. Motivated by these insights, we introduce a systematic LLM-based self-correcting translation framework, named TER, which stands for Translate, Estimate, and Refine, marking a significant step forward in this direction. Our findings demonstrate that 1) our self-correction framework successfully assists LLMs in improving their translation quality across a wide range of languages, whether it's from high-resource languages to low-resource ones or whether it's English-centric or centered around other languages; 2) TER exhibits superior systematicity and interpretability compared to previous methods; 3) different estimation strategies yield varied impacts on AI feedback, directly affecting the effectiveness of the final corrections. We further compare different LLMs and conduct various experiments involving self-correction and cross-model correction to investigate the potential relationship between the translation and evaluation capabilities of LLMs.
In the digital realm, rich data serves as a crucial source of insights into the complexities of social, political, and economic landscapes. Addressing the growing need for high-quality information on events and the imperative to combat hate speech, this research led to the establishment of the Shared Task on Climate Activism Stance and Hate Event Detection at CASE 2024. Focused on climate activists contending with hate speech on social media, our study contributes to hate speech identification from tweets. Analyzing three sub-tasks - Hate Speech Detection (Sub-task A), Targets of Hate Speech Identification (Sub-task B), and Stance Detection (Sub-task C) - Team Z-AGI Labs evaluated various models, including LSTM, Xgboost, and LGBM based on Tf-Idf. Results unveiled intriguing variations, with Catboost excelling in Subtask-B (F1: 0.5604) and Subtask-C (F1: 0.7081), while LGBM emerged as the top-performing model for Subtask-A (F1: 0.8684). This research provides valuable insights into the suitability of classical machine learning models for climate hate speech and stance detection, aiding informed model selection for robust mechanisms.
Automated classification of liver lesions in multi-phase CT and MR scans is of clinical significance but challenging. This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework, specifically designed for liver lesion classification in 3D multi-phase CT and MR imaging with varying phase counts. The proposed SDR-Former utilizes a streamlined Siamese Neural Network (SNN) to process multi-phase imaging inputs, possessing robust feature representations while maintaining computational efficiency. The weight-sharing feature of the SNN is further enriched by a hybrid Dual-Resolution Transformer (DR-Former), comprising a 3D Convolutional Neural Network (CNN) and a tailored 3D Transformer for processing high- and low-resolution images, respectively. This hybrid sub-architecture excels in capturing detailed local features and understanding global contextual information, thereby, boosting the SNN's feature extraction capabilities. Additionally, a novel Adaptive Phase Selection Module (APSM) is introduced, promoting phase-specific intercommunication and dynamically adjusting each phase's influence on the diagnostic outcome. The proposed SDR-Former framework has been validated through comprehensive experiments on two clinical datasets: a three-phase CT dataset and an eight-phase MR dataset. The experimental results affirm the efficacy of the proposed framework. To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public. This pioneering dataset, being the first publicly available multi-phase MR dataset in this field, also underpins the MICCAI LLD-MMRI Challenge. The dataset is accessible at:https://bit.ly/3IyYlgN.
Decision-making is a fundamental capability in everyday life. Large Language Models (LLMs) provide multifaceted support in enhancing human decision-making processes. However, understanding the influencing factors of LLM-assisted decision-making is crucial for enabling individuals to utilize LLM-provided advantages and minimize associated risks in order to make more informed and better decisions. This study presents the results of a comprehensive literature analysis, providing a structural overview and detailed analysis of determinants impacting decision-making with LLM support. In particular, we explore the effects of technological aspects of LLMs, including transparency and prompt engineering, psychological factors such as emotions and decision-making styles, as well as decision-specific determinants such as task difficulty and accountability. In addition, the impact of the determinants on the decision-making process is illustrated via multiple application scenarios. Drawing from our analysis, we develop a dependency framework that systematizes possible interactions in terms of reciprocal interdependencies between these determinants. Our research reveals that, due to the multifaceted interactions with various determinants, factors such as trust in or reliance on LLMs, the user's mental model, and the characteristics of information processing are identified as significant aspects influencing LLM-assisted decision-making processes. Our findings can be seen as crucial for improving decision quality in human-AI collaboration, empowering both users and organizations, and designing more effective LLM interfaces. Additionally, our work provides a foundation for future empirical investigations on the determinants of decision-making assisted by LLMs.
Recently, the application of Contrastive Representation Learning (CRL) in learning with noisy labels (LNL) has shown promising advancements due to its remarkable ability to learn well-distributed representations for better distinguishing noisy labels. However, CRL is mainly used as a pre-training technique, leading to a complicated multi-stage training pipeline. We also observed that trivially combining CRL with supervised LNL methods decreases performance. Using different images from the same class as negative pairs in CRL creates optimization conflicts between CRL and the supervised loss. To address these two issues, we propose an end-to-end PLReMix framework that avoids the complicated pipeline by introducing a Pseudo-Label Relaxed (PLR) contrastive loss to alleviate the conflicts between losses. This PLR loss constructs a reliable negative set of each sample by filtering out its inappropriate negative pairs that overlap at the top k indices of prediction probabilities, leading to more compact semantic clusters than vanilla CRL. Furthermore, a two-dimensional Gaussian Mixture Model (GMM) is adopted to distinguish clean and noisy samples by leveraging semantic information and model outputs simultaneously, which is expanded on the previously widely used one-dimensional form. The PLR loss and a semi-supervised loss are simultaneously applied to train on the GMM divided clean and noisy samples. Experiments on multiple benchmark datasets demonstrate the effectiveness of the proposed method. Our proposed PLR loss is scalable, which can be easily integrated into other LNL methods and boost their performance. Codes will be available.
Graph Neural Networks (GNNs) have gained considerable traction for their capability to effectively process topological data, yet their interpretability remains a critical concern. Current interpretation methods are dominated by post-hoc explanations to provide a transparent and intuitive understanding of GNNs. However, they have limited performance in interpreting complicated subgraphs and can't utilize the explanation to advance GNN predictions. On the other hand, transparent GNN models are proposed to capture critical subgraphs. While such methods could improve GNN predictions, they usually don't perform well on explanations. Thus, it is desired for a new strategy to better couple GNN explanation and prediction. In this study, we have developed a novel interpretable causal GNN framework that incorporates retrieval-based causal learning with Graph Information Bottleneck (GIB) theory. The framework could semi-parametrically retrieve crucial subgraphs detected by GIB and compress the explanatory subgraphs via a causal module. The framework was demonstrated to consistently outperform state-of-the-art methods, and to achieve 32.71\% higher precision on real-world explanation scenarios with diverse explanation types. More importantly, the learned explanations were shown able to also improve GNN prediction performance.