The relationship between acute kidney injury (AKI) prediction and nephrotoxic drugs, or drugs that adversely affect kidney function, is one that has yet to be explored in the critical care setting. One contributing factor to this gap in research is the limited investigation of drug modalities in the intensive care unit (ICU) context, due to the challenges of processing prescription data into the corresponding drug representations and a lack in the comprehensive understanding of these drug representations. This study addresses this gap by proposing a novel approach that leverages patient prescription data as a modality to improve existing models for AKI prediction. We base our research on Electronic Health Record (EHR) data, extracting the relevant patient prescription information and converting it into the selected drug representation for our research, the extended-connectivity fingerprint (ECFP). Furthermore, we adopt a unique multimodal approach, developing machine learning models and 1D Convolutional Neural Networks (CNN) applied to clinical drug representations, establishing a procedure which has not been used by any previous studies predicting AKI. The findings showcase a notable improvement in AKI prediction through the integration of drug embeddings and other patient cohort features. By using drug features represented as ECFP molecular fingerprints along with common cohort features such as demographics and lab test values, we achieved a considerable improvement in model performance for the AKI prediction task over the baseline model which does not include the drug representations as features, indicating that our distinct approach enhances existing baseline techniques and highlights the relevance of drug data in predicting AKI in the ICU setting
Concepts abound in mathematics and its applications. They vary greatly between subject areas, and new ones are introduced in each mathematical paper or application. A formal theory builds a hierarchy of definitions, theorems and proofs that reference each other. When an AI agent is proving a new theorem, most of the mathematical concepts and lemmas relevant to that theorem may have never been seen during training. This is especially true in the Coq proof assistant, which has a diverse library of Coq projects, each with its own definitions, lemmas, and even custom tactic procedures used to prove those lemmas. It is essential for agents to incorporate such new information into their knowledge base on the fly. We work towards this goal by utilizing a new, large-scale, graph-based dataset for machine learning in Coq. We leverage a faithful graph-representation of Coq terms that induces a directed graph of dependencies between definitions to create a novel graph neural network, Graph2Tac (G2T), that takes into account not only the current goal, but also the entire hierarchy of definitions that led to the current goal. G2T is an online model that is deeply integrated into the users' workflow and can adapt in real time to new Coq projects and their definitions. It complements well with other online models that learn in real time from new proof scripts. Our novel definition embedding task, which is trained to compute representations of mathematical concepts not seen during training, boosts the performance of the neural network to rival state-of-the-art k-nearest neighbor predictors.
Pan-sharpening involves reconstructing missing high-frequency information in multi-spectral images with low spatial resolution, using a higher-resolution panchromatic image as guidance. Although the inborn connection with frequency domain, existing pan-sharpening research has not almost investigated the potential solution upon frequency domain. To this end, we propose a novel Frequency Adaptive Mixture of Experts (FAME) learning framework for pan-sharpening, which consists of three key components: the Adaptive Frequency Separation Prediction Module, the Sub-Frequency Learning Expert Module, and the Expert Mixture Module. In detail, the first leverages the discrete cosine transform to perform frequency separation by predicting the frequency mask. On the basis of generated mask, the second with low-frequency MOE and high-frequency MOE takes account for enabling the effective low-frequency and high-frequency information reconstruction. Followed by, the final fusion module dynamically weights high-frequency and low-frequency MOE knowledge to adapt to remote sensing images with significant content variations. Quantitative and qualitative experiments over multiple datasets demonstrate that our method performs the best against other state-of-the-art ones and comprises a strong generalization ability for real-world scenes. Code will be made publicly at \url{https://github.com/alexhe101/FAME-Net}.
Quantitative measurement of crystals in high-resolution images allows for important insights into underlying material characteristics. Deep learning has shown great progress in vision-based automatic crystal size measurement, but current instance segmentation methods reach their limits with images that have large variation in crystal size or hard to detect crystal boundaries. Even small image segmentation errors, such as incorrectly fused or separated segments, can significantly lower the accuracy of the measured results. Instead of improving the existing pixel-wise boundary segmentation methods, we propose to use an instance-based segmentation method, which gives more robust segmentation results to improve measurement accuracy. Our novel method enhances flow maps with a size-aware multi-scale attention module. The attention module adaptively fuses information from multiple scales and focuses on the most relevant scale for each segmented image area. We demonstrate that our proposed attention fusion strategy outperforms state-of-the-art instance and boundary segmentation methods, as well as simple average fusion of multi-scale predictions. We evaluate our method on a refractory raw material dataset of high-resolution images with large variation in crystal size and show that our model can be used to calculate the crystal size more accurately than existing methods.
Human communication is a complex and diverse process that not only involves multiple factors such as language, commonsense, and cultural backgrounds but also requires the participation of multimodal information, such as speech. Large Language Model (LLM)-based multi-agent systems have demonstrated promising performance in simulating human society. Can we leverage LLM-based multi-agent systems to simulate human communication? However, current LLM-based multi-agent systems mainly rely on text as the primary medium. In this paper, we propose SpeechAgents, a multi-modal LLM based multi-agent system designed for simulating human communication. SpeechAgents utilizes multi-modal LLM as the control center for individual agent and employes multi-modal signals as the medium for exchanged messages among agents. Additionally, we propose Multi-Agent Tuning to enhance the multi-agent capabilities of LLM without compromising general abilities. To strengthen and evaluate the effectiveness of human communication simulation, we build the Human-Communication Simulation Benchmark. Experimental results demonstrate that SpeechAgents can simulate human communication dialogues with consistent content, authentic rhythm, and rich emotions and demonstrate excellent scalability even with up to 25 agents, which can apply to tasks such as drama creation and audio novels generation. Code and models will be open-sourced at https://github. com/0nutation/SpeechAgents
Both space and ground communications have been proven effective solutions under different perspectives in Internet of Things (IoT) networks. This paper investigates multiple-access scenarios, where plenty of IoT users are cooperatively served by a satellite in space and access points (APs) on the ground. Available users in each coherence interval are split into scheduled and unscheduled subsets to optimize limited radio resources. We compute the uplink ergodic throughput of each scheduled user under imperfect channel state information (CSI) and non-orthogonal pilot signals. As maximum-radio combining is deployed locally at the ground gateway and the APs, the uplink ergodic throughput is obtained in a closed-form expression. The analytical results explicitly unveil the effects of channel conditions and pilot contamination on each scheduled user. By maximizing the sum throughput, the system can simultaneously determine scheduled users and perform power allocation based on either a model-based approach with alternating optimization or a learning-based approach with the graph neural network. Numerical results manifest that integrated satellite-terrestrial cell-free massive multiple-input multiple-output systems can significantly improve the sum ergodic throughput over coherence intervals. The integrated systems can schedule the vast majority of users; some might be out of service due to the limited power budget.
Traditional manual detection for solder joint defect is no longer applied during industrial production due to low efficiency, inconsistent evaluation, high cost and lack of real-time data. A new approach has been proposed to address the issues of low accuracy, high false detection rates and computational cost of solder joint defect detection in surface mount technology of industrial scenarios. The proposed solution is a hybrid attention mechanism designed specifically for the solder joint defect detection algorithm to improve quality control in the manufacturing process by increasing the accuracy while reducing the computational cost. The hybrid attention mechanism comprises a proposed enhanced multi-head self-attention and coordinate attention mechanisms increase the ability of attention networks to perceive contextual information and enhances the utilization range of network features. The coordinate attention mechanism enhances the connection between different channels and reduces location information loss. The hybrid attention mechanism enhances the capability of the network to perceive long-distance position information and learn local features. The improved algorithm model has good detection ability for solder joint defect detection, with mAP reaching 91.5%, 4.3% higher than the You Only Look Once version 5 algorithm and better than other comparative algorithms. Compared to other versions, mean Average Precision, Precision, Recall, and Frame per Seconds indicators have also improved. The improvement of detection accuracy can be achieved while meeting real-time detection requirements.
The importance of the information in the direct sound to human perception of spatial sound sources is an ongoing research topic. The classification between direct sound and diffuse or reverberant sound forms the basis of numerous studies in the field of spatial audio. In particular, parametric spatial audio representation methods use this classification and employ signal processing in order to enhance the audio quality at reproduction. However, current literature does not provide information concerning the impact of ideal direct sound representation on externalization, in the context of Ambisonics. This paper aims to assess the importance of the spatial information in the direct sound in the externalization of a sound field when using binaural reproduction. This is done in the spherical harmonics (SH) domain, where an ideal direct sound representation within an otherwise Ambisonics signal is simulated, and its perceived externalization is evaluated in a formal listening test. This investigation leads to the conclusion that externalization of a first order Ambisonics signal may be significantly improved by enhancing the direct sound component, up to a level similar to a third order Ambisonics signal.
Recent work on Neural Radiance Fields (NeRF) exploits multi-view 3D consistency, achieving impressive results in 3D scene modeling and high-fidelity novel-view synthesis. However, there are limitations. First, existing methods assume enough high-quality images are available for training the NeRF model, ignoring real-world image degradation. Second, previous methods struggle with ambiguity in the training set due to unmodeled inconsistencies among different views. In this work, we present RustNeRF for real-world high-quality NeRF. To improve NeRF's robustness under real-world inputs, we train a 3D-aware preprocessing network that incorporates real-world degradation modeling. We propose a novel implicit multi-view guidance to address information loss during image degradation and restoration. Extensive experiments demonstrate RustNeRF's advantages over existing approaches under real-world degradation. The code will be released.
Accurate air quality forecasting is crucial for public health, environmental monitoring and protection, and urban planning. However, existing methods fail to effectively utilize multi-scale information, both spatially and temporally. Spatially, there is a lack of integration between individual monitoring stations and city-wide scales. Temporally, the periodic nature of air quality variations is often overlooked or inadequately considered. To address these limitations, we present a novel Multi-spatial Multi-temporal air quality forecasting method based on Graph Convolutional Networks and Gated Recurrent Units (M2G2), bridging the gap in air quality forecasting across spatial and temporal scales. The proposed framework consists of two modules: Multi-scale Spatial GCN (MS-GCN) for spatial information fusion and Multi-scale Temporal GRU(MT-GRU) for temporal information integration. In the spatial dimension, the MS-GCN module employs a bidirectional learnable structure and a residual structure, enabling comprehensive information exchange between individual monitoring stations and the city-scale graph. Regarding the temporal dimension, the MT-GRU module adaptively combines information from different temporal scales through parallel hidden states. Leveraging meteorological indicators and four air quality indicators, we present comprehensive comparative analyses and ablation experiments, showcasing the higher accuracy of M2G2 in comparison to nine currently available advanced approaches across all aspects. The improvements of M2G2 over the second-best method on RMSE of the 24h/48h/72h are as follows: PM2.5: (7.72%, 6.67%, 10.45%); PM10: (6.43%, 5.68%, 7.73%); NO2: (5.07%, 7.76%, 16.60%); O3: (6.46%, 6.86%, 9.79%). Furthermore, we demonstrate the effectiveness of each module of M2G2 by ablation study.