Abstract:A comprehensive qualitative evaluation framework for large language models (LLM) in healthcare that expands beyond traditional accuracy and quantitative metrics needed. We propose 5 key aspects for evaluation of LLMs: Safety, Consensus, Objectivity, Reproducibility and Explainability (S.C.O.R.E.). We suggest that S.C.O.R.E. may form the basis for an evaluation framework for future LLM-based models that are safe, reliable, trustworthy, and ethical for healthcare and clinical applications.
Abstract:This study tackles the complexities of global supply chains, which are increasingly vulnerable to disruptions caused by port congestion, material shortages, and inflation. To address these challenges, we explore the application of machine learning methods, which excel in predicting and optimizing solutions based on large datasets. Our focus is on enhancing supply chain security through fraud detection, maintenance prediction, and material backorder forecasting. We introduce an automated machine learning framework that streamlines data analysis, model construction, and hyperparameter optimization for these tasks. By automating these processes, our framework improves the efficiency and effectiveness of supply chain security measures. Our research identifies key factors that influence machine learning performance, including sampling methods, categorical encoding, feature selection, and hyperparameter optimization. We demonstrate the importance of considering these factors when applying machine learning to supply chain challenges. Traditional mathematical programming models often struggle to cope with the complexity of large-scale supply chain problems. Our study shows that machine learning methods can provide a viable alternative, particularly when dealing with extensive datasets and complex patterns. The automated machine learning framework presented in this study offers a novel approach to supply chain security, contributing to the existing body of knowledge in the field. Its comprehensive automation of machine learning processes makes it a valuable contribution to the domain of supply chain management.
Abstract:Graph theory has been a powerful tool in solving difficult and complex problems arising in all disciplines. In particular, graph matching is a classical problem in pattern analysis with enormous applications. Many graph problems have been formulated as a mathematical program and then solved using exact, heuristic, and/or approximated-guaranteed procedures. On the other hand, graph theory has been a powerful tool in visualizing and understanding complex mathematical programming problems, especially integer programs. Formulating a graph problem as a natural integer program (IP) is often a challenging task. However, an IP formulation of the problem has many advantages. Several researchers have noted the need for natural IP formulation of graph theoretic problems. The present study aims to provide a unified framework for IP formulation of graph-matching problems. Although there are many surveys on graph matching problems, none is concerned with IP formulation. This paper is the first to provide a comprehensive IP formulation for such problems. The framework includes a variety of graph optimization problems in the literature. While these problems have been studied by different research communities, however, the framework presented here helps to bring efforts from different disciplines to tackle such diverse and complex problems. We hope the present study can significantly help to simplify some of the difficult problems arising in practice, especially in pattern analysis.
Abstract:We propose a novel constraint called Multiple Spectral filter Operators Preservation (MSFOR) to compute functional maps and based on it, develop an efficient deep functional map architecture called Deep MSFOP for shape matching. The core idea is that, instead of using the general descriptor preservation constraint, we require our maps to preserve multiple spectral filter operators. This allows us to incorporate more informative geometrical information, contained in different frequency bands of functions, into the functional map computing. This can be confirmed by that some previous techniques like wavelet preservation and LBO commutativity are actually our special cases. Moreover, we also develop a very efficient way to compute the maps with MSFOP constraint, which can be conveniently embedded into the deep learning, especially having learnable filter operators. Utilizing the above results, we finally design our Deep MSFOP pipeline, equipped with a suitable unsupervised loss jointly penalizing the functional map and the underlying pointwise map. Our deep functional map has notable advantages, including that the functional map is more geometrically informative and guaranteed to be proper, and the computing is numerically stable. Extensive experimental results on different datasets demonstrate that our approach outperforms the existing state-of-the-art methods, especially in challenging settings like non-isometric and inconsistent topology datasets.
Abstract:Video Question Answering (VideoQA) aims to answer natural language questions based on the information observed in videos. Despite the recent success of Large Multimodal Models (LMMs) in image-language understanding and reasoning, they deal with VideoQA insufficiently by simply taking uniformly sampled frames as visual inputs, which ignores question-relevant visual clues. Moreover, there are no human annotations for question-critical timestamps in existing VideoQA datasets. In light of this, we propose a novel weakly supervised framework to enforce the LMMs to reason out the answers with question-critical moments as visual inputs. Specifically, we fuse the question and answer pairs as event descriptions to find multiple keyframes as target moments, which will be pseudo-labels. With these pseudo-labels as additionally weak supervision, we devise a lightweight Gaussian-based Contrastive Grounding (GCG) module. GCG learns multiple Gaussian functions to characterize the temporal structure of the video, and sample question-critical frames as positive moments to be the visual inputs of LMMs. Extensive experiments on several VideoQA benchmarks verify the effectiveness of our framework, and we achieve substantial improvements compared to previous state-of-the-art methods.
Abstract:Despite the remarkable results that can be achieved by data-driven intelligent fault diagnosis techniques, they presuppose the same distribution of training and test data as well as sufficient labeled data. Various operating states often exist in practical scenarios, leading to the problem of domain shift that hinders the effectiveness of fault diagnosis. While recent unsupervised domain adaptation methods enable cross-domain fault diagnosis, they struggle to effectively utilize information from multiple source domains and achieve effective diagnosis faults in multiple target domains simultaneously. In this paper, we innovatively proposed a weighted joint maximum mean discrepancy enabled multi-source-multi-target unsupervised domain adaptation (WJMMD-MDA), which realizes domain adaptation under multi-source-multi-target scenarios in the field of fault diagnosis for the first time. The proposed method extracts sufficient information from multiple labeled source domains and achieves domain alignment between source and target domains through an improved weighted distance loss. As a result, domain-invariant and discriminative features between multiple source and target domains are learned with cross-domain fault diagnosis realized. The performance of the proposed method is evaluated in comprehensive comparative experiments on three datasets, and the experimental results demonstrate the superiority of this method.
Abstract:In this paper, we consider the channel modeling of a heterogeneous vehicular integrated sensing and communication (ISAC) system, where a dual-functional multi-antenna base station (BS) intends to communicate with a multi-antenna vehicular receiver (MR) and sense the surrounding environments simultaneously. The time-varying complex channel impulse responses (CIRs) of the sensing and communication channels are derived, respectively, in which the sensing and communication channels are correlated with shared clusters. The proposed models show great generality for the capability in covering both monostatic and bistatic sensing scenarios, and as well for considering both static clusters/targets and mobile clusters/targets. Important channel statistical characteristics, including time-varying spatial cross-correlation function (CCF) and temporal auto-correlation function (ACF), are derived and analyzed. Numerically results are provided to show the propagation characteristics of the proposed ISAC channel model. Finally, the proposed model is validated via the agreement between theoretical and simulated as well as measurement results.
Abstract:The efficient utilization of wind power by wind turbines relies on the ability of their pitch systems to adjust blade pitch angles in response to varying wind speeds. However, the presence of multiple fault types in the pitch system poses challenges in accurately classifying these faults. This paper proposes a novel method based on hard sample mining-enabled contrastive feature learning (HSMCFL) to address this problem. The proposed method employs cosine similarity to identify hard samples and subsequently leverages contrastive feature learning to enhance representation learning through the construction of hard sample pairs. Furthermore, a multilayer perceptron is trained using the learned discriminative representations to serve as an efficient classifier. To evaluate the effectiveness of the proposed method, two real datasets comprising wind turbine pitch system cog belt fracture data are utilized. The fault diagnosis performance of the proposed method is compared against existing methods, and the results demonstrate its superior performance. The proposed approach exhibits significant improvements in fault diagnosis accuracy, providing promising prospects for enhancing the reliability and efficiency of wind turbine pitch system fault diagnosis.
Abstract:Under the background of 5G, Internet, artificial intelligence technol,ogy and robot technology, warehousing, and logistics robot technology has developed rapidly, and products have been widely used. A practical application is to help warehouse personnel pick up or deliver heavy goods at dispersed locations based on dynamic routes. However, programs that can only accept instructions or pre-set by the system do not have more flexibility, but existing human auto-following techniques either cannot accurately identify specific targets or require a combination of lasers and cameras that are cumbersome and do not accomplish obstacle avoidance well. This paper proposed an algorithm that combines DeepSort and a width-based tracking module to track targets and use artificial potential field local path planning to avoid obstacles. The evaluation is performed in a self-designed flat bounded test field and simulated in ROS. Our method achieves the SOTA results on following and successfully reaching the end-point without hitting obstacles.
Abstract:This paper describes the submission of the RoyalFlush neural machine translation system for the WMT 2022 translation efficiency task. Unlike the commonly used autoregressive translation system, we adopted a two-stage translation paradigm called Hybrid Regression Translation (HRT) to combine the advantages of autoregressive and non-autoregressive translation. Specifically, HRT first autoregressively generates a discontinuous sequence (e.g., make a prediction every $k$ tokens, $k>1$) and then fills in all previously skipped tokens at once in a non-autoregressive manner. Thus, we can easily trade off the translation quality and speed by adjusting $k$. In addition, by integrating other modeling techniques (e.g., sequence-level knowledge distillation and deep-encoder-shallow-decoder layer allocation strategy) and a mass of engineering efforts, HRT improves 80\% inference speed and achieves equivalent translation performance with the same-capacity AT counterpart. Our fastest system reaches 6k+ words/second on the GPU latency setting, estimated to be about 3.1x faster than the last year's winner.