Multi-modal machine learning (MMML), which involves integrating multiple modalities of data and their corresponding processing methods, has demonstrated promising results in various practical applications, such as text-to-image translation. This review paper summarizes the recent progress and challenges in using MMML for engineering design tasks. First, we introduce the different data modalities commonly used as design representations and involved in MMML, including text, 2D pixel data (e.g., images and sketches), and 3D shape data (e.g., voxels, point clouds, and meshes). We then provide an overview of the various approaches and techniques used for representing, fusing, aligning, synthesizing, and co-learning multi-modal data as five fundamental concepts of MMML. Next, we review the state-of-the-art capabilities of MMML that potentially apply to engineering design tasks, including design knowledge retrieval, design evaluation, and design synthesis. We also highlight the potential benefits and limitations of using MMML in these contexts. Finally, we discuss the challenges and future directions in using MMML for engineering design, such as the need for large labeled multi-modal design datasets, robust and scalable algorithms, integrating domain knowledge, and handling data heterogeneity and noise. Overall, this review paper provides a comprehensive overview of the current state and prospects of MMML for engineering design applications.
Empathy is an important characteristic to consider when building a more intelligent and humanized dialogue agent. However, existing methods do not fully treat empathy as a complex process involving three aspects: cognition, affection, and behavior. In this paper, we propose CAB, a novel framework that takes a comprehensive perspective of cognition, affection, and behavior to generate empathetic responses. For cognition, we build paths between critical keywords in the dialogue by leveraging external knowledge, because keywords are the core of the sentences in a dialogue. Building the logical relationships between keywords, which is overlooked by the majority of existing works, can improve the understanding of keywords and contextual logic, thus enhancing the cognitive ability. For affection, we capture the emotional dependencies with dual latent variables that contain both interlocutors' emotions, since considering both interlocutors' emotions simultaneously helps to learn the emotional dependencies. For behavior, we use appropriate dialogue acts to guide the dialogue generation and enhance the expression of empathy. Extensive experiments demonstrate that our multi-perspective model outperforms the state-of-the-art models in both automatic and manual evaluation.
Slip is a very common phenomenon in wheeled mobile robotic systems. It has undesirable consequences such as wasting energy and impeding system stability. To tackle the challenge of mobile robot trajectory tracking under slippery conditions, we propose a hierarchical framework that learns and adapts the gains of the tracking controllers simultaneously online. Concretely, a reinforcement learning (RL) module is used to auto-tune parameters in a lateral predictive controller and a longitudinal speed PID controller. Experiments show the necessity of simultaneous gain tuning and demonstrate that our online framework outperforms the best baseline controller using fixed gains. By utilizing online gain adaptation, our framework achieves robust tracking performance by rejecting slip and reducing tracking errors when the mobile robot travels through various terrains.
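As a rough illustration of the longitudinal part of such a scheme, the Python sketch below shows a speed PID controller whose gains can be overwritten while it runs. The names `PIDController`, `set_gains`, and `step` are hypothetical and not taken from the paper. The RL module that would choose the gains is not modeled here; it is represented only by a call to `set_gains`.

```python
from dataclasses import dataclass


@dataclass
class PIDController:
    """Textbook PID controller with gains that can be changed online."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def set_gains(self, kp, ki, kd):
        # Hypothetical hook: an external tuner (for example an RL policy)
        # would call this between control steps.
        self.kp, self.ki, self.kd = kp, ki, kd

    def step(self, target_speed, measured_speed, dt):
        """Return the control command for one time step of length dt."""
        error = target_speed - measured_speed
        self.integral += error * dt
        # prev_error starts at 0, so the first derivative term is a crude estimate.
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


pid = PIDController(kp=2.0, ki=0.0, kd=0.0)
first_command = pid.step(target_speed=1.0, measured_speed=0.5, dt=0.1)

# Gains are swapped without resetting the controller state.
pid.set_gains(kp=1.0, ki=1.0, kd=0.0)
second_command = pid.step(target_speed=1.0, measured_speed=0.5, dt=0.1)
```

Keeping the integral and previous error across a gain change is a design choice in this sketch; a real system might reset or rescale them to avoid a jump in the command.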
Using artificial intelligence (AI) to redesign and enhance the current wireless communication system is a promising pathway for the future sixth-generation (6G) wireless network. The performance of AI-enabled wireless communication depends heavily on the quality of wireless air-interface data. Although there are various approaches to data quality assessment (DQA) for different applications, none has been designed for wireless air-interface data. In this paper, we propose a DQA framework to measure the quality of wireless air-interface data from three aspects: similarity, diversity, and completeness. Similarity measures how close the considered datasets are in terms of their statistical distributions, diversity measures how well-rounded a dataset is, and completeness measures to what degree the considered dataset satisfies the required performance metrics in an application scenario. The proposed framework can be applied to various types of wireless air-interface data, such as channel state information (CSI), signal-to-interference-plus-noise ratio (SINR), and reference signal received power (RSRP). For simplicity, we corroborate the validity of the proposed DQA framework by applying it to CSI data and using similarity and diversity metrics to improve CSI compression and recovery in Massive MIMO systems.
With the outbreak of COVID-19, a large number of relevant studies have emerged in recent years. We propose PVT-COV19D, an automatic COVID-19 diagnosis framework based on lung CT scan images. To accommodate the different dimensions of the image input, we first classify the images using Transformer models, then sample the images in the dataset according to a normal distribution, and feed the sampling results into the modified PVTv2 model for training. Extensive experiments on the COV19-CT-DB dataset demonstrate the effectiveness of the proposed method.
Since Markowitz's mean-variance framework, designing a portfolio that maximizes profit and minimizes risk has been ubiquitous in the financial industry. Initially, profit and risk were measured by the first two moments of the portfolio's return, i.e., the mean and variance, which are sufficient to characterize a Gaussian distribution. However, it is broadly believed that the first two moments are not enough to capture the behavior of returns, which have been recognized to be asymmetric and heavy-tailed. Although there is ample evidence that portfolio designs involving the third and fourth moments, i.e., skewness and kurtosis, outperform the conventional mean-variance framework, such designs are non-trivial. Specifically, in the classical framework, the memory and computational cost of computing the skewness and kurtosis grow sharply with the number of assets. To alleviate this difficulty in high-dimensional problems, we consider an alternative expression for high-order moments based on parametric representations via a generalized hyperbolic skew-t distribution. Then, we reformulate the high-order portfolio optimization problem as a fixed-point problem and propose a robust fixed-point acceleration algorithm that solves the problem in an efficient and scalable manner. Empirical experiments also demonstrate that our proposed high-order portfolio optimization framework is of low complexity and significantly outperforms the state-of-the-art methods by 2 to 4 orders of magnitude.
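To give a sense of how sharply the classical cost grows, the Python sketch below counts the entries of the moment arrays for N assets. It assumes the common formulation in which the covariance, co-skewness, and co-kurtosis are stored as N x N, N x N^2, and N x N^3 matrices; this layout is an assumption, since the abstract does not spell it out. The function name `high_order_moment_sizes` is hypothetical.

```python
from math import comb


def high_order_moment_sizes(n_assets):
    """Entry counts of the moment arrays for n_assets asset returns."""
    return {
        # Full (dense) storage.
        "covariance_full": n_assets ** 2,
        "coskewness_full": n_assets ** 3,
        "cokurtosis_full": n_assets ** 4,
        # The arrays are symmetric in their indices, so the distinct entries
        # are the multisets of 3 (resp. 4) asset indices.
        "coskewness_unique": comb(n_assets + 2, 3),
        "cokurtosis_unique": comb(n_assets + 3, 4),
    }


sizes = high_order_moment_sizes(100)
```

For 100 assets the dense co-kurtosis matrix already has 10^8 entries, and even after exploiting symmetry more than four million distinct values remain.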
This work proposes a subband network for single-channel speech dereverberation, together with a new learning target based on reverberation time shortening (RTS). In the time-frequency domain, we propose to use a subband network to perform dereverberation for different frequency bands independently. The time-domain convolution can be well decomposed into subband convolutions, hence it is reasonable to train the subband network to perform subband deconvolution. The learning target for dereverberation is usually set as the direct-path speech, optionally with some early reflections. This type of target abruptly truncates the reverberation, and thus it may not be suitable for network training and may lead to a large prediction error. In this work, we propose an RTS learning target that suppresses reverberation while maintaining the exponential decay property of reverberation, which eases the network training and thus reduces the prediction error and signal distortions. Experiments show that the subband network can achieve outstanding dereverberation performance, and that the proposed target has a smaller prediction error than the target of direct-path speech and early reflections.
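The idea of shortening a decay while keeping it exponential can be sketched as follows. This is an illustration under an idealized model, not necessarily the authors' construction: the reverberation amplitude envelope is assumed to fall by 60 dB at the reverberation time T60, and a second exponential window is applied so that the product decays with a shorter target T60. All function names are hypothetical.

```python
import math


def decay_envelope(t, t60):
    """Amplitude envelope of an ideal exponential decay: -60 dB at t = t60."""
    return 10.0 ** (-3.0 * t / t60)


def shortening_window(t, t60_original, t60_target):
    """Extra exponential decay that maps a t60_original envelope
    onto a t60_target envelope (t60_target < t60_original)."""
    return 10.0 ** (-3.0 * t * (1.0 / t60_target - 1.0 / t60_original))


def to_db(amplitude):
    """Convert an amplitude ratio to decibels."""
    return 20.0 * math.log10(amplitude)


# Shorten an assumed 0.8 s decay to an assumed 0.2 s decay at a few time instants.
times = [0.0, 0.05, 0.2, 0.3]
shortened = [decay_envelope(t, 0.8) * shortening_window(t, 0.8, 0.2) for t in times]
```

Unlike a hard truncation after the early reflections, the windowed envelope stays smooth and exponential at every time instant, which is the property the abstract credits with easing training.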
Retinal vascular diseases affect the well-being of the human body and sometimes provide vital signs of otherwise undetected bodily damage. Recently, deep learning techniques have been successfully applied to the detection of diabetic retinopathy (DR). The main obstacle to applying deep learning techniques to detect most other retinal vascular diseases is the limited amount of data available. In this paper, we propose a transfer learning technique that aims to exploit feature similarities for detecting retinal vascular diseases. We choose the well-studied DR detection as the source task and the early detection of retinopathy of prematurity (ROP) as the target task. Our experimental results demonstrate that our DR-pretrained approach outperforms, on all metrics, the conventional ImageNet-pretrained transfer learning approach currently adopted in medical image analysis. Moreover, our approach is more robust with respect to the stochasticity in the training process and with respect to reduced training samples. This study suggests the potential of our proposed transfer learning approach for a broad range of retinal vascular diseases or pathologies where data is limited.
Accurate, long-term forecasting of pedestrian trajectories in highly dynamic and interactive scenes is a long-standing challenge. Recent advances in data-driven approaches have achieved significant improvements in prediction accuracy. However, the lack of group-aware analysis has limited the performance of forecasting models. This is especially apparent in highly populated scenes, where pedestrians move in groups and the interactions between groups are extremely complex and dynamic. In this paper, we present Grouptron, a multi-scale dynamic forecasting framework that leverages pedestrian group detection and utilizes individual-level, group-level, and scene-level information for better understanding and representation of the scenes. Our approach employs spatio-temporal clustering algorithms to identify pedestrian groups and creates spatio-temporal graphs at the individual, group, and scene levels. It then uses graph neural networks to encode dynamics at different scales and incorporates encodings across different scales for trajectory prediction. We carried out extensive comparisons and ablation experiments to demonstrate the effectiveness of our approach. Our method achieves a 9.3% decrease in final displacement error (FDE) compared with state-of-the-art methods on the ETH/UCY benchmark datasets, and a 16.1% decrease in FDE in more crowded scenes where extensive human group interactions are more frequently present.
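For readers unfamiliar with the metric, the Python sketch below computes FDE under its usual definition: the Euclidean distance between the predicted and the true position at the final time step, averaged over pedestrians. The function name and the toy trajectories are made up for illustration and are not data from the paper.

```python
from math import dist


def final_displacement_error(predicted, ground_truth):
    """Mean distance between predicted and true final positions.

    Both arguments are lists of trajectories, one per pedestrian,
    where each trajectory is a list of (x, y) positions.
    """
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need the same non-zero number of trajectories")
    # Only the last point of each trajectory enters the metric.
    errors = [dist(p[-1], g[-1]) for p, g in zip(predicted, ground_truth)]
    return sum(errors) / len(errors)


# Two toy pedestrians: final-step errors of 5.0 and 2.0.
predicted = [[(0, 0), (1, 1), (3, 4)], [(0, 0), (0, 1), (0, 2)]]
ground_truth = [[(0, 0), (1, 1), (0, 0)], [(0, 0), (0, 1), (0, 4)]]
fde = final_displacement_error(predicted, ground_truth)
```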
Extensive studies have focused on the application scenario in which all the bipartite cohesive subgraphs of a bipartite graph need to be discovered. However, we observe that, for some applications, one is interested in finding bipartite cohesive subgraphs containing a specific vertex. In this paper, we study a new query-dependent bipartite cohesive subgraph search problem based on the $k$-wing model, named the personalized $k$-wing search problem. We introduce a $k$-wing equivalence relationship to summarize the edges of a bipartite graph $G$ into groups. All the edges of $G$ are thereby segregated into different groups, i.e., $k$-wing equivalence classes, forming an efficient and wing-number-conserving index called EquiWing. Further, we propose a more compact version of EquiWing, EquiWing-Comp, which is achieved by integrating our proposed $k$-butterfly loose approach and discovered hierarchy properties. These indices are used to expedite the personalized $k$-wing search with non-repetitive access to $G$, which leads to linear algorithms for searching the personalized $k$-wing. Moreover, we conduct a thorough study on the maintenance of the proposed indices for evolving bipartite graphs. We discover novel properties that help us localize the scope of the maintenance at a low cost. By exploiting these discoveries, we propose novel algorithms for maintaining the two indices, which substantially reduce the cost of maintenance. We perform extensive experimental studies on real, large-scale graphs to validate the efficiency and effectiveness of EquiWing and EquiWing-Comp compared to the baseline.
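The basic quantity behind the $k$-wing model can be sketched in a few lines of Python. The sketch assumes the usual definition in which a butterfly is a complete 2 x 2 bipartite subgraph and a $k$-wing requires every edge to lie in at least $k$ butterflies. It only counts butterflies per edge by brute force; it does not build EquiWing or any other index from the paper, and the function name is hypothetical.

```python
def butterfly_support(adj_upper):
    """Map each edge (u, v) to the number of butterflies containing it.

    adj_upper maps every upper-layer vertex to the set of its
    lower-layer neighbors.
    """
    # Invert the adjacency so lower-layer vertices can list their neighbors.
    adj_lower = {}
    for u, neighbors in adj_upper.items():
        for v in neighbors:
            adj_lower.setdefault(v, set()).add(u)

    support = {}
    for u, neighbors in adj_upper.items():
        for v in neighbors:
            count = 0
            for w in adj_lower[v]:
                if w != u:
                    # u and w already share v; every other common
                    # neighbor closes one butterfly through edge (u, v).
                    count += len(neighbors & adj_upper[w]) - 1
            support[(u, v)] = count
    return support


# A complete 2 x 3 bipartite graph plus one pendant edge (u3, c).
graph = {"u1": {"a", "b", "c"}, "u2": {"a", "b", "c"}, "u3": {"c"}}
support = butterfly_support(graph)
```

In this toy graph every edge of the complete 2 x 3 part lies in two butterflies, while the pendant edge lies in none, so it would be excluded from any $k$-wing with $k \geq 1$ under the assumed definition.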