Recently, flow-based frame interpolation methods have achieved great success by first modeling optical flow between target and input frames, and then building synthesis network for target frame generation. However, above cascaded architecture can lead to large model size and inference delay, hindering them from mobile and real-time applications. To solve this problem, we propose a novel Progressive Motion Context Refine Network (PMCRNet) to predict motion fields and image context jointly for higher efficiency. Different from others that directly synthesize target frame from deep feature, we explore to simplify frame interpolation task by borrowing existing texture from adjacent input frames, which means that decoder in each pyramid level of our PMCRNet only needs to update easier intermediate optical flow, occlusion merge mask and image residual. Moreover, we introduce a new annealed multi-scale reconstruction loss to better guide the learning process of this efficient PMCRNet. Experiments on multiple benchmarks show that proposed approaches not only achieve favorable quantitative and qualitative results but also reduces current model size and running time significantly.
Recent works have shown that optical flow can be learned by deep networks from unlabelled image pairs based on brightness constancy assumption and smoothness prior. Current approaches additionally impose an augmentation regularization term for continual self-supervision, which has been proved to be effective on difficult matching regions. However, this method also amplify the inevitable mismatch in unsupervised setting, blocking the learning process towards optimal solution. To break the dilemma, we propose a novel mutual distillation framework to transfer reliable knowledge back and forth between the teacher and student networks for alternate improvement. Concretely, taking estimation of off-the-shelf unsupervised approach as pseudo labels, our insight locates at defining a confidence selection mechanism to extract relative good matches, and then add diverse data augmentation for distilling adequate and reliable knowledge from teacher to student. Thanks to the decouple nature of our method, we can choose a stronger student architecture for sufficient learning. Finally, better student prediction is adopted to transfer knowledge back to the efficient teacher without additional costs in real deployment. Rather than formulating it as a supervised task, we find that introducing an extra unsupervised term for multi-target learning achieves best final results. Extensive experiments show that our approach, termed MDFlow, achieves state-of-the-art real-time accuracy and generalization ability on challenging benchmarks. Code is available at https://github.com/ltkong218/MDFlow.
Simultaneous localization and mapping (SLAM) provides user tracking and environmental mapping capabilities, enabling communication systems to gain situational awareness. Advanced communication networks with ultra-wideband, multiple antennas, and a large number of connections present opportunities for deep integration of sensing and communications. First, the development of integrated sensing and communications (ISAC) is reviewed in this study, and the differences between ISAC and traditional communication are revealed. Then, efficient mechanisms for multi-domain collaborative SLAM are presented. In particular, research opportunities and challenges for cross-sensing, cross-user, cross-frequency, and cross-device SLAM mechanisms are proposed. In addition, SLAM-aided communication strategies are explicitly discussed. We prove that the multi-domain cooperative SLAM mechanisms based on hybrid sensing and crowdsourcing can considerably improve the accuracy of localization and mapping in complex multipath propagation environments through numerical analysis. Furthermore, we conduct testbed experiments to show that the proposed SLAM mechanisms can achieve decimeter-level localization and mapping accuracy in practical scenarios, thereby proving the application prospect of multi-domain collaborative SLAM in ISAC.
The local geometrical randomness of metal foams brings complexities to the performance prediction of porous structures. Although the relative density is commonly deemed as the key factor, the stochasticity of internal cell sizes and shapes has an apparent effect on the porous structural behaviour but the corresponding measurement is challenging. To address this issue, we are aimed to develop an assessment strategy for efficiently examining the foam properties by combining multiscale modelling and deep learning. The multiscale modelling is based on the finite element (FE) simulation employing representative volume elements (RVEs) with random cellular morphologies, mimicking the typical features of closed-cell Aluminium foams. A deep learning database is constructed for training the designed convolutional neural networks (CNNs) to establish a direct link between the mesoscopic porosity characteristics and the effective Youngs modulus of foams. The error range of CNN models leads to an uncertain mechanical performance, which is further evaluated in a structural uncertainty analysis on the FG porous three-layer beam consisting of two thin high-density layers and a thick low-density one, where the imprecise CNN predicted moduli are represented as triangular fuzzy numbers in double parametric form. The uncertain beam bending deflections under a mid-span point load are calculated with the aid of Timoshenko beam theory and the Ritz method. Our findings suggest the success in training CNN models to estimate RVE modulus using images with an average error of 5.92%. The evaluation of FG porous structures can be significantly simplified with the proposed method and connects to the mesoscopic cellular morphologies without establishing the mechanics model for local foams.
With the development of neural networks and the increasing popularity of automatic driving, the calibration of the LiDAR and the camera has attracted more and more attention. This calibration task is multi-modal, where the rich color and texture information captured by the camera and the accurate three-dimensional spatial information from the LiDAR is incredibly significant for downstream tasks. Current research interests mainly focus on obtaining accurate calibration results through information fusion. However, they seldom analyze whether the calibrated results are correct or not, which could be of significant importance in real-world applications. For example, in large-scale production, the LiDARs and the cameras of each smart car have to get well-calibrated as the car leaves the production line, while in the rest of the car life period, the poses of the LiDARs and cameras should also get continually supervised to ensure the security. To this end, this paper proposes a self-checking algorithm to judge whether the extrinsic parameters are well-calibrated by introducing a binary classification network based on the fused information from the camera and the LiDAR. Moreover, since there is no such dataset for the task in this work, we further generate a new dataset branch from the KITTI dataset tailored for the task. Our experiments on the proposed dataset branch demonstrate the performance of our method. To the best of our knowledge, this is the first work to address the significance of continually checking the calibrated extrinsic parameters for autonomous driving. The code is open-sourced on the Github website at https://github.com/OpenCalib/LiDAR2camera_self-check.
Despite the impressive performance of Artificial Intelligence (AI) systems, their robustness remains elusive and constitutes a key issue that impedes large-scale adoption. Robustness has been studied in many domains of AI, yet with different interpretations across domains and contexts. In this work, we systematically survey the recent progress to provide a reconciled terminology of concepts around AI robustness. We introduce three taxonomies to organize and describe the literature both from a fundamental and applied point of view: 1) robustness by methods and approaches in different phases of the machine learning pipeline; 2) robustness for specific model architectures, tasks, and systems; and in addition, 3) robustness assessment methodologies and insights, particularly the trade-offs with other trustworthiness properties. Finally, we identify and discuss research gaps and opportunities and give an outlook on the field. We highlight the central role of humans in evaluating and enhancing AI robustness, considering the necessary knowledge humans can provide, and discuss the need for better understanding practices and developing supportive tools in the future.
Within the next several years, there will be a high level of autonomous technology that will be available for widespread use, which will reduce labor costs, increase safety, save energy, enable difficult unmanned tasks in harsh environments, and eliminate human error. Compared to software development for other autonomous vehicles, maritime software development, especially on aging but still functional fleets, is described as being in a very early and emerging phase. This introduces very large challenges and opportunities for researchers and engineers to develop maritime autonomous systems. Recent progress in sensor and communication technology has introduced the use of autonomous surface vehicles (ASVs) in applications such as coastline surveillance, oceanographic observation, multi-vehicle cooperation, and search and rescue missions. Advanced artificial intelligence technology, especially deep learning (DL) methods that conduct nonlinear mapping with self-learning representations, has brought the concept of full autonomy one step closer to reality. This paper surveys the existing work regarding the implementation of DL methods in ASV-related fields. First, the scope of this work is described after reviewing surveys on ASV developments and technologies, which draws attention to the research gap between DL and maritime operations. Then, DL-based navigation, guidance, control (NGC) systems and cooperative operations, are presented. Finally, this survey is completed by highlighting the current challenges and future research directions.
Machine learning on electromyography (EMG) has recently achieved remarkable success on a variety of tasks, while such success relies heavily on the assumption that the training and future data must be of the same data distribution. However, this assumption may not hold in many real-world applications. Model calibration is required via data re-collection and label annotation, which is generally very expensive and time-consuming. To address this problem, transfer learning (TL), which aims to improve target learners' performance by transferring the knowledge from related source domains, is emerging as a new paradigm to reduce the amount of calibration effort. In this survey, we assess the eligibility of more than fifty published peer-reviewed representative transfer learning approaches for EMG applications. Unlike previous surveys on purely transfer learning or EMG-based machine learning, this survey aims to provide an insight into the biological foundations of existing transfer learning methods on EMG-related analysis. In specific, we first introduce the physiological structure of the muscles and the EMG generating mechanism, and the recording of EMG to provide biological insights behind existing transfer learning approaches. Further, we categorize existing research endeavors into data based, model based, training scheme based, and adversarial based. This survey systematically summarizes and categorizes existing transfer learning approaches for EMG related machine learning applications. In addition, we discuss possible drawbacks of existing works and point out the future direction of better EMG transfer learning algorithms to enhance practicality for real-world applications.
Person re-identification (Re-ID) has become increasingly important as it supports a wide range of security applications. Traditional person Re-ID mainly relies on optical camera-based systems, which incur several limitations due to the changes in the appearance of people, occlusions, and human poses. In this work, we propose a WiFi vision-based system, 3D-ID, for person Re-ID in 3D space. Our system leverages the advances of WiFi and deep learning to help WiFi devices see, identify, and recognize people. In particular, we leverage multiple antennas on next-generation WiFi devices and 2D AoA estimation of the signal reflections to enable WiFi to visualize a person in the physical environment. We then leverage deep learning to digitize the visualization of the person into 3D body representation and extract both the static body shape and dynamic walking patterns for person Re-ID. Our evaluation results under various indoor environments show that the 3D-ID system achieves an overall rank-1 accuracy of 85.3%. Results also show that our system is resistant to various attacks. The proposed 3D-ID is thus very promising as it could augment or complement camera-based systems.
In this paper, we argue that the way we have been training and evaluating ML models has largely forgotten the fact that they are applied in an organization or societal context as they provide value to people. We show that with this perspective we fundamentally change how we evaluate, select and deploy ML models - and to some extent even what it means to learn. Specifically, we stress that the notion of value plays a central role in learning and evaluating, and different models may require different learning practices and provide different values based on the application context they are applied. We also show that this concretely impacts how we select and embed models into human workflows based on experimental datasets. Nothing of what is presented here is hard: to a large extent is a series of fairly trivial observations with massive practical implications.