Heavy rain removal from a single image is the task of simultaneously eliminating rain streaks and fog, which can dramatically degrade the quality of captured images. Most existing rain removal methods do not generalize well to the heavy rain case. In this work, we propose a novel network architecture consisting of three sub-networks to remove heavy rain from a single image without estimating rain streaks and fog separately. The first sub-net, a U-net-based architecture that incorporates our Spatial Channel Attention (SCA) blocks, extracts global features that provide sufficient contextual information for removing atmospheric distortions caused by rain and fog. The second sub-net learns the additive residue information, which is useful for removing rain streak artifacts, via our proposed Residual Inception Modules (RIM). The third sub-net, the multiplicative sub-net, adopts our Channel-attentive Inception Modules (CIM) and learns the essential brighter local features that are not effectively extracted by the SCA and additive sub-nets, modulating the local pixel intensities in the derained images. The three clean image estimates are then combined via an attentive blending block to generate the final clean image. Our method with SCA, RIM, and CIM significantly outperforms previous state-of-the-art single-image deraining methods on synthetic datasets and yields considerably cleaner and sharper derained estimates on real image datasets. We present extensive experiments and ablation studies supporting each of our method's contributions on both synthetic and real image datasets.
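The attentive blending step described above can be illustrated with a minimal sketch: three branch estimates are fused per pixel with softmax attention weights. All names here are hypothetical; in the actual model the attention maps are learned by the blending block, not supplied by hand.

```python
import math

def attentive_blend(estimates, logits):
    """Fuse per-branch image estimates with per-pixel softmax attention.

    estimates: list of 3 images (H x W nested lists), e.g. from the SCA,
               additive, and multiplicative sub-nets.
    logits:    list of 3 attention maps (H x W nested lists); in the real
               model these would be predicted by the blending block.
    """
    h, w = len(estimates[0]), len(estimates[0][0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # softmax over the three branches at pixel (i, j)
            exps = [math.exp(l[i][j]) for l in logits]
            s = sum(exps)
            out[i][j] = sum(e / s * est[i][j]
                            for e, est in zip(exps, estimates))
    return out
```

With equal logits the fusion reduces to a plain average of the three branch estimates, which makes the sketch easy to sanity-check.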
Vehicle re-identification (Re-ID) aims to retrieve images of the same vehicle across different cameras. Two key challenges lie in the subtle inter-instance discrepancy caused by near-duplicate identities and the large intra-instance variance caused by different views. Since the holistic appearance suffers from viewpoint variation and distortion, part-level feature learning has been introduced to enhance vehicle description. However, existing approaches that localize and amplify significant parts often fail to handle spatial misalignment and occlusion, and require expensive annotations. In this paper, we propose a weakly supervised Part-Mentored Attention Network (PMANet), composed of a Part Attention Network (PANet) for vehicle part localization with self-attention and a Part-Mentored Network (PMNet) for mentoring the global and local feature aggregation. Firstly, PANet is introduced to predict a foreground mask and pinpoint $K$ prominent vehicle parts using only weak identity supervision. Secondly, we propose PMNet to learn global and part-level features with multi-scale attention and aggregate them in $K$ main-partial tasks via part transfer. Like humans, who first differentiate objects by general information and then observe salient parts for more detailed clues, PANet and PMNet form a two-stage attention structure that performs a coarse-to-fine search among identities. Finally, we address this Re-ID issue as a multi-task problem comprising global feature learning, identity classification, and part transfer, and adopt homoscedastic uncertainty to learn the optimal weighting of the different losses. Comprehensive experiments are conducted on two benchmark datasets. Our approach outperforms recent state-of-the-art methods by an average of 2.63% in CMC@1 on VehicleID and 2.2% in mAP on VeRi776. Results on occluded test sets also demonstrate the generalization ability of PMANet.
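The loss-weighting step can be made concrete. Homoscedastic-uncertainty weighting is commonly implemented (in the style of Kendall et al.) with one learnable log-variance scalar per task; a minimal sketch, with the learnable parameters passed in as plain floats rather than optimizer state:

```python
import math

def uncertainty_weighted_loss(losses, log_vars):
    """Combine per-task losses with learned homoscedastic uncertainty.

    losses:   list of per-task loss values L_i.
    log_vars: list of learnable scalars s_i = log(sigma_i^2), one per task.
    Returns sum_i exp(-s_i) * L_i + s_i, so a task with high predicted
    uncertainty is down-weighted, while the +s_i term prevents the model
    from inflating all uncertainties to zero out the loss.
    """
    return sum(math.exp(-s) * l + s for l, s in zip(losses, log_vars))
```

With all `s_i = 0` this reduces to an unweighted sum of the task losses, which is a convenient initialization.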
This paper investigates how to improve local image descriptor matching by exploiting matching context information. Two main contexts are identified, originating respectively from the descriptor space and from the keypoint space. The former is generally used to design the actual matching strategy, while the latter is used to filter matches according to local spatial consistency. On this basis, a new matching strategy and a novel local spatial filter, named respectively blob matching and Delaunay Triangulation Matching (DTM), are devised. Blob matching provides a general matching framework by merging together several strategies, including pre-filtering as well as many-to-many and symmetric matching, achieving a global improvement upon each individual strategy. DTM alternates between Delaunay triangulation contractions and expansions to determine and adjust keypoint neighborhood consistency. Experimental evaluation shows that DTM is comparable to or better than the state of the art in terms of matching accuracy and robustness, especially for non-planar scenes. Evaluation is carried out according to a new benchmark devised for analyzing the matching pipeline in terms of correct correspondences on both planar and non-planar scenes, including state-of-the-art methods as well as the common SIFT matching approach for reference. This evaluation can be of assistance for future research in this field.
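Two of the ingredients blob matching builds on, ratio-based pre-filtering and symmetric (mutual nearest-neighbour) matching, can be sketched as follows. This is the standard baseline those strategies extend, not the paper's blob-matching algorithm itself:

```python
def nn_matches(desc_a, desc_b, ratio=0.8):
    """One-way nearest-neighbour matches with Lowe's ratio pre-filter.

    desc_a, desc_b: lists of descriptor vectors. Returns {index_a: index_b}.
    """
    matches = {}
    for i, a in enumerate(desc_a):
        dists = sorted((sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5, j)
                       for j, b in enumerate(desc_b))
        if len(dists) > 1 and dists[0][0] > ratio * dists[1][0]:
            continue  # ambiguous: best match too close to second best
        matches[i] = dists[0][1]
    return matches

def symmetric_matches(desc_a, desc_b, ratio=0.8):
    """Keep only mutual (symmetric) nearest-neighbour matches."""
    ab = nn_matches(desc_a, desc_b, ratio)
    ba = nn_matches(desc_b, desc_a, ratio)
    return {i: j for i, j in ab.items() if ba.get(j) == i}
```

A spatial filter such as DTM would then operate on the surviving correspondences using the keypoint positions, which this descriptor-space sketch deliberately ignores.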
Recommender systems play a vital role in modern online services, such as Amazon and Taobao. Traditional personalized methods, which focus on user-item (UI) relations, have been widely applied in industrial settings owing to their efficiency and effectiveness. Despite their success, we argue that these approaches ignore local information hidden in similar users. To tackle this problem, user-based methods exploit similar user relations to make recommendations from a local perspective. Nevertheless, traditional user-based methods, like userKNN and matrix factorization, are intractable to deploy in real-time applications, since such transductive models have to be recomputed or retrained with every new interaction. To overcome this challenge, we propose a framework called self-complementary collaborative filtering~(SCCF), which can make recommendations with both global and local information in real time. On the one hand, it utilizes UI relations and user neighborhoods to capture both global and local information. On the other hand, it can identify similar users for each user in real time by inferring user representations on the fly with an inductive model. The proposed framework can be seamlessly incorporated into existing inductive UI approaches and benefits from user neighborhoods with little additional computation. It is also the first attempt to apply user-based methods in real-time settings. The effectiveness and efficiency of SCCF are demonstrated through extensive offline experiments on four public datasets, as well as a large-scale online A/B test in Taobao.
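A minimal sketch of the global/local blend SCCF describes: a global score from inductive user/item representations combined with a local score from the user's neighbourhood. The function names, the similarity-weighted aggregation, and the linear mixing rule are illustrative assumptions, not the paper's exact formulation.

```python
def sccf_score(user_vec, item_vec, neighbor_feedback, alpha=0.5):
    """Blend a global UI score with a local neighbourhood score.

    user_vec, item_vec:  inductive representations (inferred on the fly,
                         so no retraining is needed for new interactions).
    neighbor_feedback:   list of (similarity, rating) pairs from the
                         user's most similar users for this item;
                         similarities are assumed positive.
    alpha:               mixing weight between global and local evidence.
    """
    # global evidence: inner product of user and item representations
    global_score = sum(u * v for u, v in zip(user_vec, item_vec))
    # local evidence: similarity-weighted average of neighbour feedback
    if neighbor_feedback:
        sims = sum(s for s, _ in neighbor_feedback)
        local_score = sum(s * r for s, r in neighbor_feedback) / sims
    else:
        local_score = 0.0
    return alpha * global_score + (1 - alpha) * local_score
```

The point of the inductive setup is that `user_vec` and `neighbor_feedback` can both be refreshed per request, so the blend stays current without any retraining.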
Mobile sensing apps have been widely used as a practical approach to collect behavioral and health-related information from individuals and provide timely interventions to promote health and well-being, such as mental health and chronic care. As the objectives of mobile sensing can be either \emph{(a) personalized medicine for individuals} or \emph{(b) public health for populations}, in this work we review the design of these mobile sensing apps and propose to categorize them into two paradigms -- the \emph{(i) Personal Sensing} and \emph{(ii) Crowd Sensing} paradigms. While both sensing paradigms may incorporate common ubiquitous sensing technologies, such as wearable sensors, mobility monitoring, mobile data offloading, and/or cloud-based data analytics to collect and process sensing data from individuals, we present a novel taxonomy system that specifies and classifies apps/systems along the three stages of the mHealth sensing life-cycle: \emph{(1) Sensing Task Creation \& Participation}, \emph{(2) Health Surveillance \& Data Collection}, and \emph{(3) Data Analysis \& Knowledge Discovery}. With respect to the different goals of the two paradigms, this work systematically reviews the field and summarizes the design of typical apps/systems in view of the configurations and interactions across these stages. Beyond summarization, the proposed taxonomy system also helps identify potential directions of mobile sensing for health from both the personalized medicine and population health perspectives.
In this paper, we propose intelligent reconfigurable surface (IRS)-assisted unmanned aerial vehicle (UAV) networks that exploit the advantages of both agility and reflection to enhance network performance. Aiming to maximise the energy efficiency (EE) of the considered networks, we jointly optimise the power allocation of the UAVs and the phase-shift matrix of the IRS. A deep reinforcement learning (DRL) approach is proposed for solving the continuous optimisation problem with time-varying channel gains in a centralised fashion. Moreover, a parallel learning approach is proposed to reduce the information transmission requirement of the centralised approach. Numerical results show a significant improvement of our proposed schemes over conventional approaches in terms of EE, flexibility, and processing time. Our proposed DRL methods for IRS-assisted UAV networks can be used in real-time applications thanks to their capability for instant decision-making and for handling time-varying channels in dynamic environmental settings.
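The EE objective being maximised can be sketched for a simple power-allocation setting with Shannon rates over independent links; the IRS phase-shift matrix and the actual channel model from the paper are not reproduced here, and all parameter names are illustrative.

```python
import math

def energy_efficiency(powers, gains, bandwidth=1.0, noise=1.0, p_circuit=0.1):
    """EE objective (bits per Joule per unit time) for a set of links.

    powers:    transmit power allocated to each link (the DRL action).
    gains:     effective channel gain per link (time-varying in practice,
               and shaped by the IRS phase shifts in the paper's model).
    p_circuit: fixed circuit power consumed regardless of allocation.
    Returns sum-rate divided by total consumed power.
    """
    rate = sum(bandwidth * math.log2(1.0 + p * g / noise)
               for p, g in zip(powers, gains))
    total_power = sum(powers) + p_circuit
    return rate / total_power
```

Because the ratio is non-concave in the powers, simply raising transmit power does not keep raising EE, which is why the allocation is posed as a learning problem rather than solved in closed form.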
Integration is indispensable, not only in mathematics but also in a wide range of other fields. A deep learning method has recently been developed and shown to be capable of integrating mathematical functions that previously could not be integrated on a computer. However, that method treats integration as equivalent to natural language translation and does not reflect mathematical information. In this study, we adjusted the learning model to take mathematical information into account and developed a range of learning models that learn the order of numerical operations more robustly. In this way, we achieved a 98.80% correct-answer rate for symbolic integration, higher than that of any existing method. We judged the correctness of an integration by whether the derivative of the primitive function was consistent with the integrand. By building an integrated model based on this strategy, we achieved a 99.79% correct-answer rate for symbolic integration.
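The correctness criterion described above, accepting a predicted primitive only if its derivative matches the integrand, can be sketched for the polynomial case (the study applies the same test to general symbolic expressions):

```python
def differentiate(coeffs):
    """Derivative of a polynomial given as [c0, c1, c2, ...],
    meaning c0 + c1*x + c2*x^2 + ..."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [0.0]

def verify_antiderivative(integrand, candidate):
    """Accept a candidate primitive iff its derivative equals the
    integrand (up to padding with zero coefficients and a small
    numerical tolerance). The constant term of the candidate is
    irrelevant, as expected for an antiderivative."""
    d = differentiate(candidate)
    n = max(len(d), len(integrand))
    d = d + [0.0] * (n - len(d))
    ref = list(integrand) + [0.0] * (n - len(integrand))
    return all(abs(a - b) < 1e-9 for a, b in zip(d, ref))
```

For example, `x^3` is accepted as a primitive of `3x^2`, while `x` is rejected, regardless of any constant of integration on the candidate.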
Camera pose estimation, or camera relocalization, is the centerpiece of numerous computer vision tasks such as visual odometry, structure from motion (SfM), and SLAM. In this paper we propose a neural network approach with a graph transformer backbone, namely TransCamP, to address the camera relocalization problem. In contrast with prior work, where pose regression is mainly guided by photometric consistency, TransCamP fuses the image features, camera pose information, and inter-frame relative camera motions into encoded graph attributes and is instead trained towards graph consistency and accuracy, yielding significantly higher computational efficiency. By leveraging graph transformer layers with edge features and a tensorized adjacency matrix, TransCamP dynamically captures global attention and thus endows the pose graph with an evolving structure, achieving improved robustness and accuracy. In addition, optional temporal transformer layers actively enhance the spatiotemporal inter-frame relations for sequential inputs. Evaluation of the proposed network on various public benchmarks demonstrates that TransCamP outperforms state-of-the-art approaches.
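The idea of transformer layers with edge features can be sketched as attention whose logits are biased by per-edge attributes: a node attends to itself and to its graph neighbours, with each edge contributing an additive term. This is an illustrative single-head, scalar-bias layer, not the actual TransCamP architecture.

```python
import math

def graph_attention(nodes, edges):
    """One edge-biased attention pass over a graph.

    nodes: list of node feature vectors (e.g. per-frame embeddings).
    edges: dict mapping (i, j) -> scalar edge feature added to the
           attention logit (e.g. a relative-motion confidence).
    Returns the updated node features.
    """
    out = []
    for i, q in enumerate(nodes):
        logits, vals = [], []
        for j, k in enumerate(nodes):
            if i == j or (i, j) in edges:       # self plus graph neighbours
                score = sum(a * b for a, b in zip(q, k))  # dot-product logit
                score += edges.get((i, j), 0.0)           # edge-feature bias
                logits.append(score)
                vals.append(k)
        m = max(logits)                          # stable softmax
        w = [math.exp(l - m) for l in logits]
        s = sum(w)
        out.append([sum(wi / s * v[d] for wi, v in zip(w, vals))
                    for d in range(len(q))])
    return out
```

Raising an edge feature shifts the softmax mass toward that neighbour, which is how edge attributes steer the aggregation in such layers.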
Polar ice cores play a central role in studies of the Earth's climate system as natural archives. A pressing issue is the analysis of the oldest, highly thinned ice core sections, where the identification of paleoclimate signals is particularly challenging. Here, state-of-the-art imaging by laser-ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) has the potential to be revolutionary due to its combination of micron-scale 2D chemical information with visual features. However, the quantitative study of record preservation in chemical images raises new questions that call for the expertise of the computer vision community. To illustrate this new interdisciplinary frontier, we describe a selected set of key questions. One critical task is to assess the paleoclimate significance of single line profiles along the main core axis, which we show to be a scale-dependent problem for which advanced image analysis methods are critical. Another important issue is the evaluation of post-depositional layer changes, for which the chemical images provide rich information. Accordingly, the time is ripe to begin an intensified exchange between the two scientific communities of computer vision and ice core science. The collaborative building of a new framework for investigating high-resolution chemical images with automated image analysis techniques will also benefit the already widespread application of LA-ICP-MS chemical imaging in the geosciences.
Recommending medications for patients using electronic health records (EHRs) is a crucial data mining task for an intelligent healthcare system, as it can assist doctors in making clinical decisions more efficiently. However, the inherent complexity of EHR data renders it a challenging task: (1) Multilevel structures: EHR data typically contains multilevel structures that are closely related to the decision-making pathways, e.g., laboratory results lead to disease diagnoses, which in turn contribute to the prescribed medications; (2) Multiple sequence interactions: multiple sequences in EHR data are usually closely correlated with each other; (3) Abundant noise: task-irrelevant features or noise within EHR data generally result in suboptimal performance. To tackle the above challenges, we propose a multilevel selective and interactive network (MeSIN) for medication recommendation. Specifically, MeSIN is designed with three components. First, an attentional selective module (ASM) assigns flexible attention scores to different medical code embeddings according to their relevance to the recommended medications in each admission. Second, we incorporate a novel interactive long short-term memory network (InLSTM) to reinforce the interactions of multilevel medical sequences in EHR data, with the help of a calibrated memory-augmented cell and an enhanced input gate. Finally, we employ a global selective fusion module (GSFM) to infuse the multi-sourced information embeddings into the final patient representations for medication recommendation. To validate our method, extensive experiments have been conducted on a real-world clinical dataset. The results demonstrate a consistent superiority of our framework over several baselines and testify to the effectiveness of our proposed approach.
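The attentional selective module's soft scoring of medical-code embeddings can be sketched as dot-product attention against a query representing the recommendation target. The names and the scoring function here are illustrative assumptions; MeSIN's actual scoring is learned.

```python
import math

def attentional_select(code_embs, query):
    """ASM-style soft selection over medical-code embeddings.

    code_embs: list of embedding vectors for the codes in one admission.
    query:     vector representing relevance to the recommended
               medications (learned in the real model).
    Returns (pooled, weights): the attention-weighted sum of the
    embeddings and the softmax attention weights, so that relevant
    codes dominate and noisy codes are attenuated.
    """
    scores = [sum(c * q for c, q in zip(emb, query)) for emb in code_embs]
    m = max(scores)                               # stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    pooled = [sum(w * emb[d] for w, emb in zip(weights, code_embs))
              for d in range(len(query))]
    return pooled, weights
```

This soft selection is what lets downstream components (such as the sequence model and the fusion module) focus on task-relevant codes without hard, non-differentiable filtering.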