Recently, image quality assessment (IQA) has achieved remarkable progress with the success of deep learning. However, existing IQA methods remain difficult to use in practice. The strict precondition of full-reference (FR) methods limits their application in real scenarios, while the no-reference (NR) scheme is inconvenient because of its unsatisfactory performance and its lack of flexibility and controllability. In this paper, we aim to bridge the gap between FR-IQA and NR-IQA with a new scheme, pseudo-reference image quality assessment (PR-IQA), which relies on pseudo reference images. As the first implementation of PR-IQA, we propose a novel baseline, Unpaired-IQA, from the perspective of subjective opinion-aware IQA. A self-adaptive feature fusion (SAFF) module is designed for the unpaired features in PR-IQA, with which the model extracts quality-discriminative features from distorted images and content-variability-robust features from pseudo reference images. Extensive experiments demonstrate that the proposed model outperforms state-of-the-art NR-IQA methods, verifying the effectiveness of PR-IQA and demonstrating that a user-friendly, controllable IQA is feasible.
With the advancement of machine learning (ML) and growing awareness of it, many organizations that own data but lack ML expertise (data owners) would like to pool their data and collaborate with those who have the expertise but need data from diverse sources to train truly generalizable models (model owners). In such collaborative ML, the data owner wants to protect the privacy of its training data, while the model owner wants to keep confidential the model and the training method, which may contain intellectual property. However, existing private ML solutions, such as federated learning and split learning, cannot meet the privacy requirements of both data and model owners at the same time. This paper presents Citadel, a scalable collaborative ML system that protects the privacy of both data owners and the model owner in untrusted infrastructures with the help of Intel SGX. Citadel performs distributed training across multiple training enclaves running on behalf of data owners and an aggregator enclave running on behalf of the model owner. Citadel further establishes a strong information barrier between these enclaves by means of zero-sum masking and hierarchical aggregation to prevent data and model leakage during collaborative training. Compared with existing SGX-protected training systems, Citadel offers better scalability and stronger privacy guarantees for collaborative ML. Cloud deployment with various ML models shows that Citadel scales to a large number of enclaves with less than 1.73X slowdown caused by SGX.
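The zero-sum masking idea named in the abstract above can be illustrated with a minimal sketch. It is not Citadel's actual protocol: the function names, the integer (fixed-point) updates, and the way masks are generated are all assumptions made for illustration. The point is only that each party's masked update looks random on its own, while the masks cancel once all updates are summed.

```python
import random

def zero_sum_masks(num_parties, dim, rng, bound=10**6):
    """Return num_parties integer mask vectors whose element-wise sum is zero.

    Hypothetical helper: the first num_parties - 1 masks are random and the
    last one is chosen to cancel them.
    """
    if num_parties < 2:
        raise ValueError("need at least two parties for masking")
    masks = [[rng.randrange(-bound, bound) for _ in range(dim)]
             for _ in range(num_parties - 1)]
    # The final mask is the negated column-wise sum of all the others.
    masks.append([-sum(column) for column in zip(*masks)])
    return masks

def apply_masks(updates, masks):
    """Add each party's mask to its own update vector."""
    return [[u + m for u, m in zip(update, mask)]
            for update, mask in zip(updates, masks)]

def aggregate(vectors):
    """Element-wise sum, as an aggregator would compute it."""
    return [sum(column) for column in zip(*vectors)]

# Three parties, each holding a quantized (integer) update vector.
updates = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
masks = zero_sum_masks(len(updates), 3, random.Random(0))
masked = apply_masks(updates, masks)
total = aggregate(masked)  # equals the sum of the unmasked updates
```

Integer arithmetic is used so that the masks cancel exactly; with floating-point updates the cancellation would hold only up to rounding error.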
Maliciously manipulated images and videos, so-called deep fakes, and especially face-swap images and videos, are increasingly used by attackers to discredit key figures. Previous detection techniques based on pixel-level artifacts focus on obscure patterns and ignore available semantic clues. As a result, these approaches show weak interpretability and robustness. In this paper, we propose a biometric-information-based method that fully exploits appearance and shape features for face-swap detection of key figures. The key aspect of our method is capturing the inconsistency between 3D facial shape and facial appearance; this inconsistency-based clue gives the proposed face-swap detection method natural interpretability. Experimental results show the superior robustness of our method to various laundering operations and on cross-domain data, which validates the effectiveness of the proposed method.
End-to-end models have gradually become the preferred option for automatic speech recognition (ASR) applications. During the training of end-to-end ASR, data augmentation is an effective technique for regularizing the neural networks. This paper proposes a novel data augmentation technique for end-to-end Mandarin ASR based on semantic transposition of the transcriptions via syntax rules. Specifically, we first segment the transcriptions based on part-of-speech tags. Then transposition strategies, such as placing the object in front of the subject or swapping the subject and the object, are applied to the segmented sentences. Finally, the acoustic features corresponding to the transposed transcription are reassembled based on the audio-to-text forced alignment produced by a pre-trained ASR system. The combination of the original and augmented data is used for training a new ASR system. The experiments are conducted on Transformer [2] and Conformer [3] based ASR. The results show that the proposed method yields consistent performance gains. Augmentation-related issues, such as the comparison of different strategies and ratios for data combination, are also investigated.
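The three steps described above (segment, transpose, reassemble from a forced alignment) can be sketched as follows. This is an illustrative sketch only: the `Segment` type, the role labels `"subj"`, `"verb"`, `"obj"`, and the frame-index alignment format are hypothetical and are not taken from the paper.

```python
from typing import List, NamedTuple, Sequence, Tuple

class Segment(NamedTuple):
    role: str   # hypothetical syntactic role label, e.g. "subj", "verb", "obj"
    text: str   # the token(s) covered by this segment
    start: int  # first acoustic frame (inclusive), from forced alignment
    end: int    # last acoustic frame (exclusive)

def swap_subject_object(segments: Sequence[Segment]) -> List[Segment]:
    """Return a new segment order with the subject and object exchanged."""
    roles = [s.role for s in segments]
    i, j = roles.index("subj"), roles.index("obj")
    reordered = list(segments)
    reordered[i], reordered[j] = reordered[j], reordered[i]
    return reordered

def reassemble(segments: Sequence[Segment],
               frames: Sequence) -> Tuple[str, list]:
    """Build the transposed transcript and the matching feature sequence
    by concatenating each segment's aligned frames in the new order."""
    text = " ".join(s.text for s in segments)
    features = []
    for s in segments:
        features.extend(frames[s.start:s.end])
    return text, features

# Ten dummy "frames"; real features would be vectors such as filterbanks.
frames = list(range(10))
original = [Segment("subj", "wo", 0, 3),
            Segment("verb", "xihuan", 3, 7),
            Segment("obj", "ni", 7, 10)]
new_text, new_frames = reassemble(swap_subject_object(original), frames)
```

Here the augmented utterance reuses the original audio frames in a new order, so no new audio is synthesized; only the alignment boundaries decide where the features are cut.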
The emergence of the Artificial Intelligence of Things (AIoT) has provided novel insights for many social computing applications such as group recommender systems. As the distance among people has been greatly shortened, providing personalized services to groups instead of individuals has become a more general demand. To capture group-level preference features from individuals, existing methods mostly rely on aggregation and face two challenges: a secure data management workflow is absent, and implicit preference feedback is ignored. To tackle these difficulties, this paper proposes secure Artificial Intelligence of Things for implicit Group Recommendations (SAIoT-GR). For the hardware module, a secure IoT structure is developed as the underlying support platform. For the software module, a collaborative Bayesian network model and a non-cooperative game are introduced as algorithms. Such a secure AIoT architecture is able to maximize the advantages of the two modules. In addition, a large number of experiments are carried out to evaluate the performance of SAIoT-GR in terms of efficiency and robustness.
In recent years, virtual makeup applications have become increasingly popular. However, it is still challenging to build a makeup transfer method that is robust in real-world environments. Current makeup transfer methods mostly work well on clean makeup images captured under good conditions, but their results on makeup that exhibits shadow and occlusion are not satisfactory. To address this, we propose a novel makeup transfer method, called 3D-Aware Shadow and Occlusion Robust GAN (SOGAN). Given the source and the reference faces, we first fit a 3D face model and then disentangle the faces into shape and texture. In the texture branch, we map the texture to the UV space and design a UV texture generator to transfer the makeup. Since human faces are symmetrical in the UV space, we can conveniently remove the undesired shadow and occlusion from the reference image with a carefully designed Flip Attention Module (FAM). After obtaining cleaner makeup features from the reference image, a Makeup Transfer Module (MTM) is introduced to perform accurate makeup transfer. Qualitative and quantitative experiments demonstrate that SOGAN not only achieves superior results in shadow and occlusion situations but also performs well under large pose and expression variations.
Jittering effects significantly degrade the performance of UAV communications, especially in the millimeter-wave (mmWave) band. To investigate and mitigate the impact of UAV jitter on mmWave communications, we first model the UAV mmWave channel based on the geometric relationship between the antenna elements of the uniform planar arrays (UPAs) at the receiver side and the transmitter side, and we incorporate the jittering effects into our channel model by extracting the relationship between the UAV attitude and position and the angle of arrival (AoA) and angle of departure (AoD) of the UPAs. Then, based on the extracted relationship, we propose to utilize UAV navigation information to obtain a rough estimate of the AoA and AoD, and we also analyze the impact of AoA and AoD estimation errors on UAV beamforming. Finally, we propose a direction-constrained beam training scheme to refine the AoA/AoD estimation. In particular, we construct a partially random sensing matrix to measure the channel within a narrow angle range centered at the aforementioned rough estimate of the AoA/AoD. Numerical results show that the proposed UAV beam training scheme with navigation information is able to quickly and accurately estimate the AoA/AoD fluctuation caused by UAV jitter.
Although single-image super-resolution (SISR) methods have achieved great success on a single degradation, they still suffer a performance drop under the multiple degrading effects found in real scenarios. Recently, some blind and non-blind models for multiple degradations have been explored. However, those methods usually degrade significantly under distribution shifts between the training and test data. To this end, we propose, for the first time, a conditional meta-network framework (named CMDSR), which helps the SR framework learn how to adapt to changes in the input distribution. We extract a task-level degradation prior with the proposed ConditionNet, which is then used to adapt the parameters of the basic SR network (BaseNet). Specifically, the ConditionNet of our framework first learns the degradation prior from a support set, which is composed of a series of degraded image patches from the same task. Then the adaptive BaseNet rapidly shifts its parameters according to the conditional features. Moreover, in order to better extract the degradation prior, we propose a task contrastive loss that decreases the inner-task distance and increases the cross-task distance between task-level features. Without predefined degradation maps, our blind framework needs only a single parameter update to yield considerable SR results. Extensive experiments demonstrate the effectiveness of CMDSR over various blind and even non-blind methods. The flexible BaseNet structure also shows that CMDSR can serve as a general framework for a wide range of SISR models.
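The abstract above states only that the task contrastive loss decreases the inner-task distance and increases the cross-task distance; it does not give the formula. The sketch below is therefore an assumption, not the CMDSR loss: it uses a generic margin-based contrastive form, with the function name `task_contrastive_loss` and the `margin` parameter chosen for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def task_contrastive_loss(features, task_ids, margin=1.0):
    """Generic margin-based contrastive loss over task-level features.

    Assumed form (not taken from the paper):
      * same-task pairs are penalized by their squared distance;
      * cross-task pairs are penalized only when closer than `margin`.
    """
    inner, cross = [], []
    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            d = squared_distance(features[i], features[j])
            if task_ids[i] == task_ids[j]:
                inner.append(d)
            else:
                cross.append(max(0.0, margin - d))
    inner_term = sum(inner) / len(inner) if inner else 0.0
    cross_term = sum(cross) / len(cross) if cross else 0.0
    return inner_term + cross_term

tasks = [0, 0, 1, 1]
# Features well separated by task versus features collapsed to one point.
separated = [[0.0, 0.0], [0.0, 0.0], [5.0, 5.0], [5.0, 5.0]]
collapsed = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
loss_separated = task_contrastive_loss(separated, tasks)
loss_collapsed = task_contrastive_loss(collapsed, tasks)
```

Under this assumed form, features that cluster by task receive a lower loss than features that collapse to a single point, which is the qualitative behavior the abstract describes.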
Eigendecomposition of symmetric matrices is at the heart of many computer vision algorithms. However, the derivatives of the eigenvectors tend to be numerically unstable, whether using the SVD to compute them analytically or using the Power Iteration (PI) method to approximate them. This instability arises in the presence of eigenvalues that are close to each other. This makes integrating eigendecomposition into deep networks difficult and often results in poor convergence, particularly when dealing with large matrices. While this can be mitigated by partitioning the data into small arbitrary groups, doing so has no theoretical basis and makes it impossible to exploit the full power of eigendecomposition. In previous work, we mitigated this using SVD during the forward pass and PI to compute the gradients during the backward pass. However, the iterative deflation procedure required to compute multiple eigenvectors using PI tends to accumulate errors and yield inaccurate gradients. Here, we show that the Taylor expansion of the SVD gradient is theoretically equivalent to the gradient obtained using PI without relying in practice on an iterative process and thus yields more accurate gradients. We demonstrate the benefits of this increased accuracy for image classification and style transfer.
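For readers unfamiliar with the Power Iteration and deflation procedure that the abstract above refers to, a textbook sketch follows. It is not the authors' implementation and does not include their Taylor-expansion gradient; the function names and the fixed iteration count are assumptions. It assumes a symmetric matrix whose leading eigenvalues are distinct in magnitude.

```python
def mat_vec(A, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def power_iteration(A, iters=200):
    """Approximate the dominant eigenpair of a symmetric matrix A."""
    # Deterministic, non-uniform start vector so it is unlikely to be
    # orthogonal to the dominant eigenvector.
    v = [1.0 / (i + 1) for i in range(len(A))]
    for _ in range(iters):
        w = mat_vec(A, v)
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue estimate for unit-norm v.
    eigenvalue = sum(x * y for x, y in zip(v, mat_vec(A, v)))
    return eigenvalue, v

def top_k_eigenpairs(A, k, iters=200):
    """Compute k eigenpairs by repeated power iteration with deflation."""
    A = [row[:] for row in A]  # work on a copy
    pairs = []
    for _ in range(k):
        lam, v = power_iteration(A, iters)
        pairs.append((lam, v))
        # Deflation: subtract lam * v v^T so the next run finds the next
        # eigenvector. Any error in (lam, v) carries into later pairs.
        for i in range(len(A)):
            for j in range(len(A)):
                A[i][j] -= lam * v[i] * v[j]
    return pairs

# Symmetric 2x2 example with eigenvalues 3 and 1.
pairs = top_k_eigenpairs([[2.0, 1.0], [1.0, 2.0]], k=2)
```

The convergence rate depends on the ratio between consecutive eigenvalue magnitudes, so eigenvalues that are close to each other need many more iterations, and each deflation step passes its residual error to the next eigenpair.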
This paper develops a machine learning-driven portfolio optimization framework for virtual bidding in electricity markets that considers both risk constraints and price sensitivity. The algorithmic trading strategy is developed from the perspective of a proprietary trading firm seeking to maximize profit. A recurrent neural network-based Locational Marginal Price (LMP) spread forecast model is developed by leveraging the inter-hour dependencies of the market clearing algorithm. The LMP spread sensitivity with respect to net virtual bids is modeled as a monotonic function with the proposed constrained gradient boosting tree. We leverage the proposed algorithmic virtual bid trading strategy to evaluate both the profitability of the virtual bid portfolio and the efficiency of U.S. wholesale electricity markets. The comprehensive empirical analysis on PJM, ISO-NE, and CAISO indicates that the proposed virtual bid portfolio optimization strategy, which considers price sensitivity explicitly, outperforms the one that neglects it. The Sharpe ratios of the virtual bid portfolios for all three electricity markets are much higher than that of the S&P 500 index. The analysis also shows that the efficiency of CAISO's two-settlement system is lower than that of PJM and ISO-NE.
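The Sharpe ratio used above to compare the portfolios with the S&P 500 index can be computed as in the following sketch. The standard definition is used (mean excess return divided by the standard deviation of excess returns, optionally annualized); the function name, the default risk-free rate, and the annualization factor are assumptions, since the abstract does not state the exact convention the authors follow.

```python
import statistics

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a series of per-period returns.

    `risk_free` is the per-period risk-free rate; `periods_per_year`
    scales the ratio by sqrt(periods) (252 assumes daily returns).
    """
    excess = [r - risk_free for r in returns]
    volatility = statistics.stdev(excess)  # sample standard deviation
    if volatility == 0.0:
        raise ValueError("Sharpe ratio is undefined for zero volatility")
    return statistics.mean(excess) / volatility * periods_per_year ** 0.5

# Two hypothetical return series with the same volatility.
low = sharpe_ratio([0.00, 0.02], periods_per_year=1)
high = sharpe_ratio([0.01, 0.03], periods_per_year=1)
```

Because both series have the same volatility, the one with the higher mean return has the higher Sharpe ratio.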