The rapid development of facial manipulation techniques has aroused public concern in recent years. Following the success of deep learning, existing methods typically formulate DeepFake video detection as a binary classification problem and develop frame-based and video-based solutions. However, little attention has been paid to capturing the spatial-temporal inconsistency in forged videos. To address this issue, we formulate this task as a Spatial-Temporal Inconsistency Learning (STIL) process and instantiate it in a novel STIL block, which consists of a Spatial Inconsistency Module (SIM), a Temporal Inconsistency Module (TIM), and an Information Supplement Module (ISM). Specifically, we present a novel temporal modeling paradigm in TIM that exploits the temporal difference over adjacent frames along both the horizontal and vertical directions. The ISM simultaneously utilizes the spatial information from SIM and the temporal information from TIM to establish a more comprehensive spatial-temporal representation. Moreover, our STIL block is flexible and can be plugged into existing 2D CNNs. Extensive experiments and visualizations are presented to demonstrate the effectiveness of our method against state-of-the-art competitors.
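The idea of a temporal difference over adjacent frames, summarized along the two spatial directions, can be illustrated with a minimal sketch. This is not the TIM module itself: the function names, the plain nested-list frame format, and the use of simple averaging along each axis are assumptions made only for illustration.

```python
def temporal_difference(clip):
    """Return the T-1 frame-to-frame differences of a clip.

    clip: list of T frames, each a list of H rows holding W floats.
    """
    return [
        [[b - a for a, b in zip(row_prev, row_next)]
         for row_prev, row_next in zip(prev, nxt)]
        for prev, nxt in zip(clip, clip[1:])
    ]


def directional_profiles(frame):
    """Average a 2D frame along each spatial axis.

    Returns (row_profile, col_profile): one value per row and one per column.
    """
    h, w = len(frame), len(frame[0])
    row_profile = [sum(row) / w for row in frame]
    col_profile = [sum(frame[r][c] for r in range(h)) / h for c in range(w)]
    return row_profile, col_profile


# Toy 3-frame clip of 2x2 "images" (hypothetical data).
clip = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.0]],
    [[1.0, 0.0], [0.0, 2.0]],
]
diffs = temporal_difference(clip)
profiles = [directional_profiles(d) for d in diffs]
```

Each entry of `profiles` shows where, vertically and horizontally, the content changed between two adjacent frames. A learned module would operate on feature maps rather than raw pixels.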
A photonic approach for radio-frequency (RF) self-interference cancellation (SIC) incorporated in an in-band full-duplex radio-over-fiber system is proposed. A dual-polarization binary phase-shift keying modulator is used for dual-polarization multiplexing at the central office (CO). A local oscillator signal and an intermediate-frequency signal carrying the downlink data are single-sideband modulated on the two polarization directions of the modulator, respectively. The optical signal is then transmitted to the remote unit, where the optical signals in the two polarization directions are split into two parts. One part is detected to generate the up-converted downlink RF signal, and the other part is re-modulated by the uplink RF signal and the self-interference and then transmitted back to the CO for signal down-conversion and SIC via optical-domain signal adjustment and balanced detection. The functions of SIC, frequency up-conversion, down-conversion, and fiber transmission with dispersion immunity are all incorporated in the system. An experiment is performed. Cancellation depths of more than 39 dB for the single-tone signal and more than 20 dB for the 20-MBaud 16 quadrature amplitude modulation signal are achieved in the back-to-back case. The performance of the system does not degrade significantly when a 4.1-km section of optical fiber is incorporated.
A simplified Doppler frequency shift (DFS) measurement approach based on Serrodyne optical frequency translation is reported. A sawtooth wave with an appropriate amplitude is sent to one phase modulation arm of a Mach-Zehnder modulator in conjunction with the transmitted signal to implement the Serrodyne optical frequency translation, as well as the optical phase modulation of the transmitted signal on the frequency-shifted optical carrier. The echo signal is applied to the other phase modulation arm of the Mach-Zehnder modulator. The optical signals from the two arms are combined in the Mach-Zehnder modulator, and the lower optical sidebands are selected by an optical bandpass filter and then detected in a photodetector. By simply measuring the frequency of the output low-frequency signal, the value and direction of the DFS can be determined simultaneously. An experiment is performed. DFS from -100 to 100 kHz is measured for microwave signals from 6 to 17 GHz with a measurement error of less than 0.03 Hz and a measurement stability of 0.015 Hz in 30 minutes when a 500-kHz sawtooth wave is used as the reference.
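The final measurement step reduces to simple arithmetic, sketched below. The relation used here, output frequency = reference frequency + DFS, is an assumed sign convention (the abstract does not state which direction maps to a higher output frequency), and the function names are hypothetical.

```python
def doppler_shift_hz(f_out_hz, f_ref_hz=500e3):
    """Signed DFS, ASSUMING the output tone sits at f_ref + DFS.

    f_ref_hz defaults to the 500-kHz sawtooth reference from the experiment.
    """
    return f_out_hz - f_ref_hz


def shift_direction(dfs_hz):
    """Label the sign of the shift; which sign means 'approaching' is not fixed here."""
    if dfs_hz > 0:
        return "positive"
    if dfs_hz < 0:
        return "negative"
    return "zero"


# With a 500-kHz reference, a DFS range of -100 to 100 kHz maps to
# output tones between 400 and 600 kHz, so the sign is never ambiguous.
example = doppler_shift_hz(400e3)
```

Because the reference offset exceeds the largest expected shift, the output frequency stays positive and a single frequency reading carries both magnitude and direction.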
This work proposes an autonomous docking controller for nonholonomically constrained mobile robots and applies it to an intelligent mobility device, or wheelchair, to assist the user in approaching resting furniture such as a chair or a bed. We define a virtual landmark inferred from the target docking destination. We then solve the problem of keeping the targeted volume inside the field of view (FOV) of a tracking camera while docking to the virtual landmark, through a novel formulation that enables control of the desired end pose. Specifically, we propose a nonlinear feedback controller that performs the docking with the depth camera's FOV as a constraint. A numerical method is then proposed to find the feasible space of initial states for which convergence can be guaranteed. Finally, the entire system is embedded for real-time operation on a standing wheelchair, with the virtual landmark estimated by 3D object tracking with an RGB-D camera, and its effectiveness is validated in simulation and experimental evaluations. The results show guaranteed convergence within the feasible space, which depends on the virtual landmark location. In the implementation, the robot converges to the virtual landmark while respecting the FOV constraints.
Working memory (WM) is a basic part of human cognition and plays an important role in the study of human cognitive load. Among various brain imaging techniques, electroencephalography (EEG) has shown advantages in ease of access and reliability. However, one critical challenge is that individual differences may lead to poor results, especially when an established model meets an unfamiliar subject. In this work, we propose a cross-subject deep adaptation model with spatial attention (CS-DASA) to generalize workload classification across subjects. First, we transform time-series EEG data into multi-frame EEG images that incorporate more spatio-temporal information. Next, the subject-shared module in CS-DASA receives multi-frame EEG image data from both source and target subjects and learns common feature representations. Then, in the subject-specific module, the maximum mean discrepancy is used to measure the domain distribution divergence in a reproducing kernel Hilbert space, which adds an effective penalty loss for domain adaptation. Additionally, a subject-to-subject spatial attention mechanism is employed to focus on the most discriminative spatial features in the EEG image data. Experiments conducted on a public WM EEG dataset containing 13 subjects show that the proposed model achieves better performance than existing state-of-the-art methods.
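The maximum mean discrepancy (MMD) penalty mentioned above can be sketched as follows. This is a generic biased estimate of the squared MMD with a Gaussian kernel, not the CS-DASA implementation: the kernel choice, the bandwidth `sigma`, and the toy feature vectors are assumptions.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(source, target, sigma=1.0):
    """Biased estimate of squared MMD between two sets of feature vectors."""
    def mean_kernel(xs, ys):
        total = sum(gaussian_kernel(x, y, sigma) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))


# Hypothetical 2D features for a source and a target subject.
source_feats = [[0.0, 0.0], [1.0, 0.0]]
target_feats = [[5.0, 5.0], [6.0, 5.0]]
penalty = mmd_squared(source_feats, target_feats)
```

The value is zero when both sets coincide and grows as the two feature distributions move apart, which is why it can serve as a domain adaptation loss term.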
3D complete renal structure (CRS) segmentation aims to segment the kidneys, tumors, renal arteries, and veins in one inference. Once successful, it will provide preoperative plans and intraoperative guidance for laparoscopic partial nephrectomy (LPN), playing a key role in renal cancer treatment. However, no success has been reported in 3D CRS segmentation due to the complex shapes of renal structures, low contrast, and large anatomical variation. In this study, we utilize adversarial ensemble learning and propose the Ensemble Multi-condition GAN (EnMcGAN) for 3D CRS segmentation for the first time. Its contribution is three-fold. 1) Inspired by windowing, we propose the multi-windowing committee, which divides a CTA image into multiple narrow windows with different window centers and widths, enhancing the contrast of salient boundaries and soft tissues. It then builds an ensemble segmentation model on these narrow windows to fuse their segmentation strengths and improve the overall segmentation quality. 2) We propose the multi-condition GAN, which equips the segmentation model with multiple discriminators to encourage the segmented structures to meet their real shape conditions, thus improving the shape feature extraction ability. 3) We propose the adversarial weighted ensemble module, which uses the trained discriminators to evaluate the quality of the segmented structures and normalizes these evaluation scores into ensemble weights specific to the input image, thus enhancing the ensemble results. A total of 122 patients are enrolled in this study, and the mean Dice coefficient of the renal structures reaches 84.6%. Extensive experiments with promising results on renal structures reveal powerful segmentation accuracy and great clinical significance in renal cancer treatment.
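Two ingredients named above, intensity windowing and the Dice coefficient, are standard and can be sketched briefly. The window centers and widths below are hypothetical values, not those used by EnMcGAN, and the flat-list mask format is an assumption for brevity.

```python
def apply_window(intensity, center, width):
    """Clip an intensity to [center - width/2, center + width/2] and rescale to [0, 1]."""
    low = center - width / 2.0
    high = center + width / 2.0
    clipped = min(max(intensity, low), high)
    return (clipped - low) / (high - low)


def dice_coefficient(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention (assumption): two empty masks count as a perfect match.
    return 2.0 * intersection / total if total else 1.0


# Several narrow windows over the same voxel intensities (hypothetical settings).
windows = [(100.0, 200.0), (300.0, 200.0)]  # (center, width)
voxels = [-500.0, 100.0, 250.0, 1000.0]
windowed = [[apply_window(v, c, w) for v in voxels] for c, w in windows]

score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```

Each narrow window stretches a different intensity band across the full output range, so structures that are nearly indistinguishable in one window can be well separated in another.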
Multiple instance learning (MIL) is a powerful tool for weakly supervised classification in whole slide image (WSI) based pathology diagnosis. However, current MIL methods are usually based on the independent and identically distributed (i.i.d.) hypothesis, thus neglecting the correlation among different instances. To address this problem, we proposed a new framework, called correlated MIL, and provided a proof of convergence. Based on this framework, we devised a Transformer-based MIL (TransMIL), which explores both morphological and spatial information. The proposed TransMIL can effectively deal with unbalanced/balanced and binary/multi-class classification, with good visualization and interpretability. We conducted various experiments on three different computational pathology problems and achieved better performance and faster convergence than state-of-the-art methods. The test AUC for binary tumor classification reaches up to 93.09% on the CAMELYON16 dataset. The AUC for cancer subtype classification reaches up to 96.03% and 98.82% on the TCGA-NSCLC and TCGA-RCC datasets, respectively.
Mean field games (MFG) facilitate the application of reinforcement learning (RL) in large-scale multi-agent systems by reducing the interplay among agents to that between an individual agent and the average effect of the population. However, RL agents are notoriously prone to unexpected behaviours due to reward mis-specification. Although inverse RL (IRL) holds promise for automatically acquiring suitable rewards from demonstrations, its extension to MFG is challenging due to the complicated notion of mean-field-type equilibria and the coupling between agent-level and population-level dynamics. To this end, we propose a novel IRL framework for MFG, called Mean Field IRL (MFIRL), which builds upon a new equilibrium concept and the maximum entropy IRL framework. Crucially, MFIRL is brought forward as the first IRL method that can recover the agent-level (ground-truth) reward functions for MFG. Experiments show the superior performance of MFIRL in sample efficiency, reward recovery, and robustness against varying environment dynamics, compared to the state-of-the-art method.
General-purpose object-detection algorithms often dismiss the fine structure of detected objects. This can be traced back to how their proposed regions are evaluated. Our goal is to renegotiate the trade-off between the generality of these algorithms and their coarse detections. In this work, we present a new metric that combines a popular evaluation metric, Intersection over Union (IoU), with a geometrical concept, fractal dimension. We propose Multiscale IoU (MIoU), which allows comparison between the detected and ground-truth regions at multiple resolution levels. Through several reproducible examples, we show that MIoU is indeed sensitive to the fine boundary structures that are completely overlooked by IoU and F1-score. We further examine the overall reliability of MIoU by comparing its distribution with that of IoU on synthetic and real-world datasets of objects. We intend this work to re-initiate exploration of new evaluation methods for object-detection algorithms.
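Comparing regions at multiple resolution levels can be illustrated with a small sketch. This is not the MIoU formula, which the abstract does not give: the box-counting-style coarsening, the choice of block sizes, and the function names are assumptions used only to show why a single-scale IoU can hide fine structure.

```python
def coarsen(mask, block):
    """Downsample a binary mask: a block is occupied if any pixel in it is set."""
    h, w = len(mask), len(mask[0])
    return [
        [any(mask[r][c]
             for r in range(r0, min(r0 + block, h))
             for c in range(c0, min(c0 + block, w)))
         for c0 in range(0, w, block)]
        for r0 in range(0, h, block)
    ]


def iou(a, b):
    """Intersection over Union of two binary masks of equal shape."""
    inter = sum(1 for ra, rb in zip(a, b) for x, y in zip(ra, rb) if x and y)
    union = sum(1 for ra, rb in zip(a, b) for x, y in zip(ra, rb) if x or y)
    return inter / union if union else 1.0


def iou_by_scale(pred, truth, blocks=(1, 2, 4)):
    """IoU computed after coarsening both masks at each block size."""
    return {b: iou(coarsen(pred, b), coarsen(truth, b)) for b in blocks}


# Ground truth with fine structure (checkerboard) vs. a solid detection.
truth = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]]
pred = [[1, 1, 1, 1] for _ in range(4)]
scores = iou_by_scale(pred, truth)
```

In this toy case the two regions agree perfectly at the coarse scales and only half agree at the finest one, which is the kind of scale-dependent disagreement a multiscale metric is meant to expose.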
A photonic-assisted multi-functional radar system for simultaneous distance and velocity measurement and high-resolution microwave imaging is proposed and experimentally demonstrated using a composite transmitted microwave signal consisting of a single-chirped linearly frequency-modulated (LFM) signal and a single-tone microwave signal. In the system, the transmitted signal is generated via photonic frequency up-conversion based on a single integrated dual-polarization dual-parallel Mach-Zehnder modulator (DPol-DPMZM), whereas the echo signals scattered from the target are de-chirped to two low-frequency signals using a microwave photonic frequency mixer. By using the two low-frequency de-chirped signals, the real-time distance and radial velocity of the moving target can be measured accurately according to the round-trip time of the echo signal and its Doppler frequency shift. Compared with previously reported distance and velocity measurement methods, where two LFM signals with opposite chirps are used, these parameters can be obtained using only a single-chirped LFM signal and a single-tone microwave signal. Meanwhile, high-resolution inverse synthetic aperture radar (ISAR) imaging can also be realized using ISAR imaging algorithms. An experiment is performed to verify the proposed multi-functional microwave photonic radar system. An up-chirped LFM signal from 8.5 to 12.5 GHz and an 8.0 GHz single-tone microwave signal are used as the transmitted signal. The results show that the absolute measurement errors of distance and radial velocity are less than 5.9 cm and 2.8 cm/s, respectively. ISAR imaging results are also presented, demonstrating the high-resolution and real-time ISAR imaging ability of the proposed system.
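The mapping from the two de-chirped frequencies to distance and radial velocity can be sketched with the textbook relations for a de-chirped LFM radar and a continuous-wave Doppler tone. This is a simplified sketch, not the paper's processing chain: it ignores any Doppler contribution to the LFM beat frequency, and the chirp duration is a hypothetical value since the abstract does not state it. The 4-GHz bandwidth and 8.0-GHz tone come from the experiment described above.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def range_from_beat(f_beat_hz, bandwidth_hz, chirp_duration_s):
    """Distance from the de-chirped beat frequency: R = c * f_b * T / (2 * B)."""
    return C * f_beat_hz * chirp_duration_s / (2.0 * bandwidth_hz)


def velocity_from_doppler(f_doppler_hz, f_tone_hz):
    """Radial velocity from the Doppler shift of the single tone: v = c * f_d / (2 * f_c)."""
    return C * f_doppler_hz / (2.0 * f_tone_hz)


def range_resolution(bandwidth_hz):
    """Theoretical range resolution of an LFM waveform: c / (2 * B)."""
    return C / (2.0 * bandwidth_hz)


BANDWIDTH_HZ = 12.5e9 - 8.5e9   # LFM sweep from the experiment
TONE_HZ = 8.0e9                 # single-tone frequency from the experiment
CHIRP_DURATION_S = 1e-3         # HYPOTHETICAL: not given in the abstract

resolution_m = range_resolution(BANDWIDTH_HZ)      # about 3.75 cm
doppler_for_1_mps = 2.0 * 1.0 * TONE_HZ / C        # about 53.4 Hz per m/s
```

Under these relations, a 4-GHz sweep gives a range resolution of roughly 3.75 cm, and a radial velocity of 1 m/s shifts the 8.0-GHz tone by roughly 53 Hz.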