Abstract: Self-supervised monocular depth estimation (MDE) has received increasing interest in recent years. The objects in a scene, including their sizes and the relationships among them, are the main cues for extracting scene structure. However, previous works lack explicit handling of how an object's apparent size changes with its depth. In a monocular video especially, the size of the same object changes continuously, resulting in size-depth ambiguity. To address this problem, we propose a Depth-converted-Scale Convolution (DcSConv) enhanced monocular depth estimation framework that incorporates the prior relationship between object depth and object scale to extract features at appropriate scales of the convolution receptive field. The proposed DcSConv focuses on the adaptive scale of the convolution filter rather than the local deformation of its shape; it establishes that the scale of the convolution filter matters no less (and in the evaluated task even more) than its local deformation. Moreover, a Depth-converted-Scale aware Fusion (DcS-F) module is developed to adaptively fuse the DcSConv features with conventional convolution features. Our DcSConv-enhanced framework can be applied on top of existing CNN-based methods as a plug-and-play replacement for the conventional convolution block. Extensive experiments with different baselines on the KITTI benchmark show that our method achieves the best results, with an improvement of up to 11.6% in SqRel reduction. An ablation study also validates the effectiveness of each proposed module.
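The core idea of tying receptive-field scale to depth can be illustrated with a minimal NumPy sketch. Under a pinhole camera model, an object of fixed physical size projects to roughly `focal_px * ref_size_m / depth` pixels, so nearer objects call for a larger convolution scale (e.g., dilation). The function name and all parameters (`focal_px`, `ref_size_m`, `base_px`, `max_dilation`) are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def depth_to_dilation(depth_map, focal_px=721.0, ref_size_m=1.5,
                      base_px=16.0, max_dilation=4):
    """Convert estimated depth to a per-pixel dilation (scale) rate.

    Pinhole projection: apparent size in pixels shrinks as depth grows,
    so the dilation rate is larger for near pixels and smaller for far
    ones. base_px is the apparent size that maps to dilation 1.
    """
    depth_map = np.maximum(np.asarray(depth_map, dtype=float), 1e-6)
    extent_px = focal_px * ref_size_m / depth_map          # apparent size
    dilation = np.clip(np.round(extent_px / base_px), 1, max_dilation)
    return dilation.astype(int)

# A pixel at 5 m gets a larger receptive field than one at 80 m.
d = depth_to_dilation(np.array([5.0, 80.0]))
```

In a full DcSConv layer, such a per-pixel scale would select among (or interpolate between) convolution branches with different dilation rates, with DcS-F fusing the result with the standard-convolution branch.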




Abstract: As a new realm of AI security, backdoor attacks have drawn growing research attention in recent years. It is well known that a backdoor can be injected into a DNN model by training it on a poisoned dataset consisting of poisoned samples. The injected model outputs correct predictions on benign samples yet behaves abnormally on poisoned samples containing a trigger pattern. Most existing triggers are visible and easily spotted by human visual inspection, and the trigger-injection process causes feature loss in both the natural sample and the trigger. To address these problems, and inspired by the spatial attention mechanism, we introduce a novel backdoor attack named SATBA that is invisible and minimizes trigger loss to improve both attack success rate and model accuracy. It extracts data features and generates a trigger pattern related to the clean data through spatial attention, then poisons a clean image by using a U-shaped model to plant the trigger into the original data. We demonstrate the effectiveness of our attack against three popular image-classification DNNs on three standard datasets. Moreover, we conduct extensive image-similarity experiments to show that our attack provides practical stealthiness, which is critical for resisting backdoor defenses.
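The attention-guided poisoning step can be sketched in a few lines of NumPy: an (H, W) attention map derived from a feature tensor modulates how strongly the trigger is blended into each spatial location, concentrating the perturbation where the model attends. This is a simplified stand-in (linear blending instead of the paper's U-shaped injection network); the function names and the `alpha` budget are hypothetical:

```python
import numpy as np

def spatial_attention_map(features):
    """Collapse a (C, H, W) feature tensor into an (H, W) map in [0, 1]."""
    a = np.abs(features).mean(axis=0)                  # channel-wise average
    return (a - a.min()) / (a.max() - a.min() + 1e-8)  # min-max normalize

def poison_image(image, trigger, features, alpha=0.08):
    """Blend a trigger into a (C, H, W) image, weighted by spatial attention.

    alpha bounds the per-pixel perturbation, keeping the poisoned image
    visually close to the clean one.
    """
    att = spatial_attention_map(features)[None, :, :]  # broadcast over channels
    poisoned = (1.0 - alpha * att) * image + alpha * att * trigger
    return np.clip(poisoned, 0.0, 1.0)
```

Because the blend weight is at most `alpha`, the poisoned image differs from the clean one by no more than `alpha` per pixel (for inputs in [0, 1]), which is the kind of bounded, attention-shaped perturbation that image-similarity metrics reward.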