Urban-scene Image segmentation is an important and trending topic in computer vision with wide use cases like autonomous driving [1]. Starting with the breakthrough work of Long et al. [2] that introduces Fully Convolutional Networks (FCNs), the development of novel architectures and practical uses of neural networks in semantic segmentation has been expedited in the recent 5 years. Aside from seeking solutions in general model design for information shrinkage due to pooling, urban-scene image itself has intrinsic features like positional patterns [3]. Our project seeks an advanced and integrated solution that specifically targets urban-scene image semantic segmentation among the most novel approaches in the current field. We re-implement the cutting edge model DeepLabv3+ [4] with ResNet-101 [5] backbone as our strong baseline model. Based upon DeepLabv3+, we incorporate HANet [3] to account for the vertical spatial priors in urban-scene image tasks. To boost up model efficiency and performance, we further explore the Atrous Spatial Pooling (ASP) layer in DeepLabv3+ and infuse a computational efficient variation called "Waterfall" Atrous Spatial Pooling (WASP) [6] architecture in our model. We find that our two-step integrated model improves the mean Intersection-Over-Union (mIoU) score gradually from the baseline model. In particular, HANet successfully identifies height-driven patterns and improves per-class IoU of common class labels in urban scenario like fence and bus. We also demonstrate the improvement of model efficiency with help of WASP in terms of computational times during training and parameter reduction from the original ASPP module.
Adversarial attacks can generate adversarial inputs by applying small but intentionally worst-case perturbations to samples from the dataset, which leads to even state-of-the-art deep neural networks outputting incorrect answers with high confidence. Hence, some adversarial defense techniques are developed to improve the security and robustness of the models and avoid them being attacked. Gradually, a game-like competition between attackers and defenders formed, in which both players would attempt to play their best strategies against each other while maximizing their own payoffs. To solve the game, each player would choose an optimal strategy against the opponent based on the prediction of the opponent's strategy choice. In this work, we are on the defensive side to apply game-theoretic approaches on defending against attacks. We use two randomization methods, random initialization and stochastic activation pruning, to create diversity of networks. Furthermore, we use one denoising technique, super resolution, to improve models' robustness by preprocessing images before attacks. Our experimental results indicate that those three methods can effectively improve the robustness of deep-learning neural networks.