Yu Zhang

AI Lab, Netease

Detection of Pavement Cracks by Deep Learning Models of Transformer and UNet

Apr 25, 2023

Yu Zhang, Lin Zhang

Abstract: Fracture is one of the main failure modes of engineering structures such as buildings and roads. Effective detection of surface cracks is important for damage evaluation and structure maintenance. In recent years, the emergence and development of deep learning techniques have shown great potential to facilitate surface crack detection. Currently, most reported work has relied on convolutional neural networks (CNNs), whose limitations may be addressed by the recently introduced transformer architecture. In this study, we investigated nine promising models and evaluated their performance in pavement surface crack detection in terms of model accuracy, computational complexity, and model stability. We created 711 crack-labeled images of 224 × 224 pixels, selected an optimal loss function, compared the evaluation metrics on the validation and test datasets, analyzed the data details, and checked the segmentation outcomes of each model. We find that transformer-based models generally converge more easily during training and achieve higher accuracy, but usually consume more memory and have lower processing efficiency. Among the nine models, SwinUNet outperforms the other two transformer-based models and achieves the highest accuracy overall. The results should shed light on surface crack detection by various deep learning models and provide a guideline for future applications in this field.
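The abstract mentions selecting a loss function and comparing evaluation metrics without naming them. As a hedged illustration only, the sketch below computes a soft Dice loss and pixel-level IoU for a binary crack mask. Dice and IoU are assumptions here (they are common choices for thin, class-imbalanced structures such as cracks), not necessarily what the study used, and masks are plain nested lists rather than tensors.

```python
from typing import List

Mask = List[List[float]]  # H x W grid: 1.0 = crack pixel, 0.0 = background


def soft_dice_loss(pred: Mask, target: Mask, eps: float = 1e-6) -> float:
    """1 - Dice coefficient on predicted probabilities (differentiable form)."""
    inter = sum(p * t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    return 1.0 - (2.0 * inter + eps) / (total + eps)


def iou(pred: Mask, target: Mask, threshold: float = 0.5) -> float:
    """Intersection over union of the thresholded prediction and the label."""
    inter = union = 0
    for pr, tr in zip(pred, target):
        for p, t in zip(pr, tr):
            pb, tb = p >= threshold, t >= 0.5
            inter += int(pb and tb)
            union += int(pb or tb)
    return inter / union if union else 1.0  # two empty masks count as a match


label = [[0, 1, 1], [0, 1, 0]]
pred = [[0.1, 0.9, 0.4], [0.2, 0.8, 0.1]]
print(round(iou(pred, label), 3))  # 0.667: 2 overlapping pixels out of 3 in the union
print(round(soft_dice_loss(pred, label), 3))  # 0.236
```

Because the Dice loss is computed on probabilities rather than thresholded masks, it can be used directly as a training objective, while IoU is typically reported as an evaluation metric.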

Mastering Asymmetrical Multiplayer Game with Multi-Agent Asymmetric-Evolution Reinforcement Learning

Apr 20, 2023

Chenglu Sun, Yichi Zhang, Yu Zhang, Ziling Lu, Jingbin Liu, Sijia Xu, Weidong Zhang

Abstract: Asymmetrical multiplayer (AMP) games are a popular game genre in which multiple types of agents compete or collaborate with each other. It is difficult to train powerful agents that can defeat top human players in AMP games with typical self-play training methods because of the unbalanced characteristics of their asymmetrical environments. We propose asymmetric-evolution training (AET), a novel multi-agent reinforcement learning framework that can train multiple kinds of agents simultaneously in AMP games. We designed adaptive data adjustment (ADA) and environment randomization (ER) to optimize the AET process. We tested our method in a complex AMP game named Tom & Jerry, and our AIs, trained without any human data, achieved a win rate of 98.5% against top human players over 65 matches. Ablation experiments indicate that the proposed modules are beneficial to the framework.

Personalized Federated Learning with Local Attention

Apr 14, 2023

Sicong Liang, Junchao Tian, Shujun Yang, Yu Zhang

Abstract: Federated Learning (FL) aims to learn a single global model that enables a central server to assist model training on local clients without accessing their local data. The key challenge of FL is the heterogeneity of local data across clients, such as heterogeneous label distributions and feature shift, which can lead to significant performance degradation of the learned models. Although many methods have been proposed to address the heterogeneous label distribution problem, few studies have explored the feature shift issue. To address this issue, we propose a simple yet effective algorithm, personalized Federated learning with Local Attention (pFedLA), which incorporates the attention mechanism into the personalized models of clients while keeping the attention blocks client-specific. Specifically, pFedLA includes two modules: a personalized single attention module and a personalized hybrid attention module. In addition, pFedLA is flexible and general, as it can be incorporated into any FL method to improve its performance without introducing additional communication costs. Extensive experiments demonstrate that pFedLA can boost the performance of state-of-the-art FL methods on tasks such as image classification and object detection.

* We have decided to withdraw this paper because, upon further review, we identified that the explanations regarding the parameters of each layer in the experiments should be more complete and precise, and that further experiments are needed to validate the correctness of our assumptions.
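The core design point of pFedLA is that attention blocks stay client-specific while the rest of the model is aggregated by an existing FL method. A minimal sketch of that split, assuming FedAvg as the base method, models stored as dicts mapping parameter names to lists of floats, and a hypothetical naming convention in which attention parameters contain `"attn"` (none of these details come from the paper):

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def is_personal(name: str) -> bool:
    # Hypothetical convention: attention-block parameters are tagged "attn".
    return "attn" in name


def aggregate(clients: List[Params], weights: List[float]) -> List[Params]:
    """FedAvg over shared parameters; attention parameters stay local."""
    total = sum(weights)
    global_shared: Params = {}
    for name, values in clients[0].items():
        if is_personal(name):
            continue
        global_shared[name] = [
            sum(w * c[name][i] for c, w in zip(clients, weights)) / total
            for i in range(len(values))
        ]
    # Each client receives the averaged shared weights but keeps its own
    # attention parameters, which are never sent to the server.
    return [
        {n: list(v) if is_personal(n) else list(global_shared[n])
         for n, v in c.items()}
        for c in clients
    ]


a = {"conv.w": [1.0, 2.0], "attn.q": [0.5]}
b = {"conv.w": [3.0, 4.0], "attn.q": [-0.5]}
new_a, new_b = aggregate([a, b], weights=[1.0, 1.0])
print(new_a)  # {'conv.w': [2.0, 3.0], 'attn.q': [0.5]}
```

Since the personal parameters are simply excluded from aggregation, the only tensors exchanged are ones the base FL method would already exchange, which is consistent with the claim of no additional communication cost.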

SPColor: Semantic Prior Guided Exemplar-based Image Colorization

Apr 14, 2023

Siqi Chen, Xueming Li, Xianlin Zhang, Mingdao Wang, Yu Zhang, Yue Zhang

Abstract: Exemplar-based image colorization aims to colorize a target grayscale image based on a color reference image, and the key is to establish accurate pixel-level semantic correspondence between these two images. Previous methods search for correspondence across the entire reference image, and this type of global matching is prone to mismatches. We summarize the difficulties in two aspects: (1) when the reference image contains only some of the objects related to the target image, improper correspondences are established in unrelated regions; (2) mismatches are likely in regions where the shape or texture of objects is easily confused. To overcome these issues, we propose SPColor, a semantic prior guided exemplar-based image colorization framework. Unlike previous methods, SPColor first coarsely classifies the pixels of the reference and target images into several pseudo-classes under the guidance of a semantic prior, and then establishes correspondences only locally, between pixels of the same class, via a newly designed semantic prior guided correspondence network. In this way, improper correspondences between different semantic classes are explicitly excluded, and mismatches are clearly alleviated. In addition, to better preserve the color from the reference, we design a similarity masked perceptual loss. Notably, SPColor uses the semantic prior provided by an unsupervised segmentation model, so it requires no additional manual semantic annotations. Experiments demonstrate that our model outperforms recent state-of-the-art methods both quantitatively and qualitatively on a public dataset.
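The central idea of SPColor is to restrict correspondence search to pixels that share a pseudo-class. The sketch below illustrates only that restriction: the paper's learned correspondence network is replaced by plain argmax cosine-similarity matching, and the per-pixel feature vectors and pseudo-class labels are assumed to be given.

```python
import math
from typing import List, Optional, Sequence

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb + 1e-12)


def match_within_class(
    tgt_feats: List[Vec], tgt_cls: List[int],
    ref_feats: List[Vec], ref_cls: List[int],
) -> List[Optional[int]]:
    """For each target pixel, return the index of the most similar reference
    pixel of the same pseudo-class, or None if that class is absent."""
    matches: List[Optional[int]] = []
    for f, c in zip(tgt_feats, tgt_cls):
        candidates = [j for j, rc in enumerate(ref_cls) if rc == c]
        if not candidates:
            matches.append(None)  # unrelated region: no forced match
            continue
        matches.append(max(candidates, key=lambda j: cosine(f, ref_feats[j])))
    return matches


tgt = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ref = [[0.0, 1.0], [1.0, 0.1], [1.0, 0.0]]
# Globally, target pixel 0 would match reference pixel 2 best, but that pixel
# belongs to another pseudo-class, so it is excluded from the search.
print(match_within_class(tgt, [0, 1, 2], ref, [0, 0, 1]))  # [1, 2, None]
```

Returning `None` for a target class absent from the reference mirrors difficulty (1) in the abstract: rather than inventing a correspondence in an unrelated region, the pixel is left unmatched.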

Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference

Apr 11, 2023

Tao Lei, Junwen Bai, Siddhartha Brahma, Joshua Ainslie, Kenton Lee, Yanqi Zhou, Nan Du, Vincent Y. Zhao, Yuexin Wu, Bo Li (+2 more)

Abstract: We propose Conditional Adapter (CoDA), a parameter-efficient transfer learning method that also improves inference efficiency. CoDA generalizes beyond standard adapter approaches to enable a new way of balancing speed and accuracy using conditional computation. Starting with an existing dense pretrained model, CoDA adds sparse activation together with a small number of new parameters and a lightweight training phase. Our experiments demonstrate that the CoDA approach provides an unexpectedly efficient way to transfer knowledge. Across a variety of language, vision, and speech tasks, CoDA achieves a 2x to 8x inference speed-up compared to the state-of-the-art Adapter approach, with moderate to no accuracy loss and the same parameter efficiency.
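Conditional computation is where the speed-up comes from: only a subset of tokens passes through the expensive pretrained layer, while a small adapter can still update every token. A minimal sketch of that idea, assuming a hard top-k router over given scalar scores, scalar "tokens", and hypothetical `heavy` and `adapter` functions; the abstract does not describe CoDA's actual router or layer layout, so this is an illustration of the general technique rather than the paper's design.

```python
from typing import Callable, List, Sequence


def conditional_layer(
    tokens: List[float],
    router_scores: Sequence[float],
    k: int,
    heavy: Callable[[float], float],
    adapter: Callable[[float], float],
) -> List[float]:
    """Apply `heavy` only to the k highest-scoring tokens; every token gets
    the cheap `adapter` update plus a residual connection."""
    ranked = sorted(range(len(tokens)), key=lambda i: router_scores[i], reverse=True)
    top = set(ranked[:k])
    out = []
    for i, x in enumerate(tokens):
        y = x + adapter(x)  # light path for all tokens
        if i in top:
            y += heavy(x)  # expensive path only for selected tokens
        out.append(y)
    return out


calls: List[float] = []


def heavy(x: float) -> float:  # stand-in for a frozen pretrained layer
    calls.append(x)
    return 10.0 * x


out = conditional_layer([1.0, 2.0, 3.0, 4.0], [0.9, 0.1, 0.5, 0.2], k=2,
                        heavy=heavy, adapter=lambda x: 0.1 * x)
print(len(calls))  # 2: only the two highest-scoring tokens used the heavy path
```

The cost of the heavy path now scales with `k` instead of the sequence length, which is the knob that trades speed against accuracy.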

A Fast and Lightweight Network for Low-Light Image Enhancement

Apr 06, 2023

Yu Zhang, Xiaoguang Di, Junde Wu, Rao Fu, Yong Li, Yue Wang, Yanwu Xu, Guohui Yang, Chunhui Wang

Abstract: Low-light images often suffer from severe noise, low brightness, low contrast, and color deviation. While several low-light image enhancement methods have been proposed, there is still a lack of efficient methods that can solve all of these problems simultaneously. In this paper, we introduce FLW-Net, a Fast and LightWeight Network for low-light image enhancement that significantly improves processing speed and overall enhancement quality. To achieve efficient low-light image enhancement, we identify two challenges: the lack of an absolute reference, and the need for a large receptive field to obtain global contrast. To overcome these challenges, we propose an efficient global feature information extraction component and design loss functions based on relative information. Finally, we conduct comparative experiments to demonstrate the effectiveness of the proposed method, and the results confirm that FLW-Net can significantly reduce the complexity of supervised low-light image enhancement networks while improving the enhancement results. Code is available at https://github.com/hitzhangyu/FLW-Net

* 10 pages, 7 figures

Safe Explicable Robot Planning

Apr 04, 2023

Akkamahadevi Hanni, Andrew Boateng, Yu Zhang

Abstract: Human expectations stem from people's knowledge of others and of the world. In human-robot interaction, a human's knowledge about the robot may be inconsistent with the ground truth, so the robot may fail to meet the human's expectations. Explicable planning was previously introduced as a novel planning approach that reconciles human expectations with optimal robot behavior for more interpretable robot decision-making. One critical issue that remains unaddressed is safety: explicable decision-making can lead to explicable behaviors that are unsafe. We propose Safe Explicable Planning (SEP), which extends explicable planning to support the specification of a safety bound. The objective of SEP is to find a policy that generates behavior close to human expectations while satisfying the safety constraints introduced by the bound; this is a special case of multi-objective optimization in which the solution to SEP lies on the Pareto frontier. Under this formulation, we propose a novel and efficient method that returns the safe explicable policy, along with an approximate solution. In addition, we provide a theoretical proof of the optimality of the exact solution under the designer-specified bound. Our evaluation results confirm the applicability and efficacy of our method for safe explicable planning.
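To make the SEP formulation concrete, the toy sketch below enumerates a small, hypothetical set of candidate policies, each scored by an explicability distance (lower means closer to human expectations) and a safety cost. It selects the most explicable candidate within the safety bound and computes the Pareto frontier. This brute-force enumeration is only an illustration of the objective; it is not the paper's efficient method, and the candidate names and scores are invented.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    explicability_dist: float  # distance from the human-expected behavior
    safety_cost: float         # expected cost of unsafe outcomes


def dominates(a: Candidate, b: Candidate) -> bool:
    """a is no worse than b on both objectives and strictly better on one."""
    no_worse = (a.explicability_dist <= b.explicability_dist
                and a.safety_cost <= b.safety_cost)
    better = (a.explicability_dist < b.explicability_dist
              or a.safety_cost < b.safety_cost)
    return no_worse and better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    return [c for c in cands if not any(dominates(o, c) for o in cands)]


def safe_explicable(cands: List[Candidate], bound: float) -> Optional[Candidate]:
    """Most explicable candidate within the safety bound. Ties are broken by
    safety cost, so the result is never dominated (it lies on the front)."""
    feasible = [c for c in cands if c.safety_cost <= bound]
    return min(feasible, key=lambda c: (c.explicability_dist, c.safety_cost),
               default=None)


cands = [
    Candidate("expected", 0.0, 5.0),
    Candidate("safe_detour", 2.0, 1.0),
    Candidate("compromise", 1.0, 2.0),
    Candidate("dominated", 3.0, 3.0),
]
print(safe_explicable(cands, bound=2.5).name)  # compromise
print([c.name for c in pareto_front(cands)])  # ['expected', 'safe_detour', 'compromise']
```

Loosening the bound recovers the purely explicable choice ("expected"), while tightening it past every candidate leaves no feasible policy (`None`).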

Exemplar-based Video Colorization with Long-term Spatiotemporal Dependency

Mar 27, 2023

Siqi Chen, Xueming Li, Xianlin Zhang, Mingdao Wang, Yu Zhang, Jiatong Han, Yue Zhang

Abstract: Exemplar-based video colorization is an essential technique for applications such as old movie restoration. Although recent methods perform well in still scenes or scenes with regular movement, they lack robustness in moving scenes because of their weak ability to model long-term dependency both spatially and temporally, which leads to color fading, color discontinuity, or other artifacts. To solve this problem, we propose an exemplar-based video colorization framework with long-term spatiotemporal dependency. To enhance long-term spatial dependency, we design a parallelized CNN-Transformer block and a double-head non-local operation. The proposed CNN-Transformer block better incorporates long-term spatial dependency with local texture and structural features, and the double-head non-local operation further boosts the performance of the augmented features. To enhance long-term temporal dependency, we further introduce a novel linkage subnet, which propagates motion information across adjacent frame blocks and helps maintain temporal continuity. Experiments demonstrate that our model outperforms recent state-of-the-art methods both quantitatively and qualitatively. Our model also generates more colorful, realistic, and stable results, especially for scenes where objects change greatly and irregularly.
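Non-local operations capture long-range spatial dependency by letting every position aggregate features from all other positions. The sketch below shows a generic single-head non-local operation in the dot-product form, with softmax weighting and a residual connection. The learned projections are omitted (treated as identity) for brevity, and the paper's double-head variant is not reproduced, since the abstract does not specify it.

```python
import math
from typing import List

Vec = List[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def non_local(features: List[Vec]) -> List[Vec]:
    """z_i = x_i + sum_j softmax_j(x_i . x_j) * x_j over all positions j."""
    out = []
    for xi in features:
        weights = softmax([sum(a * b for a, b in zip(xi, xj)) for xj in features])
        yi = [sum(w * xj[d] for w, xj in zip(weights, features))
              for d in range(len(xi))]
        out.append([a + b for a, b in zip(xi, yi)])  # residual connection
    return out


print(non_local([[1.0, 0.0], [1.0, 0.0]]))  # [[2.0, 0.0], [2.0, 0.0]]
```

Every output position depends on every input position regardless of distance, which is what distinguishes this from a convolution with a fixed local window; the cost is quadratic in the number of positions.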

P³O: Transferring Visual Representations for Reinforcement Learning via Prompting

Mar 27, 2023

Guoliang You, Xiaomeng Chu, Yifan Duan, Jie Peng, Jianmin Ji, Yu Zhang, Yanyong Zhang

Abstract: It is important for deep reinforcement learning (DRL) algorithms to transfer their learned policies to new environments that have different visual inputs. In this paper, we introduce Prompt-based Proximal Policy Optimization (P³O), a three-stage DRL algorithm that transfers visual representations from a target to a source environment by applying prompting. The process of P³O consists of three stages: pre-training, prompting, and predicting. In particular, we specify a prompt-transformer for representation conversion and propose a two-step training process to train the prompt-transformer for the target environment, while the rest of the DRL pipeline remains unchanged. We implement P³O and evaluate it on the OpenAI CarRacing video game. The experimental results show that P³O outperforms state-of-the-art visual transfer schemes. In particular, P³O allows the learned policies to perform well in environments with different visual inputs, which is much more effective than retraining the policies in these environments.

* This paper has been accepted for presentation at the IEEE International Conference on Multimedia & Expo (ICME) 2023.
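The key structural point of P³O is that only the prompt component is trained for the target environment, while the pretrained pipeline is reused unchanged. A toy sketch of that division of labor, under heavy assumptions: observations are scalars, the pretrained encoder and policy are hypothetical stand-ins, the prompt sits before the encoder, and the prompt-transformer is reduced to an affine map fitted by least squares on paired target/source observations. The paper's prompt-transformer and its two-step training are far richer than this.

```python
from typing import Callable, List, Tuple


def fit_affine_prompt(pairs: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Least-squares fit of source_obs ~ a * target_obs + b."""
    n = len(pairs)
    mx = sum(t for t, _ in pairs) / n
    my = sum(s for _, s in pairs) / n
    var = sum((t - mx) ** 2 for t, _ in pairs)
    cov = sum((t - mx) * (s - my) for t, s in pairs)
    a = cov / var if var else 0.0
    b = my - a * mx
    return lambda obs: a * obs + b


def make_agent(
    prompt: Callable[[float], float],
    encoder: Callable[[float], float],
    policy: Callable[[float], str],
) -> Callable[[float], str]:
    # The pretrained encoder and policy are reused unchanged (frozen);
    # only `prompt` was fitted on target-environment data.
    return lambda obs: policy(encoder(prompt(obs)))


def encoder(x: float) -> float:  # stand-in for the frozen pretrained encoder
    return 2.0 * x


def policy(z: float) -> str:  # stand-in for the frozen pretrained policy
    return "left" if z < 0 else "right"


# Hypothetical target environment whose observations are a scaled and
# shifted version of what the source pipeline expects.
pairs = [(t, 0.5 * t - 1.0) for t in [0.0, 2.0, 4.0, 6.0]]
prompt = fit_affine_prompt(pairs)
agent = make_agent(prompt, encoder, policy)
print(agent(0.0), agent(4.0))  # left right
```

Swapping environments then means refitting `prompt` alone, which is the sense in which the rest of the DRL pipeline "remains unchanged".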

Diffusion-based Target Sampler for Unsupervised Domain Adaptation

Mar 17, 2023

Yulong Zhang, Shuhao Chen, Yu Zhang, Jiangang Lu

Abstract: Limited transferability hinders the performance of deep learning models when they are applied to new application scenarios. Recently, unsupervised domain adaptation (UDA) has made significant progress in addressing this issue by learning domain-invariant features. However, large domain shifts and sample scarcity in the target domain cause existing UDA methods to perform suboptimally. To alleviate these issues, we propose a plug-and-play Diffusion-based Target Sampler (DTS) that generates pseudo target samples with high fidelity and diversity. By introducing class-conditional information, the labels of the generated target samples can be controlled. The generated samples closely simulate the data distribution of the target domain and help existing UDA methods transfer from the source domain to the target domain more easily, thus improving transfer performance. Extensive experiments on various benchmarks demonstrate that the performance of existing UDA methods can be greatly improved by the proposed DTS method.
