Jinwei Yuan

YOLOPv2: Better, Faster, Stronger for Panoptic Driving Perception

Aug 24, 2022

Cheng Han, Qichao Zhao, Shuyi Zhang, Yinzi Chen, Zhenlin Zhang, Jinwei Yuan

Abstract: Over the last decade, multi-task learning approaches have achieved promising results in solving panoptic driving perception problems, providing both high precision and high efficiency. Multi-task learning has become a popular paradigm for designing networks for real-time autonomous driving systems, where computational resources are limited. This paper proposes an effective and efficient multi-task learning network that simultaneously performs traffic object detection, drivable road area segmentation, and lane detection. The model achieves new state-of-the-art (SOTA) performance in terms of accuracy and speed on the challenging BDD100K dataset. In particular, inference time is halved compared to the previous SOTA model. Code will be released in the near future.

Adaptive Transformers for Robust Few-shot Cross-domain Face Anti-spoofing

Mar 23, 2022

Hsin-Ping Huang, Deqing Sun, Yaojie Liu, Wen-Sheng Chu, Taihong Xiao, Jinwei Yuan, Hartwig Adam, Ming-Hsuan Yang

Abstract: While recent face anti-spoofing methods perform well in intra-domain setups, an effective approach needs to account for the much larger appearance variations of images acquired in complex scenes with different sensors to perform robustly. In this paper, we present adaptive vision transformers (ViT) for robust cross-domain face anti-spoofing. Specifically, we adopt ViT as a backbone to exploit its strength in modeling long-range dependencies among pixels. We further introduce an ensemble adapters module and feature-wise transformation layers in the ViT to adapt to different domains with only a few samples. Experiments on several benchmark datasets show that the proposed models achieve robust and competitive performance against state-of-the-art methods.
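The abstract names feature-wise transformation layers but does not describe them. As a rough illustration only, the sketch below shows one generic form of the idea: a per-channel scale and shift, randomly perturbed during training, applied to token features. The function name, the Gaussian sampling, and the `scale_std` / `shift_std` parameters are assumptions made for this example, not the paper's implementation.

```python
import numpy as np

def feature_wise_transform(tokens, scale_std, shift_std, rng, training=True):
    """Apply a randomly sampled per-channel affine transform (hypothetical sketch).

    tokens:    (num_tokens, channels) feature array
    scale_std: (channels,) non-negative spread of the scale around 1
    shift_std: (channels,) non-negative spread of the shift around 0
    """
    if not training:
        # At inference time the features pass through unchanged.
        return tokens
    # One scale and one shift per channel, shared by all tokens.
    gamma = 1.0 + scale_std * rng.standard_normal(scale_std.shape)
    beta = shift_std * rng.standard_normal(shift_std.shape)
    return gamma * tokens + beta

rng = np.random.default_rng(0)
tokens = np.ones((4, 3))
perturbed = feature_wise_transform(tokens, np.full(3, 0.1), np.full(3, 0.1), rng)
print(perturbed.shape)
```

Because the scale and shift are sampled per channel rather than per token, every token in a sample receives the same perturbation, which mimics a shift in feature statistics such as one caused by a different sensor or scene.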

CLIP2TV: An Empirical Study on Transformer-based Methods for Video-Text Retrieval

Nov 10, 2021

Zijian Gao, Jingyu Liu, Sheng Chen, Dedan Chang, Hao Zhang, Jinwei Yuan

Abstract: Modern video-text retrieval frameworks consist of three parts: a video encoder, a text encoder, and a similarity head. Following the success of both visual and textual representation learning, transformer-based encoders and fusion methods have also been adopted in video-text retrieval. In this report, we present CLIP2TV, which aims to explore where the critical elements lie in transformer-based methods. To achieve this, we first revisit recent work on multi-modal learning, then introduce several techniques into video-text retrieval, and finally evaluate them through extensive experiments in different configurations. Notably, CLIP2TV achieves 52.9 R@1 on the MSR-VTT dataset, outperforming the previous SOTA result by 4.1%.
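For readers unfamiliar with the R@1 metric, the following is a minimal sketch of how a similarity head and recall@k are commonly computed for text-to-video retrieval. It assumes cosine similarity between precomputed embeddings and that text i is paired with video i; the function names are hypothetical and this is not CLIP2TV's code.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def similarity_matrix(text_emb, video_emb):
    """Return sim[i, j], the cosine similarity of text i and video j."""
    return l2_normalize(text_emb) @ l2_normalize(video_emb).T

def recall_at_k(sim, k=1):
    """Percentage of text queries whose paired video (index i) ranks in the top k."""
    correct = np.diag(sim)[:, None]
    # Rank of the paired video = number of videos that score strictly higher.
    ranks = (sim > correct).sum(axis=1)
    return float((ranks < k).mean()) * 100.0

# Toy similarity matrix: query 0 retrieves its video first, query 1 does not.
sim = np.array([[0.9, 0.1],
                [0.8, 0.2]])
print(recall_at_k(sim, k=1))  # 50.0
```

Under this convention, an R@1 of 52.9 means that for 52.9% of text queries the correct video is the top-ranked result.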

* Tech Report

Learnable Cost Volume Using the Cayley Representation

Jul 21, 2020

Taihong Xiao, Jinwei Yuan, Deqing Sun, Qifei Wang, Xin-Yu Zhang, Kehan Xu, Ming-Hsuan Yang

Abstract: Cost volume is an essential component of recent deep models for optical flow estimation and is usually constructed by calculating the inner product between two feature vectors. However, the standard inner product in the commonly used cost volume may limit the representation capacity of flow models because it neglects the correlation among different channel dimensions and weighs each dimension equally. To address this issue, we propose a learnable cost volume (LCV) using an elliptical inner product, which generalizes the standard inner product by a positive definite kernel matrix. To guarantee its positive definiteness, we perform spectral decomposition on the kernel matrix and re-parameterize it via the Cayley representation. The proposed LCV is a lightweight module and can be easily plugged into existing models to replace the vanilla cost volume. Experimental results show that the LCV module not only improves the accuracy of state-of-the-art models on standard benchmarks, but also increases their robustness against illumination changes, noise, and adversarial perturbations of the input signals.
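The construction described above can be sketched in NumPy as follows. The kernel matrix is built as W = Pᵀ Λ P, where P is an orthogonal matrix obtained from a skew-symmetric matrix via the Cayley transform and Λ is a positive diagonal matrix. The function names, the exponential used to keep the eigenvalues positive, and the flat (N, C) feature layout are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def cayley_orthogonal(skew_params, dim):
    """Map dim*(dim-1)/2 free parameters to an orthogonal matrix."""
    upper = np.zeros((dim, dim))
    upper[np.triu_indices(dim, k=1)] = skew_params
    skew = upper - upper.T  # skew-symmetric: skew.T == -skew
    eye = np.eye(dim)
    # I + S is always invertible for a real skew-symmetric S.
    return (eye - skew) @ np.linalg.inv(eye + skew)

def kernel_matrix(skew_params, log_eigs):
    """Build a symmetric positive definite W = P^T diag(exp(log_eigs)) P."""
    dim = log_eigs.shape[0]
    P = cayley_orthogonal(skew_params, dim)
    # Assumption: exponentiate free parameters to keep eigenvalues positive.
    return P.T @ np.diag(np.exp(log_eigs)) @ P

def elliptical_cost(f1, f2, W):
    """Cost between every feature in f1 (N, C) and f2 (M, C): f1 W f2^T."""
    return f1 @ W @ f2.T

rng = np.random.default_rng(0)
C = 4
W = kernel_matrix(rng.standard_normal(C * (C - 1) // 2), rng.standard_normal(C))
cost = elliptical_cost(rng.standard_normal((5, C)), rng.standard_normal((6, C)), W)
print(cost.shape)
```

With all parameters set to zero, P and Λ are both the identity, so W = I and the cost reduces to the standard inner product. This makes the vanilla cost volume a natural initialization for the learnable one.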

* ECCV 2020
