Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yicheng Wang

AutoVideo: An Automated Video Action Recognition System

Aug 09, 2021
Daochen Zha, Zaid Pervaiz Bhat, Yi-Wei Chen, Yicheng Wang, Sirui Ding, AnmollKumar Jain, Mohammad Qazim Bhat, Kwei-Herng Lai, Jiaben Chen, Na Zou, Xia Hu

Figure 1 for AutoVideo: An Automated Video Action Recognition System

Figure 2 for AutoVideo: An Automated Video Action Recognition System

Action recognition is a crucial task for video understanding. In this paper, we present AutoVideo, a Python system for automated video action recognition. It currently supports seven action recognition algorithms and various pre-processing modules. Unlike the existing libraries that only provide model zoos, AutoVideo is built with the standard pipeline language. The basic building block is primitive, which wraps a pre-processing module or an algorithm with some hyperparameters. AutoVideo is highly modular and extendable. It can be easily combined with AutoML searchers. The pipeline language is quite general so that we can easily enrich AutoVideo with algorithms for various other video-related tasks in the future. AutoVideo is released under MIT license at https://github.com/datamllab/autovideo

* https://github.com/datamllab/autovideo

Via

Access Paper or Ask Questions

Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report

May 17, 2021
Andrey Ignatov, Grigory Malivenko, David Plowman, Samarth Shukla, Radu Timofte, Ziyu Zhang, Yicheng Wang, Zilong Huang, Guozhong Luo, Gang Yu, Bin Fu, Yiran Wang, Xingyi Li, Min Shi, Ke Xian, Zhiguo Cao, Jin-Hua Du, Pei-Lin Wu, Chao Ge, Jiaoyang Yao, Fangwen Tu, Bo Li, Jung Eun Yoo, Kwanggyoon Seo, Jialei Xu, Zhenyu Li, Xianming Liu, Junjun Jiang, Wei-Chi Chen, Shayan Joya, Huanhuan Fan, Zhaobing Kang, Ang Li, Tianpeng Feng, Yang Liu, Chuannan Sheng, Jian Yin, Fausto T. Benavide

Figure 1 for Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report

Figure 2 for Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report

Figure 3 for Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report

Figure 4 for Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report

Depth estimation is an important computer vision problem with many practical applications to mobile devices. While many solutions have been proposed for this task, they are usually very computationally expensive and thus are not applicable for on-device inference. To address this problem, we introduce the first Mobile AI challenge, where the target is to develop an end-to-end deep learning-based depth estimation solutions that can demonstrate a nearly real-time performance on smartphones and IoT platforms. For this, the participants were provided with a new large-scale dataset containing RGB-depth image pairs obtained with a dedicated stereo ZED camera producing high-resolution depth maps for objects located at up to 50 meters. The runtime of all models was evaluated on the popular Raspberry Pi 4 platform with a mobile ARM-based Broadcom chipset. The proposed solutions can generate VGA resolution depth maps at up to 10 FPS on the Raspberry Pi 4 while achieving high fidelity results, and are compatible with any Android or Linux-based mobile devices. A detailed description of all models developed in the challenge is provided in this paper.

* Mobile AI 2021 Workshop and Challenges: https://ai-benchmark.com/workshops/mai/2021/. arXiv admin note: text overlap with arXiv:2105.07809

Via

Access Paper or Ask Questions

MFIF-GAN: A New Generative Adversarial Network for Multi-Focus Image Fusion

Sep 22, 2020
Yicheng Wang, Shuang Xu, Junmin Liu, Zixiang Zhao, Chunxia Zhang, Jiangshe Zhang

Figure 1 for MFIF-GAN: A New Generative Adversarial Network for Multi-Focus Image Fusion

Figure 2 for MFIF-GAN: A New Generative Adversarial Network for Multi-Focus Image Fusion

Figure 3 for MFIF-GAN: A New Generative Adversarial Network for Multi-Focus Image Fusion

Figure 4 for MFIF-GAN: A New Generative Adversarial Network for Multi-Focus Image Fusion

Multi-Focus Image Fusion (MFIF) is one of the promising techniques to obtain all-in-focus images to meet people's visual needs and it is a precondition of other computer vision tasks. One of the research trends of MFIF is to solve the defocus spread effect (DSE) around the focus/defocus boundary (FDB). In this paper, we present a novel generative adversarial network termed MFIF-GAN to translate multi-focus images into focus maps and to get the all-in-focus images further. The Squeeze and Excitation Residual Network (SE-ResNet) module as an attention mechanism is employed in the network. During the training, we propose reconstruction and gradient regularization loss functions to guarantee the accuracy of generated focus maps. In addition, by combining the prior knowledge of training conditon, this network is trained on a synthetic dataset with DSE based on an {\alpha}-matte model. A series of experimental results demonstrate that the MFIF-GAN is superior to several representative state-of-the-art (SOTA) algorithms in visual perception, quantitative analysis as well as efficiency.

Via

Access Paper or Ask Questions

Deep Convolutional Sparse Coding Networks for Image Fusion

May 18, 2020
Shuang Xu, Zixiang Zhao, Yicheng Wang, Chunxia Zhang, Junmin Liu, Jiangshe Zhang

Figure 1 for Deep Convolutional Sparse Coding Networks for Image Fusion

Figure 2 for Deep Convolutional Sparse Coding Networks for Image Fusion

Figure 3 for Deep Convolutional Sparse Coding Networks for Image Fusion

Figure 4 for Deep Convolutional Sparse Coding Networks for Image Fusion

Image fusion is a significant problem in many fields including digital photography, computational imaging and remote sensing, to name but a few. Recently, deep learning has emerged as an important tool for image fusion. This paper presents three deep convolutional sparse coding (CSC) networks for three kinds of image fusion tasks (i.e., infrared and visible image fusion, multi-exposure image fusion, and multi-modal image fusion). The CSC model and the iterative shrinkage and thresholding algorithm are generalized into dictionary convolution units. As a result, all hyper-parameters are learned from data. Our extensive experiments and comprehensive comparisons reveal the superiority of the proposed networks with regard to quantitative evaluation and visual inspection.

Via

Access Paper or Ask Questions

Beyond Application End-Point Results: Quantifying Statistical Robustness of MCMC Accelerators

Mar 05, 2020
Xiangyu Zhang, Ramin Bashizade, Yicheng Wang, Cheng Lyu, Sayan Mukherjee, Alvin R. Lebeck

Figure 1 for Beyond Application End-Point Results: Quantifying Statistical Robustness of MCMC Accelerators

Figure 2 for Beyond Application End-Point Results: Quantifying Statistical Robustness of MCMC Accelerators

Figure 3 for Beyond Application End-Point Results: Quantifying Statistical Robustness of MCMC Accelerators

Figure 4 for Beyond Application End-Point Results: Quantifying Statistical Robustness of MCMC Accelerators

Statistical machine learning often uses probabilistic algorithms, such as Markov Chain Monte Carlo (MCMC), to solve a wide range of problems. Probabilistic computations, often considered too slow on conventional processors, can be accelerated with specialized hardware by exploiting parallelism and optimizing the design using various approximation techniques. Current methodologies for evaluating correctness of probabilistic accelerators are often incomplete, mostly focusing only on end-point result quality ("accuracy"). It is important for hardware designers and domain experts to look beyond end-point "accuracy" and be aware of the hardware optimizations impact on other statistical properties. This work takes a first step towards defining metrics and a methodology for quantitatively evaluating correctness of probabilistic accelerators beyond end-point result quality. We propose three pillars of statistical robustness: 1) sampling quality, 2) convergence diagnostic, and 3) goodness of fit. We apply our framework to a representative MCMC accelerator and surface design issues that cannot be exposed using only application end-point result quality. Applying the framework to guide design space exploration shows that statistical robustness comparable to floating-point software can be achieved by slightly increasing the bit representation, without floating-point hardware requirements.

Via

Access Paper or Ask Questions

Analyzing Compositionality-Sensitivity of NLI Models

Nov 16, 2018
Yixin Nie, Yicheng Wang, Mohit Bansal

Figure 1 for Analyzing Compositionality-Sensitivity of NLI Models

Figure 2 for Analyzing Compositionality-Sensitivity of NLI Models

Figure 3 for Analyzing Compositionality-Sensitivity of NLI Models

Figure 4 for Analyzing Compositionality-Sensitivity of NLI Models

Success in natural language inference (NLI) should require a model to understand both lexical and compositional semantics. However, through adversarial evaluation, we find that several state-of-the-art models with diverse architectures are over-relying on the former and fail to use the latter. Further, this compositionality unawareness is not reflected via standard evaluation on current datasets. We show that removing RNNs in existing models or shuffling input words during training does not induce large performance loss despite the explicit removal of compositional information. Therefore, we propose a compositionality-sensitivity testing setup that analyzes models on natural examples from existing datasets that cannot be solved via lexical features alone (i.e., on which a bag-of-words model gives a high probability to one wrong label), hence revealing the models' actual compositionality awareness. We show that this setup not only highlights the limited compositional ability of current NLI models, but also differentiates model performance based on design, e.g., separating shallow bag-of-words models from deeper, linguistically-grounded tree-based models. Our evaluation setup is an important analysis tool: complementing currently existing adversarial and linguistically driven diagnostic evaluations, and exposing opportunities for future work on evaluating models' compositional understanding.

* AAAI 2019

Via

Access Paper or Ask Questions

Commonsense for Generative Multi-Hop Question Answering Tasks

Sep 17, 2018
Lisa Bauer, Yicheng Wang, Mohit Bansal

Figure 1 for Commonsense for Generative Multi-Hop Question Answering Tasks

Figure 2 for Commonsense for Generative Multi-Hop Question Answering Tasks

Figure 3 for Commonsense for Generative Multi-Hop Question Answering Tasks

Figure 4 for Commonsense for Generative Multi-Hop Question Answering Tasks

Reading comprehension QA tasks have seen a recent surge in popularity, yet most works have focused on fact-finding extractive QA. We instead focus on a more challenging multi-hop generative task (NarrativeQA), which requires the model to reason, gather, and synthesize disjoint pieces of information within the context to generate an answer. This type of multi-step reasoning also often requires understanding implicit relations, which humans resolve via external, background commonsense knowledge. We first present a strong generative baseline that uses a multi-attention mechanism to perform multiple hops of reasoning and a pointer-generator decoder to synthesize the answer. This model performs substantially better than previous generative models, and is competitive with current state-of-the-art span prediction models. We next introduce a novel system for selecting grounded multi-hop relational commonsense information from ConceptNet via a pointwise mutual information and term-frequency based scoring function. Finally, we effectively use this extracted commonsense information to fill in gaps of reasoning between context hops, using a selectively-gated attention mechanism. This boosts the model's performance significantly (also verified via human evaluation), establishing a new state-of-the-art for the task. We also show that our background knowledge enhancements are generalizable and improve performance on QAngaroo-WikiHop, another multi-hop reasoning dataset.

* EMNLP 2018 (22 pages)

Via

Access Paper or Ask Questions

Qiniu Submission to ActivityNet Challenge 2018

Jun 12, 2018
Xiaoteng Zhang, Yixin Bao, Feiyun Zhang, Kai Hu, Yicheng Wang, Liang Zhu, Qinzhu He, Yining Lin, Jie Shao, Yao Peng

Figure 1 for Qiniu Submission to ActivityNet Challenge 2018

Figure 2 for Qiniu Submission to ActivityNet Challenge 2018

Figure 3 for Qiniu Submission to ActivityNet Challenge 2018

Figure 4 for Qiniu Submission to ActivityNet Challenge 2018

In this paper, we introduce our submissions for the tasks of trimmed activity recognition (Kinetics) and trimmed event recognition (Moments in Time) for Activitynet Challenge 2018. In the two tasks, non-local neural networks and temporal segment networks are implemented as our base models. Multi-modal cues such as RGB image, optical flow and acoustic signal have also been used in our method. We also propose new non-local-based models for further improvement on the recognition accuracy. The final submissions after ensembling the models achieve 83.5% top-1 accuracy and 96.8% top-5 accuracy on the Kinetics validation set, 35.81% top-1 accuracy and 62.59% top-5 accuracy on the MIT validation set.

* 4 pages, 3 figures, CVPR workshop

Via

Access Paper or Ask Questions

Robust Machine Comprehension Models via Adversarial Training

Apr 17, 2018
Yicheng Wang, Mohit Bansal

Figure 1 for Robust Machine Comprehension Models via Adversarial Training

Figure 2 for Robust Machine Comprehension Models via Adversarial Training

Figure 3 for Robust Machine Comprehension Models via Adversarial Training

Figure 4 for Robust Machine Comprehension Models via Adversarial Training

It is shown that many published models for the Stanford Question Answering Dataset (Rajpurkar et al., 2016) lack robustness, suffering an over 50% decrease in F1 score during adversarial evaluation based on the AddSent (Jia and Liang, 2017) algorithm. It has also been shown that retraining models on data generated by AddSent has limited effect on their robustness. We propose a novel alternative adversary-generation algorithm, AddSentDiverse, that significantly increases the variance within the adversarial training data by providing effective examples that punish the model for making certain superficial assumptions. Further, in order to improve robustness to AddSent's semantic perturbations (e.g., antonyms), we jointly improve the model's semantic-relationship learning capabilities in addition to our AddSentDiverse-based adversarial training data augmentation. With these additions, we show that we can make a state-of-the-art model significantly more robust, achieving a 36.5% increase in F1 score under many different types of adversarial evaluation while maintaining performance on the regular SQuAD task.

* NAACL 2018 (7 pages)

Via

Access Paper or Ask Questions