Yoshitaka Ushiku

OMRON SINIC X

Divergence Optimization for Noisy Universal Domain Adaptation

Apr 01, 2021

Qing Yu, Atsushi Hashimoto, Yoshitaka Ushiku

Abstract:Universal domain adaptation (UniDA) has been proposed to transfer knowledge learned from a label-rich source domain to a label-scarce target domain without any constraints on the label sets. In practice, however, it is difficult to obtain a large amount of perfectly clean labeled data in a source domain with limited resources. Existing UniDA methods rely on source samples with correct annotations, which greatly limits their application in the real world. Hence, we consider a new realistic setting called Noisy UniDA, in which classifiers are trained with noisy labeled data from the source domain and unlabeled data with an unknown class distribution from the target domain. This paper introduces a two-head convolutional neural network framework to solve all problems simultaneously. Our network consists of one common feature generator and two classifiers with different decision boundaries. By optimizing the divergence between the two classifiers' outputs, we can detect noisy source samples, find "unknown" classes in the target domain, and align the distribution of the source and target domains. In an extensive evaluation of different domain adaptation settings, the proposed method outperformed existing methods by a large margin in most settings.

* CVPR 2021
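
The abstract above relies on the divergence between two classifier heads as a signal. The following is a minimal sketch of that idea, not the authors' implementation: it assumes a symmetric KL divergence between the two heads' softmax outputs and a fixed, hypothetical threshold for flagging samples. The function names and the threshold are illustrative assumptions.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def head_divergence(logits_a, logits_b):
    # Symmetric KL between the two heads' predictions (an assumed choice of divergence).
    p, q = softmax(logits_a), softmax(logits_b)
    return 0.5 * (kl(p, q) + kl(q, p))

def flag_high_divergence(pairs, threshold):
    # True where the two heads disagree strongly; the threshold is hypothetical.
    return [head_divergence(a, b) > threshold for a, b in pairs]

pairs = [([2.0, 0.0, 0.0], [2.0, 0.0, 0.0]),   # heads agree
         ([3.0, 0.0, 0.0], [0.0, 3.0, 0.0])]   # heads disagree
flags = flag_high_divergence(pairs, threshold=0.5)
```

Under these assumptions, a sample on which the heads disagree could be treated as a candidate noisy source sample or an "unknown" target sample; how the paper actually uses the divergence in training is not specified in the abstract.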

Crowd Density Forecasting by Modeling Patch-based Dynamics

Nov 22, 2019

Hiroaki Minoura, Ryo Yonetani, Mai Nishimura, Yoshitaka Ushiku

Abstract:Forecasting human activities observed in videos is a long-standing challenge in computer vision, which leads to various real-world applications such as mobile robots, autonomous driving, and assistive systems. In this work, we present a new visual forecasting task called crowd density forecasting. Given a video of a crowd captured by a surveillance camera, our goal is to predict how that crowd will move in future frames. To address this task, we have developed the patch-based density forecasting network (PDFN), which enables forecasting over a sequence of crowd density maps describing how crowded each location is in each video frame. PDFN represents a crowd density map based on spatially overlapping patches and learns density dynamics patch-wise in a compact latent space. This enables us to model diverse and complex crowd density dynamics efficiently, even when the input video involves a variable number of crowds that each move independently. Experimental results with several public datasets demonstrate the effectiveness of our approach compared with state-of-the-art forecasting methods.
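
As a rough illustration of the patch-based representation described above, the sketch below splits a density map into spatially overlapping patches. The patch size, stride, and function name are assumptions made for illustration; the abstract does not give PDFN's actual patching scheme or its latent model.

```python
def extract_patches(density, patch, stride):
    # density: 2D list (rows of floats). Patches overlap whenever stride < patch.
    height, width = len(density), len(density[0])
    patches = []
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            block = [row[left:left + patch] for row in density[top:top + patch]]
            patches.append(((top, left), block))
    return patches

# A toy 4x4 density map; each value is the crowd density at that location.
density_map = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
patches = extract_patches(density_map, patch=2, stride=1)
```

Each patch keeps its top-left coordinate so that per-patch forecasts could later be placed back on the map, for example by averaging overlapping regions.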

Decentralized Learning of Generative Adversarial Networks from Multi-Client Non-iid Data

May 23, 2019

Ryo Yonetani, Tomohiro Takahashi, Atsushi Hashimoto, Yoshitaka Ushiku

Abstract:This work addresses a new problem of learning generative adversarial networks (GANs) from multiple data collections that are each i) owned separately and privately by different clients and ii) drawn from a non-identical distribution that comprises different classes. Given such multi-client and non-iid data as input, we aim to learn a distribution involving all the classes that the input data can belong to, while keeping the data decentralized and private in each client's storage. Our key contribution to this end is a new decentralized approach for learning GANs from non-iid data called Forgiver-First Update (F2U), which a) asks clients to train an individual discriminator with their own data and b) updates a generator to fool the most 'forgiving' discriminators, which deem generated samples the most real. Our theoretical analysis proves that this updating strategy indeed allows the decentralized GAN to learn a generator's distribution with all the input classes as its global optimum based on f-divergence minimization. Moreover, we propose a relaxed version of F2U called Forgiver-First Aggregation (F2A), which adaptively aggregates the discriminators while emphasizing forgiving ones to perform well in practice. Our empirical evaluations with image generation tasks demonstrated the effectiveness of our approach over state-of-the-art decentralized learning methods.
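
The selection step in F2U can be sketched as follows, under stated assumptions: each client discriminator returns a probability in (0, 1) that a generated sample is real, and the generator uses a non-saturating loss against the highest-scoring discriminator. The `soft_aggregate` function is only one hypothetical way to emphasize forgiving discriminators and is not claimed to match the paper's F2A formulation.

```python
import math

def most_forgiving(scores):
    # scores: per-client discriminator outputs in (0, 1) for one generated sample.
    idx = max(range(len(scores)), key=lambda i: scores[i])
    return idx, scores[idx]

def generator_loss(scores_per_sample):
    # Non-saturating loss -log D(G(z)), taken against the most forgiving client (assumed loss form).
    losses = [-math.log(most_forgiving(s)[1]) for s in scores_per_sample]
    return sum(losses) / len(losses)

def soft_aggregate(scores, temperature=1.0):
    # Hypothetical relaxation: softmax-weighted average that favors higher scores.
    weights = [math.exp(s / temperature) for s in scores]
    total = sum(weights)
    return sum(w * s for w, s in zip(weights, scores)) / total

client_scores = [0.1, 0.7, 0.3]          # three clients scoring one generated sample
chosen_client, chosen_score = most_forgiving(client_scores)
```

Because only discriminator outputs on generated samples are compared, this sketch never touches the clients' private data, which is consistent with the decentralized setting the abstract describes.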

Pose Graph Optimization for Unsupervised Monocular Visual Odometry

Mar 15, 2019

Yang Li, Yoshitaka Ushiku, Tatsuya Harada

Abstract:Unsupervised learning-based monocular visual odometry (VO) has lately drawn significant attention for its potential in label-free learning and its robustness to camera parameters and environmental variations. However, partially due to the lack of a drift correction technique, these methods are still far less accurate than geometric approaches for large-scale odometry estimation. In this paper, we propose to leverage graph optimization and loop closure detection to overcome the limitations of unsupervised learning-based monocular visual odometry. To this end, we propose a hybrid VO system which combines an unsupervised monocular VO called NeuralBundler with a pose graph optimization back-end. NeuralBundler is a neural network architecture that uses temporal and spatial photometric loss as its main supervision and generates a windowed pose graph consisting of multi-view 6DoF constraints. We propose a novel pose cycle consistency loss to relieve the tensions in the windowed pose graph, leading to improved performance and robustness. In the back-end, a global pose graph is built from local and loop 6DoF constraints estimated by NeuralBundler and is optimized over SE(3). Empirical evaluation on the KITTI odometry dataset demonstrates that 1) NeuralBundler achieves state-of-the-art performance on unsupervised monocular VO estimation, and 2) our whole approach can achieve efficient loop closing and show favorable overall translational accuracy compared to established monocular SLAM systems.
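
The intuition behind a pose cycle consistency term can be sketched with 4x4 homogeneous transforms: composing the relative poses A to B and B to C should agree with the direct estimate A to C. The residual below (Frobenius distance of the loop transform from the identity) is an assumed form chosen for illustration; the abstract does not state the exact loss used by NeuralBundler, and the yaw-only pose constructor is a simplification.

```python
import math

def matmul(a, b):
    # Product of two square matrices stored as nested lists.
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def se3_from_yaw(yaw, tx, ty, tz):
    # Simplified SE(3) pose: rotation about the z-axis plus a translation.
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, tx], [s, c, 0.0, ty], [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]

def se3_inverse(t):
    # Inverse of [R | t] is [R^T | -R^T t].
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    trans = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [trans[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def cycle_residual(t_ab, t_bc, t_ac):
    # Distance of T_ab * T_bc * T_ac^{-1} from the identity; zero for a consistent cycle.
    loop = matmul(matmul(t_ab, t_bc), se3_inverse(t_ac))
    return math.sqrt(sum((loop[i][j] - (1.0 if i == j else 0.0)) ** 2
                         for i in range(4) for j in range(4)))

t_ab = se3_from_yaw(0.1, 1.0, 0.0, 0.0)
t_bc = se3_from_yaw(0.2, 1.0, 0.1, 0.0)
consistent = cycle_residual(t_ab, t_bc, matmul(t_ab, t_bc))
drifted = cycle_residual(t_ab, t_bc, se3_from_yaw(0.3, 2.5, 0.2, 0.0))
```

A consistent set of estimates gives a residual near zero, while a drifted direct estimate gives a clearly positive value that a training loss could penalize.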

* Accepted to ICRA'2019

Strong-Weak Distribution Alignment for Adaptive Object Detection

Dec 12, 2018

Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, Kate Saenko

Abstract:We propose an approach for unsupervised adaptation of object detectors from label-rich to label-poor domains which can significantly reduce annotation costs associated with detection. Recently, approaches that align distributions of source and target images using an adversarial loss have been proven effective for adapting object classifiers. However, for object detection, fully matching the entire distributions of source and target images to each other at the global image level may fail, as domains could have distinct scene layouts and different combinations of objects. On the other hand, strong matching of local features such as texture and color makes sense, as it does not change category level semantics. This motivates us to propose a novel approach for detector adaptation based on strong local alignment and weak global alignment. Our key contribution is the weak alignment model, which focuses the adversarial alignment loss on images that are globally similar and puts less emphasis on aligning images that are globally dissimilar. Additionally, we design the strong domain alignment model to only look at local receptive fields of the feature map. We empirically verify the effectiveness of our approach on several detection datasets comprising both large and small domain shifts.
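
One way to put more weight on globally similar images is to modulate the domain classifier's loss so that confidently classified (globally dissimilar) images contribute little. The sketch below uses a focal-style modulation as an assumed mechanism; the abstract itself does not name the loss, and `gamma` is a hypothetical hyperparameter.

```python
import math

def focal_domain_loss(p_correct, gamma=2.0):
    # p_correct: domain classifier's probability for the true domain of an image.
    # The (1 - p)^gamma factor shrinks the loss for easy, confidently classified images.
    return -((1.0 - p_correct) ** gamma) * math.log(p_correct)

easy = focal_domain_loss(0.9)    # image clearly from one domain (globally dissimilar)
hard = focal_domain_loss(0.55)   # image the classifier can barely place (globally similar)
```

With `gamma=0.0` the expression reduces to ordinary cross-entropy, so the modulation strength can be tuned between uniform and strongly selective alignment.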

Multichannel Semantic Segmentation with Unsupervised Domain Adaptation

Dec 11, 2018

Kohei Watanabe, Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada

Abstract:Most contemporary robots have depth sensors, and research on semantic segmentation with RGBD images has shown that depth images boost the accuracy of segmentation. Since it is time-consuming to annotate images with semantic labels per pixel, it would be ideal if we could avoid this laborious work by utilizing an existing dataset or a synthetic dataset which we can generate on our own. Robot motions are often tested in a synthetic environment, where multichannel (e.g., RGB + depth + instance boundary) images plus their pixel-level semantic labels are available. However, models trained simply on synthetic images tend to demonstrate poor performance on real images. In order to address this, we propose two approaches that can efficiently exploit multichannel inputs combined with an unsupervised domain adaptation (UDA) algorithm. One is a fusion-based approach that uses depth images as inputs. The other is a multitask learning approach that uses depth images as outputs. We demonstrated that the segmentation results were improved by using a multitask learning approach with a post-process and created a benchmark for this task.

* Published at the AUTONUE Workshop of ECCV 2018

Conditional Video Generation Using Action-Appearance Captions

Dec 05, 2018

Shohei Yamamoto, Antonio Tejero-de-Pablos, Yoshitaka Ushiku, Tatsuya Harada

Abstract:The field of automatic video generation has received a boost thanks to the recent Generative Adversarial Networks (GANs). However, most existing methods cannot control the contents of the generated video using a text caption, losing their usefulness to a large extent. This particularly affects human videos due to their great variety of actions and appearances. This paper presents Conditional Flow and Texture GAN (CFT-GAN), a GAN-based video generation method from action-appearance captions. We propose a novel way of generating video by encoding a caption (e.g., "a man in blue jeans is playing golf") in a two-stage generation pipeline. Our CFT-GAN uses such a caption to generate an optical flow (action) and a texture (appearance) for each frame. As a result, the output video reflects the content specified in the caption in a plausible way. Moreover, to train our method, we constructed a new dataset for human video generation with captions. We evaluated the proposed method qualitatively and quantitatively via an ablation study and a user study. The results demonstrate that CFT-GAN is able to successfully generate videos containing the action and appearances indicated in the captions.

Towards Human-Friendly Referring Expression Generation

Nov 29, 2018

Mikihiro Tanaka, Takayuki Itamochi, Kenichi Narioka, Ikuro Sato, Yoshitaka Ushiku, Tatsuya Harada

Abstract:This paper addresses the generation of referring expressions that not only refer to objects correctly but also ease human comprehension. As the composition of an image becomes more complicated and a target becomes relatively less salient, identifying referred objects becomes more difficult. However, existing studies regarded all sentences that refer to objects correctly as equally good, ignoring whether they are easily understood by humans. If the target is not salient, humans utilize relationships with the salient contexts around it to help listeners comprehend it better. To derive this information from human annotations, our model is designed to extract information from the inside and outside of the target. Moreover, we regard sentences that are easily understood as those that are comprehended correctly and quickly by humans. We optimized the model by using the time humans required to locate the referred objects and their accuracy. To evaluate our system, we created a new referring expression dataset whose images were acquired from Grand Theft Auto V (GTA V), limiting targets to persons. Our proposed method outperformed previous methods both on machine evaluation and on crowd-sourced human evaluation. The source code and dataset will be available soon.

Label-Noise Robust Generative Adversarial Networks

Nov 27, 2018

Takuhiro Kaneko, Yoshitaka Ushiku, Tatsuya Harada

Abstract:Generative adversarial networks (GANs) are a framework that learns a generative distribution through adversarial training. Recently, their class conditional extensions (e.g., conditional GAN (cGAN) and auxiliary classifier GAN (AC-GAN)) have attracted much attention owing to their ability to learn the disentangled representations and to improve the training stability. However, their training requires the availability of large-scale accurate class-labeled data, which are often laborious or impractical to collect in a real-world scenario. To remedy the drawback, we propose a novel family of GANs called label-noise robust GANs (rGANs), which, by incorporating a noise transition model, can learn a clean label conditional generative distribution even when training labels are noisy. In particular, we propose two variants: rAC-GAN, which is a bridging model between AC-GAN and the noise-robust classification model, and rcGAN, which is an extension of cGAN and is guaranteed to learn the clean label conditional distribution in an optimal condition. In addition to providing the theoretical background, we demonstrate the effectiveness of our models through extensive experiments using diverse GAN configurations, various noise settings, and multiple evaluation metrics (in which we tested 402 patterns in total).
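
The role of a noise transition model can be illustrated with a small worked example: a clean class posterior is mapped to a noisy-label distribution by multiplying with a transition matrix T, where T[i][j] is the probability that clean class i is observed as label j. The symmetric noise matrix and the function names below are assumptions for illustration; how rAC-GAN and rcGAN incorporate the transition model is described in the paper, not in this sketch.

```python
def symmetric_transition(num_classes, noise_rate):
    # Each label is kept with probability 1 - noise_rate and otherwise
    # flipped uniformly to one of the other classes (assumed noise type).
    off = noise_rate / (num_classes - 1)
    return [[1.0 - noise_rate if i == j else off for j in range(num_classes)]
            for i in range(num_classes)]

def noisy_posterior(clean_probs, transition):
    # p(noisy = j) = sum_i p(clean = i) * T[i][j]
    num_noisy = len(transition[0])
    return [sum(clean_probs[i] * transition[i][j] for i in range(len(clean_probs)))
            for j in range(num_noisy)]

T = symmetric_transition(num_classes=3, noise_rate=0.3)
noisy = noisy_posterior([1.0, 0.0, 0.0], T)
```

Training a classifier or discriminator against labels passed through such a matrix lets the underlying clean-label prediction stay meaningful even when the observed labels are corrupted.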

Class-Distinct and Class-Mutual Image Generation with GANs

Nov 27, 2018

Takuhiro Kaneko, Yoshitaka Ushiku, Tatsuya Harada

Abstract:We describe a new problem called class-distinct and class-mutual (DM) image generation. Typically in class-conditional image generation, it is assumed that there are no intersections between classes, and a generative model is optimized to fit discrete class labels. However, in real-world scenarios, it is often required to handle data in which class boundaries are ambiguous or unclear. For example, data crawled from the web tend to contain mislabeled data resulting from confusion. Given such data, our goal is to construct a generative model that can be controlled for class specificity, which we employ to selectively generate class-distinct and class-mutual images in a controllable manner. To achieve this, we propose novel families of generative adversarial networks (GANs) called class-mixture GAN (CMGAN) and class-posterior GAN (CPGAN). In these new networks, we redesign the generator prior and the objective function in auxiliary classifier GAN (AC-GAN), then extend these to class-mixture and arbitrary class-overlapping settings. In addition to an analysis from an information theory perspective, we empirically demonstrate the effectiveness of our proposed models for various class-overlapping settings (including synthetic to real-world settings) and tasks (i.e., image generation and image-to-image translation).