Norimichi Ukita

Size-Variable Virtual Try-On with Physical Clothes Size

Dec 09, 2024

Yohei Yamashita, Chihiro Nakatani, Norimichi Ukita

Abstract: This paper addresses a new virtual try-on problem of fitting any size of clothes to a reference person in the image domain. While previous image-based virtual try-on methods can produce highly natural try-on images, these methods fit the clothes on the person without considering the relative relationship between the physical sizes of the clothes and the person. Different from these methods, our method achieves size-variable virtual try-on in which the image size of the try-on clothes is changed depending on this relative relationship of the physical sizes. To relieve the difficulty in maintaining the physical size of the clothes while synthesizing the high-fidelity image of the whole clothes, our proposed method focuses on the residual between the silhouettes of the clothes in the reference and try-on images. We also develop a size-variable virtual try-on dataset consisting of 1,524 images provided by 26 subjects. Furthermore, we propose an evaluation metric for size-variable virtual try-on. Quantitative and qualitative experimental results show that our method can achieve size-variable virtual try-on better than general virtual try-on methods.

Test-time Cost-and-Quality Controllable Arbitrary-Scale Super-Resolution with Variable Fourier Components

Dec 07, 2024

Kazutoshi Akita, Norimichi Ukita

Abstract: Super-resolution (SR) with arbitrary scale factor and cost-and-quality controllability at test time is essential for various applications. While several arbitrary-scale SR methods have been proposed, these methods require us to modify the model structure and retrain it to control the computational cost and SR quality. To address this limitation, we propose a novel SR method using a Recurrent Neural Network (RNN) with the Fourier representation. In our method, the RNN sequentially estimates Fourier components, each consisting of frequency and amplitude, and aggregates these components to reconstruct an SR image. Since the RNN can adjust the number of recurrences at test time, we can control the computational cost and SR quality in a single model: fewer recurrences (i.e., fewer Fourier components) lead to lower cost but lower quality, while more recurrences (i.e., more Fourier components) lead to better quality but higher cost. Experimental results show that more Fourier components improve the PSNR score. Furthermore, even with fewer Fourier components, our method achieves a smaller PSNR drop than other state-of-the-art arbitrary-scale SR methods.

* 14 pages, 10 figures
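
The cost-and-quality trade-off described in this abstract can be illustrated with a toy one-dimensional example. The sketch below is not the paper's RNN: it assumes a fixed square-wave target and takes the (frequency, amplitude) pairs from its known sine series, whereas the paper estimates them with a network. The names `fourier_components`, `reconstruct`, and `psnr_with` are hypothetical. It only shows that summing more components raises PSNR at the price of more work per sample.

```python
import math

def square_wave(x):
    """Toy target signal standing in for one image row (assumption)."""
    return 1.0 if math.sin(x) >= 0.0 else -1.0

def fourier_components(k):
    """First k (frequency, amplitude) pairs of the square wave's sine series."""
    return [(2 * i + 1, 4.0 / (math.pi * (2 * i + 1))) for i in range(k)]

def reconstruct(components, xs):
    """Aggregate components by summing amplitude * sin(frequency * x)."""
    return [sum(a * math.sin(f * x) for f, a in components) for x in xs]

def psnr(target, estimate, peak=2.0):
    """PSNR in dB; peak is the signal range (-1 to 1 here)."""
    mse = sum((t - e) ** 2 for t, e in zip(target, estimate)) / len(target)
    return 10.0 * math.log10(peak ** 2 / mse)

# Sample one period at cell centres so sin(x) is never exactly zero.
N = 1000
xs = [2.0 * math.pi * (i + 0.5) / N for i in range(N)]
target = [square_wave(x) for x in xs]

def psnr_with(k):
    """PSNR of the reconstruction that uses the first k components."""
    return psnr(target, reconstruct(fourier_components(k), xs))

# Cost grows linearly with k (one sine term per component per sample).
for k in (1, 3, 8):
    print(k, round(psnr_with(k), 2))
```

In this toy setting the number of components plays the role of the number of recurrences: it can be chosen at test time without changing anything else in the code.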

Burst Super-Resolution with Diffusion Models for Improving Perceptual Quality

Apr 08, 2024

Kyotaro Tokoro, Kazutoshi Akita, Norimichi Ukita

Abstract: While burst LR images are useful for improving the SR image quality compared with a single LR image, prior SR networks accepting the burst LR images are trained in a deterministic manner, which is known to produce a blurry SR image. In addition, it is difficult to perfectly align the burst LR images, making the SR image more blurry. Since such blurry images are perceptually degraded, we aim to reconstruct the sharp high-fidelity boundaries. Such high-fidelity images can be reconstructed by diffusion models. However, prior SR methods using the diffusion model are not properly optimized for the burst SR task. Specifically, the reverse process starting from a random sample is not optimized for image enhancement and restoration methods, including burst SR. In our proposed method, on the other hand, burst LR features are used to reconstruct the initial burst SR image that is fed into an intermediate step in the diffusion model. This reverse process from the intermediate step 1) skips diffusion steps for reconstructing the global structure of the image and 2) focuses on steps for refining detailed textures. Our experimental results demonstrate that our method can improve the scores of the perceptual quality metrics. Code: https://github.com/placerkyo/BSRD

* Accepted to IJCNN 2024 (International Joint Conference on Neural Networks)
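
The idea of entering the reverse process at an intermediate step can be sketched with the standard DDPM forward formula, x_t = sqrt(ᾱ_t) x_0 + sqrt(1 − ᾱ_t) ε. This is a minimal illustration under assumptions, not the authors' implementation: the linear beta schedule, the 1000-step horizon, the start step of 250, and the names `noise_to_step` and `reverse_steps` are all hypothetical, and no denoising network is included.

```python
import math
import random

def alpha_bar_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def noise_to_step(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(ab_t) * x_0, (1 - ab_t) * I)."""
    ab = alpha_bars[t]
    return [math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0)
            for v in x0]

def reverse_steps(start_step):
    """Timesteps visited when the reverse process begins at start_step."""
    return list(range(start_step, -1, -1))

T = 1000                           # hypothetical number of diffusion steps
alpha_bars = alpha_bar_schedule(T)
initial_sr = [0.2, 0.5, 0.8, 0.5]  # hypothetical initial burst SR estimate
t_start = 250                      # hypothetical intermediate step

# Noise the initial estimate to step t_start instead of sampling pure noise.
x_t = noise_to_step(initial_sr, t_start, alpha_bars, random.Random(0))

steps_full = reverse_steps(T - 1)    # reverse process from a random sample
steps_short = reverse_steps(t_start) # reverse process from the intermediate step
print(len(steps_full), len(steps_short))
```

Starting at `t_start` means the early, high-noise steps that mainly shape global structure are never run; only the later steps, which mostly refine texture, remain.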

Time-series Initialization and Conditioning for Video-agnostic Stabilization of Video Super-Resolution using Recurrent Networks

Mar 23, 2024

Hiroshi Mori, Norimichi Ukita

Abstract: A Recurrent Neural Network (RNN) for Video Super Resolution (VSR) is generally trained with randomly clipped and cropped short videos extracted from original training videos due to various challenges in learning RNNs. However, since this RNN is optimized to super-resolve short videos, VSR of long videos is degraded due to the domain gap. Our preliminary experiments reveal that such degradation changes depending on the video properties, such as the video length and dynamics. To avoid this degradation, this paper proposes a training strategy for RNN-based VSR that can work efficiently and stably, independently of the video length and dynamics. The proposed training strategy stabilizes VSR by training a VSR network with various RNN hidden states changed depending on the video properties. Since computing such a variety of hidden states is time-consuming, this computational cost is reduced by reusing the hidden states for efficient training. In addition, training stability is further improved with frame-number conditioning. Our experimental results demonstrate that the proposed method performed better than base methods in videos with various lengths and dynamics.

* Accepted to IJCNN 2024 (International Joint Conference on Neural Networks)

Depth Estimation fusing Image and Radar Measurements with Uncertain Directions

Mar 23, 2024

Masaya Kotani, Takeru Oba, Norimichi Ukita

Abstract: This paper proposes a depth estimation method using radar-image fusion by addressing the uncertain vertical directions of sparse radar measurements. In prior radar-image fusion work, image features are merged with the uncertain sparse depths measured by radar through convolutional layers. This approach is disturbed by the features computed with the uncertain radar depths. Furthermore, since the features are computed with a fully convolutional network, the uncertainty of each depth corresponding to a pixel is spread out over its surrounding pixels. Our method avoids this problem by computing features only with an image and conditioning the features pixelwise with the radar depth. Furthermore, the set of possibly correct radar directions is identified with reliable LiDAR measurements, which are available only in the training stage. Our method improves training data by learning only these possibly correct radar directions, while the previous method trains on raw radar measurements, including erroneous measurements. Experimental results demonstrate that our method can improve the quantitative and qualitative results compared with its base method using radar-image fusion.

* Accepted to IJCNN 2024 (International Joint Conference on Neural Networks)

Inpainting-Driven Mask Optimization for Object Removal

Mar 23, 2024

Kodai Shimosato, Norimichi Ukita

Abstract: This paper proposes a mask optimization method for improving the quality of object removal using image inpainting. While many inpainting methods are trained with a set of random masks, a target for inpainting may be an object, such as a person, in many realistic scenarios. This domain gap between masks in training and inference images increases the difficulty of the inpainting task. In our method, this domain gap is resolved by training the inpainting network with object masks extracted by segmentation, and such object masks are also used in the inference step. Furthermore, to optimize the object masks for inpainting, the segmentation network is connected to the inpainting network and end-to-end trained to improve the inpainting performance. The effect of this end-to-end training is further enhanced by our mask expansion loss for achieving the trade-off between large and small masks. Experimental results demonstrate the effectiveness of our method for better object removal using image inpainting.

* Accepted to IJCNN 2024 (International Joint Conference on Neural Networks)

NTIRE 2023 Image Shadow Removal Challenge Technical Report: Team IIM_TTI

Mar 15, 2024

Yuki Kondo, Riku Miyata, Fuma Yasue, Taito Naruki, Norimichi Ukita

Abstract: In this paper, we analyze and discuss ShadowFormer in preparation for the NTIRE2023 Shadow Removal Challenge [1], implementing five key improvements: image alignment, the introduction of a perceptual quality loss function, semi-automatic annotation for shadow detection, joint learning of shadow detection and removal, and the introduction of a new data augmentation technique, "CutShadow", for shadow removal. Our method achieved scores of 0.196 (3rd out of 19) in LPIPS and 7.44 (4th out of 19) in the Mean Opinion Score (MOS).

* This version is a brief technical report submitted to the organizers, and there are still some points to be added; please wait for updates until May 2024. The code can be found here (https://github.com/Yuki-11/NTIRE2023_ShadowRemoval_IIM_TTI)

Learning Group Activity Features Through Person Attribute Prediction

Mar 11, 2024

Chihiro Nakatani, Hiroaki Kawashima, Norimichi Ukita

Abstract: This paper proposes Group Activity Feature (GAF) learning in which features of multi-person activity are learned as a compact latent vector. Unlike prior work in which the manual annotation of group activities is required for supervised learning, our method learns the GAF through person attribute prediction without group activity annotations. By learning the whole network in an end-to-end manner so that the GAF is required for predicting the person attributes of people in a group, the GAF is trained as the features of multi-person activity. As a person attribute, we propose to use a person's action class and appearance features because the former is easy to annotate due to its simplicity, and the latter requires no manual annotation. In addition, we introduce a location-guided attribute prediction to disentangle the complex GAF for extracting the features of each target person properly. Various experimental results validate that our method outperforms SOTA methods quantitatively and qualitatively on two public datasets. Visualization of our GAF also demonstrates that our method learns the GAF representing fine-grained group activity classes. Code: https://github.com/chihina/GAFL-CVPR2024.

* Accepted to CVPR2024

Active Transfer Learning for Efficient Video-Specific Human Pose Estimation

Nov 08, 2023

Hiromu Taketsugu, Norimichi Ukita

Abstract: Human Pose (HP) estimation is actively researched because of its wide range of applications. However, even estimators pre-trained on large datasets may not perform satisfactorily due to a domain gap between the training and test data. To address this issue, we present our approach combining Active Learning (AL) and Transfer Learning (TL) to adapt HP estimators to individual video domains efficiently. For efficient learning, our approach quantifies (i) the estimation uncertainty based on the temporal changes in the estimated heatmaps and (ii) the unnaturalness in the estimated full-body HPs. These quantified criteria are then effectively combined with the state-of-the-art representativeness criterion to select uncertain and diverse samples for efficient HP estimator learning. Furthermore, we reconsider the existing Active Transfer Learning (ATL) method to introduce novel ideas related to the retraining methods and Stopping Criteria (SC). Experimental results demonstrate that our method enhances learning efficiency and outperforms comparative methods. Our code is publicly available at: https://github.com/ImIntheMiddle/VATL4Pose-WACV2024

* 17 pages, 12 figures, Accepted by WACV 2024
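
Criterion (i) in this abstract, uncertainty from temporal changes in the estimated heatmaps, can be sketched as follows. The abstract does not give the formula, so the mean absolute difference to neighbouring frames used here is an assumption, as are the hypothetical names `temporal_uncertainty` and `select_frames` and the toy four-pixel heatmaps. The unnaturalness and representativeness criteria are not modelled.

```python
def mean_abs_diff(a, b):
    """Mean absolute difference between two flattened heatmaps."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def temporal_uncertainty(heatmaps):
    """Score each frame by its average heatmap change versus adjacent frames."""
    scores = []
    for t, current in enumerate(heatmaps):
        neighbours = [heatmaps[i] for i in (t - 1, t + 1)
                      if 0 <= i < len(heatmaps)]
        diffs = [mean_abs_diff(current, other) for other in neighbours]
        scores.append(sum(diffs) / len(diffs))
    return scores

def select_frames(heatmaps, budget):
    """Return indices of the `budget` most uncertain frames (ties: earlier first)."""
    scores = temporal_uncertainty(heatmaps)
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    return ranked[:budget]

# Toy single-joint heatmaps over five frames; the peak jumps in frame 2.
heatmaps = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
]
print(select_frames(heatmaps, budget=1))
```

A frame whose heatmap disagrees with both neighbours scores highest, which matches the intuition that a sudden jump in a joint estimate is a likely error worth sending for annotation.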

Fast Inference and Update of Probabilistic Density Estimation on Trajectory Prediction

Aug 17, 2023

Takahiro Maeda, Norimichi Ukita

Abstract: Safety-critical applications such as autonomous vehicles and social robots require fast computation and accurate probability density estimation on trajectory prediction. To address both requirements, this paper presents a new normalizing flow-based trajectory prediction model named FlowChain. FlowChain is a stack of conditional continuously-indexed flows (CIFs) that are expressive and allow analytical probability density computation. This analytical computation is faster than the generative models that need additional approximations such as kernel density estimation. Moreover, FlowChain is more accurate than the Gaussian mixture-based models due to fewer assumptions on the estimated density. FlowChain also allows a rapid update of estimated probability densities. This update is achieved by adopting the newest observed position and reusing the flow transformations and their log-det-Jacobians, which represent the motion trend. This update is completed in less than one millisecond because this reuse greatly reduces the computational cost. Experimental results showed our FlowChain achieved state-of-the-art trajectory prediction accuracy compared to previous methods. Furthermore, our FlowChain demonstrated superiority in the accuracy and speed of density estimation. Our code is available at https://github.com/meaten/FlowChain-ICCV2023

* Accepted at ICCV2023
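
The analytical density computation and the cheap update described in this abstract both rest on the change-of-variables rule, log p(x) = log p_z(z) − log |dx/dz|. The sketch below swaps the paper's conditional CIFs for a single one-dimensional affine flow, purely for illustration. The class `AffineFlow`, its parameters, and the way the newest observed position enters as an additive `origin` are assumptions, not FlowChain's actual design.

```python
import math

def standard_normal_logpdf(z):
    """Log density of N(0, 1) at z."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))

class AffineFlow:
    """Toy flow x = origin + shift + scale * z with base z ~ N(0, 1)."""

    def __init__(self, shift, scale):
        self.shift = shift                   # stands in for the motion trend
        self.scale = scale
        self.log_det = math.log(abs(scale))  # log |dx/dz|, computed once

    def log_prob(self, x, origin):
        """Exact log density of x given the latest observed position."""
        z = (x - origin - self.shift) / self.scale
        return standard_normal_logpdf(z) - self.log_det

flow = AffineFlow(shift=1.5, scale=0.5)

# Density of a future position given the first observation at 0.0.
before = flow.log_prob(2.0, origin=0.0)

# A new observation arrives at 0.3: only `origin` changes, while the cached
# transformation and its log-determinant are reused unchanged.
after = flow.log_prob(2.3, origin=0.3)
print(before, after)
```

Because the density is available in closed form, no sampling or kernel density estimation is needed, and an update costs only one subtraction per query in this toy case.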
