Tatsuya Harada

The University of Tokyo, RIKEN AIP

Luminance-GS: Adapting 3D Gaussian Splatting to Challenging Lighting Conditions with View-Adaptive Curve Adjustment

Apr 02, 2025

Ziteng Cui, Xuangeng Chu, Tatsuya Harada

Abstract:Capturing high-quality photographs under diverse real-world lighting conditions is challenging, as both natural lighting (e.g., low-light) and camera exposure settings (e.g., exposure time) significantly impact image quality. This challenge becomes more pronounced in multi-view scenarios, where variations in lighting and image signal processor (ISP) settings across viewpoints introduce photometric inconsistencies. Such lighting degradations and view-dependent variations pose substantial challenges to novel view synthesis (NVS) frameworks based on Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS). To address this, we introduce Luminance-GS, a novel approach to achieving high-quality novel view synthesis results under diverse challenging lighting conditions using 3DGS. By adopting per-view color matrix mapping and view-adaptive curve adjustments, Luminance-GS achieves state-of-the-art (SOTA) results across various lighting conditions -- including low-light, overexposure, and varying exposure -- while not altering the original 3DGS explicit representation. Compared to previous NeRF- and 3DGS-based baselines, Luminance-GS provides real-time rendering speed with improved reconstruction quality.

* CVPR 2025, project page: https://cuiziteng.github.io/Luminance_GS_web/

Interactive Tumor Progression Modeling via Sketch-Based Image Editing

Mar 10, 2025

Gexin Huang, Ruinan Jin, Yucheng Tang, Can Zhao, Tatsuya Harada, Xiaoxiao Li, Gu Lin

Abstract:Accurately visualizing and editing tumor progression in medical imaging is crucial for diagnosis, treatment planning, and clinical communication. To address the challenges of subjectivity and limited precision in existing methods, we propose SkEditTumor, a sketch-based diffusion model for controllable tumor progression editing. By leveraging sketches as structural priors, our method enables precise modifications of tumor regions while maintaining structural integrity and visual realism. We evaluate SkEditTumor on four public datasets - BraTS, LiTS, KiTS, and MSD-Pancreas - covering diverse organs and imaging modalities. Experimental results demonstrate that our method outperforms state-of-the-art baselines, achieving superior image fidelity and segmentation accuracy. Our contributions include a novel integration of sketches with diffusion models for medical image editing, fine-grained control over tumor progression visualization, and extensive validation across multiple datasets, setting a new benchmark in the field.

* 9 pages, 4 figures

ARTalk: Speech-Driven 3D Head Animation via Autoregressive Model

Feb 28, 2025

Xuangeng Chu, Nabarun Goswami, Ziteng Cui, Hanqin Wang, Tatsuya Harada

Abstract:Speech-driven 3D facial animation aims to generate realistic lip movements and facial expressions for 3D head models from arbitrary audio clips. Although existing diffusion-based methods are capable of producing natural motions, their slow generation speed limits their application potential. In this paper, we introduce a novel autoregressive model that achieves real-time generation of highly synchronized lip movements and realistic head poses and eye blinks by learning a mapping from speech to a multi-scale motion codebook. Furthermore, our model can adapt to unseen speaking styles using sample motion sequences, enabling the creation of 3D talking avatars with unique personal styles beyond the identities seen during training. Extensive evaluations and user studies demonstrate that our method outperforms existing approaches in lip synchronization accuracy and perceived quality.

* More video demonstrations, code, models and data can be found on our project website: http://xg-chu.site/project_artalk/

Discovering an Image-Adaptive Coordinate System for Photography Processing

Jan 11, 2025

Ziteng Cui, Lin Gu, Tatsuya Harada

Abstract:Curve & Lookup Table (LUT) based methods directly map a pixel to the target output, making them highly efficient tools for real-time photography processing. However, due to extreme memory complexity to learn full RGB space mapping, existing methods either sample a discretized 3D lattice to build a 3D LUT or decompose into three separate curves (1D LUTs) on the RGB channels. Here, we propose a novel algorithm, IAC, to learn an image-adaptive Cartesian coordinate system in the RGB color space before performing curve operations. This end-to-end trainable approach enables us to efficiently adjust images with a jointly learned image-adaptive coordinate system and curves. Experimental results demonstrate that this simple strategy achieves state-of-the-art (SOTA) performance in various photography processing tasks, including photo retouching, exposure correction, and white-balance editing, while also maintaining a lightweight design and fast inference speed.

* BMVC 2024
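
To illustrate the idea in the abstract above (change to an image-adaptive coordinate system in RGB space, then apply per-axis curves), here is a minimal Python sketch. It is not the authors' implementation: the function names, the fixed orthonormal `basis`, the piecewise-linear curve representation, and the clamping to [0, 1] are all assumptions made for illustration. In IAC both the coordinate system and the curves are learned per image; here they are simply passed in by hand.

```python
def apply_curve(x, knots):
    """Evaluate a piecewise-linear 1D curve (a tiny 1D LUT) at x.

    `knots` holds the curve outputs at evenly spaced inputs on [0, 1].
    Inputs outside [0, 1] are clamped (a simplification for this sketch).
    """
    x = min(max(x, 0.0), 1.0)
    n = len(knots) - 1
    pos = x * n
    i = min(int(pos), n - 1)
    t = pos - i
    return (1 - t) * knots[i] + t * knots[i + 1]


def matvec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def transpose(m):
    return [[m[c][r] for c in range(3)] for r in range(3)]


def adjust_pixel(rgb, basis, curves):
    """Project rgb onto the axes in `basis`, apply one curve per axis, project back.

    `basis` is assumed orthonormal (rows are the axes), so its transpose is
    its inverse. `curves` is a list of three knot lists, one per axis.
    """
    coords = matvec(basis, rgb)
    mapped = [apply_curve(c, k) for c, k in zip(coords, curves)]
    return matvec(transpose(basis), mapped)


if __name__ == "__main__":
    identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
    brighten = [0.0, 0.8, 1.0]   # hypothetical curve: lifts mid-tones
    linear = [0.0, 0.5, 1.0]     # identity curve
    print(adjust_pixel([0.2, 0.5, 0.9], identity, [brighten, linear, linear]))
```

With the identity basis this reduces to three separate per-channel curves, the decomposition the abstract attributes to existing methods; a non-trivial basis lets the same 1D curves act along other directions in colour space.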

Emergence of Painting Ability via Recognition-Driven Evolution

Jan 09, 2025

Yi Lin, Lin Gu, Ziteng Cui, Shenghan Su, Yumo Hao, Yingtao Tian, Tatsuya Harada, Jianfei Yang

Abstract:From Paleolithic cave paintings to Impressionism, human painting has evolved to depict increasingly complex and detailed scenes, conveying more nuanced messages. This paper attempts to make this artistic capability emerge by simulating the evolutionary pressures that enhance visual communication efficiency. Specifically, we present a model with a stroke branch and a palette branch that together simulate human-like painting. The palette branch learns a limited colour palette, while the stroke branch parameterises each stroke using Bézier curves to render an image, which is subsequently evaluated by a high-level recognition module. We quantify the efficiency of visual communication by measuring the recognition accuracy achieved with machine vision. The model then optimises the control points and colour choices for each stroke to maximise recognition accuracy with minimal strokes and colours. Experimental results show that our model achieves superior performance in high-level recognition tasks, delivering artistic expression and aesthetic appeal, especially in abstract sketches. Additionally, our approach shows promise as an efficient bit-level image compression technique, outperforming traditional methods.
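
As a small illustration of the stroke parameterisation mentioned in the abstract above, the following Python sketch evaluates a Bézier curve from its control points using de Casteljau's algorithm. It shows only the standard curve evaluation; the function names and the sampling helper are hypothetical, and the rendering, palette learning, and recognition-driven optimisation described in the abstract are not reproduced here.

```python
def bezier_point(control_points, t):
    """Evaluate a Bézier curve at parameter t in [0, 1] (de Casteljau).

    Repeatedly interpolates between neighbouring points until one remains.
    Works for any number of control points and any dimension.
    """
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        pts = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(pts, pts[1:])
        ]
    return pts[0]


def sample_stroke(control_points, n=8):
    """Sample n points along the curve (n >= 2), e.g. to rasterise a stroke."""
    return [bezier_point(control_points, i / (n - 1)) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical cubic stroke: four 2D control points.
    stroke = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
    for point in sample_stroke(stroke, n=5):
        print(point)
```

Because every sampled point is a smooth function of the control points, moving the control points changes the stroke continuously, which is what makes them a natural set of parameters to optimise.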

Paleoinspired Vision: From Exploring Colour Vision Evolution to Inspiring Camera Design

Dec 27, 2024

Junjie Zhang, Zhimin Zong, Lin Gu, Shenghan Su, Ziteng Cui, Yan Pu, Zirui Chen, Jing Lu, Daisuke Kojima, Tatsuya Harada (+1 more)

Abstract:The evolution of colour vision is captivating, as it reveals the adaptive strategies of extinct species while simultaneously inspiring innovations in modern imaging technology. In this study, we present a simplified model of visual transduction in the retina, introducing a novel opsin layer. We quantify evolutionary pressures by measuring machine vision recognition accuracy on colour images shaped by specific opsins. Building on this, we develop an evolutionary conservation optimisation algorithm to reconstruct the spectral sensitivity of opsins, enabling mutation-driven adaptations to more effectively spot fruits or predators. This model condenses millions of years of evolution within seconds on a GPU, providing an experimental framework to test long-standing hypotheses in evolutionary biology, such as vision of early mammals, primate trichromacy from gene duplication, retention of colour blindness, blue-shift of fish rod and multiple rod opsins with bioluminescence. Moreover, the model enables speculative explorations of hypothetical species, such as organisms with eyes adapted to the conditions on Mars. Our findings suggest a minimalist yet effective approach to task-specific camera filter design, optimising the spectral response function to meet application-driven demands. The code will be made publicly available upon acceptance.

* 15 pages, 6 figures

Generalizable and Animatable Gaussian Head Avatar

Oct 10, 2024

Xuangeng Chu, Tatsuya Harada

Abstract:In this paper, we propose Generalizable and Animatable Gaussian head Avatar (GAGAvatar) for one-shot animatable head avatar reconstruction. Existing methods rely on neural radiance fields, leading to heavy rendering consumption and low reenactment speeds. To address these limitations, we generate the parameters of 3D Gaussians from a single image in a single forward pass. The key innovation of our work is the proposed dual-lifting method, which produces high-fidelity 3D Gaussians that capture identity and facial details. Additionally, we leverage global image features and the 3D morphable model to construct 3D Gaussians for controlling expressions. After training, our model can reconstruct unseen identities without specific optimizations and perform reenactment rendering at real-time speeds. Experiments show that our method exhibits superior performance compared to previous methods in terms of reconstruction quality and expression accuracy. We believe our method can establish new benchmarks for future research and advance applications of digital avatars. Code and demos are available at https://github.com/xg-chu/GAGAvatar.

* NeurIPS 2024, code is available at https://github.com/xg-chu/GAGAvatar, more demos are available at https://xg-chu.site/project_gagavatar

RAW-Adapter: Adapting Pre-trained Visual Model to Camera RAW Images

Aug 27, 2024

Ziteng Cui, Tatsuya Harada

Abstract:sRGB images are now the predominant choice for pre-training visual models in computer vision research, owing to their ease of acquisition and efficient storage. Meanwhile, the advantage of RAW images lies in their rich physical information under variable real-world challenging lighting conditions. For computer vision tasks directly based on camera RAW data, most existing studies adopt methods of integrating image signal processor (ISP) with backend networks, yet often overlook the interaction capabilities between the ISP stages and subsequent networks. Drawing inspiration from ongoing adapter research in NLP and CV areas, we introduce RAW-Adapter, a novel approach aimed at adapting sRGB pre-trained models to camera RAW data. RAW-Adapter comprises input-level adapters that employ learnable ISP stages to adjust RAW inputs, as well as model-level adapters to build connections between ISP stages and subsequent high-level networks. Additionally, RAW-Adapter is a general framework that could be used in various computer vision frameworks. Abundant experiments under different lighting conditions have shown our algorithm's state-of-the-art (SOTA) performance, demonstrating its effectiveness and efficiency across a range of real-world and synthetic datasets.

* ECCV 2024, code link: https://github.com/cuiziteng/ECCV_RAW_Adapter

Frequency-aware Feature Fusion for Dense Image Prediction

Aug 23, 2024

Linwei Chen, Ying Fu, Lin Gu, Chenggang Yan, Tatsuya Harada, Gao Huang

Abstract:Dense image prediction tasks demand features with strong category information and precise spatial boundary details at high resolution. To achieve this, modern hierarchical models often utilize feature fusion, directly adding upsampled coarse features from deep layers and high-resolution features from lower levels. In this paper, we observe rapid variations in fused feature values within objects, resulting in intra-category inconsistency due to disturbed high-frequency features. Additionally, blurred boundaries in fused features lack accurate high frequency, leading to boundary displacement. Building upon these observations, we propose Frequency-Aware Feature Fusion (FreqFusion), integrating an Adaptive Low-Pass Filter (ALPF) generator, an offset generator, and an Adaptive High-Pass Filter (AHPF) generator. The ALPF generator predicts spatially-variant low-pass filters to attenuate high-frequency components within objects, reducing intra-class inconsistency during upsampling. The offset generator refines large inconsistent features and thin boundaries by replacing inconsistent features with more consistent ones through resampling, while the AHPF generator enhances high-frequency detailed boundary information lost during downsampling. Comprehensive visualization and quantitative analysis demonstrate that FreqFusion effectively improves feature consistency and sharpens object boundaries. Extensive experiments across various dense prediction tasks confirm its effectiveness. The code is made publicly available at https://github.com/Linwei-Chen/FreqFusion.

* Accepted by TPAMI (2024)
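
The standard fusion step described in the abstract above (upsample coarse features, then add them to high-resolution features) can be sketched in a few lines of Python on 1D lists. This is a toy illustration only: the fixed 3-tap kernel stands in for the spatially-variant low-pass filters that the ALPF generator predicts, and the offset generator and AHPF generator are omitted. All function names are hypothetical and are not taken from the FreqFusion code.

```python
def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling of a 1D feature row."""
    return [v for v in x for _ in range(factor)]


def low_pass(x, kernel=(0.25, 0.5, 0.25)):
    """Smooth a 1D row with a fixed 3-tap kernel, replicating the edge values.

    Assumption: a single fixed kernel, whereas the abstract describes
    filters predicted separately for each spatial location.
    """
    padded = [x[0]] + list(x) + [x[-1]]
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def fuse(coarse, fine, smooth=True):
    """Upsample `coarse` by 2, optionally low-pass it, and add it to `fine`."""
    up = upsample_nearest(coarse)
    if smooth:
        up = low_pass(up)
    return [u + f for u, f in zip(up, fine)]


if __name__ == "__main__":
    coarse = [1.0, 3.0]
    fine = [0.0, 0.0, 0.0, 0.0]
    print(fuse(coarse, fine, smooth=False))  # plain upsample-and-add
    print(fuse(coarse, fine, smooth=True))   # with the fixed low-pass filter
```

Comparing the two printed rows shows how smoothing the upsampled features softens the abrupt step that nearest-neighbour upsampling introduces.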

DistML.js: Installation-free Distributed Deep Learning Framework for Web Browsers

Jul 01, 2024

Masatoshi Hidaka, Tomohiro Hashimoto, Yuto Nishizawa, Tatsuya Harada

Abstract:We present "DistML.js", a library designed for training and inference of machine learning models within web browsers. Not only does DistML.js facilitate model training on local devices, but it also supports distributed learning through communication with servers. Its design and define-by-run API for deep learning model construction resemble PyTorch, thereby reducing the learning curve for prototyping. Matrix computations involved in model training and inference are executed on the backend utilizing WebGL, enabling high-speed calculations. We provide a comprehensive explanation of DistML.js's design, API, and implementation, alongside practical applications including data parallelism in learning. The source code is publicly available at https://github.com/mil-tokyo/distmljs.
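
The data parallelism mentioned in the abstract above can be illustrated with a generic sketch: each worker computes a gradient on its own shard of the data, and a server combines the gradients and updates the shared parameters. The Python below is a conceptual toy for a one-parameter linear model. It does not use or reproduce the DistML.js API, which is a JavaScript library; every name here is hypothetical.

```python
def local_gradient(w, shard):
    """Gradient of the mean squared error of y ≈ w * x on one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def data_parallel_step(w, shards, lr=0.1):
    """One synchronous update: workers compute gradients, the server averages.

    Gradients are weighted by shard size so that the result matches the
    gradient computed over the full dataset.
    """
    grads = [local_gradient(w, s) for s in shards]   # would run on each client
    total = sum(len(s) for s in shards)
    avg = sum(g * len(s) for g, s in zip(grads, shards)) / total
    return w - lr * avg                              # server-side update


def train(shards, steps=100, lr=0.05, w=0.0):
    for _ in range(steps):
        w = data_parallel_step(w, shards, lr)
    return w


if __name__ == "__main__":
    # Toy data following y = 3x, split unevenly across two workers.
    shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)]]
    print(train(shards))
```

Weighting by shard size is one simple design choice; with equal-sized shards a plain mean of the worker gradients gives the same result.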
