Hao-Yu Hsu

Seeing Without Eyes: 4D Human-Scene Understanding from Wearable IMUs

Apr 23, 2026

Hao-Yu Hsu, Tianhang Cheng, Jing Wen, Alexander G. Schwing, Shenlong Wang

Abstract: Understanding human activities and their surrounding environments typically relies on visual perception, yet cameras pose persistent challenges in privacy, safety, energy efficiency, and scalability. We explore an alternative: 4D perception without vision, where the goal is to reconstruct human motion and 3D scene layouts purely from everyday wearable sensors. To this end we introduce IMU-to-4D, a framework that repurposes large language models for non-visual spatiotemporal understanding of human-scene dynamics. IMU-to-4D takes data from a few inertial sensors in earbuds, watches, or smartphones and predicts detailed 4D human motion together with coarse scene structure. Experiments across diverse human-scene datasets show that IMU-to-4D yields more coherent and temporally stable results than state-of-the-art cascaded pipelines, suggesting that wearable motion sensors alone can support rich 4D understanding.

* Project page: https://tianhang-cheng.github.io/IMU4D

AutoVFX: Physically Realistic Video Editing from Natural Language Instructions

Nov 04, 2024

Hao-Yu Hsu, Zhi-Hao Lin, Albert Zhai, Hongchi Xia, Shenlong Wang

Abstract: Modern visual effects (VFX) software has made it possible for skilled artists to create imagery of virtually anything. However, the creation process remains laborious, complex, and largely inaccessible to everyday users. In this work, we present AutoVFX, a framework that automatically creates realistic and dynamic VFX videos from a single video and natural language instructions. By carefully integrating neural scene modeling, LLM-based code generation, and physical simulation, AutoVFX provides physically grounded, photorealistic editing effects that can be controlled directly with natural language instructions. We conduct extensive experiments to validate AutoVFX's efficacy across a diverse spectrum of videos and instructions. Quantitative and qualitative results suggest that AutoVFX outperforms all competing methods by a large margin in generative quality, instruction alignment, editing versatility, and physical plausibility.

* Project page: https://haoyuhsu.github.io/autovfx-website/

NeurMiPs: Neural Mixture of Planar Experts for View Synthesis

Apr 28, 2022

Zhi-Hao Lin, Wei-Chiu Ma, Hao-Yu Hsu, Yu-Chiang Frank Wang, Shenlong Wang

Abstract: We present Neural Mixtures of Planar Experts (NeurMiPs), a novel planar-based scene representation for modeling geometry and appearance. NeurMiPs leverages a collection of local planar experts in 3D space as the scene representation. Each planar expert consists of the parameters of a local rectangular shape, which represents geometry, and a neural radiance field, which models color and opacity. We render novel views by calculating ray-plane intersections and compositing the output colors and densities at the intersection points into the image. NeurMiPs blends the efficiency of explicit mesh rendering with the flexibility of neural radiance fields. Experiments demonstrate the superior performance and speed of our proposed method compared to other 3D representations in novel view synthesis.

* CVPR 2022. Project page: https://zhihao-lin.github.io/neurmips/
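
The rendering step described in the abstract (intersect a ray with each rectangular plane, then composite the results) can be illustrated with a small sketch. This is not the authors' implementation: the `PlanarExpert` class, the `radiance` callable standing in for the per-plane neural radiance field, and the use of a per-hit alpha value with front-to-back compositing are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Vec3 = Tuple[float, float, float]


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


@dataclass
class PlanarExpert:
    """Hypothetical stand-in for one planar expert: a rectangle plus a radiance function."""
    center: Vec3
    u: Vec3          # unit in-plane axis
    v: Vec3          # unit in-plane axis, orthogonal to u
    half_w: float    # half extent along u
    half_h: float    # half extent along v
    # Assumed interface: maps in-plane coordinates to (rgb, alpha).
    radiance: Callable[[float, float], Tuple[Vec3, float]]


def intersect(e: PlanarExpert, origin: Vec3, direction: Vec3) -> Optional[Tuple[float, float, float]]:
    """Return (t, a, b) if the ray hits the rectangle, else None."""
    n = cross(e.u, e.v)
    denom = dot(n, direction)
    if abs(denom) < 1e-9:        # ray parallel to the plane
        return None
    t = dot(n, sub(e.center, origin)) / denom
    if t <= 0.0:                 # plane is behind the ray origin
        return None
    p = (origin[0] + t * direction[0],
         origin[1] + t * direction[1],
         origin[2] + t * direction[2])
    rel = sub(p, e.center)
    a, b = dot(rel, e.u), dot(rel, e.v)
    if abs(a) > e.half_w or abs(b) > e.half_h:   # outside the rectangle
        return None
    return t, a, b


def render_ray(experts: List[PlanarExpert], origin: Vec3, direction: Vec3) -> Tuple[Vec3, float]:
    """Front-to-back alpha compositing over all rectangles the ray hits."""
    hits = []
    for e in experts:
        hit = intersect(e, origin, direction)
        if hit is not None:
            t, a, b = hit
            hits.append((t, e.radiance(a, b)))
    hits.sort(key=lambda h: h[0])                # nearest hit first

    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for _, (rgb, alpha) in hits:
        weight = transmittance * alpha
        for i in range(3):
            color[i] += weight * rgb[i]
        transmittance *= 1.0 - alpha
    return (color[0], color[1], color[2]), transmittance


# Usage: a half-transparent red plane in front of an opaque blue plane.
red = PlanarExpert((0, 0, 1), (1, 0, 0), (0, 1, 0), 1.0, 1.0,
                   lambda a, b: ((1.0, 0.0, 0.0), 0.5))
blue = PlanarExpert((0, 0, 2), (1, 0, 0), (0, 1, 0), 1.0, 1.0,
                    lambda a, b: ((0.0, 0.0, 1.0), 1.0))
pixel, remaining = render_ray([blue, red], (0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
```

Because each ray meets a plane at most once, the per-ray cost in this sketch scales with the number of planes hit rather than with a fixed number of volume samples, which is one way to read the efficiency claim in the abstract.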
