Abstract:Long-range human movement generation remains a central challenge in computer vision and graphics. Generating coherent transitions across semantically distinct motion domains remains largely unexplored. This capability is particularly important for applications such as dance choreography, where movements must fluidly transition across diverse stylistic and semantic motifs. We propose a simple and effective inference-time optimization framework inspired by diffusion-based stochastic optimal control. Specifically, a control-energy objective that explicitly regularizes the transition trajectories of a pretrained diffusion model. We show that optimizing this objective at inference time yields transitions with fidelity and temporal coherence. This is the first work to provide a general framework for controlled long-range human motion generation with explicit transition modeling.
Abstract:Video neural network (VNN) processing using the conventional pipeline first converts Bayer video information into human understandable RGB videos using image signal processing (ISP) on a pixel by pixel basis. Then, VNN processing is performed on a frame by frame basis. Both ISP and VNN are computationally expensive with high power consumption and latency. In this paper, we propose an efficient VNN processing framework. Instead of using ISP, computer vision tasks are directly accomplished using Bayer pattern information. To accelerate VNN processing, motion estimation is introduced to find temporal redundancies in input video data so as to avoid repeated and unnecessary computations. Experiments show greater than 67\% computation reduction, while maintaining computer vision task accuracy for typical computer vision tasks and data sets.