Abstract:Egocentric perception enables humans to experience and understand the world directly from their own point of view. Translating exocentric (third-person) videos into egocentric (first-person) videos opens up new possibilities for immersive understanding but remains highly challenging due to extreme camera pose variations and minimal view overlap. This task requires faithfully preserving visible content while synthesizing unseen regions in a geometrically consistent manner. To achieve this, we present EgoX, a novel framework for generating egocentric videos from a single exocentric input. EgoX leverages the pretrained spatio temporal knowledge of large-scale video diffusion models through lightweight LoRA adaptation and introduces a unified conditioning strategy that combines exocentric and egocentric priors via width and channel wise concatenation. Additionally, a geometry-guided self-attention mechanism selectively attends to spatially relevant regions, ensuring geometric coherence and high visual fidelity. Our approach achieves coherent and realistic egocentric video generation while demonstrating strong scalability and robustness across unseen and in-the-wild videos.




Abstract:Bradykinesia, characterized by involuntary slowing or decrement of movement, is a fundamental symptom of Parkinson's Disease (PD) and is vital for its clinical diagnosis. Despite various methodologies explored to quantify bradykinesia, computer vision-based approaches have shown promising results. However, these methods often fall short in adequately addressing key bradykinesia characteristics in repetitive limb movements: "occasional arrest" and "decrement in amplitude." This research advances vision-based quantification of bradykinesia by introducing nuanced numerical analysis to capture decrement in amplitudes and employing a simple deep learning technique, LSTM-FCN, for precise classification of occasional arrests. Our approach structures the classification process hierarchically, tailoring it to the unique dynamics of bradykinesia in PD. Statistical analysis of the extracted features, including those representing arrest and fatigue, has demonstrated their statistical significance in most cases. This finding underscores the importance of considering "occasional arrest" and "decrement in amplitude" in bradykinesia quantification of limb movement. Our enhanced diagnostic tool has been rigorously tested on an extensive dataset comprising 1396 motion videos from 310 PD patients, achieving an accuracy of 80.3%. The results confirm the robustness and reliability of our method.

Abstract:Realtime face identification (FID) from a video feed is highly computation-intensive, and may exhaust computation resources if performed on a device with a limited amount of resources (e.g., a mobile device). In general, FID performs better when images are sampled at a higher rate, minimizing false negatives. However, performing it at an overwhelmingly high rate exposes the system to the risk of a queue overflow that hampers the system's reliability. This paper proposes a novel, queue-aware FID framework that adapts the sampling rate to maximize the FID performance while avoiding a queue overflow by implementing the Lyapunov optimization. A preliminary evaluation via a trace-based simulation confirms the effectiveness of the framework.