Abstract: Micro-actions are subtle, localized movements lasting 1-3 seconds, such as scratching one's head or tapping one's fingers. Such actions are essential for social communication and ubiquitous in natural interactions, which makes them critical for fine-grained video understanding, yet they remain poorly understood by current computer vision systems. We identify a fundamental challenge: micro-actions exhibit diverse spatio-temporal characteristics, with some defined by spatial configurations and others manifesting through temporal dynamics. Existing methods that commit to a single spatio-temporal decomposition cannot accommodate this diversity. We propose a dual-path network that processes anatomically grounded spatial entities through parallel Spatial-Temporal (ST) and Temporal-Spatial (TS) pathways. The ST path captures spatial configurations before modeling temporal dynamics, while the TS path inverts this order to prioritize temporal dynamics. Rather than relying on fixed fusion, we introduce entity-level adaptive routing, in which each body part learns its own processing preference, complemented by a Mutual Action Consistency (MAC) loss that enforces cross-path coherence. Extensive experiments demonstrate competitive performance on the MA-52 dataset and state-of-the-art results on the iMiGUE dataset. Our work reveals that adapting the architecture to the inherent complexity of micro-actions is essential for advancing fine-grained video understanding.
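
The abstract does not specify how the routing weights or the MAC loss are computed, so the following is only a minimal sketch under stated assumptions. It assumes each body-part entity has a pair of learnable logits that a softmax turns into ST/TS mixing weights, and it models cross-path coherence as a symmetric KL divergence between the class distributions predicted by the two paths. The names `route_entities` and `mac_loss` are hypothetical and are not the authors' code.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_entities(st_feats, ts_feats, route_logits):
    """Hypothetical entity-level routing: fuse ST and TS features per body part.

    st_feats, ts_feats: dict entity -> feature vector (list of floats, same length)
    route_logits: dict entity -> (logit_st, logit_ts), assumed to be learned
    """
    fused = {}
    for name in st_feats:
        # Each entity gets its own soft preference between the two pathways.
        w_st, w_ts = softmax(list(route_logits[name]))
        fused[name] = [w_st * a + w_ts * b
                       for a, b in zip(st_feats[name], ts_feats[name])]
    return fused


def kl(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions, with eps to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mac_loss(st_logits, ts_logits):
    """Assumed MAC form: symmetric KL between ST and TS class distributions."""
    p = softmax(st_logits)
    q = softmax(ts_logits)
    return 0.5 * (kl(p, q) + kl(q, p))


# Usage: equal routing logits average the two paths; a strong ST preference
# passes the ST features through almost unchanged.
fused = route_entities({"head": [1.0, 2.0]}, {"head": [3.0, 4.0]},
                       {"head": (0.0, 0.0)})
print(fused["head"])
print(mac_loss([1.0, 2.0, 0.5], [1.0, 2.0, 0.5]))
```

A soft, per-entity weighting keeps routing differentiable, so the preference of each body part can be learned jointly with the rest of the network; a hard argmax choice between paths would need a different training trick.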

Abstract: Many neurodevelopmental disorders can be understood as divergent patterns of neural interactions during brain development. Advances in neuroimaging have illuminated these patterns by modeling the brain as a network constructed from diffusion MRI tractography. However, characterizing and quantifying individual heterogeneity in neurodevelopmental disorders within these highly complex brain networks remains a significant challenge. In this paper, we present, for the first time, a framework that integrates deep generative models with graph-based normative modeling to characterize brain network development in the neurotypical population; this normative model can then be used to quantify the individual-level neurodivergence associated with disorders. Our deep generative model incorporates bio-inspired wiring constraints to effectively capture the developmental trajectories of neurotypical brain networks. Neurodivergence is quantified by comparing each individual to this neurotypical trajectory, yielding region-wise divergence maps that reveal latent developmental differences in each brain region, along with overall neurodivergence scores based on predicted brain age gaps. We demonstrate the clinical utility of this framework by applying it to a large sample of children with autism spectrum disorder, showing that the individualized region-wise maps help parse the heterogeneity in autism and that the neurodivergence scores correlate with clinical assessments. Together, these contributions provide powerful tools for quantifying neurodevelopmental divergence in brain networks, paving the way for imaging markers that support disorder stratification, monitor progression, and evaluate therapeutic effectiveness.
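
The two outputs described above, region-wise divergence maps and a brain-age-gap score, can be illustrated with a much simpler normative model than the one in the paper. The sketch below swaps in a per-region linear fit of a scalar regional feature (for example, a hypothetical nodal strength) against age on neurotypical data, in place of the authors' deep generative model with wiring constraints. Divergence is the per-region z-score of a subject's residual, and "brain age" is the maximum-likelihood age under Gaussian residuals. All function names are hypothetical and the approach is illustrative only.

```python
import statistics


def fit_normative(ages, values):
    """Least-squares line value ~ a + b * age for one region, plus residual SD.

    Stand-in for a learned normative trajectory, fit on neurotypical data only.
    """
    mean_age = statistics.fmean(ages)
    mean_val = statistics.fmean(values)
    sxx = sum((x - mean_age) ** 2 for x in ages)
    sxy = sum((x - mean_age) * (y - mean_val) for x, y in zip(ages, values))
    slope = sxy / sxx
    intercept = mean_val - slope * mean_age
    residuals = [y - (intercept + slope * x) for x, y in zip(ages, values)]
    return intercept, slope, statistics.pstdev(residuals)


def divergence_map(subject, models, age):
    """Region-wise z-scores: deviation from the normative value at this age."""
    return {r: (subject[r] - (a + b * age)) / sd
            for r, (a, b, sd) in models.items()}


def brain_age(subject, models):
    """Age that minimizes the sum of squared z-scores across regions.

    For linear trajectories this has the closed form
    sum_r b_r (v_r - a_r) / sd_r^2  divided by  sum_r b_r^2 / sd_r^2.
    """
    num = sum(b * (subject[r] - a) / sd ** 2 for r, (a, b, sd) in models.items())
    den = sum(b ** 2 / sd ** 2 for r, (a, b, sd) in models.items())
    return num / den


def brain_age_gap(subject, models, age):
    """Positive gap: the subject's network looks 'older' than their age."""
    return brain_age(subject, models) - age


# Usage with toy neurotypical data for two regions.
ages = [6.0, 8.0, 10.0, 12.0, 14.0]
noise = [0.1, -0.1, 0.0, 0.1, -0.1]
models = {
    "r1": fit_normative(ages, [1.0 + 0.5 * t + n for t, n in zip(ages, noise)]),
    "r2": fit_normative(ages, [3.0 - 0.2 * t - n for t, n in zip(ages, noise)]),
}
subject = {"r1": 7.0, "r2": 0.7}
print(divergence_map(subject, models, age=10.0))
print(brain_age_gap(subject, models, age=10.0))
```

Regions with steeper normative trajectories and smaller residual spread carry more weight in the brain-age estimate, which is one reason a single scalar gap can hide the region-level heterogeneity that the divergence map keeps visible.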