Topic:Repcount

GMFL-Net: A Global Multi-geometric Feature Learning Network for Repetitive Action Counting

Aug 31, 2024

Jun Li, Jinying Wu, Qiming Li, Feifei Guo

Abstract:With the continuous development of deep learning, repetitive action counting is attracting growing attention from researchers. Extracting pose keypoints with human pose estimation networks has proven to be an effective pose-level method. However, existing pose-level methods have two shortcomings: a single coordinate is not stable enough to handle action distortions caused by changes in camera viewpoint, so salient poses are not identified accurately, and the methods are vulnerable to misdetection during the transition from an exception to the actual action. To overcome these problems, we propose a simple but efficient Global Multi-geometric Feature Learning Network (GMFL-Net). Specifically, we design a MIA-Module that improves information representation by fusing multi-geometric features and learning the semantic similarity among the input multi-geometric features. Then, to improve the feature representation from a global perspective, we design a GBFL-Module that enhances the inter-dependencies between point-wise and channel-wise elements and combines them with the rich local information generated by the MIA-Module to synthesise a comprehensive and representative global feature representation. In addition, because existing datasets are insufficient, we collect a new dataset called Countix-Fitness-pose (https://github.com/Wantong66/Countix-Fitness), which contains different cycle lengths and exceptions and a test set with longer duration, and annotate it with fine-grained pose-level annotations. We also add two new action classes, namely lunge and rope push-down. Finally, extensive experiments on the challenging RepCount-pose, UCFRep-pose, and Countix-Fitness-pose benchmarks show that our proposed GMFL-Net achieves state-of-the-art performance.

Rethinking temporal self-similarity for repetitive action counting

Jul 12, 2024

Yanan Luo, Jinhui Yi, Yazan Abu Farha, Moritz Wolter, Juergen Gall

Abstract:Counting repetitive actions in long untrimmed videos is a challenging task that has many applications such as rehabilitation. State-of-the-art methods predict action counts by first generating a temporal self-similarity matrix (TSM) from the sampled frames and then feeding the matrix to a predictor network. The self-similarity matrix, however, is not an optimal input to a network since it discards too much information from the frame-wise embeddings. We thus rethink how a TSM can be utilized for counting repetitive actions and propose a framework that learns embeddings and predicts action start probabilities at full temporal resolution. The number of repeated actions is then inferred from the action start probabilities. In contrast to current approaches that have the TSM as an intermediate representation, we propose a novel loss based on a generated reference TSM, which enforces that the self-similarity of the learned frame-wise embeddings is consistent with the self-similarity of repeated actions. The proposed framework achieves state-of-the-art results on three datasets, i.e., RepCount, UCFRep, and Countix.

* Accepted to ICIP 2024
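
The two ingredients named in the abstract above, a temporal self-similarity matrix and a count inferred from per-frame action start probabilities, can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine similarity measure, the function names, and the peak-counting rule with its `threshold` parameter are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def cosine_self_similarity(embeddings):
    """Build a T x T self-similarity matrix from T frame embeddings.

    Cosine similarity is an assumed choice; other similarity measures
    (e.g. negative squared distance) are equally plausible.
    """
    # Guard against zero vectors so the division below is always defined.
    norms = [math.sqrt(sum(x * x for x in e)) or 1.0 for e in embeddings]
    return [
        [sum(a * b for a, b in zip(ei, ej)) / (ni * nj)
         for ej, nj in zip(embeddings, norms)]
        for ei, ni in zip(embeddings, norms)
    ]


def count_starts(start_probs, threshold=0.5):
    """Count local maxima of the start probabilities that reach `threshold`.

    Hypothetical decoding rule: one repetition per confident peak.
    A plateau of equal values is counted once, at its first frame.
    """
    count = 0
    for t, p in enumerate(start_probs):
        left = start_probs[t - 1] if t > 0 else float("-inf")
        right = start_probs[t + 1] if t + 1 < len(start_probs) else float("-inf")
        if p >= threshold and p > left and p >= right:
            count += 1
    return count


# Toy input: two alternating "poses" repeated twice.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
tsm = cosine_self_similarity(frames)
probs = [0.1, 0.9, 0.2, 0.1, 0.8, 0.3, 0.7]
n_reps = count_starts(probs)
```

In the toy input, frames one period apart have similarity 1 while adjacent frames have similarity 0, which is the striped pattern a self-similarity matrix shows for periodic motion.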

FCA-RAC: First Cycle Annotated Repetitive Action Counting

Jun 18, 2024

Jiada Lu, WeiWei Zhou, Xiang Qian, Dongze Lian, Yanyu Xu, Weifeng Wang, Lina Cao, Shenghua Gao

Abstract:Repetitive action counting quantifies the frequency of specific actions performed by individuals. However, existing action-counting datasets have limited action diversity, potentially hampering model performance on unseen actions. To address this issue, we propose a framework called First Cycle Annotated Repetitive Action Counting (FCA-RAC). This framework contains four parts: 1) a labeling technique that annotates each training video with the start and end of the first action cycle, along with the total action count, which enables the model to capture the correlation between the initial action cycle and subsequent actions; 2) an adaptive sampling strategy that maximizes action information retention by adjusting to the speed of the first annotated action cycle in videos; 3) a Multi-Temporal Granularity Convolution (MTGC) module that leverages the multi-scale first action as a kernel to convolve across the entire video, which enables the model to capture action variations at different time scales within the video; 4) a strategy called Training Knowledge Augmentation (TKA) that exploits the annotated first action cycle information from the entire dataset, which allows the network to harness shared characteristics across actions effectively, thereby enhancing model performance and generalizability to unseen actions. Experimental results demonstrate that our approach achieves superior outcomes on RepCount-A and related datasets, highlighting the efficacy of our framework in improving model performance on seen and unseen actions. Our paper makes significant contributions to the field of action counting by addressing the limitations of existing datasets and proposing novel techniques for improving model generalizability.

Skim then Focus: Integrating Contextual and Fine-grained Views for Repetitive Action Counting

Jun 13, 2024

Zhengqi Zhao, Xiaohu Huang, Hao Zhou, Kun Yao, Errui Ding, Jingdong Wang, Xinggang Wang, Wenyu Liu, Bin Feng

Abstract:The key to action counting is accurately locating each video's repetitive actions. Instead of estimating the probability of each frame belonging to an action directly, we propose a dual-branch network, i.e., SkimFocusNet, working in a two-step manner. The model draws inspiration from empirical observations indicating that humans typically first skim an entire sequence coarsely to grasp the general action pattern, and then focus frame by frame to determine whether it aligns with the target action. Specifically, SkimFocusNet incorporates a skim branch and a focus branch. The skim branch scans the global contextual information throughout the sequence to identify the potential target action for guidance. Subsequently, the focus branch uses this guidance to identify repetitive actions with a long-short adaptive guidance (LSAG) block. Additionally, we have observed that videos in existing datasets often feature only one type of repetitive action, which inadequately represents real-world scenarios. To describe real-life situations more accurately, we establish the Multi-RepCount dataset, which includes videos containing multiple repetitive motions. On Multi-RepCount, our SkimFocusNet can perform specified action counting, that is, counting a particular action type by referencing an exemplar video. This capability demonstrates the robustness of our method. Extensive experiments demonstrate that SkimFocusNet achieves state-of-the-art performance with significant improvements. We also conduct a thorough ablation study to evaluate the network components. The source code will be published upon acceptance.

* 13 pages, 9 figures

Every Shot Counts: Using Exemplars for Repetition Counting in Videos

Mar 26, 2024

Saptarshi Sinha, Alexandros Stergiou, Dima Damen

Abstract:Video repetition counting infers the number of repetitions of recurring actions or motion within a video. We propose an exemplar-based approach that discovers visual correspondence of video exemplars across repetitions within target videos. Our proposed Every Shot Counts (ESCounts) model is an attention-based encoder-decoder that encodes videos of varying lengths alongside exemplars from the same and different videos. In training, ESCounts regresses locations of high correspondence to the exemplars within the video. In tandem, our method learns a latent that encodes representations of general repetitive motions, which we use for exemplar-free, zero-shot inference. Extensive experiments over commonly used datasets (RepCount, Countix, and UCFRep) showcase ESCounts obtaining state-of-the-art performance across all three datasets. On RepCount, ESCounts increases the off-by-one from 0.39 to 0.56 and decreases the mean absolute error from 0.38 to 0.21. Detailed ablations further demonstrate the effectiveness of our method.

* Project website: https://sinhasaptarshi.github.io/escounts
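
The off-by-one and mean absolute error figures quoted in the abstract above are easier to read with the metrics written out. The sketch below follows the definitions commonly used in repetition-counting work (absolute error normalised by the ground-truth count, and the fraction of videos whose prediction is within one repetition); the abstract does not state the formulas, so treat both definitions, the function names, and the zero-count guard as assumptions.

```python
def mean_absolute_error(pred_counts, true_counts):
    """Average of |pred - true| / true over all videos.

    The max(true, 1) guard against division by zero is an assumption
    for this sketch; published evaluation code may use a different guard.
    """
    errors = [abs(p - g) / max(g, 1) for p, g in zip(pred_counts, true_counts)]
    return sum(errors) / len(errors)


def off_by_one_accuracy(pred_counts, true_counts):
    """Fraction of videos whose predicted count is within 1 of the truth."""
    hits = [abs(p - g) <= 1 for p, g in zip(pred_counts, true_counts)]
    return sum(hits) / len(hits)


# Toy example with three videos (made-up counts, not results from any paper).
pred = [10, 5, 8]
true = [10, 4, 12]
mae = mean_absolute_error(pred, true)
obo = off_by_one_accuracy(pred, true)
```

Under these definitions a lower MAE and a higher OBO are better, which matches the direction of the improvements the abstract reports.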

IVAC-P2L: Leveraging Irregular Repetition Priors for Improving Video Action Counting

Mar 20, 2024

Hang Wang, Zhi-Qi Cheng, Youtian Du, Lei Zhang

Abstract:Video Action Counting (VAC) is crucial in analyzing sports, fitness, and everyday activities by quantifying repetitive actions in videos. However, traditional VAC methods have overlooked the complexity of action repetitions, such as interruptions and the variability in cycle duration. Our research addresses the shortfall by introducing a novel approach to VAC, called Irregular Video Action Counting (IVAC). IVAC prioritizes modeling irregular repetition patterns in videos, which we define through two primary aspects: Inter-cycle Consistency and Cycle-interval Inconsistency. Inter-cycle Consistency ensures homogeneity in the spatial-temporal representations of cycle segments, signifying action uniformity within cycles. Cycle-interval inconsistency highlights the importance of distinguishing between cycle segments and intervals based on their inherent content differences. To encapsulate these principles, we propose a new methodology that includes consistency and inconsistency modules, supported by a unique pull-push loss (P2L) mechanism. The IVAC-P2L model applies a pull loss to promote coherence among cycle segment features and a push loss to clearly distinguish features of cycle segments from interval segments. Empirical evaluations conducted on the RepCount dataset demonstrate that the IVAC-P2L model sets a new benchmark in VAC task performance. Furthermore, the model demonstrates exceptional adaptability and generalization across various video contents, outperforming existing models on two additional datasets, UCFRep and Countix, without the need for dataset-specific optimization. These results confirm the efficacy of our approach in addressing irregular repetitions in videos and pave the way for further advancements in video analysis and understanding.

* Source code: https://github.com/hwang-cs-ime/IVAC-P2L
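
The pull and push terms described in the abstract above can be illustrated with a generic sketch. This is not the IVAC-P2L loss, whose exact form the abstract does not give: the centroid-based pull term, the hinge-style push term, the Euclidean distance, and the `margin` parameter are all assumptions chosen to show the idea of pulling cycle-segment features together while pushing interval features away from them.

```python
import math


def _distance(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def _centroid(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def pull_push_loss(cycle_feats, interval_feats, margin=1.0):
    """Illustrative pull-push objective (hypothetical formulation).

    pull: mean distance of each cycle feature to the cycle centroid.
    push: mean hinge penalty for interval features that lie closer
          than `margin` to the cycle centroid.
    """
    center = _centroid(cycle_feats)
    pull = sum(_distance(f, center) for f in cycle_feats) / len(cycle_feats)
    if interval_feats:
        push = sum(max(0.0, margin - _distance(f, center))
                   for f in interval_feats) / len(interval_feats)
    else:
        push = 0.0
    return pull + push


# Well-separated features give zero loss; overlapping ones are penalised.
separated = pull_push_loss([[1.0, 0.0], [1.0, 0.0]], [[4.0, 0.0]])
overlapping = pull_push_loss([[0.0, 0.0], [2.0, 0.0]], [[1.0, 0.0]])
```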

Advancements in Repetitive Action Counting: Joint-Based PoseRAC Model With Improved Performance

Aug 15, 2023

Haodong Chen, Ming C. Leu, Md Moniruzzaman, Zhaozheng Yin, Solmaz Hajmohammadi, Zhuoqing Chang

Abstract:Repetitive counting (RepCount) is critical in various applications, such as fitness tracking and rehabilitation. Previous methods have relied on the estimation of red-green-and-blue (RGB) frames and body pose landmarks to identify the number of action repetitions, but these methods suffer from a number of issues, including the inability to stably handle changes in camera viewpoints, over-counting, under-counting, difficulty in distinguishing between sub-actions, inaccuracy in recognizing salient poses, etc. In this paper, based on the work done by [1], we integrate joint angles with body pose landmarks to address these challenges and achieve better results than the state-of-the-art RepCount methods, with a Mean Absolute Error (MAE) of 0.211 and an Off-By-One (OBO) counting accuracy of 0.599 on the RepCount data set [2]. Comprehensive experimental results demonstrate the effectiveness and robustness of our method.

* 6 pages, 9 figures
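
The abstract above combines joint angles with body pose landmarks. As a minimal sketch of how a joint angle can be derived from three 2D landmarks, the function below computes the angle at a middle joint; the function name, the 2D coordinates, and the example landmark values are assumptions for illustration, and the paper's own feature construction may differ.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at vertex b formed by landmarks a-b-c (2D points).

    For example, a = shoulder, b = elbow, c = wrist gives the elbow angle.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2 on (|cross|, dot) yields the unsigned angle in [0, 180].
    return math.degrees(math.atan2(abs(cross), dot))


# Hypothetical landmark coordinates in image space.
right_angle = joint_angle((1.0, 0.0), (0.0, 0.0), (0.0, 1.0))
straight_arm = joint_angle((-1.0, 0.0), (0.0, 0.0), (1.0, 0.0))
# The same right angle after shifting and scaling the landmarks.
shifted = joint_angle((7.0, 3.0), (5.0, 3.0), (5.0, 9.0))
```

Because the angle depends only on the relative directions of the two limb segments, it is unchanged by in-plane translation, rotation, and scaling of the landmarks, whereas raw coordinates are not.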
