We introduce a pipeline to address anatomical inaccuracies in hand images generated by Stable Diffusion. The first step constructs a specialized dataset focused on hand anomalies, which is used to train our models. A fine-tuned detection model then identifies these anomalies precisely, so that correction can be targeted. Body pose estimation provides the orientation and position of each hand, which is crucial for accurate correction. ControlNet and InstructPix2Pix are integrated to perform inpainting and pixel-level transformation, respectively, enabling high-fidelity image adjustments. Together, these components produce images with anatomically accurate hands that closely resemble real-world appearance. Our experimental results demonstrate the pipeline's efficacy in enhancing the realism of hands in Stable Diffusion outputs. We provide an online demo at https://fixhand.yiqun.io
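The abstract does not specify how detections are passed to the inpainting stage. As a minimal sketch, assume the detector returns axis-aligned hand boxes in pixel coordinates and the ControlNet inpainting stage consumes a binary mask. Under those assumptions, the boxes could be converted as shown below. `boxes_to_mask` and its `margin` padding are hypothetical illustrations and do not come from the pipeline itself.

```python
def boxes_to_mask(boxes, height, width, margin=0.1):
    """Hypothetical helper: build a binary inpainting mask from detected hand boxes.

    boxes:  iterable of (x0, y0, x1, y1) pixel coordinates, with x1/y1 exclusive
            (an assumed detector output format).
    margin: fraction of each box's width/height added as padding on every side,
            so the repainted region also covers the hand's immediate surroundings.
    Returns a list of rows where 1 marks pixels to repaint and 0 pixels to keep.
    """
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        pad_x = int(round((x1 - x0) * margin))
        pad_y = int(round((y1 - y0) * margin))
        # Clamp the padded box to the image bounds.
        xa, xb = max(0, x0 - pad_x), min(width, x1 + pad_x)
        ya, yb = max(0, y0 - pad_y), min(height, y1 + pad_y)
        for y in range(ya, yb):
            for x in range(xa, xb):
                mask[y][x] = 1
    return mask
```

Padding the box gives the inpainting model some context around the malformed hand. Clamping keeps boxes near the border valid.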
Skeleton sequences are compact and lightweight, and numerous skeleton-based action recognizers have been proposed to classify human behaviors. In this work, we aim to incorporate components that are compatible with existing models and further improve their accuracy. To this end, we design two temporal accessories: discrete cosine encoding (DCE) and chronological loss (CRL). DCE enables models to analyze motion patterns in the frequency domain while alleviating the influence of signal noise. CRL guides networks to explicitly capture the chronological order of a sequence. These two components consistently improve the accuracy of many recently proposed action recognizers, achieving new state-of-the-art (SOTA) accuracy on two large benchmark datasets (NTU60 and NTU120).
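The exact form of DCE is not given here. As a minimal sketch, assume DCE applies an orthonormal DCT-II along the time axis of each joint coordinate and keeps only the first `keep` low-frequency coefficients. Truncating coefficients is a common way to suppress high-frequency noise. The function names below are hypothetical.

```python
import math


def _scale(k, n):
    # Orthonormal DCT-II normalization factors.
    return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)


def dct_ii(x):
    """Orthonormal DCT-II of a 1-D sequence (e.g., one joint coordinate over time)."""
    n = len(x)
    return [
        _scale(k, n) * sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]


def idct_ii(c):
    """Inverse of the orthonormal DCT-II (the orthonormal DCT-III)."""
    n = len(c)
    return [
        sum(_scale(k, n) * c[k] * math.cos(math.pi * (t + 0.5) * k / n) for k in range(n))
        for t in range(n)
    ]


def encode_trajectory(coords, keep):
    """Hypothetical DCE-style encoding: the `keep` lowest-frequency DCT coefficients."""
    return dct_ii(coords)[:keep]


def lowpass(coords, keep):
    """Reconstruct the trajectory from only its `keep` lowest-frequency coefficients."""
    c = dct_ii(coords)
    return idct_ii(c[:keep] + [0.0] * (len(c) - keep))
```

The transform is orthonormal, so keeping all coefficients reconstructs the trajectory exactly. Dropping the high-frequency ones acts as a low-pass filter on jittery joint estimates.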
Skeleton-based action recognition, a subarea of action recognition, is swiftly gaining attention and popularity. The task is to recognize actions from the movements of human articulation points (joints). Compared with other data modalities, 3D human skeleton representations have many desirable characteristics, including succinctness, robustness, and racial impartiality. We aim to provide new and existing researchers with a roadmap of the landscape of skeleton-based action recognition. To this end, we present a review in the form of a taxonomy of existing works on skeleton-based action recognition. We partition these works into four major categories: (1) datasets; (2) extracting spatial features; (3) capturing temporal patterns; (4) improving signal quality. For each method, we provide a concise yet sufficiently informative description. To promote fairer and more comprehensive evaluation of existing approaches to skeleton-based action recognition, we collect ANUBIS, a large-scale human skeleton dataset. Compared with previously collected datasets, ANUBIS has four advantages: (1) it employs more recently released sensors; (2) it contains a novel back view; (3) it encourages high enthusiasm among subjects; (4) it includes actions of the COVID pandemic era. Using ANUBIS, we comparatively benchmark the performance of current skeleton-based action recognizers. At the end of this paper, we outline the future development of skeleton-based action recognition by listing several new technical problems. We believe these problems are valuable to solve in order to commercialize skeleton-based action recognition in the near future. The ANUBIS dataset is available at: http://hcc-workshop.anu.edu.au/webs/anu101/home.
Skeleton-based action recognition attracts practitioners and researchers because skeleton datasets are lightweight and compact. Compared with RGB-video-based action recognition, skeleton-based action recognition better protects the privacy of subjects while achieving competitive recognition performance. However, with improvements in skeleton estimation algorithms and in motion and depth sensors, skeleton datasets preserve more details of motion characteristics, which can lead to privacy leakage. To investigate this potential leakage, we first train classifiers to infer sensitive private information from joint trajectories. Experiments show that such classifiers predict gender with 88% accuracy and re-identify a person with 82% accuracy. We propose two variants of anonymization algorithms to mitigate this leakage. Experimental results show that the anonymized dataset reduces the risk of privacy leakage while having only marginal effects on action recognition performance.
Discrete gene regulatory networks (GRNs) play a vital role in the study of robustness and modularity. A common way to evaluate the robustness of GRNs is to measure their ability to regulate a set of perturbed gene activation patterns back to their unperturbed forms. Usually, perturbations are obtained by randomly sampling from a predefined distribution of gene activation patterns. This sampling introduces stochasticity into the fitness function, which in turn makes the fitness landscape dynamic. This dynamicity is imposed on top of an already complex fitness landscape. Where sampling is used, it is therefore important to understand which effects arise from the structure of the fitness landscape and which arise from the dynamicity imposed on it. Stochasticity of the fitness function also hinders reproducibility and post-experimental analyses. We develop a deterministic distributional fitness evaluation that considers the complete distribution of gene activity patterns, so as to avoid stochasticity in fitness assessment. This evaluation facilitates repeatability. Its determinism lets us establish theoretical bounds on the fitness, and thus identify whether the algorithm has reached a global optimum. It also lets us separate the effects of the problem domain from those of noisy fitness evaluation, and thus resolve two remaining anomalies in the behaviour of the problem domain of~\citet{espinosa2010specialization}. We also reveal some properties of solution GRNs that make them robust and modular, leading to a deeper understanding of the nature of the problem domain. We conclude by discussing potential directions toward simulating and understanding the emergence of modularity in larger, more complex domains, which is key both to generating more useful modular solutions and to understanding the ubiquity of modularity in biological systems.
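The following minimal sketch illustrates how a distributional fitness differs from a sampled one. The setup is hypothetical and does not reproduce the paper's exact configuration:

- The GRN follows a Wagner-style threshold model. States are ±1 and updates are synchronous. Each gene takes the sign of its weighted input, and a gene with zero input keeps its current state.
- Perturbations flip each gene of the target pattern independently with probability `p`.

Instead of averaging over random samples, the fitness is the exact probability that a perturbed pattern returns to the target. All names below are illustrative.

```python
from itertools import product


def step(W, s):
    """One synchronous update of a hypothetical threshold GRN with +1/-1 states."""
    nxt = []
    for i, row in enumerate(W):
        h = sum(w * x for w, x in zip(row, s))
        # Sign activation; a gene whose weighted input is exactly 0 keeps its state.
        nxt.append(1 if h > 0 else -1 if h < 0 else s[i])
    return tuple(nxt)


def converges_to(W, s, target, max_steps=20):
    """True if the dynamics started at s reach the fixed point `target` within max_steps."""
    for _ in range(max_steps):
        s_next = step(W, s)
        if s_next == s:  # reached a fixed point
            return s == target
        s = s_next
    return False  # cycle or slow transient: counted as a failure


def pattern_probability(s, target, p):
    """Probability of observing s when each gene of target flips independently with prob p."""
    flips = sum(a != b for a, b in zip(s, target))
    return p ** flips * (1 - p) ** (len(s) - flips)


def distributional_fitness(W, target, p, max_steps=20):
    """Deterministic robustness: exact expectation over all 2^n perturbed patterns."""
    target = tuple(target)
    return sum(
        pattern_probability(s, target, p)
        for s in product((-1, 1), repeat=len(target))
        if converges_to(W, s, target, max_steps)
    )
```

Enumerating every pattern makes the result identical on every run and bounded above by 1. However, the cost grows as 2^n, so this exhaustive form only suits small networks.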
We study how to evaluate the quantitative information content of a region within an image for a particular label. To this end, we bridge class activation maps with information theory and develop an informative class activation map (infoCAM). Given a classification task, infoCAM depicts how the information that partial regions carry about a label accumulates into the information of the entire image. We can therefore utilise infoCAM to locate the most informative features for a label. When applied to an image classification task, infoCAM outperforms the traditional class activation map on weakly supervised object localisation. We achieve state-of-the-art results on Tiny-ImageNet.
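For reference, the first function below computes the standard class activation map (CAM). It weights the last convolutional feature maps by the classifier weights of one class. The second function is a hypothetical information-style reading and is not necessarily the paper's infoCAM. It converts per-location class scores into pointwise mutual information between the label and each location, assuming a uniform label prior p(y) = 1/C.

```python
import math


def cam(features, weights, cls):
    """Standard CAM: M_c(i, j) = sum_k weights[c][k] * features[k][i][j].

    features: K feature maps, each a height x width list of lists.
    weights:  C x K classifier weights (after global average pooling).
    """
    K, height, width = len(features), len(features[0]), len(features[0][0])
    return [
        [sum(weights[cls][k] * features[k][i][j] for k in range(K)) for j in range(width)]
        for i in range(height)
    ]


def pmi_map(features, weights, cls):
    """Hypothetical per-location PMI between label `cls` and each location.

    With a uniform prior p(y) = 1/C: PMI = log softmax(scores)[cls] + log C.
    Positive values mean the location raises the label's probability above chance.
    """
    C = len(weights)
    maps = [cam(features, weights, c) for c in range(C)]
    height, width = len(maps[0]), len(maps[0][0])
    out = []
    for i in range(height):
        row = []
        for j in range(width):
            scores = [maps[c][i][j] for c in range(C)]
            m = max(scores)  # stabilize log-sum-exp
            lse = m + math.log(sum(math.exp(s - m) for s in scores))
            row.append(scores[cls] - lse + math.log(C))
        out.append(row)
    return out
```

Unlike raw CAM values, the PMI values are comparable across locations. Zero means "no information about the label", and the maximum possible value is log C.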
Cross-entropy loss with softmax output is a standard choice for training neural network classifiers. We offer a new view of neural network classifiers trained with softmax and cross-entropy as mutual information evaluators. We show that, when the dataset is balanced, training a neural network with cross-entropy maximises the mutual information between inputs and labels through a variational form of mutual information. Building on this view, we develop a new form of softmax that also converts a classifier into a mutual information evaluator when the dataset is imbalanced. Experimental results show that the new form leads to better classification accuracy, in particular on imbalanced datasets.
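Here is a minimal sketch of the variational view. It assumes the imbalance-aware softmax weights each class's exponent by its empirical label prior p(y), so that q(y|x) = p(y) exp(f_y(x)) / Σ_{y'} p(y') exp(f_{y'}(x)). This parameterization and all names are assumptions for illustration. Averaging log q(y|x) - log p(y) over labelled samples gives a variational mutual information estimate. With uniform priors over K classes, the estimate reduces to log K minus the standard cross-entropy. This is why minimising cross-entropy on balanced data maximises the estimate.

```python
import math


def prior_weighted_log_softmax(logits, priors):
    """log q(y|x) for every class, where q(y|x) is proportional to p(y) * exp(f_y)."""
    terms = [math.log(p) + f for p, f in zip(priors, logits)]
    m = max(terms)  # subtract the max for numerical stability
    lse = m + math.log(sum(math.exp(t - m) for t in terms))
    return [t - lse for t in terms]


def mi_estimate(batch_logits, labels, priors):
    """Mean of log q(y|x) - log p(y) over the batch: a variational estimate of I(X; Y)."""
    total = 0.0
    for logits, y in zip(batch_logits, labels):
        total += prior_weighted_log_softmax(logits, priors)[y] - math.log(priors[y])
    return total / len(labels)


def cross_entropy(batch_logits, labels):
    """Standard softmax cross-entropy, averaged over the batch.

    Uniform priors cancel inside the prior-weighted softmax, so it equals the plain softmax.
    """
    k = len(batch_logits[0])
    uniform = [1.0 / k] * k
    return -sum(
        prior_weighted_log_softmax(l, uniform)[y] for l, y in zip(batch_logits, labels)
    ) / len(labels)
```

Training with the negative of `prior_weighted_log_softmax(...)[y]` as the loss would then maximise this estimate under the assumed prior-weighted form, including when the label distribution is skewed.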
Most existing graph neural networks (GNNs) learn node embeddings using the framework of message passing and aggregation. Such GNNs are incapable of learning the relative positions of nodes within a graph. To make GNNs aware of node positions, position-aware GNNs (P-GNNs) set some nodes as anchors; using the distances from a node to the anchors, the GNN can then infer relative positions between nodes. However, P-GNNs select anchors arbitrarily, which compromises both position-awareness and feature extraction. To eliminate this compromise, we demonstrate that selecting evenly distributed and asymmetric anchors is essential. Yet we also show that choosing anchors that can aggregate the embeddings of all nodes within a graph is NP-hard, so devising efficient optimal algorithms in a deterministic manner is practically infeasible. To ensure position-awareness while bypassing this NP-hardness, we propose Position-Sensing Graph Neural Networks (PSGNNs), which learn how to choose anchors in a back-propagatable fashion. Experiments verify the effectiveness of PSGNNs against state-of-the-art GNNs: they substantially improve performance on various synthetic and real-world graph datasets while scaling stably. Specifically, PSGNNs on average boost AUC by more than 14% for pairwise node classification and 18% for link prediction over the existing state-of-the-art position-aware methods. Our source code is publicly available at: https://github.com/ZhenyueQin/PSGNN
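The anchor mechanism can be sketched as follows: each node is described by its shortest-path (BFS) distances to a set of anchor nodes. The example also illustrates why symmetric anchors are problematic. This is a minimal sketch of anchor-distance features only, not PSGNN's learned anchor selection. The function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node (unweighted graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def anchor_position_features(adj, anchors, unreachable=-1):
    """Map each node to its tuple of distances to the anchors (its position signature)."""
    per_anchor = [bfs_distances(adj, a) for a in anchors]
    return {node: tuple(d.get(node, unreachable) for d in per_anchor) for node in adj}


# A 6-node cycle: 0-1-2-3-4-5-0.
cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
symmetric = anchor_position_features(cycle, [0, 3])   # diametrically opposite anchors
asymmetric = anchor_position_features(cycle, [0, 1])  # adjacent anchors
```

In this cycle, opposite anchors give mirror-image nodes such as 1 and 5 identical signatures. Two adjacent anchors give every node a distinct signature.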