Modeling the interaction between humans and objects has been an emerging research direction in recent years. Capturing human-object interaction is however a very challenging task due to heavy occlusion and complex dynamics, which requires understanding not only 3D human pose, and object pose but also the interaction between them. Reconstruction of 3D humans and objects has been two separate research fields in computer vision for a long time. We hence proposed the first RHOBIN challenge: reconstruction of human-object interactions in conjunction with the RHOBIN workshop. It was aimed at bringing the research communities of human and object reconstruction as well as interaction modeling together to discuss techniques and exchange ideas. Our challenge consists of three tracks of 3D reconstruction from monocular RGB images with a focus on dealing with challenging interaction scenarios. Our challenge attracted more than 100 participants with more than 300 submissions, indicating the broad interest in the research communities. This paper describes the settings of our challenge and discusses the winning methods of each track in more detail. We observe that the human reconstruction task is becoming mature even under heavy occlusion settings while object pose estimation and joint reconstruction remain challenging tasks. With the growing interest in interaction modeling, we hope this report can provide useful insights and foster future research in this direction. Our workshop website can be found at \href{https://rhobin-challenge.github.io/}{https://rhobin-challenge.github.io/}.
Consistency Models (CMs) have showed a promise in creating visual content efficiently and with high quality. However, the way to add new conditional controls to the pretrained CMs has not been explored. In this technical report, we consider alternative strategies for adding ControlNet-like conditional control to CMs and present three significant findings. 1) ControlNet trained for diffusion models (DMs) can be directly applied to CMs for high-level semantic controls but struggles with low-level detail and realism control. 2) CMs serve as an independent class of generative models, based on which ControlNet can be trained from scratch using Consistency Training proposed by Song et al. 3) A lightweight adapter can be jointly optimized under multiple conditions through Consistency Training, allowing for the swift transfer of DMs-based ControlNet to CMs. We study these three solutions across various conditional controls, including edge, depth, human pose, low-resolution image and masked image with text-to-image latent consistency models.
Human can extrapolate well, generalize daily knowledge into unseen scenarios, raise and answer counterfactual questions. To imitate this ability via generative models, previous works have extensively studied explicitly encoding Structural Causal Models (SCMs) into architectures of generator networks. This methodology, however, limits the flexibility of the generator as they must be carefully crafted to follow the causal graph, and demands a ground truth SCM with strong ignorability assumption as prior, which is a nontrivial assumption in many real scenarios. Thus, many current causal GAN methods fail to generate high fidelity counterfactual results as they cannot easily leverage state-of-the-art generative models. In this paper, we propose to study counterfactual synthesis from a new perspective of knowledge extrapolation, where a given knowledge dimension of the data distribution is extrapolated, but the remaining knowledge is kept indistinguishable from the original distribution. We show that an adversarial game with a closed-form discriminator can be used to address the knowledge extrapolation problem, and a novel principal knowledge descent method can efficiently estimate the extrapolated distribution through the adversarial game. Our method enjoys both elegant theoretical guarantees and superior performance in many scenarios.
Machine-learning-based age estimation has received lots of attention. Traditional age estimation mechanism focuses estimation age error, but ignores that there is a deviation between the estimated age and real age due to disease. Pathological age estimation mechanism the author proposed before introduces age deviation to solve the above problem and improves classification capability of the estimated age significantly. However,it does not consider the age estimation error of the normal control (NC) group and results in a larger error between the estimated age and real age of NC group. Therefore, an integrated age estimation mechanism based on Decision-Level fusion of error and deviation orientation model is proposed to solve the problem.Firstly, the traditional age estimation and pathological age estimation mechanisms are weighted together.Secondly, their optimal weights are obtained by minimizing mean absolute error (MAE) between the estimated age and real age of normal people. In the experimental section, several representative age-related datasets are used for verification of the proposed method. The results show that the proposed age estimation mechanism achieves a good tradeoff effect of age estimation. It not only improves the classification ability of the estimated age, but also reduces the age estimation error of the NC group. In general, the proposed age estimation mechanism is effective. Additionally, the mechanism is a framework mechanism that can be used to construct different specific age estimation algorithms, contributing to relevant research.