Abstract:In multi-view 3D human pose estimation, models typically rely on images captured simultaneously from different camera views to predict a pose at a specific moment. While providing accurate spatial information, this traditional approach often overlooks the rich temporal dependencies between adjacent frames. We propose a novel 3D human pose estimation input method: the sparse interleaved input to address this. This method leverages images captured from different camera views at various time points (e.g., View 1 at time $t$ and View 2 at time $t+δ$), allowing our model to capture rich spatio-temporal information and effectively boost performance. More importantly, this approach offers two key advantages: First, it can theoretically increase the output pose frame rate by N times with N cameras, thereby breaking through single-view frame rate limitations and enhancing the temporal resolution of the production. Second, using a sparse subset of available frames, our method can reduce data redundancy and simultaneously achieve better performance. We introduce the DenseWarper model, which leverages epipolar geometry for efficient spatio-temporal heatmap exchange. We conducted extensive experiments on the Human3.6M and MPI-INF-3DHP datasets. Results demonstrate that our method, utilizing only sparse interleaved images as input, outperforms traditional dense multi-view input approaches and achieves state-of-the-art performance. The source code for this work is available at: https://github.com/lingli1724/DenseWarper-ICLR2026




Abstract:The AIDS epidemic has killed 40 million people and caused serious global problems. The identification of new HIV-inhibiting molecules is of great importance for combating the AIDS epidemic. Here, the Classifier Guidance Diffusion model and ligand-based virtual screening strategy are combined to discover potential HIV-inhibiting molecules for the first time. We call it Diff4VS. An extra classifier is trained using the HIV molecule dataset, and the gradient of the classifier is used to guide the Diffusion to generate HIV-inhibiting molecules. Experiments show that Diff4VS can generate more candidate HIV-inhibiting molecules than other methods. Inspired by ligand-based virtual screening, a new metric DrugIndex is proposed. The DrugIndex is the ratio of the proportion of candidate drug molecules in the generated molecule to the proportion of candidate drug molecules in the training set. DrugIndex provides a new evaluation method for evolving molecular generative models from a pharmaceutical perspective. Besides, we report a new phenomenon observed when using molecule generation models for virtual screening. Compared to real molecules, the generated molecules have a lower proportion that is highly similar to known drug molecules. We call it Degradation in molecule generation. Based on the data analysis, the Degradation may result from the difficulty of generating molecules with a specific structure in the generative model. Our research contributes to the application of generative models in drug design from method, metric, and phenomenon analysis.