HybrIK combines analytical inverse kinematics with deep learning to produce more accurate 3D pose estimates from 2D monocular images. HybrIK has three major components: (1) a pretrained convolutional backbone, (2) a deconvolution module that lifts the 3D pose from 2D convolutional features, and (3) an analytical inverse kinematics pass that corrects the deep learning prediction using a learned distribution of plausible twist and swing angles. In this paper we propose an enhancement of the 2D-to-3D lifting module, replacing deconvolution with a Transformer, which improves both accuracy and computational efficiency relative to the original HybrIK method. We demonstrate our results on the commonly used H36M, PW3D, COCO and HP3D datasets. Our code is publicly available at https://github.com/boreshkinai/hybrik-transformer.
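To make the twist and swing terminology concrete, the following is a minimal sketch of the generic quaternion swing-twist decomposition. It is not taken from the HybrIK code base: the function names `quat_mul` and `swing_twist`, the `(w, x, y, z)` quaternion layout, and the fallback for the degenerate case are all assumptions made for illustration.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of two quaternions in (w, x, y, z) order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])

def swing_twist(q, axis, eps=1e-9):
    """Split unit quaternion q into (swing, twist) with q = swing * twist.

    twist rotates about `axis` (e.g. the bone direction); swing rotates
    about an axis perpendicular to it.
    """
    axis = axis / np.linalg.norm(axis)
    # Keep only the part of the rotation axis that lies along `axis`.
    proj = np.dot(q[1:], axis) * axis
    twist = np.array([q[0], proj[0], proj[1], proj[2]])
    norm = np.linalg.norm(twist)
    if norm < eps:
        # Degenerate case (assumed handling): no recoverable twist.
        twist = np.array([1.0, 0.0, 0.0, 0.0])
    else:
        twist = twist / norm
    twist_conj = twist * np.array([1.0, -1.0, -1.0, -1.0])
    swing = quat_mul(q, twist_conj)
    return swing, twist
```

In this sketch the twist is the rotation about the chosen axis and the swing is whatever remains, so composing `swing * twist` recovers the original rotation.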
Inverse Kinematics (IK) systems are often rigid with respect to their input character, and therefore require user intervention to be adapted to new skeletons. In this paper we aim to create a flexible, learned IK solver applicable to a wide variety of human morphologies. We extend a state-of-the-art machine learning IK solver to operate on the well-known Skinned Multi-Person Linear model (SMPL). We call our model SMPL-IK, and show that when integrated into real-time 3D software, this extended system opens up opportunities for defining novel AI-assisted animation workflows. For example, pose authoring can be made more flexible with SMPL-IK by allowing users to modify gender and body shape while posing a character. Additionally, when chained with existing pose estimation algorithms, SMPL-IK accelerates posing by allowing users to bootstrap 3D scenes from 2D images while still permitting further editing. Finally, we propose a novel SMPL Shape Inversion mechanism (SMPL-SI) to map arbitrary humanoid characters to the SMPL space, allowing artists to leverage SMPL-IK on custom characters. In addition to qualitative demos of the proposed tools, we present quantitative SMPL-IK baselines on the H36M and AMASS datasets.
Recent progress in neural forecasting has accelerated improvements in the performance of large-scale forecasting systems. Yet, long-horizon forecasting remains a very difficult task. Two common challenges afflicting long-horizon forecasting are the volatility of the predictions and their computational complexity. In this paper, we introduce N-HiTS, a model which addresses both challenges by incorporating novel hierarchical interpolation and multi-rate data sampling techniques. These techniques enable the proposed method to assemble its predictions sequentially, selectively emphasizing components with different frequencies and scales, while decomposing the input signal and synthesizing the forecast. We conduct an extensive empirical evaluation demonstrating the advantages of N-HiTS over state-of-the-art long-horizon forecasting methods. On an array of multivariate forecasting tasks, the proposed method provides an average accuracy improvement of 25% over the latest Transformer architectures while reducing the computation time by an order of magnitude. Our code is available at https://github.com/cchallu/n-hits.
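The two ingredients named above can be illustrated with a minimal NumPy sketch. It is an assumption-laden simplification rather than the released N-HiTS code: the helper names (`max_pool_1d`, `interpolate_forecast`, `assemble_forecast`) are hypothetical, linear interpolation is just one possible choice, and the learned blocks that would produce the coefficients are replaced by hand-written arrays.

```python
import numpy as np

def max_pool_1d(x, kernel):
    """Multi-rate sampling: subsample the input with non-overlapping max pooling."""
    n = len(x) // kernel * kernel  # drop the ragged tail, if any
    return x[:n].reshape(-1, kernel).max(axis=1)

def interpolate_forecast(coeffs, horizon):
    """Hierarchical interpolation: stretch a few coefficients over the full horizon."""
    src = np.linspace(0.0, 1.0, num=len(coeffs))
    dst = np.linspace(0.0, 1.0, num=horizon)
    return np.interp(dst, src, coeffs)

def assemble_forecast(coeffs_per_stack, horizon):
    """Sum the interpolated outputs of stacks working at different time scales."""
    return sum(interpolate_forecast(np.asarray(c, dtype=float), horizon)
               for c in coeffs_per_stack)

# Hypothetical coefficients: a coarse trend stack and a finer-scale stack.
forecast = assemble_forecast([[0.0, 1.0], [0.0, 0.5, 0.0, -0.5, 0.0]], horizon=5)
```

A stack that emits only a few coefficients can only express slowly varying components, which is what lets each stack specialize in a different frequency band while keeping the output layer small.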
We show that the task of synthesizing missing middle frames, commonly known as motion in-betweening in the animation industry, can be solved more accurately and effectively if a deep learning interpolator operates in the delta mode, using the spherical linear interpolator as a baseline. We demonstrate our empirical findings on the publicly available LaFAN1 dataset. We further generalize this result by showing that the $\Delta$-regime is viable with respect to the reference of the last known frame (also known as the zero-velocity model). This supports the more general conclusion that deep in-betweening in the reference frame local to input frames is more accurate and robust than in-betweening in the global (world) reference frame advocated in previous work. Our code is publicly available at https://github.com/boreshkinai/delta-interpolator.
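The delta mode described above can be sketched as follows, under stated assumptions. The SLERP routine is the standard formula; everything else is illustrative: `delta_fn` is a hypothetical stand-in for the learned network, quaternions use `(w, x, y, z)` order, and adding the correction in quaternion coordinates before renormalizing is an assumption, not necessarily how the released code applies it.

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between two quaternions."""
    q0 = q0 / np.linalg.norm(q0)
    q1 = q1 / np.linalg.norm(q1)
    dot = float(np.dot(q0, q1))
    if dot < 0.0:          # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:       # nearly parallel: fall back to normalized lerp
        q = q0 + t * (q1 - q0)
        return q / np.linalg.norm(q)
    theta = np.arccos(dot)
    return (np.sin((1.0 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def slerp_baseline(q_start, q_end, n_missing):
    """SLERP baseline for the n_missing frames between two known keyframes."""
    ts = np.arange(1, n_missing + 1) / (n_missing + 1)
    return np.stack([slerp(q_start, q_end, t) for t in ts])

def delta_inbetween(q_start, q_end, n_missing, delta_fn):
    """Baseline plus a predicted correction (the delta), renormalized."""
    baseline = slerp_baseline(q_start, q_end, n_missing)
    out = baseline + delta_fn(baseline)
    return out / np.linalg.norm(out, axis=-1, keepdims=True)

# With a zero correction the output reduces to the SLERP baseline.
identity = np.array([1.0, 0.0, 0.0, 0.0])
quarter_turn_z = np.array([np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4)])
frames = delta_inbetween(identity, quarter_turn_z, 1, lambda b: np.zeros_like(b))
```

Because the network only has to model the deviation from the baseline, its output stays close to zero whenever simple interpolation is already a good guess.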
We study the problem of efficiently scaling ensemble-based deep neural networks for time series (TS) forecasting on a large set of time series. Current state-of-the-art deep ensemble models have high memory and computational requirements, hampering their use to forecast millions of TS in practical scenarios. We propose N-BEATS(P), a global multivariate variant of the N-BEATS model designed to allow simultaneous training of multiple univariate TS forecasting models. Our model addresses the practical limitations of related models, reducing the training time by half and memory requirement by a factor of 5, while keeping the same level of accuracy. We have performed multiple experiments detailing the various ways to train our model and have obtained results that demonstrate its capacity to support zero-shot TS forecasting, i.e., to train a neural network on a source TS dataset and deploy it on a different target TS dataset without retraining, which provides an efficient and reliable solution to forecast at scale even in difficult forecasting conditions.
Our work focuses on the development of a learnable neural representation of human pose for advanced AI-assisted animation tooling. Specifically, we tackle the problem of constructing a full static human pose from sparse and variable user inputs (e.g. locations and/or orientations of a subset of body joints). To solve this problem, we propose a novel neural architecture that combines residual connections with prototype encoding of a partially specified pose to create a new complete pose from the learned latent space. We show that our architecture outperforms a Transformer-based baseline, both in terms of accuracy and computational efficiency. Additionally, we develop a user interface to integrate our neural model in Unity, a real-time 3D development platform. Furthermore, we introduce two new datasets representing the static human pose modeling problem, based on high-quality human motion capture data, which will be released publicly along with model code.
Adaptive algorithms are an important class of algorithms used in radar target detection to overcome prior uncertainty about the interference covariance. Contamination of the empirical covariance matrix by the useful signal significantly degrades the performance of this class of algorithms. Regularization, also known in the radar literature as sample covariance loading, can be used to combat both ill-conditioning of the original problem and contamination of the empirical covariance by the desired signal for adaptive algorithms based on sample covariance matrix inversion. However, the optimum value of the loading factor cannot be derived unless strong assumptions are made regarding the structure of the covariance matrix and the useful signal penetration model. Similarly, the least mean squares algorithm, with or without a linear constraint, is also sensitive to contamination of the learning sample by the target signal. We synthesize two approaches to improve the convergence of adaptive algorithms and protect them from contamination of the learning sample by the signal from the target. The proposed approach is based on maximization of the empirical signal-to-interference-plus-noise ratio (SINR). Its effectiveness is demonstrated using simulated data.
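As background for the terms used above, the following minimal sketch shows textbook sample covariance loading and the SINR of a weight vector. It does not reproduce the approach proposed in the abstract; the function names, the array layout (channels by snapshots), and the `loading` parameter are assumptions chosen for illustration.

```python
import numpy as np

def sample_covariance(X):
    """Empirical covariance from snapshots X of shape (n_channels, n_snapshots)."""
    return X @ X.conj().T / X.shape[1]

def loaded_smi_weights(R_hat, steering, loading):
    """Sample matrix inversion with diagonal loading: w = (R_hat + loading * I)^-1 s."""
    n = R_hat.shape[0]
    return np.linalg.solve(R_hat + loading * np.eye(n), steering)

def sinr(w, steering, R_interference):
    """SINR of weights w: |w^H s|^2 / (w^H R w), with R the interference-plus-noise covariance."""
    signal = np.abs(np.vdot(w, steering)) ** 2
    noise = np.real(np.vdot(w, R_interference @ w))
    return signal / noise

# Toy check: white noise only, so any loading leaves the SINR unchanged.
R_true = np.eye(4)
s = np.ones(4)
w = loaded_smi_weights(R_true, s, loading=1.0)
value = sinr(w, s, R_true)
```

The loading term pulls the inverted matrix toward a scaled identity, which stabilizes the solve when few snapshots are available and reduces the weight given to any target signal that leaked into the learning sample.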