
Jiaming Song

PhysDiff: Physics-Guided Human Motion Diffusion Model

Dec 09, 2022

Ye Yuan, Jiaming Song, Umar Iqbal, Arash Vahdat, Jan Kautz

Abstract: Denoising diffusion models hold great promise for generating diverse and realistic human motions. However, existing motion diffusion models largely disregard the laws of physics in the diffusion process and often generate physically-implausible motions with pronounced artifacts such as floating, foot sliding, and ground penetration. This seriously impacts the quality of generated motions and limits their real-world application. To address this issue, we present a novel physics-guided motion diffusion model (PhysDiff), which incorporates physical constraints into the diffusion process. Specifically, we propose a physics-based motion projection module that uses motion imitation in a physics simulator to project the denoised motion of a diffusion step to a physically-plausible motion. The projected motion is further used in the next diffusion step to guide the denoising diffusion process. Intuitively, the use of physics in our model iteratively pulls the motion toward a physically-plausible space. Experiments on large-scale human motion datasets show that our approach achieves state-of-the-art motion quality and improves physical plausibility drastically (>78% for all datasets).

* Project page: https://nvlabs.github.io/PhysDiff
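The abstract describes a sampling loop in which each denoised estimate is projected to a plausible motion before it is reused in the next step. The sketch below shows only that control flow, assuming a deterministic DDIM-style update, a hypothetical `toy_denoiser` standing in for a trained motion model, and a hypothetical `project_to_ground` that merely clamps root heights at the floor. PhysDiff's actual projection runs motion imitation in a physics simulator, which is not reproduced here.

```python
import numpy as np


def toy_denoiser(x_t: np.ndarray, t: int) -> np.ndarray:
    """Hypothetical stand-in for a trained motion diffusion model.

    Predicts a clean root-height trajectory that dips below the ground
    (height < 0), mimicking a ground-penetration artifact. t is unused here.
    """
    phase = np.linspace(0.0, 2.0 * np.pi, x_t.shape[0])
    return 0.3 * np.sin(phase)


def project_to_ground(x0: np.ndarray) -> np.ndarray:
    """Hypothetical, drastically simplified projection: clamp heights at the floor.

    PhysDiff instead imitates the motion in a physics simulator.
    """
    return np.maximum(x0, 0.0)


def sample(num_frames: int = 60, steps: int = 50, project: bool = True,
           seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, steps)
    alpha_bar = np.cumprod(1.0 - betas)      # cumulative signal retention per step
    x_t = rng.standard_normal(num_frames)    # start from Gaussian noise
    x0 = x_t
    for t in reversed(range(steps)):
        x0 = toy_denoiser(x_t, t)            # denoised estimate at this step
        if project:
            x0 = project_to_ground(x0)       # pull the estimate toward plausibility
        if t == 0:
            break
        # Deterministic DDIM update that reuses the (projected) estimate.
        eps = (x_t - np.sqrt(alpha_bar[t]) * x0) / np.sqrt(1.0 - alpha_bar[t])
        x_t = np.sqrt(alpha_bar[t - 1]) * x0 + np.sqrt(1.0 - alpha_bar[t - 1]) * eps
    return x0
```

The `project` flag makes it easy to compare the loop with and without the projection; which steps to project at, and how often, is a design choice this sketch does not address.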

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

Nov 17, 2022

Yogesh Balaji, Seungjun Nah, Xun Huang, Arash Vahdat, Jiaming Song, Karsten Kreis, Miika Aittala, Timo Aila, Samuli Laine, Bryan Catanzaro (+2 more)

Abstract: Large-scale diffusion-based generative models have led to breakthroughs in text-conditioned high-resolution image synthesis. Starting from random noise, such text-to-image diffusion models gradually synthesize images in an iterative fashion while conditioning on text prompts. We find that their synthesis behavior qualitatively changes throughout this process: Early in sampling, generation strongly relies on the text prompt to generate text-aligned content, while later, the text conditioning is almost entirely ignored. This suggests that sharing model parameters throughout the entire generation process may not be ideal. Therefore, in contrast to existing works, we propose to train an ensemble of text-to-image diffusion models specialized for different synthesis stages. To maintain training efficiency, we initially train a single model, which is then split into specialized models that are trained for the specific stages of the iterative generation process. Our ensemble of diffusion models, called eDiff-I, results in improved text alignment while maintaining the same inference computation cost and preserving high visual quality, outperforming previous large-scale text-to-image diffusion models on the standard benchmark. In addition, we train our model to exploit a variety of embeddings for conditioning, including the T5 text, CLIP text, and CLIP image embeddings. We show that these different embeddings lead to different behaviors. Notably, the CLIP image embedding allows an intuitive way of transferring the style of a reference image to the target text-to-image output. Lastly, we show a technique that enables eDiff-I's "paint-with-words" capability: a user can select words in the input text and paint them on a canvas to control the output, which is handy for crafting the image they have in mind. The project page is available at https://deepimagination.cc/eDiff-I/
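Because each expert handles one stage of sampling and only one expert is evaluated per denoising step, the per-step cost matches that of a single model. A minimal routing sketch, assuming the stages are delimited by noise-level boundaries; the boundary values and the string-returning "experts" are hypothetical placeholders for trained denoisers.

```python
import bisect
from typing import Callable, List, Sequence

# A denoiser maps a noise level (plus, in practice, x_t and conditioning) to an output.
Denoiser = Callable[[float], str]


class ExpertEnsemble:
    """Route each denoising step to the expert trained for its noise range."""

    def __init__(self, boundaries: Sequence[float], experts: Sequence[Denoiser]) -> None:
        if len(experts) != len(boundaries) + 1:
            raise ValueError("need exactly one more expert than boundaries")
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be sorted in ascending order")
        self.boundaries: List[float] = list(boundaries)
        self.experts: List[Denoiser] = list(experts)

    def expert_for(self, sigma: float) -> Denoiser:
        # bisect_right returns the index of the noise interval containing sigma.
        return self.experts[bisect.bisect_right(self.boundaries, sigma)]

    def __call__(self, sigma: float) -> str:
        # Only the selected expert runs, so per-step cost equals one model's.
        return self.expert_for(sigma)(sigma)


ensemble = ExpertEnsemble(
    boundaries=[1.0, 10.0],  # hypothetical stage boundaries in noise level
    experts=[
        lambda s: "low-noise expert",   # late sampling: text conditioning matters little
        lambda s: "mid-noise expert",
        lambda s: "high-noise expert",  # early sampling: relies heavily on the text prompt
    ],
)
```

Splitting a single trained model into such experts, as the abstract describes, would amount to initializing every entry of `experts` from the same weights before fine-tuning each one on its own noise range.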

Concrete Score Matching: Generalized Score Matching for Discrete Data

Nov 02, 2022

Chenlin Meng, Kristy Choi, Jiaming Song, Stefano Ermon

Abstract: Representing probability distributions by the gradient of their density functions has proven effective in modeling a wide range of continuous data modalities. However, this representation is not applicable in discrete domains where the gradient is undefined. To this end, we propose an analogous score function called the "Concrete score", a generalization of the (Stein) score for discrete settings. Given a predefined neighborhood structure, the Concrete score of any input is defined by the rate of change of the probabilities with respect to local directional changes of the input. This formulation allows us to recover the (Stein) score in continuous domains when measuring such changes by the Euclidean distance, while using the Manhattan distance leads to our novel score function in discrete domains. Finally, we introduce a new framework to learn such scores from samples called Concrete Score Matching (CSM), and propose an efficient training objective to scale our approach to high dimensions. Empirically, we demonstrate the efficacy of CSM on density estimation tasks on a mixture of synthetic, tabular, and high-dimensional image datasets, and demonstrate that it performs favorably relative to existing baselines for modeling discrete data.

* First two authors contributed equally
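On a discrete space whose neighborhood is defined by Manhattan distance 1, one reading of the Concrete score (which we believe matches the paper's definition, but state here as an assumption) is the vector of relative probability changes (p(x_n) - p(x)) / p(x) over the neighbors x_n of x. The sketch computes it for a small, explicitly given distribution on an integer grid; `manhattan_neighbors` and the toy table are illustrative, and neighbors outside the support are treated as having probability 0.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, ...]


def manhattan_neighbors(x: Point) -> List[Point]:
    """All points at Manhattan distance 1 from x (one coordinate moved by +/-1)."""
    neighbors = []
    for i in range(len(x)):
        for step in (-1, 1):
            y = list(x)
            y[i] += step
            neighbors.append(tuple(y))
    return neighbors


def concrete_score(p: Dict[Point, float], x: Point) -> List[float]:
    """Relative change in probability from x to each neighbor (assumed definition)."""
    px = p[x]
    if px <= 0.0:
        raise ValueError("x must lie in the support of p")
    return [(p.get(n, 0.0) - px) / px for n in manhattan_neighbors(x)]


# Toy 1-D distribution over {0, 1, 2, 3}.
p = {(0,): 0.1, (1,): 0.2, (2,): 0.4, (3,): 0.3}
```

For example, `concrete_score(p, (2,))` compares x = 2 with its neighbors 1 and 3 and gives approximately `[-0.5, -0.25]`: both neighbors are less probable than x, and the left one by a larger relative margin.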

JPEG Artifact Correction using Denoising Diffusion Restoration Models

Sep 23, 2022

Bahjat Kawar, Jiaming Song, Stefano Ermon, Michael Elad

Abstract: Diffusion models can be used as learned priors for solving various inverse problems. However, most existing approaches are restricted to linear inverse problems, limiting their applicability to more general cases. In this paper, we build upon Denoising Diffusion Restoration Models (DDRM) and propose a method for solving some non-linear inverse problems. We leverage the pseudo-inverse operator used in DDRM and generalize this concept for other measurement operators, which allows us to use pre-trained unconditional diffusion models for applications such as JPEG artifact correction. We empirically demonstrate the effectiveness of our approach across various quality factors, attaining performance levels that are on par with state-of-the-art methods trained specifically for the JPEG restoration task.

A General Recipe for Likelihood-free Bayesian Optimization

Jun 27, 2022

Jiaming Song, Lantao Yu, Willie Neiswanger, Stefano Ermon

Abstract: The acquisition function, a critical component in Bayesian optimization (BO), can often be written as the expectation of a utility function under a surrogate model. However, to ensure that acquisition functions are tractable to optimize, restrictions must be placed on the surrogate model and utility function. To extend BO to a broader class of models and utilities, we propose likelihood-free BO (LFBO), an approach based on likelihood-free inference. LFBO directly models the acquisition function without having to separately perform inference with a probabilistic surrogate model. We show that computing the acquisition function in LFBO can be reduced to optimizing a weighted classification problem, where the weights correspond to the utility being chosen. By choosing the utility function for expected improvement (EI), LFBO outperforms various state-of-the-art black-box optimization methods on several real-world optimization problems. LFBO can also effectively leverage composite structures of the objective function, which further improves its regret by several orders of magnitude.

* ICML 2022
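The reduction to weighted classification can be illustrated with a small logistic-regression sketch. It assumes minimization, the EI utility u(y; τ) = max(τ - y, 0) with τ set to a quantile of the observed values, and the objective E[u(y; τ) log C(x) + log(1 - C(x))]. Setting the derivative with respect to C to zero gives C / (1 - C) = E[u | x], so ranking candidates by C ranks them by expected improvement. The quadratic features, quantile, step size, iteration count and objective are illustrative choices, not settings from the paper.

```python
import numpy as np


def features(x: np.ndarray) -> np.ndarray:
    # Quadratic features [1, x, x^2] for a 1-D search space (illustrative choice).
    return np.stack([np.ones_like(x), x, x ** 2], axis=1)


def fit_lfbo(x: np.ndarray, y: np.ndarray, gamma: float = 0.33,
             lr: float = 1.0, iters: int = 5000) -> np.ndarray:
    """Fit a weighted logistic classifier whose output serves as the acquisition."""
    tau = np.quantile(y, gamma)          # improvement threshold
    u = np.maximum(tau - y, 0.0)         # EI utility for minimization
    phi = features(x)
    theta = np.zeros(phi.shape[1])
    for _ in range(iters):
        c = 1.0 / (1.0 + np.exp(-phi @ theta))
        # Gradient of -mean(u * log c + log(1 - c)) with respect to theta.
        grad = phi.T @ (c - u * (1.0 - c)) / len(x)
        theta -= lr * grad
    return theta


def acquisition(theta: np.ndarray, x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-features(x) @ theta))


x_obs = np.linspace(-1.0, 1.0, 30)
y_obs = (x_obs - 0.3) ** 2             # hypothetical objective with its minimum at 0.3
theta = fit_lfbo(x_obs, y_obs)
candidates = np.linspace(-1.0, 1.0, 201)
next_x = candidates[np.argmax(acquisition(theta, candidates))]
```

Note that no probabilistic surrogate is fit anywhere: the classifier output is used directly as the acquisition function, and `next_x` is simply its maximizer over the candidate grid.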

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

Jun 10, 2022

Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso (+435 more)

Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to hundreds of billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but are poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.

* 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

BigDL 2.0: Seamless Scaling of AI Pipelines from Laptops to Distributed Cluster

Apr 03, 2022

Jason Dai, Ding Ding, Dongjie Shi, Shengsheng Huang, Jiao Wang, Xin Qiu, Kai Huang, Guoqiong Song, Yang Wang, Qiyuan Gong (+6 more)

Abstract: Most AI projects start with a Python notebook running on a single laptop; however, scaling it to handle larger datasets (for both experimentation and production deployment) usually takes considerable effort. This typically entails many manual and error-prone steps for data scientists to take full advantage of the available hardware resources (e.g., SIMD instructions, multi-processing, quantization, memory allocation optimization, data partitioning, distributed computing, etc.). To address this challenge, we have open-sourced BigDL 2.0 at https://github.com/intel-analytics/BigDL/ under the Apache 2.0 license (combining the original BigDL and Analytics Zoo projects); using BigDL 2.0, users can simply build conventional Python notebooks on their laptops (with possible AutoML support), which can then be transparently accelerated on a single node (with up to 9.6x speedup in our experiments) and seamlessly scaled out to a large cluster (across several hundred servers in real-world use cases). BigDL 2.0 has already been adopted by many real-world users (such as Mastercard, Burger King, Inspur, etc.) in production.

* Accepted by CVPR 2022 (Demo Track)

Dual Diffusion Implicit Bridges for Image-to-Image Translation

Mar 16, 2022

Xuan Su, Jiaming Song, Chenlin Meng, Stefano Ermon

Abstract: Common image-to-image translation methods rely on joint training over data from both source and target domains. This excludes cases where domain data is private (e.g., in a federated setting), and often means that a new model has to be trained for a new pair of domains. We present Dual Diffusion Implicit Bridges (DDIBs), an image translation method based on diffusion models that circumvents training on domain pairs. DDIBs allow translations between arbitrary pairs of source-target domains, given independently trained diffusion models on the respective domains. Image translation with DDIBs is a two-step process: DDIBs first obtain latent encodings for source images with the source diffusion model, and next decode such encodings using the target model to construct target images. Moreover, DDIBs enable cycle consistency by default and are theoretically connected to optimal transport. Experimentally, we apply DDIBs on a variety of synthetic and high-resolution image datasets, demonstrating their utility in example-guided color transfer, image-to-image translation as well as their connections to optimal transport methods.
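The two-step translation can be checked on a case where everything is analytic. The sketch assumes 1-D Gaussian source and target domains N(mu, s^2), a variance-exploding forward process whose perturbed marginals are N(mu, s^2 + sigma^2), and hence the probability-flow ODE dx/dsigma = sigma (x - mu) / (s^2 + sigma^2). These exact scores stand in for the independently trained diffusion models; the Heun integrator, noise range and step count are illustrative. Encoding with the source ODE and decoding with the target ODE then approximates the 1-D optimal transport map mu_t + (s_t / s_s)(x - mu_s), and translating back recovers the input.

```python
from typing import Tuple

import numpy as np

Domain = Tuple[float, float]  # (mean, std) of a hypothetical 1-D Gaussian domain


def pf_ode_velocity(x: float, sigma: float, mu: float, s: float) -> float:
    # dx/dsigma for data N(mu, s^2) under a variance-exploding process.
    return sigma * (x - mu) / (s ** 2 + sigma ** 2)


def integrate(x: float, sigmas: np.ndarray, mu: float, s: float) -> float:
    """Heun's method along the given sequence of noise levels."""
    for s0, s1 in zip(sigmas[:-1], sigmas[1:]):
        h = s1 - s0
        k1 = pf_ode_velocity(x, s0, mu, s)
        k2 = pf_ode_velocity(x + h * k1, s1, mu, s)
        x = x + 0.5 * h * (k1 + k2)
    return float(x)


SIGMAS = np.geomspace(1e-3, 500.0, 2000)  # illustrative noise schedule


def translate(x: float, src: Domain, tgt: Domain) -> float:
    latent = integrate(x, SIGMAS, *src)           # encode: data -> noise (source model only)
    return integrate(latent, SIGMAS[::-1], *tgt)  # decode: noise -> data (target model only)


source: Domain = (0.0, 1.0)
target: Domain = (3.0, 2.0)
```

Here `translate(1.0, source, target)` lands close to 3 + 2 * 1 = 5, and the source and target "models" never see each other's data: each contributes only its own ODE, which is the property the abstract emphasizes.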

LISA: Learning Interpretable Skill Abstractions from Language

Feb 28, 2022

Divyansh Garg, Skanda Vaidyanath, Kuno Kim, Jiaming Song, Stefano Ermon

Abstract: Learning policies that effectively use language instructions in complex, multi-task environments is an important problem in imitation learning. While it is possible to condition on the entire language instruction directly, such an approach could suffer from generalization issues. To encode complex instructions into skills that can generalize to unseen instructions, we propose Learning Interpretable Skill Abstractions (LISA), a hierarchical imitation learning framework that can learn diverse, interpretable skills from language-conditioned demonstrations. LISA uses vector quantization to learn discrete skill codes that are highly correlated with language instructions and the behavior of the learned policy. In navigation and robotic manipulation environments, LISA is able to outperform a strong non-hierarchical baseline in the low-data regime and compose learned skills to solve tasks containing unseen long-range instructions. Our method demonstrates a more natural way to condition on language in sequential decision-making problems and achieves interpretable and controllable behavior with the learned skills.
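LISA's discrete skill codes come from vector quantization. A minimal sketch of the standard nearest-neighbor codebook lookup, assuming squared Euclidean distance and a fixed codebook; how LISA trains the codebook and the policy around it is not shown, and the array shapes and values are hypothetical.

```python
from typing import Tuple

import numpy as np


def quantize(z: np.ndarray, codebook: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Map each continuous vector to its nearest codebook entry.

    z: (batch, dim) encoder outputs; codebook: (num_codes, dim) skill embeddings.
    Returns the discrete code indices and the quantized vectors.
    """
    # Squared Euclidean distance between every input and every code.
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    codes = dists.argmin(axis=1)
    return codes, codebook[codes]


codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
z = np.array([[0.9, 1.2], [4.0, 4.5], [-0.2, 0.1]])
codes, quantized = quantize(z, codebook)
```

Because each output is one index from a small, fixed set, the codes selected for a given instruction can be listed and inspected directly, which is one way to see how discrete skills lend themselves to interpretation.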

Denoising Diffusion Restoration Models

Feb 04, 2022

Bahjat Kawar, Michael Elad, Stefano Ermon, Jiaming Song

Abstract: Many interesting tasks in image restoration can be cast as linear inverse problems. A recent family of approaches for solving these problems uses stochastic algorithms that sample from the posterior distribution of natural images given the measurements. However, efficient solutions often require problem-specific supervised training to model the posterior, whereas unsupervised methods that are not problem-specific typically rely on inefficient iterative methods. This work addresses these issues by introducing Denoising Diffusion Restoration Models (DDRM), an efficient, unsupervised posterior sampling method. Motivated by variational inference, DDRM takes advantage of a pre-trained denoising diffusion generative model for solving any linear inverse problem. We demonstrate DDRM's versatility on several image datasets for super-resolution, deblurring, inpainting, and colorization under various amounts of measurement noise. DDRM outperforms the current leading unsupervised methods on the diverse ImageNet dataset in reconstruction quality, perceptual quality, and runtime, being 5x faster than the nearest competitor. DDRM also generalizes well for natural images out of the distribution of the observed ImageNet training set.

* Our code is available at https://github.com/bahjat-kawar/ddrm
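DDRM works in the spectral space of the linear degradation operator. The sketch below shows only that bookkeeping, not the sampler: with H = U diag(s) V^T and noiseless measurements y = Hx, each coordinate of V^T x paired with a nonzero singular value is recovered exactly as (U^T y)_i / s_i, while the remaining coordinates are unobserved and must come from the diffusion prior. The random operator and signal are hypothetical. With additive measurement noise of standard deviation sigma_y, recovered coordinate i would carry noise of standard deviation sigma_y / s_i, since U is orthogonal.

```python
import numpy as np


def spectral_measurements(H: np.ndarray, y: np.ndarray, tol: float = 1e-10):
    """Express measurements y = H x in the spectral basis of H = U diag(s) Vt.

    Returns the observed spectral coefficients (Vt @ x)[:r], Vt, and the rank r.
    """
    U, s, Vt = np.linalg.svd(H, full_matrices=True)
    r = int(np.sum(s > tol))             # number of non-zero singular values
    y_bar = (U.T @ y)[:r] / s[:r]        # divide out the singular values
    return y_bar, Vt, r


rng = np.random.default_rng(0)
H = rng.standard_normal((3, 5))          # hypothetical operator: 5-D signal, 3 measurements
x = rng.standard_normal(5)
y_bar, Vt, r = spectral_measurements(H, H @ x)
# Minimum-norm (pseudo-inverse) reconstruction from the observed coefficients only.
x_pinv = Vt[:r].T @ y_bar
```

The unobserved coefficients `Vt[r:] @ x` are exactly what the pseudo-inverse sets to zero; filling them in plausibly is where, in DDRM, the pre-trained diffusion model is needed.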
