Abstract: 3D building reconstruction from imaging data is an important task for many applications, ranging from urban planning to reconnaissance. Modern novel view synthesis (NVS) methods like NeRF and Gaussian Splatting offer powerful techniques for developing 3D models from natural 2D imagery in an unsupervised fashion. These algorithms generally require input training views surrounding the scene of interest, which, in the case of large buildings, are typically not available across all camera elevations. In particular, the most readily available camera viewpoints at scale across most buildings are at near-ground (e.g., with mobile phones) and aerial (drone) elevations. However, due to the significant difference in viewpoint between drone and ground image sets, camera registration, a necessary step for NVS algorithms, fails. In this work, we propose a method, DRAGON, that takes drone and ground building imagery as input and produces a 3D NVS model. The key insight of DRAGON is that intermediate-elevation imagery can be extrapolated by an NVS algorithm itself in an iterative procedure with perceptual regularization, thereby bridging the visual feature gap between the two elevations and enabling registration. We compiled a semi-synthetic dataset of 9 large building scenes using Google Earth Studio, and quantitatively and qualitatively demonstrate that DRAGON generates compelling renderings on this dataset compared to baseline strategies.
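The iterative bridging procedure described in this abstract can be outlined schematically. The Python sketch below is purely illustrative: every helper (train_nvs, render_at_elevation, perceptual_refine, register) is a hypothetical placeholder that returns dummy values so the sketch runs, standing in for an NVS trainer, a renderer, the perceptual regularization step, and SfM-style camera registration; the elevation schedule is invented for illustration.

```python
# Schematic sketch only; all helpers are hypothetical placeholders, not DRAGON's code.

def train_nvs(views):                         # placeholder: fit an NVS model on a view pool
    return {"trained_on": list(views)}

def render_at_elevation(model, elevation_m):  # placeholder: render synthetic views at a given height
    return [f"synthetic_view@{elevation_m}m"]

def perceptual_refine(views):                 # placeholder: perceptual regularization of renders
    return views

def register(new_views, pool):                # placeholder: camera registration against the pool
    return list(new_views)

def bridge_elevations(drone_views, ground_views, elevations_m=(80, 40, 10)):
    pool = list(drone_views)                  # aerial views register among themselves
    for elevation in elevations_m:            # step downward toward ground level
        model = train_nvs(pool)
        pool += perceptual_refine(render_at_elevation(model, elevation))
    pool += register(ground_views, pool)      # ground photos now share visual features with the pool
    return train_nvs(pool)                    # final model spans both elevations

final_model = bridge_elevations(["drone_01.jpg"], ["phone_01.jpg"])
```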
Abstract: Generating ambient sounds and effects is a challenging problem due to data scarcity and often insufficient caption quality, making it difficult to employ large-scale generative models for the task. In this work, we tackle the problem by introducing two new models. First, we propose AutoCap, a high-quality and efficient automatic audio captioning model. We show that by leveraging metadata available with the audio modality, we can substantially improve the quality of captions. AutoCap reaches a CIDEr score of 83.2, a 3.2% improvement over the best available captioning model, at four times faster inference speed. We then use AutoCap to caption clips from existing datasets, obtaining 761,000 audio clips with high-quality captions and forming the largest available audio-text dataset. Second, we propose GenAu, a scalable transformer-based audio generation architecture that we scale up to 1.25B parameters and train with our new dataset. Compared to state-of-the-art audio generators, GenAu obtains significant improvements of 15.7% in FAD score, 22.7% in IS, and 13.5% in CLAP score, indicating substantially improved quality of generated audio relative to previous works. This shows that the quality of data is often as important as its quantity. Moreover, since AutoCap is fully automatic, new audio samples can be added to the training dataset, unlocking the training of even larger generative models for audio synthesis.
Abstract: Recent advancements in machine learning have led to novel imaging systems and algorithms that address ill-posed problems. Assessing their trustworthiness and understanding how to deploy them safely at test time remain important and open problems. We propose a method that leverages conformal prediction to retrieve upper/lower bounds and statistical inliers/outliers of reconstructions based on the prediction intervals of downstream metrics. We apply our method to sparse-view CT for downstream radiotherapy planning and show 1) that metric-guided bounds have valid coverage for downstream metrics while conventional pixel-wise bounds do not, and 2) that the upper/lower bounds produced by metric-guided and pixel-wise methods differ anatomically. Our work paves the way for more meaningful reconstruction bounds. Code is available at https://github.com/matthewyccheung/conformal-metric
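As a concrete illustration of the conformal prediction idea referenced above, here is a minimal split conformal sketch in Python for a scalar downstream metric. It assumes only a calibration set of metric values computed on reconstructions and on the matching ground-truth scans; the absolute-residual nonconformity score and all numbers are generic stand-ins, not the paper's actual procedure.

```python
import numpy as np

def conformal_metric_interval(cal_pred, cal_true, test_pred, alpha=0.1):
    """Split conformal interval for a scalar downstream metric.

    cal_pred / cal_true: the metric evaluated on calibration reconstructions and on
    the matching ground-truth scans; test_pred: the metric on a new reconstruction.
    Returns a (lower, upper) interval with ~(1 - alpha) marginal coverage.
    """
    scores = np.abs(np.asarray(cal_pred) - np.asarray(cal_true))  # nonconformity scores
    n = scores.size
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)          # finite-sample correction
    q = np.quantile(scores, level, method="higher")
    return test_pred - q, test_pred + q

# Toy usage with synthetic numbers (not real dosimetric data).
rng = np.random.default_rng(0)
cal_true = rng.normal(60.0, 2.0, size=200)           # e.g., an isocenter dose metric (Gy)
cal_pred = cal_true + rng.normal(0.0, 0.5, size=200)
lo, hi = conformal_metric_interval(cal_pred, cal_true, test_pred=59.3)
print(f"~90% prediction interval: [{lo:.2f}, {hi:.2f}] Gy")
```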
Abstract: Diffusion models have revolutionized image generation in recent years, yet they are still limited to a few sizes and aspect ratios. We propose ElasticDiffusion, a novel training-free decoding method that enables pretrained text-to-image diffusion models to generate images of various sizes. ElasticDiffusion decouples the generation trajectory of a pretrained model into local and global signals. The local signal controls low-level pixel information and can be estimated on local patches, while the global signal is used to maintain overall structural consistency and is estimated with a reference image. We test our method on CelebA-HQ (faces) and LAION-COCO (objects/indoor/outdoor scenes). Our experiments and qualitative results show superior image coherence quality across aspect ratios compared to MultiDiffusion and the standard decoding strategy of Stable Diffusion. Code: https://github.com/MoayedHajiAli/ElasticDiffusion-official.git
Abstract: Bias analysis is a crucial step in the process of creating fair datasets for training and evaluating computer vision models. The bottleneck in dataset analysis is annotation, which typically requires: (1) specifying a list of attributes relevant to the dataset domain, and (2) classifying each image-attribute pair. While the second step has made rapid progress in automation, the first has remained human-centered, requiring an experimenter to compile lists of in-domain attributes. However, an experimenter may have limited foresight, leading to annotation "blind spots," which in turn can lead to flawed downstream dataset analyses. To combat this, we propose GELDA, a nearly automatic framework that leverages large language models (LLMs) to propose and label various attributes for a domain. GELDA takes a user-defined domain caption (e.g., "a photo of a bird," "a photo of a living room") and uses an LLM to hierarchically generate attributes. In addition, GELDA uses the LLM to decide which of a set of vision-language models (VLMs) to use to classify each attribute in images. Results on real datasets show that GELDA can generate accurate and diverse visual attribute suggestions, and uncover biases such as confounding between class labels and background features. Results on synthetic datasets demonstrate that GELDA can be used to evaluate the biases of text-to-image diffusion models and generative adversarial networks. Overall, we show that while GELDA is not accurate enough to replace human annotators, it can serve as a complementary tool to help humans analyze datasets in a cheap, low-effort, and flexible manner.
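To make the image-attribute classification step above concrete, the following Python sketch runs CLIP zero-shot classification via the Hugging Face transformers library. The candidate attribute values and the image path bird.jpg are hypothetical stand-ins for what an LLM might propose for the domain caption "a photo of a bird"; GELDA's actual prompting, attribute hierarchy, and VLM selection are not reproduced here.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical LLM-proposed values for one attribute ("background") in the bird domain.
candidate_texts = [
    "a photo of a bird on a tree branch",
    "a photo of a bird on water",
    "a photo of a bird on the ground",
]

image = Image.open("bird.jpg")  # placeholder path to any in-domain image
inputs = processor(text=candidate_texts, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(candidate_texts, probs[0].tolist())))
```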
Abstract: In this work, we address the challenge of multi-task image generation with limited data for denoising diffusion probabilistic models (DDPM), a class of generative models that produce high-quality images by reversing a noisy diffusion process. We propose a novel method, SR-DDPM, that leverages representation-based techniques from few-shot learning to effectively learn from fewer samples across different tasks. Our method consists of a core meta architecture with shared parameters together with task-specific layers with exclusive parameters. By exploiting the similarity between diverse data distributions, our method can scale to multiple tasks without compromising image quality. We evaluate our method on standard image datasets and show that it outperforms both unconditional and conditional DDPM in terms of FID and SSIM metrics.
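The shared/exclusive parameter split described above can be sketched with a toy PyTorch module: one shared trunk whose parameters are used by every task, plus a small per-task head holding that task's exclusive parameters. This is a hypothetical illustration of the weight-sharing pattern only, not the SR-DDPM architecture; it omits timestep conditioning and the diffusion training loop entirely.

```python
import torch
import torch.nn as nn

class SharedTrunkDenoiser(nn.Module):
    """Toy multi-task module: a shared convolutional trunk plus one small
    head per task holding that task's exclusive parameters."""
    def __init__(self, channels=64, num_tasks=3):
        super().__init__()
        self.shared = nn.Sequential(                 # parameters shared by all tasks
            nn.Conv2d(3, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU(),
        )
        self.heads = nn.ModuleList(                  # exclusive per-task parameters
            [nn.Conv2d(channels, 3, 3, padding=1) for _ in range(num_tasks)]
        )

    def forward(self, x_noisy, task_id):
        return self.heads[task_id](self.shared(x_noisy))

# Usage: predict (toy) noise for a batch of images belonging to task 1.
model = SharedTrunkDenoiser()
eps_hat = model(torch.randn(4, 3, 32, 32), task_id=1)
```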
Abstract: Cloud occlusion is a common problem in the field of remote sensing, particularly for thermal infrared imaging. Remote sensing thermal instruments onboard operational satellites are designed to enable frequent and high-resolution observations over land; unfortunately, clouds adversely affect thermal signals by blocking outgoing longwave radiation emission from Earth's surface, interfering with the retrieved ground emission temperature. Such cloud contamination severely reduces the set of serviceable thermal images for downstream applications, making it impractical to perform intricate time-series analysis of land surface temperature (LST). In this paper, we introduce a novel method to remove cloud occlusions from Landsat 8 LST images. We call our method ISLAND, an acronym for Informing Brightness and Surface Temperature Through a Land Cover-based Interpolator. Our approach uses thermal infrared images from Landsat 8 (at 30 m resolution with a 16-day revisit cycle) and the NLCD land cover dataset. Inspired by Tobler's first law of geography, ISLAND predicts occluded brightness temperature and LST through a set of spatio-temporal filters that perform distance-weighted spatio-temporal interpolation. A critical feature of ISLAND is that the filters are land cover-class aware, making it particularly advantageous in complex urban settings with heterogeneous land cover types and distributions. Through qualitative and quantitative analysis, we show that ISLAND achieves robust reconstruction performance across a variety of cloud occlusion and surface land cover conditions, and at high spatio-temporal resolution. We provide a public dataset of 20 U.S. cities with pre-computed ISLAND thermal infrared and LST outputs. Using several case studies, we demonstrate that ISLAND opens the door to a multitude of high-impact urban and environmental applications across the continental United States.
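The distance-weighted, land-cover-aware interpolation idea lends itself to a compact illustration. The Python sketch below fills occluded pixels by inverse-distance weighting over clear pixels of the same land cover class for a single date; it is a simplified, spatial-only stand-in (the paper's filters are spatio-temporal and considerably more involved), and the array names and parameters are assumptions.

```python
import numpy as np

def landcover_idw_fill(lst, occluded, landcover, power=2.0, max_neighbors=50):
    """Fill occluded LST pixels by inverse-distance weighting over clear pixels
    that share the same land cover class (single-date, spatial-only toy).

    lst: float array of LST values; occluded: boolean cloud mask;
    landcover: integer class map on the same grid.
    """
    occluded = np.asarray(occluded, dtype=bool)
    filled = np.asarray(lst, dtype=float).copy()
    clear = ~occluded
    for r, c in zip(*np.nonzero(occluded)):
        same_class = clear & (landcover == landcover[r, c])
        yy, xx = np.nonzero(same_class)
        if yy.size == 0:
            continue  # no clear pixel of this class; leave the gap
        d2 = (yy - r) ** 2 + (xx - c) ** 2               # squared pixel distances
        nearest = np.argsort(d2)[:max_neighbors]
        w = 1.0 / (d2[nearest] ** (power / 2.0) + 1e-6)  # inverse-distance weights
        filled[r, c] = np.sum(w * filled[yy[nearest], xx[nearest]]) / np.sum(w)
    return filled
```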
Abstract: We propose an experimental method for measuring bias in face recognition systems. Existing methods to measure bias depend on benchmark datasets that are collected in the wild and annotated for protected (e.g., race, gender) and non-protected (e.g., pose, lighting) attributes. Such observational datasets only permit correlational conclusions, e.g., "Algorithm A's accuracy is different on female and male faces in dataset X." By contrast, experimental methods manipulate attributes individually and thus permit causal conclusions, e.g., "Algorithm A's accuracy is affected by gender and skin color." Our method is based on generating synthetic faces using a neural face generator, where each attribute of interest is modified independently while all other attributes are held constant. Crucially, human observers provide the ground truth on perceptual identity similarity between synthetic image pairs. We validate our method quantitatively by evaluating the race and gender biases of three research-grade face recognition models. Our synthetic pipeline reveals that for these algorithms, accuracy is lower for Black and East Asian population subgroups. Our method can also quantify how perceptual changes in attributes affect the face identity distances reported by these models. Our large synthetic dataset, consisting of 48,000 synthetic face image pairs (10,200 unique synthetic faces) and 555,000 human annotations (individual attributes and pairwise identity comparisons), is available to researchers in this important area.
Abstract: CT scans are the standard of care for many clinical ailments and are needed for treatments like external beam radiotherapy. Unfortunately, CT scanners are rare in low- and mid-resource settings due to their costs. Planar X-ray radiography units, in comparison, are far more prevalent, but can only provide limited 2D observations of the 3D anatomy. In this work, we propose a method to generate CT volumes from a few (<5) planar X-ray observations using a prior data distribution, and perform the first evaluation of such a reconstruction algorithm for a clinical application: radiotherapy planning. We propose a deep generative model, building on advances in neural implicit representations, to synthesize volumetric CT scans from a few input planar X-ray images taken at different angles. To focus the generation task on clinically relevant features, our model can also leverage anatomical guidance during training (via segmentation masks). We generated 2-field opposed, palliative radiotherapy plans on thoracic CTs reconstructed by our method, and found that the isocenter radiation dose on reconstructed scans has <1% error with respect to the dose calculated on clinically acquired CTs when using <=4 X-ray views. In addition, our method outperforms recent sparse CT reconstruction baselines in terms of standard pixel- and structure-level metrics (PSNR, SSIM, Dice score) on the public LIDC lung CT dataset. Code is available at: https://github.com/wanderinrain/Xray2CT.
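Neural implicit representations like the one referenced above map continuous 3D coordinates to intensities. The PyTorch sketch below is a minimal, hypothetical version: a coordinate MLP conditioned on a per-scan latent code (which, in a full system, would be inferred from the X-ray views); it omits positional encoding, the generative prior, and anatomical guidance.

```python
import torch
import torch.nn as nn

class ImplicitCT(nn.Module):
    """Minimal coordinate MLP: (x, y, z) plus a per-scan latent code -> intensity.
    Hypothetical sketch only; no positional encoding, prior, or anatomical guidance."""
    def __init__(self, latent_dim=128, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),                       # predicted CT intensity
        )

    def forward(self, xyz, z):
        # xyz: (N, 3) query coordinates in [-1, 1]^3; z: (latent_dim,) scan code
        z = z.expand(xyz.shape[0], -1)
        return self.net(torch.cat([xyz, z], dim=-1)).squeeze(-1)

# Query a random set of points for one (randomly drawn) latent code.
model = ImplicitCT()
intensities = model(torch.rand(1024, 3) * 2 - 1, torch.randn(128))
```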
Abstract: Perceptual metrics, like the Fréchet Inception Distance (FID), are widely used to assess the similarity between synthetically generated and ground truth (real) images. The key idea behind these metrics is to compute errors in a deep feature space that captures perceptually and semantically rich image features. Despite their popularity, the effect that different deep features and their design choices have on a perceptual metric has not been well studied. In this work, we perform a causal analysis linking differences in semantic attributes and distortions between face image distributions to Fréchet distances (FD) using several popular deep feature spaces. A key component of our analysis is the creation of synthetic counterfactual faces using deep face generators. Our experiments show that the FD is heavily influenced by its feature space's training dataset and objective function. For example, FD using features extracted from ImageNet-trained models heavily emphasizes hats over regions like the eyes and mouth. Moreover, FD using features from a face gender classifier emphasizes hair length more than distances in an identity (recognition) feature space. Finally, we evaluate several popular face generation models across feature spaces and find that StyleGAN2 consistently ranks higher than other face generators, except with respect to identity (recognition) features. This suggests the need to consider multiple feature spaces when evaluating generative models and to use feature spaces that are tuned to the nuances of the domain of interest.
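For reference, the Fréchet distance between two feature distributions is computed by fitting a Gaussian to each set of deep features and comparing their means and covariances. The Python sketch below shows the standard formula; which feature extractor supplies feats_a and feats_b (Inception, a face recognition network, a gender classifier, etc.) is exactly the design choice studied above, and the random inputs in the usage lines are placeholders.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets
    (rows = samples, columns = feature dimensions)."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):  # discard tiny imaginary parts from numerical noise
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))

# Placeholder usage: in practice, feats_* come from a chosen deep feature extractor.
rng = np.random.default_rng(0)
print(frechet_distance(rng.normal(size=(500, 64)),
                       rng.normal(0.1, 1.0, size=(500, 64))))
```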