Michael Kampffmeyer

UiT The Arctic University of Norway, Norwegian Computing Center

UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting

Feb 25, 2025

Haoyuan Li, Yanpeng Zhou, Tao Tang, Jifei Song, Yihan Zeng, Michael Kampffmeyer, Hang Xu, Xiaodan Liang

Abstract:Recent advancements in multi-modal 3D pre-training methods have shown promising efficacy in learning joint representations of text, images, and point clouds. However, adopting point clouds as 3D representation fails to fully capture the intricacies of the 3D world and exhibits a noticeable gap between the discrete points and the dense 2D pixels of images. To tackle this issue, we propose UniGS, integrating 3D Gaussian Splatting (3DGS) into multi-modal pre-training to enhance the 3D representation. We first rely on the 3DGS representation to model the 3D world as a collection of 3D Gaussians with color and opacity, incorporating all the information of the 3D scene while establishing a strong connection with 2D images. Then, to achieve Language-Image-3D pre-training, UniGS starts with a pre-trained vision-language model to establish a shared visual and textual space through extensive real-world image-text pairs. Subsequently, UniGS employs a 3D encoder to align the optimized 3DGS with the Language-Image representations to learn unified multi-modal representations. To facilitate the extraction of global explicit 3D features by the 3D encoder and achieve better cross-modal alignment, we additionally introduce a novel Gaussian-Aware Guidance module that guides the learning of fine-grained representations of the 3D domain. Through extensive experiments across the Objaverse, ABO, MVImgNet and SUN RGBD datasets with zero-shot classification, text-driven retrieval and open-world understanding tasks, we demonstrate the effectiveness of UniGS in learning a more general and more strongly aligned multi-modal representation. Specifically, UniGS achieves leading results across different 3D tasks with remarkable improvements over the previous SOTA, Uni3D, including on zero-shot classification (+9.36%), text-driven retrieval (+4.3%) and open-world understanding (+7.92%).

* ICLR 2025

Robust Classification by Coupling Data Mollification with Label Smoothing

Jun 03, 2024

Markus Heinonen, Ba-Hien Tran, Michael Kampffmeyer, Maurizio Filippone

Abstract:Introducing training-time augmentations is a key technique to enhance generalization and prepare deep neural networks against test-time corruptions. Inspired by the success of generative diffusion models, we propose a novel approach coupling data augmentation, in the form of image noising and blurring, with label smoothing to align predicted label confidences with image degradation. The method is simple to implement, introduces negligible overheads, and can be combined with existing augmentations. We demonstrate improved robustness and uncertainty quantification on the corrupted image benchmarks of the CIFAR and TinyImageNet datasets.

* Under review
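
The coupling described in this abstract can be illustrated with a minimal sketch. It assumes a single degradation level `t` in [0, 1] that controls both the amount of Gaussian noise added to the image and the amount of label smoothing. The function names, the variance-preserving noise mix, and the linear label schedule are illustrative assumptions, not the authors' implementation, and blurring is omitted for brevity.

```python
import math
import random


def mollify(pixels, label, num_classes, t, rng):
    """Degrade a flat image with Gaussian noise and soften its label to match.

    t = 0 keeps the clean image and a one-hot label; t = 1 gives pure noise
    and a uniform label. Both schedules are assumptions made for this sketch.
    """
    keep = (1.0 - t) ** 0.5   # weight of the clean signal
    noise = t ** 0.5          # weight of the Gaussian noise
    noisy = [keep * p + noise * rng.gauss(0.0, 1.0) for p in pixels]
    uniform = 1.0 / num_classes
    soft = [
        (1.0 - t) * (1.0 if k == label else 0.0) + t * uniform
        for k in range(num_classes)
    ]
    return noisy, soft


def cross_entropy(probs, soft_label):
    """Cross-entropy between predicted probabilities and a soft target."""
    eps = 1e-12  # avoids log(0)
    return -sum(q * math.log(p + eps) for p, q in zip(probs, soft_label))


rng = random.Random(0)
noisy, soft = mollify([0.5, -0.5, 0.1], label=1, num_classes=4, t=0.3, rng=rng)
loss = cross_entropy([0.1, 0.7, 0.1, 0.1], soft)
```

In a training loop, `t` would typically be drawn at random for each sample, so that heavily degraded images are paired with less confident targets.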

MMTryon: Multi-Modal Multi-Reference Control for High-Quality Fashion Generation

May 01, 2024

Xujie Zhang, Ente Lin, Xiu Li, Yuxuan Luo, Michael Kampffmeyer, Xin Dong, Xiaodan Liang

Abstract:This paper introduces MMTryon, a multi-modal multi-reference VIrtual Try-ON (VITON) framework, which can generate high-quality compositional try-on results by taking as inputs a text instruction and multiple garment images. Our MMTryon mainly addresses two problems overlooked in prior literature: 1) Support of multiple try-on items and dressing style. Existing methods are commonly designed for single-item try-on tasks (e.g., upper/lower garments, dresses) and fall short on customizing dressing styles (e.g., zipped/unzipped, tuck-in/tuck-out, etc.). 2) Segmentation dependency. Existing methods further rely heavily on category-specific segmentation models to identify the replacement regions, with segmentation errors directly leading to significant artifacts in the try-on results. For the first issue, our MMTryon introduces a novel multi-modality and multi-reference attention mechanism to combine the garment information from reference images and dressing-style information from text instructions. Besides, to remove the segmentation dependency, MMTryon uses a parsing-free garment encoder and leverages a novel scalable data generation pipeline to convert existing VITON datasets to a form that allows MMTryon to be trained without requiring any explicit segmentation. Extensive experiments on high-resolution benchmarks and in-the-wild test sets demonstrate MMTryon's superiority over existing SOTA methods both qualitatively and quantitatively. Besides, MMTryon's impressive performance on multi-item and style-controllable virtual try-on scenarios and its ability to try on any outfit in a large variety of scenarios from any source image open up a new avenue for future investigation in the fashion community.

ExMap: Leveraging Explainability Heatmaps for Unsupervised Group Robustness to Spurious Correlations

Mar 20, 2024

Rwiddhi Chakraborty, Adrian Sletten, Michael Kampffmeyer

Abstract:Group robustness strategies aim to mitigate learned biases in deep learning models that arise from spurious correlations present in their training datasets. However, most existing methods rely on access to the label distribution of the groups, which is time-consuming and expensive to obtain. As a result, unsupervised group robustness strategies are sought. Based on the insight that a trained model's classification strategies can be inferred accurately based on explainability heatmaps, we introduce ExMap, an unsupervised two-stage mechanism designed to enhance group robustness in traditional classifiers. ExMap utilizes a clustering module to infer pseudo-labels based on a model's explainability heatmaps, which are then used during training in lieu of actual labels. Our empirical studies validate the efficacy of ExMap: we demonstrate that it bridges the performance gap with its supervised counterparts and outperforms existing partially supervised and unsupervised methods. Additionally, ExMap can be seamlessly integrated with existing group robustness learning strategies. Finally, we demonstrate its potential in tackling the emerging issue of multiple shortcut mitigation. Code is available at https://github.com/rwchakra/exmap.

Data-Centric Machine Learning for Geospatial Remote Sensing Data

Dec 08, 2023

Ribana Roscher, Marc Rußwurm, Caroline Gevaert, Michael Kampffmeyer, Jefersson A. dos Santos, Maria Vakalopoulou, Ronny Hänsch, Stine Hansen, Keiller Nogueira, Jonathan Prexl (+1 more)

Abstract:Recent developments and research in modern machine learning have led to substantial improvements in the geospatial field. Although numerous deep learning models have been proposed, the majority of them have been developed on benchmark datasets that lack strong real-world relevance. Furthermore, the performance of many methods has already saturated on these datasets. We argue that shifting the focus towards a complementary data-centric perspective is necessary to achieve further improvements in accuracy, generalization ability, and real impact in end-user applications. This work presents a definition and precise categorization of automated data-centric learning approaches for geospatial data. It highlights the complementary role of data-centric learning with respect to model-centric in the larger machine learning deployment cycle. We review papers across the entire geospatial field and categorize them into different groups. A set of representative experiments shows concrete implementation examples. These examples provide concrete steps to act on geospatial data with data-centric machine learning approaches.

WarpDiffusion: Efficient Diffusion Model for High-Fidelity Virtual Try-on

Dec 06, 2023

Xujie Zhang, Xiu Li, Michael Kampffmeyer, Xin Dong, Zhenyu Xie, Feida Zhu, Haoye Dong, Xiaodan Liang

Abstract:Image-based Virtual Try-On (VITON) aims to transfer an in-shop garment image onto a target person. While existing methods focus on warping the garment to fit the body pose, they often overlook the synthesis quality around the garment-skin boundary and realistic effects like wrinkles and shadows on the warped garments. These limitations greatly reduce the realism of the generated results and hinder the practical application of VITON techniques. Leveraging the notable success of diffusion-based models in cross-modal image synthesis, some recent diffusion-based methods have ventured to tackle this issue. However, they tend to either consume a significant amount of training resources or struggle to achieve realistic try-on effects and retain garment details. For efficient and high-fidelity VITON, we propose WarpDiffusion, which bridges the warping-based and diffusion-based paradigms via a novel informative and local garment feature attention mechanism. Specifically, WarpDiffusion incorporates local texture attention to reduce resource consumption and uses a novel auto-mask module that effectively retains only the critical areas of the warped garment while disregarding unrealistic or erroneous portions. Notably, WarpDiffusion can be integrated as a plug-and-play component into existing VITON methodologies, elevating their synthesis quality. Extensive experiments on high-resolution VITON benchmarks and an in-the-wild test set demonstrate the superiority of WarpDiffusion, surpassing state-of-the-art methods both qualitatively and quantitatively.

Defending Against Malicious Behaviors in Federated Learning with Blockchain

Jul 02, 2023

Nanqing Dong, Zhipeng Wang, Jiahao Sun, Michael Kampffmeyer, Yizhe Wen, Shuoying Zhang, William Knottenbelt, Eric Xing

Abstract:In the era of deep learning, federated learning (FL) presents a promising approach that allows multi-institutional data owners, or clients, to collaboratively train machine learning models without compromising data privacy. However, most existing FL approaches rely on a centralized server for global model aggregation, leading to a single point of failure. This makes the system vulnerable to malicious attacks when dealing with dishonest clients. In this work, we address this problem by proposing a secure and reliable FL system based on blockchain and distributed ledger technology. Our system incorporates a peer-to-peer voting mechanism and a reward-and-slash mechanism, which are powered by on-chain smart contracts, to detect and deter malicious behaviors. Both theoretical and empirical analyses are presented to demonstrate the effectiveness of the proposed approach, showing that our framework is robust against malicious client-side behaviors.

Forest Parameter Prediction by Multiobjective Deep Learning of Regression Models Trained with Pseudo-Target Imputation

Jun 19, 2023

Sara Björk, Stian N. Anfinsen, Michael Kampffmeyer, Erik Næsset, Terje Gobakken, Lennart Noordermeer

Abstract:In prediction of forest parameters with data from remote sensing (RS), regression models have traditionally been trained on a small sample of ground reference data. This paper proposes to impute this sample of true prediction targets with data from an existing RS-based prediction map that we consider as pseudo-targets. This substantially increases the amount of target training data and leverages the use of deep learning (DL) for semi-supervised regression modelling. We use prediction maps constructed from airborne laser scanning (ALS) data to provide accurate pseudo-targets and free data from Sentinel-1's C-band synthetic aperture radar (SAR) as regressors. A modified U-Net architecture is adapted with a selection of different training objectives. We demonstrate that when a judicious combination of loss functions is used, the semi-supervised imputation strategy produces results that surpass traditional ALS-based regression models, even though Sentinel-1 data are considered inferior for forest monitoring. These results are consistent for experiments on above-ground biomass prediction in Tanzania and stem volume prediction in Norway, representing a diversity in parameters and forest types that emphasises the robustness of the approach.

* Submitted to IEEE Transactions on Geoscience and Remote Sensing

Self-Supervised Few-Shot Learning for Ischemic Stroke Lesion Segmentation

Mar 16, 2023

Luca Tomasetti, Stine Hansen, Mahdieh Khanmohammadi, Kjersti Engan, Liv Jorunn Høllesli, Kathinka Dæhli Kurz, Michael Kampffmeyer

Abstract:Precise ischemic lesion segmentation plays an essential role in improving diagnosis and treatment planning for ischemic stroke, one of the prevalent diseases with the highest mortality rate. While numerous deep neural network approaches have recently been proposed to tackle this problem, these methods require large amounts of annotated regions during training, which can be impractical in the medical domain where annotated data is scarce. As a remedy, we present a prototypical few-shot segmentation approach for ischemic lesion segmentation using only one annotated sample during training. The proposed approach leverages a novel self-supervised training mechanism that is tailored to the task of ischemic stroke lesion segmentation by exploiting color-coded parametric maps generated from Computed Tomography Perfusion scans. We illustrate the benefits of our proposed training mechanism, leading to considerable improvements in performance in the few-shot setting. Given a single annotated patient, an average Dice score of 0.58 is achieved for the segmentation of ischemic lesions.
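
For readers unfamiliar with the metric reported above, the Dice score between a predicted and a reference binary mask is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for flattened masks of 0s and 1s. The function name and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not details taken from the paper.

```python
def dice_score(pred, target):
    """Dice overlap between two flat binary masks given as lists of 0/1."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks are empty: treated as perfect agreement (assumed convention).
        return 1.0
    return 2.0 * intersection / total


pred = [1, 1, 0, 0]
target = [1, 0, 0, 0]
score = dice_score(pred, target)  # 2 * 1 / (2 + 1) = 0.666...
```

A score of 1.0 means the masks coincide exactly and 0.0 means they do not overlap at all.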

Towards Hard-pose Virtual Try-on via 3D-aware Global Correspondence Learning

Nov 25, 2022

Zaiyu Huang, Hanhui Li, Zhenyu Xie, Michael Kampffmeyer, Qingling Cai, Xiaodan Liang

Abstract:In this paper, we target image-based person-to-person virtual try-on in the presence of diverse poses and large viewpoint variations. Existing methods are restricted in this setting as they estimate garment warping flows mainly based on 2D poses and appearance, which omits the geometric prior of the 3D human body shape. Moreover, current garment warping methods are confined to localized regions, which makes them ineffective in capturing long-range dependencies and results in inferior flows with artifacts. To tackle these issues, we present 3D-aware global correspondences, which are reliable flows that jointly encode global semantic correlations, local deformations, and geometric priors of 3D human bodies. Particularly, given an image pair depicting the source and target person, (a) we first obtain their pose-aware and high-level representations via two encoders, and introduce a coarse-to-fine decoder with multiple refinement modules to predict the pixel-wise global correspondence. (b) 3D parametric human models inferred from images are incorporated as priors to regularize the correspondence refinement process so that our flows can be 3D-aware and better handle variations of pose and viewpoint. (c) Finally, an adversarial generator takes the garment warped by the 3D-aware flow, and the image of the target person as inputs, to synthesize the photo-realistic try-on result. Extensive experiments on public benchmarks and our HardPose test set demonstrate the superiority of our method against the SOTA try-on approaches.

* 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
