
Haoxiang Wang


Invariant-Feature Subspace Recovery: A New Class of Provable Domain Generalization Algorithms

Nov 02, 2023
Haoxiang Wang, Gargi Balasubramaniam, Haozhe Si, Bo Li, Han Zhao

Domain generalization asks for models trained over a set of training environments to generalize well in unseen test environments. Recently, a series of algorithms such as Invariant Risk Minimization (IRM) have been proposed for domain generalization. However, Rosenfeld et al. (2021) show that in a simple linear data model, even if non-convexity issues are ignored, IRM and its extensions cannot generalize to unseen environments with fewer than $d_s+1$ training environments, where $d_s$ is the dimension of the spurious-feature subspace. In this work, we propose Invariant-feature Subspace Recovery (ISR): a new class of algorithms to achieve provable domain generalization across the settings of classification and regression problems. First, in the binary classification setup of Rosenfeld et al. (2021), we show that our first algorithm, ISR-Mean, can identify the subspace spanned by invariant features from the first-order moments of the class-conditional distributions, and achieve provable domain generalization with $d_s+1$ training environments. Our second algorithm, ISR-Cov, further reduces the required number of training environments to $O(1)$ using the information of second-order moments. Notably, unlike IRM, our algorithms bypass non-convexity issues and enjoy global convergence guarantees. Next, we extend ISR-Mean to the more general setting of multi-class classification and propose ISR-Multiclass, which leverages class information and provably recovers the invariant-feature subspace with $\lceil d_s/k\rceil+1$ training environments for $k$-class classification. Finally, for regression problems, we propose ISR-Regression, which can identify the invariant-feature subspace with $d_s+1$ training environments. Empirically, we demonstrate the superior performance of our ISR algorithms on synthetic benchmarks. Further, ISR can be used as a post-processing method for feature extractors such as neural nets.
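
To make the moment-based idea behind ISR-Mean concrete, here is a minimal NumPy sketch (not the authors' implementation; the function name, the fixed choice of class $y=1$, and the plain SVD are our simplifications): after estimating one class-conditional mean per environment, the directions along which the means vary across environments are spurious, and the remaining, least-varying directions span the recovered invariant-feature subspace.

```python
import numpy as np

def isr_mean(features, labels, envs, d_spu):
    """Minimal ISR-Mean sketch: recover the invariant-feature subspace from
    first-order moments (class-conditional means) across environments.
    features: (n, d) array; labels: (n,) in {0, 1}; envs: (n,) environment ids;
    d_spu: assumed dimension d_s of the spurious-feature subspace."""
    env_ids = np.unique(envs)
    # One class-conditional mean (class y=1) per environment: shape (E, d).
    mus = np.stack([features[(envs == e) & (labels == 1)].mean(axis=0)
                    for e in env_ids])
    mus = mus - mus.mean(axis=0, keepdims=True)   # center across environments
    # Directions of large variation across environments are spurious; the
    # invariant subspace is spanned by the smallest singular directions.
    _, _, vt = np.linalg.svd(mus, full_matrices=True)
    p_inv = vt[d_spu:]                            # (d - d_spu, d) basis rows
    return features @ p_inv.T, p_inv              # projected features, basis
```

A downstream classifier (e.g. logistic regression) would then be fit on the projected features, which is also how ISR can serve as a post-processing step on top of a neural feature extractor.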

* Submitted to JMLR. This journal version significantly extends our ICML 2022 paper, arXiv:2201.12919 

SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding

Oct 23, 2023
Haoxiang Wang, Pavan Kumar Anasosalu Vasu, Fartash Faghri, Raviteja Vemulapalli, Mehrdad Farajtabar, Sachin Mehta, Mohammad Rastegari, Oncel Tuzel, Hadi Pouransari

The landscape of publicly available vision foundation models (VFMs), such as CLIP and Segment Anything Model (SAM), is expanding rapidly. VFMs are endowed with distinct capabilities stemming from their pre-training objectives. For instance, CLIP excels in semantic understanding, while SAM specializes in spatial understanding for segmentation. In this work, we introduce a simple recipe to efficiently merge VFMs into a unified model that assimilates their expertise. Our proposed method integrates multi-task learning, continual learning techniques, and teacher-student distillation. This strategy entails significantly less computational cost compared to traditional multi-task training from scratch. Additionally, it only demands a small fraction of the pre-training datasets that were initially used to train individual models. By applying our method to SAM and CLIP, we derive SAM-CLIP: a unified model that amalgamates the strengths of SAM and CLIP into a single backbone, making it apt for edge device applications. We show that SAM-CLIP learns richer visual representations, equipped with both localization and semantic features, suitable for a broad range of vision tasks. SAM-CLIP obtains improved performance on several head probing tasks when compared with SAM and CLIP. We further show that SAM-CLIP not only retains the foundational strengths of its precursor models but also introduces synergistic functionalities, most notably in zero-shot semantic segmentation, where SAM-CLIP establishes new state-of-the-art results on 5 benchmarks. It outperforms previous models that are specifically designed for this task by a large margin, including +6.8% and +5.9% mean IoU improvement on Pascal-VOC and COCO-Stuff datasets, respectively.
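
As a rough sketch of how a merged-distillation objective of this kind can be wired up (the loss forms, weights, and tensor names below are our illustrative assumptions, not the paper's exact recipe):

```python
import torch
import torch.nn.functional as F

def merged_distillation_loss(student_clip_emb, teacher_clip_emb,
                             student_sam_feat, teacher_sam_feat,
                             w_clip=0.5, w_sam=0.5):
    """Hypothetical multi-teacher distillation objective: one shared student
    backbone with two heads, each matched to a frozen teacher (CLIP and SAM)."""
    # CLIP head: match image embeddings via cosine distance.
    loss_clip = 1.0 - F.cosine_similarity(
        student_clip_emb, teacher_clip_emb, dim=-1).mean()
    # SAM head: match dense spatial features via mean-squared error.
    loss_sam = F.mse_loss(student_sam_feat, teacher_sam_feat)
    return w_clip * loss_clip + w_sam * loss_sam
```

Per the abstract, such distillation is combined with multi-task and continual-learning techniques over only a small fraction of the teachers' original pre-training data.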


Gradual Domain Adaptation: Theory and Algorithms

Oct 20, 2023
Yifei He, Haoxiang Wang, Bo Li, Han Zhao

Unsupervised domain adaptation (UDA) adapts a model from a labeled source domain to an unlabeled target domain in a one-off way. Though widely applied, UDA faces a great challenge whenever the distribution shift between the source and target domains is large. Gradual domain adaptation (GDA) mitigates this limitation by using intermediate domains to gradually adapt from the source to the target domain. In this work, we first theoretically analyze gradual self-training, a popular GDA algorithm, and provide a significantly improved generalization bound compared with Kumar et al. (2020). Our theoretical analysis leads to an interesting insight: to minimize the generalization error on the target domain, the sequence of intermediate domains should be placed uniformly along the Wasserstein geodesic between the source and target domains. This insight is particularly useful when intermediate domains are missing or scarce, which is often the case in real-world applications. Based on the insight, we propose $\textbf{G}$enerative Gradual D$\textbf{O}$main $\textbf{A}$daptation with Optimal $\textbf{T}$ransport (GOAT), an algorithmic framework that can generate intermediate domains in a data-dependent way. More concretely, we first generate intermediate domains along the Wasserstein geodesic between two given consecutive domains in a feature space, then apply gradual self-training to adapt the source-trained classifier to the target along the sequence of intermediate domains. Empirically, we demonstrate that our GOAT framework can improve the performance of standard GDA when the given intermediate domains are scarce, significantly broadening the real-world application scenarios of GDA. Our code is available at https://github.com/yifei-he/GOAT.
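
The geodesic-interpolation step can be sketched in a few lines for the special case of two equal-size empirical samples, where the optimal transport plan reduces to an assignment and the Wasserstein geodesic to displacement interpolation (function names and the assignment-based solver are our simplifications; GOAT itself operates in a learned feature space):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist

def generate_intermediate_domains(src_feats, tgt_feats, num_domains):
    """Synthesize intermediate domains along the Wasserstein geodesic between
    two consecutive domains, given equal-size (n, d) feature samples."""
    cost = cdist(src_feats, tgt_feats, metric="sqeuclidean")
    row, col = linear_sum_assignment(cost)        # OT plan as an assignment
    ts = np.linspace(0.0, 1.0, num_domains + 2)[1:-1]   # interior time steps
    # Displacement interpolation: move each matched pair a fraction t of the way.
    return [(1 - t) * src_feats[row] + t * tgt_feats[col] for t in ts]
```

Gradual self-training would then pseudo-label and re-train along this synthesized sequence, carrying the source-trained classifier toward the target.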

* arXiv admin note: substantial text overlap with arXiv:2204.08200 

Speciality vs Generality: An Empirical Study on Catastrophic Forgetting in Fine-tuning Foundation Models

Sep 12, 2023
Yong Lin, Lu Tan, Hangyu Lin, Zeming Zheng, Renjie Pi, Jipeng Zhang, Shizhe Diao, Haoxiang Wang, Han Zhao, Yuan Yao, Tong Zhang

Foundation models, including Vision Language Models (VLMs) and Large Language Models (LLMs), possess the $generality$ to handle diverse distributions and tasks, which stems from their extensive pre-training datasets. Fine-tuning a foundation model is a common practice to enhance task performance or align the model's behavior with human expectations, allowing it to gain $speciality$. However, the small datasets used for fine-tuning may not adequately cover the diverse distributions and tasks encountered during pre-training. Consequently, the pursuit of speciality during fine-tuning can lead to a loss of $generality$ in the model, which is related to catastrophic forgetting (CF) in deep learning. In this study, we demonstrate this phenomenon in both VLMs and LLMs. For instance, fine-tuning VLMs like CLIP on ImageNet results in a loss of generality in handling diverse distributions, and fine-tuning LLMs like Galactica in the medical domain leads to a loss in instruction following and common sense. To address the trade-off between speciality and generality, we investigate multiple regularization methods from continual learning; the weight averaging method (Wise-FT) from out-of-distribution (OOD) generalization, which interpolates parameters between pre-trained and fine-tuned models; and parameter-efficient fine-tuning methods like Low-Rank Adaptation (LoRA). Our findings show that both the continual learning and Wise-FT methods effectively mitigate the loss of generality, with Wise-FT exhibiting the strongest performance in balancing speciality and generality.
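
The weight-averaging baseline (Wise-FT) that performs best in this study is simple to state: linearly interpolate, parameter by parameter, between the pre-trained and fine-tuned checkpoints. A minimal sketch, assuming two matching state dicts with floating-point parameters:

```python
import torch

def wise_ft(pretrained_state, finetuned_state, alpha=0.5):
    """Weight-space interpolation between a pre-trained and a fine-tuned
    checkpoint. alpha=0 recovers the general pre-trained model, alpha=1 the
    specialized fine-tuned one; intermediate values trade the two off."""
    return {k: (1 - alpha) * pretrained_state[k] + alpha * finetuned_state[k]
            for k in pretrained_state}

# Usage sketch: model.load_state_dict(wise_ft(pre_sd, ft_sd, alpha=0.5))
```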

* 30 Pages 

ImmersiveNeRF: Hybrid Radiance Fields for Unbounded Immersive Light Field Reconstruction

Sep 04, 2023
Xiaohang Yu, Haoxiang Wang, Yuqi Han, Lei Yang, Tao Yu, Qionghai Dai

This paper proposes a hybrid radiance field representation for unbounded immersive light field reconstruction that supports high-quality rendering and aggressive view extrapolation. The key idea is to first formally separate the foreground from the background and then adaptively balance the learning of the two during training. To this end, we represent the foreground and background as two separate radiance fields with two different spatial mapping strategies. We further propose an adaptive sampling strategy and a segmentation regularizer for clearer segmentation and more robust convergence. Finally, we contribute a novel immersive light field dataset, named THUImmersive, which captures multiple neighboring viewpoints of the same scene and thus has the potential to achieve 6DoF immersive rendering over a much larger space than existing datasets, to stimulate research and AR/VR applications in the immersive light field domain. Extensive experiments demonstrate the strong performance of our method for unbounded immersive light field reconstruction.
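
The abstract does not spell out the two spatial mappings; one common choice for the unbounded background in hybrid representations is a sphere-based contraction that squashes far-away points into a bounded shell (a mip-NeRF-360-style map; whether ImmersiveNeRF uses exactly this form is an assumption on our part):

```python
import torch

def contract_background(x, radius=1.0):
    """Spatial mapping for unbounded backgrounds: points inside a sphere of
    the given radius are kept as-is; points outside are contracted into the
    shell [radius, 2 * radius], so the whole scene fits in a bounded volume."""
    norm = x.norm(dim=-1, keepdim=True).clamp(min=1e-8)
    contracted = (2 * radius - radius**2 / norm) * (x / norm)
    return torch.where(norm <= radius, x, contracted)
```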


Federated Learning with Classifier Shift for Class Imbalance

Apr 11, 2023
Yunheng Shen, Haoxiang Wang, Hairong Lv

Federated learning aims to learn a global model collaboratively while the training data belong to different clients and are not allowed to be exchanged. However, statistical heterogeneity across clients' non-IID data, such as class imbalance in classification, causes client drift and significantly reduces the performance of the global model. This paper proposes a simple and effective approach named FedShift, which adds a shift to the classifier output during the local training phase to alleviate the negative impact of class imbalance. We theoretically prove that the classifier shift in FedShift makes the local optimum consistent with the global optimum and ensures the convergence of the algorithm. Moreover, our experiments indicate that FedShift significantly outperforms other state-of-the-art federated learning approaches on various datasets in terms of accuracy and communication efficiency.
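
A minimal sketch of the classifier-shift idea during local training (the log-prior form of the shift below is our illustrative assumption, not necessarily FedShift's exact formula):

```python
import torch
import torch.nn.functional as F

def shifted_cross_entropy(logits, targets, client_class_counts, eps=1e-8):
    """Local-training loss with a per-class shift added to the classifier
    output, derived from this client's (imbalanced) class distribution, so
    the local optimum is not dragged toward the client's own class prior.
    client_class_counts: float tensor of per-class sample counts."""
    prior = client_class_counts / client_class_counts.sum()
    shifted_logits = logits + torch.log(prior + eps)  # additive classifier shift
    return F.cross_entropy(shifted_logits, targets)
```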


AuE-IPA: An AU Engagement Based Infant Pain Assessment Method

Dec 09, 2022
Mingze Sun, Haoxiang Wang, Wei Yao, Jiawang Liu

Recent studies have found that pain in infancy has a significant impact on infant development, including psychological problems, possible brain injury, and pain sensitivity in adulthood. However, due to the lack of specialists and the fact that infants cannot verbally express their experience of pain, infant pain is difficult to assess. Most existing infant pain assessment systems directly apply adult methods to infants, ignoring the differences between infant and adult expressions. Meanwhile, as the study of the Facial Action Coding System continues to advance, the use of action units (AUs) opens up new possibilities for expression recognition and pain assessment. In this paper, a novel AuE-IPA method is proposed for assessing infant pain by leveraging the different engagement levels of AUs. First, the different engagement levels of AUs in infant pain are revealed by analyzing the class activation maps of an end-to-end pain assessment model. The intensities of the top-engaged AUs are then used in a regression model to achieve automatic infant pain assessment. The proposed model is trained and evaluated on the YouTube Immunization, YouTube Blood Test, and iCOPEVid datasets. The experimental results show that our AuE-IPA method is more applicable to infants and possesses stronger generalization ability than the end-to-end assessment model and the classic PSPI metric.
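
The final assessment stage reduces to a small regression problem over a handful of AU intensities. A sketch (the AU index set and the ridge regressor below are placeholders; the actual top-engaged AUs come from the paper's class-activation analysis):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical "top-engaged" AU column indices; the real set is determined by
# analyzing the class activation maps of the end-to-end pain model.
TOP_AUS = [4, 6, 9, 20]

def fit_pain_regressor(au_intensities, pain_scores):
    """Regress a pain score from the intensities of the top-engaged AUs.
    au_intensities: (n_samples, n_aus) array of per-sample AU intensities."""
    X = au_intensities[:, TOP_AUS]
    return Ridge(alpha=1.0).fit(X, pain_scores)
```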


Predicting Properties of Quantum Systems with Conditional Generative Models

Nov 30, 2022
Haoxiang Wang, Maurice Weber, Josh Izaac, Cedric Yen-Yu Lin

Machine learning has emerged recently as a powerful tool for predicting properties of quantum many-body systems. For many ground states of gapped Hamiltonians, generative models can learn from measurements of a single quantum state to reconstruct the state accurately enough to predict local observables. Alternatively, kernel methods can predict local observables by learning from measurements on different but related states. In this work, we combine the benefits of both approaches and propose the use of conditional generative models to simultaneously represent a family of states, by learning shared structures of different quantum states from measurements. The trained model allows us to predict arbitrary local properties of ground states, even for states not present in the training data, and without necessitating further training for new observables. We numerically validate our approach (with simulations of up to 45 qubits) for two quantum many-body problems, 2D random Heisenberg models and Rydberg atom systems.
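
As a toy illustration of what a conditional generative model over measurements can look like (a didactic stand-in, not the paper's architecture): an autoregressive model over binary measurement outcomes, conditioned on a vector of physical parameters such as the couplings of a Heisenberg model.

```python
import torch
import torch.nn as nn

class ConditionalMeasurementModel(nn.Module):
    """Toy conditional autoregressive model: predicts measurement outcomes
    qubit by qubit, conditioned on a vector of system parameters."""
    def __init__(self, num_qubits, cond_dim, hidden=64):
        super().__init__()
        self.num_qubits = num_qubits
        self.rnn = nn.GRU(input_size=1 + cond_dim, hidden_size=hidden,
                          batch_first=True)
        self.head = nn.Linear(hidden, 1)  # logit of P(outcome_i = 1 | past, cond)

    def forward(self, outcomes, cond):
        # outcomes: (B, n) float tensor in {0, 1}; cond: (B, cond_dim).
        prev = torch.cat([torch.zeros_like(outcomes[:, :1]),
                          outcomes[:, :-1]], dim=1)
        inp = torch.cat([prev.unsqueeze(-1),
                         cond.unsqueeze(1).expand(-1, self.num_qubits, -1)],
                        dim=-1)
        h, _ = self.rnn(inp)
        return self.head(h).squeeze(-1)   # per-qubit logits for a BCE loss
```

Such a model would be trained on measured bitstrings with a binary cross-entropy loss; local observables could then be estimated from samples drawn for new, unseen parameter vectors.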

* 12 pages, 14 figures, 5 pages appendix. Open-source code is available at https://github.com/PennyLaneAI/generative-quantum-states 

Attention Based Relation Network for Facial Action Units Recognition

Oct 23, 2022
Yao Wei, Haoxiang Wang, Mingze Sun, Jiawang Liu

Facial action unit (AU) recognition is essential to facial expression analysis. Since there are strong positive or negative correlations between AUs, some existing AU recognition works have focused on modeling AU relations. However, previous relationship-based approaches typically embed predefined rules into their models and ignore how AU relations vary across different populations. In this paper, we propose a novel Attention Based Relation Network (ABRNet) for AU recognition, which can automatically capture AU relations without unnecessary, or even misleading, predefined rules. ABRNet uses several relation learning layers to automatically capture different AU relations. The learned AU relation features are then fed into a self-attention fusion module, which refines individual AU features with attention weights to enhance feature robustness. Furthermore, we propose an AU relation dropout strategy and an AU relation loss (AUR-Loss) to better model AU relations, which can further improve AU recognition. Extensive experiments show that our approach achieves state-of-the-art performance on the DISFA and DISFA+ datasets.
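
The self-attention fusion step can be sketched as follows (layer sizes and the use of nn.MultiheadAttention are our illustrative choices, not ABRNet's exact architecture):

```python
import torch
import torch.nn as nn

class SelfAttentionFusion(nn.Module):
    """Sketch of attention-based fusion: treat the per-AU features produced
    by the relation-learning layers as a sequence, let them attend to each
    other, and fuse the result with the input via a residual connection."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, au_feats):              # (B, num_AUs, dim)
        refined, _ = self.attn(au_feats, au_feats, au_feats)
        return self.norm(au_feats + refined)  # residual fusion
```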


Future Gradient Descent for Adapting the Temporal Shifting Data Distribution in Online Recommendation Systems

Sep 02, 2022
Mao Ye, Ruichen Jiang, Haoxiang Wang, Dhruv Choudhary, Xiaocong Du, Bhargav Bhushanam, Aryan Mokhtari, Arun Kejariwal, Qiang Liu

One of the key challenges of learning an online recommendation model is the temporal domain shift, which causes a mismatch between the training and testing data distributions and hence domain generalization error. To overcome this, we propose to learn a meta future gradient generator that forecasts the gradient information of the future data distribution for training, so that the recommendation model can be trained as if we were able to look ahead at the future of its deployment. Compared with Batch Update, a widely used paradigm, our theory suggests that the proposed algorithm achieves a smaller temporal domain generalization error, measured by a gradient variation term in a local regret. We demonstrate the empirical advantage through comparisons with various representative baselines.
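
To show where a gradient forecast plugs into training (the paper learns a meta generator for this; the closed-form trend extrapolation below is only a toy stand-in):

```python
import torch

def forecast_future_gradient(g_prev, g_curr, decay=0.5):
    """Toy stand-in for a future-gradient generator: extrapolate the next
    gradient from the recent trend. The actual method *learns* this map."""
    return g_curr + decay * (g_curr - g_prev)

# Sketch of a training step using the forecast in place of the current gradient:
# for p, g_hat in zip(model.parameters(), forecasted_grads):
#     p.data.add_(g_hat, alpha=-lr)
```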
