Alert button
Picture for Longhui Yu

Longhui Yu

Alert button

Parameter-Efficient Orthogonal Finetuning via Butterfly Factorization

Nov 10, 2023
Weiyang Liu, Zeju Qiu, Yao Feng, Yuliang Xiu, Yuxuan Xue, Longhui Yu, Haiwen Feng, Zhen Liu, Juyeon Heo, Songyou Peng, Yandong Wen, Michael J. Black, Adrian Weller, Bernhard Schölkopf

Large foundation models are becoming ubiquitous, but training them from scratch is prohibitively expensive. Thus, efficiently adapting these powerful models to downstream tasks is increasingly important. In this paper, we study a principled finetuning paradigm -- Orthogonal Finetuning (OFT) -- for downstream task adaptation. Despite demonstrating good generalizability, OFT still uses a fairly large number of trainable parameters due to the high dimensionality of orthogonal matrices. To address this, we start by examining OFT from an information transmission perspective, and then identify a few key desiderata that enable better parameter-efficiency. Inspired by how the Cooley-Tukey fast Fourier transform algorithm enables efficient information transmission, we propose an efficient orthogonal parameterization using butterfly structures. We apply this parameterization to OFT, creating a novel parameter-efficient finetuning method, called Orthogonal Butterfly (BOFT). By subsuming OFT as a special case, BOFT introduces a generalized orthogonal finetuning framework. Finally, we conduct an extensive empirical study of adapting large vision transformers, large language models, and text-to-image diffusion models to various downstream tasks in vision and language.

* Technical Report (33 pages, 18 figures) 
Viaarxiv icon

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Sep 22, 2023
Longhui Yu, Weisen Jiang, Han Shi, Jincheng Yu, Zhengying Liu, Yu Zhang, James T. Kwok, Zhenguo Li, Adrian Weller, Weiyang Liu

Figure 1 for MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Figure 2 for MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Figure 3 for MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Figure 4 for MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Large language models (LLMs) have pushed the limits of natural language understanding and exhibited excellent problem-solving ability. Despite the great success, most existing open-source LLMs (e.g., LLaMA-2) are still far away from satisfactory for solving mathematical problem due to the complex reasoning procedures. To bridge this gap, we propose MetaMath, a fine-tuned language model that specializes in mathematical reasoning. Specifically, we start by bootstrapping mathematical questions by rewriting the question from multiple perspectives without extra knowledge, which results in a new dataset called MetaMathQA. Then we fine-tune the LLaMA-2 models on MetaMathQA. Experimental results on two popular benchmarks (i.e., GSM8K and MATH) for mathematical reasoning demonstrate that MetaMath outperforms a suite of open-source LLMs by a significant margin. Our MetaMath-7B model achieves 66.4% on GSM8K and 19.4% on MATH, exceeding the state-of-the-art models of the same size by 11.5% and 8.7%. Particularly, MetaMath-70B achieves an accuracy of 82.3% on GSM8K, slightly better than GPT-3.5-Turbo. We release the MetaMathQA dataset, the MetaMath models with different model sizes and the training code for public use.

* Technical Report, Work in Progress. Project Page: https://meta-math.github.io/ 
Viaarxiv icon

Forward-Backward Reasoning in Large Language Models for Verification

Aug 23, 2023
Weisen Jiang, Han Shi, Longhui Yu, Zhengying Liu, Yu Zhang, Zhenguo Li, James T. Kwok

Figure 1 for Forward-Backward Reasoning in Large Language Models for Verification
Figure 2 for Forward-Backward Reasoning in Large Language Models for Verification
Figure 3 for Forward-Backward Reasoning in Large Language Models for Verification
Figure 4 for Forward-Backward Reasoning in Large Language Models for Verification

Chain-of-Though (CoT) prompting has shown promising performance in various reasoning tasks. Recently, Self-Consistency \citep{wang2023selfconsistency} proposes to sample a diverse set of reasoning chains which may lead to different answers while the answer that receives the most votes is selected. In this paper, we propose a novel method to use backward reasoning in verifying candidate answers. We mask a token in the question by ${\bf x}$ and ask the LLM to predict the masked token when a candidate answer is provided by \textit{a simple template}, i.e., "\textit{\textbf{If we know the answer of the above question is \{a candidate answer\}, what is the value of unknown variable ${\bf x}$?}}" Intuitively, the LLM is expected to predict the masked token successfully if the provided candidate answer is correct. We further propose FOBAR to combine forward and backward reasoning for estimating the probability of candidate answers. We conduct extensive experiments on six data sets and three LLMs. Experimental results demonstrate that FOBAR achieves state-of-the-art performance on various reasoning benchmarks.

* Preprint 
Viaarxiv icon

Backward Reasoning in Large Language Models for Verification

Aug 15, 2023
Weisen Jiang, Han Shi, Longhui Yu, Zhengying Liu, Yu Zhang, Zhenguo Li, James T. Kwok

Figure 1 for Backward Reasoning in Large Language Models for Verification
Figure 2 for Backward Reasoning in Large Language Models for Verification
Figure 3 for Backward Reasoning in Large Language Models for Verification
Figure 4 for Backward Reasoning in Large Language Models for Verification

Chain-of-Though (CoT) prompting has shown promising performance in various reasoning tasks. Recently, Self-Consistency \citep{wang2023selfconsistency} proposes to sample a diverse set of reasoning chains which may lead to different answers while the answer that receives the most votes is selected. In this paper, we propose a novel method to use backward reasoning in verifying candidate answers. We mask a token in the question by ${\bf x}$ and ask the LLM to predict the masked token when a candidate answer is provided by \textit{a simple template}, i.e., ``\textit{\textbf{If we know the answer of the above question is \{a candidate answer\}, what is the value of unknown variable ${\bf x}$?}}'' Intuitively, the LLM is expected to predict the masked token successfully if the provided candidate answer is correct. We further propose FOBAR to combine forward and backward reasoning for estimating the probability of candidate answers. We conduct extensive experiments on six data sets and three LLMs. Experimental results demonstrate that FOBAR achieves state-of-the-art performance on various reasoning benchmarks.

* Preprint 
Viaarxiv icon

ConsistentNeRF: Enhancing Neural Radiance Fields with 3D Consistency for Sparse View Synthesis

May 18, 2023
Shoukang Hu, Kaichen Zhou, Kaiyu Li, Longhui Yu, Lanqing Hong, Tianyang Hu, Zhenguo Li, Gim Hee Lee, Ziwei Liu

Figure 1 for ConsistentNeRF: Enhancing Neural Radiance Fields with 3D Consistency for Sparse View Synthesis
Figure 2 for ConsistentNeRF: Enhancing Neural Radiance Fields with 3D Consistency for Sparse View Synthesis
Figure 3 for ConsistentNeRF: Enhancing Neural Radiance Fields with 3D Consistency for Sparse View Synthesis
Figure 4 for ConsistentNeRF: Enhancing Neural Radiance Fields with 3D Consistency for Sparse View Synthesis

Neural Radiance Fields (NeRF) has demonstrated remarkable 3D reconstruction capabilities with dense view images. However, its performance significantly deteriorates under sparse view settings. We observe that learning the 3D consistency of pixels among different views is crucial for improving reconstruction quality in such cases. In this paper, we propose ConsistentNeRF, a method that leverages depth information to regularize both multi-view and single-view 3D consistency among pixels. Specifically, ConsistentNeRF employs depth-derived geometry information and a depth-invariant loss to concentrate on pixels that exhibit 3D correspondence and maintain consistent depth relationships. Extensive experiments on recent representative works reveal that our approach can considerably enhance model performance in sparse view conditions, achieving improvements of up to 94% in PSNR, 76% in SSIM, and 31% in LPIPS compared to the vanilla baselines across various benchmarks, including DTU, NeRF Synthetic, and LLFF.

* https://github.com/skhu101/ConsistentNeRF 
Viaarxiv icon

DeepVecFont-v2: Exploiting Transformers to Synthesize Vector Fonts with Higher Quality

Mar 25, 2023
Yuqing Wang, Yizhi Wang, Longhui Yu, Yuesheng Zhu, Zhouhui Lian

Figure 1 for DeepVecFont-v2: Exploiting Transformers to Synthesize Vector Fonts with Higher Quality
Figure 2 for DeepVecFont-v2: Exploiting Transformers to Synthesize Vector Fonts with Higher Quality
Figure 3 for DeepVecFont-v2: Exploiting Transformers to Synthesize Vector Fonts with Higher Quality
Figure 4 for DeepVecFont-v2: Exploiting Transformers to Synthesize Vector Fonts with Higher Quality

Vector font synthesis is a challenging and ongoing problem in the fields of Computer Vision and Computer Graphics. The recently-proposed DeepVecFont achieved state-of-the-art performance by exploiting information of both the image and sequence modalities of vector fonts. However, it has limited capability for handling long sequence data and heavily relies on an image-guided outline refinement post-processing. Thus, vector glyphs synthesized by DeepVecFont still often contain some distortions and artifacts and cannot rival human-designed results. To address the above problems, this paper proposes an enhanced version of DeepVecFont mainly by making the following three novel technical contributions. First, we adopt Transformers instead of RNNs to process sequential data and design a relaxation representation for vector outlines, markedly improving the model's capability and stability of synthesizing long and complex outlines. Second, we propose to sample auxiliary points in addition to control points to precisely align the generated and target B\'ezier curves or lines. Finally, to alleviate error accumulation in the sequential generation process, we develop a context-based self-refinement module based on another Transformer-based decoder to remove artifacts in the initially synthesized glyphs. Both qualitative and quantitative results demonstrate that the proposed method effectively resolves those intrinsic problems of the original DeepVecFont and outperforms existing approaches in generating English and Chinese vector fonts with complicated structures and diverse styles.

* Accepted by CVPR 2023. Code: https://github.com/yizhiwang96/deepvecfont-v2 
Viaarxiv icon

Generalizing and Decoupling Neural Collapse via Hyperspherical Uniformity Gap

Mar 11, 2023
Weiyang Liu, Longhui Yu, Adrian Weller, Bernhard Schölkopf

Figure 1 for Generalizing and Decoupling Neural Collapse via Hyperspherical Uniformity Gap
Figure 2 for Generalizing and Decoupling Neural Collapse via Hyperspherical Uniformity Gap
Figure 3 for Generalizing and Decoupling Neural Collapse via Hyperspherical Uniformity Gap
Figure 4 for Generalizing and Decoupling Neural Collapse via Hyperspherical Uniformity Gap

The neural collapse (NC) phenomenon describes an underlying geometric symmetry for deep neural networks, where both deeply learned features and classifiers converge to a simplex equiangular tight frame. It has been shown that both cross-entropy loss and mean square error can provably lead to NC. We remove NC's key assumption on the feature dimension and the number of classes, and then present a generalized neural collapse (GNC) hypothesis that effectively subsumes the original NC. Inspired by how NC characterizes the training target of neural networks, we decouple GNC into two objectives: minimal intra-class variability and maximal inter-class separability. We then use hyperspherical uniformity (which characterizes the degree of uniformity on the unit hypersphere) as a unified framework to quantify these two objectives. Finally, we propose a general objective -- hyperspherical uniformity gap (HUG), which is defined by the difference between inter-class and intra-class hyperspherical uniformity. HUG not only provably converges to GNC, but also decouples GNC into two separate objectives. Unlike cross-entropy loss that couples intra-class compactness and inter-class separability, HUG enjoys more flexibility and serves as a good alternative loss function. Empirical results show that HUG works well in terms of generalization and robustness.

* Published in ICLR 2023 
Viaarxiv icon

Dual-Curriculum Teacher for Domain-Inconsistent Object Detection in Autonomous Driving

Oct 17, 2022
Longhui Yu, Yifan Zhang, Lanqing Hong, Fei Chen, Zhenguo Li

Figure 1 for Dual-Curriculum Teacher for Domain-Inconsistent Object Detection in Autonomous Driving
Figure 2 for Dual-Curriculum Teacher for Domain-Inconsistent Object Detection in Autonomous Driving
Figure 3 for Dual-Curriculum Teacher for Domain-Inconsistent Object Detection in Autonomous Driving
Figure 4 for Dual-Curriculum Teacher for Domain-Inconsistent Object Detection in Autonomous Driving

Object detection for autonomous vehicles has received increasing attention in recent years, where labeled data are often expensive while unlabeled data can be collected readily, calling for research on semi-supervised learning for this area. Existing semi-supervised object detection (SSOD) methods usually assume that the labeled and unlabeled data come from the same data distribution. In autonomous driving, however, data are usually collected from different scenarios, such as different weather conditions or different times in a day. Motivated by this, we study a novel but challenging domain inconsistent SSOD problem. It involves two kinds of distribution shifts among different domains, including (1) data distribution discrepancy, and (2) class distribution shifts, making existing SSOD methods suffer from inaccurate pseudo-labels and hurting model performance. To address this problem, we propose a novel method, namely Dual-Curriculum Teacher (DucTeacher). Specifically, DucTeacher consists of two curriculums, i.e., (1) domain evolving curriculum seeks to learn from the data progressively to handle data distribution discrepancy by estimating the similarity between domains, and (2) distribution matching curriculum seeks to estimate the class distribution for each unlabeled domain to handle class distribution shifts. In this way, DucTeacher can calibrate biased pseudo-labels and handle the domain-inconsistent SSOD problem effectively. DucTeacher shows its advantages on SODA10M, the largest public semi-supervised autonomous driving dataset, and COCO, a widely used SSOD benchmark. Experiments show that DucTeacher achieves new state-of-the-art performance on SODA10M with 2.2 mAP improvement and on COCO with 0.8 mAP improvement.

* Accepted at BMVC 2022 
Viaarxiv icon

Continual Learning by Modeling Intra-Class Variation

Oct 11, 2022
Longhui Yu, Tianyang Hu, Lanqing Hong, Zhen Liu, Adrian Weller, Weiyang Liu

Figure 1 for Continual Learning by Modeling Intra-Class Variation
Figure 2 for Continual Learning by Modeling Intra-Class Variation
Figure 3 for Continual Learning by Modeling Intra-Class Variation
Figure 4 for Continual Learning by Modeling Intra-Class Variation

It has been observed that neural networks perform poorly when the data or tasks are presented sequentially. Unlike humans, neural networks suffer greatly from catastrophic forgetting, making it impossible to perform life-long learning. To address this issue, memory-based continual learning has been actively studied and stands out as one of the best-performing methods. We examine memory-based continual learning and identify that large variation in the representation space is crucial for avoiding catastrophic forgetting. Motivated by this, we propose to diversify representations by using two types of perturbations: model-agnostic variation (i.e., the variation is generated without the knowledge of the learned neural network) and model-based variation (i.e., the variation is conditioned on the learned neural network). We demonstrate that enlarging representational variation serves as a general principle to improve continual learning. Finally, we perform empirical studies which demonstrate that our method, as a simple plug-and-play component, can consistently improve a number of memory-based continual learning methods by a large margin.

* Technical Report (23 pages, 13 figures) 
Viaarxiv icon