Ye Yuan

ParetoFlow: Guided Flows in Multi-Objective Optimization

Dec 04, 2024

Ye Yuan, Can Chen, Christopher Pal, Xue Liu

Abstract: In offline multi-objective optimization (MOO), we leverage an offline dataset of designs and their associated labels to simultaneously minimize multiple objectives. This setting mirrors complex real-world problems more closely than single-objective optimization does. Recent works mainly employ evolutionary algorithms and Bayesian optimization, giving limited attention to the generative modeling capabilities inherent in such data. In this study, we explore generative modeling in offline MOO through flow matching, which is noted for its effectiveness and efficiency. We introduce ParetoFlow, which is specifically designed to guide flow sampling to approximate the Pareto front. Traditional predictor (classifier) guidance is inadequate for this purpose because it models only a single objective. In response, we propose a multi-objective predictor guidance module that assigns each sample a weight vector, representing a weighted distribution across multiple objective predictions. These weights uniformly cover the entire objective space, effectively directing sample generation towards the Pareto front, and a local filtering scheme is introduced to address non-convex Pareto fronts. Since distributions with similar weights tend to generate similar samples, we introduce a neighboring evolution module to foster knowledge sharing among neighboring distributions. This module generates offspring from these distributions and selects the most promising one for the next iteration. Our method achieves state-of-the-art performance across various tasks.
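As an illustration of weight vectors that "uniformly cover the entire objective space", here is a minimal sketch that enumerates evenly spaced weights on the simplex (a Das-Dennis style lattice, as used in decomposition-based MOO) and scalarizes per-objective predictions with one weight vector. The lattice construction, the linear weighted sum, and the helper names `simplex_lattice_weights` and `scalarize` are assumptions for illustration; the abstract does not say which construction ParetoFlow uses, and the guidance step inside the flow sampler and the local filtering scheme are not shown.

```python
from itertools import combinations
from typing import List


def simplex_lattice_weights(num_objectives: int, divisions: int) -> List[List[float]]:
    """Enumerate weight vectors whose entries are multiples of 1/divisions and sum to 1.

    Stars and bars: place (num_objectives - 1) bars among
    divisions + num_objectives - 1 slots; the gaps between bars are the counts.
    """
    slots = divisions + num_objectives - 1
    weights = []
    for bars in combinations(range(slots), num_objectives - 1):
        counts, prev = [], -1
        for b in bars:
            counts.append(b - prev - 1)  # stars between the previous bar and this one
            prev = b
        counts.append(slots - prev - 1)  # stars after the last bar
        weights.append([c / divisions for c in counts])
    return weights


def scalarize(weight: List[float], predictions: List[float]) -> float:
    """Weighted sum of per-objective predictions (all objectives minimized)."""
    return sum(w * p for w, p in zip(weight, predictions))


if __name__ == "__main__":
    for w in simplex_lattice_weights(num_objectives=2, divisions=4):
        print(w, scalarize(w, [2.0, 4.0]))
```

With 3 objectives and 4 divisions this yields 15 weight vectors; a finer `divisions` gives denser coverage of the objective space at the cost of more guided distributions to sample from.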

Harmon: Whole-Body Motion Generation of Humanoid Robots from Language Descriptions

Oct 16, 2024

Zhenyu Jiang, Yuqi Xie, Jinhan Li, Ye Yuan, Yifeng Zhu, Yuke Zhu

Abstract: Humanoid robots, with their human-like embodiment, have the potential to integrate seamlessly into human environments. Critical to their coexistence and cooperation with humans is the ability to understand natural language communication and exhibit human-like behaviors. This work focuses on generating diverse whole-body motions for humanoid robots from language descriptions. We leverage human motion priors from extensive human motion datasets to initialize humanoid motions and employ the commonsense reasoning capabilities of Vision Language Models (VLMs) to edit and refine these motions. Our approach demonstrates the capability to produce natural, expressive, and text-aligned humanoid motions, validated through both simulated and real-world experiments. More videos can be found at https://ut-austin-rpl.github.io/Harmon/.

* Accepted for oral presentation at the 8th Annual Conference on Robot Learning. Project website: https://ut-austin-rpl.github.io/Harmon/

Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?

Oct 02, 2024

Xi Chen, Kaituo Feng, Changsheng Li, Xunhao Lai, Xiangyu Yue, Ye Yuan, Guoren Wang

Abstract: Low-rank training has emerged as a promising approach for reducing memory usage in training Large Language Models (LLMs). Previous methods either rely on decomposing weight matrices (e.g., LoRA) or seek to decompose gradient matrices (e.g., GaLore) to reduce memory consumption. However, both constrain training to a low-rank subspace, which inevitably leads to sub-optimal performance. This raises a question: is it possible to consistently preserve the low-rank constraint for memory efficiency while achieving full-rank training (i.e., training with full-rank gradients of full-rank weights) to avoid inferior outcomes? In this paper, we propose Fira, a new plug-and-play training framework for LLMs, as the first attempt to achieve this goal. First, we observe an interesting phenomenon during LLM training: the scaling impact of adaptive optimizers (e.g., Adam) on the gradient norm remains similar from low-rank to full-rank training. Based on this observation, we propose a norm-based scaling method, which uses the scaling impact of low-rank optimizers as a substitute for that of the original full-rank optimizers to enable full-rank training. In this way, we can preserve the low-rank constraint in the optimizer while achieving full-rank training for better performance. Moreover, we find that sudden gradient rises occur during optimization, potentially causing loss spikes. To address this, we further put forward a norm-growth limiter that smooths the gradient by regulating the relative increase of gradient norms. Extensive experiments on the pre-training and fine-tuning of LLMs show that Fira outperforms both LoRA and GaLore, achieving performance comparable to or even better than full-rank training.

* Code is available at: https://github.com/xichen-fy/Fira
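The two mechanisms in the abstract can be sketched on flattened vectors. Under the assumptions here, `low_rank_update` is what an adaptive optimizer such as Adam produced from `low_rank_grad` inside the low-rank subspace (not shown), `residual_grad` stands for the part of the full-rank gradient that the low-rank optimizer does not see, and the limiter caps the ratio of consecutive gradient norms at a threshold `gamma` whose default of 1.01 is only an illustrative value. All names are hypothetical; the authors' implementation is in the repository linked above.

```python
import math
from typing import List, Optional

Vector = List[float]


def l2_norm(v: Vector) -> float:
    return math.sqrt(sum(x * x for x in v))


def norm_based_scale(low_rank_grad: Vector, low_rank_update: Vector,
                     residual_grad: Vector) -> Vector:
    """Reuse the adaptive optimizer's scaling on the low-rank gradient,
    phi = ||low_rank_update|| / ||low_rank_grad||, to rescale the residual gradient."""
    g_norm = l2_norm(low_rank_grad)
    phi = l2_norm(low_rank_update) / g_norm if g_norm > 0 else 0.0
    return [phi * x for x in residual_grad]


class NormGrowthLimiter:
    """Shrink g_t whenever ||g_t|| / ||g_{t-1}|| exceeds gamma."""

    def __init__(self, gamma: float = 1.01):
        self.gamma = gamma
        self.prev_norm: Optional[float] = None

    def __call__(self, grad: Vector) -> Vector:
        norm = l2_norm(grad)
        if self.prev_norm is not None and norm > self.gamma * self.prev_norm:
            # Rescale so the norm grows by at most a factor of gamma.
            limit = self.gamma * self.prev_norm
            grad = [x * limit / norm for x in grad]
            norm = limit
        self.prev_norm = norm
        return grad
```

Because the limiter compares against the previous (already limited) norm, a sudden spike is absorbed gradually over several steps instead of being passed straight to the weights.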

Harnessing Diversity for Important Data Selection in Pretraining Large Language Models

Sep 25, 2024

Chi Zhang, Huaping Zhong, Kuan Zhang, Chengliang Chai, Rui Wang, Xinlin Zhuang, Tianyi Bai, Jiantao Qiu, Lei Cao, Ye Yuan (+2 more)

Abstract: Data selection is of great significance in pre-training large language models, given the variation in quality within the large-scale training corpora available. To this end, researchers are currently investigating the use of data influence to measure the importance of data instances, i.e., a high influence score indicates that adding an instance to the training set is likely to improve model performance. Consequently, they select the top-k instances with the highest scores. However, this approach has several limitations. (1) Computing the influence of all available data is time-consuming. (2) The selected data instances are not diverse enough, which may hinder the pre-trained model's ability to generalize effectively to various downstream tasks. In this paper, we introduce Quad, a data selection approach that considers both quality and diversity by using data influence to achieve state-of-the-art pre-training results. In particular, noting that attention layers capture extensive semantic details, we adapt accelerated iHVP (inverse Hessian-vector product) computation methods to attention layers, enhancing our ability to evaluate the influence of data, i.e., its quality. For diversity, Quad clusters the dataset so that instances within a cluster are similar and instances across clusters are diverse. For each cluster we opt to select data from, we evaluate the influence of only a sample of its instances rather than processing all of them. To determine which clusters to select, we use the classic Multi-Armed Bandit method, treating each cluster as an arm. This approach favors clusters with highly influential instances (ensuring high quality) or clusters that have been selected less frequently (ensuring diversity), thereby balancing quality and diversity well.
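The cluster-as-arm idea can be sketched with UCB1, one standard multi-armed bandit rule; whether Quad uses UCB1 or another bandit variant is an assumption here. Each pull samples a few instances from the chosen cluster and uses their mean influence as the reward, so high-influence clusters score well (quality) while rarely pulled clusters receive a larger exploration bonus (diversity). The `influence` callable is a hypothetical stand-in for the iHVP-based influence scores, and `select_clusters_ucb` and its parameters are illustrative names.

```python
import math
import random
from typing import Callable, Dict, List


def select_clusters_ucb(clusters: Dict[str, List[str]],
                        influence: Callable[[str], float],
                        rounds: int, samples_per_pull: int = 2,
                        c: float = 1.0, seed: int = 0) -> List[str]:
    """UCB1-style cluster selection: each cluster is an arm; the reward of a pull
    is the mean influence of a few instances sampled from that cluster."""
    rng = random.Random(seed)
    pulls = {name: 0 for name in clusters}
    mean_reward = {name: 0.0 for name in clusters}
    chosen = []
    for t in range(1, rounds + 1):
        def ucb(name: str) -> float:
            if pulls[name] == 0:
                return math.inf  # try every cluster at least once
            bonus = c * math.sqrt(2 * math.log(t) / pulls[name])  # favors rarely picked clusters
            return mean_reward[name] + bonus
        arm = max(clusters, key=ucb)
        batch = rng.sample(clusters[arm], min(samples_per_pull, len(clusters[arm])))
        reward = sum(influence(x) for x in batch) / len(batch)
        pulls[arm] += 1
        mean_reward[arm] += (reward - mean_reward[arm]) / pulls[arm]  # running mean
        chosen.append(arm)
    return chosen


if __name__ == "__main__":
    toy_clusters = {"a": ["a1", "a2", "a3"], "b": ["b1", "b2", "b3"]}
    toy_influence = lambda doc: 1.0 if doc.startswith("a") else 0.0  # hypothetical scores
    print(select_clusters_ucb(toy_clusters, toy_influence, rounds=10))
```

The constant `c` trades off the two goals: a larger value spreads selections across more clusters, a smaller one concentrates on the clusters with the highest observed influence.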

PSLF: A PID Controller-incorporated Second-order Latent Factor Analysis Model for Recommender System

Aug 31, 2024

Jialiang Wang, Yan Xia, Ye Yuan

Abstract: A second-order latent factor (SLF) analysis model demonstrates superior performance in graph representation learning, particularly for high-dimensional and incomplete (HDI) interaction data, by incorporating the curvature information of the loss landscape. However, its objective function is commonly bilinear and non-convex, causing the SLF model to suffer from a low convergence rate. To address this issue, this paper proposes a PID controller-incorporated SLF (PSLF) model that leverages two key strategies: a) refining learning error estimation by incorporating PID controller principles, and b) acquiring second-order information through Hessian-vector products. Experimental results on multiple HDI datasets indicate that the proposed PSLF model outperforms four state-of-the-art latent factor models based on advanced optimizers in terms of convergence rate and generalization performance.
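Strategy a) can be illustrated with the textbook PID rule applied to the prediction error of a latent factor model: the refined error combines the current error (P), the accumulated past errors (I), and the change since the previous error (D). The sketch below plugs that refined error into a plain first-order SGD step only to show where it enters; PSLF itself uses second-order information via Hessian-vector products, which is not shown. The gains, the helper names `PIDErrorRefiner` and `sgd_step`, and keeping one refiner per error stream are all assumptions.

```python
from typing import List, Optional


class PIDErrorRefiner:
    """Refine a raw error e_t with a PID rule:
    e_hat_t = kp * e_t + ki * sum_{s <= t} e_s + kd * (e_t - e_{t-1})."""

    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev: Optional[float] = None

    def __call__(self, error: float) -> float:
        self.integral += error  # I term: accumulated past errors
        derivative = 0.0 if self.prev is None else error - self.prev  # D term
        self.prev = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def sgd_step(p_u: List[float], q_i: List[float], rating: float,
             refiner: PIDErrorRefiner, lr: float = 0.1, reg: float = 0.0):
    """One first-order latent factor update driven by the PID-refined error."""
    pred = sum(a * b for a, b in zip(p_u, q_i))
    e_hat = refiner(rating - pred)
    new_p = [p + lr * (e_hat * q - reg * p) for p, q in zip(p_u, q_i)]
    new_q = [q + lr * (e_hat * p - reg * q) for p, q in zip(p_u, q_i)]
    return new_p, new_q
```

With `ki = kd = 0` the refiner reduces to the plain error, so the sketch recovers ordinary SGD; the I and D terms are what add momentum-like and damping behavior.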

COIN: Control-Inpainting Diffusion Prior for Human and Camera Motion Estimation

Aug 29, 2024

Jiefeng Li, Ye Yuan, Davis Rempe, Haotian Zhang, Pavlo Molchanov, Cewu Lu, Jan Kautz, Umar Iqbal

Abstract: Estimating global human motion from moving cameras is challenging due to the entanglement of human and camera motions. To mitigate the ambiguity, existing methods leverage learned human motion priors, which, however, often result in oversmoothed motions with misaligned 2D projections. To tackle this problem, we propose COIN, a control-inpainting motion diffusion prior that enables fine-grained control to disentangle human and camera motions. Although pre-trained motion diffusion models encode rich motion priors, we find it non-trivial to leverage such knowledge to guide global motion estimation from RGB videos. COIN introduces a novel control-inpainting score distillation sampling method to ensure well-aligned, consistent, and high-quality motion from the diffusion prior within a joint optimization framework. Furthermore, we introduce a new human-scene relation loss that alleviates scale ambiguity by enforcing consistency among the humans, camera, and scene. Experiments on three challenging benchmarks demonstrate the effectiveness of COIN, which outperforms state-of-the-art methods in both global human motion estimation and camera motion estimation. As an illustrative example, COIN outperforms the state-of-the-art method by 33% in world joint position error (W-MPJPE) on the RICH dataset.

* ECCV 2024

Image Segmentation in Foundation Model Era: A Survey

Aug 23, 2024

Tianfei Zhou, Fei Zhang, Boyu Chang, Wenguan Wang, Ye Yuan, Ender Konukoglu, Daniel Cremers

Abstract: Image segmentation is a long-standing challenge in computer vision, studied continuously over several decades, as evidenced by seminal algorithms such as N-Cut, FCN, and MaskFormer. With the advent of foundation models (FMs), contemporary segmentation methodologies have entered a new epoch by either adapting FMs (e.g., CLIP, Stable Diffusion, DINO) for image segmentation or developing dedicated segmentation foundation models (e.g., SAM). These approaches not only deliver superior segmentation performance but also herald new segmentation capabilities previously unseen in the deep learning context. However, current research in image segmentation lacks a detailed analysis of the distinct characteristics, challenges, and solutions associated with these advancements. This survey seeks to fill this gap by providing a thorough review of cutting-edge research centered on FM-driven image segmentation. We investigate two basic lines of research, generic image segmentation (i.e., semantic, instance, and panoptic segmentation) and promptable image segmentation (i.e., interactive, referring, and few-shot segmentation), by delineating their respective task settings, background concepts, and key challenges. Furthermore, we provide insights into the emergence of segmentation knowledge from FMs like CLIP, Stable Diffusion, and DINO. An exhaustive overview of over 300 segmentation approaches is provided to capture the breadth of current research efforts. Subsequently, we discuss open issues and potential avenues for future research. We envisage that this fresh, comprehensive, and systematic survey will catalyze the evolution of advanced image segmentation systems.

* A comprehensive survey of image segmentation in foundation model era (work in progress)

The Key of Parameter Skew in Federated Learning

Aug 21, 2024

Sifan Wang, Junfeng Liao, Ye Yuan, Riquan Zhang

Abstract: Federated Learning (FL) has emerged as an excellent solution for performing deep learning across different data owners without exchanging raw data. However, statistical heterogeneity in FL presents a key challenge, leading to skewness in local model parameter distributions, a phenomenon that researchers have largely overlooked. In this work, we propose the concept of parameter skew to describe this phenomenon, which can substantially affect the accuracy of global model parameter estimation. Additionally, we introduce FedSA, an aggregation strategy that addresses the implications of parameter skew to obtain a high-quality global model. Specifically, we categorize parameters into high-dispersion and low-dispersion groups based on the coefficient of variation. For high-dispersion parameters, Micro-Classes (MIC) and Macro-Classes (MAC) represent the dispersion at the micro and macro levels, respectively, forming the foundation of FedSA. To evaluate the effectiveness of FedSA, we conduct extensive experiments with different FL algorithms on three computer vision datasets. FedSA outperforms eight state-of-the-art baselines by about 4.7% in test accuracy.
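The grouping step can be sketched as follows: for each parameter, compute its coefficient of variation CV = std / |mean| across the clients' local values, and assign it to the high-dispersion group when the CV exceeds a threshold. The threshold value, the use of the population standard deviation, the zero-mean handling, and plain averaging for the low-dispersion group are assumptions; the MIC/MAC treatment of high-dispersion parameters is not reproduced because the abstract does not specify it. Function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Tuple


def split_by_dispersion(client_params: List[List[float]],
                        threshold: float) -> Tuple[List[int], List[int]]:
    """Return (high, low) parameter indices using the coefficient of variation
    CV_j = std_j / |mean_j| of parameter j across clients."""
    high, low = [], []
    for j in range(len(client_params[0])):
        values = [params[j] for params in client_params]
        mean = statistics.fmean(values)
        std = statistics.pstdev(values)
        if std == 0:
            cv = 0.0  # identical values on every client: no dispersion
        elif mean == 0:
            cv = math.inf  # CV undefined; treat as maximally dispersed
        else:
            cv = std / abs(mean)
        (high if cv > threshold else low).append(j)
    return high, low


def average_low_dispersion(client_params: List[List[float]],
                           low: List[int]) -> Dict[int, float]:
    """Plain averaging for the low-dispersion group."""
    return {j: statistics.fmean([p[j] for p in client_params]) for j in low}
```

The CV is scale-free, which is why it is a reasonable dispersion measure across parameters whose magnitudes differ by orders of magnitude.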

Macformer: Transformer with Random Maclaurin Feature Attention

Aug 21, 2024

Yuhan Guo, Lizhong Ding, Ye Yuan, Guoren Wang

Abstract: Random feature attention (RFA) adopts random Fourier feature (RFF) methods to approximate the softmax function, resulting in a linear-time and linear-space attention mechanism that enables the construction of an efficient Transformer. Inspired by RFA, we propose Macformer, a Transformer architecture that employs random Maclaurin features (RMF) to approximate various dot-product kernels, thereby accelerating attention computations for long sequences. Macformer consists of Random Maclaurin Feature Attention (RMFA) and pre-post Scaling Batch Normalization (ppSBN): the former is an unbiased approximation of dot-product kernelized attention, and the latter is a two-stage regularization mechanism that bounds the error of RMFA. We conducted toy experiments to demonstrate the efficiency of RMFA and ppSBN, and experiments on the Long Range Arena (LRA) benchmark to validate the acceleration and accuracy of Macformer with different dot-product kernels. The experimental results of Macformer are consistent with our theoretical analysis.
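To show what "random Maclaurin features" approximate, here is a minimal sketch of the classic construction (Kar and Karnick, 2012) for a dot-product kernel K(x, y) = sum_n a_n <x, y>^n with a_n >= 0: each feature draws a degree N with P(N = n) = 1/2^(n+1) and N Rademacher vectors, and the inner product of two feature vectors is an unbiased estimate of K(x, y). Whether Macformer uses exactly this sampling distribution is an assumption, and the way RMFA plugs these features into linear attention, as well as ppSBN, is not shown. The exponential kernel (coefficients 1/n!) is chosen only because softmax attention uses exp of the query-key dot product; class and function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple


class RandomMaclaurinMap:
    """Random Maclaurin features for K(x, y) = sum_n a_n <x, y>^n (a_n >= 0),
    so that E[phi(x) . phi(y)] = K(x, y)."""

    def __init__(self, dim: int, num_features: int,
                 coeff: Callable[[int], float], seed: int = 0):
        rng = random.Random(seed)
        self.terms: List[Tuple[float, List[List[float]]]] = []
        for _ in range(num_features):
            n = 0
            while rng.random() < 0.5:  # degree N with P(N = n) = 1 / 2^(n + 1)
                n += 1
            prob = 0.5 ** (n + 1)
            # N independent Rademacher vectors (entries +1 / -1).
            ws = [[rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(n)]
            self.terms.append((math.sqrt(coeff(n) / prob), ws))
        self.scale = 1.0 / math.sqrt(num_features)

    def transform(self, x: List[float]) -> List[float]:
        feats = []
        for weight, ws in self.terms:
            prod = 1.0  # empty product for the degree-0 term
            for w in ws:
                prod *= sum(wi * xi for wi, xi in zip(w, x))
            feats.append(self.scale * weight * prod)
        return feats


def approx_kernel(fmap: RandomMaclaurinMap, x: List[float], y: List[float]) -> float:
    return sum(a * b for a, b in zip(fmap.transform(x), fmap.transform(y)))


def exp_coeff(n: int) -> float:
    return 1.0 / math.factorial(n)  # Maclaurin coefficients of exp(t)
```

The same `fmap` (same seed) must be applied to both inputs; unbiasedness follows because each Rademacher vector satisfies E[<w, x><w, y>] = <x, y>, and the 1/P(N = n) weight cancels the degree sampling probability.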

Neighbor Overlay-Induced Graph Attention Network

Aug 16, 2024

Tiqiao Wei, Ye Yuan

Abstract: Graph neural networks (GNNs) have garnered significant attention due to their ability to represent graph data. Among various GNN variants, the graph attention network (GAT) stands out because it can dynamically learn the importance of different nodes. However, existing GATs rely heavily on smoothed node features rather than graph structural information to obtain the attention coefficients, and thus fail to provide crucial contextual cues for node representations. To address this issue, this study proposes a neighbor overlay-induced graph attention network (NO-GAT) built on two ideas: a) learning favorable structural information, i.e., overlaid neighbors, from an adjacency matrix outside the node feature propagation process; and b) injecting the information of overlaid neighbors into the node feature propagation process to jointly compute the attention coefficients. Empirical studies on graph benchmark datasets indicate that the proposed NO-GAT consistently outperforms state-of-the-art models.
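One plausible reading of "overlaid neighbors" is the number of neighbors two nodes share, which for a 0/1 adjacency matrix A is the entry (A A)[i][j]. The sketch below computes those counts and then injects them into attention as an additive bias, beta * overlap, on each neighbor's feature-based score before a softmax over the neighborhood. Both the common-neighbor interpretation and the additive injection with a scalar `beta` are assumptions; NO-GAT's actual mechanism may differ, and `scores` is a hypothetical stand-in for the feature-based GAT logits.

```python
import math
from typing import List

Matrix = List[List[float]]


def overlaid_neighbor_counts(adj: Matrix) -> Matrix:
    """(A @ A)[i][j]: number of neighbors shared by nodes i and j for a 0/1 adjacency A.
    Diagonal entries equal node degrees."""
    n = len(adj)
    return [[sum(adj[i][k] * adj[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def attention_with_overlay(scores: Matrix, adj: Matrix, beta: float) -> Matrix:
    """Row-wise softmax over each node's neighbors (plus itself) of
    feature score + beta * overlaid-neighbor count."""
    overlap = overlaid_neighbor_counts(adj)
    n = len(adj)
    attn = []
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j] or j == i]
        logits = {j: scores[i][j] + beta * overlap[i][j] for j in nbrs}
        top = max(logits.values())  # subtract the max for numerical stability
        exps = {j: math.exp(v - top) for j, v in logits.items()}
        total = sum(exps.values())
        attn.append([exps.get(j, 0.0) / total for j in range(n)])  # non-neighbors get 0
    return attn
```

Because the overlap counts come from the adjacency matrix alone, they are computed once outside feature propagation and stay fixed while the feature-based scores are learned.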
