Yi Xu

Robust Fast Adaptation from Adversarially Explicit Task Distribution Generation

Jul 28, 2024

Cheems Wang, Yiqin Lv, Yixiu Mao, Yun Qu, Yi Xu, Xiangyang Ji

Abstract:Meta-learning is a practical learning paradigm to transfer skills across tasks from a few examples. Nevertheless, the existence of task distribution shifts tends to weaken meta-learners' generalization capability, particularly when the task distribution is naively hand-crafted or based on simple priors that fail to cover typical scenarios sufficiently. Here, we consider explicit generative modeling of task distributions placed over task identifiers and propose robustifying fast adaptation through adversarial training. Our approach, which can be interpreted as a model of a Stackelberg game, not only uncovers the task structure during problem-solving from an explicit generative model but also theoretically increases the adaptation robustness in worst cases. This work has practical implications, particularly in dealing with task distribution shifts in meta-learning, and contributes to theoretical insights in the field. Our method demonstrates its robustness in the presence of task subpopulation shifts and improved performance over SOTA baselines in extensive experiments. The project is available at https://sites.google.com/view/ar-metalearn.

* The project is available at https://sites.google.com/view/ar-metalearn

Diffusion Models for Multi-Task Generative Modeling

Jul 24, 2024

Changyou Chen, Han Ding, Bunyamin Sisman, Yi Xu, Ouye Xie, Benjamin Z. Yao, Son Dinh Tran, Belinda Zeng

Abstract:Diffusion-based generative modeling has been achieving state-of-the-art results on various generation tasks. Most diffusion models, however, are limited to modeling a single generation task. Can we generalize diffusion models to support multi-modal generative training for more generalizable modeling? In this paper, we propose a principled way to define a diffusion model by constructing a unified multi-modal diffusion model in a common diffusion space. We define the forward diffusion process to be driven by an information aggregation from multiple types of task-data, e.g., images for a generation task and labels for a classification task. In the reverse process, we enforce information sharing by parameterizing a shared backbone denoising network with additional modality-specific decoder heads. Such a structure can simultaneously learn to generate different types of multi-modal data with a multi-task loss, which is derived from a new multi-modal variational lower bound that generalizes the standard diffusion model. We propose several multi-modal generation settings to verify our framework, including image transition, masked-image training, joint image-label and joint image-representation generative modeling. Extensive experimental results on ImageNet indicate the effectiveness of our framework for various multi-modal generative modeling tasks, which we believe is an important research direction worthy of further exploration.

* Published as a conference paper at ICLR 2024

Efficient Personalized Text-to-image Generation by Leveraging Textual Subspace

Jun 30, 2024

Shian Du, Xiaotian Cheng, Qi Qian, Henglu Wei, Yi Xu, Xiangyang Ji

Abstract:Personalized text-to-image generation has attracted unprecedented attention in recent years due to its unique capability of generating highly personalized images from an input concept dataset and a novel textual prompt. However, previous methods focus solely on the performance of the reconstruction task, degrading their ability to be combined with different textual prompts. Besides, optimizing in the high-dimensional embedding space usually leads to an unnecessarily time-consuming training process and slow convergence. To address these issues, we propose an efficient method to explore the target embedding in a textual subspace, drawing inspiration from the self-expressiveness property. Additionally, we propose an efficient selection strategy for determining the basis vectors of the textual subspace. The experimental evaluations demonstrate that the learned embedding can not only faithfully reconstruct the input image, but also significantly improve its alignment with a novel input textual prompt. Furthermore, we observe that optimizing in the textual subspace leads to a significant improvement in robustness to the initial word, relaxing the constraint that requires users to input the most relevant initial word. Our method opens the door to more efficient representation learning for personalized text-to-image generation.

The Solution for CVPR2024 Foundational Few-Shot Object Detection Challenge

Jun 18, 2024

Hongpeng Pan, Shifeng Yi, Shouwei Yang, Lei Qi, Bing Hu, Yi Xu, Yang Yang

Abstract:This report introduces an enhanced method for the Foundational Few-Shot Object Detection (FSOD) task, leveraging the vision-language model (VLM) for object detection. However, on specific datasets, the VLM may encounter the problem where the detected targets are misaligned with the target concepts of interest. This misalignment hinders the zero-shot performance of the VLM and the application of fine-tuning methods based on pseudo-labels. To address this issue, we propose the VLM+ framework, which integrates the multimodal large language model (MM-LLM). Specifically, we use the MM-LLM to generate a series of referential expressions for each category. Based on the VLM predictions and the given annotations, we select the best referential expression for each category by matching the maximum IoU. Subsequently, we use these referential expressions to generate pseudo-labels for all images in the training set and then combine them with the original labeled data to fine-tune the VLM. Additionally, we employ iterative pseudo-label generation and optimization to further enhance the performance of the VLM. Our approach achieves 32.56 mAP in the final test.

* CVPR2024 Foundational Few-Shot Object Detection Challenge
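
The expression-selection step described in the abstract above (pick, per category, the referential expression whose VLM predictions best match the annotations by IoU) can be illustrated with a small sketch. This is not the authors' implementation: the box format `(x1, y1, x2, y2)`, the function names `iou` and `select_expression`, and the scoring rule (mean over annotated boxes of the best IoU with any predicted box) are all assumptions made for illustration.

```python
from statistics import mean

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def select_expression(predictions, annotations):
    """Return the expression whose predicted boxes best cover the annotations.

    predictions: dict mapping expression text -> list of predicted boxes
    annotations: list of ground-truth boxes for one category
    Scoring rule (an assumption): mean over ground-truth boxes of the
    highest IoU achieved by any predicted box for that expression.
    """
    def score(boxes):
        if not annotations:
            return 0.0
        return mean(
            max((iou(gt, p) for p in boxes), default=0.0) for gt in annotations
        )
    return max(predictions, key=lambda expr: score(predictions[expr]))

# Hypothetical example: two candidate expressions for one category.
annotations = [(0, 0, 10, 10)]
predictions = {
    "a dog": [(0, 0, 10, 10)],       # perfect overlap
    "an animal": [(5, 5, 15, 15)],   # partial overlap
}
best = select_expression(predictions, annotations)
```

In this toy example `best` is the expression with the perfectly overlapping box; a real pipeline would aggregate the score over all annotated images of the category.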

OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control

Jun 14, 2024

Yuzhong Huang, Zhong Li, Zhang Chen, Zhiyuan Ren, Guosheng Lin, Fred Morstatter, Yi Xu

Abstract:In the evolving landscape of text-to-3D technology, Dreamfusion has showcased its proficiency by utilizing Score Distillation Sampling (SDS) to optimize implicit representations such as NeRF. This process is achieved through the distillation of pretrained large-scale text-to-image diffusion models. However, Dreamfusion encounters fidelity and efficiency constraints: it faces the multi-head Janus issue and exhibits a relatively slow optimization process. To circumvent these challenges, we introduce OrientDream, a camera orientation conditioned framework designed for efficient and multi-view consistent 3D generation from textual prompts. Our strategy emphasizes the implementation of an explicit camera orientation conditioned feature in the pre-training of a 2D text-to-image diffusion module. This feature effectively utilizes data from MVImgNet, an extensive external multi-view dataset, to refine and bolster its functionality. Subsequently, we utilize the pre-conditioned 2D images as a basis for optimizing a randomly initialized implicit representation (NeRF). This process is significantly expedited by a decoupled back-propagation technique, allowing for multiple updates of implicit parameters per optimization cycle. Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also achieves an optimization speed significantly greater than existing methods, as quantified by comparative metrics.
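
For readers unfamiliar with Score Distillation Sampling, the gradient used by DreamFusion can be written as below. This is the standard form from the DreamFusion paper, not something stated on this page; the symbols are assumptions of this note: $\theta$ are the NeRF parameters, $x = g(\theta)$ is a rendered image, $z_t$ is its noised version at timestep $t$, $y$ is the text prompt, $\hat{\epsilon}_\phi$ is the pretrained denoiser, and $w(t)$ is a timestep weighting.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\bigl(\phi,\, x = g(\theta)\bigr)
  = \mathbb{E}_{t,\,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(z_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right]
```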

Deciphering Movement: Unified Trajectory Generation Model for Multi-Agent

May 27, 2024

Yi Xu, Yun Fu

Abstract:Understanding multi-agent behavior is critical across various fields. The conventional approach involves analyzing agent movements through three primary tasks: trajectory prediction, imputation, and spatial-temporal recovery. Considering the unique input formulation and constraint of these tasks, most existing methods are tailored to address only one specific task. However, in real-world applications, these scenarios frequently occur simultaneously. Consequently, methods designed for one task often fail to adapt to others, resulting in performance drops. To overcome this limitation, we propose a Unified Trajectory Generation model, UniTraj, that processes arbitrary trajectories as masked inputs, adaptable to diverse scenarios. Specifically, we introduce a Ghost Spatial Masking (GSM) module embedded within a Transformer encoder for spatial feature extraction. We further extend recent successful State Space Models (SSMs), particularly the Mamba model, into a Bidirectional Temporal Mamba to effectively capture temporal dependencies. Additionally, we incorporate a Bidirectional Temporal Scaled (BTS) module to comprehensively scan trajectories while maintaining the temporal missing relationships within the sequence. We curate and benchmark three practical sports game datasets, Basketball-U, Football-U, and Soccer-U, for evaluation. Extensive experiments demonstrate the superior performance of our model. To the best of our knowledge, this is the first work that addresses this unified problem through a versatile generative framework, thereby enhancing our understanding of multi-agent movement. Our datasets, code, and model weights are available at https://github.com/colorfulfuture/UniTraj-pytorch.

* Datasets, code, and model weights are available at: https://github.com/colorfulfuture/UniTraj-pytorch

Modeling and simulation of a mechanism for suppressing the flipping problem of a jumping robot

May 20, 2024

Qi Li, Liang Peng, Zhiyuan Wu, Pengda Ye, Weitao Zhang, Yi Xu, Qing Shi

Abstract:To solve the problem of stable jumping in micro-robots, we design a special mechanism: the elastic passive joint (EPJ). The EPJ can assist in achieving smooth jumping through its opening-closing process when the robot jumps. First, we introduce the composition and operating principle of the EPJ and perform dynamic modeling of the robot's jumping process. Then, to verify the effectiveness of the EPJ in controlling the robot's smooth jump, we design a simulation experiment based on MATLAB. Comparative experiments show that the EPJ can greatly adjust the angular velocity of the robot and increase its jump distance. Finally, we analyze each parameter of the EPJ and perform parameter optimization. After optimization, the EPJ achieves a completely flip-free jump of the robot, laying an important foundation for improving the mobility of micro-robots.

The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks

May 14, 2024

Ziquan Liu, Yufei Cui, Yan Yan, Yi Xu, Xiangyang Ji, Xue Liu, Antoni B. Chan

Abstract:In safety-critical applications such as medical imaging and autonomous driving, where decisions have profound implications for patient health and road safety, it is imperative to maintain both high adversarial robustness to protect against potential adversarial attacks and reliable uncertainty quantification in decision-making. With extensive research focused on enhancing adversarial robustness through various forms of adversarial training (AT), a notable knowledge gap remains concerning the uncertainty inherent in adversarially trained models. To address this gap, this study investigates the uncertainty of deep learning models by examining the performance of conformal prediction (CP) in the context of standard adversarial attacks within the adversarial defense community. It is first unveiled that existing CP methods do not produce informative prediction sets under the commonly used $l_{\infty}$-norm bounded attack if the model is not adversarially trained, which underpins the importance of adversarial training for CP. Our paper next demonstrates that the prediction set size (PSS) of CP using adversarially trained models with AT variants is often worse than using standard AT, inspiring us to research into CP-efficient AT for improved PSS. We propose to optimize a Beta-weighting loss with an entropy minimization regularizer during AT to improve CP-efficiency, where the Beta-weighting loss is shown to be an upper bound of PSS at the population level by our theoretical analysis. Moreover, our empirical study on four image classification datasets across three popular AT baselines validates the effectiveness of the proposed Uncertainty-Reducing AT (AT-UR).

* ICML2024
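
To make the prediction set size (PSS) metric in the abstract above concrete, here is a minimal sketch of standard split conformal prediction for classification. It is not the paper's AT-UR method and involves no adversarial training; the nonconformity score (one minus the probability of the true class) and the function names are assumptions chosen for illustration.

```python
import math

def conformal_threshold(cal_probs, cal_labels, alpha):
    """Calibrate a score threshold targeting coverage 1 - alpha.

    cal_probs: list of per-class probability lists (calibration set)
    cal_labels: list of true class indices
    Score (an assumption): 1 - probability assigned to the true class.
    """
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected rank
    if k > n:
        return float("inf")  # too few calibration points: keep every class
    return scores[k - 1]

def prediction_set(probs, qhat):
    """Classes whose score 1 - p does not exceed the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= qhat]

def mean_set_size(test_probs, qhat):
    """Average prediction set size (PSS) over test inputs; smaller is more informative."""
    return sum(len(prediction_set(p, qhat)) for p in test_probs) / len(test_probs)

# Hypothetical two-class example.
cal_probs = [[0.9, 0.1], [0.5, 0.5], [0.8, 0.2]]
cal_labels = [0, 0, 1]
qhat = conformal_threshold(cal_probs, cal_labels, alpha=0.5)
pss = mean_set_size([[0.25, 0.75], [0.5, 0.5]], qhat)
```

A model whose probabilities are less sharp yields larger sets for the same coverage target, which is the efficiency aspect that PSS measures.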

Robust Semi-supervised Learning by Wisely Leveraging Open-set Data

May 11, 2024

Yang Yang, Nan Jiang, Yi Xu, De-Chuan Zhan

Abstract:Open-set Semi-supervised Learning (OSSL) considers the realistic setting in which unlabeled data may come from classes unseen in the labeled set, i.e., out-of-distribution (OOD) data, which could cause performance degradation in conventional SSL models. To handle this issue, in addition to the traditional in-distribution (ID) classifier, some existing OSSL approaches employ an extra OOD detection module to avoid the potential negative impact of the OOD data. Nevertheless, these approaches typically employ the entire set of open-set data during their training process, which may contain data unfriendly to the OSSL task that can negatively influence the model performance. This inspires us to develop a robust open-set data selection strategy for OSSL. Through a theoretical understanding from the perspective of learning theory, we propose Wise Open-set Semi-supervised Learning (WiseOpen), a generic OSSL framework that selectively leverages the open-set data for training the model. By applying a gradient-variance-based selection mechanism, WiseOpen exploits a friendly subset instead of the whole open-set dataset to enhance the model's capability of ID classification. Moreover, to reduce the computational expense, we also propose two practical variants of WiseOpen by adopting low-frequency update and loss-based selection respectively. Extensive experiments demonstrate the effectiveness of WiseOpen in comparison with the state-of-the-art.

RepEval: Effective Text Evaluation with LLM Representation

Apr 30, 2024

Shuqian Sheng, Yi Xu, Tianhang Zhang, Zanwei Shen, Luoyi Fu, Jiaxin Ding, Lei Zhou, Xinbing Wang, Chenghu Zhou

Abstract:Automatic evaluation metrics for generated texts play an important role in the NLG field, especially with the rapid growth of LLMs. However, existing metrics are often limited to specific scenarios, making it challenging to meet the evaluation requirements of expanding LLM applications. Therefore, there is a demand for new, flexible, and effective metrics. In this study, we introduce RepEval, the first metric leveraging the projection of LLM representations for evaluation. RepEval requires minimal sample pairs for training, and through simple prompt modifications, it can easily transition to various tasks. Results on ten datasets from three tasks demonstrate the high effectiveness of our method, which exhibits stronger correlations with human judgments compared to previous metrics, even outperforming GPT-4. Our work underscores the richness of information regarding text quality embedded within LLM representations, offering insights for the development of new metrics.
