Generative models have enabled the creation of content that is indistinguishable from real-world data. The open-source release of such models has raised concerns about the risks of their misuse for malicious purposes. One potential risk-mitigation strategy is to attribute generative models via fingerprinting. Current fingerprinting methods exhibit a significant tradeoff between robust attribution accuracy and generation quality, and they also lack design principles for improving this tradeoff. This paper investigates the use of latent semantic dimensions as fingerprints, which allows us to analyze the effects of design variables, including the choice of fingerprinting dimensions, strength, and capacity, on the accuracy-quality tradeoff. Compared with the previous state of the art, our method requires minimal computation and is more applicable to large-scale models. We use StyleGAN2 and the latent diffusion model to demonstrate the efficacy of our method.
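As a rough illustration of the design variables named above, the following is a minimal sketch of latent-dimension fingerprinting, assuming a StyleGAN2-like latent space where a bit-string is encoded by shifting a subset of latent dimensions. The function names (`embed_fingerprint`, `decode_fingerprint`) and the sign-based decoding rule are illustrative assumptions, not the paper's actual procedure.

```python
import numpy as np

def embed_fingerprint(w, dims, bits, strength=0.5):
    """Shift selected latent dimensions to encode a fingerprint bit-string.

    w        : (latent_dim,) latent code of a StyleGAN2-like generator
    dims     : indices of the latent dimensions used as the fingerprint carrier
    bits     : binary fingerprint (one bit per carrier dimension)
    strength : perturbation magnitude; larger values make attribution more
               robust but move the latent further from the original (quality cost)
    """
    w_fp = w.copy()
    signs = 2.0 * np.asarray(bits, dtype=float) - 1.0   # {0,1} -> {-1,+1}
    w_fp[dims] += strength * signs
    return w_fp

def decode_fingerprint(w_fp, w_ref, dims):
    """Recover the bits from the sign of the shift relative to a reference latent."""
    return (w_fp[dims] - w_ref[dims] > 0).astype(int)

# Toy example: capacity = number of carrier dimensions; strength trades accuracy vs. quality.
rng = np.random.default_rng(0)
w = rng.normal(size=512)                               # a latent code
dims = rng.choice(512, size=32, replace=False)         # capacity: 32 bits
bits = rng.integers(0, 2, size=32)                     # the model owner's fingerprint
w_fp = embed_fingerprint(w, dims, bits, strength=0.5)
assert (decode_fingerprint(w_fp, w, dims) == bits).all()
```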
Generally speaking, model training for recommender systems can be based on two types of data, namely explicit feedback and implicit feedback. Because of its general availability, implicit feedback data, such as click signals, is widely adopted. There are two main challenges in applying implicit feedback. First, implicit data includes only positive feedback, so we cannot tell whether a non-interacted item is truly negative or is positive but was simply not displayed to the corresponding user. Second, the relevance of rare items is usually underestimated, since far less positive feedback is collected for rare items than for popular ones. To tackle these difficulties, both pointwise and pairwise solutions have been proposed for unbiased relevance learning. As pairwise learning is well suited to ranking tasks, the previously proposed unbiased pairwise learning algorithm already achieves state-of-the-art performance. Nonetheless, the existing unbiased pairwise learning method suffers from high variance. To obtain satisfactory performance, a non-negative estimator is used for practical variance control, but it introduces additional bias. In this work, we propose an unbiased pairwise learning method, named UPL, with much lower variance, to learn a truly unbiased recommender model. Extensive offline experiments on real-world datasets and online A/B testing demonstrate the superior performance of our proposed method.
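To make the variance-bias tension concrete, here is a minimal sketch of inverse-propensity-weighted pairwise learning with the non-negative clipping mentioned above. The loss form, the propensity model, and the function name are illustrative assumptions; this is not the UPL estimator itself.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unbiased_pairwise_loss(scores, clicks, propensities, non_negative=False):
    """Inverse-propensity-weighted pairwise (BPR-style) loss over all item pairs.

    scores       : (n,) model scores for the items shown to one user
    clicks       : (n,) observed click indicators (implicit feedback)
    propensities : (n,) exposure/observation propensities of the items
    non_negative : clip each pair's weight at zero -- practical variance
                   control, at the price of additional bias

    The per-pair weight (c_i/p_i) * (1 - c_j/p_j) can be negative, which is
    what makes the plain estimator unbiased but high-variance.
    """
    c_over_p = clicks / propensities
    total, n = 0.0, len(scores)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            w = c_over_p[i] * (1.0 - c_over_p[j])
            if non_negative:
                w = max(w, 0.0)
            total += w * -np.log(sigmoid(scores[i] - scores[j]))
    return total / (n * (n - 1))

# Toy example with three displayed items (lower positions are seen less often)
scores = np.array([2.0, 0.5, -1.0])
clicks = np.array([1.0, 0.0, 1.0])
propensities = np.array([0.9, 0.5, 0.2])
print(unbiased_pairwise_loss(scores, clicks, propensities, non_negative=True))
```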
Listening to long video or audio recordings from video conferencing and online courses to acquire information is extremely inefficient. Even after ASR systems transcribe recordings into long-form spoken language documents, reading ASR transcripts only partially speeds up information seeking. It has been observed that a range of NLP applications, such as keyphrase extraction, topic segmentation, and summarization, significantly improve users' efficiency in grasping important information. The meeting scenario is among the most valuable scenarios for deploying these spoken language processing (SLP) capabilities. However, the lack of large-scale public meeting datasets annotated for these SLP tasks severely hinders their advancement. To promote SLP advancement, we establish a large-scale general Meeting Understanding and Generation Benchmark (MUG) to benchmark the performance of a wide range of SLP tasks, including topic segmentation, topic-level and session-level extractive summarization, topic title generation, keyphrase extraction, and action item detection. To facilitate the MUG benchmark, we construct and release a large-scale meeting dataset for comprehensive long-form SLP development, the AliMeeting4MUG Corpus, which consists of 654 recorded Mandarin meeting sessions with diverse topic coverage, manually annotated for the SLP tasks on manual transcripts of the meeting recordings. To the best of our knowledge, the AliMeeting4MUG Corpus is so far the largest meeting corpus in scale and supports the most SLP tasks. In this paper, we provide a detailed introduction to this corpus, the SLP tasks and evaluation methods, and the baseline systems and their performance.
The ICASSP 2023 General Meeting Understanding and Generation Challenge (MUG) focuses on promoting a wide range of spoken language processing (SLP) research on meeting transcripts, as SLP applications are critical for improving users' efficiency in grasping important information in meetings. MUG comprises five tracks: topic segmentation, topic-level and session-level extractive summarization, topic title generation, keyphrase extraction, and action item detection. To facilitate MUG, we construct and release a large-scale meeting dataset, the AliMeeting4MUG Corpus.
It is a well-known challenge to learn an unbiased ranker from biased feedback. Unbiased learning-to-rank (LTR) algorithms, which have been verified to model relative relevance accurately from noisy feedback, are appealing candidates and have already been applied in many applications with single categorical labels, such as user click signals. Nevertheless, existing unbiased LTR methods cannot properly handle continuous feedback, which is essential for many industrial applications, such as content recommender systems. To provide personalized, high-quality recommendation results, recommender systems need to model both categorical and continuous biased feedback, such as clicks and dwell time. Accordingly, we design a novel unbiased LTR algorithm that models position bias in a pairwise fashion, introduces a pairwise trust bias to explicitly separate position bias, trust bias, and user relevance, and works for both continuous and categorical feedback. Experimental results on public benchmark datasets and internal live traffic of a large-scale recommender system at Tencent News show that the proposed method achieves superior results for continuous labels and competitive performance for categorical labels.
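For intuition only, the following schematic sketch shows one way position bias and trust bias could be separated before forming a pairwise objective over continuous feedback such as dwell time. The affine trust-bias correction, the per-position parameter arrays, and all names are assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def debiased_pairwise_loss(scores, feedback, pos_i, pos_j,
                           exam_prob, trust_pos, trust_neg):
    """Schematic pairwise loss separating position bias, trust bias, and relevance.

    scores    : (s_i, s_j) model relevance scores for one item pair
    feedback  : (y_i, y_j) observed feedback, binary clicks or continuous dwell time
    pos_i/j   : display positions of the two items
    exam_prob : per-position examination probabilities (position bias)
    trust_pos : per-position scale applied to relevant items (trust bias)
    trust_neg : per-position offset credited regardless of relevance (trust bias)
    """
    # Assumed observation model: observed = exam * (trust_pos * relevance + trust_neg)
    def debias(y, pos):
        return (y / exam_prob[pos] - trust_neg[pos]) / trust_pos[pos]

    rel_i, rel_j = debias(feedback[0], pos_i), debias(feedback[1], pos_j)
    s_i, s_j = scores
    # Weight the logistic pairwise loss by the de-biased relevance gap;
    # only pairs where item i is preferred after de-biasing contribute.
    gap = max(rel_i - rel_j, 0.0)
    return gap * -np.log(sigmoid(s_i - s_j))

# Toy per-position bias parameters and one pair of items at positions 0 and 2
exam = np.array([1.0, 0.7, 0.4])
t_pos = np.array([0.95, 0.9, 0.85])
t_neg = np.array([0.05, 0.1, 0.1])
print(debiased_pairwise_loss((1.2, 0.3), (30.0, 8.0), 0, 2, exam, t_pos, t_neg))
```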
The gap between randomly initialized item ID embeddings and well-trained warm item ID embeddings makes it hard for cold items to fit a recommendation system that is trained on data from historical warm items. To alleviate the performance decline in new item recommendation, the distribution of the new item ID embeddings should be close to that of the historical warm items. To achieve this goal, we propose an Adversarial Variational Auto-encoder Warm-up model (AVAEW) to generate warm-up item ID embeddings for cold items. Specifically, we develop a conditional variational auto-encoder model that leverages item side information to generate the warm-up item ID embedding. In particular, we introduce an adversarial module to enforce alignment between the warm-up item ID embedding distribution and the historical item ID embedding distribution. We demonstrate the effectiveness and compatibility of the proposed method through extensive offline experiments on public datasets and online A/B tests on a real-world large-scale news recommendation platform.
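The following is a schematic PyTorch sketch of the loss composition described above: a conditional VAE over item side information plus an adversarial term that pushes generated warm-up embeddings toward the historical warm distribution. The module name, layer sizes, and loss weighting are illustrative assumptions, not the AVAEW implementation.

```python
import torch
import torch.nn as nn

class WarmUpEmbeddingGenerator(nn.Module):
    """Schematic conditional VAE mapping item side information to a warm-up item ID embedding."""

    def __init__(self, side_dim=32, emb_dim=16, hid=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(side_dim + emb_dim, hid), nn.ReLU())
        self.mu, self.logvar = nn.Linear(hid, emb_dim), nn.Linear(hid, emb_dim)
        self.decoder = nn.Sequential(nn.Linear(emb_dim + side_dim, hid), nn.ReLU(),
                                     nn.Linear(hid, emb_dim))

    def forward(self, side_info, warm_emb):
        h = self.encoder(torch.cat([side_info, warm_emb], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        gen_emb = self.decoder(torch.cat([z, side_info], dim=-1))
        return gen_emb, mu, logvar

def cvae_adversarial_loss(gen_emb, warm_emb, mu, logvar, discriminator):
    """CVAE reconstruction + KL terms, plus the generator side of the adversarial game."""
    recon = ((gen_emb - warm_emb) ** 2).mean()
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
    adv = nn.functional.binary_cross_entropy_with_logits(
        discriminator(gen_emb), torch.ones(gen_emb.size(0), 1))   # try to fool the discriminator
    return recon + kl + adv

# Toy usage: the discriminator would separately be trained to distinguish
# historical warm embeddings from generated ones.
discriminator = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
gen = WarmUpEmbeddingGenerator()
side, warm = torch.randn(8, 32), torch.randn(8, 16)
emb, mu, logvar = gen(side, warm)
loss = cvae_adversarial_loss(emb, warm, mu, logvar, discriminator)
```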
Slate recommender systems, in which an ordered list of items is presented to the user based on user interests and item content, are widely adopted. For each recommendation, the user can select one or several items from the list for further interaction. In this setting, it is well understood that the mutual influence among the items has a significant impact on user behavior. Existing methods add a slate re-ranking step after the ranking stage of the recommender system, which considers the mutual influence among recommended items to re-rank them and generate the recommendation results so as to maximize the expected overall utility. However, to model the complex interaction of multiple recommended items, the re-ranking stage can usually handle only dozens of candidates because of limited hardware resources and system latency constraints. Therefore, the ranking stage remains essential for most applications to provide a high-quality candidate set for the re-ranking stage. In this paper, we propose a solution named Slate-Aware Ranking (SAR) for the ranking stage. By implicitly considering the relations among the slate items, it significantly enhances the quality of the re-ranking stage's candidate set and boosts the relevance and diversity of the overall recommender system. Experiments on public datasets and internal online A/B testing are conducted to verify its effectiveness.
In deep learning, transferring information from a pretrained network to a downstream task by finetuning has many benefits. The choice of task head plays an important role in finetuning, as the pretrained and downstream tasks usually differ. Although many different finetuning designs exist, a full understanding of when and why these algorithms work has been elusive. We analyze how the choice of task head controls feature adaptation and hence influences downstream performance. By decomposing the learning dynamics of adaptation, we find that the key factor is the training accuracy and loss at the beginning of finetuning, which determines the "energy" available for feature adaptation. We identify a significant trend in how changes to this initial energy affect the resulting features after finetuning. Specifically, as the energy increases, the Euclidean and cosine distances between the resulting and original features increase, while their dot products (and the resulting features' norms) first increase and then decrease. Inspired by this, we give several practical principles that lead to better downstream performance. We analytically prove this trend in an overparameterized linear setting and verify its applicability to different experimental settings.
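As a minimal sketch of how the trend above could be measured empirically, the helper below computes the three feature-comparison quantities (Euclidean distance, cosine distance, dot product) between backbone features extracted before and after finetuning. The function name and the averaging choices are illustrative assumptions, not the paper's exact evaluation protocol.

```python
import numpy as np

def feature_shift_metrics(feat_before, feat_after):
    """Compare backbone features before and after finetuning.

    feat_before, feat_after : (n_samples, d) feature matrices extracted from
    the same inputs with the pretrained and the finetuned backbone.
    Returns the mean Euclidean distance, cosine distance, and dot product,
    i.e. the quantities whose trend against the initial "energy" is studied.
    """
    diff = feat_after - feat_before
    euclid = np.linalg.norm(diff, axis=1).mean()
    cos = 1.0 - (np.sum(feat_before * feat_after, axis=1)
                 / (np.linalg.norm(feat_before, axis=1)
                    * np.linalg.norm(feat_after, axis=1) + 1e-12)).mean()
    dot = np.sum(feat_before * feat_after, axis=1).mean()
    return {"euclidean": euclid, "cosine_distance": cos, "dot_product": dot}

# Toy usage: repeat for heads with different initial losses and compare the metrics.
rng = np.random.default_rng(0)
before = rng.normal(size=(100, 64))
after = before + 0.1 * rng.normal(size=(100, 64))   # stand-in for finetuned features
print(feature_shift_metrics(before, after))
```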
Generating photo-realistic video portraits driven by arbitrary speech audio is a crucial problem for film-making and virtual reality. Recently, several works have explored the use of neural radiance fields (NeRF) for this task to improve 3D realism and image fidelity. However, the generalizability of previous NeRF-based methods to out-of-domain audio is limited by the small scale of the training data. In this work, we propose GeneFace, a generalized and high-fidelity NeRF-based talking face generation method that can produce natural results for various out-of-domain audio. Specifically, we learn a variational motion generator on a large lip-reading corpus and introduce a domain-adaptive post-net to calibrate the result. Moreover, we learn a NeRF-based renderer conditioned on the predicted facial motion. A head-aware torso-NeRF is proposed to eliminate the head-torso separation problem. Extensive experiments show that our method achieves more generalized and higher-fidelity talking face generation than previous methods.