Zhizhuo Yin

M3G: Multi-Granular Gesture Generator for Audio-Driven Full-Body Human Motion Synthesis

May 13, 2025

Zhizhuo Yin, Yuk Hang Tsui, Pan Hui

Abstract: Generating full-body human gestures encompassing face, body, hands, and global movements from audio is a valuable yet challenging task in virtual avatar creation. Previous systems tokenize human gestures frame by frame and predict the tokens of each frame from the input audio. However, the number of frames required for a complete expressive human gesture, defined as granularity, varies among different gesture patterns. Existing systems fail to model these gesture patterns because their gesture tokens have a fixed granularity. To solve this problem, we propose a novel framework named Multi-Granular Gesture Generator (M3G) for audio-driven holistic gesture generation. In M3G, we propose a novel Multi-Granular VQ-VAE (MGVQ-VAE) to tokenize motion patterns and reconstruct motion sequences from different temporal granularities. We then propose a multi-granular token predictor that extracts multi-granular information from audio and predicts the corresponding motion tokens. Finally, M3G reconstructs the human gestures from the predicted tokens using the MGVQ-VAE. Both objective and subjective experiments demonstrate that the proposed M3G framework outperforms state-of-the-art methods in generating natural and expressive full-body human gestures.

* 9 Pages, 4 figures, submitted to NIPS 2025
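
To make the notion of granularity in the abstract above concrete, here is a minimal sketch of tokenizing one motion sequence at several temporal granularities. It is not the paper's MGVQ-VAE: the average pooling stands in for a learned encoder, the codebooks are hand-written rather than trained, and the function names (`pool_windows`, `nearest_code`, `tokenize_multi_granular`) are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def pool_windows(frames: Sequence[Vector], granularity: int) -> List[Vector]:
    """Average every `granularity` consecutive frames into one vector.

    Assumption: average pooling is a stand-in for a learned encoder.
    """
    pooled = []
    for start in range(0, len(frames), granularity):
        window = frames[start:start + granularity]
        dim = len(window[0])
        pooled.append([sum(f[d] for f in window) / len(window) for d in range(dim)])
    return pooled


def nearest_code(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook entry closest to `vec` (squared L2)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))


def tokenize_multi_granular(
    frames: Sequence[Vector],
    codebooks: Dict[int, Sequence[Vector]],
) -> Dict[int, List[int]]:
    """Quantize the sequence once per granularity, each with its own codebook."""
    return {
        g: [nearest_code(v, codebook) for v in pool_windows(frames, g)]
        for g, codebook in codebooks.items()
    }


# Toy data: four frames near one pose, then four frames near another.
frames = [[0.0, 0.0]] * 4 + [[1.0, 1.0]] * 4
toy_codebook = [[0.0, 0.0], [1.0, 1.0]]  # hand-written, not trained
tokens = tokenize_multi_granular(frames, {1: toy_codebook, 2: toy_codebook, 4: toy_codebook})
```

With a granularity of 1 every frame gets its own token, while a granularity of 4 summarizes the same eight frames in two tokens; a fixed-granularity tokenizer would have to commit to just one of these views.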

Exploring the Potential of Large Language Models in Artistic Creation: Collaboration and Reflection on Creative Programming

Feb 15, 2024

Anqi Wang, Zhizhuo Yin, Yulu Hu, Yuanyuan Mao, Pan Hui

Abstract: Recently, large language models (LLMs) have been widely used to assist programming. However, current research does not explore the artistic potential of LLMs in creative coding within artist-AI collaboration. Our work probes the types of reflection that artists engage in during the creation process with such collaboration. We compare two common collaboration approaches: invoking the entire program and invoking multiple subtasks. Our findings show that the two approaches stimulate different types of reflection in artists. They also show how reflection type correlates with user performance, user satisfaction, and subjective experience in the two approaches, based on experimental data and qualitative interviews. In this sense, our work reveals the artistic potential of LLMs in creative coding. We also provide a critical lens on human-AI collaboration from the artists' perspective and offer design suggestions for future work on AI-assisted creative tasks.

* 15 pages, 4 figures

APT-Pipe: An Automatic Prompt-Tuning Tool for Social Computing Data Annotation

Feb 08, 2024

Yiming Zhu, Zhizhuo Yin, Gareth Tyson, Ehsan-Ul Haq, Lik-Hang Lee, Pan Hui

Abstract: Recent research has highlighted the potential of LLM applications, like ChatGPT, for performing label annotation on social computing text. However, it is already well known that performance hinges on the quality of the input prompts. To address this, there has been a flurry of research into prompt tuning: techniques and guidelines that attempt to improve the quality of prompts. Yet these largely rely on manual effort and prior knowledge of the dataset being annotated. To address this limitation, we propose APT-Pipe, an automated prompt-tuning pipeline. APT-Pipe aims to automatically tune prompts to enhance ChatGPT's text classification performance on any given dataset. We implement APT-Pipe and test it across twelve distinct text classification datasets. We find that prompts tuned by APT-Pipe help ChatGPT achieve a higher weighted F1-score on nine out of the twelve datasets, with an improvement of 7.01% on average. We further highlight APT-Pipe's flexibility as a framework by showing how it can be extended to support additional tuning mechanisms.

* Just accepted by WWW 2024
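
The abstract above reports results as a weighted F1-score. As a reminder of what that metric measures, the sketch below computes it from scratch: the per-class F1 is averaged with weights proportional to each class's share of the true labels. The function name `weighted_f1` and the toy labels are illustrative assumptions and do not come from the APT-Pipe code or data.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Support-weighted average of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one label is required")

    support = Counter(y_true)  # number of true instances per class
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        score += (count / total) * f1
    return score


# Toy labels (made up for illustration, not taken from the paper).
gold = ["pos", "pos", "neg", "neg"]
predicted = ["pos", "neg", "neg", "neg"]
score = weighted_f1(gold, predicted)
```

Comparing this score for annotations produced with two different prompts on the same gold labels is one simple way to check whether a tuned prompt actually helps.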
