Weixuan Yuan

Assessing GPT Performance in a Proof-Based University-Level Course Under Blind Grading

May 19, 2025

Ming Ding, Rasmus Kyng, Federico Solda, Weixuan Yuan

Abstract: As large language models (LLMs) advance, their role in higher education, particularly in free-response problem-solving, requires careful examination. This study assesses the performance of GPT-4o and o1-preview under realistic educational conditions in an undergraduate algorithms course. Anonymous GPT-generated solutions to take-home exams were graded by teaching assistants unaware of their origin. Our analysis examines both coarse-grained performance (scores) and fine-grained reasoning quality (error patterns). Results show that GPT-4o consistently struggles, failing to reach the passing threshold, while o1-preview performs significantly better, surpassing the passing score and even exceeding the student median in certain exercises. However, both models exhibit issues with unjustified claims and misleading arguments. These findings highlight the need for robust assessment strategies and AI-aware grading policies in education.

Why Are You Wrong? Counterfactual Explanations for Language Grounding with 3D Objects

May 09, 2025

Tobias Preintner, Weixuan Yuan, Qi Huang, Adrian König, Thomas Bäck, Elena Raponi, Niki van Stein

Abstract: Combining natural language and geometric shapes is an emerging research area with multiple applications in robotics and language-assisted design. A crucial task in this domain is object referent identification, which involves selecting a 3D object given a textual description of the target. Variability in language descriptions and in the spatial relationships of 3D objects makes this a complex task, increasing the need to better understand the behavior of neural network models in this domain. However, little research has been conducted in this area. Specifically, when a model makes an incorrect prediction despite being given a seemingly correct object description, practitioners are left wondering: "Why is the model wrong?". In this work, we present a method that answers this question by generating counterfactual examples. Our method takes a misclassified sample, consisting of two objects and a text description, and generates an alternative yet similar formulation that would have led the model to a correct prediction. We evaluated our approach on data from the ShapeTalk dataset with three distinct models. Our counterfactual examples maintain the structure of the original description and are semantically similar and meaningful. They reveal weaknesses in the descriptions and biases in the models, and they improve understanding of the models' behavior. These insights help practitioners interact better with such systems and help engineers improve the models.

* Accepted at IJCNN 2025

Generation of Musical Timbres using a Text-Guided Diffusion Model

Apr 12, 2025

Weixuan Yuan, Qadeer Khan, Vladimir Golkov

Abstract: In recent years, text-to-audio systems have achieved remarkable success, enabling the generation of complete audio segments directly from text descriptions. While these systems also facilitate music creation, the element of human creativity and deliberate expression is often limited. In contrast, the present work allows composers, arrangers, and performers to create the basic building blocks of music creation: audio of individual musical notes for use in electronic instruments and DAWs. Through text prompts, the user can specify the timbre characteristics of the audio. We introduce a system that combines a latent diffusion model and multi-modal contrastive learning to generate musical timbres conditioned on text descriptions. By jointly generating the magnitude and phase of the spectrogram, our method eliminates the need to run a phase retrieval algorithm afterwards, as related methods do. Audio examples, source code, and a web app are available at https://wxuanyuan.github.io/Musical-Note-Generation/

* 10 pages, 5 figures
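As a rough illustration of why generating phase alongside magnitude removes the phase-retrieval step, the sketch below is a simplified stand-in and not the authors' implementation. It uses a single frame and a naive DFT instead of the short-time Fourier transform behind a real spectrogram. The test signal and the helper names (`dft`, `idft`, `reconstruct`) are hypothetical. When both magnitude and phase are known, one inverse transform recovers the waveform. When only magnitude is known, the phase must be estimated, for example with Griffin-Lim.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (O(N^2)) of one frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT; keeps the real part, since the input frame is real."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def reconstruct(magnitude, phase):
    """Rebuild a time-domain frame from magnitude and phase: no phase retrieval needed."""
    spectrum = [m * cmath.exp(1j * p) for m, p in zip(magnitude, phase)]
    return idft(spectrum)


# Hypothetical test frame: a short decaying sine standing in for one note frame.
n = 32
frame = [math.exp(-t / 10) * math.sin(2 * math.pi * 3 * t / n) for t in range(n)]

spec = dft(frame)
magnitude = [abs(c) for c in spec]
phase = [cmath.phase(c) for c in spec]

# With both channels, the frame is recovered up to floating-point error.
recovered = reconstruct(magnitude, phase)
max_err = max(abs(a - b) for a, b in zip(frame, recovered))

# With magnitude alone (phase set to zero), the waveform is not recovered;
# this is the gap that phase-retrieval algorithms try to close.
zero_phase = reconstruct(magnitude, [0.0] * n)
max_err_zero_phase = max(abs(a - b) for a, b in zip(frame, zero_phase))

print(f"max error with phase:    {max_err:.2e}")
print(f"max error without phase: {max_err_zero_phase:.2f}")
```

For a full spectrogram with overlapping windows, the same idea applies through an inverse STFT with overlap-add in place of the single-frame inverse DFT.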