This paper presents an integrative review and experimental validation of artificial intelligence (AI) agents applied to music analysis and education. We synthesize the historical evolution from rule-based models to contemporary approaches involving deep learning, multi-agent architectures, and retrieval-augmented generation (RAG) frameworks. The pedagogical implications are evaluated through a dual-case methodology: (1) the use of generative AI platforms in secondary education to foster analytical and creative skills; and (2) the design of a multi-agent system for symbolic music analysis, enabling modular, scalable, and explainable workflows. Experimental results demonstrate that AI agents effectively enhance musical pattern recognition, compositional parameterization, and educational feedback, outperforming traditional automated methods in interpretability and adaptability. The findings highlight key challenges concerning transparency, cultural bias, and the definition of hybrid evaluation metrics, emphasizing the need for responsible deployment of AI in educational environments. This research contributes a unified framework that bridges technical, pedagogical, and ethical considerations, offering evidence-based guidance for the design and application of intelligent agents in computational musicology and music education.
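
The modular, explainable multi-agent workflow described above can be illustrated with a small sketch. In the following Python fragment, each agent contributes a named partial analysis of a symbolic melody and an orchestrator merges the results into one report; the agent roles and the toy analyses are our illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a modular multi-agent pipeline for symbolic music
# analysis. Agent roles and the toy analyses below are illustrative
# assumptions, not the paper's actual system.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    name: str
    analyze: Callable[[list[int]], dict]  # MIDI pitches -> findings

def pitch_range_agent(pitches: list[int]) -> dict:
    return {"range_semitones": max(pitches) - min(pitches)}

def contour_agent(pitches: list[int]) -> dict:
    ups = sum(b > a for a, b in zip(pitches, pitches[1:]))
    return {"ascending_ratio": ups / max(len(pitches) - 1, 1)}

def run_pipeline(pitches: list[int], agents: list[Agent]) -> dict:
    # Each agent contributes an explainable, named partial analysis,
    # which keeps the workflow modular and easy to extend.
    return {agent.name: agent.analyze(pitches) for agent in agents}

melody = [60, 62, 64, 65, 67, 65, 64, 62, 60]  # C major fragment
print(run_pipeline(melody, [Agent("range", pitch_range_agent),
                            Agent("contour", contour_agent)]))
```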



This paper introduces four different artificial intelligence algorithms for music generation and compares these methods not only on the aesthetic quality of the generated music but also on their suitability for specific applications. The first set of melodies is produced by a slightly modified visual transformer neural network used as a language model. The second set is generated by combining chat sonification with a classic transformer neural network (the same music generation method was presented in previous research); the third set is generated by combining Schillinger rhythm theory with a classic transformer neural network; and the fourth set is generated using the GPT-3 transformer provided by OpenAI. A comparative analysis of the melodies generated by these approaches indicates significant differences between them: in terms of aesthetic value, GPT-3 produced the most pleasing melodies, and the newly introduced Schillinger method generated better-sounding music than previous sonification methods.
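
The Schillinger component can be made concrete. The sketch below computes the classic Schillinger "resultant" rhythm, the interference pattern of two regular pulse generators; how such rhythms are fed to the transformer is the paper's pipeline, so this standalone helper is only our illustration of the underlying theory.

```python
# Illustrative sketch of a Schillinger "resultant" rhythm: the
# interference pattern of two pulse generators with periods a and b.
# This reflects standard Schillinger rhythm theory, not the paper's code.
from math import lcm

def schillinger_resultant(a: int, b: int) -> list[int]:
    span = lcm(a, b)
    # Union of attack points from both generators over one full cycle.
    attacks = sorted(set(range(0, span, a)) | set(range(0, span, b)))
    attacks.append(span)  # close the cycle to obtain the last duration
    return [t2 - t1 for t1, t2 in zip(attacks, attacks[1:])]

print(schillinger_resultant(3, 2))  # -> [2, 1, 1, 2]
print(schillinger_resultant(4, 3))  # -> [3, 1, 2, 2, 1, 3]
```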




Music is inherently made up of complex structures, and representing them as graphs helps to capture multiple levels of relationships. While music generation has been explored using various deep generative techniques, research on graph-based music generation is sparse. Earlier graph-based approaches generated only melodies, and recent works that generate polyphonic music do not account for longer-term structure. In this paper, we explore a multi-graph approach to represent both the rhythmic patterns and the phrase structure of Chinese pop music. We propose a two-step approach that aims to generate polyphonic music with coherent rhythm and long-term structure. We train two Variational Auto-Encoder networks: one on a MIDI dataset to generate 4-bar phrases, and another on song-structure labels to generate a full song structure. Our work shows that the models are able to learn most of the structural nuances in the training dataset, including chord and pitch frequency distributions and phrase attributes.
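
As a rough illustration of the kind of model trained twice in this two-step approach, here is a minimal Variational Auto-Encoder sketch in PyTorch; the dimensions, layers, and unweighted loss are our assumptions, not the authors' configuration.

```python
# Minimal PyTorch VAE sketch in the spirit of the phrase-level model
# described above. Sizes and architecture are illustrative assumptions.
import torch
import torch.nn as nn

class PhraseVAE(nn.Module):
    def __init__(self, in_dim=512, latent_dim=64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, in_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term plus KL divergence to the unit Gaussian prior.
    rec = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl
```
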
Generative Artificial Intelligence (GenAI) has demonstrated capabilities that significantly reduce human effort. It uses deep learning techniques to create original and realistic content in the form of text, images, code, music, and video. Researchers have also shown that the modern Large Language Models (LLMs) underlying GenAI systems can be used to aid hardware development. Formal verification is a mathematics-based proof method used to exhaustively verify the correctness of a design. In this paper, we demonstrate how GenAI can be used in induction-based formal verification to increase verification throughput.
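
One plausible shape for such a workflow is sketched below: a language model proposes candidate strengthening invariants, and an SMT solver (here Z3) checks the base and inductive-step cases on a toy transition system. The function `llm_propose_invariants` is a hypothetical stand-in for a GenAI call, and the whole example is our illustration of induction-based checking, not the paper's tool.

```python
# Sketch of LLM-assisted induction: candidate invariants (imagined to
# come from a GenAI model) are checked for inductiveness with Z3.
# `llm_propose_invariants` is a hypothetical stand-in, not a real API.
from z3 import Ints, Solver, Not, substitute, unsat

x, xp = Ints("x xp")          # current and next state
init = x == 0
trans = xp == x + 2           # toy transition relation

def llm_propose_invariants():
    return [x >= 0, x % 2 == 0]   # imagine these come from the LLM

def inductive(inv, inv_next):
    base = Solver(); base.add(init, Not(inv))          # Init => Inv
    step = Solver(); step.add(inv, trans, Not(inv_next))  # Inv /\ T => Inv'
    return base.check() == unsat and step.check() == unsat

for inv in llm_propose_invariants():
    inv_next = substitute(inv, (x, xp))
    print(inv, "is inductive" if inductive(inv, inv_next) else "fails")
```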




This paper presents an examination of State Space Models (SSMs) and Koopman-based deep learning methods for modelling the dynamics of both linear and non-linear stiff strings. Through experiments with datasets generated under different initial conditions and sample rates, we assess the capacity of these models to accurately capture the complex behaviours observed in string dynamics. Our findings indicate that our proposed Koopman-based model performs as well as or better than existing approaches in non-linear cases for long-sequence modelling. We inform the design of these architectures with the structure of the problems at hand. Although challenges remain in extending model predictions beyond the training horizon (i.e., extrapolation), the focus of our investigation lies in the models' ability to generalise across different initial conditions within the training time interval. This research contributes insights into the physical modelling of dynamical systems (in particular, those arising in musical acoustics) by offering a comparative overview of these and previous methods and by introducing strategies for model improvement. Our results highlight the efficacy of these models in simulating non-linear dynamics and emphasise their wide-ranging applicability to accurately modelling dynamical systems over extended sequences.
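
The Koopman idea admits a compact sketch: an encoder lifts the string state into observables where the dynamics are approximately linear, a learned linear operator advances them one step at a time, and a decoder maps back to physical coordinates. The layer sizes below are illustrative assumptions, not the paper's architecture.

```python
# Minimal PyTorch sketch of a Koopman-style model for long rollouts.
# Dimensions and nonlinearities are illustrative assumptions.
import torch
import torch.nn as nn

class KoopmanModel(nn.Module):
    def __init__(self, state_dim=100, obs_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, 256), nn.Tanh(),
                                     nn.Linear(256, obs_dim))
        self.K = nn.Linear(obs_dim, obs_dim, bias=False)  # linear Koopman operator
        self.decoder = nn.Sequential(nn.Linear(obs_dim, 256), nn.Tanh(),
                                     nn.Linear(256, state_dim))

    def rollout(self, x0, steps):
        # Advance entirely in the linear latent space, decoding each step;
        # linearity in the lifted space is what enables long rollouts.
        z = self.encoder(x0)
        states = []
        for _ in range(steps):
            z = self.K(z)
            states.append(self.decoder(z))
        return torch.stack(states, dim=1)  # (batch, steps, state_dim)
```
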
AI systems for high-quality music generation typically rely on extremely large musical datasets to train the AI models. This creates barriers to generating music beyond the genres represented in dominant datasets, such as Western classical music or pop music. We undertook a four-month international research project, summarised in this paper, to explore the eXplainable AI (XAI) challenges and opportunities associated with reducing barriers to using marginalised genres of music with AI models. The XAI opportunities identified included improving the transparency and control of AI models, explaining the ethics and bias of AI models, fine-tuning large models with small datasets to reduce bias, and explaining style-transfer opportunities with AI models. Participants in the research emphasised that whilst it is hard to work with small datasets, such as those of marginalised music genres, such approaches strengthen the cultural representation of underrepresented cultures and contribute to addressing issues of bias in deep learning models. We are now building on this project to bring together a global International Responsible AI Music community, and we invite people to join our network.




Introduction: Music generation is a complex task that has received significant attention in recent years, and deep learning techniques have shown promising results in this field. Objectives: While extensive work has been carried out on generating piano and other Western music, there is limited research on generating classical Indian music due to the scarcity of Indian music in machine-encoded formats. In this technical paper, methods for generating classical Indian music, specifically tabla music, are proposed. Initially, the paper explores piano music generation using deep learning architectures; the fundamentals are then extended to generating tabla music. Methods: Tabla music in waveform (.wav) files is pre-processed using the librosa library in Python. A novel Bi-LSTM model with an attention mechanism and a transformer model are trained on the extracted features and labels. Results: The models are then used to predict the next sequences of tabla music. A loss of 4.042 and an MAE of 1.0814 are achieved with the Bi-LSTM model; with the transformer model, a loss of 55.9278 and an MAE of 3.5173 are obtained for tabla music generation. Conclusion: The resulting music embodies a harmonious fusion of novelty and familiarity, pushing the limits of music composition to new horizons.
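
A minimal sketch of the described pipeline follows: MFCC features extracted with librosa feed a Bi-LSTM with additive attention that regresses the next feature frame. The choice of MFCCs, the layer sizes, and the next-frame objective are our assumptions about unstated details, not the paper's exact setup.

```python
# Sketch: librosa feature extraction + Bi-LSTM with additive attention
# for next-frame prediction. Feature choice and sizes are assumptions.
import librosa
import torch
import torch.nn as nn

def extract_features(path, n_mfcc=20):
    y, sr = librosa.load(path, sr=22050)  # waveform (.wav) input
    # Transpose to (frames, n_mfcc) so time is the sequence axis.
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

class BiLSTMAttention(nn.Module):
    def __init__(self, n_feats=20, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(n_feats, hidden, batch_first=True,
                            bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)       # additive attention scores
        self.out = nn.Linear(2 * hidden, n_feats)  # next-frame regression

    def forward(self, x):                          # x: (batch, frames, n_feats)
        h, _ = self.lstm(x)
        w = torch.softmax(self.attn(h), dim=1)     # weights over time steps
        context = (w * h).sum(dim=1)               # attention-pooled summary
        return self.out(context)                   # predicted next frame
```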




While many topics in the learning-based approach to automated music generation are under active research, musical form is under-researched. In particular, recent methods based on deep learning models generate music that, at the largest time scale, lacks any structure: in practice, music longer than one minute generated by such models is either unpleasantly repetitive or directionless. Adapting a recent music generation model, this paper proposes a novel method to generate music with form. The experimental results show that the proposed method can generate 2.5-minute-long music that is considered as pleasant as the music used to train the model. The paper first reviews a recent music generation method based on language models (the transformer architecture) and discusses why learning musical form with such models is infeasible; we then present our proposed method and experiments.
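
For intuition, one common way to impose form on a language-model generator is to condition it on an explicit section plan, interleaving section tokens with musical events. The sketch below is our illustration of that general idea under assumed token names, not the paper's method.

```python
# Illustrative sketch (our assumption, not the paper's exact method):
# prepend a section plan and tag events with section tokens so a
# language model can condition long-range structure on form labels.
def build_conditioned_sequence(plan, section_events):
    # plan: e.g. ["A", "B", "A"]; section_events: token lists per section.
    tokens = ["<PLAN>"] + [f"<SEC={s}>" for s in plan] + ["</PLAN>"]
    for s in plan:
        tokens.append(f"<SEC={s}>")
        tokens.extend(section_events[s])  # events generated per section
    return tokens

seq = build_conditioned_sequence(
    ["A", "B", "A"],
    {"A": ["NOTE_60", "NOTE_64"], "B": ["NOTE_67", "NOTE_65"]})
print(seq[:8])
```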




Current advances in generative AI for learning large neural network models, with the capability to produce essays, images, music, and even 3D assets from text prompts, create opportunities for a manifold of disciplines. In the present paper, we study the potential of deep text-to-3D models in the engineering domain, with a focus on the chances and challenges of integrating and interacting with 3D assets in computational simulation-based design optimization. Whereas traditional design optimization of 3D geometries searches for optimum designs using numerical representations, such as B-spline surfaces or deformation parameters in vehicle aerodynamic optimization, natural language challenges the optimization framework by requiring a different interpretation of variation operators, while at the same time it may ease and motivate human user interaction. Here, we propose and realize a fully automated evolutionary design optimization framework using Shap-E, a recently published text-to-3D asset network by OpenAI, in the context of aerodynamic vehicle optimization. For representing text prompts in the evolutionary optimization, we evaluate (a) a bag-of-words approach based on prompt templates and WordNet samples, and (b) a tokenisation approach based on prompt templates and the byte pair encoding method from GPT-4. Our main findings indicate, first, that it is important to ensure that the designs generated from prompts stay within the object class of the application, i.e., diverse and novel designs need to be realistic, and, second, that more research is required to develop methods in which the strength of text prompt variations and the resulting variations of the 3D designs share causal relations to some degree, in order to improve the optimization.
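
The bag-of-words variant of such a loop can be sketched compactly: an evolution strategy mutates slot words in a prompt template and keeps the fittest prompt. The word pool, the template, and the `drag_of` fitness stand-in (which in the paper would involve generating a mesh with Shap-E and running a CFD simulation) are all illustrative assumptions.

```python
# Minimal (1+lambda) evolutionary loop over text prompts. Template,
# word pool, and the drag_of() fitness placeholder are assumptions;
# the real fitness would be a text-to-3D generation + CFD evaluation.
import random

WORDS = ["streamlined", "boxy", "aerodynamic", "compact", "sleek"]
TEMPLATE = "a {} {} car"

def mutate(genome):
    child = list(genome)
    child[random.randrange(len(child))] = random.choice(WORDS)
    return child

def drag_of(prompt: str) -> float:
    return random.random()  # placeholder for Shap-E mesh + CFD simulation

def one_plus_lambda(generations=50, lam=4):
    parent = [random.choice(WORDS) for _ in range(2)]
    best = drag_of(TEMPLATE.format(*parent))
    for _ in range(generations):
        children = [mutate(parent) for _ in range(lam)]
        scored = [(drag_of(TEMPLATE.format(*c)), c) for c in children]
        f, c = min(scored, key=lambda t: t[0])
        if f <= best:          # minimise drag; accept equal to drift
            best, parent = f, c
    return parent, best
```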




For deep learning tasks, symbolic music is nowadays mostly represented as discrete data and used with sequential models such as Transformers. Recent research has put effort into tokenization, i.e., the conversion of the data into sequences of integers intelligible to such models. This can be achieved in many ways, as music can be composed of simultaneous tracks and of simultaneous notes with several attributes. Until now, the proposed tokenizations have been based on small vocabularies describing note attributes and time events, resulting in fairly long token sequences. In this paper, we show how Byte Pair Encoding (BPE) can improve the results of deep learning models while improving their efficiency. We experiment on music generation and composer classification, study the impact of BPE on how models learn the embeddings, and show that it can help to increase their isotropy, i.e., the uniformity of the variance of their positions in the embedding space.
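
The core BPE mechanism applies directly to music token IDs: repeatedly merge the most frequent adjacent pair into a new vocabulary symbol, shortening the sequence while growing the vocabulary. The sketch below is our own minimal illustration of that mechanism, not the paper's implementation.

```python
# Minimal BPE over integer token IDs (as produced by a music
# tokenizer). Each merge shortens the sequence and adds one symbol.
from collections import Counter

def bpe_apply(seq, pair, new_id):
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            out.append(new_id); i += 2   # replace the merged pair
        else:
            out.append(seq[i]); i += 1
    return out

def bpe_learn(seq, n_merges, next_id):
    merges = {}
    for _ in range(n_merges):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, _ = pairs.most_common(1)[0]   # most frequent adjacent pair
        merges[pair] = next_id
        seq = bpe_apply(seq, pair, next_id)
        next_id += 1
    return merges, seq

tokens = [1, 2, 3, 1, 2, 4, 1, 2, 3]        # base-vocabulary token IDs
merges, compressed = bpe_learn(tokens, n_merges=2, next_id=100)
print(merges, compressed)  # {(1, 2): 100, (100, 3): 101} [101, 100, 4, 101]
```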