Pierre Baldi

From Local to Global Order: A Theory of Neural Synaptic Balance

May 15, 2024

Pierre Baldi, Alireza Rahmansetayesh

Abstract: We develop a theory of neural synaptic balance and how it can emerge or be enforced in neural networks. For a given additive cost function $R$ (regularizer), a neuron is said to be in balance if the total cost of its input weights is equal to the total cost of its output weights. The basic example is provided by feedforward networks of ReLU units trained with $L_2$ regularizers, which exhibit balance after proper training. The theory explains this phenomenon and extends it in several directions. The first direction is the extension to bilinear and other activation functions. The second direction is the extension to more general regularizers, including all $L_p$ ($p>0$) regularizers. The third direction is the extension to non-layered architectures, recurrent architectures, convolutional architectures, as well as architectures with mixed activation functions. The theory is based on two local neuronal operations: scaling, which is commutative, and balancing, which is not commutative. Finally, and most importantly, given any initial set of weights, when local balancing operations are applied to each neuron in a stochastic manner, global order always emerges through the convergence of the stochastic balancing algorithm to the same unique set of balanced weights. The reason for this convergence is the existence of an underlying strictly convex optimization problem where the relevant variables are constrained to a linear, only architecture-dependent, manifold. The theory is corroborated through various simulations carried out on benchmark data sets. Scaling and balancing operations are entirely local and thus physically plausible in biological and neuromorphic networks.
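
The scaling and balancing operations described in this abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes a small bias-free feedforward ReLU network with a linear output layer and an $L_2$ cost, and the helper names (`forward`, `balance_neuron`, `stochastic_balance`, `max_imbalance`) are hypothetical. Multiplying a hidden neuron's input weights by $\lambda > 0$ and dividing its output weights by $\lambda$ leaves the network function unchanged, and the choice $\lambda = (\text{cost}_{out} / \text{cost}_{in})^{1/4}$ makes the two $L_2$ costs equal.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def forward(weights, x):
    """Bias-free feedforward ReLU network with a linear output layer (an assumption)."""
    h = x
    for W in weights[:-1]:
        h = relu(W @ h)
    return weights[-1] @ h

def l2_cost(weights):
    """Total L2 regularization cost: sum of all squared weights."""
    return sum(float(np.sum(W ** 2)) for W in weights)

def balance_neuron(weights, layer, i):
    """Balance hidden neuron i of hidden layer `layer` in place.

    Row i of weights[layer] holds the neuron's input weights and
    column i of weights[layer + 1] holds its output weights.
    """
    cost_in = np.sum(weights[layer][i, :] ** 2)
    cost_out = np.sum(weights[layer + 1][:, i] ** 2)
    if cost_in == 0.0 or cost_out == 0.0:
        return  # a disconnected neuron cannot be balanced by scaling
    # Minimizer of lam**2 * cost_in + cost_out / lam**2 over lam > 0.
    lam = (cost_out / cost_in) ** 0.25
    weights[layer][i, :] *= lam       # scale input weights up ...
    weights[layer + 1][:, i] /= lam   # ... and output weights down

def stochastic_balance(weights, steps, rng):
    """Apply the local balancing operation to randomly chosen hidden neurons."""
    for _ in range(steps):
        layer = int(rng.integers(len(weights) - 1))
        i = int(rng.integers(weights[layer].shape[0]))
        balance_neuron(weights, layer, i)

def max_imbalance(weights):
    """Largest |input cost - output cost| over all hidden neurons."""
    gaps = []
    for layer in range(len(weights) - 1):
        cost_in = np.sum(weights[layer] ** 2, axis=1)
        cost_out = np.sum(weights[layer + 1] ** 2, axis=0)
        gaps.append(np.max(np.abs(cost_in - cost_out)))
    return float(max(gaps))

# Illustrative network: 4 inputs, hidden layers of 5 and 6 units, 3 outputs.
init_rng = np.random.default_rng(0)
initial = [init_rng.normal(size=s) for s in [(5, 4), (6, 5), (3, 6)]]
x = init_rng.normal(size=4)

# Two runs from the same initial weights but with different random orders.
run_a = [W.copy() for W in initial]
run_b = [W.copy() for W in initial]
stochastic_balance(run_a, 5000, np.random.default_rng(1))
stochastic_balance(run_b, 5000, np.random.default_rng(2))

print("max imbalance before:", max_imbalance(initial))
print("max imbalance after: ", max_imbalance(run_a))
print("L2 cost before/after:", l2_cost(initial), l2_cost(run_a))
```

In this sketch each balancing step can only lower the total $L_2$ cost while preserving the input-output function, and the two runs with different random neuron orders should end at the same weights, which is the behavior the abstract describes for the stochastic balancing algorithm.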

Full Event Particle-Level Unfolding with Variable-Length Latent Variational Diffusion

Apr 22, 2024

Alexander Shmakov, Kevin Greif, Michael James Fenton, Aishik Ghosh, Pierre Baldi, Daniel Whiteson

Abstract: The measurements performed by particle physics experiments must account for the imperfect response of the detectors used to observe the interactions. One approach, unfolding, statistically adjusts the experimental data for detector effects. Recently, generative machine learning models have shown promise for performing unbinned unfolding in a high number of dimensions. However, all current generative approaches are limited to unfolding a fixed set of observables, making them unable to perform full-event unfolding in the variable-dimensional environment of collider data. A novel modification to the variational latent diffusion model (VLD) approach to generative unfolding is presented, which allows for unfolding of high- and variable-dimensional feature spaces. The performance of this method is evaluated in the context of semi-leptonic top quark pair production at the Large Hadron Collider.

* Submission to SciPost

Neural Erosion: Emulating Controlled Neurodegeneration and Aging in AI Systems

Mar 15, 2024

Antonios Alexos, Yu-Dai Tsai, Ian Domingo, Maryam Pishgar, Pierre Baldi

Abstract: Creating controlled methods to simulate neurodegeneration in artificial intelligence (AI) is crucial for applications that emulate brain function decline and cognitive disorders. We use IQ tests performed by Large Language Models (LLMs), and more specifically LLaMA 2, to introduce the concept of "neural erosion." This deliberate erosion involves ablating synapses or neurons, or adding Gaussian noise during or after training, resulting in a controlled progressive decline in the LLMs' performance. We are able to describe the neurodegeneration in the IQ tests and show that the LLM first loses its mathematical abilities and then its linguistic abilities, while further losing its ability to understand the questions. To the best of our knowledge, this is the first work that models neurodegeneration with text data, compared to other works that operate in the computer vision domain. Finally, we draw similarities between our study and cognitive decline clinical studies involving test subjects. We find that with the application of neurodegenerative methods, LLMs lose abstract thinking abilities, followed by mathematical degradation, and ultimately, a loss in linguistic ability, responding to prompts incoherently. These findings are in accordance with human studies.

* 19 pages, 6 figures in the main text, 5 figures in the Appendix
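
The three erosion operations named in this abstract (ablating synapses, ablating neurons, and adding Gaussian noise) can be sketched on a plain weight matrix. This is only an illustration, not the authors' procedure: the paper applies erosion to LLaMA 2, whereas the function names, the uniform random choice of what to ablate, and the `fraction` and `sigma` parameters below are all assumptions.

```python
import numpy as np

def ablate_synapses(W, fraction, rng):
    """Return a copy of W with roughly `fraction` of individual weights zeroed."""
    keep = rng.random(W.shape) >= fraction  # True where the synapse survives
    return W * keep

def ablate_neurons(W, fraction, rng):
    """Return a copy of W with all input weights of some neurons zeroed.

    Rows of W are treated as neurons; round(fraction * n_rows) of them
    are chosen uniformly at random without replacement.
    """
    n = W.shape[0]
    k = int(round(fraction * n))
    dead = rng.choice(n, size=k, replace=False)
    out = W.copy()
    out[dead, :] = 0.0
    return out

def add_gaussian_noise(W, sigma, rng):
    """Return W plus zero-mean Gaussian noise with standard deviation sigma."""
    return W + rng.normal(scale=sigma, size=W.shape)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))  # hypothetical layer: 8 neurons, 16 inputs each

# Progressive erosion: apply increasing severity and inspect the damage.
for fraction in (0.1, 0.3, 0.5):
    eroded = ablate_synapses(W, fraction, rng)
    print(fraction, "share of zeroed weights:", float(np.mean(eroded == 0.0)))
```

Sweeping `fraction` or `sigma` upward and re-evaluating the model at each level is one simple way to obtain the kind of controlled, progressive decline the abstract refers to.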

Unraveling the Molecular Magic: AI Insights on the Formation of Extraordinarily Stretchable Hydrogels

Mar 08, 2024

Shahriar Hojjati Emmami, Ali Pilehvar Meibody, Lobat Tayebi, Mohammadamin Tavakoli, Pierre Baldi

Abstract: The deliberate manipulation of ammonium persulfate, methylenebisacrylamide, dimethylacrylamide, and polyethylene oxide concentrations resulted in the development of a hydrogel with exceptional stretchability, capable of extending up to 260 times its original length. This study aims to elucidate the molecular architecture underlying this unique phenomenon by exploring potential reaction mechanisms, facilitated by an artificial intelligence prediction system. The artificial intelligence predictor introduces a novel approach to interlinking two polymers, involving the formation of networks interconnected with linear chains following random chain scission. This novel configuration leads to the emergence of a distinct type of hydrogel, herein referred to as a "Span Network." Additionally, Fourier-transform infrared spectroscopy (FTIR) is used to investigate functional groups that may be implicated in the proposed mechanism, with ester formation confirmed among numerous hydroxyl end groups obtained from chain scission of PEO and carboxyl groups formed on hydrogel networks.

AttentionStitch: How Attention Solves the Speech Editing Problem

Mar 05, 2024

Antonios Alexos, Pierre Baldi

Abstract: The generation of natural and high-quality speech from text is a challenging problem in the field of natural language processing. In addition to speech generation, speech editing is also a crucial task, which requires the seamless and unnoticeable integration of edited speech into synthesized speech. We propose a novel approach to speech editing by leveraging a pre-trained text-to-speech (TTS) model, such as FastSpeech 2, and incorporating a double attention block network on top of it to automatically merge the synthesized mel-spectrogram with the mel-spectrogram of the edited text. We refer to this model as AttentionStitch, as it harnesses attention to stitch audio samples together. We evaluate the proposed AttentionStitch model against state-of-the-art baselines on both single and multi-speaker datasets, namely LJSpeech and VCTK. We demonstrate its superior performance through an objective and a subjective evaluation test involving 15 human participants. AttentionStitch is capable of producing high-quality speech, even for words not seen during training, while operating automatically without the need for human intervention. Moreover, AttentionStitch is fast during both training and inference and is able to generate human-sounding edited speech.

* Accepted in Machine Learning for Audio workshop in NeurIPS 2023

Evaluating the Performance of Large Language Models for Spanish Language in Undergraduate Admissions Exams

Dec 28, 2023

Sabino Miranda, Obdulia Pichardo-Lagunas, Bella Martínez-Seis, Pierre Baldi

Abstract: This study evaluates the performance of large language models, specifically GPT-3.5 and BARD (supported by the Gemini Pro model), in undergraduate admissions exams proposed by the National Polytechnic Institute in Mexico. The exams cover Engineering/Mathematical and Physical Sciences, Biological and Medical Sciences, and Social and Administrative Sciences. Both models demonstrated proficiency, exceeding the minimum acceptance scores for the respective academic programs, with scores of up to 75% for some programs. GPT-3.5 outperformed BARD in Mathematics and Physics, while BARD performed better in History and questions related to factual information. Overall, GPT-3.5 marginally surpassed BARD with scores of 60.94% and 60.42%, respectively.

* 11 pages, 1 figure. Submitted to a journal

Machine Learning-Enhanced Prediction of Surface Smoothness for Inertial Confinement Fusion Target Polishing Using Limited Data

Dec 16, 2023

Antonios Alexos, Junze Liu, Akash Tiwari, Kshitij Bhardwaj, Sean Hayes, Pierre Baldi, Satish Bukkapatnam, Suhas Bhandarkar

Abstract: In the Inertial Confinement Fusion (ICF) process, a roughly 2 mm spherical shell made of high-density carbon is used as a target for laser beams, which compress and heat it to the energy levels needed for high fusion yield. These shells are polished meticulously to meet the standards for a fusion shot. However, the polishing of these shells involves multiple stages, with each stage taking several hours. To make sure that the polishing process is advancing in the right direction, we are able to measure the shell surface roughness. This measurement, however, is very labor-intensive and time-consuming, and requires a human operator. We propose to use machine learning models that can predict surface roughness based on the data collected from a vibration sensor that is connected to the polisher. Such models can generate surface roughness estimates for the shells in real time, allowing the operator to make any necessary changes to the polishing for an optimal result.

* Accepted as Extended Abstract in AIM 2024

AI for Interpretable Chemistry: Predicting Radical Mechanistic Pathways via Contrastive Learning

Nov 02, 2023

Mohammadamin Tavakoli, Yin Ting T. Chiu, Alexander Shmakov, Ann Marie Carlton, David Van Vranken, Pierre Baldi

Abstract: Deep learning-based reaction predictors have undergone significant architectural evolution. However, their reliance on reactions from the US Patent Office results in a lack of interpretable predictions and limited generalization capability to other chemistry domains, such as radical and atmospheric chemistry. To address these challenges, we introduce a new reaction predictor system, RMechRP, that leverages contrastive learning in conjunction with mechanistic pathways, the most interpretable representation of chemical reactions. Specifically designed for radical reactions, RMechRP provides different levels of interpretation of chemical reactions. We develop and train multiple deep-learning models using RMechDB, a public database of radical reactions, to establish the first benchmark for predicting radical reactions. Our results demonstrate the effectiveness of RMechRP in providing accurate and interpretable predictions of radical reactions, and its potential for various applications in atmospheric chemistry.

Extended Symmetry Preserving Attention Networks for LHC Analysis

Sep 05, 2023

Michael James Fenton, Alexander Shmakov, Hideki Okawa, Yuji Li, Ko-Yang Hsiao, Shih-Chieh Hsu, Daniel Whiteson, Pierre Baldi

Abstract: Reconstructing unstable heavy particles requires sophisticated techniques to sift through the large number of possible permutations for assignment of detector objects to partons. An approach based on a generalized attention mechanism, symmetry preserving attention networks (SPANet), has been previously applied to top quark pair decays at the Large Hadron Collider, which produce six hadronic jets. Here we extend the SPANet architecture to consider multiple input streams, such as leptons, as well as global event features, such as the missing transverse momentum. In addition, we provide regression and classification outputs to supplement the parton assignment. We explore the performance of the extended capability of SPANet in the context of semi-leptonic decays of top quark pairs as well as top quark pairs produced in association with a Higgs boson. We find significant improvements in the power of three representative studies: a search for ttH, a measurement of the top quark mass, and a search for a heavy Z' decaying to top quark pairs. We present ablation studies to provide insight into what the network has learned in each case.

Selective Perception: Optimizing State Descriptions with Reinforcement Learning for Language Model Actors

Jul 21, 2023

Kolby Nottingham, Yasaman Razeghi, Kyungmin Kim, JB Lanier, Pierre Baldi, Roy Fox, Sameer Singh

Abstract: Large language models (LLMs) are being applied as actors for sequential decision-making tasks in domains such as robotics and games, utilizing their general world knowledge and planning abilities. However, previous work does little to explore what environment state information is provided to LLM actors via language. Exhaustively describing high-dimensional states can impair performance and raise inference costs for LLM actors. Previous LLM actors avoid the issue by relying on hand-engineered, task-specific protocols to determine which features to communicate about a state and which to leave out. In this work, we propose Brief Language INputs for DEcision-making Responses (BLINDER), a method for automatically selecting concise state descriptions by learning a value function for task-conditioned state descriptions. We evaluate BLINDER on the challenging video game NetHack and a robotic manipulation task. Our method improves task success rate, reduces input size and compute costs, and generalizes between LLM actors.
