Large language models, such as the well-known ChatGPT, have brought about an unexpected revolution in the field of artificial intelligence. On the one hand, they have numerous practical applications and enormous potential still to be explored. On the other hand, they are also the subject of debate from scientific, philosophical, and social perspectives: there are doubts about the exact mechanisms of their functioning and their actual capacity for language comprehension, and their applications raise ethical dilemmas. In this chapter, we describe how this technology has been developed and the fundamentals of its operation, allowing us to better understand its capabilities and limitations and to introduce some of the main debates surrounding its development and use. -- Los grandes modelos de lenguaje, como el conocido ChatGPT, han supuesto una inesperada revoluci\'on en el \'ambito de la inteligencia artificial. Por un lado, cuentan con multitud de aplicaciones pr\'acticas y un enorme potencial todav\'ia por explorar. Por otro lado, son tambi\'en objeto de debate, tanto desde el punto de vista cient\'ifico y filos\'ofico como social: hay dudas sobre los mecanismos exactos de su funcionamiento y su capacidad real de comprensi\'on del lenguaje, y sus aplicaciones plantean dilemas \'eticos. En este cap\'itulo describimos c\'omo se ha llegado a esta tecnolog\'ia y los fundamentos de su funcionamiento, permiti\'endonos as\'i comprender mejor sus capacidades y limitaciones e introducir algunos de los principales debates que rodean su desarrollo y uso.
We revisit the fermion mass problem of the $SU(5)$ grand unified theory using machine learning techniques. The original $SU(5)$ model proposed by Georgi and Glashow is incompatible with the observed fermion mass spectrum. Two remedies are known to resolve this discrepancy, one is through introducing a new interaction via a 45-dimensional field, and the other via a 24-dimensional field. We investigate which modification is more natural, defining naturalness as proximity to the original Georgi-Glashow $SU(5)$ model. Our analysis shows that, in both supersymmetric and non-supersymmetric scenarios, the model incorporating the interaction with the 24-dimensional field is more natural under this criterion. We then generalise these models by introducing a continuous parameter $y$, which takes the value 3 for the 45-dimensional field and 1.5 for the 24-dimensional field. Numerical optimisation reveals that $y \approx 0.8$ yields the closest match to the original $SU(5)$ model, indicating that this value corresponds to the most natural model according to our definition.
Formal languages theory is useful for the study of natural language. In particular, it is of interest to study the adequacy of the grammatical formalisms to express syntactic phenomena present in natural language. First, it helps to draw hypothesis about the nature and complexity of the speaker-hearer linguistic competence, a fundamental question in linguistics and other cognitive sciences. Moreover, from an engineering point of view, it allows the knowledge of practical limitations of applications based on those formalisms. In this article I introduce the adequacy problem of grammatical formalisms for natural language, also introducing some formal language theory concepts required for this discussion. Then, I review the formalisms that have been proposed in history, and the arguments that have been given to support or reject their adequacy. ----- La teor\'ia de lenguajes formales es \'util para el estudio de los lenguajes naturales. En particular, resulta de inter\'es estudiar la adecuaci\'on de los formalismos gramaticales para expresar los fen\'omenos sint\'acticos presentes en el lenguaje natural. Primero, ayuda a trazar hip\'otesis acerca de la naturaleza y complejidad de las competencias ling\"u\'isticas de los hablantes-oyentes del lenguaje, un interrogante fundamental de la ling\"u\'istica y otras ciencias cognitivas. Adem\'as, desde el punto de vista de la ingenier\'ia, permite conocer limitaciones pr\'acticas de las aplicaciones basadas en dichos formalismos. En este art\'iculo hago una introducci\'on al problema de la adecuaci\'on de los formalismos gramaticales para el lenguaje natural, introduciendo tambi\'en algunos conceptos de la teor\'ia de lenguajes formales necesarios para esta discusi\'on. Luego, hago un repaso de los formalismos que han sido propuestos a lo largo de la historia, y de los argumentos que se han dado para sostener o refutar su adecuaci\'on.
This work proposes an algorithm for taking advantage of backpropagation gradients to determine feature importance at different stages of training. Additionally, we propose a way to represent the learning process qualitatively. Experiments were performed over the Wisconsin cancer dataset provided by sklearn, and results showed an interesting convergence of the so called "learning gradients" towards the most important features. --- Este trabajo propone el algoritmo de gradientes de aprendizaje para encontrar significado en las entradas de una red neuronal. Ademas, se propone una manera de evaluarlas por orden de importancia y representar el proceso de aprendizaje a traves de las etapas de entrenamiento. Los resultados obtenidos utilizan como referencia el conjunto de datos acerca de tumores malignos y benignos en Wisconsin. Esta referencia sirvio para detectar un patron en las variables mas importantes del modelo gracias, asi como su evolucion temporal.
In this paper we present a hybrid method for the automatic detection of dermatological pathologies in medical reports. We use a large language model combined with medical ontologies to predict, given a first appointment or follow-up medical report, the pathology a person may suffer from. The results show that teaching the model to learn the type, severity and location on the body of a dermatological pathology as well as in which order it has to learn these three features significantly increases its accuracy. The article presents the demonstration of state-of-the-art results for classification of medical texts with a precision of 0.84, micro and macro F1-score of 0.82 and 0.75, and makes both the method and the dataset used available to the community. -- En este art\'iculo presentamos un m\'etodo h\'ibrido para la detecci\'on autom\'atica de patolog\'ias dermatol\'ogicas en informes m\'edicos. Usamos un modelo de lenguaje amplio en espa\~nol combinado con ontolog\'ias m\'edicas para predecir, dado un informe m\'edico de primera cita o de seguimiento, la patolog\'ia del paciente. Los resultados muestran que el tipo, la gravedad y el sitio en el cuerpo de una patolog\'ia dermatol\'ogica, as\'i como en qu\'e orden tiene un modelo que aprender esas tres caracter\'isticas, aumentan su precisi\'on. El art\'iculo presenta la demostraci\'on de resultados comparables al estado del arte de clasificaci\'on de textos m\'edicos con una precisi\'on de 0.84, micro y macro F1-score de 0.82 y 0.75, y deja a disposici\'on de la comunidad tanto el m\'etodo como el conjunto de datos utilizado.
Machine learning systems that are "right for the wrong reasons" achieve high performance through shortcuts that collapse under distributional shift. We show this pathology has a precise causal origin: autoregressive training provides no gradient signal to distinguish association P(Y|X) from intervention P(Y|do(X)), a failure we formalize as Rung Collapse. When outcome-based learning reinforces correct answers obtained through incorrect causal models, the agent becomes entrenched in flawed reasoning, a phenomenon we term Aleatoric Entrenchment. We propose Epistemic Regret Minimization (ERM), a belief revision objective that penalizes errors in causal reasoning independently of task success, and embed it within a three-layer architecture with three contributions grounded in knowledge representation: (1) a Physical Grounding Theorem proving that actions satisfying actuator independence implement valid do-operations, bridging action languages and do-calculus; (2) ERM as a causal belief revision operator satisfying AGM postulates, preventing entrenchment even when the agent succeeds for the wrong reasons; and (3) a failure mode taxonomy that classifies recurring reasoning errors and injects domain-independent guards, enabling cross-domain transfer. We prove asymptotic recovery of the true interventional distribution with finite-sample bounds. Experiments on 1,360 causal trap scenarios across six frontier LLMs reveal that Rung Collapse persists even in reasoning-enhanced models (3.7% for GPT-5.2), that steerability exhibits inverse scaling where advanced models resist generic correction, and that targeted ERM feedback recovers 53-59% of entrenched errors where outcome-level feedback fails.
This paper focuses on solving a challenging problem of blind deconvolution demixing involving modulated inputs. Specifically, multiple input signals $s_n(t)$, each bandlimited to $B$ Hz, are modulated with known random sequences $r_n(t)$ that alter at rate $Q$. Each modulated signal is convolved with a different M tap channel of impulse response $h_n(t)$, and the outputs of each channel are added at a common receiver to give the observed signal $y(t)=\sum_{n=1}^N (r_n(t)\odot s_n(t))\circledast h_n(t)$, where $\odot$ is the point wise multiplication, and $\circledast$ is circular convolution. Given this observed signal $y(t)$, we are concerned with recovering $s_n(t)$ and $h_n(t)$. We employ deterministic subspace assumption for the input signal $s_n(t)$ and keep the channel impulse response $h_n(t)$ arbitrary. We show that if modulating sequence is altered at a rate $Q \geq N^2 (B+M)$ and sample complexity bound is obeyed then all the signals and the channels, $\{s_n(t),h_n(t)\}_{n=1}^N$, can be estimated from the observed mixture $y(t)$ using gradient descent algorithm. We have performed extensive simulations that show the robustness of our algorithm and used phase transitions to numerically investigate the theoretical guarantees provided by our algorithm.
We extend directed quantum circuit synthesis (DQCS) with reinforcement learning from purely discrete gate selection to parameterized quantum state preparation with continuous single-qubit rotations \(R_x\), \(R_y\), and \(R_z\). We compare two training regimes: a one-stage agent that jointly selects the gate type, the affected qubit(s), and the rotation angle; and a two-stage variant that first proposes a discrete circuit and subsequently optimizes the rotation angles with Adam using parameter-shift gradients. Using Gymnasium and PennyLane, we evaluate Proximal Policy Optimization (PPO) and Advantage Actor--Critic (A2C) on systems comprising two to ten qubits and on targets of increasing complexity with \(λ\) ranging from one to five. Whereas A2C does not learn effective policies in this setting, PPO succeeds under stable hyperparameters (one-stage: learning rate approximately \(5\times10^{-4}\) with a self-fidelity-error threshold of 0.01; two-stage: learning rate approximately \(10^{-4}\)). Both approaches reliably reconstruct computational basis states (between 83\% and 99\% success) and Bell states (between 61\% and 77\% success). However, scalability saturates for \(λ\) of approximately three to four and does not extend to ten-qubit targets even at \(λ=2\). The two-stage method offers only marginal accuracy gains while requiring around three times the runtime. For practicality under a fixed compute budget, we therefore recommend the one-stage PPO policy, provide explicit synthesized circuits, and contrast with a classical variational baseline to outline avenues for improved scalability.
Pairwise preference learning is central to machine learning, with recent applications in aligning language models with human preferences. A typical dataset consists of triplets $(x, y^+, y^-)$, where response $y^+$ is preferred over response $y^-$ for context $x$. The Bradley--Terry (BT) model is the predominant approach, modeling preference probabilities as a function of latent score differences. Standard practice assumes data follows this model and learns the latent scores accordingly. However, real data may violate this assumption, and it remains unclear what BT learning recovers in such cases. Starting from triplet comparison data, we formalize the preference information it encodes through the conditional preference distribution (CPRD). We give precise conditions for when BT is appropriate for modeling the CPRD, and identify factors governing sample efficiency -- namely, margin and connectivity. Together, these results offer a data-centric foundation for understanding what preference learning actually recovers.
Differential privacy (DP) provides a mathematical guarantee limiting what an adversary can learn about any individual from released data. However, achieving this protection typically requires adding noise, and noise can accumulate when many statistics are measured. Existing DP synthetic data methods treat all features symmetrically, spreading noise uniformly even when the data will serve a specific prediction task. We develop a prediction-centric approach operating in three regimes depending on available structural knowledge. In the causal regime, when the causal parents of $Y$ are known and distribution shift is expected, we target the parents for robustness. In the graphical regime, when a Bayesian network structure is available and the distribution is stable, the Markov blanket of $Y$ provides a sufficient feature set for optimal prediction. In the predictive regime, when no structural knowledge exists, we select features via differentially private methods without claiming to recover causal or graphical structure. We formalize this as PRISM, a mechanism that (i) identifies a predictive feature subset according to the appropriate regime, (ii) constructs targeted summary statistics, (iii) allocates budget to minimize an upper bound on prediction error, and (iv) synthesizes data via graphical-model inference. We prove end-to-end privacy guarantees and risk bounds. Empirically, task-aware allocation improves prediction accuracy compared to generic synthesizers. Under distribution shift, targeting causal parents achieves AUC $\approx 0.73$ while correlation-based selection collapses to chance ($\approx 0.49$).