To assess the ability of current AI systems to correctly solve research-level mathematics problems, we tested several AI systems on a set of ten problems in a broad range of mathematical fields; these problems arose naturally in the research process of the contributors. This document includes the problems, our methodology, and the results of our testing. We provide links to supplementary documents including the human solutions, the AI-generated solutions, and the referee reports and logs for the AI-generated solutions. The ten problems were contributed by the following mathematicians: (1) Dariusz Kalociński and Theodore A. Slaman, (2) Richard Schwartz, (3) Aleksa Milojevic and Benny Sudakov, (4) Larry Guth, (5) Oleg Butkovsky, Jonathan Mattingly, and Lorenzo Zambotti, (6) Joshua Evan Greene and Duncan McCoy, (7) Sucharit Sarkar, (8) Sam Payne and Jidong (Jayden) Wang, (9) Sylvie Corteel and John Lentfer, (10) Srivatsav Kunnawalkam Elayavalli.
This paper presents an implemented CARLA-VISSIM co-simulation framework for an urban corridor comprising approximately fifteen connected intersections centered on Martin Luther King Jr. Boulevard in Chattanooga, Tennessee. The system integrates CARLA 0.10.0 Unreal Engine 5 with PTV VISSIM 2026 through a bidirectional, step-synchronized interface that couples VISSIM's microscopic vehicle, pedestrian, and signal-controller logic with CARLA's high-fidelity 3D rendering. A LiDAR-derived elevation model and RoadRunner-based High Definition (HD) map provide terrain-accurate road geometry deployed consistently across both simulators. The framework incorporates explicit actor ownership, mirrored lifecycle management, coordinate reconciliation, and a latest-state-per-actor update policy, enabling stable interaction between VISSIM-controlled traffic and a CARLA-controlled ego vehicle. A corridor-scale case study demonstrates consistent traffic-signal mirroring, synchronized vehicle-pedestrian interactions, and stable mixed-authority operation under peak loads of approximately 100 vehicles and 100 pedestrians. The deployment captures the interaction of the five signalized intersections along MLK Street and their connecting upstream and downstream intersections, revealing synchronization challenges unique to multi-intersection corridors. Results indicate that this MLK-centered corridor provides an effective testbed for verifying cross-simulator consistency and that the proposed architecture supports reliable, perception-ready co-simulation for corridor-level traffic studies.
The recent successes of neural networks producing human-like language have caused significant stir in cognitive science, with many researchers arguing that classical puzzles about human cognition and challenges to artificial intelligence are being solved by neural networks. A notable case is the argument from systematicity due to Jerry Fodor and Zenon Pylyshyn, argues that humans display systematic biconditional dependencies. For example, someone can understand the sentence "John saw Mary" just in case that they understand the sentence "Mary saw John." Symbolic systems explain this systematicity of language and thought, while neural networks offer no immediate explanation. Several recent articles argue that this challenge has now been met by neural networks. In particular, Brenden Lake and Marco Baroni argue that their meta-learning for compositionality protocol matches and perhaps explains human systematicity. We demonstrate that these conclusions are premature. Among other results, we found that their model struggles to learn rules that are even slightly out of distribution compared to their training data. Furthermore, the model behaves unsystematically even on many within-distribution problems. We conclude that Fodor and Pylyshyn's challenge to neural networks remains unmet.
The Volterra signature extends the classical path signature by incorporating general matrix-valued kernel into its iterated integral structure, yielding a flexible notion of memory for time series. Its components can be viewed as successive Picard iterates of linear controlled Volterra equations, making their exact computation of additional mathematical interest. However, the kernel introduces substantial algorithmic challenges. We provide a resolution by first decomposing the Chen-type convolution relation established in [arXiv:2603.04525] into analytic and arithmetic parts, and then introducing several efficient algorithms: a general approximative scheme with quadratic complexity $O(J^2)$ in the number of time steps $J$, an FFT-based acceleration with complexity $O(J\log J)$ for convolution kernels on uniform grids, and an exact recursion with complexity $O(JR^2)$ for kernels admitting a state-space representation of dimension $R$; retaining standard signature complexity in the path dimension and truncation level $N$. We further show that the number of factors in matrix-valued kernels of the form $K(t,s)=\sum_p k_p(t-s)A_p$ do not increase the asymptotic complexity in $J$ and $N$. Finally, we derive a finite-difference predictor--corrector scheme for the associated Volterra signature kernel. All algorithms are implemented in the publicly available JAX-based package "tensordev".
While digitized corpora have transformed the study of intellectual transmission, current methods rely heavily on lexical text reuse detection, capturing verbatim quotations but fundamentally missing paraphrases and complex implicit engagement. This paper evaluates semantic search in 18th-century intellectual history through the reception of John Locke's foundational work. Using expert annotation grounded in a semantic taxonomy, we examine whether an off-the-shelf semantic search pipeline can surface meaning-level correspondences overlooked by lexical methods. Our results demonstrate that semantic search retrieves substantially more implicit receptions than lexical baselines. However, linguistic diagnostics also reveal a "lexical gatekeeping" effect, where retrieval remains partially constrained by surface vocabulary overlap. These findings highlight both the potential and the limitations of semantic retrieval for analyzing the circulation of ideas in large historical corpora. The data is available at https://github.com/COMHIS/locke-sim-data.
In the mid-twentieth century, mathematician and polymath John von Neumann created a computational system on an array of cells as a simple model of the human brain, where each cell had one of a finite set of roles or states that he predicted would be modelled by a diffusion process. In this work, we show that such a system, when developed in a modern deep learning setting, enables the construction of an artificial neuron having specialized roles that can be learnt. We refer to this neuron as the Von Neumann neuron, and the resulting neural network from such neurons result in a self-engineered design whose architecture is only dependent on the structure and locations of its inputs and outputs on this cellular array. The mathematical framework for these Von Neumann Networks (VNNs) is also constructed and shows that they are based on the extension of neural operators and the learning of Green's functions with convolutions on a cellular topology having a diffusion signature. We also prove that these VNNs are part of a more general computational system called Cellular Machines that are computationally universal. Initial experiments show that VNN based multi-layered perceptrons outperform their equivalent deep learning variant on basic tasks, while being more parameter efficient and are capable of learning new types of tasks. This includes the ability to solve for and construct an extension of the Von Neumann (hardware) architecture common to all modern computers to cells and suggests new opportunities that could be explored.
Alan Turing is considered as a founder of current computer science together with Kurt Godel, Alonzo Church and John von Neumann. In this paper multiple new research results are presented. It is demonstrated that there would not be Alan Turing's achievements without earlier seminal contributions by Georg Cantor in the set theory and foundations of mathematics. It is proposed to introduce the measure of undecidability of problems unsolvable by Turing machines based on probability distribution of its input data, i.e., to provide the degree of unsolvabilty based on the number of undecidable instances of input data versus decidable ones. It is proposed as well to extend the Turing's work on infinite logics and Oracle machines to a whole class of super-Turing models of computation. Next, the three new complexity classes for TM undecidable problems have been defined: U-complete (Universal complete), D-complete (Diagonalization complete) and H-complete (Hypercomputation complete) classes. The above has never been defined explicitly before by other scientists, and has been inspired by Cook/Levin NP-complete class for intractable problems. Finally, an equivalent to famous P is not equal to NP unanswered question for NP-complete class, has been answered negatively for U-complete class of complexity for undecidable problems.
The minimum-norm interpolator (MNI) framework has recently attracted considerable attention as a tool for understanding generalization in overparameterized models, such as neural networks. In this work, we study the MNI under a $2$-uniform convexity assumption, which is weaker than requiring the norm to be induced by an inner product, and it typically does not admit a closed-form solution. At a high level, we show that this condition yields an upper bound on the MNI bias in both linear and nonlinear models. We further show that this bound is sharp for overparameterized linear regression when the unit ball of the norm is in isotropic (or John's) position, and the covariates are isotropic, symmetric, i.i.d. sub-Gaussian, such as vectors with i.i.d. Bernoulli entries. Finally, under the same assumption on the covariates, we prove sharp generalization bounds for the $\ell_p$-MNI when $p \in \bigl(1 + C/\log d, 2\bigr]$. To the best of our knowledge, this is the first work to establish sharp bounds for non-Gaussian covariates in linear models when the norm is not induced by an inner product. This work is deeply inspired by classical works on $K$-convexity, and more modern work on the geometry of 2-uniform and isotropic convex bodies.
Autonomous underwater vehicles (AUVs) are increasingly used to survey coral reefs, yet efficiently locating specific coral species of interest remains difficult: target species are often sparsely distributed across the reef, and an AUV with limited battery life cannot afford to search everywhere. When detections of the target itself are too sparse to provide directional guidance, the robot benefits from an additional signal to decide where to look next. We propose using the visual environmental context -- the habitat features that tend to co-occur with a target species -- as that signal. Because context features are spatially denser and often vary more smoothly than target detections, we hypothesize that a reward function targeted at broader environmental context will enable adaptive planners to make better decisions on where to go next, even in regions where no target has yet been observed. Starting from a single labeled image, our method uses patch-level DINOv2 embeddings to perform one-shot detections of both the target species and its surrounding context online. We validate our approach using real imagery collected by an AUV at two reef sites in St. John, U.S. Virgin Islands, simulating the robot's motion offline. Our results demonstrate that one-shot detection combined with adaptive context modeling enables efficient autonomous surveying, sampling up to 75$\%$ of the target in roughly half the time required by exhaustive coverage when the target is sparsely distributed, and outperforming search strategies that only use target detections.
Injury prediction in pitching depends on precise biomechanical signals, yet gold-standard measurements come from expensive, stadium-installed multi-camera systems that are unavailable outside professional venues. We present a monocular video pipeline that recovers 18 clinically relevant biomechanics metrics from broadcast footage, positioning pose-derived kinematics as a scalable source for injury-risk modeling. Built on DreamPose3D, our approach introduces a drift-controlled global lifting module that recovers pelvis trajectory via velocity-based parameterization and sliding-window inference, lifting pelvis-rooted poses into global space. To address motion blur, compression artifacts, and extreme pitching poses, we incorporate a kinematics refinement pipeline with bone-length constraints, joint-limited inverse kinematics, smoothing, and symmetry constraints to ensure temporally stable and physically plausible kinematics. On 13 professional pitchers (156 paired pitches), 16/18 metrics achieve sub-degree agreement (MAE $< 1^{\circ}$). Using these metrics for injury prediction, an automated screening model achieves AUC 0.811 for Tommy John surgery and 0.825 for significant arm injuries on 7,348 pitchers. The resulting pose-derived metrics support scalable injury-risk screening, establishing monocular broadcast video as a viable alternative to stadium-scale motion capture for biomechanics.