Abstract: Constructing production-quality machine-learned interatomic potentials (MLIPs) requires balancing accuracy, dynamical stability, and computational throughput under constraints that are not captured by a single training loss. We introduce MLIPilot, an auto-research framework in which tool-calling large language models propose hypotheses, edit MLIP training code, launch HPC jobs, and accept or revert changes using a fixed, physically constrained scorecard. We evaluate MLIPilot on MACE potential optimization using both commercial and open-weight LLM agents, including GPT-5.5, GPT-4.1, Mistral-24B, and Qwen3-32B. The benchmarks span molecular and periodic settings: a QM7-derived dataset for which we generated B3LYP/6-31G(d) energies and forces, and a Cu EMT dataset with periodic copper supercells labeled by ASE's Effective Medium Theory calculator. Across these benchmarks, the strongest agents move initially constraint-violating baselines to accepted models by discovering useful training strategies, including output normalization, loss-function changes, progressive training schedules, and model-capacity adjustments. These results suggest that LLM agents can serve as autonomous operators for scientific machine-learning workflows when their search is constrained by domain-specific validation criteria, shifting part of MLIP development from manual trial-and-error toward auditable, automated experimentation.
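As a rough illustration of what an accept-or-revert decision against a fixed scorecard could look like, the sketch below compares a candidate model with the current one. The abstract does not specify the scorecard's fields, thresholds, or tie-breaking rule, so every name here (`Scorecard`, `Limits`, `violations`, `should_accept`) and the rule itself are hypothetical assumptions, not MLIPilot's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scorecard:
    """Hypothetical per-model metrics; the real scorecard is not given in the abstract."""
    force_mae: float          # force error, lower is better (units assumed)
    md_stable: bool           # whether a test MD run stayed stable
    steps_per_second: float   # throughput, higher is better


@dataclass(frozen=True)
class Limits:
    """Hypothetical fixed thresholds that the agent is not allowed to edit."""
    max_force_mae: float
    min_steps_per_second: float


def violations(card: Scorecard, limits: Limits) -> int:
    """Count how many constraints the model breaks."""
    return sum([
        card.force_mae > limits.max_force_mae,
        not card.md_stable,
        card.steps_per_second < limits.min_steps_per_second,
    ])


def should_accept(candidate: Scorecard, incumbent: Scorecard, limits: Limits) -> bool:
    """Accept if the candidate breaks fewer constraints; on a tie, require lower force error."""
    v_new = violations(candidate, limits)
    v_old = violations(incumbent, limits)
    if v_new != v_old:
        return v_new < v_old
    return candidate.force_mae < incumbent.force_mae
```

Under this assumed rule, a change that lowers the force error but destabilizes dynamics is reverted, which is one way a search can be kept from optimizing a single loss at the expense of the other criteria.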
Abstract: Quantum machine learning is often motivated by the idea that quantum systems can expose useful high-dimensional structure that is difficult to access with classical models. We isolate one central component of this claim: the fixed data-encoding map. Amplitude, angle, and basis encoding are evaluated as deterministic feature maps for classical supervised learning under matched output dimensionality and strong classical controls. The benchmark compares these encodings against raw linear models, random Fourier features, polynomial features, PCA, RBF SVMs, and shallow neural networks across diverse classical datasets. Rather than treating performance as a single endpoint, we analyze the geometry of each representation through effective rank, condition number, centered kernel alignment, predictive performance, and practical overhead. The resulting picture is mechanistic: amplitude encoding can remove magnitude information through unit-sphere normalization, angle encoding can become geometrically redundant with raw linear features, and basis encoding can impose a binary Hamming geometry that is poorly aligned with smooth decision structure. These findings do not argue against quantum computation; rather, they show that fixed quantum-inspired encoding geometry alone is not a reliable source of machine-learning advantage on classical data.
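The three encodings can be illustrated as plain NumPy feature maps. This is a minimal sketch under stated assumptions rather than the benchmark's code: the function names, the half-angle convention for angle encoding, the min-max quantization used for basis encoding, and the entropy-based definition of effective rank are all choices made here for illustration, and the abstract does not confirm any of them.

```python
import numpy as np


def amplitude_encode(X):
    """Scale each row to unit L2 norm (assumed form of amplitude encoding)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0.0] = 1.0  # leave all-zero rows unchanged
    return X / norms


def angle_encode(X):
    """Map each feature x to (cos(x/2), sin(x/2)), the single-qubit amplitudes (assumed)."""
    X = np.asarray(X, dtype=float)
    return np.concatenate([np.cos(X / 2.0), np.sin(X / 2.0)], axis=1)


def basis_encode(X, n_bits=4):
    """Quantize each feature to n_bits and return its binary digits (assumed scheme)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid dividing by zero on constant columns
    levels = np.round((X - lo) / span * (2**n_bits - 1)).astype(int)
    bits = (levels[:, :, None] >> np.arange(n_bits)) & 1  # least significant bit first
    return bits.reshape(X.shape[0], -1).astype(float)


def effective_rank(Z):
    """Exponential of the Shannon entropy of the normalized singular values (assumed definition)."""
    s = np.linalg.svd(np.asarray(Z, dtype=float), compute_uv=False)
    total = s.sum()
    if total == 0.0:
        return 0.0
    p = s / total
    p = p[p > 0]
    return float(np.exp(-(p * np.log(p)).sum()))
```

With these definitions, `amplitude_encode` sends a vector and any positive multiple of it to the same point, which is the magnitude loss the abstract describes, and distances between `basis_encode` outputs depend only on how many bits differ.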