Roberto Armellin

Distribution-Agnostic Robust Trajectory Optimization via Chance-Constrained Reinforcement Learning

Jun 11, 2026

Yashdeep Chaudhary, Roberto Armellin, Harry Holt, Marco Sagliano

Abstract: This paper presents a distribution-agnostic robust trajectory-optimization framework based on chance-constrained reinforcement learning. The uncertainty is represented here through initial conditions and process noise, with the only requirement being that it can be sampled. A deterministic nominal trajectory is first computed offline, and reinforcement learning is then used only to robustify that baseline through a structured affine closed-loop correction law comprising a feedforward control adjustment and time-varying feedback gains. Probabilistic feasibility is enforced empirically through rollout-based upper-tail quantiles, while terminal dispersion is regulated through covariance-feasibility penalties. The framework is assessed on two materially different trajectory design problems. The flagship case study is a three-dimensional multi-impulse Earth-Mars transfer, where the learned policy is benchmarked against a recent robust trajectory-optimization reference under Gaussian uncertainty and then evaluated under bounded uniform uncertainty and under process disturbances not seen during training. The second case study is a stochastic atmospheric pinpoint rocket landing problem, used to assess portability to a short-horizon continuous-thrust setting with drag, mass depletion, and glide-slope constraints. The results show that the proposed framework can remain competitive in upper-tail fuel cost while preserving probabilistic feasibility, and that the same robustification scaffold can be carried across heterogeneous spacecraft trajectory planning problems without redesign of its core stochastic-control structure.

* Preprint. 39 pages, 16 figures
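
The rollout-based upper-tail quantile check mentioned in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the scalar toy dynamics, the zero nominal trajectory, the constant feedback gain `k_gain`, the terminal bound of 0.5, and the function names are all hypothetical. The only property carried over from the abstract is that the uncertainty is accessed purely by sampling.

```python
import math
import random

def upper_quantile(samples, level):
    """Empirical quantile: smallest sample with at least `level` of the data at or below it."""
    ordered = sorted(samples)
    k = max(math.ceil(level * len(ordered)) - 1, 0)
    return ordered[k]

def rollout_constraint(k_gain, rng):
    """Toy scalar rollout (hypothetical): x+ = x + u + w with u = -k_gain * x.

    Returns g = |x_final| - 0.5, so g <= 0 means the terminal bound is met.
    """
    x = rng.uniform(-1.0, 1.0)             # sampled initial condition
    for _ in range(10):
        u = -k_gain * x                     # feedback correction around a zero nominal
        x = x + u + rng.gauss(0.0, 0.05)    # sampled process noise
    return abs(x) - 0.5

def chance_penalty(k_gain, level=0.99, n=2000, seed=0):
    """Penalty that vanishes when the empirical `level`-quantile of g is non-positive."""
    rng = random.Random(seed)
    g = [rollout_constraint(k_gain, rng) for _ in range(n)]
    return max(upper_quantile(g, level), 0.0)
```

In a training loop, a penalty of this kind would be added to the cost so that the policy is pushed toward gains for which the sampled constraint values are non-positive at the chosen quantile level. Because only samples are used, swapping the uniform and Gaussian draws for any other sampler requires no change to the quantile logic.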

Time-Optimal Collision Avoidance Via a Greedy Polynomial Backward Sweep

May 31, 2026

Zeno Pavanello, Frank De Veld, Roberto Armellin

Abstract: Spacecraft collision avoidance for low-thrust satellites often requires determining not only how to maneuver, but also how late a maneuver can begin while still ensuring safety. This paper presents a greedy time-optimal (GTO) backward-sweep method to find the latest maneuver initiation time. The method starts from the nominal time of closest approach and iteratively propagates the maneuver backward in time, selecting at each step the thrust direction that locally minimizes the chosen danger metric. Differential algebra is used to efficiently propagate state sensitivities and update the time of closest approach online. The method is tested on a large dataset of conjunctions, using both miss distance and probability of collision as safety metrics. The approach achieves accurate results and only a small loss of optimality relative to an optimal-control benchmark, while retaining runtimes suitable for on-board implementation.
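
The backward-sweep idea in the abstract above can be sketched on a deliberately simplified toy problem. The assumptions are strong and are not taken from the paper: relative motion is planar and force-free, the time of closest approach (TCA) is held fixed rather than updated, miss distance is the only danger metric, candidate thrust directions come from a coarse angular grid, and no differential algebra is used. The function name and parameters are hypothetical.

```python
import math

def greedy_backward_sweep(miss0, accel, dt, threshold, n_dirs=72, max_steps=10000):
    """Return (lead_time_before_tca, final_miss_vector).

    Toy model (assumption): in force-free planar relative motion, a thrust
    segment of length dt centred tau seconds before TCA shifts the miss
    vector by accel * dt * tau along the thrust direction.
    """
    miss = list(miss0)
    dirs = [(math.cos(2 * math.pi * k / n_dirs), math.sin(2 * math.pi * k / n_dirs))
            for k in range(n_dirs)]
    for step in range(max_steps):
        # Stop as soon as the accumulated maneuver makes the encounter safe.
        if math.hypot(miss[0], miss[1]) >= threshold:
            return step * dt, tuple(miss)
        tau = (step + 0.5) * dt              # midpoint lead time of the new segment
        gain = accel * dt * tau
        # Greedy choice: direction giving the largest miss distance after this segment.
        best = max(dirs, key=lambda d: math.hypot(miss[0] + gain * d[0],
                                                  miss[1] + gain * d[1]))
        miss[0] += gain * best[0]
        miss[1] += gain * best[1]
    raise RuntimeError("threshold not reached within max_steps")
```

Each pass extends the thrust arc one segment further back from TCA, so the returned lead time is the shortest arc, and hence the latest start, that this greedy rule finds. For example, with an initial miss of 100 m, an acceleration of 1e-4 m/s², 60 s segments, and a 1000 m threshold, the sweep stops after 71 segments, close to the continuous-thrust estimate of about 4243 s.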

Tiny Recursive Models for Solving the J2-Perturbed Lambert Problem

May 30, 2026

Minduli Wijayatunga, Roberto Armellin

Abstract: This paper presents a fast, recursive neural solver for the J2-perturbed Lambert problem based on Tiny Recursive Models (TRM), termed the TRM-Perturbed Lambert (TRM-PL) model. TRM is a weight-shared architecture whose effective capacity emerges from iteration depth rather than parameter count: a compact reasoning module is applied repeatedly within a two-level latent hierarchy, refining a candidate departure velocity by simulating the J2 trajectory and correcting it from the resulting tracking error. This unifies initial-guess generation and iterative correction in a single, end-to-end differentiable architecture. The recursive refinement loop is a learned alternative to the homotopy and continuation schemes of classical perturbed-Lambert solvers: rather than following a hand-designed path from the Keplerian to the perturbed solution, the network learns its own sequence of corrections. We evaluate TRM-PL on three test cases of increasing difficulty: single-revolution low-Earth-orbit (LEO) transfers, multi-revolution LEO transfers, and multi-revolution Jovian transfers. Three training paradigms are compared: jointly learning the Lambert solution and the J2 correction; refining the Lambert initial velocity with target-position and J2-corrected velocity supervision; and refining it with target-position supervision alone. Across all cases, the refinement-only approaches are the most reliable. The position-supervised variant reduces the median terminal-position error from 21.7 km to 0.027 km on single-revolution LEO, from 340.9 km to 0.31 km on multi-revolution LEO, all with the same 2.3M-parameter architecture. A single Newton corrector iteration on the TRM-PL output tightens the Jovian median to 0.063 km, yielding compact models accurate enough for embedded deployment.
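
The simulate-then-correct loop and the Newton corrector mentioned in the abstract above can be illustrated with a classical, non-learned stand-in. This sketch is not the TRM-PL model: it is a plain Newton shooting method with a finite-difference Jacobian, restricted to planar motion in the equatorial plane (where the J2 acceleration is purely radial), in normalized units with `MU = RE = 1`. The function names, step counts, and tolerances are hypothetical choices.

```python
import math

MU, RE, J2 = 1.0, 1.0, 1.08263e-3   # normalized units (assumption); Earth's J2 value

def accel(x, y):
    """Two-body plus J2 acceleration restricted to the equatorial plane (z = 0)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    k = -MU / (r2 * r) * (1.0 + 1.5 * J2 * RE * RE / r2)
    return k * x, k * y

def deriv(s):
    ax, ay = accel(s[0], s[1])
    return (s[2], s[3], ax, ay)

def propagate(r0, v0, tof, steps=200):
    """Fixed-step RK4 propagation; returns the final position."""
    s = (r0[0], r0[1], v0[0], v0[1])
    h = tof / steps
    for _ in range(steps):
        k1 = deriv(s)
        k2 = deriv(tuple(a + 0.5 * h * b for a, b in zip(s, k1)))
        k3 = deriv(tuple(a + 0.5 * h * b for a, b in zip(s, k2)))
        k4 = deriv(tuple(a + h * b for a, b in zip(s, k3)))
        s = tuple(a + h / 6.0 * (b + 2 * c + 2 * d + e)
                  for a, b, c, d, e in zip(s, k1, k2, k3, k4))
    return s[0], s[1]

def refine_velocity(r0, v_guess, r_target, tof, iters=5, eps=1e-6):
    """Newton shooting: simulate, measure the terminal miss, correct the velocity."""
    v = list(v_guess)
    for _ in range(iters):
        rf = propagate(r0, v, tof)
        err = (rf[0] - r_target[0], rf[1] - r_target[1])
        # Finite-difference Jacobian d(r_final)/d(v0), column by column.
        cx = propagate(r0, (v[0] + eps, v[1]), tof)
        cy = propagate(r0, (v[0], v[1] + eps), tof)
        a, c = (cx[0] - rf[0]) / eps, (cx[1] - rf[1]) / eps
        b, d = (cy[0] - rf[0]) / eps, (cy[1] - rf[1]) / eps
        det = a * d - b * c
        # Solve the 2x2 system J * dv = err and step against the error.
        v[0] -= (d * err[0] - b * err[1]) / det
        v[1] -= (-c * err[0] + a * err[1]) / det
    return tuple(v)
```

Calling `refine_velocity` with `iters=1` corresponds to a single corrector step applied to an externally supplied guess, which is the role a learned initial velocity would play in this toy setting.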

Sample-Free Safety Assessment of Neural Network Controllers via Taylor Methods

Feb 11, 2026

Adam Evans, Roberto Armellin

Abstract: In recent years, artificial neural networks have been increasingly studied as feedback controllers for guidance problems. While effective in complex scenarios, they lack the verification guarantees found in classical guidance policies. Their black-box nature creates significant concerns regarding trustworthiness, limiting their adoption in safety-critical spaceflight applications. This work addresses this gap by developing a method to assess the safety of a trained neural network feedback controller via automatic domain splitting and polynomial bounding. The methodology involves embedding the trained neural network into the system's dynamical equations, rendering the closed-loop system autonomous. The system flow is then approximated by high-order Taylor polynomials, which are subsequently manipulated to construct polynomial maps that project state uncertainties onto an event manifold. Automatic domain splitting ensures the polynomials are accurate over their relevant subdomains, whilst also allowing an extensive state space to be analysed efficiently. Utilising polynomial bounding techniques, the resulting event values may be rigorously constrained and analysed within individual subdomains, thereby establishing bounds on the range of possible closed-loop outcomes from using such neural network controllers and supporting safety assessment and informed operational decision-making in real-world missions.
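
The combination of domain splitting and polynomial bounding described in the abstract above can be illustrated in one dimension. This is a simplified stand-in, not the paper's method: the polynomial is given directly rather than obtained from a Taylor expansion of a closed-loop flow, the bounder is a naive interval Horner scheme, the splitting rule is plain bisection on enclosure width, and floating-point rounding is not directed outward, so the bounds are not rigorous in the strict sense. All function names are hypothetical.

```python
def interval_mul(a, b):
    """Product of two intervals given as (lo, hi) tuples."""
    products = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return min(products), max(products)

def horner_enclosure(coeffs, box):
    """Interval Horner scheme; coeffs are ordered from highest to lowest degree."""
    lo, hi = coeffs[0], coeffs[0]
    for c in coeffs[1:]:
        lo, hi = interval_mul((lo, hi), box)
        lo, hi = lo + c, hi + c
    return lo, hi

def bound_with_splitting(coeffs, box, tol=0.05, max_depth=20):
    """Enclose the range of the polynomial on `box`, bisecting until each
    subdomain enclosure is no wider than `tol` (or max_depth is reached)."""
    lo, hi = horner_enclosure(coeffs, box)
    if hi - lo <= tol or max_depth == 0:
        return lo, hi
    mid = 0.5 * (box[0] + box[1])
    left = bound_with_splitting(coeffs, (box[0], mid), tol, max_depth - 1)
    right = bound_with_splitting(coeffs, (mid, box[1]), tol, max_depth - 1)
    # The union of the subdomain enclosures bounds the range on the whole box.
    return min(left[0], right[0]), max(left[1], right[1])
```

For p(x) = x² - x on [0, 1], whose true range is [-0.25, 0], a single interval evaluation gives the loose enclosure [-1, 0]. Splitting tightens the lower bound to within the chosen tolerance of -0.25, which shows why bounding over subdomains yields usable limits where a single global evaluation does not.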

Can LLMs Do Rocket Science? Exploring the Limits of Complex Reasoning with GTOC 12

Feb 03, 2026

Iñaki del Campo, Pablo Cuervo, Victor Rodriguez-Fernandez, Roberto Armellin, Jack Yarndley

Abstract: Large Language Models (LLMs) have demonstrated remarkable proficiency in code generation and general reasoning, yet their capacity for autonomous multi-stage planning in high-dimensional, physically constrained environments remains an open research question. This study investigates the limits of current AI agents by evaluating them against the 12th Global Trajectory Optimization Competition (GTOC 12), a complex astrodynamics challenge requiring the design of a large-scale asteroid mining campaign. We adapt the MLE-Bench framework to the domain of orbital mechanics and deploy an AIDE-based agent architecture to autonomously generate and refine mission solutions. To assess performance beyond binary validity, we employ an "LLM-as-a-Judge" methodology, utilizing a rubric developed by domain experts to evaluate strategic viability across five structural categories. A comparative analysis of models, ranging from GPT-4-Turbo to reasoning-enhanced architectures like Gemini 2.5 Pro and o3, reveals a significant trend: the average strategic viability score has nearly doubled in the last two years (rising from 9.3 to 17.2 out of 26). However, we identify a critical capability gap between strategy and execution. While advanced models demonstrate sophisticated conceptual understanding, correctly framing objective functions and mission architectures, they consistently fail at implementation due to physical unit inconsistencies, boundary condition errors, and inefficient debugging loops. We conclude that, while current LLMs often demonstrate sufficient knowledge and intelligence to tackle space science tasks, they remain limited by an implementation barrier, functioning as powerful domain facilitators rather than fully autonomous engineers.

* Proceedings of the AIAA SciTech 2026 Forum, January 2026
* Extended version of the paper presented at the AIAA SciTech 2026 Forum. Includes further experiments, corrections, and a new appendix
