Abstract:A companion study introduced joint durability into the dimensional design of the Theo Jansen walking linkage and found its classical "holy numbers" Pareto-dominated, but it modelled the revolute joints as ideal, clearance-free pins, so its wear figures were relative rankings, not a prediction of in-service degradation. Here we relax that idealization. We build a forward-dynamic model of the Jansen leg in which a revolute joint becomes a clearance joint with a continuous normal contact law (Lankarani-Flores, hysteresis-damped) and Ambrosio friction, integrated as a constraint-stabilized differential-algebraic system, and couple it to the Archard law in a wear->clearance->impact feedback loop. Three findings emerge. First, neglecting clearance underestimates the peak joint load: the clearance model gives a peak contact force of ~104 N at the load-bearing pin against ~48 N for the ideal joint (~2x amplification), rising to ~426 N when two joints carry clearance at once. Second, the coupling is strongly impact-sensitive--single trajectories are non-monotonic and can reverse the design ranking, chaotic behaviour consistent with the literature--so designs must be compared statistically; over an ensemble of 16 randomized phases the optimized joint is robustly more durable, with per-cycle wear ~9-7x lower (peak force ~4x lower) at one clearance joint and still ~1.7x lower on both with two (p<0.01 throughout). Third, the wear is strongly non-uniform--it concentrates on a ~10 deg load arc--so assuming uniform clearance growth underestimates local clearance growth by ~36x. The clearance-free durability advantage thus survives the chaotic, multi-joint, non-uniformly-worn coupling in the ensemble mean. We deliver the first clearance-coupled forward-dynamic model of the Jansen leg and specify a falsifiable protocol to test each prediction.
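The ~36x figure in the third finding is consistent with a simple geometric reading: if the same Archard worn volume is removed from a ~10 deg arc instead of the full 360 deg circumference, the local depth grows by 360/10 = 36. The sketch below illustrates that arithmetic only. The wear coefficient, hardness, pin radius, pin width and sliding distance are hypothetical placeholders, not values from the paper; only the ~104 N peak force and the ~10 deg arc come from the abstract.

```python
import math

def archard_volume(k, force_n, sliding_m, hardness_pa):
    """Archard law: worn volume V = k * F * s / H."""
    return k * force_n * sliding_m / hardness_pa

def wear_depth(volume_m3, pin_radius_m, width_m, arc_deg):
    """Depth if the worn volume is spread evenly over an arc of the pin surface."""
    area = pin_radius_m * math.radians(arc_deg) * width_m
    return volume_m3 / area

# Hypothetical inputs (assumptions), except the ~104 N peak force from the abstract.
volume = archard_volume(k=1e-4, force_n=104.0, sliding_m=0.05, hardness_pa=1e9)

uniform_depth = wear_depth(volume, pin_radius_m=0.004, width_m=0.01, arc_deg=360.0)
local_depth = wear_depth(volume, pin_radius_m=0.004, width_m=0.01, arc_deg=10.0)

# The ratio depends only on the arc widths, not on the placeholder constants.
ratio = local_depth / uniform_depth
print(f"local / uniform wear depth = {ratio:.1f}")
```

Because the ratio is independent of the material and geometry constants, it isolates the effect of the load-arc width; the paper's own model may arrive at its figure differently.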
Abstract:The Jansen linkage is a single-degree-of-freedom planar leg mechanism whose eleven "holy numbers" were evolved by Theo Jansen to optimize the foot-path gait alone, with no regard for the wear of its revolute joints. This paper introduces a durability objective into the design of the Jansen leg. A parametric forward-kinematic model (two-circle-intersection solver), an inverse-dynamic model (constraint-Jacobian / Lagrange-multiplier formulation of a seven-body, ten-joint system, independently cross-verified by a reduced-DOF energy method), and an Archard wear model are coupled to evaluate, for any set of link lengths, both gait quality and the per-cycle sliding wear at every pin. Because the wear is computed on ideal, clearance-free revolute joints, the resulting wear figures are a relative comparative ranking rather than an absolute life prediction. A bi-objective problem -- composite gait error versus total joint wear, subject to step-length, ground-clearance, duty-factor and assembly constraints -- is solved with NSGA-II. Under the adopted gait metric the classical Jansen design is Pareto-dominated: for a representative design, link-length adjustments within +/-29% simultaneously flatten the stance (-28%), smooth the stance velocity (-58%) and reduce total joint wear by ~56%. A sensitivity study shows the wear advantage is robust across a crank-speed x payload envelope (48%-56%) and identifies the link lengths that most strongly govern wear. A variance-based global (Sobol) analysis confirms that two link lengths dominate the wear variance, and a Monte-Carlo manufacturing-tolerance study shows the wear advantage degrades gracefully under realistic fabrication error. The framework provides a practical route to longer-lived walking linkages and a baseline for future wear-clearance-impact coupled studies.
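A two-circle-intersection solver of the kind mentioned above locates an unknown pin from two known pins and the two link lengths that connect to it. The following is a minimal sketch of that geometric step, not the authors' implementation; the function name `circle_intersection` and the `upper` branch flag are assumptions made for illustration.

```python
import math

def circle_intersection(p0, r0, p1, r1, upper=True):
    """Return one intersection point of circles (p0, r0) and (p1, r1).

    `upper` selects the solution on the left of the directed line p0 -> p1,
    which plays the role of the assembly branch of the linkage.
    """
    dx, dy = p1[0] - p0[0], p1[1] - p0[1]
    d = math.hypot(dx, dy)
    if d == 0.0 or d > r0 + r1 or d < abs(r0 - r1):
        raise ValueError("links cannot be assembled for this configuration")
    # Distance from p0 to the foot of the chord, then half the chord length.
    a = (r0**2 - r1**2 + d**2) / (2.0 * d)
    h = math.sqrt(max(r0**2 - a**2, 0.0))
    mx, my = p0[0] + a * dx / d, p0[1] + a * dy / d
    sign = 1.0 if upper else -1.0
    return (mx - sign * h * dy / d, my + sign * h * dx / d)

# Two pins 6 units apart joined to a third pin by links of length 5.
print(circle_intersection((0.0, 0.0), 5.0, (6.0, 0.0), 5.0))         # (3.0, 4.0)
print(circle_intersection((0.0, 0.0), 5.0, (6.0, 0.0), 5.0, False))  # (3.0, -4.0)
```

Applying this step repeatedly, starting from the crank pin and the fixed pivot, would position the remaining pins of a leg for a given crank angle; the `ValueError` corresponds to a violated assembly constraint.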
Abstract:As Multimodal Large Language Models (MLLMs) mature, GUI agents are evolving from static interactions to complex navigation. While Reinforcement Learning (RL) has emerged as a promising paradigm for training MLLM agents on dynamic GUI tasks, its effective application faces a dilemma. Standard Offline RL often relies on static step-level data, neglecting global trajectory semantics such as task completion and execution quality. Conversely, Online RL captures the long-term dynamics but suffers from high interaction costs and potential environmental instability. To bridge this gap, we propose SOLAR-RL (Semi-Online Long-horizon Assignment Reinforcement Learning). Instead of relying solely on expensive online interactions, our framework integrates global trajectory insights directly into the offline learning process. Specifically, we reconstruct diverse rollout candidates from static data, detect the first failure point using per-step validity signals, and retroactively assign dense step-level rewards with target-aligned shaping to reflect trajectory-level execution quality, effectively simulating online feedback without interaction costs. Extensive experiments demonstrate that SOLAR-RL significantly improves long-horizon task completion rates and robustness compared to strong baselines, offering a sample-efficient solution for autonomous GUI navigation.
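The failure-point detection and retroactive reward assignment described above can be illustrated with a small sketch. This is not the SOLAR-RL reward itself: the function names, the `step_bonus` and `fail_penalty` parameters, and the choice to scale pre-failure rewards by the fraction of the trajectory completed are all assumptions standing in for the paper's target-aligned shaping.

```python
def first_failure(valid):
    """Index of the first invalid step, or None if every step is valid."""
    for i, ok in enumerate(valid):
        if not ok:
            return i
    return None

def assign_rewards(valid, step_bonus=1.0, fail_penalty=-1.0):
    """Turn per-step validity flags into dense step-level rewards.

    Assumed shaping: steps before the first failure share a reward scaled by
    how far the rollout progressed, the failing step is penalized, and later
    steps receive zero because they no longer affect the outcome.
    """
    n = len(valid)
    k = first_failure(valid)
    if k is None:
        return [step_bonus] * n
    quality = k / n  # fraction of the trajectory executed correctly
    return [step_bonus * quality] * k + [fail_penalty] + [0.0] * (n - k - 1)

print(assign_rewards([True, True, False, True]))  # [0.5, 0.5, -1.0, 0.0]
print(assign_rewards([True, True, True]))         # [1.0, 1.0, 1.0]
```

The point of the sketch is that each step's reward depends on a trajectory-level quantity (where the rollout first fails), even though the data are static and no environment interaction occurs.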
Abstract:Reinforcement learning (RL) has been widely used to train LLM agents for multi-turn interactive tasks, but its sample efficiency is severely limited by sparse rewards and long horizons. On-policy self-distillation (OPSD) alleviates this by providing dense token-level supervision from a privileged teacher that has access to ground-truth answers. However, such fixed privileged information cannot capture the diverse valid strategies in agent tasks, and naively combining OPSD with RL often leads to training collapse. To address these limitations, we introduce Skill-SD, a framework that turns the agent's own trajectories into dynamic training-only supervision. Completed trajectories are summarized into compact natural language skills that describe successful behaviors, mistakes, and workflows. These skills serve as dynamic privileged information conditioning only the teacher, while the student always acts under the plain task prompt and learns to internalize the guidance through distillation. To stabilize the training, we derive an importance-weighted reverse-KL loss to provide gradient-correct token-level distillation, and dynamically synchronize the teacher with the improving student. Experimental results on agentic benchmarks demonstrate that Skill-SD substantially outperforms the standard RL baseline, improving both vanilla GRPO (+14.0%/+10.9% on AppWorld/Sokoban) and vanilla OPD (+42.1%/+40.6%). Project page: https://k1xe.github.io/skill-sd/
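To make the distillation objective more concrete, the sketch below computes a token-level reverse KL, KL(student || teacher), and weights each token by a clipped importance ratio between the current student and the policy that generated the rollout. It is an illustrative guess at the general shape of such a loss, not the derivation from the paper: the clipping threshold, the use of a full-vocabulary KL, and all function names are assumptions.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary distribution."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) for a single token position."""
    ls = log_softmax(student_logits)
    lt = log_softmax(teacher_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(ls, lt))

def weighted_reverse_kl(student_logits, teacher_logits, tokens,
                        behavior_logprobs, clip=2.0):
    """Mean reverse KL over positions, weighted by a clipped importance ratio.

    tokens[i] is the sampled token at position i and behavior_logprobs[i] is
    its log-probability under the rollout policy (both assumed inputs).
    """
    if not tokens:
        return 0.0
    total = 0.0
    for s, t, tok, lp_b in zip(student_logits, teacher_logits,
                               tokens, behavior_logprobs):
        lp_s = log_softmax(s)[tok]
        weight = min(math.exp(lp_s - lp_b), clip)  # pi_student / pi_behavior
        total += weight * reverse_kl(s, t)
    return total / len(tokens)

# One position, two-token vocabulary; the teacher here stands for the
# skill-conditioned model, the student for the plain-prompt model.
loss = weighted_reverse_kl([[0.0, 0.0]], [[2.0, 0.0]], [0], [math.log(0.5)])
print(f"loss = {loss:.4f}")
```

In a real training loop the weight would typically be treated as a constant with respect to the gradient, and the logits would come from the two forward passes rather than from literals.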
Abstract:We present Genie Sim PanoRecon, a feed-forward Gaussian-splatting pipeline that delivers high-fidelity, low-cost 3D scenes for robotic manipulation simulation. The panorama input is decomposed into six non-overlapping cube-map faces, processed in parallel, and seamlessly reassembled. To guarantee geometric consistency across views, we devise a depth-aware fusion strategy coupled with a training-free depth-injection module that steers the monocular feed-forward network to generate coherent 3D Gaussians. The whole system reconstructs photo-realistic scenes in seconds and has been integrated into Genie Sim - an LLM-driven simulation platform for embodied synthetic data generation and evaluation - to provide scalable backgrounds for manipulation tasks. For code details, please refer to: https://github.com/AgibotTech/genie_sim/tree/main/source/geniesim_world.
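The cube-map decomposition step can be sketched as a lookup from each face pixel to a location in an equirectangular panorama. The code below is a generic illustration under assumed conventions (x forward, y right, z up, face coordinates u and v in [-1, 1] with v pointing down); the face names and the function `face_to_equirect` are hypothetical and are not taken from the Genie Sim code base.

```python
import math

# Viewing direction for face coordinates (u, v) in [-1, 1], v pointing down.
# Assumed axes: x forward, y right, z up.
FACES = {
    "front": lambda u, v: (1.0, u, -v),
    "right": lambda u, v: (-u, 1.0, -v),
    "back":  lambda u, v: (-1.0, -u, -v),
    "left":  lambda u, v: (u, -1.0, -v),
    "up":    lambda u, v: (v, u, 1.0),
    "down":  lambda u, v: (-v, u, -1.0),
}

def face_to_equirect(face, u, v, pano_w, pano_h):
    """Map a point on a cube face to (column, row) in an equirectangular image."""
    x, y, z = FACES[face](u, v)
    lon = math.atan2(y, x)                   # longitude in [-pi, pi]
    lat = math.asin(z / math.sqrt(x * x + y * y + z * z))  # latitude
    col = (lon / (2.0 * math.pi) + 0.5) * pano_w
    row = (0.5 - lat / math.pi) * pano_h
    return col, row

# Face centres of a 2048 x 1024 panorama.
for name in FACES:
    print(name, face_to_equirect(name, 0.0, 0.0, 2048, 1024))
```

Sampling the panorama at these coordinates for a grid of (u, v) values yields six 90-degree views that tile the sphere without overlap, which is what allows the faces to be processed in parallel.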
Abstract:The development of robust and generalizable robot learning models is critically contingent upon the availability of large-scale, diverse training data and reliable evaluation benchmarks. Collecting data in the physical world poses prohibitive costs and scalability challenges, and prevailing simulation benchmarks frequently suffer from fragmentation, narrow scope, or insufficient fidelity to enable effective sim-to-real transfer. To address these challenges, we introduce Genie Sim 3.0, a unified simulation platform for robotic manipulation. We present Genie Sim Generator, a large language model (LLM)-powered tool that constructs high-fidelity scenes from natural language instructions. Its principal strength resides in rapid and multi-dimensional generalization, facilitating the synthesis of diverse environments to support scalable data collection and robust policy evaluation. We introduce the first benchmark to apply LLMs to automated evaluation. It leverages an LLM to mass-generate evaluation scenarios and employs a Vision-Language Model (VLM) to establish an automated assessment pipeline. We also release an open-source dataset comprising more than 10,000 hours of synthetic data across over 200 tasks. Through systematic experimentation, we validate the robust zero-shot sim-to-real transfer capability of our open-source dataset, demonstrating that synthetic data can serve as an effective substitute for real-world data under controlled conditions for scalable policy training. For code and dataset details, please refer to: https://github.com/AgibotTech/genie_sim.