Abstract:Maximum entropy reinforcement learning (MaxEnt RL) has become a standard framework for sequential decision making, yet its standard Gaussian policy parameterization is inherently unimodal, limiting its ability to model complex multimodal action distributions. This limitation has motivated increasing interest in generative policies based on diffusion and flow matching as more expressive alternatives. However, incorporating such policies into MaxEnt RL is challenging for two main reasons: the likelihood and entropy of continuous-time generative policies are generally intractable, and multi-step sampling introduces both long-horizon backpropagation instability and substantial inference latency. To address these challenges, we propose Truncated Rectified Flow Policy (TRFP), a framework built on a hybrid deterministic-stochastic architecture. This design makes entropy-regularized optimization tractable while supporting stable training and effective one-step sampling through gradient truncation and flow straightening. Empirical results on a toy multigoal environment and 10 MuJoCo benchmarks show that TRFP captures multimodal behavior effectively, outperforms strong baselines on most benchmarks under standard sampling, and remains highly competitive under one-step sampling.
Abstract:Aerial Manipulator Systems (AMS) have garnered significant interest for their utility in aerial operations. Nonetheless, challenges related to the manipulator's limited stiffness and the coupling disturbance with manipulator movement persist. This paper introduces the Aerial Tendon-Driven Manipulator (ATDM), an innovative AMS that integrates a hexrotor Unmanned Aerial Vehicle (UAV) with a 4-degree-of-freedom (4-DOF) anthropomorphic tendon-driven manipulator. The design of the manipulator is anatomically inspired, emulating the human arm anatomy from the shoulder joint downward. To enhance the structural integrity and performance, finite element topology optimization and lattice optimization are employed on the links to replicate the radially graded structure characteristic of bone, this approach effectively reduces weight and inertia while simultaneously maximizing stiffness. A novel tensioning mechanism with adjustable tension is introduced to address cable relaxation, and a Tension-amplification tendon mechanism is implemented to increase the manipulator's overall stiffness and output. The paper presents a kinematic model based on virtual coupled joints, a comprehensive workspace analysis, and detailed calculations of output torques and stiffness for individual arm joints. The prototype arm has a total weight of 2.7 kg, with the end effector contributing only 0.818 kg. By positioning all actuators at the base, coupling disturbance are minimized. The paper includes a detailed mechanical design and validates the system's performance through semi-physical multi-body dynamics simulations, confirming the efficacy of the proposed design.