Abstract: We present AutoTraces, an autoregressive vision-language-trajectory model for robot trajectory forecasting in human-populated environments, which harnesses the inherent reasoning capabilities of large language models (LLMs) to model complex human behaviors. In contrast to prior works that rely solely on textual representations, our key innovation is a novel trajectory tokenization scheme that represents waypoints with point tokens serving as categorical and positional markers, while encoding the numerical values of the waypoints as corresponding point embeddings, which are integrated into the LLM's embedding space through a lightweight encoder-decoder architecture. This design preserves the LLM's native autoregressive generation mechanism while extending it to physical coordinate spaces, and it facilitates the modeling of long-term interactions in trajectory data. We further introduce an automated chain-of-thought (CoT) generation mechanism that leverages a multimodal LLM to infer spatio-temporal relationships from visual observations and trajectory data, eliminating the reliance on manual annotation. Trained with a two-stage strategy, AutoTraces achieves state-of-the-art forecasting accuracy, particularly in long-horizon prediction, while exhibiting strong cross-scene generalization and supporting flexible-length forecasting.
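The tokenization idea described above can be illustrated with a minimal sketch. It is not the AutoTraces implementation: the token name `<pt>`, the hidden size, and the single linear maps standing in for the lightweight encoder and decoder are all assumptions made for illustration. Each waypoint contributes one placeholder point token to the sequence, while its numerical value is carried by a separate point embedding.

```python
import numpy as np

rng = np.random.default_rng(0)
D_MODEL = 16          # hypothetical LLM hidden size
POINT_TOKEN = "<pt>"  # hypothetical placeholder token name

# Hypothetical lightweight encoder/decoder, reduced here to linear maps.
# The decoder is the pseudo-inverse of the encoder, so this toy round-trips exactly.
W_enc = rng.normal(size=(2, D_MODEL))
W_dec = np.linalg.pinv(W_enc)

def encode_points(waypoints):
    """Map (N, 2) waypoints to (N, D_MODEL) point embeddings."""
    return np.asarray(waypoints, dtype=float) @ W_enc

def decode_points(hidden):
    """Map (N, D_MODEL) hidden states back to (N, 2) coordinates."""
    return hidden @ W_dec

def tokenize_trajectory(waypoints):
    """Return one point token per waypoint plus the matching embeddings."""
    tokens = [POINT_TOKEN] * len(waypoints)
    return tokens, encode_points(waypoints)

traj = np.array([[0.0, 0.0], [0.5, 0.1], [1.0, 0.3]])
tokens, emb = tokenize_trajectory(traj)
recovered = decode_points(emb)
```

In a real model the embeddings would replace the point tokens' entries in the LLM input sequence, and the decoder would read the LLM's output hidden states rather than the encoder output directly.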




Abstract: Wavefront sensing from an extended object is a challenging task, since the phase to be sensed is disturbed by the phase generated by the structure of the extended object. To address this problem, a general wavefront sensor is proposed. The hardware of the sensor consists of a field lens, a collimating lens, a lenslet array, and a camera. The idea behind its algorithm is to eliminate the phase caused by the extended object and to reconstruct the point spread function through each lenslet. As a result, wavefront sensing from an extended object is converted to the conventional case of sensing from a point source. Numerical simulations and experiments both verify the feasibility and accuracy of the proposed sensor.
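As a rough illustration of how removing the object's contribution can turn a sub-image into a point-source-like spot, the sketch below uses regularized Fourier division. This is a generic deconvolution technique swapped in for illustration, not the algorithm of the proposed sensor. It assumes the object's image is known and that each sub-image is the circular convolution of the object with that lenslet's point spread function. The function name `recover_psf` and the parameter `eps` are hypothetical.

```python
import numpy as np

def recover_psf(sub_image, obj, eps=1e-3):
    """Estimate a lenslet PSF from sub_image = obj (*) psf (circular convolution).

    Multiplying by conj(O) cancels the object's spectral phase; dividing by
    |O|^2 + eps normalizes the magnitude while avoiding division by zero.
    """
    I = np.fft.fft2(sub_image)
    O = np.fft.fft2(obj)
    H = I * np.conj(O) / (np.abs(O) ** 2 + eps)
    return np.real(np.fft.ifft2(H))

# Toy data: a random "extended object" and a PSF that is a single spot
# displaced by (5, 7) pixels, which mimics a local wavefront tilt.
rng = np.random.default_rng(0)
obj = rng.random((32, 32))
sub_image = np.roll(obj, shift=(5, 7), axis=(0, 1))

psf_est = recover_psf(sub_image, obj)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(psf_est), psf_est.shape))
```

The position of the recovered spot plays the role of the spot displacement in a conventional point-source sensor, from which local wavefront slopes are usually derived.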