Abstract:Industrial robotic manipulation demands reliable long-horizon execution across embodiments, tasks, and changing object distributions. While Vision-Language-Action models have demonstrated strong generalization, they remain fundamentally reactive. By optimizing the next action given the current observation without evaluating potential futures, they are brittle to the compounding failure modes of long-horizon tasks. Cortex 2.0 shifts from reactive control to plan-and-act by generating candidate future trajectories in visual latent space, scoring them for expected success and efficiency, then committing only to the highest-scoring candidate. We evaluate Cortex 2.0 on a single-arm and dual-arm manipulation platform across four tasks of increasing complexity: pick and place, item and trash sorting, screw sorting, and shoebox unpacking. Cortex 2.0 consistently outperforms state-of-the-art Vision-Language-Action baselines, achieving the best results across all tasks. The system remains reliable in unstructured environments characterized by heavy clutter, frequent occlusions, and contact-rich manipulation, where reactive policies fail. These results demonstrate that world-model-based planning can operate reliably in complex industrial environments.




Abstract:Recent works in field robotics highlighted the importance of resiliency against different types of terrains. Boreal forests, in particular, are home to many mobility-impeding terrains that should be considered for off-road autonomous navigation. Also, being one of the largest land biomes on Earth, boreal forests are an area where autonomous vehicles are expected to become increasingly common. In this paper, we address this issue by introducing BorealTC, a publicly available dataset for proprioceptive-based terrain classification (TC). Recorded with a Husky A200, our dataset contains 116 min of Inertial Measurement Unit (IMU), motor current, and wheel odometry data, focusing on typical boreal forest terrains, notably snow, ice, and silty loam. Combining our dataset with another dataset from the state-of-the-art, we evaluate both a Convolutional Neural Network (CNN) and the novel state space model (SSM)-based Mamba architecture on a TC task. Interestingly, we show that while CNN outperforms Mamba on each separate dataset, Mamba achieves greater accuracy when trained on a combination of both. In addition, we demonstrate that Mamba's learning capacity is greater than a CNN for increasing amounts of data. We show that the combination of two TC datasets yields a latent space that can be interpreted with the properties of the terrains. We also discuss the implications of merging datasets on classification. Our source code and dataset are publicly available online: https://github.com/norlab-ulaval/BorealTC.
Abstract:In the context of robotics, accurate ground truth positioning is essential for the development of Simultaneous Localization and Mapping (SLAM) and control algorithms. Robotic Total Stations (RTSs) provide accurate and precise reference positions in different types of outdoor environments, especially when compared to the limited accuracy of Global Navigation Satellite System (GNSS) in cluttered areas. Three RTSs give the possibility to obtain the six-Degrees Of Freedom (DOF) reference pose of a robotic platform. However, the uncertainty of every pose is rarely computed for trajectory evaluation. As evaluation algorithms are getting increasingly precise, it becomes crucial to take into account this uncertainty. We propose a method to compute this six-DOF uncertainty from the fusion of three RTSs based on Monte Carlo (MC) methods. This solution relies on point-to-point minimization to propagate the noise of RTSs on the pose of the robotic platform. Five main noise sources are identified to model this uncertainty: noise inherent to the instrument, tilt noise, atmospheric factors, time synchronization noise, and extrinsic calibration noise. Based on extensive experimental work, we compare the impact of each noise source on the prism uncertainty and the final estimated pose. Tested on more than 50 km of trajectories, our comparison highlighted the importance of the calibration noise and the measurement distance, which should be ideally under 75 m. Moreover, it has been noted that the uncertainty on the pose of the robot is not prominently affected by one particular noise source, compared to the others.




Abstract:Challenges inherent to autonomous wintertime navigation in forests include lack of reliable a Global Navigation Satellite System (GNSS) signal, low feature contrast, high illumination variations and changing environment. This type of off-road environment is an extreme case of situations autonomous cars could encounter in northern regions. Thus, it is important to understand the impact of this harsh environment on autonomous navigation systems. To this end, we present a field report analyzing teach-and-repeat navigation in a subarctic region while subject to large variations of meteorological conditions. First, we describe the system, which relies on point cloud registration to localize a mobile robot through a boreal forest, while simultaneously building a map. We experimentally evaluate this system in over 18.6 km of autonomous navigation in the teach-and-repeat mode. We show that dense vegetation perturbs the GNSS signal, rendering it unsuitable for navigation in forest trails. Furthermore, we highlight the increased uncertainty related to localizing using point cloud registration in forest corridors. We demonstrate that it is not snow precipitation, but snow accumulation that affects our system's ability to localize within the environment. Finally, we expose some lessons learned and challenges from our field campaign to support better experimental work in winter conditions.