Abstract: Vision-Language-Action (VLA) models offer a compelling framework for tackling complex robotic manipulation tasks, but they are often expensive to train. In this paper, we propose a novel VLA approach that leverages the competitive performance of Vision-Language Models (VLMs) on 2D images to directly infer robot end-effector poses in image-frame coordinates. Unlike prior VLA models that output low-level controls, our model predicts trajectory waypoints, making it both more efficient to train and agnostic to robot embodiment. Despite its lightweight design, our next-token prediction architecture effectively learns meaningful and executable robot trajectories. We further explore the underutilized potential of incorporating depth images, inference-time techniques such as decoding strategies, and demonstration-conditioned action generation. Our model is trained on a simulated dataset and exhibits strong sim-to-real transfer. We evaluate our approach on a combination of simulated and real data, demonstrating its effectiveness on a real robotic system.
Abstract: We present a real-world dataset of stereoscopic videos for color-mismatch correction; it contains real-world distortions captured using a beam splitter and is larger than any other dataset for this task. We compared eight color-mismatch-correction methods on artificial and real-world datasets and showed that local methods are best suited to artificial distortions, whereas global methods are best suited to real-world ones. We also improved the latest local neural-network method for color-mismatch correction in stereoscopic images, making it both faster and more effective on artificial and real-world distortions.