Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shenhui Zhang

Watch Before You Answer: Learning from Visually Grounded Post-Training

Apr 06, 2026

Yuxuan Zhang, EunJeong Hwang, Huaisong Zhang, Penghui Du, Yiming Jia, Dongfu Jiang, Xuan He, Shenhui Zhang, Ping Nie, Peter West(+1 more)

Abstract:It is critical for vision-language models (VLMs) to comprehensively understand visual, temporal, and textual cues. However, despite rapid progress in multimodal modeling, video understanding performance still lags behind text-based reasoning. In this work, we find that progress is even worse than previously assumed: commonly reported long video understanding benchmarks contain 40-60% of questions that can be answered using text cues alone. Furthermore, we find that these issues are also pervasive in widely used post-training datasets, potentially undercutting the ability of post-training to improve VLM video understanding performance. Guided by this observation, we introduce VidGround as a simple yet effective solution: using only the actual visually grounded questions without any linguistic biases for post-training. When used in tandem with RL-based post-training algorithms, this simple technique improves performance by up to 6.2 points relative to using the full dataset, while using only 69.1% of the original post-training data. Moreover, we show that data curation with a simple post-training algorithm outperforms several more complex post-training techniques, highlighting that data quality is a major bottleneck for improving video understanding in VLMs. These results underscore the importance of curating post-training data and evaluation benchmarks that truly require visual grounding to advance the development of more capable VLMs. Project page: http://vidground.etuagi.com.

Via

Access Paper or Ask Questions

A Secure Data Sharing Framework for Robot Operating Systems Leveraging Ethereum

Aug 30, 2022

Shenhui Zhang, Wenkai Li, Xiaoqi Li, Boyi Liu, Yuqing Zhang, Chunjue Cao

Figure 1 for A Secure Data Sharing Framework for Robot Operating Systems Leveraging Ethereum

Figure 2 for A Secure Data Sharing Framework for Robot Operating Systems Leveraging Ethereum

Figure 3 for A Secure Data Sharing Framework for Robot Operating Systems Leveraging Ethereum

Figure 4 for A Secure Data Sharing Framework for Robot Operating Systems Leveraging Ethereum

Abstract:Robot Operating System (ROS) has brought the excellent potential for automation in various fields involving production tasks, productivity enhancement, and the simplification of human operations. However, ROS highly relies on communication but lacks secure data sharing mechanisms. Securing confidential data exchange between multi-robots presents significant challenges in multi-robot interactions. In this paper, we introduce AuthROS, a secure and convenient authorization framework for ROS nodes with absolute security and high availability based on a private Ethereum network and SM algorithms. To our best knowledge, AuthROS is the first secure data-sharing framework for robots loaded with ROS. This framework can meet the requirements for immutability and security of confidential data exchanged between ROS nodes. In addition, an authority-granting and identity-verification mechanism are proposed to execute atomically to ensure trustworthy data exchange without third-party. Both an SM2 key exchange and an SM4 plaintext encryption mechanism are proposed for data transmission security. A data digest uploading scheme is also implemented to improve the efficiency of data querying and uploading on the Ethereum network. Experimental results demonstrate that it can generate a digest from 800KB encrypted data in 6.34ms. Through security analysis, AuthROS achieves secure data exchange, data operations detection, and Node Forging attack protection.

* Currently, this manuscript has been submitted to JSS(Journal of Systems and Software)

Via

Access Paper or Ask Questions