Mingzhi Zhang

GLM-4.1V-Thinking: Towards Versatile Multimodal Reasoning with Scalable Reinforcement Learning

Jul 02, 2025

GLM-V Team: Wenyi Hong, Wenmeng Yu, Xiaotao Gu, Guo Wang, Guobing Gan, Haomiao Tang, Jiale Cheng, Ji Qi (+69 more)

Abstract: We present GLM-4.1V-Thinking, a vision-language model (VLM) designed to advance general-purpose multimodal understanding and reasoning. In this report, we share our key findings in the development of the reasoning-centric training framework. We first develop a capable vision foundation model with significant potential through large-scale pre-training, which arguably sets the upper bound for the final performance. We then propose Reinforcement Learning with Curriculum Sampling (RLCS) to unlock the full potential of the model, leading to comprehensive capability enhancement across a diverse range of tasks, including STEM problem solving, video understanding, content recognition, coding, grounding, GUI-based agents, and long document understanding. We open-source GLM-4.1V-9B-Thinking, which achieves state-of-the-art performance among models of comparable size. In a comprehensive evaluation across 28 public benchmarks, our model outperforms Qwen2.5-VL-7B on nearly all tasks and achieves comparable or even superior performance on 18 benchmarks relative to the significantly larger Qwen2.5-VL-72B. Notably, GLM-4.1V-9B-Thinking also demonstrates competitive or superior performance compared to closed-source models such as GPT-4o on challenging tasks including long document understanding and STEM reasoning, further underscoring its strong capabilities. Code, models, and more information are released at https://github.com/THUDM/GLM-4.1V-Thinking.

Uncertainty-inspired Open Set Learning for Retinal Anomaly Identification

Apr 08, 2023

Meng Wang, Tian Lin, Lianyu Wang, Aidi Lin, Ke Zou, Xinxing Xu, Yi Zhou, Yuanyuan Peng, Qingquan Meng, Yiming Qian (+14 more)

Abstract: Failure to recognize samples from classes unseen during training is a major limitation of artificial intelligence (AI) in real-world retinal anomaly classification. To resolve this obstacle, we propose an uncertainty-inspired open-set (UIOS) model, which was trained with fundus images of 9 common retinal conditions. In addition to the probability of each category, UIOS also calculates an uncertainty score to express its confidence. Our UIOS model with a thresholding strategy achieved F1 scores of 99.55%, 97.01% and 91.91% on the internal testing set, external testing set and non-typical testing set, respectively, compared to F1 scores of 92.20%, 80.69% and 64.74% for the standard AI model. Furthermore, UIOS correctly predicted high uncertainty scores, which prompted the need for a manual check, in the datasets of rare retinal diseases, low-quality fundus images, and non-fundus images. This work provides a robust method for real-world screening of retinal anomalies.
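The thresholding idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how UIOS computes its uncertainty score, so the code below assumes a Dirichlet (evidential) formulation in which uncertainty is the number of classes divided by the total Dirichlet strength. The function names, the `evidence` input, and the 0.5 threshold are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple


def dirichlet_uncertainty(evidence: List[float]) -> Tuple[List[float], float]:
    """Return (expected class probabilities, uncertainty score).

    Assumption: per-class evidence is non-negative and parameterizes a
    Dirichlet distribution with alpha = evidence + 1.
    """
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]
    strength = sum(alpha)
    probs = [a / strength for a in alpha]
    # Uncertainty is 1.0 with no evidence and shrinks as evidence accumulates.
    uncertainty = num_classes / strength
    return probs, uncertainty


def predict_or_refer(evidence: List[float], threshold: float = 0.5) -> Optional[int]:
    """Return the predicted class index, or None to request a manual check."""
    probs, uncertainty = dirichlet_uncertainty(evidence)
    if uncertainty > threshold:
        return None  # too uncertain: treat as a possible unseen class
    return max(range(len(probs)), key=probs.__getitem__)


# Hypothetical evidence vectors for the 9 trained conditions.
confident = [90.0] + [0.0] * 8   # strong evidence for class 0
unfamiliar = [0.0] * 9           # no evidence for any class

print(predict_or_refer(confident))   # class index
print(predict_or_refer(unfamiliar))  # None, i.e. refer for manual check
```

In this sketch an input with little evidence for any trained class is referred to a human reader instead of being forced into one of the 9 categories, which mirrors the manual check the abstract describes.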

A Novel Data Segmentation Method for Data-driven Phase Identification

Nov 20, 2021

Han Pyo Lee, Mingzhi Zhang, Mesut Baran, Ning Lu, PJ Rehm, Edmond Miller, Matthew Makdad

Abstract: This paper presents a smart meter phase identification algorithm for two cases: meter-phase-label-known and meter-phase-label-unknown. To improve the identification accuracy, a data segmentation method is proposed to exclude data segments that are collected when the voltage correlation between smart meters on the same phase is weakened. Then, using the selected data segments, a hierarchical clustering method is used to calculate the correlation distances and cluster the smart meters. If the phase labels are unknown, a Connected-Triple-based Similarity (CTS) method is adapted to further improve the phase identification accuracy of the ensemble clustering method. The methods are developed and tested on both synthetic and real feeder data sets. Simulation results show that the proposed phase identification algorithm outperforms the state-of-the-art methods in both accuracy and robustness.

* 5 pages, 6 figures, 2022 PES General Meeting
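
The clustering step described in the abstract (correlation distances between meter voltage series, then hierarchical clustering) might look roughly like the sketch below. It is an illustration only: the average-linkage rule, the distance `1 - Pearson correlation`, the function names, and the toy voltage profiles are all assumptions, and the paper's data segmentation and CTS ensemble steps are not shown.

```python
from math import sqrt
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def cluster_meters(series: List[List[float]], n_clusters: int = 3) -> List[int]:
    """Agglomerative clustering with average linkage on correlation distance."""
    # Pairwise correlation distance: 0 for perfectly correlated series.
    dist = [[1.0 - pearson(a, b) for b in series] for a in series]
    clusters = [[i] for i in range(len(series))]

    def average_link(c1: List[int], c2: List[int]) -> float:
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    # Repeatedly merge the two closest clusters until n_clusters remain.
    while len(clusters) > n_clusters:
        _, a, b = min(
            (average_link(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(series)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical voltage profiles: two meters per phase, three phases.
phase_a = [1.0, 2.0, 3.0, 4.0, 3.0, 2.0]
phase_b = [4.0, 3.0, 2.0, 1.0, 2.0, 3.0]
phase_c = [1.0, 3.0, 1.0, 3.0, 1.0, 3.0]
meters = [
    phase_a, [1.01 * v + 0.5 for v in phase_a],
    phase_b, [v + 0.2 for v in phase_b],
    phase_c, [0.9 * v for v in phase_c],
]
print(cluster_meters(meters, n_clusters=3))
```

Meters whose voltage profiles move together end up in the same cluster, which is the intuition behind grouping meters by phase; with real feeder data the correlations are much noisier, which is why the abstract adds a segment-selection step first.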
