Abstract:Trustworthy evaluation methods for code snippets play a crucial role in neural code generation. Traditional methods, which either rely on reference solutions or require executable test cases, have inherent limitations in flexibility and scalability. The recent LLM-as-Judge methodology offers a promising alternative by directly evaluating functional consistency between the problem description and the generated code. To systematically understand the landscape of these LLM-as-Judge methods, we conduct a comprehensive empirical study across three diverse datasets. Our investigation reveals the pros and cons of two categories of LLM-as-Judge methods: methods based on general foundation models can achieve good performance but require complex prompts and lack explainability, while methods based on reasoning foundation models provide better explainability with simpler prompts but demand substantial computational resources due to their large parameter sizes. To address these limitations, we propose CODE-DITING, a novel code evaluation method that balances accuracy, efficiency, and explainability. We develop a data distillation framework that effectively transfers reasoning capabilities from DeepSeek-R1 671B to our CODE-DITING 1.5B and 7B models, significantly enhancing evaluation explainability and reducing computational cost. With a majority-vote strategy during inference, CODE-DITING 1.5B outperforms all models of the same parameter magnitude and matches the performance typically seen in models five times its size. CODE-DITING 7B surpasses GPT-4o and DeepSeek-V3 671B, even though it uses only 1% of the parameter count of these large models. Further experiments show that CODE-DITING is robust to preference leakage and can serve as a promising alternative for code evaluation.
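As a concrete illustration of the inference-time majority vote described above, the sketch below aggregates several stochastic judge verdicts into one decision. The `sample_judgment` callable and the boolean verdict format are assumptions for illustration, not CODE-DITING's actual interface.

```python
from collections import Counter

def judge_with_majority_vote(problem: str, code: str, sample_judgment, n_votes: int = 5) -> bool:
    """Aggregate several stochastic judge samples into one verdict.

    `sample_judgment` is a hypothetical callable that runs the judge model once
    (e.g., at temperature > 0) and returns True/False for "the code is
    functionally consistent with the problem description".
    """
    votes = [sample_judgment(problem, code) for _ in range(n_votes)]
    # The verdict backed by the majority of sampled judgments wins.
    return Counter(votes).most_common(1)[0][0]
```

An odd `n_votes` avoids ties; larger vote counts trade extra inference passes for a more stable verdict.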
Abstract:Vision-Language Models (VLMs) are essential for multimodal tasks, especially compositional reasoning (CR) tasks, which require distinguishing fine-grained semantic differences between visual and textual embeddings. However, existing methods primarily fine-tune the model by generating text-based hard negative samples, neglecting the importance of image-based negative samples; this leaves the visual encoder insufficiently trained and ultimately limits the overall performance of the model. Moreover, negative samples are typically treated uniformly, without considering their difficulty levels, and the alignment of positive samples is insufficient, making difficult sample pairs hard to align. To address these issues, we propose Adaptive Hard Negative Perturbation Learning (AHNPL). AHNPL translates text-based hard negatives into the visual domain to generate semantically disturbed image-based negatives for training the model, thereby enhancing its overall performance. AHNPL also introduces a contrastive learning approach with a multimodal hard negative loss, which improves the model's discrimination of hard negatives within each modality, and a dynamic margin loss, which adjusts the contrastive margin according to sample difficulty to sharpen the distinction of challenging sample pairs. Experiments on three public datasets demonstrate that our method effectively boosts VLMs' performance on complex CR tasks. The source code is available at https://github.com/nynu-BDAI/AHNPL.
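To make the dynamic margin idea concrete, here is a minimal sketch of a triplet-style contrastive loss whose margin grows with sample difficulty. The difficulty proxy (anchor-negative similarity) and the hyperparameters are assumptions for illustration, not AHNPL's exact formulation.

```python
import torch
import torch.nn.functional as F

def dynamic_margin_loss(anchor, positive, negative, base_margin=0.2, scale=0.3):
    """Contrastive loss with a per-sample margin that widens for hard pairs.

    All inputs are embedding batches of shape (B, D). Difficulty is proxied by
    how close the negative sits to the anchor; harder negatives demand a
    larger separation before the loss reaches zero.
    """
    pos_sim = F.cosine_similarity(anchor, positive)    # (B,)
    neg_sim = F.cosine_similarity(anchor, negative)    # (B,)
    difficulty = neg_sim.detach().clamp(min=0.0)       # closer negative -> harder sample
    margin = base_margin + scale * difficulty          # per-sample adaptive margin
    return F.relu(neg_sim - pos_sim + margin).mean()
```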
Abstract:Traffic simulation is essential for autonomous vehicle (AV) development, enabling comprehensive safety evaluation across diverse driving conditions. However, traditional rule-based simulators struggle to capture complex human interactions, while data-driven approaches often fail to maintain long-term behavioral realism or generate diverse safety-critical events. To address these challenges, we propose TeraSim, an open-source, high-fidelity traffic simulation platform designed to uncover unknown unsafe events and efficiently estimate AV statistical performance metrics, such as crash rates. TeraSim integrates seamlessly with third-party physics simulators and standalone AV stacks to construct a complete AV simulation system. Experimental results demonstrate its effectiveness in generating diverse safety-critical events involving both static and dynamic agents, identifying hidden deficiencies in AV systems, and enabling statistical performance evaluation. These findings highlight TeraSim's potential as a practical tool for AV safety assessment, benefiting researchers, developers, and policymakers. The code is available at https://github.com/mcity/TeraSim.
Abstract:Cross-domain recommendation (CDR) methods are proposed to tackle the sparsity problem in click-through rate (CTR) estimation. Existing CDR methods directly transfer knowledge from source domains to the target domain and ignore the heterogeneities among domains, including feature dimensional heterogeneity and latent space heterogeneity, which may lead to negative transfer. Besides, most existing methods are based on single-source transfer and cannot simultaneously utilize knowledge from multiple source domains to further improve model performance in the target domain. In this paper, we propose a centralized-distributed transfer model (CDTM) for CDR based on multi-source heterogeneous transfer learning. To address feature dimension heterogeneity, we build a dual embedding structure: a domain-specific embedding (DSE) and a global shared embedding (GSE), which model the feature representation in a single domain and the commonalities in the global space, respectively. To address latent space heterogeneity, a transfer matrix and an attention mechanism are used to map and combine DSE and GSE adaptively. Extensive offline and online experiments demonstrate the effectiveness of our model.
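A minimal sketch of how a transfer matrix plus attention gate might map and combine DSE and GSE is given below; the layer shapes and gating form are illustrative assumptions, not CDTM's published architecture.

```python
import torch
import torch.nn as nn

class DualEmbeddingFusion(nn.Module):
    """Map the global shared embedding (GSE) into the domain's latent space
    with a transfer matrix, then mix it with the domain-specific embedding
    (DSE) via a learned attention gate."""

    def __init__(self, dse_dim: int, gse_dim: int):
        super().__init__()
        self.transfer = nn.Linear(gse_dim, dse_dim, bias=False)  # transfer matrix
        self.attn = nn.Linear(2 * dse_dim, 2)                    # scores for the two sources

    def forward(self, dse: torch.Tensor, gse: torch.Tensor) -> torch.Tensor:
        mapped = self.transfer(gse)                              # align GSE with DSE space
        weights = torch.softmax(self.attn(torch.cat([dse, mapped], dim=-1)), dim=-1)
        # Convex combination of the two representations, weighted per sample.
        return weights[..., :1] * dse + weights[..., 1:] * mapped
```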
Abstract:Median fins of fish-like swimmers play a crucial role in linear acceleration and maneuvering, yet little research has focused on untethered robotic fish experiments. Imitating the behaviour of real tuna, we developed a free-swimming bionic tuna with a foldable dorsal fin. Under suitable conditions, erecting the dorsal fin can reduce head heave by 50%, enhance linear acceleration by 15.7%, increase turning angular velocity by 32.78%, and decrease the turning radius by 33.13%. Conversely, erecting the dorsal fin increases the wetted surface area, resulting in decreased maximum speed and efficiency during the steady swimming phase. This finding partially explains why tuna erect their median fins during maneuvers or acceleration and fold them afterward to reduce drag. In addition, we verified that folding the median fins after acceleration does not significantly affect locomotion efficiency. This study supports the application of morphing median fins in undulating underwater robots and helps to further understand the impact of median fins on fish locomotion.
Abstract:Ball mills play a critical role in modern mining operations, making their bearing failures a significant concern due to the potential loss of production efficiency and economic consequences. This paper presents an anomaly detection method based on Deep Convolutional Auto-encoding Neural Networks (DCAN) for ball mill bearing fault detection. The proposed approach trains only on vibration data collected during normal operation, overcoming challenges such as labeling issues and data imbalance often encountered in supervised learning methods. DCAN comprises convolutional feature extraction and transposed-convolutional feature reconstruction modules, demonstrating strong capabilities in signal processing and feature extraction. Additionally, the paper describes the practical deployment of the DCAN-based anomaly detection model for bearing fault detection, utilizing data from the ball mill bearings of Wuhan Iron & Steel Resources Group and fault data from NASA's bearing vibration dataset. Experimental results validate the DCAN model's reliability in recognizing fault vibration patterns. This method holds promise for enhancing bearing fault detection efficiency, reducing production interruptions, and lowering maintenance costs.
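The reconstruction-based detection recipe can be sketched as follows: a 1D convolutional encoder and transposed-convolutional decoder are trained on healthy vibration windows, and a window is flagged when its reconstruction error exceeds a threshold calibrated on normal data. Layer sizes and the window length here are illustrative, not the paper's DCAN architecture.

```python
import torch
import torch.nn as nn

class ConvAutoencoder1D(nn.Module):
    """Convolutional encoding and transposed-convolutional reconstruction of a
    single-channel vibration window (shape: batch x 1 x 1024)."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=8, stride=4, padding=2), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(32, 16, kernel_size=8, stride=4, padding=2), nn.ReLU(),
            nn.ConvTranspose1d(16, 1, kernel_size=8, stride=4, padding=2),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

def is_anomalous(model: ConvAutoencoder1D, window: torch.Tensor, threshold: float) -> bool:
    # A model trained only on healthy data reconstructs normal vibration well;
    # a large reconstruction error therefore signals a deviant (faulty) pattern.
    with torch.no_grad():
        err = torch.mean((model(window) - window) ** 2)
    return err.item() > threshold
```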
Abstract:It is well known that tuna in the ocean can dynamically morph their median fins to achieve optimal hydrodynamic performance, e.g., linear acceleration and maneuverability. In this study, building on previous work on the median fin's hydrodynamic effects under tethered conditions, we explore the hydrodynamic function of a tuna-like morphing dorsal fin in free-swimming conditions to better approach real-life situations. We developed a tuna-inspired robotic fish platform that can swim independently in three dimensions, equipped with a biomimetic morphing dorsal fin magnetically attached to the robotic fish. Using this free-swimming platform, we investigated how the erected dorsal fin affects speed, cost of transport (COT), and the robotic fish's yaw angle at different frequencies and amplitudes. The erected dorsal fin plays a positive role in improving the yaw stability of the robotic fish, but shows little influence on speed and COT in our tests; this remains to be investigated further in future work.
Abstract:Power transformers play a critical role in grid infrastructure, and their diagnosis is paramount for maintaining stable operation. However, current methods for transformer diagnosis focus on discrete dissolved gas analysis, neglecting deep feature extraction from multichannel consecutive data. This unutilized sequential data contains significant temporal information reflecting the transformer's condition. In light of this, a multichannel consecutive data cross-extraction (MCDC) structure is proposed in this article to comprehensively exploit the intrinsic characteristics and evaluate the state of the transformer. Moreover, to better suit the transformer diagnosis scenario, a one-dimensional convolutional neural network attention (1DCNN-attention) mechanism is introduced, offering a more efficient solution given its simplified spatial complexity. Finally, the effectiveness of MCDC and its superior generalization ability compared with other algorithms are validated in experiments conducted on a dataset collected from real operation cases of power transformers. Additionally, the better stability of 1DCNN-attention is also certified.
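The 1DCNN-attention idea can be illustrated with a small model that convolves multichannel gas sequences and pools time steps with learned attention weights; the layer sizes and attention form below are assumptions, not the article's exact design.

```python
import torch
import torch.nn as nn

class CNN1DAttention(nn.Module):
    """1D convolutions over multichannel sequences followed by
    attention-weighted temporal pooling and a classification head."""

    def __init__(self, n_channels: int, n_classes: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
        )
        self.score = nn.Linear(64, 1)        # attention score per time step
        self.head = nn.Linear(64, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, n_channels, T)
        h = self.conv(x).transpose(1, 2)                  # (B, T, 64)
        w = torch.softmax(self.score(h), dim=1)           # (B, T, 1) weights over time
        pooled = (w * h).sum(dim=1)                       # attention-weighted pooling
        return self.head(pooled)
```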
Abstract:The Shapley value originated in econometrics as a way to fairly distribute both gains and costs among players in a coalition game. In recent decades, its application has been extended to other areas such as marketing, engineering, and machine learning. For example, it produces reasonable solutions for problems in sensitivity analysis, local model explanation in interpretable machine learning, node importance in social networks, attribution models, etc. However, its heavy computational burden has long been recognized but rarely investigated. Specifically, in a $d$-player coalition game, calculating a Shapley value requires the evaluation of $d!$ or $2^d$ marginal contribution values, depending on whether we take the permutation or combination formulation of the Shapley value. Hence it becomes infeasible to calculate the Shapley value when $d$ is reasonably large. A common remedy is to take a random sample of the permutations as a surrogate for the complete list of permutations. We find that an advanced sampling scheme can be designed to yield a much more accurate estimate of the Shapley value than simple random sampling (SRS). Our sampling scheme is based on combinatorial structures from the field of design of experiments (DOE), particularly order-of-addition experimental designs for studying how the ordering of components affects the output. We show that the obtained estimates are unbiased and can sometimes deterministically recover the original Shapley value. Both theoretical and simulation results show that our DOE-based sampling scheme outperforms SRS in estimation accuracy. Surprisingly, it is also slightly faster than SRS. Lastly, real data analyses are conducted on the C. elegans nervous system and the 9/11 terrorist network.
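For reference, the SRS baseline that the DOE-based scheme improves on looks like the sketch below: sample random permutations and average each player's marginal contributions, which yields an unbiased estimate. `value_fn` is a user-supplied coalition payoff function; the DOE scheme would replace the random permutations with a structured order-of-addition design.

```python
import random

def shapley_srs(players, value_fn, n_perms=1000, seed=0):
    """Estimate Shapley values by simple random sampling of permutations.

    `players` is a list of hashable ids; `value_fn` maps a frozenset of
    players to a real-valued payoff, with value_fn(frozenset()) == 0 assumed.
    """
    rng = random.Random(seed)
    phi = {p: 0.0 for p in players}
    for _ in range(n_perms):
        order = list(players)
        rng.shuffle(order)
        coalition, prev = frozenset(), 0.0
        for p in order:
            coalition = coalition | {p}
            v = value_fn(coalition)
            phi[p] += v - prev        # marginal contribution of p in this ordering
            prev = v
    return {p: total / n_perms for p, total in phi.items()}
```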
Abstract:The high acquisition cost of disruptive discharges and the large demand for them by data-driven disruption prediction models pose an inherent contradiction in disruption prediction research for future tokamaks. In this paper, we demonstrate a novel approach to predicting disruptions in a future tokamak using only a few discharges, based on a domain adaptation algorithm called CORAL; this is the first attempt to apply domain adaptation to the disruption prediction task. The approach aligns a small amount of data from the future tokamak (target domain) with a large amount of data from an existing tokamak (source domain) to train a machine learning model on the existing tokamak. To simulate the existing and future tokamak cases, we selected J-TEXT as the existing tokamak and EAST as the future tokamak. To simulate the scarcity of disruptive data in a future tokamak, we selected only 100 non-disruptive discharges and 10 disruptive discharges from EAST as the target domain training data. We improved CORAL to make it more suitable for the disruption prediction task, yielding supervised CORAL. Compared to a model trained by mixing data from the two tokamaks, the supervised CORAL model enhances disruption prediction performance for future tokamaks (AUC from 0.764 to 0.890). Through interpretable analysis, we found that supervised CORAL transforms the data distribution to be more similar to that of the future tokamak. An assessment method based on SHAP analysis is designed to evaluate whether a model has learned similar feature trends; it demonstrates that the supervised CORAL model is more similar to a model trained on large data sizes from EAST. FTDP provides a light, interpretable, few-data approach that aligns features to predict disruptions using small data sizes from the future tokamak.
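The base CORAL alignment that the supervised variant builds on matches the second-order statistics of source- and target-domain features; a minimal sketch is below (the supervised extension, which is the paper's contribution, is not reproduced here).

```python
import torch

def coral_loss(source: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """CORAL loss: squared Frobenius distance between the feature covariance
    matrices of the source (n_s, d) and target (n_t, d) feature batches."""
    d = source.size(1)
    cov_s = torch.cov(source.T)            # (d, d) source covariance
    cov_t = torch.cov(target.T)            # (d, d) target covariance
    return ((cov_s - cov_t) ** 2).sum() / (4.0 * d * d)
```

Added to the task loss during training, this term pulls the two domains' feature distributions together so a model fit on the source tokamak transfers better to the target.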