Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Topic:Handwriting Recognition

Can MLLMs Read Students' Minds? Unpacking Multimodal Error Analysis in Handwritten Math

Mar 26, 2026

Dingjie Song, Tianlong Xu, Yi-Fan Zhang, Hang Li, Zhiling Yan, Xing Fan, Haoyang Li, Lichao Sun, Qingsong Wen

Abstract:Assessing student handwritten scratchwork is crucial for personalized educational feedback but presents unique challenges due to diverse handwriting, complex layouts, and varied problem-solving approaches. Existing educational NLP primarily focuses on textual responses and neglects the complexity and multimodality inherent in authentic handwritten scratchwork. Current multimodal large language models (MLLMs) excel at visual reasoning but typically adopt an "examinee perspective", prioritizing generating correct answers rather than diagnosing student errors. To bridge these gaps, we introduce ScratchMath, a novel benchmark specifically designed for explaining and classifying errors in authentic handwritten mathematics scratchwork. Our dataset comprises 1,720 mathematics samples from Chinese primary and middle school students, supporting two key tasks: Error Cause Explanation (ECE) and Error Cause Classification (ECC), with seven defined error types. The dataset is meticulously annotated through rigorous human-machine collaborative approaches involving multiple stages of expert labeling, review, and verification. We systematically evaluate 16 leading MLLMs on ScratchMath, revealing significant performance gaps relative to human experts, especially in visual recognition and logical reasoning. Proprietary models notably outperform open-source models, with large reasoning models showing strong potential for error explanation. All evaluation data and frameworks are publicly available to facilitate further research.

* Accepted by the 27th International Conference on Artificial Intelligence in Education (AIED'26)

Via

Access Paper or Ask Questions

Supervised Contrastive Learning Framework for Electroencephalography-based Air-writing Recognition

Mar 20, 2026

Anant Jain, Ayush Tripathi

Abstract:Electroencephalography (EEG) - based air-writing recognition offers a human-computer interaction paradigm by decoding neural activity associated with handwriting movements. Despite its potential, reliable EEG-based air-writing recognition remains challenging due to low signal-to-noise ratio and pronounced inter-subject variability. In this study, we examine the use of supervised contrastive learning to improve representation learning for EEG-based air-writing recognition. The analysis is conducted on preprocessed EEG signals and independent component analysis (ICA)-derived neural components obtained from five participants, with trials segmented from -1 to 2 s relative to movement on-set. EEGNet and DeepConvNet architectures are evaluated under both conventional cross-entropy training and a supervised contrastive learning framework using a subject-dependent five-fold cross-validation scheme. The results indicate that supervised contrastive learning consistently improves classification accuracy across architectures and feature representations. For preprocessed EEG signals, the mean accuracy increases from 33.45% to 43.77% and from 29.14% to 38.06% with EEGNet and DeepConvNet, respectively. Using ICA components, higher mean accuracies of 49.21% and 43.32% are achieved with EEGNet and DeepConvNet, respectively. These results suggest that the supervised contrastive learning framework offers an efficient extension to existing EEG-based air-writing recognition approaches.

Via

Access Paper or Ask Questions

From Pen Strokes to Sleep States: Detecting Low-Recovery Days Using Sigma-Lognormal Handwriting Features

Mar 12, 2026

Chisa Tanaka, Andrew Vargo, Anna Scius-Bertrand, Andreas Fischer, Koichi Kise

Abstract:While handwriting has traditionally been studied for character recognition and disease classification, its potential to reflect day-to-day physiological fluctuations in healthy individuals remains unexplored. This study examines whether daily variations in sleep-related recovery states can be inferred from online handwriting dynamics. % We propose a personalized binary classification framework that detects low-recovery days using features derived from the Sigma-Lognormal model, which captures the neuromotor generation process of pen strokes. In a 28-day in-the-wild study involving 13 university students, handwriting was recorded three times daily, and nocturnal cardiac indicators were measured using a wearable ring. For each participant, the lowest (or highest) quartile of four sleep-related metrics -- HRV, lowest heart rate, average heart rate, and total sleep duration -- defined the positive class. Leave-One-Day-Out cross-validation showed that PR-AUC significantly exceeded the baseline (0.25) for all four variables after FDR correction, with the strongest performance observed for cardiac-related variables. Importantly, classification performance did not differ significantly across task types or recording timings, indicating that recovery-related signals are embedded in general movement dynamics. These results demonstrate that subtle within-person autonomic recovery fluctuations can be detected from everyday handwriting, opening a new direction for non-invasive, device-independent health monitoring.

* 16 pages, 7 figures

Via

Access Paper or Ask Questions

Enhancing IMU-Based Online Handwriting Recognition via Contrastive Learning with Zero Inference Overhead

Feb 04, 2026

Jindong Li, Dario Zanca, Vincent Christlein, Tim Hamann, Jens Barth, Peter Kämpf, Björn Eskofier

Abstract:Online handwriting recognition using inertial measurement units opens up handwriting on paper as input for digital devices. Doing it on edge hardware improves privacy and lowers latency, but entails memory constraints. To address this, we propose Error-enhanced Contrastive Handwriting Recognition (ECHWR), a training framework designed to improve feature representation and recognition accuracy without increasing inference costs. ECHWR utilizes a temporary auxiliary branch that aligns sensor signals with semantic text embeddings during the training phase. This alignment is maintained through a dual contrastive objective: an in-batch contrastive loss for general modality alignment and a novel error-based contrastive loss that distinguishes between correct signals and synthetic hard negatives. The auxiliary branch is discarded after training, which allows the deployed model to keep its original, efficient architecture. Evaluations on the OnHW-Words500 dataset show that ECHWR significantly outperforms state-of-the-art baselines, reducing character error rates by up to 7.4% on the writer-independent split and 10.4% on the writer-dependent split. Finally, although our ablation studies indicate that solving specific challenges require specific architectural and objective configurations, error-based contrastive loss shows its effectiveness for handling unseen writing styles.

Via

Access Paper or Ask Questions

Symbol-Aware Reasoning with Masked Discrete Diffusion for Handwritten Mathematical Expression Recognition

Feb 03, 2026

Takaya Kawakatsu, Ryo Ishiyama

Abstract:Handwritten Mathematical Expression Recognition (HMER) requires reasoning over diverse symbols and 2D structural layouts, yet autoregressive models struggle with exposure bias and syntactic inconsistency. We present a discrete diffusion framework that reformulates HMER as iterative symbolic refinement instead of sequential generation. Through multi-step remasking, the proposal progressively refines both symbols and structural relations, removing causal dependencies and improving structural consistency. A symbol-aware tokenization and Random-Masking Mutual Learning further enhance syntactic alignment and robustness to handwriting diversity. On the MathWriting benchmark, the proposal achieves 5.56\% CER and 60.42\% EM, outperforming strong Transformer and commercial baselines. Consistent gains on CROHME 2014--2023 demonstrate that discrete diffusion provides a new paradigm for structure-aware visual recognition beyond generative modeling.

Via

Access Paper or Ask Questions

BeHGAN: Bengali Handwritten Word Generation from Plain Text Using Generative Adversarial Networks

Dec 25, 2025

Md. Rakibul Islam, Md. Kamrozzaman Bhuiyan, Safwan Muntasir, Arifur Rahman Jawad, Most. Sharmin Sultana Samu

Abstract:Handwritten Text Recognition (HTR) is a well-established research area. In contrast, Handwritten Text Generation (HTG) is an emerging field with significant potential. This task is challenging due to the variation in individual handwriting styles. A large and diverse dataset is required to generate realistic handwritten text. However, such datasets are difficult to collect and are not readily available. Bengali is the fifth most spoken language in the world. While several studies exist for languages such as English and Arabic, Bengali handwritten text generation has received little attention. To address this gap, we propose a method for generating Bengali handwritten words. We developed and used a self-collected dataset of Bengali handwriting samples. The dataset includes contributions from approximately five hundred individuals across different ages and genders. All images were pre-processed to ensure consistency and quality. Our approach demonstrates the ability to produce diverse handwritten outputs from input plain text. We believe this work contributes to the advancement of Bengali handwriting generation and can support further research in this area.

* Accepted for publication in 2025 28th International Conference on Computer and Information Technology (ICCIT)

Via

Access Paper or Ask Questions

Seeing Justice Clearly: Handwritten Legal Document Translation with OCR and Vision-Language Models

Dec 19, 2025

Shubham Kumar Nigam, Parjanya Aditya Shukla, Noel Shallum, Arnab Bhattacharya

Figure 1 for Seeing Justice Clearly: Handwritten Legal Document Translation with OCR and Vision-Language Models

Figure 2 for Seeing Justice Clearly: Handwritten Legal Document Translation with OCR and Vision-Language Models

Figure 3 for Seeing Justice Clearly: Handwritten Legal Document Translation with OCR and Vision-Language Models

Figure 4 for Seeing Justice Clearly: Handwritten Legal Document Translation with OCR and Vision-Language Models

Abstract:Handwritten text recognition (HTR) and machine translation continue to pose significant challenges, particularly for low-resource languages like Marathi, which lack large digitized corpora and exhibit high variability in handwriting styles. The conventional approach to address this involves a two-stage pipeline: an OCR system extracts text from handwritten images, which is then translated into the target language using a machine translation model. In this work, we explore and compare the performance of traditional OCR-MT pipelines with Vision Large Language Models that aim to unify these stages and directly translate handwritten text images in a single, end-to-end step. Our motivation is grounded in the urgent need for scalable, accurate translation systems to digitize legal records such as FIRs, charge sheets, and witness statements in India's district and high courts. We evaluate both approaches on a curated dataset of handwritten Marathi legal documents, with the goal of enabling efficient legal document processing, even in low-resource environments. Our findings offer actionable insights toward building robust, edge-deployable solutions that enhance access to legal information for non-native speakers and legal professionals alike.

* Accepted in AILaw @ AAAI 2026 Conference

Via

Access Paper or Ask Questions

Theoretical Analysis of Power-law Transformation on Images for Text Polarity Detection

Nov 11, 2025

Narendra Singh Yadav, Pavan Kumar Perepu

Abstract:Several computer vision applications like vehicle license plate recognition, captcha recognition, printed or handwriting character recognition from images etc., text polarity detection and binarization are the important preprocessing tasks. To analyze any image, it has to be converted to a simple binary image. This binarization process requires the knowledge of polarity of text in the images. Text polarity is defined as the contrast of text with respect to background. That means, text is darker than the background (dark text on bright background) or vice-versa. The binarization process uses this polarity information to convert the original colour or gray scale image into a binary image. In the literature, there is an intuitive approach based on power-law transformation on the original images. In this approach, the authors have illustrated an interesting phenomenon from the histogram statistics of the transformed images. Considering text and background as two classes, they have observed that maximum between-class variance between two classes is increasing (decreasing) for dark (bright) text on bright (dark) background. The corresponding empirical results have been presented. In this paper, we present a theoretical analysis of the above phenomenon.

Via

Access Paper or Ask Questions

Psychological stress during Examination and its estimation by handwriting in answer script

Nov 08, 2025

Abhijeet Kumar, Chetan Agarwal, Pronoy B. Neogi, Mayank Goswami

Abstract:This research explores the fusion of graphology and artificial intelligence to quantify psychological stress levels in students by analyzing their handwritten examination scripts. By leveraging Optical Character Recognition and transformer based sentiment analysis models, we present a data driven approach that transcends traditional grading systems, offering deeper insights into cognitive and emotional states during examinations. The system integrates high resolution image processing, TrOCR, and sentiment entropy fusion using RoBERTa based models to generate a numerical Stress Index. Our method achieves robustness through a five model voting mechanism and unsupervised anomaly detection, making it an innovative framework in academic forensics.

* 10 Pages, 6 Figures and 1 Table

Via

Access Paper or Ask Questions

Layout Stroke Imitation: A Layout Guided Handwriting Stroke Generation for Style Imitation with Diffusion Model

Sep 19, 2025

Sidra Hanif, Longin Jan Latecki

Abstract:Handwriting stroke generation is crucial for improving the performance of tasks such as handwriting recognition and writers order recovery. In handwriting stroke generation, it is significantly important to imitate the sample calligraphic style. The previous studies have suggested utilizing the calligraphic features of the handwriting. However, they had not considered word spacing (word layout) as an explicit handwriting feature, which results in inconsistent word spacing for style imitation. Firstly, this work proposes multi-scale attention features for calligraphic style imitation. These multi-scale feature embeddings highlight the local and global style features. Secondly, we propose to include the words layout, which facilitates word spacing for handwriting stroke generation. Moreover, we propose a conditional diffusion model to predict strokes in contrast to previous work, which directly generated style images. Stroke generation provides additional temporal coordinate information, which is lacking in image generation. Hence, our proposed conditional diffusion model for stroke generation is guided by calligraphic style and word layout for better handwriting imitation and stroke generation in a calligraphic style. Our experimentation shows that the proposed diffusion model outperforms the current state-of-the-art stroke generation and is competitive with recent image generation networks.

Via

Access Paper or Ask Questions

Topic:Handwriting Recognition

Papers and Code