Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Tingyu Liu

TextFake: Benchmarking AI-Generated Image Detection on Text-Rich Images

May 31, 2026

Yuning Zhang, Changtao Miao, Mingyu Liao, Tingyu Liu, Xinghao Wang, Tao Gong, Qi Chu, Nenghai Yu

Abstract:Recent AI-generated image (AIGI) detectors perform well on natural-image benchmarks, but their behavior on text-rich forgeries, such as fabricated screenshots, documents, and news pages prevalent in misinformation, remains untested. We introduce TextFake, a 20,000-image benchmark for text-rich AIGI detection spanning 28 languages, 4 topic categories, and 2 scene modalities. Fake images are synthesized via a four-stage pipeline that annotates real images along three controlled dimensions and generates counterparts through distribution-aligned structured prompting, ruling out covariate shortcuts. Zero-shot evaluation of 14 specialized detectors and 3 frontier VLM APIs reveals a large systematic gap: no method exceeds 80% accuracy, with some dropping over 60% from natural-image benchmarks. Diagnostic evaluations identify three failure modes: the Text Density Curse, where dense glyphs overwhelm low-level detectors; Cloaking via Rendering Fidelity, where stronger text rendering suppresses enerative artifacts; and Threshold Collapse, where routine perturbations drive detectors toward chance-level performance.

Via

Access Paper or Ask Questions

Spatio-Temporal Encoding and Decoding-Based Method for Future Human Activity Skeleton Synthesis

Jul 08, 2024

Tingyu Liu, Jun Huang, Chenyi Weng

Figure 1 for Spatio-Temporal Encoding and Decoding-Based Method for Future Human Activity Skeleton Synthesis

Figure 2 for Spatio-Temporal Encoding and Decoding-Based Method for Future Human Activity Skeleton Synthesis

Figure 3 for Spatio-Temporal Encoding and Decoding-Based Method for Future Human Activity Skeleton Synthesis

Figure 4 for Spatio-Temporal Encoding and Decoding-Based Method for Future Human Activity Skeleton Synthesis

Abstract:Inferring future activity information based on observed activity data is a crucial step to improve the accuracy of early activity prediction. Traditional methods based on generative adversarial networks(GAN) or joint learning frameworks can achieve good prediction accuracy under low observation ratios, but they usually have high computational costs. In view of this, this paper proposes a spatio-temporal encoding and decoding-based method for future human activity skeleton synthesis. Firstly, algorithms such as time control, discrete cosine transform, and low-pass filtering are used to cut or pad the skeleton sequences. Secondly, the encoder and decoder are responsible for extracting intermediate semantic encoding from observed skeleton sequences and inferring future sequences from the intermediate semantic encoding, respectively. Finally, joint displacement error, velocity error, and acceleration error, three higher-order kinematic features, are used as key components of the loss function to optimize model parameters. Experimental results show that the proposed future skeleton synthesis algorithm performs better than some existing algorithms. It generates skeleton sequences with smaller errors and fewer model parameters, effectively providing future information for early activity prediction.

Via

Access Paper or Ask Questions