Abstract:Can humans detect AI-generated financial documents better than machines? We present GPT4o-Receipt, a benchmark of 1,235 receipt images pairing GPT-4o-generated receipts with authentic ones from established datasets, evaluated by five state-of-the-art multimodal LLMs and a 30-annotator crowdsourced perceptual study. Our findings reveal a striking paradox: humans are better at seeing AI artifacts, yet worse at detecting AI documents. Human annotators exhibit the largest visual discrimination gap of any evaluator, yet their binary detection F1 falls well below Claude Sonnet 4 and below Gemini 2.5 Flash. This paradox resolves once the mechanism is understood: the dominant forensic signals in AI-generated receipts are arithmetic errors -- invisible to visual inspection but systematically verifiable by LLMs. Humans cannot perceive that a subtotal is incorrect; LLMs verify it in milliseconds. Beyond the human--LLM comparison, our five-model evaluation reveals dramatic performance disparities and calibration differences that render simple accuracy metrics insufficient for detector selection. GPT4o-Receipt, the evaluation framework, and all results are released publicly to support future research in AI document forensics.




Abstract:The price of carbon emission rights play a crucial role in carbon trading markets. Therefore, accurate prediction of the price is critical. Taking the Shanghai pilot market as an example, this paper attempted to design a carbon emission purchasing strategy for enterprises, and establish a carbon emission price prediction model to help them reduce the purchasing cost. To make predictions more precise, we built a hybrid deep learning model by embedding Generalized Autoregressive Conditional Heteroskedastic (GARCH) into the Gate Recurrent Unit (GRU) model, and compared the performance with those of other models. Then, based on the Iceberg Order Theory and the predicted price, we proposed the purchasing strategy of carbon emission rights. As a result, the prediction errors of the GARCH-GRU model with a 5-day sliding time window were the minimum values of all six models. And in the simulation, the purchasing strategy based on the GARCH-GRU model was executed with the least cost as well. The carbon emission purchasing strategy constructed by the hybrid deep learning method can accurately send out timing signals, and help enterprises reduce the purchasing cost of carbon emission permits.