Abstract:LLMs frequently generate fictitious yet convincing citations, often expressing high confidence even when the underlying reference is wrong. We study this failure across 9 models and 108{,}000 generated references, and find that author names fail far more often than other fields across all models and settings. Citation style has no measurable effect, while reasoning-oriented distillation degrades recall. Probes trained on one field transfer at near-chance levels to the others, suggesting that hallucination signals do not generalize across fields. Building on this finding, we apply elastic-net regularization with stability selection to neuron-level CETT values of Qwen2.5-32B-Instruct and identify a sparse set of field-specific hallucination neurons (FH-neurons). Causal intervention further confirms their role: amplifying these neurons increases hallucination, while suppressing them improves performance across fields, with larger gains in some fields. These results suggest a lightweight approach to detecting and mitigating citation hallucination using internal model signals alone.
Abstract:Vision-Language-Action (VLA) models enable general-purpose robotic policies by mapping visual observations and language instructions to low-level actions, but they often lack reliable introspection. A common practice is to compute a token-level uncertainty signal and take its mean over a rollout. However, mean aggregation can dilute short-lived but safety-critical uncertainty spikes in continuous control. In particular, successful rollouts may contain localized high-entropy segments due to benign noise or non-critical micro-adjustments, while failure rollouts can appear low-entropy for most timesteps and only exhibit brief spikes near the onset of failure. We propose a unified uncertainty quantification approach for predicting rollout success versus failure that (1) uses max-based sliding window pooling to preserve transient risk signals, (2) applies motion-aware stability weighting to emphasize high-frequency action oscillations associated with unstable behaviors, and (3) performs DoF-adaptive calibration via Bayesian Optimization to prioritize kinematically critical axes. Experiments on the LIBERO benchmark show that our method substantially improves failure prediction accuracy and yields more reliable signals for failure detection, which can support downstream human-in-the-loop interventions.




Abstract:The tree pruning process is the key to promoting fruits' growth and improving their productions due to effects on the photosynthesis efficiency of fruits and nutrition transportation in branches. Currently, pruning is still highly dependent on human labor. The workers' experience will strongly affect the robustness of the performance of the tree pruning. Thus, it is a challenge for workers and farmers to evaluate the pruning performance. Intended for a better solution to the problem, this paper presents a novel pruning classification strategy model called "OTSU-SVM" to evaluate the pruning performance based on the shadows of branches and leaves. This model considers not only the available illuminated area of the tree but also the uniformity of the illuminated area of the tree. More importantly, our group implements OTSU algorithm into the model, which highly reinforces robustness of the evaluation of this model. In addition, the data from the pear trees in the Yuhang District, Hangzhou is also used in the experiment. In this experiment, we prove that the OTSU-SVM has good accuracy with 80% and high performance in the evaluation of the pruning for the pear trees. It can provide more successful pruning if applied into the orchard. A successful pruning can broaden the illuminated area of individual fruit, and increase nutrition transportation from the target branch, dramatically elevating the weights and production of the fruits.