Abstract:Computational Fluid Dynamics (CFD)-driven training combines machine learning (ML) with CFD solvers to develop physically consistent closure models with improved predictive accuracy. In the original framework, each ML-generated candidate model is embedded in a CFD solver and evaluated against reference data, requiring hundreds to thousands of high-fidelity simulations and resulting in prohibitive computational cost for complex flows. To overcome this limitation, we propose an extended framework that integrates surrogate modeling into symbolic CFD-driven training in real time to reduce training cost. The surrogate model learns to approximate the errors of ML-generated models based on previous CFD evaluations and is continuously refined during training. Newly generated models are first assessed using the surrogate, and only those predicted to yield small errors or high uncertainty are subsequently evaluated with full CFD simulations. Discrete expressions generated by symbolic regression are mapped into a continuous space using averaged input-symbol values as inputs to a probabilistic surrogate model. To support multi-objective model training, particularly when fixed weighting of competing quantities is challenging, the surrogate is extended to a multi-output formulation by generalizing the kernel to a matrix form, providing one mean and variance prediction per training objective. Selection metrics based on these probabilistic outputs are used to identify an optimal training setup. The proposed surrogate-augmented CFD-driven training framework is demonstrated across a range of statistically one- and two-dimensional flows, including both single- and multi-expression model optimization. In all cases, the framework substantially reduces training cost while maintaining predictive accuracy comparable to that of the original CFD-driven approach.




Abstract:Prompting has emerged as a practical way to adapt frozen vision-language models (VLMs) for video anomaly detection (VAD). Yet, existing prompts are often overly abstract, overlooking the fine-grained human-object interactions or action semantics that define complex anomalies in surveillance videos. We propose ASK-Hint, a structured prompting framework that leverages action-centric knowledge to elicit more accurate and interpretable reasoning from frozen VLMs. Our approach organizes prompts into semantically coherent groups (e.g. violence, property crimes, public safety) and formulates fine-grained guiding questions that align model predictions with discriminative visual cues. Extensive experiments on UCF-Crime and XD-Violence show that ASK-Hint consistently improves AUC over prior baselines, achieving state-of-the-art performance compared to both fine-tuned and training-free methods. Beyond accuracy, our framework provides interpretable reasoning traces towards anomaly and demonstrates strong generalization across datasets and VLM backbones. These results highlight the critical role of prompt granularity and establish ASK-Hint as a new training-free and generalizable solution for explainable video anomaly detection.