Abstract:Medical RAG needs evidence-grounded claims, so plugging a claim-level NLI checker into retrieval-augmented RL is intuitive. \textbf{We find that the checker's \emph{output distribution} during training, not its held-out accuracy, decides whether it provides trainable gradient.} We compare four NLI checker back-ends as process rewards inside a GRPO-trained medical RAG agent (Qwen2.5-7B, replicated on Qwen3-4B and Llama-3.1-8B) across four held-out medical QA benchmarks. Three diagnostic findings emerge. \textbf{(i)} Signal collapse is log-prob-specific: LLM log-probability scoring labels over 97\% of claims neutral -- collapsing the RL gradient to zero -- while a calibrated MedNLI classifier scores the same pairs non-degenerately. \textbf{(ii)} Moderate signal beats strong signal on answer quality: a strong proprietary checker triggers a three-step reward-hacking cascade -- ultra-short answers, search avoidance, language collapse -- so a moderate-signal local classifier trains a higher-quality model (\textbf{+12\% BERTScore over zero-shot, no GPT dependency}). \textbf{(iii)} Signal strength is policy-dependent: the same checker registers as moderate on one policy but strong on another without triggering the cascade end-state. We frame these as boundary conditions for verifier-as-reward systems.
Abstract:This paper presents a comprehensive solution to address the critical challenge of liquid leaks in the oil and gas industry, leveraging advanced computer vision and deep learning methodologies. Employing You Only Look Once (YOLO) and Real-Time Detection Transformer (RT DETR) models, our project focuses on enhancing early identification of liquid leaks in key infrastructure components such as pipelines, pumps, and tanks. Through the integration of surveillance thermal cameras and sensors, the combined YOLO and RT DETR models demonstrate remarkable efficacy in the continuous monitoring and analysis of visual data within oil and gas facilities. YOLO's real-time object detection capabilities swiftly recognize leaks and their patterns, while RT DETR excels in discerning specific leak-related features, particularly in thermal images. This approach significantly improves the accuracy and speed of leak detection, ultimately mitigating environmental and financial risks associated with liquid leaks.