Abstract:Level crossing accidents remain a significant safety concern in modern railway systems, particularly under adverse weather conditions that degrade sensor performance. This review surveys state-of-the-art sensor technologies and fusion strategies for obstacle detection at railway level crossings, with a focus on robustness, detection accuracy, and environmental resilience. Individual sensors such as inductive loops, cameras, radar, and LiDAR offer complementary strengths but involve trade-offs, including material dependence, reduced visibility, and limited resolution in harsh environments. We analyze each modality's working principles, weather-induced vulnerabilities, and mitigation strategies, including signal enhancement and machine-learning-based denoising. We further review multi-sensor fusion approaches, categorized as data-level, feature-level, and decision-level architectures, that integrate complementary information to improve reliability and fault tolerance. The survey concludes with future research directions, including adaptive fusion algorithms, real-time processing pipelines, and weather-resilient datasets to support the deployment of intelligent, fail-safe detection systems for railway safety.
Abstract:We investigate joint direction-of-arrival (DoA) and rain-rate estimation for a uniform linear array operating under rain-induced multiplicative distortions. Building on a wavefront fluctuation model whose spatial correlation is governed by the rain-rate, we derive an angle-dependent covariance formulation and use it to synthesize training data. DoA estimation is cast as a multi-label classification problem on a discretized angular grid, while rain-rate estimation is formulated as a multi-class classification task. We then propose a multi-task deep CNN with a shared feature extractor and two task-specific heads, trained using an uncertainty-weighted objective to automatically balance the two losses. Numerical results in a two-source scenario show that the proposed network achieves lower DoA RMSE than classical baselines and provides accurate rain-rate classification at moderate-to-high SNRs.
Abstract:We investigate robust direction-of-arrival (DoA) estimation for sensor arrays operating in adverse weather conditions, where weather-induced distortions degrade estimation accuracy. Building on a physics-based $S$-matrix model established in prior work, we adopt a statistical characterization of random phase and amplitude distortions caused by multiple scattering in rain. Based on this model, we develop a measurement framework for uniform linear arrays (ULAs) that explicitly incorporates such distortions. To mitigate their impact, we exploit the Hermitian Toeplitz (HT) structure of the covariance matrix to reduce the number of parameters to be estimated. We then apply a generalized least squares (GLS) approach for calibration. Simulation results show that the proposed method effectively suppresses rain-induced distortions, improves DoA estimation accuracy, and enhances radar sensing performance in challenging weather conditions.




Abstract:Image captioning has been a longstanding challenge in vision-language research. With the rise of LLMs, modern Vision-Language Models (VLMs) generate detailed and comprehensive image descriptions. However, benchmarking the quality of such captions remains unresolved. This paper addresses two key questions: (1) How well do current VLMs actually perform on image captioning, particularly compared to humans? We built CapArena, a platform with over 6000 pairwise caption battles and high-quality human preference votes. Our arena-style evaluation marks a milestone, showing that leading models like GPT-4o achieve or even surpass human performance, while most open-source models lag behind. (2) Can automated metrics reliably assess detailed caption quality? Using human annotations from CapArena, we evaluate traditional and recent captioning metrics, as well as VLM-as-a-Judge. Our analysis reveals that while some metrics (e.g., METEOR) show decent caption-level agreement with humans, their systematic biases lead to inconsistencies in model ranking. In contrast, VLM-as-a-Judge demonstrates robust discernment at both the caption and model levels. Building on these insights, we release CapArena-Auto, an accurate and efficient automated benchmark for detailed captioning, achieving 94.3% correlation with human rankings at just $4 per test. Data and resources will be open-sourced at https://caparena.github.io.