Abstract: Automatic non-cooperative analysis of intercepted radar signals is essential for intelligent equipment in both military and civilian domains. Accurate modulation identification and parameter estimation enable effective signal classification, threat assessment, and the development of countermeasures. In this paper, we propose a symbolic approach to radar signal recognition and parameter estimation based on a vision-language model that combines a context-free grammar with time-frequency representations of radar waveforms. The proposed model, called Sig2text, uses vision transformers for time-frequency feature extraction and transformer-based decoders for symbolic parsing of radar waveforms. By treating radar signal recognition as a parsing problem, Sig2text can recognize and parse radar waveforms with different modulation types and parameters. We evaluate Sig2text on a synthetic radar signal dataset and demonstrate its effectiveness across varying modulation types and parameters. The training code of the model is available at https://github.com/Na-choneko/sig2text.
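To make the "recognition as parsing" idea concrete, the following is a minimal sketch of how a symbolic waveform description could be checked against a context-free grammar. It is an illustration under stated assumptions, not the grammar or code used by Sig2text: the grammar, the modulation labels in `MODS`, the parameter names, and the function `parse_waveform` are all hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical toy grammar (an assumption, not the paper's grammar):
#   waveform := segment (";" segment)*
#   segment  := MOD "(" param ("," param)* ")"
#   param    := NAME "=" NUMBER
MODS = {"LFM", "BPSK", "FSK", "CW"}  # illustrative modulation labels only

TOKEN_RE = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([A-Za-z_]\w*)|([();,=]))")

Segment = Tuple[str, Dict[str, float]]


def tokenize(text: str) -> List[str]:
    """Split a symbolic description into numbers, names, and punctuation."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1) or match.group(2) or match.group(3))
        pos = match.end()
    return tokens


class Parser:
    """Recursive-descent parser with one method per grammar rule."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected: Optional[str] = None) -> str:
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def param(self) -> Tuple[str, float]:
        name = self.take()
        if not (name[0].isalpha() or name[0] == "_"):
            raise ValueError(f"expected a parameter name, got {name!r}")
        self.take("=")
        value = self.take()
        if not value[0].isdigit():
            raise ValueError(f"expected a number, got {value!r}")
        return name, float(value)

    def segment(self) -> Segment:
        mod = self.take()
        if mod not in MODS:
            raise ValueError(f"unknown modulation {mod!r}")
        self.take("(")
        params = dict([self.param()])
        while self.peek() == ",":
            self.take(",")
            name, value = self.param()
            params[name] = value
        self.take(")")
        return mod, params

    def waveform(self) -> List[Segment]:
        segments = [self.segment()]
        while self.peek() == ";":
            self.take(";")
            segments.append(self.segment())
        if self.peek() is not None:
            raise ValueError(f"unexpected trailing token {self.peek()!r}")
        return segments


def parse_waveform(text: str) -> List[Segment]:
    """Parse a description into (modulation, parameters) pairs."""
    return Parser(tokenize(text)).waveform()
```

In this sketch, a string such as `"LFM(fc=10, bw=5.5); BPSK(fc=20, n=13)"` parses into a list of modulation labels with their parameter dictionaries, while any string that violates the grammar raises `ValueError`. A grammar of this kind gives a sequence decoder a well-defined target language, so that malformed outputs can be detected mechanically.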