In this paper, a general ISAC system where the base station (BS) communicates with multiple users and performs target detection is considered. Then, a sum communication rate maximization problem is formulated, subjected to the constraints of transmit power and the minimum sensing rates of users. To solve this problem, we develop a framework that leverages deep learning algorithms to provide a three-stage solution for ISAC beamforming. The three-stage beamforming optimization solution includes three modules: 1) an unsupervised learning based feature extraction algorithm is proposed to extract fixed-size latent features while keeping its essential information from the variable channel state information (CSI); 2) a reinforcement learning (RL) based beampattern optimization algorithm is proposed to search the desired beampattern according to the extracted features; 3) a supervised learning based beamforming reconstruction algorithm is proposed to reconstruct the beamforming vector from beampattern given by the RL agent. Simulation results demonstrate that the proposed three-stage solution outperforms the baseline RL algorithm by optimizing the intuitional beampattern rather than beamforming.