User behavior has been validated to be effective in revealing personalized preferences for commercial recommendations. However, few user-item interactions can be collected for new users, which results in a null space for their interests, i.e., the cold-start dilemma. In this paper, a two-tower framework, namely, the model-agnostic interest learning (MAIL) framework, is proposed to address the cold-start recommendation (CSR) problem for recommender systems. In MAIL, one unique tower is constructed to tackle the CSR from a zero-shot view, and the other tower focuses on the general ranking task. Specifically, the zero-shot tower first performs cross-modal reconstruction with dual auto-encoders to obtain virtual behavior data from highly aligned hidden features for new users; and the ranking tower can then output recommendations for users based on the completed data by the zero-shot tower. Practically, the ranking tower in MAIL is model-agnostic and can be implemented with any embedding-based deep models. Based on the co-training of the two towers, the MAIL presents an end-to-end method for recommender systems that shows an incremental performance improvement. The proposed method has been successfully deployed on the live recommendation system of NetEase Cloud Music to achieve a click-through rate improvement of 13% to 15% for millions of users. Offline experiments on real-world datasets also show its superior performance in CSR. Our code is available.