This work introduces a novel approach for the joint selection of model structure and parameter learning for nonlinear dynamical systems identification. Focusing on a specific Recurrent Neural Networks (RNNs) family, i.e., Nonlinear Auto-Regressive with eXogenous inputs Echo State Networks (NARXESNs), the method allows to simultaneously select the optimal model class and learn model parameters from data through a new set-membership (SM) based procedure. The results show the effectiveness of the approach in identifying parsimonious yet accurate models suitable for control applications. Moreover, the proposed framework enables a robust training strategy that explicitly accounts for bounded measurement noise and enhances model robustness by allowing data-consistent evaluation of simulation performance during parameter learning, a process generally NP-hard for models with autoregressive components.