Regret minimization methods are a powerful tool for learning approximate Nash equilibria in two-player zero-sum imperfect-information extensive-form games (IIEGs). We consider this problem in the interactive bandit-feedback setting, where the dynamics of the IIEG are unknown. In this setting, only the interactive trajectory and the scalar loss $(\ell^t)^\top x^t$ are revealed. To learn an approximate Nash equilibrium, the regret minimizer must therefore estimate the full-feedback loss gradient $\ell^t$ while minimizing the regret. In this paper, we propose a generalized framework for this learning setting. We demonstrate that the most recent bandit regret minimization methods, including MCCFR, IXOMD, and Balanced OMD, can be analyzed as particular cases of our framework. The framework provides a theoretical basis for the design and the modular analysis of bandit regret minimization methods. Precisely, it allows any gradient estimator, any exploration strategy, and any sampling strategy to be coupled with any full-feedback regret minimization method.
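
To make the modular structure concrete, the following is a minimal sketch and not the framework analyzed in this paper. It assumes a single decision point (a plain multi-armed bandit) instead of a full IIEG. It uses an implicit-exploration importance-weighted estimator as the gradient estimator and Hedge as the full-feedback regret minimizer. Both are illustrative choices, and all names (`ix_estimate`, `hedge_update`, `run_bandit`, `eta`, `gamma`) are hypothetical.

```python
import math
import random


def ix_estimate(num_actions, action, loss, prob, gamma):
    """Gradient estimator: importance-weighted loss with implicit exploration.

    Only the sampled action gets a nonzero entry; gamma > 0 biases the
    estimate downward in exchange for lower variance.
    """
    estimate = [0.0] * num_actions
    estimate[action] = loss / (prob + gamma)
    return estimate


def hedge_update(strategy, loss_estimate, eta):
    """Full-feedback regret minimizer: one multiplicative-weights step."""
    weights = [p * math.exp(-eta * g) for p, g in zip(strategy, loss_estimate)]
    total = sum(weights)
    return [w / total for w in weights]


def run_bandit(loss_fn, num_actions, rounds, eta, gamma, seed=0):
    """Couple a sampling step, a gradient estimator, and a full-feedback update.

    loss_fn(t, action) returns the scalar loss of the sampled action only,
    which plays the role of the bandit feedback (assumed to lie in [0, 1]).
    """
    rng = random.Random(seed)
    strategy = [1.0 / num_actions] * num_actions
    for t in range(rounds):
        # Sampling strategy: here, sample directly from the current strategy.
        action = rng.choices(range(num_actions), weights=strategy)[0]
        loss = loss_fn(t, action)
        # Estimate the unobserved full loss vector from the single observed loss.
        estimate = ix_estimate(num_actions, action, loss, strategy[action], gamma)
        # Feed the estimate to the full-feedback regret minimizer.
        strategy = hedge_update(strategy, estimate, eta)
    return strategy


if __name__ == "__main__":
    fixed_losses = [0.9, 0.1, 0.5]  # hypothetical stationary losses
    final = run_bandit(lambda t, a: fixed_losses[a], 3, 2000, eta=0.05, gamma=0.01)
    print(final)
```

Each of the three functions can be swapped independently, for example replacing `ix_estimate` with a different estimator or `hedge_update` with another full-feedback method, which is the kind of modularity the abstract describes.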