Orthogonal time frequency space (OTFS) modulation stands out as a promising waveform for sixth generation (6G) and beyond wireless communication systems, offering superior performance over conventional methods, particularly in high-mobility scenarios and dispersive channel conditions. Recent research has demonstrated that the reduced computational complexity of deep learning (DL)-based signal detection (SD) methods constitutes a compelling alternative to conventional techniques. In this study, low-complexity DL-based SD methods are proposed for a multiple-input multiple-output (MIMO)-OTFS system and examined under Nakagami-$m$ channel conditions. The symbols obtained from the receiver antennas are combined using maximum ratio combining (MRC) and detected with the help of a DL-based detector implemented with multi-layer perceptron (MLP), convolutional neural network (CNN), and residual network (ResNet). Complexity analysis reveals that the MLP architecture offers significantly lower computational complexity compared to CNN, ResNet, and classical methods such as maximum likelihood detection (MLD). Furthermore, numerical analyses have shown that the proposed DL-based detectors, despite their low complexity, achieve comparable bit error rate (BER) performance to that of a high-performance MLD under various system conditions.