Large MIMO systems rely on efficient downlink precoding to enhance data rates and improve connectivity through spatial multiplexing. However, currently employed linear precoding techniques, such as MMSE, significantly limit the achievable spectral efficiency. To meet practical error-rate targets, existing linear methods require an excessively high number of access point (AP) antennas relative to the number of supported users, leading to disproportionate increases in power consumption. Efficient non-linear processing frameworks for uplink MIMO transmissions, such as NL-COMM, have been proposed. However, downlink non-linear precoding methods, such as Vector Perturbation (VP), remain impractical for real-world deployment because their computational complexity grows exponentially with the number of supported streams. This work presents ViPer NL-COMM, the first practical algorithmic and implementation framework for VP-based precoding. ViPer NL-COMM extends the core principles of NL-COMM to the precoding problem, enabling scalable parallelization and real-time computational performance while maintaining the substantial spectral-efficiency benefits of VP precoding. ViPer NL-COMM consists of a novel mathematical framework and an FPGA prototype capable of supporting large MIMO configurations (up to 16×16), high-order modulation (256-QAM), and wide bandwidths (100 MHz) within practical power and resource budgets. System-level evaluations demonstrate that ViPer NL-COMM achieves target error rates using only half the number of transmit antennas required by linear precoding, yielding net power savings on the order of hundreds of watts at the RF front end. Moreover, ViPer NL-COMM can support more information streams than there are AP antennas when the streams are low-rate, paving the way for enhanced massive-connectivity scenarios in next-generation wireless networks.