Beyond-diagonal reconfigurable intelligent surfaces (BD-RISs) are an emerging RIS 2.0 technology for future wireless communication. However, existing BD-RISs are mostly passive, with no active amplification, and therefore suffer from severe multiplicative path loss. To address this loss, we investigate the active BD-RIS, covering its modeling, architecture design, and optimization. We first analyze the active BD-RIS using multiport network theory with scattering parameters and derive a physically and electromagnetically compliant model of active BD-RIS aided communication. We also design two new active BD-RIS architectures, namely fully-connected and group-connected active BD-RISs. Based on the proposed model and architectures, we investigate the active BD-RIS aided single-input single-output system and derive the closed-form optimal solution and the scaling law of the signal-to-noise ratio. We further investigate the active BD-RIS aided multiple-input multiple-output system and propose an iterative algorithm based on quadratically constrained quadratic programming to maximize the spectral efficiency. Numerical results show that the active BD-RIS achieves higher spectral efficiency than the active diagonal RIS, the passive diagonal RIS, and the passive BD-RIS. For example, to achieve the same spectral efficiency, the active BD-RIS requires fewer than half as many elements as the active diagonal RIS.
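
To make the multiplicative path loss mentioned above concrete, the following minimal sketch compares a direct link with a two-hop link reflected by a single passive element. It is an illustration only and not the model derived in this work: it assumes free-space (Friis) propagation with isotropic antennas, ignores array and beamforming gain, and uses a hypothetical carrier frequency and hypothetical distances. The function names `fspl_db` and `cascaded_loss_db` are introduced here for illustration.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s


def fspl_db(distance_m: float, freq_hz: float) -> float:
    """Free-space path loss in dB between isotropic antennas (Friis formula)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / SPEED_OF_LIGHT)


def cascaded_loss_db(d_tx_ris_m: float, d_ris_rx_m: float, freq_hz: float) -> float:
    """Loss of a transmitter -> RIS element -> receiver path.

    The linear path gains of the two hops multiply, so their dB losses add.
    Assumption: one passive element, no amplification, no array gain.
    """
    return fspl_db(d_tx_ris_m, freq_hz) + fspl_db(d_ris_rx_m, freq_hz)


# Hypothetical scenario: 2.4 GHz carrier, 100 m direct link,
# versus a reflected path made of two 50 m hops.
freq = 2.4e9
direct = fspl_db(100.0, freq)
cascaded = cascaded_loss_db(50.0, 50.0, freq)

print(f"direct link loss:   {direct:.1f} dB")
print(f"cascaded link loss: {cascaded:.1f} dB")
print(f"extra loss to recover: {cascaded - direct:.1f} dB")
```

Under these assumptions the reflected path loses tens of dB more than the direct path, which is the gap that passive array gain or active amplification has to close.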