Reconfigurable intelligent surfaces (RISs) and fluid antennas (FAs) are key technologies for enhancing spatial degrees of freedom in future wireless networks. However, channel acquisition in RIS-aided FA systems is challenging because the cascaded links depend on time-varying antenna-port selections and RIS configurations, which leads to high training overhead in conventional pilot-based methods. We propose a semi-blind estimation framework for this joint architecture that estimates channels and symbols concurrently. Two hierarchical transmission protocols are introduced, each resulting in a distinct tensor model. Protocol 1 uses a two-time-scale structure that yields a PARAFAC (PF) model, while Protocol 2 employs a single-time-scale structure with blockwise spatial variations, leading to a Nested PARAFAC2 (NPF) model. For both protocols, we develop semi-blind receivers based on trilinear alternating least squares that jointly estimate the user-to-RIS channels, the RIS-to-BS channels, and the transmitted symbols by exploiting the spatio-temporal diversity provided by FA and RIS reconfiguration. We derive identifiability conditions and analyze the computational complexity of both receivers, revealing a fundamental trade-off: the PF receiver (Protocol 1) exploits joint RIS/FA reconfiguration more aggressively and is therefore more robust, whereas the NPF receiver (Protocol 2) offers a flexible, lower-complexity alternative. Simulations show that the proposed receivers achieve accurate recovery with significantly reduced training overhead, demonstrating the effectiveness of tensor-based semi-blind processing for RIS-aided fluid antenna communications.
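To make the trilinear alternating least squares step concrete, the following is a minimal NumPy sketch of generic ALS fitting of a third-order PARAFAC model. It is an illustration under stated assumptions, not the receiver proposed here: the function names `khatri_rao` and `parafac_als`, the tensor dimensions, and the noise-free random data are all hypothetical, and the three factor matrices only stand in for the two channel matrices and the symbol matrix, whose actual structure and unfoldings are not given in this abstract.

```python
import numpy as np


def khatri_rao(U, V):
    """Column-wise Kronecker product; row index runs over (row of U, row of V)."""
    rank = U.shape[1]
    return np.einsum("jr,kr->jkr", U, V).reshape(-1, rank)


def parafac_als(X, rank, n_iter=300, tol=1e-10, seed=0):
    """Fit X[i,j,k] ~ sum_r A[i,r] B[j,r] C[k,r] by alternating least squares.

    Generic textbook ALS (an assumption for illustration), not the paper's receiver.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape

    def random_factor(rows):
        return rng.standard_normal((rows, rank)) + 1j * rng.standard_normal((rows, rank))

    # Random initialization of two factors; the first is obtained in the loop.
    B, C = random_factor(J), random_factor(K)

    # Mode-1, mode-2 and mode-3 unfoldings, consistent with khatri_rao's row order.
    X1 = X.reshape(I, J * K)                     # X1 = A (B kr C)^T
    X2 = X.transpose(1, 0, 2).reshape(J, I * K)  # X2 = B (A kr C)^T
    X3 = X.transpose(2, 0, 1).reshape(K, I * J)  # X3 = C (A kr B)^T

    err = np.inf
    for _ in range(n_iter):
        # Each update is a linear least-squares problem with the other two factors fixed.
        A = X1 @ np.linalg.pinv(khatri_rao(B, C).T)
        B = X2 @ np.linalg.pinv(khatri_rao(A, C).T)
        C = X3 @ np.linalg.pinv(khatri_rao(A, B).T)
        err = np.linalg.norm(X1 - A @ khatri_rao(B, C).T) / np.linalg.norm(X1)
        if err < tol:
            break
    return A, B, C, err


# Hypothetical noise-free example: a 6 x 5 x 4 complex tensor of rank 2.
rng = np.random.default_rng(1)
A0, B0, C0 = (rng.standard_normal((n, 2)) + 1j * rng.standard_normal((n, 2))
              for n in (6, 5, 4))
X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)
A_hat, B_hat, C_hat, rel_err = parafac_als(X, rank=2)
```

In such a fit the factors are recovered only up to a common column permutation and per-column scaling. Semi-blind schemes typically resolve these ambiguities with a small amount of known information; how that is done in the proposed receivers is not specified in this abstract.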