WiFi Channel State Information (CSI)-based human activity recognition (HAR) provides a privacy-preserving, device-free sensing solution for smart environments. However, its deployment on edge devices is severely constrained by domain shift, where recognition performance deteriorates under varying environmental and hardware conditions. This study presents maxVSTAR (maximally adaptive Vision-guided Sensing Technology for Activity Recognition), a closed-loop, vision-guided model adaptation framework that autonomously mitigates domain shift for edge-deployed CSI sensing systems. The proposed system integrates a cross-modal teacher-student architecture, where a high-accuracy YOLO-based vision model serves as a dynamic supervisory signal, delivering real-time activity labels for the CSI data stream. These labels enable autonomous, online fine-tuning of a lightweight CSI-based HAR model, termed Sensing Technology for Activity Recognition (STAR), directly at the edge. This closed-loop retraining mechanism allows STAR to continuously adapt to environmental changes without manual intervention. Extensive experiments demonstrate the effectiveness of maxVSTAR. When deployed on uncalibrated hardware, the baseline STAR model's recognition accuracy declined from 93.52% to 49.14%. Following a single vision-guided adaptation cycle, maxVSTAR restored the accuracy to 81.51%. These results confirm the system's capacity for dynamic, self-supervised model adaptation in privacy-conscious IoT environments, establishing a scalable and practical paradigm for long-term autonomous HAR using CSI sensing at the network edge.
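To make the closed-loop mechanism concrete, the sketch below illustrates how vision-derived pseudo-labels could drive online fine-tuning of a lightweight CSI student at the edge. It is a minimal illustration under stated assumptions, not the paper's implementation: the window shape, class count, network layers, and names such as `StarModel` and `adaptation_cycle` are hypothetical, and the YOLO-based teacher is replaced by a dummy pseudo-label buffer.

```python
# Minimal sketch of vision-guided adaptation (all shapes, names, and hyperparameters are assumptions).
import torch
import torch.nn as nn

NUM_ACTIVITIES = 6       # assumed number of activity classes
CSI_FEATURES = 52 * 100  # assumed: 52 subcarriers x 100 time samples per CSI window


class StarModel(nn.Module):
    """Lightweight CSI-based HAR student (stand-in for the paper's STAR model)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(CSI_FEATURES, 128),
            nn.ReLU(),
            nn.Linear(128, NUM_ACTIVITIES),
        )

    def forward(self, x):
        return self.net(x)


def adaptation_cycle(student, csi_windows, teacher_labels, epochs=3, lr=1e-4):
    """One vision-guided fine-tuning pass: teacher pseudo-labels supervise the CSI student."""
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    student.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        logits = student(csi_windows)           # student predictions on buffered CSI windows
        loss = loss_fn(logits, teacher_labels)  # supervised by the vision teacher's labels
        loss.backward()
        optimizer.step()
    return student


if __name__ == "__main__":
    # Dummy buffer standing in for time-synchronized CSI windows and YOLO-derived activity labels.
    csi_windows = torch.randn(32, 52, 100)
    teacher_labels = torch.randint(0, NUM_ACTIVITIES, (32,))
    star = StarModel()
    adaptation_cycle(star, csi_windows, teacher_labels)
```

In a deployment matching the abstract's description, the dummy buffer would be filled at runtime by the vision teacher, and such an adaptation cycle would be triggered whenever the system accumulates enough labeled windows, letting the student track environmental or hardware changes without manual relabeling.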