Abstract: Existing cell-free integrated sensing and communication (CF-ISAC) beamforming algorithms predominantly rely on classical optimization techniques, which often entail high computational complexity and limited scalability. Meanwhile, recent learning-based approaches struggle to capture the global interactions and long-range dependencies among distributed access points (APs), communication users, and sensing targets. To address these limitations, we propose the first Set Transformer-based CF-ISAC beamforming framework (STCIB). By exploiting attention mechanisms, STCIB explicitly models global relationships among network entities, naturally handles unordered input sets, and preserves permutation invariance across APs, users, and targets. The proposed framework is trained in an unsupervised manner, eliminating the need for labeled training data, and supports three design regimes: (i) sensing-centric, (ii) communication-centric, and (iii) joint ISAC optimization. We benchmark STCIB against a convolutional neural network (CNN) baseline and two state-of-the-art optimization algorithms: the convex-concave procedure algorithm (CCPA) and augmented Lagrangian manifold optimization (ALM-MO). Numerical results demonstrate that STCIB consistently outperforms the CNN, achieving substantially higher ISAC performance with only a negligible increase in runtime. For instance, in regime (iii) at $\eta = 0.4$, STCIB improves the sensing and communication sum rates by 14.8% and 31.6%, respectively, relative to the CNN, while increasing runtime by only 0.26%. Compared with CCPA and ALM-MO, STCIB incurs a significantly lower computational cost while still achieving modest performance gains. In regime (i), for a 3.0 bps/Hz communication threshold, the runtime of STCIB is only 0.1% and 0.3% of that required by CCPA and ALM-MO, respectively, while its sensing sum rate is 4.45% and 5.9% higher, respectively.
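The permutation-invariance property mentioned in the abstract can be illustrated with a minimal sketch of attention-based pooling over a set, in the spirit of the pooling used in Set Transformers. This is not the STCIB architecture: the single attention head, the feature dimension, the random weights, and all names (`attention_pool`, `seed`, `Wq`, `Wk`, `Wv`) are assumptions made purely for illustration.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attention_pool(items, seed, Wq, Wk, Wv):
    """Pool a set of feature vectors into one vector with a single attention head.

    A fixed seed vector acts as the query; every set element supplies a key
    and a value. The softmax-weighted sum does not depend on element order.
    """
    q = matvec(Wq, seed)
    keys = [matvec(Wk, x) for x in items]
    values = [matvec(Wv, x) for x in items]
    scale = math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


random.seed(0)
d = 4  # hypothetical feature dimension per network entity


def rand_vec():
    return [random.gauss(0.0, 1.0) for _ in range(d)]


def rand_mat():
    return [rand_vec() for _ in range(d)]


# Hypothetical feature vectors for five entities (e.g. APs), in arbitrary order.
items = [rand_vec() for _ in range(5)]
seed = rand_vec()
Wq, Wk, Wv = rand_mat(), rand_mat(), rand_mat()

pooled = attention_pool(items, seed, Wq, Wk, Wv)
pooled_reordered = attention_pool(items[::-1], seed, Wq, Wk, Wv)

# The two results agree up to floating-point rounding.
same = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-9)
    for a, b in zip(pooled, pooled_reordered)
)
```

Reordering the inputs changes only the order of the terms in the weighted sum, so `pooled` and `pooled_reordered` match up to rounding. A CNN applied to a stacked channel tensor does not have this property in general.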