Abstract:Background: HIV and substance use represent interacting epidemics with shared psychological drivers - impulsivity and maladaptive coping. Dialectical behavior therapy (DBT) targets these mechanisms but faces scalability challenges. Generative artificial intelligence (GenAI) offers potential for delivering personalized DBT coaching at scale, yet rapid development has outpaced safety infrastructure. Methods: We developed Glow, a GenAI-powered DBT skills coach delivering chain and solution analysis for individuals at risk for HIV and substance use. In partnership with a Los Angeles community health organization, we conducted usability testing with clinical staff (n=6) and individuals with lived experience (n=28). Using the Helpful, Honest, and Harmless (HHH) framework, we employed user-driven adversarial testing wherein participants identified target behaviors and generated contextually realistic risk probes. We evaluated safety performance across 37 risk probe interactions. Results: Glow appropriately handled 73% of risk probes, but performance varied by agent. The solution analysis agent demonstrated 90% appropriate handling versus 44% for the chain analysis agent. Safety failures clustered around encouraging substance use and normalizing harmful behaviors. The chain analysis agent fell into an "empathy trap," providing validation that reinforced maladaptive beliefs. Additionally, 27 instances of DBT skill misinformation were identified. Conclusions: This study provides the first systematic safety evaluation of GenAI-delivered DBT coaching for HIV and substance use risk reduction. Findings reveal vulnerabilities requiring mitigation before clinical trials. The HHH framework and user-driven adversarial testing offer replicable methods for evaluating GenAI mental health interventions.