Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mingxue Wang

Efficient Multi-Cohort Inference for Long-Term Effects and Lifetime Value in A/B Testing with User Learning

Apr 22, 2026

Dario Simionato, Andrea Tonon, Mingxue Wang, Weiguo Wang, Tong Gui, Xiaoyue Li

Abstract:In streaming platforms churn is extremely costly, yet A/B tests are typically evaluated using outcomes observed within a limited experimental horizon. Even when both short- and predicted long-term engagement metrics are considered, they may fail to capture how a treatment affects users' retention. Consequently, an intervention may appear beneficial in the short term and neutral in the long term while still generating lower total value than the control due to users churn. To address this limitation, we introduce a method that estimates long-term treatment effects (LTE) and residual lifetime value change ($ΔERLV$) in short multi-cohort A/B tests under user learning. To estimate time-varying treatment effects efficiently, we introduce an inverse-variance weighted estimator that combines multiple cohorts estimates, reducing variance relative to standard approaches in the literature. The estimated treatment trajectory is then modeled as a parametric decay to recover both the asymptotic treatment effect and the cumulative value generated over time. Our framework enables simultaneous evaluation of steady-state impact and residual user value within a single experiment. Empirical results show improved precision in estimating LTE and $ΔERLV$ and identify scenarios in which relying on either short-term or long-term metrics alone would lead to incorrect product decisions.

Via

Access Paper or Ask Questions

BIS: NL2SQL Service Evaluation Benchmark for Business Intelligence Scenarios

Oct 30, 2024

Bora Caglayan, Mingxue Wang, John D. Kelleher, Shen Fei, Gui Tong, Jiandong Ding, Puchao Zhang

Figure 1 for BIS: NL2SQL Service Evaluation Benchmark for Business Intelligence Scenarios

Figure 2 for BIS: NL2SQL Service Evaluation Benchmark for Business Intelligence Scenarios

Figure 3 for BIS: NL2SQL Service Evaluation Benchmark for Business Intelligence Scenarios

Figure 4 for BIS: NL2SQL Service Evaluation Benchmark for Business Intelligence Scenarios

Abstract:NL2SQL (Natural Language to Structured Query Language) transformation has seen wide adoption in Business Intelligence (BI) applications in recent years. However, existing NL2SQL benchmarks are not suitable for production BI scenarios, as they are not designed for common business intelligence questions. To address this gap, we have developed a new benchmark focused on typical NL questions in industrial BI scenarios. We discuss the challenges of constructing a BI-focused benchmark and the shortcomings of existing benchmarks. Additionally, we introduce question categories in our benchmark that reflect common BI inquiries. Lastly, we propose two novel semantic similarity evaluation metrics for assessing NL2SQL capabilities in BI applications and services.

* This paper has been accepted by ICSOC (International Conference on Service-Oriented Computing) 2024

Via

Access Paper or Ask Questions