Abstract: Personally identifiable information (PII) anonymization is a high-stakes task that poses a barrier to many open-science data sharing initiatives. While PII identification has made large strides in recent years, in practice, error thresholds and the recall/precision trade-off still limit the uptake of these anonymization pipelines. We present PIIvot, a lighter-weight framework for PII anonymization that leverages knowledge of the data context to simplify the PII detection problem. To demonstrate its effectiveness, we also contribute QATD-2k, the largest open-source real-world tutoring dataset of its kind, to support the demand for quality educational dialogue data.
Abstract: Online health communities (OHCs) offer the promise of connecting with supportive peers. Forming these connections first requires finding relevant peers, a process that can be time-consuming. Peer recommendation systems are a computational approach to make finding peers easier during a health journey. By encouraging OHC users to alter their online social networks, peer recommendations could increase available support. But these benefits are hypothetical and based on mixed, observational evidence. To experimentally evaluate the effect of peer recommendations, we conceptualize these systems as health interventions designed to increase specific beneficial connection behaviors. In this paper, we designed a peer recommendation intervention to increase two behaviors: reading about peer experiences and interacting with peers. We then assessed its initial feasibility in a 12-week field study in which 79 users of CaringBridge received weekly peer recommendations via email. Our results support the usefulness of and demand for peer recommendation and suggest benefits to evaluating larger peer recommendation interventions. Our contributions include practical guidance on the development and evaluation of peer recommendation interventions for OHCs.