Introduction: The value of integrating federal HIV services data with HIV surveillance data is currently unknown. Upstream and complete case capture is essential to preventing future HIV transmission. Methods: This study integrated Ryan White, Social Security Disability Insurance, Medicare, Children's Health Insurance Program, and Medicaid demographic aggregates from 2005 to 2018 for people living with HIV and compared them with Centers for Disease Control and Prevention HIV surveillance data by demographic aggregate. Surveillance Unknown, Service Known (SUSK) candidate aggregates were identified where services aggregate volumes exceeded surveillance aggregate volumes. A distribution approach and a series of deep learning models were used to identify SUSK candidate aggregates where surveillance cases exceeded services cases in aggregate. Results: Medicare had the most candidate SUSK aggregates. Medicaid may have candidate SUSK aggregates where cases approach parity with surveillance. Deep learning detected candidate SUSK aggregates even where surveillance cases exceeded services cases. Conclusions: Integrating Centers for Medicare & Medicaid Services (CMS) case-level records with HIV surveillance records can increase case discovery and life course model quality, especially for cases who die after seeking HIV services but before they become surveillance cases. The ethical implications of the availability and reuse of clinical HIV data without the knowledge and consent of the persons described remain an opportunity for the development of big data ethics in public health research. Future work should develop big data ethics to support researchers and assure their subjects that information describing them is not misused.
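The first identification step in the Methods, flagging aggregates where services volumes exceed surveillance volumes, can be illustrated with a minimal sketch. The aggregate key (year, sex, age group), the function name `susk_candidates`, the counts, and the choice to treat an aggregate absent from surveillance as zero cases are all assumptions made for illustration; none of them are taken from the study.

```python
from typing import Dict, List, Tuple

# Hypothetical aggregate key: (year, sex, age group). The key layout and
# every count below are invented for illustration, not study data.
Key = Tuple[int, str, str]


def susk_candidates(services: Dict[Key, int],
                    surveillance: Dict[Key, int]) -> List[Tuple[Key, int]]:
    """Return (aggregate, excess) pairs where services exceed surveillance."""
    candidates = []
    for key, service_count in services.items():
        # Assumption: an aggregate missing from surveillance counts as zero.
        surveillance_count = surveillance.get(key, 0)
        excess = service_count - surveillance_count
        if excess > 0:
            candidates.append((key, excess))
    # Largest excess first, so the strongest candidates lead the list.
    return sorted(candidates, key=lambda item: item[1], reverse=True)


services = {
    (2010, "F", "65+"): 120,
    (2010, "M", "65+"): 300,
    (2010, "M", "25-34"): 80,
}
surveillance = {
    (2010, "F", "65+"): 90,
    (2010, "M", "65+"): 310,
    (2010, "M", "25-34"): 80,
}

result = susk_candidates(services, surveillance)
print(result)
```

This simple comparison cannot flag aggregates where surveillance cases exceed services cases, which is the situation the abstract addresses with the distribution approach and the deep learning models.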
Introduction: The United States Federal Government operates one of the world's largest medical insurance programs, Medicare, to ensure payment for clinical services for the elderly, illegal aliens, and those without the ability to pay for their care directly. This paper evaluates the Medicare 2011 Transaction Data Set, which details the transfer of funds from Medicare to private and public clinical care facilities for specific clinical services in operational year 2011. Methods: Data mining was conducted to establish the relationships between reported and computed transaction values in the data set, in order to better understand the drivers of Medicare transactions at a programmatic level. Results: During training, the models averaged 88 for accuracy and 38 for Kappa. Some reported classes are highly independent of the available data, as their predictability remains stable regardless of the redaction of supporting and contradictory evidence. Diagnosis-related group (DRG) or procedure type appears to be unpredictable from the available financial transaction values. Conclusions: Overlay hypotheses, such as charges being driven by the volume served or DRG being related to charges or payments, are readily shown to be false in this analysis, despite 28 million Americans being billed through Medicare in 2011 and the program distributing over 70 billion in this transaction set alone. It may be impossible to predict the dependencies and data structures of the payer of last resort without data from payers of first and second resort. Political concerns about Medicare would be better served by focusing on these first and second order payer systems, as what Medicare costs does not depend on Medicare itself.
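The Results report accuracy alongside Kappa, and the gap between the two is easier to read with the definitions in hand. The sketch below assumes that Kappa refers to Cohen's kappa, which corrects observed agreement for the agreement expected by chance. The function name `accuracy_and_kappa`, the labels, and the data are hypothetical and do not reproduce the paper's models or figures.

```python
from typing import Sequence, Tuple


def accuracy_and_kappa(actual: Sequence[str],
                       predicted: Sequence[str]) -> Tuple[float, float]:
    """Compute accuracy and Cohen's kappa for two equal-length label lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    labels = set(actual) | set(predicted)

    # Observed agreement: the share of items labelled identically.
    observed = sum(a == p for a, p in zip(actual, predicted)) / n

    # Chance agreement: product of the marginal label frequencies.
    expected = sum(
        (actual.count(label) / n) * (predicted.count(label) / n)
        for label in labels
    )

    # Kappa is undefined when chance agreement is 1; returning 0.0 in
    # that case is a convention chosen here, not part of the definition.
    kappa = (observed - expected) / (1 - expected) if expected < 1 else 0.0
    return observed, kappa


# Invented, imbalanced example: 90 items of class "A", 10 of class "B".
actual = ["A"] * 90 + ["B"] * 10
predicted = ["A"] * 88 + ["B"] * 2 + ["A"] * 8 + ["B"] * 2

acc, kappa = accuracy_and_kappa(actual, predicted)
print(f"accuracy={acc:.2f}, kappa={kappa:.2f}")
```

On imbalanced classes like these, a model can score high accuracy mostly by predicting the majority class, while kappa stays low because much of that agreement is expected by chance.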