Abstract:In the field of pharmacology, there is a notable absence of centralized, comprehensive, and up-to-date repositories of PK data. This poses a significant challenge for R&D as it can be a time-consuming and challenging task to collect all the required quantitative PK parameters from diverse scientific publications. This quantitative PK information is predominantly organized in tabular format, mostly available as XML, HTML, or PDF files within various online repositories and scientific publications, including supplementary materials. This makes tables one of the crucial components and information elements of scientific or regulatory documents as they are commonly utilized to present quantitative information. Extracting data from tables is typically a labor-intensive process, and alternative automated machine learning models may struggle to accurately detect and extract the relevant data due to the complex nature and diverse layouts of tabular data. The difficulty of information extraction and reading order detection is largely dependent on the structural complexity of the tables. Efforts to understand tables should prioritize capturing the content of table cells in a manner that aligns with how a human reader naturally comprehends the information. FARAD has been manually extracting tabular data and other information from literature and regulatory agencies for over 40 years. However, there is now an urgent need to automate this process due to the large volume of publications released daily. The accuracy of this task has become increasingly challenging, as manual extraction is tedious and prone to errors, especially given the staffing shortages we are currently facing. This necessitates the development of AI algorithms for table detection and extraction that are able to precisely handle cells organized according to the table structure, as indicated by column and/or row header information.
Abstract:The safe use of pharmaceuticals in food-producing animals is vital to protect animal welfare and human food safety. Adverse events (AEs) may signal unexpected pharmacokinetic or toxicokinetic effects, increasing the risk of violative residues in the food chain. This study introduces a predictive framework for classifying outcomes (Death vs. Recovery) using ~1.28 million reports (1987-2025 Q1) from the U.S. FDA's OpenFDA Center for Veterinary Medicine. A preprocessing pipeline merged relational tables and standardized AEs through VeDDRA ontologies. Data were normalized, missing values imputed, and high-cardinality features reduced; physicochemical drug properties were integrated to capture chemical-residue links. We evaluated supervised models, including Random Forest, CatBoost, XGBoost, ExcelFormer, and large language models (Gemma 3-27B, Phi 3-12B). Class imbalance was addressed, such as undersampling and oversampling, with a focus on prioritizing recall for fatal outcomes. Ensemble methods(Voting, Stacking) and CatBoost performed best, achieving precision, recall, and F1-scores of 0.95. Incorporating Average Uncertainty Margin (AUM)-based pseudo-labeling of uncertain cases improved minority-class detection, particularly in ExcelFormer and XGBoost. Interpretability via SHAP identified biologically plausible predictors, including lung, heart, and bronchial disorders, animal demographics, and drug physicochemical properties. These features were strongly linked to fatal outcomes. Overall, the framework shows that combining rigorous data engineering, advanced machine learning, and explainable AI enables accurate, interpretable predictions of veterinary safety outcomes. The approach supports FARAD's mission by enabling early detection of high-risk drug-event profiles, strengthening residue risk assessment, and informing regulatory and clinical decision-making.