Audit-Ready Healthcare Fraud Screening: Split-Safe Provider Aggregation and Explainable Boosted Risk Triage
Keywords:
Healthcare Fraud Detection, Provider-Level Screening, Explainable Machine Learning, LightGBM, TreeSHAP, Risk Triage
Abstract
Medical fraud and abnormal billing are often not evident in individual claim records; they appear instead in the cumulative behavior of the same service provider across multiple visits. This paper therefore treats the service provider, rather than the single claim, as the basic unit of risk screening and builds a provider-level fraud screening process for audit settings. We first aggregate claims to the provider level before partitioning the data, to minimize the risk that information from the same provider leaks across training and validation sets. We then train a LightGBM risk scoring model on audit-relevant features: claims volume, reimbursement and out-of-pocket intensity, hospitalization duration statistics, duplicate-claim characteristics, coding diversity, and beneficiary structure. To make the model output usable in real review workflows, we further combine TreeSHAP explanations, threshold scanning, and isotonic calibration, so that the risk score supports priority ranking, manual review under capacity constraints, and clearer interpretation of results. On the publicly available Healthcare Provider Fraud Detection dataset, under provider-centric out-of-fold evaluation, the proposed method achieves an AUC of 0.939, an AUPRC of 0.699, and an F1 score of 0.666 at the selected threshold. The results also indicate that maximum hospitalization duration, reimbursement intensity, total claims volume, total out-of-pocket expenses, and beneficiary age structure are key risk signals. Together, these components provide a compact, interpretable, and auditable approach to provider risk screening in health claims settings.
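As a rough illustration of the aggregate-then-split idea described in the abstract, the sketch below collapses claim rows into one feature row per provider and only afterwards assigns providers to folds, so no provider can appear in more than one fold. It is a minimal standard-library sketch, not the paper's implementation: the field names (`provider`, `reimbursed`, `stay_days`, `beneficiary`), the toy records, and the helper functions are all hypothetical, and the LightGBM, TreeSHAP, and calibration steps are omitted.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical claim records; field names are illustrative, not the dataset's schema.
claims = [
    {"provider": "P1", "reimbursed": 1200.0, "stay_days": 3, "beneficiary": "B1"},
    {"provider": "P1", "reimbursed": 800.0, "stay_days": 10, "beneficiary": "B2"},
    {"provider": "P2", "reimbursed": 300.0, "stay_days": 0, "beneficiary": "B3"},
    {"provider": "P3", "reimbursed": 500.0, "stay_days": 2, "beneficiary": "B1"},
    {"provider": "P3", "reimbursed": 700.0, "stay_days": 4, "beneficiary": "B1"},
]

def aggregate_by_provider(rows):
    """Collapse claim-level rows into one feature row per provider."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["provider"]].append(row)
    table = {}
    for provider_id, provider_rows in grouped.items():
        table[provider_id] = {
            "n_claims": len(provider_rows),
            "total_reimbursed": sum(r["reimbursed"] for r in provider_rows),
            "mean_reimbursed": mean(r["reimbursed"] for r in provider_rows),
            "max_stay_days": max(r["stay_days"] for r in provider_rows),
            "n_beneficiaries": len({r["beneficiary"] for r in provider_rows}),
        }
    return table

def provider_folds(provider_ids, k):
    """Assign each provider to exactly one of k folds (round-robin over sorted ids)."""
    ordered = sorted(provider_ids)
    return [ordered[i::k] for i in range(k)]

# Aggregate first, then split: each fold holds whole providers, never stray claims.
features = aggregate_by_provider(claims)
folds = provider_folds(features.keys(), k=3)
```

Because the split operates on provider rows rather than claim rows, out-of-fold scores for a provider are always produced by a model that never saw any of that provider's claims. In practice a stratified, shuffled split would likely replace the round-robin assignment used here.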
License
Copyright (c) 2026 Southern Journal of Computer Science

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.