Audit-Ready Healthcare Fraud Screening: Split-Safe Provider Aggregation and Explainable Boosted Risk Triage

Authors

  • Iqra Hyder, Shah Abdul Latif University, Khairpur Mirs, Sindh, Pakistan
  • Riaz Ahmed Shaikh, Shah Abdul Latif University, Khairpur Mirs, Sindh, Pakistan
  • Rafaqat Hussain Arain, Shah Abdul Latif University, Khairpur Mirs, Sindh, Pakistan
  • Zahid Hussain Shar, Shah Abdul Latif University, Khairpur Mirs, Sindh, Pakistan
  • Basit Raza, Shah Abdul Latif University, Khairpur Mirs, Sindh, Pakistan

Keywords:

Healthcare Fraud Detection, Provider-Level Screening, Explainable Machine Learning, LightGBM, TreeSHAP, Risk Triage

Abstract

Medical fraud and abnormal billing are often not evident in individual claim records; they emerge instead from the cumulative behavior of a single service provider across many visits. Accordingly, this paper treats the service provider, rather than the single claim, as the basic unit of risk screening and constructs a provider-level fraud screening pipeline for auditing scenarios. Specifically, we first aggregate claims to the provider level before data partitioning, to minimize the risk of information from the same provider leaking across training and validation sets. We then train a LightGBM risk scoring model on audit-relevant features such as claims volume, reimbursement and out-of-pocket intensity, hospitalization duration statistics, duplicate claim characteristics, coding diversity, and beneficiary structure. To make the model output better suited to actual review processes, we further combine TreeSHAP interpretation, threshold scanning, and isotonic calibration, so that the risk score can simultaneously support priority ranking, manual review under capacity constraints, and clearer interpretation of results. On the publicly available Healthcare Provider Fraud Detection dataset, under provider-centric out-of-fold evaluation, the proposed method achieves good ranking performance, with an AUC of 0.939, an AUPRC of 0.699, and an F1 score of 0.666 at the selected threshold. The results also show that maximum hospitalization duration, reimbursement intensity, total claims volume, total out-of-pocket expenses, and beneficiary age structure are key risk signals. This provides a compact, interpretable, and auditable implementation approach for provider risk screening in health claims settings.
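
As an illustration only, the aggregate-then-split idea and the threshold scan described in the abstract can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the field names (`provider`, `beneficiary`, `reimbursed`, `stay_days`), the helper functions, and the toy data are all hypothetical, and the LightGBM model, TreeSHAP attribution, and isotonic calibration steps are omitted so that the example runs with the standard library alone.

```python
from collections import defaultdict


def aggregate_by_provider(claims):
    """Collapse claim-level rows into one feature row per provider."""
    grouped = defaultdict(list)
    for claim in claims:
        grouped[claim["provider"]].append(claim)
    features = {}
    for provider, items in grouped.items():
        features[provider] = {
            "n_claims": len(items),
            "total_reimbursed": sum(c["reimbursed"] for c in items),
            "max_stay_days": max(c["stay_days"] for c in items),
            "n_beneficiaries": len({c["beneficiary"] for c in items}),
        }
    return features


def assign_folds(providers, k):
    """Assign each provider to exactly one fold (deterministic round-robin).

    Because folds are assigned after aggregation, no provider can
    contribute rows to both the training and the validation side.
    """
    return {p: i % k for i, p in enumerate(sorted(providers))}


def scan_thresholds(scores, labels, thresholds):
    """Return (threshold, F1) for the candidate threshold with the best F1."""
    best = (None, -1.0)
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best[1]:
            best = (t, f1)
    return best


# Hypothetical toy claims; not drawn from the dataset used in the paper.
claims = [
    {"provider": "P1", "beneficiary": "B1", "reimbursed": 100.0, "stay_days": 2},
    {"provider": "P1", "beneficiary": "B2", "reimbursed": 300.0, "stay_days": 9},
    {"provider": "P2", "beneficiary": "B1", "reimbursed": 50.0, "stay_days": 1},
    {"provider": "P3", "beneficiary": "B3", "reimbursed": 80.0, "stay_days": 0},
]

features = aggregate_by_provider(claims)
folds = assign_folds(features, k=2)

# Hypothetical out-of-fold scores and labels for four providers.
best_threshold, best_f1 = scan_thresholds(
    scores=[0.9, 0.8, 0.3, 0.1],
    labels=[1, 0, 1, 0],
    thresholds=[0.2, 0.5, 0.85],
)
```

In a fuller pipeline, the out-of-fold scores would come from a model trained on the other folds, and the threshold grid could be restricted to values that keep the number of flagged providers within the available review capacity.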

Published

2026-04-19