Developing SHAP interpretable machine learning models for assessing biopsychosocial risk in female drug users: a small sample study

AI Summary

Developed a biopsychosocial machine learning model that modestly but effectively classified physiological, psychological, dependency, social support, and self-control risks in female drug users.
Oversampling improved classification for most dimensions except social support; Random Forest and Logistic Regression were optimal classifiers after oversampling.
Applying SHAP enhanced interpretability by identifying key predictors for each dimension and total risk; study limited by small sample of 96, expansion planned.

Front Psychiatry. 2026 Apr 22;17:1736274. doi: 10.3389/fpsyt.2026.1736274. eCollection 2026.

ABSTRACT

BACKGROUND: Interventions for female drug users have received considerable attention, but comprehensive and objective risk assessment tools, particularly those integrating biopsychosocial dimensions, remain lacking despite a critical public health need for them.

OBJECTIVE: This study aimed to develop a comprehensive assessment model for evaluating the physiological, psychological, and social risks among female drug users.

METHODS: Based on the biopsychosocial model, variables covering five dimensions (physiological function, psychological and cognitive function, drug dependence, social support, and self-control) were collected from 96 participants. Professionals rated each participant on the five dimensions and on total risk. The collected variables served as inputs to our classification model, and the professional ratings served as its outputs. We oversampled the data and evaluated the classification performance of six classifiers.
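The oversampling step described in the methods can be illustrated with a minimal sketch. The abstract does not say which oversampling technique was used, so the code below assumes plain random oversampling (duplicating minority-class rows) purely for illustration. The function name `random_oversample`, the toy data, and the labels are all hypothetical and are not taken from the study.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class. Assumed technique, not
    necessarily the one used in the study."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Indices of the original rows that belong to this class.
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            j = rng.choice(idx)
            X_out.append(X[j])
            y_out.append(label)
    return X_out, y_out


# Hypothetical training split: six low-risk (0) and two high-risk (1) rows.
X_train = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.9], [1.0]]
y_train = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X_train, y_train)
print(Counter(y_bal))
```

With a sample as small as 96, one common precaution is to oversample only the training portion of each split, so that duplicated rows never appear in the data used to score a classifier.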

RESULTS: First, the machine learning classifiers performed relatively well on the five dimensional risks and on total risk. Second, oversampling improved classification performance for total risk and for most dimensions, the exception being social support risk. After oversampling, the relative strengths of the algorithms became more apparent, with Random Forest and Logistic Regression emerging as the optimal classifiers for multiple dimensions. Third, by applying SHAP, a novel interpretability method, we identified key variables in each dimension and for total risk, thereby enhancing the transparency of the model's decisions.
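To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for one prediction by enumerating every feature subset, with absent features replaced by a baseline value. This is only a toy illustration of the underlying attribution rule: SHAP implementations rely on faster model-specific or sampling-based estimators, and the scoring function `toy_risk_score`, the feature names, and all numbers here are hypothetical rather than drawn from the study.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley attribution of f(x) - f(baseline) to each feature.
    Features outside a coalition are set to their baseline value."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for S in combinations(others, k):
                with_i = [x[j] if (j in S or j == i) else baseline[j]
                          for j in range(n)]
                without_i = [x[j] if j in S else baseline[j]
                             for j in range(n)]
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


def toy_risk_score(v):
    # Hypothetical score with one interaction term between features 0 and 1.
    return 0.5 * v[0] + 0.3 * v[1] - 0.2 * v[2] + 0.4 * v[0] * v[1]


feature_names = ["craving", "depression", "family_support"]  # hypothetical
x = [2.0, 1.0, 3.0]          # one participant's (made-up) feature values
baseline = [0.0, 0.0, 0.0]   # reference point for "feature absent"

phi = shapley_values(toy_risk_score, x, baseline)
for name, value in zip(feature_names, phi):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the property that lets per-feature contributions be read as shares of a single risk score.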

CONCLUSION: The machine learning model, encompassing the physiological, psychological and cognitive, drug dependence, social support, and self-control dimensions, modestly but effectively identified at-risk populations of female drug users. The study is limited by its sample size of 96 participants; we plan to expand the sample in future research to address this constraint and reduce potential model overfitting. The developed model holds promise for researchers seeking to pinpoint at-risk female drug abusers, facilitating targeted interventions and corrections.

PMID:42100778 | PMC:PMC13148034 | DOI:10.3389/fpsyt.2026.1736274
