J Med Internet Res. 2025 May 12;27:e65397. doi: 10.2196/65397.
ABSTRACT
BACKGROUND: Domestic violence (DV) is a significant public health concern affecting the physical and mental well-being of numerous women, imposing a substantial health care burden. However, women facing DV often encounter barriers to seeking in-person help due to stigma, shame, and embarrassment. As a result, many survivors of DV turn to online health communities as a safe and anonymous space to share their experiences and seek support. Understanding the information needs of survivors of DV in online health communities through multiclass classification is crucial for providing timely and appropriate support.
OBJECTIVE: The objective was to develop a fine-tuned large language model (LLM) that can provide fast and accurate predictions of the information needs of survivors of DV from their online posts, enabling health care professionals to offer timely and personalized assistance.
METHODS: We collected 294 posts from Reddit subcommunities focused on DV shared by women aged ≥18 years who self-identified as experiencing intimate partner violence. We identified 8 types of information needs: shelters/DV centers/agencies; legal; childbearing; police; DV report procedure/documentation; safety planning; DV knowledge; and communication. Data augmentation was applied using GPT-3.5 to expand our dataset to 2216 samples by generating 1922 additional posts that imitated the existing data. We adopted a progressive training strategy to fine-tune GPT-3.5 for multiclass text classification using 2032 posts. We trained the model on 1 class at a time, monitoring performance closely. When suboptimal results were observed, we generated additional samples resembling the misclassified posts so that the model received more training on them. We reserved 184 posts for internal testing and 74 for external validation. Model performance was evaluated using accuracy, recall, precision, and F1-score, along with CIs for each metric.
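As an illustration of the evaluation step, the following minimal sketch computes a macro-averaged F1-score for a multiclass classifier and a percentile bootstrap 95% CI around it. The abstract does not state how the metrics were averaged or how the CIs were derived, so both the macro averaging and the bootstrap are assumptions here. The function names (`macro_f1`, `bootstrap_ci`) are hypothetical, and the labels and predictions are toy values, not study data.

```python
import random
from typing import List, Sequence, Tuple


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> float:
    """Macro-averaged F1: per-class F1 computed one-vs-rest, then averaged."""
    per_class: List[float] = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        per_class.append(2 * precision * recall / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


def bootstrap_ci(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    labels: Sequence[str],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for macro F1 (assumed method, not the paper's)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores: List[float] = []
    for _ in range(n_resamples):
        # Resample post indices with replacement, keeping true/predicted pairs aligned.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(macro_f1([y_true[i] for i in idx], [y_pred[i] for i in idx], labels))
    scores.sort()
    lower = scores[int((alpha / 2) * n_resamples)]
    upper = scores[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy example using three of the eight need categories named in the abstract.
labels = ["legal", "police", "safety planning"]
y_true = ["legal", "legal", "police", "police", "safety planning", "safety planning"]
y_pred = ["legal", "police", "police", "police", "safety planning", "legal"]

f1 = macro_f1(y_true, y_pred, labels)
ci_low, ci_high = bootstrap_ci(y_true, y_pred, labels)
print(f"macro F1 = {f1:.4f}, 95% CI = ({ci_low:.4f}, {ci_high:.4f})")
```

With a test set as small as the 40 real posts, a resampling-based interval of this kind would be expected to be wide, which is consistent with the broad CIs reported for real posts.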
RESULTS: Using 40 real posts and 144 artificial intelligence-generated posts as the test dataset, our model achieved an F1-score of 70.49% (95% CI 60.63%-80.35%) for real posts, outperforming the original GPT-3.5 and GPT-4, fine-tuned Llama 2-7B and Llama 3-8B, and a long short-term memory (LSTM) model. On artificial intelligence-generated posts, our model attained an F1-score of 84.58% (95% CI 80.38%-88.78%), surpassing all baselines. When tested on an external validation dataset (n=74), the model achieved an F1-score of 59.67% (95% CI 51.86%-67.49%), again outperforming the other models. Statistical analysis revealed that our model significantly outperformed the others in F1-score (P=.047 for real posts; P<.001 for external validation posts). Furthermore, our model was faster, taking 19.108 seconds for predictions versus 1150 seconds for manual assessment.
CONCLUSIONS: Our fine-tuned LLM can accurately and efficiently extract and identify DV-related information needs through multiclass classification from online posts. In addition, we used LLM-based data augmentation techniques to overcome the limitations of a relatively small and imbalanced dataset. By generating timely and accurate predictions, we can empower health care professionals to provide rapid and suitable assistance to survivors of DV.
PMID:40354642 | DOI:10.2196/65397