- Paragraph-level corpus of 832,220 paragraphs from Amnesty International, Human Rights Watch, the US State Department, and the Working Group on Enforced or Involuntary Disappearances, 1999–2023.
- Custom pipeline combining PDF extraction, crawlers, and language model-based text correction, reducing character errors from 8.3% to 1.7%.
- Each paragraph enriched with 23 metadata fields and NLP features (named entity recognition, sentiment analysis, text classification), enabling topic modelling and cross-organisational analysis.
Data Brief. 2026 May 15;66:112854. doi: 10.1016/j.dib.2026.112854. eCollection 2026 Jun.
ABSTRACT
Human rights research increasingly employs computational text analysis, but existing datasets provide either document-level aggregations or event-level extractions from news sources, limiting fine-grained analysis of primary organizational reports. We present the Human Rights Violation Reporting Dataset, a paragraph-level corpus comprising reports from Amnesty International, Human Rights Watch, the US State Department, and the Working Group on Enforced or Involuntary Disappearances, spanning 1999–2023. The dataset contains 832,220 paragraphs processed using a custom pipeline that combines PDF extraction, custom crawlers, language model-based text correction (reducing character-level errors from 8.3% to 1.7%), and natural language processing (NLP) enrichment, including named entity recognition, sentiment analysis, text classification, and content moderation. Each paragraph is enriched with 23 metadata fields enabling entity network analysis, topic modelling, cross-organizational comparison, and machine learning applications. Data are provided in comma-separated values (CSV) format, with standardized country-year identifiers compatible with existing political science datasets. This dataset enables fine-grained computational analysis, not supported by document-level or event-level datasets, of how major human rights organizations document violations across time, space, and organizational contexts.
PMID:42220647 | PMC:PMC13217886 | DOI:10.1016/j.dib.2026.112854
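The abstract reports character-level error rates before and after text correction but does not say how they were measured. As an illustration only, the sketch below assumes a common definition, edit distance divided by reference length; the function names `levenshtein`, `character_error_rate`, and `relative_reduction` are hypothetical and are not taken from the dataset's pipeline.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # drop char_a
                current[j - 1] + 1,      # drop char_b
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Assumed metric: edit distance normalised by reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return levenshtein(reference, hypothesis) / len(reference)


def relative_reduction(before: float, after: float) -> float:
    """Fraction of the original error rate that was removed."""
    return (before - after) / before
```

Under this definition, the reported drop from 8.3% to 1.7% corresponds to a relative reduction of (8.3 - 1.7) / 8.3, or roughly 80%.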