Using Instruction-Tuned Large Language Models to Identify Indicators of Vulnerability in Police Incident Narratives

Bibliographic Details
Authors: Relins, Sam (Author); Birks, Daniel (Author); Lloyd, Charlie (Author)
Format: Electronic Article
Language: English
Published: 2025
In: Journal of Quantitative Criminology
Year: 2025, Volume: 41, Issue: 4, Pages: 647-684
Online Access: Full text (free access)
Check availability: HBZ Gateway
Description
Summary:

Objectives: Police routinely collect unstructured narrative reports of their interactions with civilians. These accounts have the potential to reveal the extent of police engagement with vulnerable populations. We test whether large language models (LLMs) can effectively replicate human qualitative coding of these narratives, a task that would otherwise be highly resource intensive.

Methods: Using publicly available narrative reports from the Boston Police Department, we compare human-generated and LLM-generated labels for four vulnerabilities: mental ill health, substance misuse, alcohol dependence, and homelessness. We assess multiple LLM sizes and prompting strategies, measure label variability through repeated prompts, and conduct counterfactual experiments to examine potential classification biases related to sex and race.

Results: LLMs demonstrate high agreement with human coders in identifying narratives without vulnerabilities, particularly when repeated classifications are unanimous or near-unanimous. Human-LLM agreement improves with larger models and tailored prompting strategies, though effectiveness varies by vulnerability type. These findings suggest a human-LLM collaborative approach, where LLMs screen the majority of cases whilst humans review ambiguous instances, would significantly reduce manual coding requirements. Counterfactual analyses indicate minimal influence of subject sex and race on LLM classifications beyond those expected by chance.

Conclusions: LLMs can substantially reduce resource requirements for analyzing large narrative datasets, whilst enhancing coding specificity and transparency, and enabling new approaches to replication and comparative analysis. These advances present promising opportunities for criminology and related fields.
ISSN: 1573-7799
DOI: 10.1007/s10940-025-09611-z