RT Research Data
T1 Cambridge Law Corpus, 1550-2023
A1 Östling, Andreas
A2 Sargeant, Holli
A2 Xie, Huiyuan
A2 Bull, Ludwig
A2 Terenin, Alexander
A2 Jonsson, Leif
A2 Magnusson, Måns
A2 Steffek, Felix 1975-
LA English
PP Colchester
PB UK Data Service
YR 2024
UL https://krimdok.uni-tuebingen.de/Record/1881550559
AB The Cambridge Law Corpus (CLC) is a corpus designed for legal AI research. It consists of over 250,000 court cases from the UK. Most cases are from the 21st century, but the corpus includes cases dating back to the 16th century. Alongside the corpus, annotations on case outcomes for 638 cases, made by legal experts, are provided. Word files were cleaned and transformed into an XML format. PDF files were converted to text via optical character recognition (OCR), and the resulting text files were then converted to the XML format. Because of legal and ethical considerations, the full CLC is available only for research purposes, under restrictions, via Related Resources. A smaller dataset of 15 selected cases from the CLC is available on the University of Cambridge Apollo Data Repository, which can also be accessed via Related Resources. The corpus was funded by the research project Legal Systems and Artificial Intelligence, which was jointly supported by the UK’s Economic and Social Research Council (part of UKRI) and the Japan Science and Technology Agency (JST), and involved collaboration between the University of Cambridge (the Centre for Business Research, Department of Computer Science and Faculty of Law) and Hitotsubashi University, Tokyo (the Graduate Schools of Law and Business Administration).
K1 Law
K1 legal decisions
K1 Courts
K1 legal records
K1 Research data
DO 10.5255/UKDA-SN-856927