CANDID-III Dataset
CANDID-III is a dataset of 288,776 anonymized adult chest X-rays in 1024 x 1024 pixel DICOM format, with corresponding anonymized free-text reports, from Dunedin Hospital, New Zealand, acquired between 2010 and 2020. The radiology reports, written by FRANZCR radiologists, were annotated for 45 common radiological findings mapped to the Unified Medical Language System (UMLS) ontology. Each finding is annotated with one of 4 labels: positive, uncertain, negative, or not mentioned. 33,486 studies were manually labeled, and 255,290 were labeled by deep learning models. The accuracy of the AI-labeled portion of the dataset for each label will be reported in the published paper. Image filenames contain a patient index, which allows images to be grouped by patient, and an anonymized acquisition date that preserves the temporal relationship between images. This dataset can be used for training and testing deep learning algorithms for adult chest X-rays.
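Because filenames carry a patient index, train/test splits can be made at the patient level so that no patient appears on both sides. The following is a minimal sketch only: the exact filename layout is not described here, so the pattern `<patient_index>_<anonymized_date>.dcm`, the helper `patient_id_from_filename`, and the function `split_by_patient` are all hypothetical and should be adapted to the actual filenames in the download.

```python
import random
from collections import defaultdict


def patient_id_from_filename(filename: str) -> str:
    # ASSUMPTION: filenames look like "<patient_index>_<anonymized_date>.dcm".
    # This layout is hypothetical; adjust the parsing to the real filenames.
    return filename.split("_", 1)[0]


def split_by_patient(filenames, test_fraction=0.2, seed=0):
    """Split filenames into train/test lists with no patient in both."""
    # Group every image under the patient it belongs to.
    by_patient = defaultdict(list)
    for name in filenames:
        by_patient[patient_id_from_filename(name)].append(name)

    # Shuffle patients (not images) reproducibly, then carve off the test set.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train = [f for p in patients if p not in test_patients for f in by_patient[p]]
    test = [f for p in patients if p in test_patients for f in by_patient[p]]
    return train, test
```

Splitting on patients rather than on individual images avoids leakage when the same patient has several studies over time.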
Access to the data requires completing an ethics training process in 2 steps:
1. Complete the online ethics course at https://globalhealthtrainingcentre.tghn.org/ethics-and-best-practices-sharing-individual-level-data-clinical-and-public-health-research/. You will need to register an account to take the free course. Once you have finished the course quiz, please send the course certificate to sijingfeng@gmail.com
2. Sign the Data Use Agreement, which can be accessed at Data Use Agreement- Unanonymised data.pdf. Once you have signed it, please send the signed copy to sijingfeng@gmail.com
After successful completion of both steps above, a private link to download the dataset will be sent.