CANDID-II Dataset
A dataset of 53,054 anonymized adult chest x-rays in 1024 x 1024 pixel DICOM format, with corresponding anonymized free-text reports, acquired at Dunedin Hospital, New Zealand, between 2010 and 2020. The radiology reports, written by FRANZCR radiologists, were manually annotated for 46 common radiological findings mapped to the Unified Medical Language System (UMLS) and RadLex ontologies. Each finding is annotated with one of four labels: positive, uncertain, negative, or not mentioned. Image filenames contain a patient index, which enables analyses that require grouping images by patient, as well as an anonymized acquisition date that preserves the temporal relationship between images. The dataset can be used to train and test deep learning algorithms for adult chest x-rays.
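Because filenames carry a patient index, images can be grouped by patient before splitting, so that no patient appears in both the training and test sets. The following is a minimal sketch only: the filename pattern `<patient_index>_<date>.dcm`, the `Label` enum, and the helper names `patient_of` and `split_by_patient` are hypothetical and are not part of the dataset's documentation. Adapt the pattern to the actual filename convention.

```python
import random
import re
from collections import defaultdict
from enum import Enum


class Label(Enum):
    """The four label types described for each finding (enum is illustrative)."""
    POSITIVE = "positive"
    UNCERTAIN = "uncertain"
    NEGATIVE = "negative"
    NOT_MENTIONED = "not mentioned"


# ASSUMPTION: filenames look like "<patient_index>_<date>.dcm".
# The real CANDID-II naming scheme may differ.
FILENAME_RE = re.compile(r"^(?P<patient>[^_]+)_(?P<date>[^_.]+)\.dcm$")


def patient_of(filename):
    """Extract the patient index from a filename under the assumed pattern."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename format: {filename!r}")
    return match.group("patient")


def split_by_patient(filenames, test_fraction=0.2, seed=0):
    """Split filenames into train/test lists with no patient in both."""
    groups = defaultdict(list)
    for name in filenames:
        groups[patient_of(name)].append(name)

    # Shuffle patients (not images) so all images of a patient stay together.
    patients = sorted(groups)
    random.Random(seed).shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_patients = set(patients[:n_test])

    train = [f for p in patients if p not in test_patients for f in groups[p]]
    test = [f for p in patients if p in test_patients for f in groups[p]]
    return train, test
```

Splitting at the patient level rather than the image level avoids leakage between sets when one patient has several images over time.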
Unfortunately, since February 2024 the New Zealand government has been changing the data governance of datasets used for AI development, which affects how external users can access the CANDID-II dataset. As a result, the CANDID-II dataset is currently not available to users outside Health New Zealand. This notice will be updated should access by external users be reopened.