ORACLES-AI - ORal Annotated Clinical Lesion Evaluation dataset for Artificial Intelligence

Source: DataCite Commons | Updated: 2026-02-28 | Indexed: 2026-05-04
Download link:
https://osf.io/mvtw9/
Description:
ORACLES-AI (ORal Annotated Clinical Lesion Evaluation dataset for Artificial Intelligence) is an annotated, image-based dataset developed to advance research, education, and computational modelling in oral mucosal health and disease. It comprises high-quality intraoral clinical images from participants spanning a broad spectrum of oral mucosal conditions: normal mucosa, variations from normal, Oral Potentially Malignant Disorders (OPMDs), and oral cancers. Each image is accompanied by expert-generated annotations and extensive, non-identifiable metadata, making ORACLES-AI a comprehensive and interpretable resource for both clinical and computational research.

Data Collection and Methodology
Participants were prospectively recruited, and intraoral images were acquired using standardized mobile-phone-based intraoral photography protocols. Participants were enrolled at one of the Spoke centers of a Hub-and-Spoke model established as part of a study funded by the Indian Council of Medical Research (ICMR). In this model, Ragas Dental College and Hospital, Chennai, serves as the central Hub, while private dental institutions and non-governmental organizations across India serve as Spokes responsible for data collection and community-level oral screening. The present dataset contains images and annotations from one such Spoke center, collected according to uniform methodological guidelines defined by the Hub. All intraoral images were captured following a copyrighted Standard Operating Procedure (SOP) for intraoral photography, developed by the Hub to ensure consistency, reproducibility, and diagnostic usability across participating centers.
Before imaging, camera lenses were cleaned, and photographs were taken at a distance of approximately 4–5 cm from the oral cavity, primarily under natural light, with auxiliary lighting used when necessary. Retraction aids, such as mouth mirrors or wooden retractors, were used to optimize visualization. For each participant, eight standard intraoral sites were systematically photographed: dorsal tongue, ventral tongue, right buccal mucosa, left buccal mucosa, upper labial mucosa, lower labial mucosa, maxillary arch, and mandibular arch. Image quality was assessed against predefined criteria (centering, illumination, sharpness, and absence of motion blur or extraneous artifacts), and images that did not meet these criteria were reacquired. The dataset is designed to be dynamic and will be periodically updated with additional participants and newly annotated images to broaden its coverage, diversity, and clinical representativeness.

Annotation and Regional Attributes
All images were reviewed and annotated by specialists in Oral Pathology and Public Health Dentistry using the VGG Image Annotator (VIA), version 3.0.13. Regions of Interest (ROIs) were delineated with polygonal annotations defining lesion boundaries, mucosal sub-sites, and relevant diagnostic features such as surface texture, color variation, and border irregularity. Annotations were stored in JSON format for compatibility with deep learning and computer vision frameworks. Each image file follows the structured naming convention A_B_C.jpeg, where A is a unique anonymized participant identifier, B denotes the data collection site, and C specifies the intraoral region (e.g., DT for dorsal tongue, LB for left buccal mucosa).
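The naming convention and polygon annotations above lend themselves to programmatic loading. The Python sketch below is illustrative only and rests on stated assumptions: that the annotation JSON follows the VIA 3 project layout, in which each entry under `metadata` stores a region as a flat `xy` list whose first element is a shape id (7 denotes a polygon in VIA 3's shape table), and that none of the A, B, C fields contains an underscore. The helper names (`parse_filename`, `polygons_from_via3`, `polygon_area`) and the identifiers `P001`/`S1` are hypothetical; only the region codes DT and LB come from the dataset description. Check the released JSON files before relying on this layout.

```python
import json
import re
from dataclasses import dataclass

# A_B_C.jpeg: A = anonymized participant ID, B = collection site, C = intraoral region.
# Assumption: no field contains an underscore; .jpg/.jpeg accepted case-insensitively.
_NAME_RE = re.compile(r"^([^_]+)_([^_]+)_([^_]+)\.jpe?g$", re.IGNORECASE)

# Only these two region codes are named in the dataset description.
KNOWN_REGIONS = {"DT": "dorsal tongue", "LB": "left buccal mucosa"}

# Assumption: VIA 3 encodes polygons with shape id 7 as the first element of "xy".
VIA3_POLYGON = 7


@dataclass(frozen=True)
class ImageKey:
    participant_id: str
    site: str
    region: str

    @property
    def region_label(self) -> str:
        # Undocumented region codes are returned unchanged.
        return KNOWN_REGIONS.get(self.region, self.region)


def parse_filename(name: str) -> ImageKey:
    """Split an A_B_C.jpeg file name into its three fields."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not in A_B_C.jpeg form: {name!r}")
    return ImageKey(*match.groups())


def polygons_from_via3(project: dict) -> list[list[tuple[float, float]]]:
    """Collect polygon ROIs from a VIA 3-style project dict as (x, y) vertex lists."""
    polygons = []
    for region in project.get("metadata", {}).values():
        xy = region.get("xy", [])
        if xy and xy[0] == VIA3_POLYGON:
            coords = xy[1:]
            # Remaining values alternate x1, y1, x2, y2, ...
            polygons.append(list(zip(coords[0::2], coords[1::2])))
    return polygons


def polygon_area(vertices: list[tuple[float, float]]) -> float:
    """Shoelace formula: area of a simple polygon in square pixels."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


if __name__ == "__main__":
    key = parse_filename("P001_S1_DT.jpeg")
    print(key.participant_id, key.site, key.region_label)  # P001 S1 dorsal tongue
    project = json.loads(
        '{"metadata": {"1_a": {"vid": "1", "xy": [7, 0, 0, 4, 0, 4, 3], "av": {}}}}'
    )
    triangle = polygons_from_via3(project)[0]
    print(polygon_area(triangle))  # 6.0
```

Areas come out in square pixels; converting them to physical units would need a scale reference, which the description does not mention.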

Data Structure
The dataset is organized into four primary diagnostic categories: (1) normal mucosa, representing healthy oral tissues; (2) variations from normal, encompassing minor deviations from typical mucosal appearance; (3) Oral Potentially Malignant Disorders (OPMDs); and (4) oral cancer, comprising histopathologically confirmed squamous cell carcinoma cases. Each image is linked to its corresponding JSON annotation file and to a metadata sheet in spreadsheet format containing demographic variables (age and sex), habit history (tobacco, alcohol, and areca nut use), and clinical diagnosis. All records are indexed by a unique anonymized identifier, which preserves traceability while keeping the data fully de-identified.

Ethical and Legal Compliance
The study and data collection were approved by the Institutional Ethics Committee of Ragas Dental College and Hospital, Chennai (Approval No: RIEC/20231021/PHD). Written informed consent was obtained from all participants before inclusion. All data were collected, processed, and shared in accordance with the Declaration of Helsinki, institutional data protection policies, and national ethical guidelines. Personal identifiers were removed, and all images were anonymized before dataset compilation and public sharing.

Potential Applications and Reuse
ORACLES-AI is intended to serve as a reproducible, clinically grounded benchmark resource for AI-based oral lesion detection, segmentation, and classification; explainable machine learning and visual reasoning models; standardization of mobile-based intraoral image acquisition and quality assessment; and clinical education and digital diagnostic training. By integrating expert annotations, interpretable regional information, and structured metadata, the dataset aims to bridge clinical expertise and computational innovation.
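As described under Data Structure, each image is tied to a metadata sheet through the anonymized identifier. A minimal Python sketch of that join follows; it tallies images per diagnostic category, which is one way to track class balance as the dataset grows. It assumes the spreadsheet has been exported to CSV with hypothetical `participant_id` and `diagnosis` columns and that field A of the file name matches the sheet's identifier. The function names and sample rows are illustrative, not part of the release.

```python
import csv
import io
from collections import Counter


def load_metadata(csv_text: str, id_column: str = "participant_id") -> dict[str, dict[str, str]]:
    """Index metadata rows by the anonymized participant identifier."""
    return {row[id_column]: row for row in csv.DictReader(io.StringIO(csv_text))}


def participant_from_filename(name: str) -> str:
    """Field A of an A_B_C.jpeg name (assumes fields contain no underscores)."""
    return name.split("_", 1)[0]


def images_per_category(
    image_names: list[str],
    metadata: dict[str, dict[str, str]],
    label_column: str = "diagnosis",
) -> Counter:
    """Tally images per diagnostic label; images with no metadata row count as 'unmatched'."""
    counts: Counter = Counter()
    for name in image_names:
        row = metadata.get(participant_from_filename(name))
        counts[row[label_column] if row is not None else "unmatched"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sheet exported to CSV; real column names and labels may differ.
    sheet = "participant_id,age,sex,diagnosis\nP001,45,M,OPMD\nP002,30,F,normal\n"
    images = ["P001_S1_DT.jpeg", "P001_S1_LB.jpeg", "P002_S1_DT.jpeg", "P003_S1_DT.jpeg"]
    print(images_per_category(images, load_metadata(sheet)))
    # Counter({'OPMD': 2, 'normal': 1, 'unmatched': 1})
```

Counting images rather than participants matters here: each participant contributes images of eight standard sites, so per-image and per-participant class counts can differ.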
Continued expansion of the dataset will support better class balance, greater generalizability, and robust AI model development for the early detection and risk stratification of OPMDs and oral cancers.

Funding
This intraoral image dataset is part of a study funded by the Indian Council of Medical Research (ICMR), New Delhi, under the Investigator-Initiated Research Proposals (Small Extramural Grants, 2023; Project ID: IIRP-2023-1049).
Provider:
OSF Registries
Created:
2025-12-27