AI-Detector Multi-Language Evaluation Dataset
Source: DataCite Commons. Updated: 2026-03-27. Indexed: 2026-05-04.
Download link:
https://data.mendeley.com/datasets/rb3bzv9bnm/2
Resource description:
This dataset contains the data used to evaluate the performance of 12 AI-detector tools. It consists of three parts: AI-generated texts in three languages (English, Ukrainian and Russian); real texts in the same three languages; and results with the relevant code.
The AI-generated texts are a collection of news texts generated by 48 LLMs, organized into 21 folders named after the LLMs' creators. Each creator folder is subdivided by LLM and, where applicable, by model version. Each LLM generated 3 texts in each language, and the results are stored in the corresponding .odt files. For example, AI-Generated Texts/01_OpenAI/GPT/GPT-3.5/News/En-RandomNews-500words.odt is the path to three news texts generated in English by OpenAI's GPT-3.5.
The real texts are a collection of news texts retrieved from 9 news agencies and covering 3 years: 2018, 2019, and 2024. Each year folder covers the three languages, each language is represented by 3 news agencies operating in that language, and each .odt file holds 3 news articles from one agency for that year. For example, Real texts/2018/EN/BBC_2018.odt contains three articles published by the BBC in 2018, written in English.
Results&Code includes the data.xlsx table, results.txt, the Figures folder and Main.py. data.xlsx holds the raw predictions and confidence scores of the 12 detectors and is divided into two sheets: Sheet1 covers the AI-generated texts, and Sheet2 covers the real news articles. Based on this table, Main.py calculates performance metrics and outputs the results either to the console or to a file (results.txt in this case). It can also generate the relevant figures (Figures folder). Main.py requires the following packages: scikit-learn, numpy, pandas, openpyxl, and matplotlib.
The Figures folder contains 5 subfolders. Detector_Comparisons includes the overall score and metric distributions; Negative_Distributions and Positive_Distributions include the individual detector distributions for real and AI-generated texts respectively; PR_Curves and ROC_Curves include the overall curves for all of the detectors.
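The description does not specify which metrics Main.py computes or how data.xlsx is laid out, so the following is only a minimal sketch of how per-detector metrics could be derived from the two groups of confidence scores. It assumes AI-generated text is the positive class (consistent with the Positive_Distributions folder) and a decision threshold of 0.5. The function name, the Metrics class, the threshold, and the sample scores are all hypothetical and are not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def detector_metrics(ai_scores, real_scores, threshold=0.5):
    """Score one detector, treating AI-generated text as the positive class.

    ai_scores:   detector confidence (0..1) that each AI-generated text is AI-written
    real_scores: detector confidence (0..1) that each real article is AI-written
    threshold:   assumed cut-off; scores at or above it count as an "AI" prediction
    """
    tp = sum(s >= threshold for s in ai_scores)    # AI texts flagged as AI
    fn = len(ai_scores) - tp                       # AI texts missed
    fp = sum(s >= threshold for s in real_scores)  # real texts flagged as AI
    tn = len(real_scores) - fp                     # real texts correctly passed

    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return Metrics(accuracy, precision, recall, f1)


# Made-up scores for illustration only; they do not come from data.xlsx.
example = detector_metrics(ai_scores=[0.9, 0.8, 0.3],
                           real_scores=[0.1, 0.6, 0.2])
print(example)
```

With the real table, the two score lists would presumably come from Sheet1 and Sheet2 (for instance via pandas.read_excel with the sheet_name argument), but the column names would need to be checked against the file itself. scikit-learn's metrics module provides equivalent functions, and the hand-rolled counts here are used only to keep the sketch dependency-free.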
Provider:
Mendeley Data
Created:
2026-03-27



