AI-Detector Multi-Language Evaluation Dataset
Source: Mendeley Data. Indexed 2026-04-09.
Download link:
https://data.mendeley.com/datasets/rb3bzv9bnm/1
Description:
This dataset contains the data from an evaluation of the performance of 12 AI-detector tools. It consists of three parts: AI-generated texts in three languages (English, Ukrainian, and Russian); real texts in the same three languages; and the results with the relevant code.

The AI-generated texts are a collection of news texts generated by 48 LLMs, organized into 21 folders named after the LLMs' creators. Each creator folder is subdivided by LLM and, where applicable, by model version. Each LLM generated 3 texts in each language, and the results are stored in the corresponding .odt files. For example, AI-Generated Texts/01_OpenAI/GPT/GPT-3.5/News/En-RandomNews-500words.odt is the path to three AI-generated news texts written in English by OpenAI's GPT-3.5.

The real texts are a collection of news texts retrieved from 9 news agencies and covering 3 years: 2018, 2019, and 2024. Each year folder contains the three languages, each language is represented by 3 news agencies operating in that language, and each .odt file holds 3 news articles from one agency for that year. For example, Real texts/2018/EN/BBC_2018.odt contains three articles published by the BBC in 2018, written in English.

Results&Code includes the Raw Data and Metrics .xlsx table, which is divided into three parts: AI texts, with the detectors' predictions on the AI-generated texts; Real texts, with the detectors' predictions on the real news articles; and metrics, with all relevant calculated values. Most of the metrics are calculated inside the table using Excel's built-in capabilities, but ROC AUC, AP, and Log Loss are calculated using Python code located in the relevant folder.
The folder includes Main.py, the code used for calculating the metrics with the scikit-learn library; definitions.py, a supplementary module with each detector's responses, used by Main.py; the PR_curves and ROC_curves folders; and out.txt, which holds the results of the calculations generated by Main.py. Main.py can be executed as follows: py .\Calculator.py > out.txt. (The command is reproduced as given; it names Calculator.py rather than Main.py, so substitute the script name actually present in the folder.) You will need scikit-learn, numpy, and matplotlib installed.
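
As a rough illustration of the three metrics computed in Python, the sketch below applies scikit-learn to a small set of detector scores. It is not the dataset's Main.py: the labels, the scores, and the helper name summarize are hypothetical, and it assumes each detector returns a probability that a text is AI-generated (1 = AI-generated, 0 = real).

```python
import numpy as np
from sklearn.metrics import average_precision_score, log_loss, roc_auc_score

# Hypothetical ground truth: 1 = AI-generated text, 0 = real news article.
y_true = np.array([1, 1, 1, 0, 0, 0])

# Hypothetical detector outputs, read as P(text is AI-generated).
y_score = np.array([0.92, 0.80, 0.35, 0.40, 0.10, 0.05])


def summarize(y_true, y_score):
    """Return ROC AUC, average precision (AP), and log loss for one detector."""
    return {
        "roc_auc": roc_auc_score(y_true, y_score),
        "average_precision": average_precision_score(y_true, y_score),
        "log_loss": log_loss(y_true, y_score),
    }


metrics = summarize(y_true, y_score)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

ROC AUC and AP depend only on how the scores rank the texts, while log loss also penalizes confident wrong probabilities, so a detector that reports only hard 0/1 labels would need its output converted to probabilities before log loss is meaningful.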



