
RedZhu/KuaiMod

Source: Hugging Face. Updated 2026-04-19; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/RedZhu/KuaiMod
Description:
---
license: cc-by-nc-4.0
size_categories:
- 1K<n<10K
---

# Benchmark Description

## Data Format

This dataset consists of multiple samples, each containing the following fields:

- `tag`: The label of the sample, indicating the category of the content, such as "pornographic".
- `title`: The title of the video, usually containing the username and user ID.
- `OCR`: Optical Character Recognition results: text extracted from the images.
- `ASR`: Automatic Speech Recognition results: text extracted from the audio.
- `images`: A list of image filenames, representing the cover and up to 8 frames extracted from the video.
- `pid`: A unique identifier for the sample.

## Example

```json
{
  "tag": "pornographic",
  "title": "@AUsername (A's ID)",
  "OCR": "Au|99. ELLYWTT|199. E|990. L|990. |199|199",
  "ASR": "it's good it summy bod night i'm good good you. |you want to number one.",
  "images": ["002_0.jpg", "002_1.jpg", "002_2.jpg", "002_3.jpg", "002_4.jpg", "002_5.jpg", "002_6.jpg", "002_7.jpg", "002_8.jpg"],
  "pid": "002"
}
```

# Evaluation Method

We provide evaluation scripts in our GitHub repository for text-based violation judgments. The evaluation process involves storing the textual judgment results in a JSONL file. Each line is a dictionary containing the following keys:

- `tag`: The ground truth label of the instance.
- `judgement`: The response from the model or algorithm.
  - For binary classification, the judgement for each video should be `是` (positive) or `否` (violative).
  - For multi-class classification, the judgement for each video should be one of the 17 tags.

## **Evaluation Steps**

1. Generate Prediction Results: Save the prediction results from the model or algorithm into a JSONL file. Each line should be formatted as follows:

   ```json
   {"tag": "Ground Truth Label", "judgement": "Model Judgement"}
   ```

2. Run Evaluation Script: Use the [binary_eval.py](https://github.com/KuaiMod/KuaiMod.github.io/blob/main/evaluation/binary_eval.py) or [multi_cls_eval.py](https://github.com/KuaiMod/KuaiMod.github.io/blob/main/evaluation/multi_cls_eval.py) script to evaluate the prediction results. The script will calculate and output evaluation metrics.

   Example commands (run the one that matches the task):

   ```shell
   python binary_eval.py --input predictions.jsonl
   python multi_cls_eval.py --input predictions.jsonl
   ```

   Ensure that the `predictions.jsonl` file is correctly formatted and consistent with the sample format in the dataset.
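
The prediction file described above can be produced and checked with a few lines of Python. The following is a minimal sketch, not part of the official evaluation scripts: the helper names `missing_fields`, `write_predictions`, and `read_predictions` are hypothetical, and only the field names and the `{"tag", "judgement"}` line format are taken from the description above.

```python
import json

# Field names as documented for each dataset sample.
SAMPLE_FIELDS = {"tag", "title", "OCR", "ASR", "images", "pid"}
# Keys documented for each line of the prediction file.
PREDICTION_KEYS = {"tag", "judgement"}


def missing_fields(sample):
    """Return the documented sample fields that are absent from `sample`."""
    return SAMPLE_FIELDS - set(sample)


def write_predictions(rows, path):
    """Write (tag, judgement) pairs as one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as f:
        for tag, judgement in rows:
            record = {"tag": tag, "judgement": judgement}
            # ensure_ascii=False keeps non-ASCII judgements readable in the file.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_predictions(path):
    """Read a JSONL prediction file, checking that every line has both keys."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            if not PREDICTION_KEYS.issubset(record):
                raise ValueError(f"line {line_no}: missing 'tag' or 'judgement'")
            records.append(record)
    return records
```

Calling `read_predictions("predictions.jsonl")` before running an evaluation script is one way to catch malformed lines early; whether the official scripts apply the same checks is not stated in the description.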
Provider:
RedZhu