
africa-intelligence/aya101-benchmarking

Source: Hugging Face. Updated 2024-10-01; indexed 2025-04-19.
Download link:
https://hf-mirror.com/datasets/africa-intelligence/aya101-benchmarking
Resource description:
---
pretty_name: Evaluation run of CohereForAI/aya-101
repo_url: https://huggingface.co/CohereForAI/aya-101
leaderboard_url: ''
point_of_contact: ''
configs:
- config_name: CohereForAI__aya-101__afrimgsm_direct_xho
  data_files:
  - split: 2024_10_01T16_21_34.420635
    path:
    - '**/samples_afrimgsm_direct_xho_2024-10-01T16-21-34.420635.jsonl'
  - split: latest
    path:
    - '**/samples_afrimgsm_direct_xho_2024-10-01T16-21-34.420635.jsonl'
- config_name: CohereForAI__aya-101__afrimgsm_direct_zul
  data_files:
  - split: 2024_10_01T16_21_34.420635
    path:
    - '**/samples_afrimgsm_direct_zul_2024-10-01T16-21-34.420635.jsonl'
  - split: latest
    path:
    - '**/samples_afrimgsm_direct_zul_2024-10-01T16-21-34.420635.jsonl'
- config_name: CohereForAI__aya-101__afrimmlu_direct_xho
  data_files:
  - split: 2024_10_01T16_21_34.420635
    path:
    - '**/samples_afrimmlu_direct_xho_2024-10-01T16-21-34.420635.jsonl'
  - split: latest
    path:
    - '**/samples_afrimmlu_direct_xho_2024-10-01T16-21-34.420635.jsonl'
- config_name: CohereForAI__aya-101__afrimmlu_direct_zul
  data_files:
  - split: 2024_10_01T16_21_34.420635
    path:
    - '**/samples_afrimmlu_direct_zul_2024-10-01T16-21-34.420635.jsonl'
  - split: latest
    path:
    - '**/samples_afrimmlu_direct_zul_2024-10-01T16-21-34.420635.jsonl'
- config_name: CohereForAI__aya-101__afrixnli_en_direct_xho
  data_files:
  - split: 2024_10_01T16_21_34.420635
    path:
    - '**/samples_afrixnli_en_direct_xho_2024-10-01T16-21-34.420635.jsonl'
  - split: latest
    path:
    - '**/samples_afrixnli_en_direct_xho_2024-10-01T16-21-34.420635.jsonl'
- config_name: CohereForAI__aya-101__afrixnli_en_direct_zul
  data_files:
  - split: 2024_10_01T16_21_34.420635
    path:
    - '**/samples_afrixnli_en_direct_zul_2024-10-01T16-21-34.420635.jsonl'
  - split: latest
    path:
    - '**/samples_afrixnli_en_direct_zul_2024-10-01T16-21-34.420635.jsonl'
---

# Dataset Card for Evaluation run of CohereForAI/aya-101

<!-- Provide a quick summary of the dataset. -->

Dataset automatically created during the evaluation run of model [CohereForAI/aya-101](https://huggingface.co/CohereForAI/aya-101).

The metadata lists six configurations, one per evaluated task and language (the auto-generated summary counts 5).

The dataset was created from 2 run(s). Each run appears as a separate split in each configuration, named after the run's timestamp. The "latest" split always points to the most recent results.

An additional configuration, "results", stores all the aggregated results of the run.

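The naming pattern visible in the `configs` metadata can be sketched in a few lines. This is only an illustration inferred from the six entries listed there, not a documented rule of the evaluation tooling; the helper `run_names` is hypothetical.

```python
def run_names(model_id: str, task: str, timestamp: str) -> dict:
    """Derive the names used for one evaluated task (hypothetical helper).

    The pattern is inferred from the card's metadata, not from any
    official specification.
    """
    # Config name: model id with "/" replaced by "__", then "__" and the task.
    config_name = f"{model_id.replace('/', '__')}__{task}"
    # Split name: the run timestamp with hyphens turned into underscores.
    split_name = timestamp.replace("-", "_")
    # Glob for the per-sample JSONL file, which keeps the original timestamp.
    sample_glob = f"**/samples_{task}_{timestamp}.jsonl"
    return {"config": config_name, "split": split_name, "glob": sample_glob}


names = run_names(
    "CohereForAI/aya-101",
    "afrimgsm_direct_xho",
    "2024-10-01T16-21-34.420635",
)
for key, value in names.items():
    print(f"{key}: {value}")
```

The derived config name can be passed as `name=` to `load_dataset`, with either the timestamped split or `"latest"` as `split=`.
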
To load the details from a run, you can for instance do the following:

```python
from datasets import load_dataset

# Load the per-sample details for one task; "latest" points to the most recent run.
data = load_dataset(
    "africa-intelligence/aya101-benchmarking",
    name="CohereForAI__aya-101__afrimgsm_direct_xho",
    split="latest",
)
```

## Latest results

These are the [latest results from run 2024-10-01T16-21-34.420635](https://huggingface.co/datasets/africa-intelligence/aya101-benchmarking/blob/main/CohereForAI/aya-101/results_2024-10-01T16-21-34.420635.json). Note that the repository may contain results for other tasks if successive evaluations did not cover the same tasks. Each set of results can be found in the "results" configuration and in the "latest" split of the corresponding evaluation:

```python
{
    "all": {
        "afrimgsm_direct_xho": {
            "alias": "afrimgsm_direct_xho",
            "exact_match,remove_whitespace": 0.004,
            "exact_match_stderr,remove_whitespace": 0.004000000000000003,
            "exact_match,flexible-extract": 0.044,
            "exact_match_stderr,flexible-extract": 0.012997373846574952
        },
        "afrimgsm_direct_zul": {
            "alias": "afrimgsm_direct_zul",
            "exact_match,remove_whitespace": 0.0,
            "exact_match_stderr,remove_whitespace": 0.0,
            "exact_match,flexible-extract": 0.02,
            "exact_match_stderr,flexible-extract": 0.008872139507342683
        },
        "afrimmlu_direct_xho": {
            "alias": "afrimmlu_direct_xho",
            "acc,none": 0.316,
            "acc_stderr,none": 0.020812359515855857,
            "f1,none": 0.3121412403731796,
            "f1_stderr,none": "N/A"
        },
        "afrimmlu_direct_zul": {
            "alias": "afrimmlu_direct_zul",
            "acc,none": 0.298,
            "acc_stderr,none": 0.02047511809298895,
            "f1,none": 0.30077002468766567,
            "f1_stderr,none": "N/A"
        },
        "afrixnli_en_direct_xho": {
            "alias": "afrixnli_en_direct_xho",
            "acc,none": 0.5366666666666666,
            "acc_stderr,none": 0.020374439597383796,
            "f1,none": 0.4396227279523235,
            "f1_stderr,none": "N/A"
        },
        "afrixnli_en_direct_zul": {
            "alias": "afrixnli_en_direct_zul",
            "acc,none": 0.5433333333333333,
            "acc_stderr,none": 0.020352577627018392,
            "f1,none": 0.4400411624098575,
            "f1_stderr,none": "N/A"
        }
    },
    "afrimgsm_direct_xho": {
        "alias": "afrimgsm_direct_xho",
        "exact_match,remove_whitespace": 0.004,
        "exact_match_stderr,remove_whitespace": 0.004000000000000003,
        "exact_match,flexible-extract": 0.044,
        "exact_match_stderr,flexible-extract": 0.012997373846574952
    },
    "afrimgsm_direct_zul": {
        "alias": "afrimgsm_direct_zul",
        "exact_match,remove_whitespace": 0.0,
        "exact_match_stderr,remove_whitespace": 0.0,
        "exact_match,flexible-extract": 0.02,
        "exact_match_stderr,flexible-extract": 0.008872139507342683
    },
    "afrimmlu_direct_xho": {
        "alias": "afrimmlu_direct_xho",
        "acc,none": 0.316,
        "acc_stderr,none": 0.020812359515855857,
        "f1,none": 0.3121412403731796,
        "f1_stderr,none": "N/A"
    },
    "afrimmlu_direct_zul": {
        "alias": "afrimmlu_direct_zul",
        "acc,none": 0.298,
        "acc_stderr,none": 0.02047511809298895,
        "f1,none": 0.30077002468766567,
        "f1_stderr,none": "N/A"
    },
    "afrixnli_en_direct_xho": {
        "alias": "afrixnli_en_direct_xho",
        "acc,none": 0.5366666666666666,
        "acc_stderr,none": 0.020374439597383796,
        "f1,none": 0.4396227279523235,
        "f1_stderr,none": "N/A"
    },
    "afrixnli_en_direct_zul": {
        "alias": "afrixnli_en_direct_zul",
        "acc,none": 0.5433333333333333,
        "acc_stderr,none": 0.020352577627018392,
        "f1,none": 0.4400411624098575,
        "f1_stderr,none": "N/A"
    }
}
```

## Dataset Details

### Dataset Description

<!-- Provide a longer summary of what this dataset is. -->

- **Curated by:** [More Information Needed]
- **Funded by [optional]:** [More Information Needed]
- **Shared by [optional]:** [More Information Needed]
- **Language(s) (NLP):** [More Information Needed]
- **License:** [More Information Needed]

### Dataset Sources [optional]

<!-- Provide the basic links for the dataset. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]

## Uses

<!-- Address questions around how the dataset is intended to be used. -->

### Direct Use

<!-- This section describes suitable use cases for the dataset. -->

[More Information Needed]

### Out-of-Scope Use

<!-- This section addresses misuse, malicious use, and uses that the dataset will not work well for. -->

[More Information Needed]

## Dataset Structure

<!-- This section provides a description of the dataset fields, and additional information about the dataset structure such as criteria used to create the splits, relationships between data points, etc. -->

[More Information Needed]

## Dataset Creation

### Curation Rationale

<!-- Motivation for the creation of this dataset. -->

[More Information Needed]

### Source Data

<!-- This section describes the source data (e.g. news text and headlines, social media posts, translated sentences, ...). -->

#### Data Collection and Processing

<!-- This section describes the data collection and processing process such as data selection criteria, filtering and normalization methods, tools and libraries used, etc. -->

[More Information Needed]

#### Who are the source data producers?

<!-- This section describes the people or systems who originally created the data. It should also include self-reported demographic or identity information for the source data creators if this information is available. -->

[More Information Needed]

### Annotations [optional]

<!-- If the dataset contains annotations which are not part of the initial data collection, use this section to describe them. -->

#### Annotation process

<!-- This section describes the annotation process such as annotation tools used in the process, the amount of data annotated, annotation guidelines provided to the annotators, interannotator statistics, annotation validation, etc. -->

[More Information Needed]

#### Who are the annotators?

<!-- This section describes the people or systems who created the annotations. -->

[More Information Needed]

#### Personal and Sensitive Information

<!-- State whether the dataset contains data that might be considered personal, sensitive, or private (e.g., data that reveals addresses, uniquely identifiable names or aliases, racial or ethnic origins, sexual orientations, religious beliefs, political opinions, financial or health data, etc.). If efforts were made to anonymize the data, describe the anonymization process. -->

[More Information Needed]

## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.

## Citation [optional]

<!-- If there is a paper or blog post introducing the dataset, the APA and Bibtex information for that should go in this section. -->

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

<!-- If relevant, include terms and calculations in this section that can help readers understand the dataset or dataset card. -->

[More Information Needed]

## More Information [optional]

[More Information Needed]

## Dataset Card Authors [optional]

[More Information Needed]

## Dataset Card Contact

[More Information Needed]
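
As a reading aid for the "Latest results" section, the reported standard errors can be turned into approximate 95% confidence intervals. The sketch below assumes a normal approximation (value ± 1.96 × stderr), which the card itself does not state; the accuracy and stderr figures are copied from the results, and the function name `confidence_interval` is illustrative.

```python
# (accuracy, standard error) pairs copied from the "Latest results" section.
reported = {
    "afrimmlu_direct_xho": (0.316, 0.020812359515855857),
    "afrimmlu_direct_zul": (0.298, 0.02047511809298895),
    "afrixnli_en_direct_xho": (0.5366666666666666, 0.020374439597383796),
    "afrixnli_en_direct_zul": (0.5433333333333333, 0.020352577627018392),
}

Z_95 = 1.96  # assumed normal-approximation multiplier for a two-sided 95% interval


def confidence_interval(value: float, stderr: float, z: float = Z_95) -> tuple:
    """Return (low, high) for value +/- z * stderr, clipped to [0, 1]."""
    return max(0.0, value - z * stderr), min(1.0, value + z * stderr)


intervals = {
    task: confidence_interval(acc, se) for task, (acc, se) in reported.items()
}
for task, (low, high) in intervals.items():
    print(f"{task}: [{low:.3f}, {high:.3f}]")
```

Under this approximation the `xho` and `zul` intervals overlap on both AfriMMLU and AfriXNLI, so the small accuracy gaps between the two languages should be read with caution.
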

Provided by:
africa-intelligence