tomg-group-umd/CLRS-Text-train

Name: tomg-group-umd/CLRS-Text-train
Creator: tomg-group-umd
Published: 2024-07-14 14:46:50
License: Apache-2.0

Source: Hugging Face. Updated 2024-07-14; indexed 2024-07-22.

Download link:

https://hf-mirror.com/datasets/tomg-group-umd/CLRS-Text-train

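Resource summary:

The CLRS Text Training Datasets contain training data for CLRS-30 Text, generated with code published on GitHub. The dataset covers 30 algorithms: activity selection, articulation points, Bellman-Ford, breadth-first search, binary search, bridges, bubble sort, DAG shortest paths, depth-first search, Dijkstra, Kadane's maximum subarray, Floyd-Warshall, Graham scan, heapsort, insertion sort, Jarvis march, KMP matcher, longest common subsequence, matrix chain multiplication, minimum, Kruskal's minimum spanning tree, Prim's minimum spanning tree, naive string matcher, optimal binary search tree, quickselect, quicksort, segment intersection, strongly connected components, task scheduling, and topological sort. The dataset is released under the Apache-2.0 license.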

CLRS Text Training Datasets provides English-language training data for 30 algorithms under the Apache-2.0 license. It falls in the 1M to 10M size category and has three string-typed features: question, answer, and algorithm name. The single training split holds 2,150,000 samples totaling 2,150,691,651 bytes, with a download size of 531,486,090 bytes.

Provider:

tomg-group-umd

Summary of original information

CLRS Text Training Datasets

Overview

Language: English
License: Apache-2.0
Dataset size category: 1M<n<10M
Dataset name: CLRS Text Training Datasets

Dataset information

Features

question: string
answer: string
algo_name: string

Splits

train:
- Bytes: 2150691651
- Samples: 2150000

Download and dataset size

Download size: 531486090 bytes
Dataset size: 2150691651 bytes
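
For a sense of scale, the figures above work out to roughly 1,000 bytes per sample, with a download about a quarter of the full dataset size. The sketch below only redoes that arithmetic; the function name `size_summary` and the constant names are illustrative and are not part of the dataset or any library.

```python
# Figures copied from the split and size fields listed above.
NUM_SAMPLES = 2_150_000
DATASET_BYTES = 2_150_691_651
DOWNLOAD_BYTES = 531_486_090


def size_summary(num_samples: int = NUM_SAMPLES,
                 dataset_bytes: int = DATASET_BYTES,
                 download_bytes: int = DOWNLOAD_BYTES) -> dict:
    """Derive per-sample size and download-to-dataset ratio (illustrative helper)."""
    return {
        "avg_bytes_per_sample": dataset_bytes / num_samples,
        "download_ratio": download_bytes / dataset_bytes,
    }


summary = size_summary()
print(f"average bytes per sample: {summary['avg_bytes_per_sample']:.1f}")
print(f"download / dataset size:  {summary['download_ratio']:.3f}")
```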

Configuration

config_name: default
- data_files:
  - split: train
  - path: data/train-*
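
Given the single `default` config and `train` split listed above, loading the data with the Hugging Face `datasets` library might look like the following minimal sketch. It assumes the `datasets` package is installed and that the repository is reachable; the helper name `preview` is hypothetical. Streaming is used so that the full download is not required just to inspect a few rows.

```python
from itertools import islice

REPO_ID = "tomg-group-umd/CLRS-Text-train"
FEATURES = ("question", "answer", "algo_name")  # all string-typed per the listing


def preview(n: int = 3) -> list:
    """Return the first n rows of the train split (hypothetical helper)."""
    # Imported lazily so the module can be imported without `datasets` installed.
    from datasets import load_dataset

    # streaming=True yields rows lazily instead of downloading every shard first.
    ds = load_dataset(REPO_ID, split="train", streaming=True)
    return [{key: row[key] for key in FEATURES} for row in islice(ds, n)]


# Example call (requires network access):
#     for row in preview(2):
#         print(row["algo_name"], row["question"][:80])
```

Dropping `streaming=True` would instead download and cache the whole split, which suits repeated training runs better than one-off inspection.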

Algorithm list

activity_selector
articulation_points
bellman_ford
bfs
binary_search
bridges
bubble_sort
dag_shortest_paths
dfs
dijkstra
find_maximum_subarray_kadane
floyd_warshall
graham_scan
heapsort
insertion_sort
jarvis_march
kmp_matcher
lcs_length
matrix_chain_order
minimum
mst_kruskal
mst_prim
naive_string_matcher
optimal_bst
quickselect
quicksort
segments_intersect
strongly_connected_components
task_scheduling
topological_sort
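
The list above can double as a validation set when working with rows from the dataset. The sketch below counts rows per algorithm and rejects unknown names. It assumes that the `algo_name` column holds exactly these identifiers, which the listing suggests but does not state outright; the helper `count_by_algorithm` and the sample rows are made up for illustration.

```python
from collections import Counter

# The 30 identifiers from the algorithm list above.
ALGORITHMS = frozenset({
    "activity_selector", "articulation_points", "bellman_ford", "bfs",
    "binary_search", "bridges", "bubble_sort", "dag_shortest_paths", "dfs",
    "dijkstra", "find_maximum_subarray_kadane", "floyd_warshall",
    "graham_scan", "heapsort", "insertion_sort", "jarvis_march",
    "kmp_matcher", "lcs_length", "matrix_chain_order", "minimum",
    "mst_kruskal", "mst_prim", "naive_string_matcher", "optimal_bst",
    "quickselect", "quicksort", "segments_intersect",
    "strongly_connected_components", "task_scheduling", "topological_sort",
})


def count_by_algorithm(rows) -> Counter:
    """Count rows per algo_name, raising on names outside the known list."""
    counts = Counter()
    for row in rows:
        name = row["algo_name"]
        if name not in ALGORITHMS:
            raise ValueError(f"unexpected algo_name: {name!r}")
        counts[name] += 1
    return counts


# Hypothetical rows, not real dataset content.
sample_rows = [
    {"question": "q1", "answer": "a1", "algo_name": "bfs"},
    {"question": "q2", "answer": "a2", "algo_name": "bfs"},
    {"question": "q3", "answer": "a3", "algo_name": "heapsort"},
]
sample_counts = count_by_algorithm(sample_rows)
```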
