BEE-spoke-data/survivorslib-law-books
Hugging Face · updated 2024-10-08 · indexed 2024-06-22
Download link:
https://hf-mirror.com/datasets/BEE-spoke-data/survivorslib-law-books
Resource description:
---
language:
- en
license: odc-by
size_categories:
- n<1K
task_categories:
- text-generation
- fill-mask
dataset_info:
  features:
  - name: section
    dtype: string
  - name: filename
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: validation
    num_bytes: 134490
    num_examples: 1
  - name: test
    num_bytes: 3845881
    num_examples: 2
  - name: train
    num_bytes: 60701376
    num_examples: 46
  download_size: 64556994
  dataset_size: 64681747
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---
# law books (nougat-small)
A decent chunk of the Survivor Library law section: https://www.survivorlibrary.com/index.php/8-category/173-library-law
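A minimal sketch of loading the dataset with the Hugging Face `datasets` library, assuming the library is installed and the Hub is reachable. The repository ID and column names come from the card metadata; the helper names `load_law_books` and `summarize_row` are illustrative only.

```python
from typing import Dict

REPO_ID = "BEE-spoke-data/survivorslib-law-books"
EXPECTED_COLUMNS = ("section", "filename", "text")  # from the card's dataset_info


def load_law_books():
    """Download all splits (train, validation, test) from the Hub."""
    # Imported lazily so the rest of this sketch works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID)


def summarize_row(row: Dict[str, str]) -> str:
    """Return a one-line summary of a single record (hypothetical helper)."""
    missing = [name for name in EXPECTED_COLUMNS if name not in row]
    if missing:
        raise KeyError(f"row is missing expected columns: {missing}")
    return f"{row['section']}/{row['filename']}: {len(row['text'])} characters"
```

Calling `load_law_books()` should return a `DatasetDict` keyed by split name, and `summarize_row(ds["train"][0])` would then describe the first training record. The `text` fields are large (the card reports roughly 60 MB for the train split), so printing whole records is best avoided.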
<pre>(ki) <font color="#859900"><b>➜ </b></font><font color="#2AA198"><b>primerdata-for-LLMs</b></font> python push_dataset_from_text.py /home/pszemraj/Dropbox/programming-projects/primerdata-for-LLMs/utils/output-hf-nougat-space/law -e .md -r BEE-spoke-data/survivorslib-law-books
INFO:__main__:Looking for files with extensions: ['md']
Processing md files: 100%|███████████████████████████████| 46/46 [00:00<00:00, 778.32it/s]
INFO:__main__:Found 46 text files.
INFO:__main__:Performing train-test split...
INFO:__main__:Performing validation-test split...
INFO:__main__:Train size: 43
INFO:__main__:Validation size: 1
INFO:__main__:Test size: 2
INFO:__main__:Pushing dataset</pre>
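The log shows two successive splits: first train versus a held-out pool, then that pool into validation and test. The script itself is not reproduced here, so the following is only a stdlib sketch of that two-stage pattern. The function name, the seed, and the use of explicit counts (taken from the sizes in the log) are assumptions, not the script's actual parameters.

```python
import random
from typing import List, Sequence, Tuple


def two_stage_split(
    items: Sequence[str],
    holdout_size: int,
    validation_size: int,
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Split items into train / validation / test in two stages."""
    if not 0 <= validation_size <= holdout_size <= len(items):
        raise ValueError("need 0 <= validation_size <= holdout_size <= len(items)")

    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    # Stage 1: train-test split (train versus everything held out).
    holdout, train = shuffled[:holdout_size], shuffled[holdout_size:]

    # Stage 2: validation-test split of the held-out pool.
    validation, test = holdout[:validation_size], holdout[validation_size:]
    return train, validation, test


# 46 placeholder file names standing in for the 46 .md files in the log.
files = [f"book_{i:02d}.md" for i in range(46)]
train, validation, test = two_stage_split(files, holdout_size=3, validation_size=1)
print(len(train), len(validation), len(test))  # 43 1 2
```

Fixing the seed keeps the assignment of files to splits reproducible between runs.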
Provided by:
BEE-spoke-data
Summary of original information
Dataset overview
Dataset information
- Features:
  - section: string
  - filename: string
  - text: string
- Splits:
  - train:
    - Bytes: 73734751.97826087
    - Samples: 43
  - validation:
    - Bytes: 1714761.6739130435
    - Samples: 1
  - test:
    - Bytes: 3429523.347826087
    - Samples: 2
- Download size: 42120770 bytes
- Dataset size: 78879037.00000001 bytes
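The fractional per-split byte counts look like estimates rather than measured sizes: they match the total dataset size divided in proportion to the sample counts. A quick check of that arithmetic, using only the figures listed above (variable names are illustrative):

```python
import math

DATASET_SIZE = 78879037  # bytes, from the summary
SAMPLES = {"train": 43, "validation": 1, "test": 2}
LISTED_BYTES = {
    "train": 73734751.97826087,
    "validation": 1714761.6739130435,
    "test": 3429523.347826087,
}

total_samples = sum(SAMPLES.values())  # 46

# Apportion the dataset size to each split by its share of the samples.
estimated = {name: DATASET_SIZE * n / total_samples for name, n in SAMPLES.items()}

for name, value in estimated.items():
    matches = math.isclose(value, LISTED_BYTES[name], rel_tol=1e-9)
    print(f"{name}: {value:.2f} bytes, matches listed value: {matches}")
```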
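Configuration
- Config name: default
- Data files:
  - train: data/train-*
  - validation: data/validation-*
  - test: data/test-*
License
- License: odc-by
Task categories
- Task categories:
  - Text generation
  - Fill-mask
Language
- Language: English
Size category
- Size category: n<1K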



